Video is a unique multimedia data type in that it carries distinctive spatio-temporal constraints. Content-based video retrieval therefore requires methods for video sequence-to-sequence matching that incorporate the temporal ordering inherent in a video sequence without losing sight of the visual nature of the information it contains. Such methods, in turn, require reliable measures of similarity between video sequences. In this paper, we formulate video sequence-to-sequence matching as a pattern-matching problem and propose the vstring edit distance as a suitable distance measure for video sequences.
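The abstract does not define the vstring edit distance itself. As a point of reference only, the sketch below shows a standard weighted edit distance (the Wagner-Fischer dynamic program) over generic sequences, which is the general family that an edit-distance formulation of sequence matching belongs to. It assumes each video has already been reduced to a sequence of per-shot feature vectors; the substitution cost `shot_cost` (a capped L1 distance) and the unit insertion/deletion cost are hypothetical placeholders, not the vstring definition proposed in the paper.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def edit_distance(a: Sequence[T], b: Sequence[T],
                  sub_cost: Callable[[T, T], float],
                  indel_cost: float = 1.0) -> float:
    """Generic weighted edit distance (Wagner-Fischer dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the minimum cost of transforming a[:i] into b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost  # delete all of a[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,                        # delete a[i-1]
                dp[i][j - 1] + indel_cost,                        # insert b[j-1]
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitute
            )
    return dp[n][m]


def shot_cost(x: Tuple[float, ...], y: Tuple[float, ...]) -> float:
    # Hypothetical visual substitution cost: L1 distance between
    # per-shot feature vectors, capped at 1 so a substitution never
    # costs more than an insertion or deletion.
    return min(1.0, sum(abs(p - q) for p, q in zip(x, y)))


query = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.1)]
target = [(0.1, 0.2), (0.9, 0.1)]
print(edit_distance(query, target, shot_cost))  # the middle shot is deleted: 1.0
```

Because the substitution cost is a pluggable function, the same dynamic program respects temporal order (alignments cannot reorder shots) while letting visual similarity decide how cheaply two shots can be matched.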