Video face tracking: a work in progress
The reported average frame rate is 6 fps on a 3 GHz Pentium IV processor.
Object trackers of all the above families use a variety of features, and selecting the right ones is critical. The more distinctive a feature is, the more easily objects can be told apart in the feature space. For face tracking, local descriptors, such as gradients and histograms of gradients, appear in much of the relevant work in the area. Color is also an important feature, since the tones of skin, hair and other facial attributes are, to a certain degree, distinct from the tones of other regions normally found in a scene. Besides these basic features, other face-tracking schemes may consider appearance-based models or hybrid feature sets.
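To make the role of color concrete, the following is a minimal sketch of a color-histogram feature compared by histogram intersection. It is not taken from any of the trackers discussed here; the function names, the bin count and the sample pixel values are all illustrative assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values 0-255.

    `bins` (an illustrative choice) is the number of bins per channel.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins - 1].
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
        count += 1
    if count == 0:
        raise ValueError("region contains no pixels")
    return [h / count for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical pixel samples: two skin-toned patches and a blue background.
skin_a = [(220, 170, 140)] * 50 + [(200, 150, 120)] * 50
skin_b = [(215, 165, 135)] * 60 + [(190, 140, 110)] * 40
background = [(30, 60, 200)] * 100

sim_face = histogram_intersection(color_histogram(skin_a), color_histogram(skin_b))
sim_bg = histogram_intersection(color_histogram(skin_a), color_histogram(background))
```

With these made-up samples, the two skin patches share part of their histogram mass while the background shares none, which is the kind of separation a color-based tracker relies on. Real systems usually work in a color space that is less sensitive to illumination than raw RGB.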
For video indexing and search tasks, face tracking is often used together with clustering or other unsupervised techniques. For example, in 2013 Zhang and coworkers presented a system that extracts temporal face sequences from videos and groups them into clusters, with each cluster containing video clips of the same person. Their system employs face detection (to locate an initial occurrence of a face) and bi-directional (i.e., forward and backward) face tracking. The face regions found in these two ways are combined into a temporal face sequence, from which representative faces are selected based on face image quality; a full face sequence may contain too many face variations for clustering. Next, the system extracts appearance and temporal features from the representative faces and performs a similarity analysis. Finally, face sequences belonging to the same person are grouped by semi-supervised agglomerative clustering, which takes as input the similarity matrix produced by the previous step.
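The final grouping step can be illustrated with a small sketch of agglomerative clustering driven by a similarity matrix. This is not the implementation of Zhang and coworkers: the average-linkage rule, the stopping threshold and the use of cannot-link pairs as the semi-supervised component are assumptions made here for illustration, and the matrix values are invented.

```python
def agglomerative_clusters(similarity, threshold, cannot_link=()):
    """Greedy average-linkage clustering over a symmetric similarity matrix.

    similarity  -- square list of lists, similarity[i][j] in [0, 1]
    threshold   -- stop when no allowed pair of clusters is at least this similar
    cannot_link -- assumed constraint format: index pairs that must stay apart
    """
    clusters = [[i] for i in range(len(similarity))]
    forbidden = {frozenset(pair) for pair in cannot_link}

    def linkage(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    def allowed(a, b):
        # A merge is blocked if it would join any cannot-link pair.
        return not any(frozenset((i, j)) in forbidden for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                if not allowed(clusters[x], clusters[y]):
                    continue
                s = linkage(clusters[x], clusters[y])
                if s >= threshold and (best is None or s > best[0]):
                    best = (s, x, y)
        if best is None:
            break  # no remaining pair is similar enough to merge
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Invented similarity matrix for five face sequences.
S = [
    [1.00, 0.90, 0.80, 0.20, 0.10],
    [0.90, 1.00, 0.85, 0.30, 0.20],
    [0.80, 0.85, 1.00, 0.25, 0.15],
    [0.20, 0.30, 0.25, 1.00, 0.70],
    [0.10, 0.20, 0.15, 0.70, 1.00],
]
unconstrained = agglomerative_clusters(S, threshold=0.6)
constrained = agglomerative_clusters(S, threshold=0.6, cannot_link=[(0, 1)])
```

In this toy run, the unconstrained call groups sequences 0, 1 and 2 together and 3 and 4 together, while forbidding the pair (0, 1) leaves sequence 0 on its own. One plausible source of such constraints, again an assumption, is that two face sequences visible in the same frame cannot belong to the same person.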