Jia-Yu Pan

Advanced Tools for Video and Multimedia Mining

Degree Type: Ph.D. in Computer Science
Advisor(s): Christos Faloutsos, Howard Wactlar
Graduated: May 2006

Keywords: Multimedia data mining, video mining, multi-modal pattern discovery, biomedical data mining, independent component analysis, random walk with restarts, image captioning, time series and text mining

Abstract

How do we automatically find patterns and mine data in large multimedia databases to make these databases useful and accessible? We focus on two problems: (1) mining "uni-modal patterns" that summarize the characteristics of a data modality, and (2) mining "cross-modal correlations" among multiple modalities. Uni-modal patterns such as "news videos have static scenes and speech-like sounds," and cross-modal correlations such as "the blue region at the upper part of a natural scene image is likely to be the 'sky'," could provide insights into the multimedia content and have many applications.

For uni-modal pattern discovery, we propose the method "AutoSplit." AutoSplit provides a framework for mining meaningful "independent components" in multimedia data, and can find patterns in a wide variety of data modalities (e.g., video, audio, text, and time sequences). For example, in video clips, AutoSplit finds characteristic visual/auditory patterns, and can classify news and commercial clips with 81% accuracy. In time sequences like stock prices, AutoSplit finds hidden variables like "general growth trend" and "Internet bubble," and can detect outliers (e.g., lackluster stocks). Based on AutoSplit, we design a system, ViVo, for mining biomedical images. ViVo automatically constructs a visual vocabulary which is biologically meaningful and can classify 9 biological conditions with 84% accuracy. Moreover, ViVo supports data mining tasks such as highlighting biologically interesting image regions, for biomedical research.

For cross-modal correlation discovery, we propose "MAGIC," a graph-based framework for multimedia correlation mining. When applied to news video databases, MAGIC can identify relevant video shots and transcript words for event summarization. On the task of automatic image captioning, MAGIC achieves a 58% relative improvement in captioning accuracy over recent machine learning techniques.
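The abstract does not spell out how MAGIC scores correlations, but the keywords list "random walk with restarts," so a rough illustration of that general technique on a small graph may help. The sketch below is not the thesis implementation: the graph layout (images linked to their regions and caption words, with similar regions linked to each other), the node names, the restart probability of 0.15, and both function names are assumptions made only for this example.

```python
def build_undirected_graph(edges):
    """Turn a list of (a, b) pairs into an adjacency dict with edges in both directions."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def random_walk_with_restart(adjacency, start, restart_prob=0.15, iterations=100):
    """Approximate how often a walker visits each node when it jumps back to
    `start` with probability `restart_prob` at every step (power iteration)."""
    scores = {node: 0.0 for node in adjacency}
    scores[start] = 1.0
    for _ in range(iterations):
        next_scores = {node: 0.0 for node in adjacency}
        for node, neighbors in adjacency.items():
            # Spread the non-restarting share of this node's mass evenly over its neighbors.
            share = (1.0 - restart_prob) * scores[node] / len(neighbors)
            for neighbor in neighbors:
                next_scores[neighbor] += share
        # The restarting share always returns to the start node.
        next_scores[start] += restart_prob
        scores = next_scores
    return scores


# Hypothetical toy data: three captioned images and one uncaptioned query image.
edges = [
    ("img1", "region_blue_1"), ("img1", "region_green_1"),
    ("img1", "word:sky"), ("img1", "word:grass"),
    ("img2", "region_blue_2"), ("img2", "word:sky"),
    ("img3", "region_green_3"), ("img3", "word:grass"),
    ("query", "region_blue_q"),
    # Assumed visual-similarity links between the query's region and other blue regions.
    ("region_blue_q", "region_blue_1"), ("region_blue_q", "region_blue_2"),
]
graph = build_undirected_graph(edges)
scores = random_walk_with_restart(graph, "query")

# Rank caption words by their visiting probability from the query image.
ranked_words = sorted(
    (node for node in graph if node.startswith("word:")),
    key=lambda node: scores[node],
    reverse=True,
)
print(ranked_words)
```

In this toy graph the query image has only a blue region, which connects it to two images captioned with "sky," so "word:sky" ranks above "word:grass." The same scoring could in principle be started from any node type, which is one reason a graph formulation suits cross-modal queries.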

Thesis Committee

Christos Faloutsos (Co-chair)
Howard Wactlar (Co-chair)
Christopher Olston
Shih-Fu Chang (Columbia University)

Jeannette Wing, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Thesis Document

CMU-CS-06-126.pdf (7.28 MB) (212 pages)
