Multimedia and Data Mining

Course ID: 15826

Doctoral Breadth Course: Software Systems - (-)
Classes marked with a "-" (dash) are intended as more advanced topics for CSD doctoral and 5th year master's students in the specific research area.

Description

The course covers advanced algorithms for learning, analysis, data management and visualization of large datasets. Topics include indexing for text and DNA databases, searching medical and multimedia databases by content, fundamental signal processing methods, compression, fractals in databases, data mining, privacy and security issues, rule discovery, data visualization, graph mining, stream mining.

Key Topics

Database topics:
  Traditional databases: Advanced hashing and multi-key access methods, for main-memory and for disk-based data.
  Text databases: indexing text and DNA strings, clustering, information filtering, LSI (singular value decomposition).
  Multimedia databases: Searching by content in signals: Time sequences, photographs and medical images, video clips, feature extraction, continuous media storage and delivery.

Tools:
  Fundamental signal processing methods: Discrete Fourier Transform, wavelets, JPEG and MPEG compression.
  Singular Value Decomposition: revisited
  Fractals in databases: Self-similarity/non-uniformity of real datasets, fractal dimensions, selectivity using fractals and multifractals, fractal image compression, self-similarity in web-traffic patterns.

Data Mining:
  Graph mining: "Laws" in large graphs (power laws; "small world" phenomena); graph generators; social networks.
  Sensor and time series mining: linear and non-linear forecasting
  Review of Statistical methods, Review of AI-methods, Database methods - Massive datasets: Association rules; Frequent sets; Single-pass learning algorithms; Information compression and reconstruction; Sampling; Condensed data representations; Datacubes; Cube-trees; Function finding.
  Security and Privacy Protection.
  Visualization of large data sets
  More tools: approximate counting algorithms; Independent Component Analysis.
  Overview of recent topics: trust and influence propagation; Future directions.

Required Background Knowledge

Introductory database course 15-415/615 or 15-445/645 (familiarity with B-trees and hashing), or permission of the instructor.

Course Relevance

Section R is reserved for students who are unable to register for an in-person section due to a government visa/travel restriction or a documented medical condition. Enrollment in this section will require university-level approval. Register for an in-person section unless you are absolutely certain when you register that you will not be able to attend in person this Fall.

Assessment Structure

  Midterm: 20%
  Homework: 10% (hw1: 1%; hw2, hw3, hw4: 3% each)
  Project: 40%
  Final exam: 30%

Course Link

http://www.cs.cmu.edu/~christos/courses/826.F19/