Multimedia Information Processing
Course Description Spring 1998

Instructors: Raj Reddy (rr@cs), Roger Dannenberg (rbd@cs), and Bob Thibadeau (rht@cs)

Until recent years, most computing tasks dealt with numerical, text, and symbolic data, and Computer Science has emphasized these discrete data types. Now, digital representations of audio, video, and images are common. These new data types are often called "continuous media" because they represent quantities that vary continuously over time and/or space. Computers are rapidly becoming the technology of choice for continuous media production, manipulation, and distribution. Consequently, an understanding of continuous media is essential for many modern computing tasks.

Multimedia Information Processing (MMIP) teaches students to work with continuous media on computers. Students will learn to capture, process, compress, search, index, store, and retrieve various kinds of continuous media. The course is team and project oriented. Projects will require work with audio, scanned images, digital video, and other media, all in digital form. Readings will provide a conceptual and technical framework for project work.

The goal of MMIP is to make students comfortable manipulating continuous media. Students will learn the underlying concepts and be able to apply their understanding to practical problems such as selecting sample rates and image resolution, selecting appropriate compression schemes, creating continuous media for the Web, and using various software tools to manipulate audio, images, video, and other media.
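
As a rough illustration of the sample-rate and resolution trade-offs mentioned above, the following sketch computes uncompressed data rates and sizes. The function names and the example parameters (CD-quality audio, a 300 dpi bilevel letter-size page, 640x480 video at 30 frames per second) are assumptions chosen for illustration and are not taken from the course materials.

```python
def audio_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM audio data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def image_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed raster image size in bytes."""
    return width_px * height_px * bits_per_pixel // 8


def video_bytes_per_second(width_px, height_px, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate: one full image per frame."""
    return image_bytes(width_px, height_px, bits_per_pixel) * frames_per_second


# Assumed example parameters, for illustration only.
cd_audio_rate = audio_bytes_per_second(44_100, 16, 2)   # 176400 bytes/s
scanned_page = image_bytes(2550, 3300, 1)               # 8.5 x 11 in at 300 dpi, 1 bit/pixel
raw_video_rate = video_bytes_per_second(640, 480, 24, 30)

print(cd_audio_rate, scanned_page, raw_video_rate)
```

Halving the sample rate or the bits per sample halves the audio data rate, which is the kind of quality-versus-size decision the course goal refers to.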

Jan 13 and 15 Intro to MMIP : Scanning, Sound, Image, Video(RR)

Discrete Data Vs. Continuous Data
Numbers and Text Vs. Sound, Image, and Video (1D, 2D, and 3D)
Sampling, A2D and D2A
Approximate discrete representations and quality : resolution and dynamic range
Data Rates and Data size
Live Demonstrations of degradation of quality as F(R,D)
Medium
Dedicated Workstation Disk, Handheld, CDROM, Internet, Web TV
Models of Mediums and Critical Timing Paths
Applications
Digital Libraries, Entertainment, Education, Marketing
Assignment : make a personal MM Homepage using a commercial package

Jan 20 and 22 Scanning(RHT)

Difference between a Scanner and a Camera
When to Scan: Spatial, Depth, Color Resolution, Light Control
Device Models
What can be Scanned
Velocity Control
Techniques for Correcting for Scanning Inaccuracies
Assignment :

Jan 27 and 29 Sound Processing(RBD)

Human hearing
Overview of sampling
Foldover/aliasing, prefiltering
Quantization noise - relate to sample size
Bandwidth - relate to sample rate
Dither
Reconstruction
Oversampling
Microphones, Mixers, Recorders
Assignment :
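
Illustration for this unit (not the assignment): the topic "Quantization noise - relate to sample size" can be explored numerically. The sketch below quantizes a full-scale sine wave to a given number of bits and compares the measured signal-to-noise ratio with the commonly quoted rule of thumb of about 6.02 dB per bit plus 1.76 dB. The quantizer, the test frequency, and all function names are assumptions made for this example; no dither or clipping is modeled.

```python
import math


def quantize(x, bits):
    """Round x (nominally in [-1, 1]) to a uniform grid with 2**bits steps
    across the range. Clipping is not modeled in this sketch."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step


def theoretical_snr_db(bits):
    """Rule-of-thumb SNR for a full-scale sine with uniform quantization."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits, n_samples=20000, cycles_per_sample=0.01234):
    """Quantize a full-scale sine and measure signal power over error power."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n_samples):
        x = math.sin(2.0 * math.pi * cycles_per_sample * i)
        error = quantize(x, bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


for bits in (8, 12, 16):
    print(bits, round(measured_snr_db(bits), 1), round(theoretical_snr_db(bits), 1))
```

Each additional bit of sample size should improve the measured figure by roughly 6 dB.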

Feb 3 and 5 Image Processing(RHT)

Image Creation and Image Display
Types of Images: Pixel Shapes, Depths, Sizes, Projections
Model Based versus Painted Images
3D Graphics - Photograph
Color and Grayscale
Human Visual Limitations
Image Manipulations
Spatial Frequency and FFT : Theory of Image Manipulation
Aliasing
Image Reconstruction
Assignment :

Feb 10 and 12 Video Processing(RBD)

CRT technology
Scanlines
Vertical Blanking, Vertical Retrace
Interlace
Composite Video
Standards: NTSC, PAL, SECAM
(Digital video covered under "compression"?)
Audio and Video Production
Assignment :

Feb 17 and 19 Compression(RHT)

Information Limits, Uncertainty and Entropy
Vector Techniques including Run Length
Huffman
Lempel-Ziv (GIF)
The difference between a file type (TIFF) and a compression type (GIF)
Fourier, Discrete Cosine Transform, Wavelet, Hybrid Techniques,
JPEG, MPEG, Vxtreme, and so on.
Assignment :
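
Illustration for this unit (not the assignment): two of the topics above, entropy and run-length coding, fit in a short sketch. The example treats one scanline of a bilevel image as a string of "W" (white) and "B" (black) pixels. The scanline contents and the function names are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import groupby


def entropy_bits_per_symbol(data):
    """Shannon entropy of the symbol frequencies in data, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rle_encode(data):
    """Run-length encode a string into (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs):
    """Invert rle_encode."""
    return "".join(symbol * length for symbol, length in pairs)


# Hypothetical scanline: 12 white, 3 black, 9 white pixels.
scanline = "W" * 12 + "B" * 3 + "W" * 9
runs = rle_encode(scanline)

print(runs)                                # [('W', 12), ('B', 3), ('W', 9)]
print(entropy_bits_per_symbol(scanline))   # well under 1 bit per pixel
print(rle_decode(runs) == scanline)        # True
```

The entropy figure here treats pixels as independent, so it is only a bound for coders that ignore context; run-length coding gains by exploiting the fact that neighboring pixels are correlated.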

Feb 24 and 26 Optical Character Recognition(RHT)

History of OCR
BIP, OCR Systems, CAERE Omnifont, Adobe Capture
How OCR Works
Scan, Segment, Classify, Merge, Dictionary, Assign Fonts
Why it doesn’t work and when to expect it won’t
Techniques for correcting errors
Assignment :

March 3 and 5 Error Correction and Pattern Recognition(RR)

Error Detection and Correction
Coding theory
Parity and Cyclic Redundancy Check
Error Correcting Codes
Pattern Recognition
Feature Estimation
Distance Metrics
Alignment and Normalization
Classification Vs. Description
Assignment :
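
Illustration for this unit (not the assignment): the sketch below contrasts a single parity bit with a cyclic redundancy check. It uses the CRC-32 function from Python's standard zlib module; the message and the positions of the flipped bits are assumptions chosen for this example.

```python
import zlib


def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 when the data contains an odd number of 1 bits."""
    return sum(bin(byte).count("1") for byte in data) % 2


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated channel error)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


message = b"MMIP"
one_error = flip_bit(message, 3)
two_errors = flip_bit(one_error, 17)

# Parity catches any odd number of bit errors but misses an even number.
parity_detects_one = parity_bit(one_error) != parity_bit(message)
parity_detects_two = parity_bit(two_errors) != parity_bit(message)

# CRC-32 catches this double-bit error.
crc_detects_two = zlib.crc32(two_errors) != zlib.crc32(message)

print(parity_detects_one, parity_detects_two, crc_detects_two)  # True False True
```

Both checks only detect errors. Locating and repairing a flipped bit needs the extra redundancy of an error correcting code.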

Mar 10 and 12 Sound analysis and compression(RBD)

Formats and standards: AIFF, AES/EBU, S/PDIF
Sample rate conversion
DPCM, ADPCM
Masking, Perception-based coding
LPC
Assignment :
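
Illustration for this unit (not the assignment): a minimal DPCM sketch, assuming the simplest possible predictor (the previously reconstructed sample) and a fixed quantizer step. ADPCM would adapt the step over time, which is not shown. The sample values and function names are assumptions made for this example.

```python
def dpcm_encode(samples, step):
    """Encode each sample as a quantized difference from the predicted value.
    The prediction tracks what the decoder will reconstruct, so quantization
    error does not accumulate from sample to sample."""
    codes = []
    predicted = 0
    for sample in samples:
        code = round((sample - predicted) / step)
        codes.append(code)
        predicted += code * step
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by summing the dequantized differences."""
    samples = []
    value = 0
    for code in codes:
        value += code * step
        samples.append(value)
    return samples


# Hypothetical slowly varying signal.
original = [0, 3, 9, 14, 16, 15, 10, 2, -5, -9]
step = 4
codes = dpcm_encode(original, step)
decoded = dpcm_decode(codes, step)

print(codes)
print(decoded)
print(max(abs(a - b) for a, b in zip(original, decoded)))  # at most step / 2
```

Because successive samples are similar, the difference codes span a smaller range than the samples themselves and can be stored in fewer bits.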

Mar 17 and 19 Music(RBD)

Music Representation
MIDI and Synthesizers
Synthesis techniques:
non-linear
additive
table-based methods
sampling
physical models
Music understanding
Assignment :

March 31 and Apr 2 Speech Recognition and Speech Synthesis(RR)

Speech Signal : Production and Perception
Acoustic Phonetics
Signal Processing and Analysis
Sources of Variability
Speech Recognition System structure
Signal processing
Hidden Markov Models and Learning
Representation of Knowledge
Beam Search
Assignment :

April 7 and 9 Image Analysis Recognition and Understanding(RR)

History of Image Recognition
Problems of Image Recognition
Standard Paradigms for Image Recognition
symbol assignment, 3d reconstruction, sorting
Advanced Interactive Paradigms
Bin picking (games), shoot 'em up
Assignment :

Apr 14 and 16 Digital Video and Video Compression(RBD)

Fundamental MPEG
Vector Quantization
Example: QuickTime architecture
Assignment :

April 21 and 23 Image synthesis and Graphics(RHT)

Principles of Image Synthesis
Aliasing
Oversampling
Principles of 2D and 3D Graphics
Principles of Animation and Film (Graphics and Sound)
Assignment :

April 28 Video Segmentation, Indexing and retrieval(RR)

Video Signal Capture
Speech Transcription
Video Segmentation and alignment
Index Generation
Search and Retrieval
Assignment :