T-3: Learning Human Perception and Semantics Understanding in Multimedia Applications
Monday Morning, June 23, 09:30 - 12:30
Presented by
Cha Zhang, Microsoft Research and Tsuhan Chen, Carnegie Mellon University
Abstract
Humans have always been considered one of the most important factors in multimedia applications. However, high-level human understanding of scenes and events, and human perception, are very hard to model with traditional signal processing techniques. Recently, data-driven, machine-learning-based methods have attracted more and more attention. Compared with traditional algorithms, data-driven learning methods rely on training examples instead of accurate modeling of semantics and perception, and they have been shown to be very effective for various applications. For instance, in early work on image retrieval, similarity between images was often computed from low-level features, which is often insufficient to model semantic distance. Recent machine learning methods can instead recognize objects in images, making semantic distance computation much more feasible. Another example is image/video enhancement. While traditional image enhancement methods such as image equalization are very popular, they are not directly linked with human perception, and hence may not always produce good results. Instead, if a set of good-looking images can be provided, better results could be obtained via machine learning methods.
In this 3-hour tutorial, we give a brief introduction to some popular machine learning methods, and show how these methods can be applied in multimedia applications to learn semantics and human perception. In particular, we present schemes to enhance low-quality videos by learning from professionally taken celebrity images, to relight human faces by learning from illumination data sets, to discover objects in images/videos by examining similar objects in a large database, to detect active speakers by learning cues from both audio and visual information, etc. Our goal is to provide multimedia researchers with a set of new tools and a new perspective for solving their daily research issues. The target audience for the tutorial is graduate students who are interested in machine learning, and scientists who are interested in learning how to apply machine learning methods to multimedia applications.
Speaker Biographies
Cha Zhang is currently a Researcher in the Communication and Collaboration Systems Group at Microsoft Research, Redmond. He received the B.S. and M.S. degrees from Tsinghua University, Beijing, China in 1998 and 2000, respectively, both in Electronic Engineering, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University in 2004. His current research focuses on applying various machine learning and computer vision techniques to multimedia applications, in particular multimedia teleconferencing. During his graduate studies at CMU, he worked on various multimedia-related projects including sampling and compression of image-based rendering data, 3D model database retrieval and active learning for database annotation, peer-to-peer networking, etc.
Dr. Zhang has published more than 30 technical papers and holds 8 U.S. patents. He won the best paper award at ICME 2007. He co-authored a book titled Light Field Sampling, published by Morgan and Claypool in 2006.
Dr. Zhang has been actively involved in various professional activities. He was the Publicity Chair for the International Packet Video Workshop in 2002, and the Program Co-Chair for the first Immersive Telecommunication Conference in 2007. He has served as a Technical Program Committee member for numerous conferences such as ACM Multimedia, CVPR, ICCV, ECCV, ICME, ICPR, ICWL, etc. Since Jan. 2008, he has served as an Associate Editor for the Journal of Distance Education Technologies.
Tsuhan Chen has been with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, since October 1997, where he is currently a Professor and Associate Department Head. From August 1993 to October 1997, he worked at AT&T Bell Laboratories, Holmdel, New Jersey. He received the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, California, in 1990 and 1993, respectively. He received the B.S. degree in electrical engineering from the National Taiwan University in 1987.
Tsuhan served as the Editor-in-Chief for IEEE Transactions on Multimedia from 2002 to 2004. He also served on the Editorial Board of IEEE Signal Processing Magazine and as Associate Editor for IEEE Trans. on Circuits and Systems for Video Technology, IEEE Trans. on Image Processing, IEEE Trans. on Signal Processing, and IEEE Trans. on Multimedia. He co-edited a book titled Multimedia Systems, Standards, and Networks.
Tsuhan received the Charles Wilts Prize at the California Institute of Technology in 1993. He was a recipient of the National Science Foundation CAREER Award from 2000 to 2003. He received the Benjamin Richard Teare Teaching Award at Carnegie Mellon University in 2006. He was elected to the Board of Governors, IEEE Signal Processing Society, 2007-2009. He is a member of the Phi Tau Phi Scholastic Honor Soc
