This lecture will cover basic concepts and models of deep neural networks. Two types of models, the multilayer perceptron (MLP) and the convolutional neural network (CNN), will be introduced. The backpropagation training algorithm will be detailed using the MLP as an example. Training tricks such as stochastic gradient descent, dropout, and batch normalization will also be introduced.
Homework:
1. Classify two handwritten digits using softmax based on pixel inputs.
2. Derive the gradient of the loss w.r.t. the parameters in some specific layers.
3. Classify two handwritten digits using a simple MLP.
Starter MATLAB code for Problems 1 and 3 will be provided.
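The softmax classifier of Homework 1 can be sketched in a few lines: a linear map from pixels to class scores, a softmax to turn scores into probabilities, and gradient descent on the cross-entropy loss. The toy 4-pixel "images" below are an illustrative stand-in for real handwritten-digit data (the course homework uses MATLAB; this is a minimal numpy sketch of the same idea).

```python
import numpy as np

# Minimal sketch of two-class softmax regression on pixel inputs,
# trained by gradient descent. The toy 4-pixel "images" are synthetic
# stand-ins for real handwritten digits.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.1, (50, 4)),   # class 0: darker images
               rng.normal(0.8, 0.1, (50, 4))])  # class 1: brighter images
y = np.repeat([0, 1], 50)

W = np.zeros((4, 2))  # one weight vector per class
b = np.zeros(2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for _ in range(200):
    p = softmax(X @ W + b)               # predicted class probabilities
    onehot = np.eye(2)[y]
    grad_logits = (p - onehot) / len(X)  # gradient of cross-entropy w.r.t. logits
    W -= lr * X.T @ grad_logits
    b -= lr * grad_logits.sum(axis=0)

acc = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The gradient `p - onehot` is exactly the quantity Homework 2 asks you to derive for the softmax layer.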
In this lecture, we will cover the basic concepts of deep generative models (DGMs) as well as several popular examples. Unlike standard deep neural networks, deep generative models take a probabilistic approach to learning a good representation of the data. Given such a data model, DGMs can generate new samples beyond the training set, complete missing entries in the input data, and perform prediction tasks (e.g., image classification). DGMs are well suited to both unsupervised and semi-supervised learning, where labeled and unlabeled data are leveraged to learn a single model. We will cover the basic algorithms for learning DGMs, together with some real applications.
Homework: Problems will be included in the slides.
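The generative viewpoint can be illustrated with the simplest possible data model: fit a probability distribution to training data by maximum likelihood, then draw brand-new samples from it. This is only a conceptual sketch (a single Gaussian standing in for far richer deep generative models; all numbers are illustrative):

```python
import numpy as np

# Toy illustration of generative modeling: learn a model p(x) from
# training data, then sample new points from it. Here p(x) is one
# Gaussian fit by maximum likelihood -- a stand-in for real DGMs.
rng = np.random.default_rng(1)
train = rng.normal(loc=3.0, scale=0.5, size=1000)  # "training set"

mu_hat = train.mean()       # maximum-likelihood estimate of the mean
sigma_hat = train.std()     # maximum-likelihood estimate of the std

# Generate new samples that were never in the training set:
new_samples = rng.normal(mu_hat, sigma_hat, size=5)
print(mu_hat, sigma_hat, new_samples)
```

Real DGMs replace this single Gaussian with a deep network that transforms simple noise into complex data such as images, but the learn-then-sample workflow is the same.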
This lecture will cover the basic architecture and usage of TensorFlow. We will cover the design concepts of TensorFlow, including the programming interfaces and the DAG execution model. Specifically, we will talk about variables, sessions, and operations. We will also discuss how multi-device and distributed training are implemented, along with some optimizations used in TensorFlow.
Experiment: Train a CNN-based handwritten digit recognition model with TensorFlow.
Goal: Learn how to use TensorFlow to train neural network models.
Hint: Build a neural network with variables and operators, load the training data, and then use an optimizer to fit the parameters.
Dataset: MNIST handwritten digit dataset.
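The deferred, graph-based execution model can be mimicked in a few lines of plain Python. This is a conceptual sketch only, not TensorFlow's actual API: building expressions records operations as nodes of a DAG, and nothing is computed until a session-like `run` walks the graph.

```python
# Conceptual sketch of TensorFlow 1.x-style deferred execution (not the
# real API): building a graph records operations; run() evaluates them.
class Node:
    def __init__(self, op=None, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def constant(v):
    return Node(value=v)

def add(a, b):
    return Node(op=lambda x, y: x + y, inputs=(a, b))

def mul(a, b):
    return Node(op=lambda x, y: x * y, inputs=(a, b))

def run(node):
    """Session.run analogue: recursively evaluate the DAG."""
    if node.op is None:
        return node.value
    return node.op(*(run(i) for i in node.inputs))

# y = (2 + 3) * 4 -- the graph is built first, executed only inside run()
y = mul(add(constant(2), constant(3)), constant(4))
print(run(y))  # 20
```

Separating graph construction from execution is what lets TensorFlow optimize the DAG and place its nodes across multiple devices before any computation happens.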
Abstract: Social media is prevalent today, and multimedia information dominates the social information stream. How to analyze and understand the organization, curation, and evolution of social multimedia is of great significance for both industry and research. In this talk, we will bring together recent works that put social media, a valuable source for sensing user needs and social knowledge, into the loop of multimedia computing. We will focus in particular on the application of deep learning models in social multimedia analysis, including social image representation learning, social image search, social recommendation, and representation learning for social networks.
Experiment: Train a deep learning model for ad-click behavior prediction in social image-ad streams.
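Click prediction is at heart a binary classification problem. A hedged baseline sketch for the experiment, using logistic regression on synthetic features (the features and data here are invented for illustration; the actual experiment would use learned deep representations of users, images, and ads):

```python
import numpy as np

# Toy baseline for click prediction: logistic regression on synthetic
# user/ad features. Data and features are invented for illustration.
rng = np.random.default_rng(2)
n, d = 400, 6
X = rng.normal(size=(n, d))                 # e.g., concatenated user + ad features
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)          # synthetic click labels

w = np.zeros(d)
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))          # predicted click probability
    w -= lr * X.T @ (p - y) / n             # gradient of the log loss

acc = (((1 / (1 + np.exp(-(X @ w)))) > 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

A deep model would replace the fixed feature vector `X` with representations learned end-to-end from the image-ad stream.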
Deep learning has advanced rapidly in both academia and industry in recent years. In this lecture, we will introduce how deep learning models and techniques can help address natural language understanding problems. More specifically, we will show how deep learning models represent words, sentences, and documents, and how to apply such models to classification, sequence labeling, and generation problems. Applications include sentiment analysis, dialogue and conversation, and so on.
Homework: Teaching an RNN to Learn Addition Arithmetic. In this homework, we use a recurrent neural network (RNN) to learn how to perform addition. Given a set of samples of the form a + b = c, where a and b are integers, you will be asked to approximate the addition operator with RNN models.
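A first step in the addition homework is preparing sequence data: render each sum as a fixed-width character string and map characters to integer indices for the RNN input layer. A minimal sketch (the padding and vocabulary choices are assumptions, not the official starter format):

```python
# Sketch of data preparation for the RNN-addition homework: each input is
# the string "a+b" padded to a fixed width, and the target is the string
# of a+b. Encoding choices here are illustrative assumptions.
import random

CHARS = "0123456789+ "                    # digit, plus sign, padding space
CHAR2IDX = {c: i for i, c in enumerate(CHARS)}

def make_sample(max_digits=3):
    a = random.randint(0, 10**max_digits - 1)
    b = random.randint(0, 10**max_digits - 1)
    query = f"{a}+{b}".ljust(2 * max_digits + 1)   # fixed-width input sequence
    answer = str(a + b).ljust(max_digits + 1)      # fixed-width target sequence
    return query, answer

def encode(s):
    """Map a string to integer indices for the RNN's embedding/input layer."""
    return [CHAR2IDX[c] for c in s]

random.seed(0)
q, ans = make_sample()
print(repr(q), repr(ans), encode(q))
```

An encoder RNN reads `encode(q)` one character at a time, and a decoder emits the answer characters; the fixed widths let samples be batched.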
Natural Language Processing, a field of artificial intelligence, aims at enabling computers to understand and generate human languages. In recent years, deep learning techniques have been introduced to NLP and achieved great success in many areas such as machine translation. This lecture will cover the basic idea of deep learning for NLP and some recent important advances. We will cover distributed representation learning of words, phrases and knowledge, recurrent neural networks for text sequence modeling, and deep learning algorithms for various NLP applications. Experiment: Train a representation learning model for a toy knowledge graph extracted from Freebase.
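One classic approach to knowledge graph representation learning, plausibly relevant to the Freebase experiment, is TransE: embed entities and relations as vectors so that head + relation ≈ tail for true triples. A minimal sketch with a margin-based update (the triples, dimensions, and hyperparameters below are toy assumptions, not the lecture's setup):

```python
import numpy as np

# Minimal TransE-style sketch: score a triple (h, r, t) by the squared
# distance ||h + r - t||^2 and nudge embeddings so true triples score
# lower than corrupted ones. Entities and relations are toy assumptions.
rng = np.random.default_rng(3)
entities = ["Beijing", "China", "Paris", "France"]
relations = ["capital_of"]
triples = [("Beijing", "capital_of", "China"),
           ("Paris", "capital_of", "France")]

dim = 8
E = {e: rng.normal(scale=0.1, size=dim) for e in entities}
R = {r: rng.normal(scale=0.1, size=dim) for r in relations}

def dist(h, r, t):
    diff = E[h] + R[r] - E[t]
    return diff @ diff            # squared distance; lower = more plausible

lr, margin = 0.05, 1.0
for _ in range(500):
    for h, r, t in triples:
        t_bad = rng.choice([e for e in entities if e != t])  # corrupt the tail
        if dist(h, r, t) + margin > dist(h, r, t_bad):       # hinge violated
            g_pos = E[h] + R[r] - E[t]
            g_neg = E[h] + R[r] - E[t_bad]
            E[h] -= lr * (g_pos - g_neg)      # gradient steps on the hinge loss
            R[r] -= lr * (g_pos - g_neg)
            E[t] += lr * g_pos
            E[t_bad] -= lr * g_neg

print(dist("Beijing", "capital_of", "China"),
      dist("Beijing", "capital_of", "France"))
```

After training, true triples score much lower than corrupted ones, so the embeddings can rank candidate tails for link prediction.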
Multimedia data embed users' emotions. Understanding this emotional impact can benefit many applications, such as human-computer interaction, information retrieval, and context-aware recommendation. However, the task is not trivial: the biggest challenge is to model multimedia data so as to capture the intrinsic relationships between various low-level features and the emotional impact. Another challenge, usually ignored, is that multimedia data are nowadays mostly generated in social networks, where users exert complex and subtle influence on each other's emotions. In this talk, we study the problem of understanding the emotional impact of multimedia data with deep learning methods, together with its applications. We will first introduce a novel notion of a dimensional space as an intermediate layer for modeling the high-level semantics of emotions, and then focus on deep learning methods that incorporate both low-level features and social correlations to better predict the emotional impact. Finally, we will introduce several deep-learning-based applications in affective computing, such as audio-visual emotional speech synthesis, stress and depression detection, and inferring emotions from social media data.
Experiment: Train a deep learning model for emotion recognition on speech/image data.
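A common instance of the dimensional-space idea is the valence-arousal plane: emotions are points in a continuous 2D space, and a model predicts coordinates in that space rather than a discrete label directly. The coordinates below are rough illustrative placements (an assumption for this sketch, not values from the talk):

```python
import math

# Sketch of the "dimensional space" idea: represent emotions as points in
# a continuous valence-arousal plane and classify a predicted point by its
# nearest category. The coordinates are rough illustrative placements.
VA = {  # (valence, arousal), each roughly in [-1, 1]
    "happy":   ( 0.8,  0.5),
    "angry":   (-0.6,  0.7),
    "sad":     (-0.7, -0.4),
    "relaxed": ( 0.6, -0.5),
}

def nearest_emotion(valence, arousal):
    """Map a point in the dimensional space back to a discrete category."""
    return min(VA, key=lambda e: math.dist((valence, arousal), VA[e]))

print(nearest_emotion(0.7, 0.4))   # a point near "happy"
print(nearest_emotion(-0.5, 0.6))  # a point near "angry"
```

In the deep learning methods discussed in the talk, a network would regress low-level audio/visual features (and social correlations) onto such continuous coordinates, with the discrete label recovered as a final step.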
In spite of the significant progress made in intelligent robots using various sensors, including cameras, infrared, laser radar, tactile arrays, and so on, the question of how to fuse multi-modal information to improve perception remains both attractive and challenging. Multi-modal information offers complementary characteristics that make it the ideal choice for robust perception. In this talk, we present recent research results on multi-modal perception and recognition using sparse coding and deep learning, with examples of RGB-D fusion and visual-tactile fusion for robotic applications. Finally, we will present some examples of using deep reinforcement learning for robot control.
Experiment: Train a deep learning model for multi-modal object recognition.
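The simplest fusion scheme for the experiment is feature-level (early) fusion: concatenate the feature vectors from two modalities and train one classifier on the joint representation. A hedged sketch on synthetic data (the feature dimensions and labels are invented; real RGB-D or visual-tactile features would come from sensor-specific encoders):

```python
import numpy as np

# Sketch of feature-level multi-modal fusion: concatenate features from
# two modalities (e.g., RGB and depth) and fit one linear classifier on
# the joint vector. Data here is synthetic, purely for illustration.
rng = np.random.default_rng(4)
n = 200
rgb   = rng.normal(size=(n, 5))            # visual features
depth = rng.normal(size=(n, 3))            # depth (or tactile) features
X = np.hstack([rgb, depth])                # early fusion: joint feature vector
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(float)         # synthetic object labels

# least-squares linear classifier on the fused representation
targets = 2 * y - 1                        # map {0, 1} -> {-1, +1}
w, *_ = np.linalg.lstsq(X, targets, rcond=None)

acc = ((X @ w > 0).astype(float) == y).mean()
print(f"fused-feature accuracy: {acc:.2f}")
```

Deep fusion models go further by learning where to combine the modalities, e.g. merging intermediate network layers rather than raw feature vectors, but the complementary-information principle is the same.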