This lecture is a general introduction to deep learning. First, a brief history of deep learning will be sketched. Second, an overview of deep learning models and applications will be provided. Third, related math and machine learning basics will be reviewed. Finally, the roadmap of the summer school will be introduced.
This lecture will introduce two important types of deep learning models: the multilayer perceptron (MLP) and convolutional neural networks. The training algorithm, backpropagation (BP), will be detailed using the MLP as the example. Some training techniques, such as stochastic gradient descent (SGD), dropout, and batch normalization, will also be introduced.
Lab: In this experiment, you will design MLPs to classify the MNIST handwritten digits dataset and write the BP algorithm yourself. MNIST is widely used in machine learning. It consists of 60,000 training samples and 10,000 testing samples. Each sample is a 784 × 1 column vector, flattened from an original 28 × 28 pixel grayscale image. The digits range from 0 to 9. Implement MLPs in Python to classify the MNIST handwritten digits:
1) Construct an MLP with one hidden layer of 256 units, using the sigmoid activation function and cross-entropy loss.
2) Redo the experiment using the ReLU activation function and compare the performance.
A code framework is provided, including a softmax_cross_entropy layer class and SGD. Your task:
1) Add momentum to SGD with γ=0.9.
2) Implement a FcLayer class in a fc_layer.py file, a SigmoidLayer class in sigmoid_layer.py, and a ReluLayer class in a relu_layer.py file.
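As a rough illustration of the lab tasks above, the following is a minimal NumPy sketch of a fully connected layer, the two activation layers, and SGD with momentum (γ=0.9). The forward/backward interface, the attribute names (`W`, `b`, `grad_W`, `grad_b`), the learning rate, and the `MomentumSGD` class name are assumptions made for this sketch; the provided code framework may define them differently.

```python
import numpy as np


class FcLayer:
    """Fully connected layer computing y = x @ W + b (assumed interface)."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.normal(0.0, 0.01, size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        # Gradients of the loss w.r.t. the parameters and the input.
        self.grad_W = self.x.T @ grad_out
        self.grad_b = grad_out.sum(axis=0)
        return grad_out @ self.W.T


class SigmoidLayer:
    def forward(self, x):
        self.y = 1.0 / (1.0 + np.exp(-x))
        return self.y

    def backward(self, grad_out):
        return grad_out * self.y * (1.0 - self.y)


class ReluLayer:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask


class MomentumSGD:
    """SGD with momentum: v <- gamma * v + lr * grad; param <- param - v."""

    def __init__(self, lr=0.1, gamma=0.9):
        self.lr = lr
        self.gamma = gamma
        self.velocity = {}  # one velocity buffer per (layer, parameter)

    def step(self, layer):
        for name in ("W", "b"):
            key = (id(layer), name)
            grad = getattr(layer, "grad_" + name)
            v = self.gamma * self.velocity.get(key, 0.0) + self.lr * grad
            self.velocity[key] = v
            setattr(layer, name, getattr(layer, name) - v)
```

With a constant gradient, the velocity grows from step to step (0.1, then 0.19, and so on for `lr=0.1`), which is the accumulation effect that momentum is meant to provide.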
This lecture will cover the basic architecture and usage of TensorFlow. We will cover the design concepts of TensorFlow, including the programming interfaces and the DAG execution model. Specifically, we will talk about variables, sessions, and operations. We will also discuss how multi-device and distributed training are implemented, as well as some optimizations used in TensorFlow.
Experiment: Training a CNN-based handwritten digit recognition model with TensorFlow
Goal: Learn how to use TensorFlow to train neural network models.
Hint: Build a neural network with variables and operators, load the training data, and then use an optimizer to optimize the parameters.
Dataset: MNIST handwritten digit dataset
Lab: Handwritten letter recognition
Description: In this lab, you will learn to recognize handwritten letters with a CNN. You will need to fill in some parameters of a given neural network program to make it work. You are then encouraged to modify this neural network (by adding layers or changing the parameters of existing layers) to improve recognition accuracy.
Abstract: Social media is prevalent today, and multimedia information dominates the social information stream. Analyzing and understanding the organization, curation, and evolution of social multimedia is of great significance for both industry and research. In this talk, we will bring together recent work that brings social media, a valuable source for sensing user needs and social knowledge, into the loop of multimedia computing. More specifically, we will focus on the application of deep learning models in social multimedia analysis, including social image representation learning, social image search, social recommendation, and representation learning for social networks.
Lab: Novel image category classification
Description: In this lab, you will learn how to adapt a pre-trained deep model to the classification of novel image categories. You will read the structure of AlexNet, add some code to complete the softmax regression and fine-tuning method, change the code to extract fc7 features for the training and testing data, and write new code for metric-based novel category classification.
Deep learning has advanced considerably in both academia and industry in recent years. In this lecture, we will introduce how deep learning models and techniques can help address natural language understanding problems. More specifically, we will show how deep learning models are used to represent words, sentences, and documents, and how to apply such models to classification, sequence labeling, and generation problems in natural language processing. Although it overlaps with the deep learning models used in computer vision, deep learning for NLP has its own unique features and differences. Deep learning can be widely applied to tasks such as information extraction, sentiment analysis and opinion mining, question answering, dialogue and conversation systems, machine translation, and machine comprehension.
Lab: Sentence-level Sentiment Classification with RNN
Description: In this lab, we focus on the task of fine-grained sentence-level sentiment classification. The task is defined as follows: given a natural language sentence, classify its sentiment label. For example, given the input ‘This movie is awesome!’, the output sentiment label is ‘very positive’, and given the input ‘The film seems a dead weight.’, the output sentiment label is ‘very negative’. In this task, we use five-class sentiment labels: 0 (very negative), 1 (negative), 2 (neutral), 3 (positive), 4 (very positive).
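To make the classification task above concrete, here is a minimal NumPy sketch of the forward pass of a vanilla (Elman) RNN that reads a sentence as a sequence of token ids and outputs probabilities over the five sentiment labels. The choice of a vanilla RNN cell, the parameter names, the dimensions, and the random (untrained) weights are all assumptions for illustration; the lab may use a different cell type, framework, and tokenization.

```python
import numpy as np

LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def init_params(vocab_size, embed_dim, hidden_dim, num_classes=5, seed=0):
    """Randomly initialized parameters (untrained; for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "E": rng.normal(0.0, 0.1, (vocab_size, embed_dim)),    # word embeddings
        "W_xh": rng.normal(0.0, 0.1, (embed_dim, hidden_dim)),
        "W_hh": rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        "W_hy": rng.normal(0.0, 0.1, (hidden_dim, num_classes)),
        "b_y": np.zeros(num_classes),
    }


def classify(token_ids, p):
    """Run the RNN over the sentence and return class probabilities."""
    h = np.zeros(p["b_h"].shape[0])
    for t in token_ids:
        # The hidden state summarizes the sentence read so far.
        h = np.tanh(p["E"][t] @ p["W_xh"] + h @ p["W_hh"] + p["b_h"])
    logits = h @ p["W_hy"] + p["b_y"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


params = init_params(vocab_size=100, embed_dim=8, hidden_dim=16)
probs = classify([3, 17, 42, 7], params)  # hypothetical token ids
predicted_label = LABELS[int(np.argmax(probs))]
```

Training would add a cross-entropy loss on `probs` and backpropagation through time; those steps are omitted here.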
Knowledge is power. In recent years, deep learning techniques have been introduced to the construction, representation, and application of knowledge graphs (KGs) and have achieved great success. This lecture will cover the basic ideas of deep learning for knowledge graphs and some important recent advances. We will cover distributed representation learning of knowledge, including both entities and relations, deep neural models for extracting knowledge from plain text, and deep learning algorithms for various KG applications.
Lab: Knowledge Representation Learning with TransE
Description: In this lab, we focus on the task of distributed representation learning of triple facts in knowledge graphs. This task learns low-dimensional embeddings of both entities and relations, which can then be applied to similarity computation between entities and relation prediction between entities. For example, the method will find that the entity ‘Bill Gates’ is more similar to ‘Steve Jobs’ than to ‘Barack Obama’, and will also be able to identify the relation ‘Founder’ between ‘Bill Gates’ and ‘Microsoft’.
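TransE models a relation as a translation in the embedding space, so that for a true triple (head, relation, tail) the embeddings satisfy h + r ≈ t. The sketch below shows the distance-based score and a margin-based ranking loss against a corrupted triple. The toy two-dimensional vectors, the L1 norm, and the margin value are assumptions chosen for illustration, not settings taken from the lab.

```python
import numpy as np


def transe_score(h, r, t, norm=1):
    """Distance ||h + r - t||; a lower score means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=norm, axis=-1)


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss pushing true triples to score lower than corrupted ones."""
    return np.maximum(0.0, margin + pos_score - neg_score).mean()


# Toy embeddings (hypothetical values, not learned).
head = np.array([1.0, 0.0])
relation = np.array([0.0, 1.0])
true_tail = np.array([1.0, 1.0])       # head + relation equals this tail
corrupt_tail = np.array([0.0, 0.0])    # randomly replaced tail

pos = transe_score(head, relation, true_tail)
neg = transe_score(head, relation, corrupt_tail)
loss = margin_ranking_loss(pos, neg, margin=1.0)
```

Here the true triple has distance 0 and the corrupted triple has distance 2, so the margin of 1 is already satisfied and the loss is 0. During training, the loss is positive whenever a corrupted triple scores within the margin of a true one, and its gradient moves the embeddings apart.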
Multimedia data embed users’ emotions. Understanding their emotional impact can benefit many applications, such as human-computer interaction, information retrieval, and context-aware recommendation. However, this task is not trivial. The biggest challenge is to model the multimedia data so as to capture the intrinsic relationships between various low-level features and the emotional impact. Another challenge, usually ignored, is that multimedia data are now mostly generated in social networks, where users have complex and subtle influence on each other's emotional impact. In this talk, we study the problem of understanding the emotional impact of multimedia data with deep learning methods, along with its applications. We will first introduce a novel notion of dimensional space as an intermediate layer to model the high-level semantics of emotions, and then focus more specifically on deep learning methods that incorporate both low-level features and social correlations to better predict the emotional impact. Finally, we will introduce several applications of deep learning in affective computing, such as audio-visual emotional speech synthesis, stress and depression detection, and inferring emotions from social media data.
Lab:
1) Depression detection via harvesting social media
Task: Classify users into a depressed group and a non-depressed group.
Data: Previously extracted features from raw data crawled from Twitter.
Experiment details (based on TensorFlow): splitting data, loading data, normalizing data, building a linear classifier and a DNN classifier, fitting and evaluating the model.
2) Appreciating clothing styles through deep learning
Task: Use an autoencoder connected with an SVM model to map clothing features to fashion styles defined in the Fashion Semantic Space.
Data: A public dataset with detailed annotations of visual features and aesthetic styles for clothing images.
Experiment details (based on TensorFlow): loading data, completing the structure and training process of the autoencoder, fitting the autoencoder output with a DNN regressor, evaluating the final loss.
In spite of the significant progress made in intelligent robots using various sensors, including camera, infrared, laser radar, and tactile array, the question of how to fuse multi-modal information to improve perception capability remains attractive and challenging. Multi-modal information offers complementary characteristics that make it an ideal choice for robust perception. In this talk, we present recent research results on multi-modal perception and recognition using sparse coding and deep learning. We show some examples of RGB-D fusion and visual-tactile fusion for robotic applications. Finally, we will present some examples using deep reinforcement learning for robot control.
Lab: In this project, you will build a multi-modal deep learning architecture for RGB-D datasets and perform object recognition tasks. We will provide a basic architecture, and you are encouraged to incorporate your own ideas to improve the recognition performance. Through this project, you will understand how multi-modal information with different characteristics can be seamlessly combined for object recognition.
In this lecture, we will cover the basic concepts of deep generative models (DGMs) as well as several popular examples. In contrast to standard deep neural networks, deep generative models adopt a probabilistic approach to learning a good representation for describing the data. With a data model, DGMs can generate new samples that are not in the training set, complete the missing entries in the input data, and perform prediction tasks (e.g., image classification). DGMs are good at both unsupervised learning and semi-supervised learning, where both labeled and unlabeled data are leveraged to learn a single model. We will cover the basic algorithms for learning DGMs, together with some real applications. We will also learn a programming library for implementing deep generative models.
Lab: 1) Some exercises will be included in the slides. 2) In the lab, the students will learn to use ZhuSuan, a Python programming library built on TensorFlow, to implement a generative model, e.g., a mixture of Gaussians or a variational auto-encoder.
In this lecture, we will cover the basic concepts of reinforcement learning, which is a major category of machine learning. We will also examine recent developments in deep reinforcement learning, which leverages deep learning techniques for sequential decision making. The concepts will be illustrated with several popular examples, including AlphaGo and AlphaGo Zero.
Lab: In the lab, the students will examine a reinforcement learning algorithm for playing Go and improve it.
Graph neural networks (GNNs) have generalized deep learning methods to graph-structured data, with promising performance on graph mining tasks. However, existing GNNs often face complex graph structures with scarce labeled nodes and suffer from non-robustness, over-smoothing, and overfitting. The lecture will introduce the basic GNN and its recent advances, for example propagation-based architectures that add self-attention (GAT), integrate with graphical models (GMNN), or use neighborhood mixing (MixHop). Students will be encouraged to understand (and implement) the basic GNN model and to propose novel ideas that extend research in this field.
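As one possible reading of the "basic GNN" model mentioned above, the sketch below implements a single propagation step in the style of a graph convolutional network (GCN): each node averages features over itself and its neighbors using a symmetrically normalized adjacency matrix, then applies a linear map and ReLU. Taking the GCN rule as the basic model is an assumption of this sketch, as are the function name and the toy two-node graph.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One GCN-style step: ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1)                 # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)  # propagate, transform, ReLU


# Toy graph: two nodes joined by one undirected edge (hypothetical data).
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
H = np.array([[1.0],
              [3.0]])                       # one feature per node
W = np.array([[1.0]])                       # identity weight for clarity

H_next = gcn_layer(A, H, W)
```

On this toy graph both nodes end up with the value 2.0, the average of the two input features. Stacking many such layers keeps averaging neighboring features, which gives some intuition for the over-smoothing issue noted above.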
Deep learning models have made significant strides in machine understanding and have even outperformed humans on tasks such as single-paragraph question answering (QA). However, to cross the chasm in reading comprehension ability between machines and humans, three main challenges lie ahead: 1) reasoning ability, 2) explainability, and 3) scalability. The lecture will first define the cognitive graph and then present some basic ideas from human cognitive processes. You will be encouraged to develop new approaches by incorporating cognitive theories, such as dual process theory (System 1 and System 2), into deep learning. The goal is to advance from System 1 (intuitive) deep learning to System 2 (reasoning and logic) deep learning.