publications
a selection of my recent publications (since 2014), grouped by year in reverse chronological order.
2023
- [LLM] A Latent Space Theory for Emergent Abilities in Large Language Models. Hui Jiang, 2023.
Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or ε-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.
@misc{jiang2023latentspacetheoryemergent, title = {A Latent Space Theory for Emergent Abilities in Large Language Models}, author = {Jiang, Hui}, year = {2023}, eprint = {2304.09960}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2304.09960}, dimensions = {true}, }
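A worked-equation reading of the argument, in my own notation rather than the paper's: let x be an observed prompt, θ its underlying meaning (a latent variable), and y a continuation; an LLM trained on big data is treated as a good estimate of the marginal distribution of language, and prediction then amounts to Bayesian inference over the latent meaning:

\[
P(y \mid x) \;=\; \sum_{\theta} P(y \mid \theta, x)\, P(\theta \mid x),
\qquad
P(\theta \mid x) \;\propto\; P(x \mid \theta)\, P(\theta).
\]

Under the sparsity assumption, an unambiguous prompt makes the posterior P(θ | x) concentrate on the single intended meaning θ*, so P(y | x) ≈ P(y | θ*, x); for an ε-ambiguous prompt, only a small amount of posterior mass (on the order of ε) leaks onto unintended meanings.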
2021
- [MLF-Book] Machine Learning Fundamentals. Hui Jiang, 2021.
This lucid, accessible introduction to supervised machine learning presents core concepts in a focused and logical way that is easy for beginners to follow. The author assumes basic calculus, linear algebra, probability and statistics but no prior exposure to machine learning. Coverage includes widely used traditional methods such as SVMs, boosted trees, HMMs, and LDAs, plus popular deep learning methods such as convolutional neural nets, attention, transformers, and GANs. Organized in a coherent presentation framework that emphasizes the big picture, the text introduces each method clearly and concisely “from scratch” based on the fundamentals. All methods and algorithms are described in a clean and consistent style, with a minimum of unnecessary detail. Numerous case studies and concrete examples demonstrate how the methods can be applied in a variety of contexts.
@book{Jiang-MLF-2021, author = {Jiang, Hui}, title = {Machine Learning Fundamentals}, publisher = {Cambridge University Press}, year = {2021}, doi = {10.1017/9781108938051}, url = {https://books.google.ca/books?id=jSFGEAAAQBAJ}, }
2020
- [ReLU] On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks. Behnam Asadi and Hui Jiang, 2020.
In this paper, we have extended the well-established universal approximator theory to neural networks that use the unbounded ReLU activation function and a nonlinear softmax output layer. We have proved that a sufficiently large neural network using the ReLU activation function can approximate any function in L1 up to any arbitrary precision. Moreover, our theoretical results have shown that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in L1, which is equivalent to mutually-exclusive class labels in any realistic multi-class pattern classification problem. To the best of our knowledge, this work is the first theoretical justification for using softmax output layers in neural networks for pattern classification.
@misc{asadi2020approximationcapabilitiesreluactivation, title = {On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks}, author = {Asadi, Behnam and Jiang, Hui}, year = {2020}, eprint = {2002.04060}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, url = {https://arxiv.org/abs/2002.04060}, dimensions = {true}, }
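An informal restatement of the two results, paraphrased from the abstract rather than quoted from the paper's theorems: for any target f in L¹ and any tolerance ε > 0, a large enough ReLU network N_ε satisfies

\[
\int_{\mathbb{R}^d} \big| f(\mathbf{x}) - N_\varepsilon(\mathbf{x}) \big|\, d\mathbf{x} \;<\; \varepsilon,
\]

and, analogously, a large enough network ending in a nonlinear softmax layer can approximate any indicator function of a class region in the same L¹ sense, which is what a mutually-exclusive multi-class classifier ultimately has to represent.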
2019
- [ConvexNN] Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization. Hui Jiang, 2019.
In this paper, we present some theoretical work to explain why simple gradient descent methods are so successful in solving non-convex optimization problems in learning large-scale neural networks (NN). After introducing a mathematical tool called canonical space, we have proved that the objective functions in learning NNs are convex in the canonical model space. We further elucidate that the gradients between the original NN model space and the canonical space are related by a pointwise linear transformation, which is represented by the so-called disparity matrix. Furthermore, we have proved that gradient descent methods surely converge to a global minimum of zero loss provided that the disparity matrices maintain full rank. If this full-rank condition holds, the learning of NNs behaves in the same way as normal convex optimization. Finally, we have shown that the chance of having singular disparity matrices is extremely slim in large NNs. In particular, when over-parameterized NNs are randomly initialized, the gradient descent algorithms converge to a global minimum of zero loss in probability.
@misc{jiang2023learninglargescaleneuralnetworks, title = {Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization}, author = {Jiang, Hui}, year = {2019}, eprint = {1903.02140}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, url = {https://arxiv.org/abs/1903.02140}, dimensions = {true}, }
- [PSL] A New Perspective on Machine Learning: How to do Perfect Supervised Learning. Hui Jiang, 2019.
In this work, we introduce the concept of bandlimiting into the theory of machine learning because all physical processes are bandlimited by nature, including real-world machine learning tasks. After the bandlimiting constraint is taken into account, our theoretical analysis has shown that all practical machine learning tasks are asymptotically solvable in a perfect sense. Furthermore, the key towards this solvability almost solely relies on two factors: i) a sufficiently large number of training samples beyond a threshold determined by a difficulty measurement of the underlying task; ii) a sufficiently complex and bandlimited model. Moreover, for some special cases, we have derived new error bounds for perfect learning, which can quantify the difficulty of learning. These generalization bounds are not only asymptotically convergent but also independent of model complexity. Our new results on generalization have provided a new perspective to explain the recent successes of large-scale supervised learning using complex models like neural networks.
@misc{jiang2019newperspectivemachinelearning, title = {A New Perspective on Machine Learning: How to do Perfect Supervised Learning}, author = {Jiang, Hui}, year = {2019}, eprint = {1901.02046}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, url = {https://arxiv.org/abs/1901.02046}, dimensions = {true}, }
- [FreebaseQA] FreebaseQA: A New Factoid QA Data Set Matching Trivia-Style Question-Answer Pairs with Freebase. Kelvin Jiang, Dekun Wu, and Hui Jiang. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Jun 2019.
In this paper, we present a new data set, named FreebaseQA, for open-domain factoid question answering (QA) tasks over structured knowledge bases, like Freebase. The data set is generated by matching trivia-type question-answer pairs with subject-predicate-object triples in Freebase. For each collected question-answer pair, we first tag all entities in each question and search for relevant predicates that bridge a tagged entity with the answer in Freebase. Finally, human annotation is used to remove any false positive in these matched triples. Using this method, we are able to efficiently generate over 54K matches from about 28K unique questions with minimal cost. Our analysis shows that this data set is suitable for model training in factoid QA tasks beyond simpler questions since FreebaseQA provides more linguistically sophisticated questions than other existing data sets.
@inproceedings{jiang-etal-2019-freebaseqa, title = {{F}reebase{QA}: A New Factoid {QA} Data Set Matching Trivia-Style Question-Answer Pairs with {F}reebase}, author = {Jiang, Kelvin and Wu, Dekun and Jiang, Hui}, booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (NAACL)}, month = jun, year = {2019}, address = {Minneapolis, Minnesota}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/N19-1028}, doi = {10.18653/v1/N19-1028}, pages = {318--323}, }
- [KNG-MRC] Explicit Utilization of General Knowledge in Machine Reading Comprehension. Chao Wang and Hui Jiang. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Jul 2019.
To bridge the gap between Machine Reading Comprehension (MRC) models and human beings, which is mainly reflected in the hunger for data and the robustness to noise, in this paper, we explore how to integrate the neural networks of MRC models with the general knowledge of human beings. On the one hand, we propose a data enrichment method, which uses WordNet to extract inter-word semantic connections as general knowledge from each given passage-question pair. On the other hand, we propose an end-to-end MRC model named Knowledge Aided Reader (KAR), which explicitly uses the above extracted general knowledge to assist its attention mechanisms. Based on the data enrichment method, KAR is comparable in performance with the state-of-the-art MRC models, and significantly more robust to noise than them. When only a subset (20%-80%) of the training examples is available, KAR outperforms the state-of-the-art MRC models by a large margin, and is still reasonably robust to noise.
@inproceedings{wang-jiang-2019-explicit, title = {Explicit Utilization of General Knowledge in Machine Reading Comprehension}, author = {Wang, Chao and Jiang, Hui}, booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)}, month = jul, year = {2019}, address = {Florence, Italy}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P19-1219}, doi = {10.18653/v1/P19-1219}, pages = {2263--2272}, dimensions = {true}, }
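A minimal sketch of the flavor of WordNet-based connection extraction described above, assuming a deliberately simple rule (two words count as connected if their synsets, or the direct hypernyms/hyponyms of those synsets, overlap); the actual extraction rules and the KAR model in the paper are more elaborate.

```python
# Sketch: flag passage/question word pairs that are semantically connected in WordNet.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def semantic_neighborhood(word):
    """Synsets of `word` plus their direct hypernyms and hyponyms."""
    neighborhood = set()
    for synset in wn.synsets(word):
        neighborhood.add(synset)
        neighborhood.update(synset.hypernyms())
        neighborhood.update(synset.hyponyms())
    return neighborhood

def connected(word_a, word_b):
    """True if the two words' WordNet neighborhoods overlap."""
    return bool(semantic_neighborhood(word_a) & semantic_neighborhood(word_b))

passage_words = ["teacher", "classroom", "lesson"]
question_words = ["instructor", "school"]
for p in passage_words:
    for q in question_words:
        if connected(p, q):
            print(f"connection: {p} -- {q}")
```

Pairs flagged this way stand in for the inter-word connections that, per the abstract, assist KAR's attention mechanisms.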
- [GRAPH] Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs. Kevin Joseph and Hui Jiang. In WWW ’19, San Francisco, USA, 2019.
Content-based news recommendation systems need to recommend news articles based on the topics and content of articles without using user-specific information. Many news articles describe the occurrence of specific events and named entities including people, places or objects. In this paper, we propose a graph traversal algorithm as well as a novel weighting scheme for cold-start content-based news recommendation utilizing these named entities. Seeking to create a higher degree of user-specific relevance, our algorithm computes the shortest distance between named entities, across news articles, over a large knowledge graph. Moreover, we have created a new human annotated data set for evaluating content based news recommendation systems. Experimental results show our method is suitable to tackle the hard cold-start problem and it produces stronger Pearson correlation to human similarity scores than other cold-start methods. Our method is also complementary and a combination with the conventional cold-start recommendation methods may yield significant performance gains.
@inproceedings{10.1145/3308560.3317703, author = {Joseph, Kevin and Jiang, Hui}, title = {Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs}, year = {2019}, isbn = {9781450366755}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3308560.3317703}, doi = {10.1145/3308560.3317703}, location = {San Francisco, USA}, booktitle = {WWW '19}, dimensions = {true}, }
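A toy illustration of the core scoring idea, assuming the knowledge graph is available as an undirected networkx graph and each article's named entities are already linked to its nodes; the paper's traversal algorithm and weighting scheme go beyond the plain average shortest-path distance used here.

```python
# Sketch: relate two articles by shortest-path distance between their named entities
# over a knowledge graph (hypothetical toy graph; the paper uses a large real KG).
import itertools
import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("Toronto", "Ontario"), ("Ontario", "Canada"),
    ("Ottawa", "Canada"), ("NHL", "Toronto"), ("NHL", "Ottawa"),
])

def article_distance(entities_a, entities_b, graph):
    """Average shortest-path distance over all entity pairs (smaller = more related)."""
    distances = []
    for u, v in itertools.product(entities_a, entities_b):
        try:
            distances.append(nx.shortest_path_length(graph, u, v))
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            continue  # unreachable or unlinked entities are simply skipped here
    return sum(distances) / len(distances) if distances else float("inf")

print(article_distance({"Toronto", "NHL"}, {"Ottawa"}, kg))
```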
2017
- [FOFE-NER] A Local Detection Approach for Named Entity Recognition and Mention Detection. Mingbin Xu, Hui Jiang, and Sedtawut Watcharawittayakul. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Jul 2017.
In this paper, we study a novel approach for named entity recognition (NER) and mention detection (MD) in natural language processing. Instead of treating NER as a sequence labeling problem, we propose a new local detection approach, which relies on the recent fixed-size ordinally forgetting encoding (FOFE) method to fully encode each sentence fragment and its left/right contexts into a fixed-size representation. Subsequently, a simple feedforward neural network (FFNN) is learned to either reject or predict entity label for each individual text fragment. The proposed method has been evaluated in several popular NER and MD tasks, including CoNLL 2003 NER task and TAC-KBP2015 and TAC-KBP2016 Tri-lingual Entity Discovery and Linking (EDL) tasks. Our method has yielded pretty strong performance in all of these examined tasks. This local detection approach has shown many advantages over the traditional sequence labeling methods.
@inproceedings{xu-etal-2017-local, title = {A Local Detection Approach for Named Entity Recognition and Mention Detection}, author = {Xu, Mingbin and Jiang, Hui and Watcharawittayakul, Sedtawut}, booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL)}, month = jul, year = {2017}, address = {Vancouver, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P17-1114}, doi = {10.18653/v1/P17-1114}, pages = {1237--1247}, dimensions = {true}, }
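A schematic of the local detection setup as described above: every short word fragment of a sentence is a separate candidate, and the fragment plus its left and right contexts are each folded into fixed-size FOFE vectors that a small feedforward classifier would consume. The vocabulary, forgetting factor, and fragment length limit below are toy stand-ins.

```python
# Sketch: enumerate candidate fragments and build FOFE features for a local NER classifier.
import numpy as np

VOCAB = {"john": 0, "smith": 1, "visited": 2, "toronto": 3}

def fofe(word_ids, vocab_size, alpha=0.7):
    """Fixed-size ordinally-forgetting encoding of a word-id sequence."""
    z = np.zeros(vocab_size)
    for wid in word_ids:
        z = alpha * z            # forget older words a little
        z[wid] += 1.0            # add the current word as a one-hot bump
    return z

def fragment_features(word_ids, start, end, vocab_size, alpha=0.7):
    """Concatenate FOFE codes of left context, fragment, and (reversed) right context."""
    left = fofe(word_ids[:start], vocab_size, alpha)
    frag = fofe(word_ids[start:end], vocab_size, alpha)
    right = fofe(word_ids[end:][::-1], vocab_size, alpha)
    return np.concatenate([left, frag, right])

sentence = [VOCAB[w] for w in "john smith visited toronto".split()]
max_len = 3
for start in range(len(sentence)):
    for end in range(start + 1, min(start + max_len, len(sentence)) + 1):
        feats = fragment_features(sentence, start, end, len(VOCAB))
        # In the paper, features like these feed a feedforward net that predicts
        # an entity type or a rejection label for each candidate fragment.
        print(start, end, feats.shape)
```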
- [WS] Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems. Quan Liu, Hui Jiang, Andrew Evdokimov, and 4 more authors. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Jul 2017.
This paper focuses on the investigations in Winograd Schema (WS), a challenging problem which has been proposed for measuring progress in commonsense reasoning. Due to the lack of commonsense knowledge and training data, very little work has been found on the WS problems in recent years. Actually, there is no shortcut to solve this problem except to collect more commonsense knowledge and design suitable models. Therefore, this paper addresses a set of WS problems by proposing a knowledge acquisition method and a general neural association model. To avoid the sparseness issue, the knowledge we aim to collect is the cause-effect relationships between thousands of commonly used words. The knowledge acquisition method enables us to extract hundreds of thousands of cause-effect pairs from a large text corpus automatically. Meanwhile, a neural association model (NAM) is proposed to encode the association relationships between any two discrete events. Based on the extracted knowledge and the NAM models, in this paper, we successfully build a system for solving WS problems from scratch and achieve 70.0% accuracy. Most importantly, this paper provides a flexible framework to solve WS problems based on event association and neural network methods.
2016
- [HOPE] Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks. Shiliang Zhang, Hui Jiang, and Lirong Dai. Journal of Machine Learning Research, 2016.
In this paper, we propose a novel model for high-dimensional data, called the Hybrid Orthogonal Projection and Estimation (HOPE) model, which combines a linear orthogonal projection and a finite mixture model under a unified generative modeling framework. The HOPE model itself can be learned unsupervised from unlabelled data based on the maximum likelihood estimation as well as discriminatively from labelled data. More interestingly, we have shown the proposed HOPE models are closely related to neural networks (NNs) in a sense that each hidden layer can be reformulated as a HOPE model. As a result, the HOPE framework can be used as a novel tool to probe why and how NNs work, more importantly, to learn NNs in either supervised or unsupervised ways. In this work, we have investigated the HOPE framework to learn NNs for several standard tasks, including image recognition on MNIST and speech recognition on TIMIT. Experimental results have shown that the HOPE framework yields significant performance gains over the current state-of-the-art methods in various types of NN learning problems, including unsupervised feature learning, supervised or semi-supervised learning.
@article{JMLR:v17:15-335, author = {Zhang, Shiliang and Jiang, Hui and Dai, Lirong}, title = {Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks}, journal = {Journal of Machine Learning Research}, year = {2016}, volume = {17}, number = {37}, pages = {1--33}, url = {http://jmlr.org/papers/v17/15-335.html}, dimensions = {true}, }
- [GAN] Generating images with recurrent adversarial networks. Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, and 1 more author, 2016.
Gatys et al. (2015) showed that optimizing pixels to match features in a convolutional network with respect to reference image features is a way to render images of high visual quality. We show that unrolling this gradient-based optimization yields a recurrent computation that creates images by incrementally adding onto a visual "canvas". We propose a recurrent generative model inspired by this view, and show that it can be trained using adversarial training to generate very good image samples. We also propose a way to quantitatively compare adversarial networks by having the generators and discriminators of these networks compete against each other.
@misc{im2016generatingimagesrecurrentadversarial, title = {Generating images with recurrent adversarial networks}, author = {Im, Daniel Jiwoong and Kim, Chris Dongjoo and Jiang, Hui and Memisevic, Roland}, year = {2016}, eprint = {1602.05110}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, url = {https://arxiv.org/abs/1602.05110}, dimensions = {true}, }
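A bare-bones sketch of the "canvas" view described above, with untrained random matrices standing in for the real recurrent generator; it only illustrates the structural point that the image is produced by accumulating increments onto a canvas over several time steps, not the paper's actual architecture or training.

```python
# Sketch: build an image by repeatedly adding a generated delta onto a canvas.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_dim, steps = 16, 28 * 28, 5

# Untrained placeholder parameters for a single recurrent generation step.
W_z = rng.normal(scale=0.1, size=(img_dim, latent_dim))
W_c = rng.normal(scale=0.1, size=(img_dim, img_dim))

z = rng.normal(size=latent_dim)               # latent code, fixed across steps
canvas = np.zeros(img_dim)                    # start from an empty canvas
for t in range(steps):
    delta = np.tanh(W_z @ z + W_c @ canvas)   # step depends on what is drawn so far
    canvas = canvas + delta                   # incrementally add onto the canvas
image = 1.0 / (1.0 + np.exp(-canvas))         # squash accumulated canvas into [0, 1]
print(image.shape)
```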
- [HORNN] Higher Order Recurrent Neural Networks. Rohollah Soltani and Hui Jiang, 2016.
In this paper, we study novel neural network structures to better model long term dependency in sequential data. We propose to use more memory units to keep track of more preceding states in recurrent neural networks (RNNs), which are all recurrently fed to the hidden layers as feedback through different weighted paths. By extending the popular recurrent structure in RNNs, we provide the models with a better short-term memory mechanism to learn long term dependency in sequences. Analogous to digital filters in signal processing, we call these structures higher order RNNs (HORNNs). Similar to RNNs, HORNNs can also be learned using the back-propagation through time method. HORNNs are generally applicable to a variety of sequence modelling tasks. In this work, we have examined HORNNs for the language modeling task using two popular data sets, namely the Penn Treebank (PTB) and English text8 data sets. Experimental results have shown that the proposed HORNNs yield the state-of-the-art performance on both data sets, significantly outperforming the regular RNNs as well as the popular LSTMs.
@misc{soltani2016higherorderrecurrentneural, title = {Higher Order Recurrent Neural Networks}, author = {Soltani, Rohollah and Jiang, Hui}, year = {2016}, eprint = {1605.00064}, archiveprefix = {arXiv}, primaryclass = {cs.NE}, url = {https://arxiv.org/abs/1605.00064}, dimensions = {true}, }
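A minimal sketch of one hidden-state update in a higher-order RNN as the abstract describes it: the new state receives weighted feedback from several preceding states through separate weight matrices, rather than from the last state only. The order K and all dimensions are illustrative.

```python
# Sketch: order-K recurrent update h_t = tanh(W x_t + sum_k U_k h_{t-k}).
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim, order, T = 8, 16, 3, 10

W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
U = [rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)) for _ in range(order)]

xs = rng.normal(size=(T, input_dim))
history = [np.zeros(hidden_dim)] * order       # h_{t-1}, ..., h_{t-K}
for x_t in xs:
    feedback = sum(U[k] @ history[k] for k in range(order))
    h_t = np.tanh(W @ x_t + feedback)
    history = [h_t] + history[:-1]             # shift the memory of past states
print(history[0].shape)
```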
- [NAM] Probabilistic Reasoning via Deep Learning: Neural Association Models. Quan Liu, Hui Jiang, Andrew Evdokimov, and 4 more authors, 2016.
In this paper, we propose a new deep learning approach, called neural association model (NAM), for probabilistic reasoning in artificial intelligence. We propose to use neural networks to model association between any two events in a domain. Neural networks take one event as input and compute a conditional probability of the other event to model how likely these two events are to be associated. The actual meaning of the conditional probabilities varies between applications and depends on how the models are trained. In this work, as two case studies, we have investigated two NAM structures, namely deep neural networks (DNN) and relation-modulated neural nets (RMNN), on several probabilistic reasoning tasks in AI, including recognizing textual entailment, triple classification in multi-relational knowledge bases and commonsense reasoning. Experimental results on several popular datasets derived from WordNet, FreeBase and ConceptNet have all demonstrated that both DNNs and RMNNs perform equally well and they can significantly outperform the conventional methods available for these reasoning tasks. Moreover, compared with DNNs, RMNNs are superior in knowledge transfer, where a pre-trained model can be quickly extended to an unseen relation after observing only a few training samples. To further prove the effectiveness of the proposed models, in this work, we have applied NAMs to solving challenging Winograd Schema (WS) problems. Experiments conducted on a set of WS problems prove that the proposed models have the potential for commonsense reasoning.
@misc{liu2016probabilisticreasoningdeeplearning, title = {Probabilistic Reasoning via Deep Learning: Neural Association Models}, author = {Liu, Quan and Jiang, Hui and Evdokimov, Andrew and Ling, Zhen-Hua and Zhu, Xiaodan and Wei, Si and Hu, Yu}, year = {2016}, eprint = {1603.07704}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/1603.07704}, dimensions = {true}, }
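A toy rendering of the association idea, simplified from the DNN variant (the embeddings and weights are untrained placeholders, and feeding both events into one network is my simplification): two discrete events are embedded and a small feedforward net scores how likely they are to be associated.

```python
# Sketch: score the association between two discrete events with a tiny MLP.
import numpy as np

rng = np.random.default_rng(2)
num_events, emb_dim, hidden_dim = 100, 32, 64

E = rng.normal(scale=0.1, size=(num_events, emb_dim))         # event embeddings
W1 = rng.normal(scale=0.1, size=(hidden_dim, 2 * emb_dim))
w2 = rng.normal(scale=0.1, size=hidden_dim)

def association_score(e1, e2):
    """Sigmoid score in (0, 1) for how strongly event e2 is associated with event e1."""
    h = np.maximum(0.0, W1 @ np.concatenate([E[e1], E[e2]]))   # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-w2 @ h))

print(association_score(3, 17))
```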
- [STOCK] Leverage Financial News to Predict Stock Price Movements Using Word Embeddings and Deep Neural Networks. Yangtuo Peng and Hui Jiang. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Jun 2016.
Financial news contains useful information on public companies and the market. In this paper we apply the popular word embedding methods and deep neural networks to leverage financial news to predict stock price movements in the market. Experimental results have shown that our proposed methods are simple but very effective, which can significantly improve the stock prediction accuracy on a standard financial database over the baseline system using only the historical price information.
@inproceedings{peng-jiang-2016-leverage, title = {Leverage Financial News to Predict Stock Price Movements Using Word Embeddings and Deep Neural Networks}, author = {Peng, Yangtuo and Jiang, Hui}, booktitle = {Proceedings of the 2016 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (NAACL)}, month = jun, year = {2016}, address = {San Diego, California}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/N16-1041}, doi = {10.18653/v1/N16-1041}, pages = {374--379}, dimensions = {true}, }
- [FSMN] Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency. Shiliang Zhang, Cong Liu, Hui Jiang, and 3 more authors, 2016.
In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback. The proposed FSMN is a standard fully-connected feedforward neural network equipped with some learnable memory blocks in its hidden layers. The memory blocks use a tapped-delay line structure to encode the long context information into a fixed-size representation as a short-term memory mechanism. We have evaluated the proposed FSMNs in several standard benchmark tasks, including speech recognition and language modelling. Experimental results have shown FSMNs significantly outperform the conventional recurrent neural networks (RNN), including LSTMs, in modeling sequential signals like speech or language. Moreover, FSMNs can be learned much more reliably and faster than RNNs or LSTMs due to the inherent non-recurrent model structure.
@misc{zhang2016feedforwardsequentialmemorynetworks, title = {Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency}, author = {Zhang, Shiliang and Liu, Cong and Jiang, Hui and Wei, Si and Dai, Lirong and Hu, Yu}, year = {2016}, eprint = {1512.08301}, archiveprefix = {arXiv}, primaryclass = {cs.NE}, url = {https://arxiv.org/abs/1512.08301}, dimensions = {true}, }
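A compact sketch of the tapped-delay memory block described above, assuming scalar tap coefficients per delay (the paper also considers richer variants): each hidden activation is augmented with a learned weighted sum of the preceding N activations, with no recurrent feedback anywhere.

```python
# Sketch: FSMN-style memory block m_t = sum_{i=0..N} a_i * h_{t-i} on top of hidden activations.
import numpy as np

rng = np.random.default_rng(3)
T, hidden_dim, taps = 12, 16, 4

H = rng.normal(size=(T, hidden_dim))        # hidden activations of one layer over time
a = rng.normal(scale=0.1, size=taps + 1)    # learnable tap coefficients a_0 ... a_N

M = np.zeros_like(H)
for t in range(T):
    for i in range(taps + 1):
        if t - i >= 0:
            M[t] += a[i] * H[t - i]         # weighted look-back over a fixed window
# M is passed forward to the next layer together with H; no recurrence is involved.
print(M.shape)
```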
2015
- [SWE] Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints. Quan Liu, Hui Jiang, Si Wei, and 2 more authors. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), Jul 2015.
In this paper, we propose a general framework to incorporate semantic knowledge into the popular data-driven learning process of word embeddings to improve their quality. Under this framework, we represent semantic knowledge as many ordinal ranking inequalities and formulate the learning of semantic word embeddings (SWE) as a constrained optimization problem, where the data-derived objective function is optimized subject to all ordinal knowledge inequality constraints extracted from available knowledge resources such as Thesaurus and WordNet. We have demonstrated that this constrained optimization problem can be efficiently solved by the stochastic gradient descent (SGD) algorithm, even for a large number of inequality constraints. Experimental results on four standard NLP tasks, including word similarity measure, sentence completion, named entity recognition, and the TOEFL synonym selection, have all demonstrated that the quality of learned word vectors can be significantly improved after semantic knowledge is incorporated as inequality constraints during the learning process of word embeddings.
@inproceedings{liu-etal-2015-learning, title = {Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints}, author = {Liu, Quan and Jiang, Hui and Wei, Si and Ling, Zhen-Hua and Hu, Yu}, booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL)}, pages = {1501--1511}, month = jul, year = {2015}, address = {Beijing, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P15-1145}, doi = {10.3115/v1/P15-1145}, dimensions = {true}, }
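A simplified sketch of how one ordinal constraint of the kind described above could enter an SGD step: an inequality "word i should be more similar to word j than to word k" contributes a hinge penalty on top of the usual data-driven objective, and only violated constraints produce an update. The margin, similarity function, and update rule are illustrative choices, not the paper's exact formulation.

```python
# Sketch: hinge penalty for one ordinal constraint sim(i, j) > sim(i, k) on word vectors.
import numpy as np

rng = np.random.default_rng(4)
vocab_size, dim = 50, 20
W = rng.normal(scale=0.1, size=(vocab_size, dim))      # word vectors being learned

def sim(a, b):
    return float(W[a] @ W[b])                           # dot-product similarity

def constraint_step(i, j, k, margin=0.1, lr=0.05):
    """One SGD step on max(0, margin - (sim(i,j) - sim(i,k))); no-op if satisfied."""
    if margin - (sim(i, j) - sim(i, k)) > 0:            # hinge is active
        w_i = W[i].copy()
        W[i] += lr * (W[j] - W[k])                      # push w_i toward w_j, away from w_k
        W[j] += lr * w_i
        W[k] -= lr * w_i

for _ in range(20):                                     # e.g. (word, synonym, unrelated word)
    constraint_step(i=1, j=2, k=3)
print(sim(1, 2) - sim(1, 3))                            # gap should now meet the margin
```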
- [FOFE] The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models. ShiLiang Zhang, Hui Jiang, MingBin Xu, and 2 more authors. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), Jul 2015.
@inproceedings{zhang-etal-2015-fixed, title = {The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models}, author = {Zhang, ShiLiang and Jiang, Hui and Xu, MingBin and Hou, JunFeng and Dai, LiRong}, booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL)}, month = jul, year = {2015}, address = {Beijing, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P15-2081}, doi = {10.3115/v1/P15-2081}, pages = {495--500}, dimensions = {true}, }
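As I understand the method named in the title, the fixed-size ordinally-forgetting encoding lets a plain feedforward language model condition on the entire left context instead of a fixed window: the history before each position is folded into one fixed-size vector by the recursion z_t = α·z_{t-1} + one-hot(w_t). A minimal sketch with a toy vocabulary and forgetting factor:

```python
# Sketch: FOFE history codes as fixed-size inputs for a feedforward language model.
import numpy as np

VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def fofe_history_codes(word_ids, vocab_size, alpha=0.7):
    """FOFE code of the history before each position: z_t = alpha * z_{t-1} + onehot(w_t)."""
    codes, z = [], np.zeros(vocab_size)
    for wid in word_ids:
        codes.append(z.copy())      # encoding of everything seen before this word
        z = alpha * z
        z[wid] += 1.0
    return np.stack(codes)

sentence = [VOCAB[w] for w in "the cat sat on the mat".split()]
X = fofe_history_codes(sentence, len(VOCAB))
# Each row X[t] is a fixed-size encoding of the full left context of position t;
# a standard feedforward net would map X[t] to a softmax over the next word.
print(X.shape)   # (6, 5)
```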
2014
- [CNN-SP-J] Convolutional Neural Networks for Speech Recognition. Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, and 3 more authors. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2016 IEEE SPS Best Paper Award), 2014.
Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6%-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.
@article{6857341, author = {Abdel-Hamid, Ossama and Mohamed, Abdel-rahman and Jiang, Hui and Deng, Li and Penn, Gerald and Yu, Dong}, journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing}, title = {Convolutional Neural Networks for Speech Recognition}, year = {2014}, volume = {22}, number = {10}, pages = {1533-1545}, doi = {10.1109/TASLP.2014.2339736}, dimensions = {true}, }
- [DNN-Apt] Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition. Shaofei Xue, Ossama Abdel-Hamid, Hui Jiang, and 2 more authors. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.
Fast adaptation of deep neural networks (DNN) is an important research topic in deep learning. In this paper, we have proposed a general adaptation scheme for DNN based on discriminant condition codes, which are directly fed to various layers of a pre-trained DNN through a new set of connection weights. Moreover, we present several training methods to learn connection weights from training data as well as the corresponding adaptation methods to learn new condition code from adaptation data for each new test condition. In this work, the fast adaptation scheme is applied to supervised speaker adaptation in speech recognition based on either frame-level cross-entropy or sequence-level maximum mutual information training criterion. We have proposed three different ways to apply this adaptation scheme based on the so-called speaker codes: i) Nonlinear feature normalization in feature space; ii) Direct model adaptation of DNN based on speaker codes; iii) Joint speaker adaptive training with speaker codes. We have evaluated the proposed adaptation methods in two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition in the Switchboard task. Experimental results have shown that all three methods are quite effective to adapt large DNN models using only a small amount of adaptation data.
@article{Xue2014FastAO, title = {Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition}, author = {Xue, Shaofei and Abdel-Hamid, Ossama and Jiang, Hui and Dai, Lirong and Liu, Qingfeng}, journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing}, year = {2014}, volume = {22}, pages = {1713-1725}, url = {https://api.semanticscholar.org/CorpusID:18929718}, doi = {10.1109/TASLP.2014.2346313}, dimensions = {true}, }
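A stripped-down sketch of the speaker-code idea described above: a small code vector c enters a hidden layer through an extra weight matrix, so the layer computes sigmoid(W x + V c + b), and adapting to a new speaker only requires re-estimating c from a little adaptation data while the pre-trained weights stay fixed. All dimensions and parameters here are placeholders.

```python
# Sketch: one hidden layer of a pre-trained DNN augmented with a speaker code.
import numpy as np

rng = np.random.default_rng(5)
in_dim, hid_dim, code_dim = 40, 128, 10

W = rng.normal(scale=0.1, size=(hid_dim, in_dim))    # pre-trained and kept frozen
V = rng.normal(scale=0.1, size=(hid_dim, code_dim))  # learned on training speakers, then frozen
b = np.zeros(hid_dim)

def hidden_layer(x, speaker_code):
    """Speaker-adapted hidden activation: sigmoid(W x + V c + b)."""
    return 1.0 / (1.0 + np.exp(-(W @ x + V @ speaker_code + b)))

x = rng.normal(size=in_dim)                          # one acoustic feature frame
c_new_speaker = np.zeros(code_dim)                   # adaptation would estimate this by backprop
print(hidden_layer(x, c_new_speaker).shape)
```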