Title: Explainable AI, Attention Mechanism and Transformer
Meeting ID: 685-6702-5904
Speaker: Prof. Dong Xu
Dong Xu is Curators’ Distinguished Professor in the Department of Electrical Engineering and Computer Science, with appointments in the Christopher S. Bond Life Sciences Center and the Informatics Institute at the University of Missouri-Columbia. He obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 1995 and did two years of postdoctoral work at the US National Cancer Institute. He was a Staff Scientist at Oak Ridge National Laboratory until 2003, when he joined the University of Missouri, where he served as Department Chair of Computer Science from 2007 to 2016 and Director of the Information Technology Program from 2017 to 2020. Over the past 30 years, he has conducted research in many areas of computational biology and bioinformatics, including single-cell data analysis, protein structure prediction and modeling, protein post-translational modifications, protein localization prediction, computational systems biology, biological information systems, and bioinformatics applications in humans, microbes, and plants. Since 2012, his research has focused on the interface between bioinformatics and deep learning. He has published more than 400 papers, which, according to Google Scholar, have received more than 21,000 citations, giving him an H-index of 74. He was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2015 and a Fellow of the American Institute for Medical and Biological Engineering (AIMBE) in 2020.
Deep learning models are frequently regarded as “black boxes”: they can produce accurate results, but humans may not understand how those results are reached. However, significant progress has been made in explainable AI (XAI) to provide model explainability and interpretability, which can in turn improve model performance. A central approach to opening these “black boxes” is the attention mechanism, and a highly successful example of its use is the transformer. In this talk, I will review the scope of XAI and some model-agnostic methods (LIME and SHAP). I will introduce some major saliency-map techniques (gradient-based backpropagation, deconvolutional networks, and class activation maps), which help identify and visualize the features that drive a model's predictions. I will explain attention, the attention mechanism, and multi-head attention. Finally, I will show how the transformer works and present several related models (BERT, ELECTRA, and GPT).
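For readers unfamiliar with the attention mechanism mentioned above, the following is a minimal sketch, not taken from the talk itself, of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, which is the form used in the original transformer. It uses plain Python lists so it runs without third-party libraries; the function names are illustrative assumptions, and the learned projection matrices that produce Q, K, and V in a real model are omitted. The returned weight matrix is the quantity commonly visualized when attention is used to inspect what a model focuses on.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V.

    q: n_queries x d_k, k: n_keys x d_k, v: n_keys x d_v.
    Each row of weights sums to 1 and says how much each key
    contributes to the corresponding query's output.
    """
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                        # query-key similarities
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]              # attention weights
    output = matmul(weights, v)                             # weighted sum of values
    return output, weights


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    out, w = scaled_dot_product_attention(q, k, v)
    print("weights:", w)
    print("output:", out)
```

In the usage example, the query aligns with the first key, so that key receives the larger weight and the output is pulled toward the first value vector. Multi-head attention runs several such attention computations in parallel, each on its own learned projections of the inputs, and concatenates the results; that step is not shown here.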