Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers, published by lifelonglearner, Peter Hase on the AI Alignment Forum.

Peter Hase (UNC Chapel Hill) and Owen Shen (UC San Diego). With thanks to Robert Kirk and Mohit Bansal for helpful feedback on this post.

Introduction.

Model interpretability was a bullet point in Concrete Problems in AI Safety (2016). Since then, interpretability has come to comprise entire research directions in technical safety agendas (2020); model transparency appears throughout An overview of 11 proposals for building safe advanced AI (2020); and explainable AI has its own Twitter hashtag, #XAI. (For more on how interpretability is relevant to AI safety, see here or here.)
Interpretability is now a very popular area of research: at last year's ACL, interpretability was the most popular area as measured by video views. Model interpretability is now so mainstream that there are books on the topic and corporate services promising it. So what's the state of research on this topic? What does progress in interpretability look like, and are we making progress?

What is this post?

This post summarizes 70 recent papers on model transparency, interpretability, and explainability, limited to a non-random subset of papers from the past 3 years or so. We also give opinions on several active areas of research and collate another 90 papers that are not summarized.

How to read this post.

- If you want to see high-level opinions on several areas of interpretability research, just read the opinion section, which is organized according to our (admittedly ad-hoc) set of topic areas.
- If you want to learn more about what work looks like in a particular area, read the summaries of papers in that area. For a quick glance at an area, we highlight one standout paper per area, so you can just check out that summary.
- If you want to see more work that has come out in an area, look at the non-summarized papers at the end of the post (organized by the same areas as the summarized papers).

We assume readers are familiar with basic aspects of interpretability research, i.e. the kinds of concepts in The Mythos of Model Interpretability and Towards A Rigorous Science of Interpretable Machine Learning. We recommend looking at either of these papers if you want a primer on interpretability. We also assume that readers are familiar with older, foundational works like "Why Should I Trust You?: Explaining the Predictions of Any Classifier."
Disclaimer: This post is written by a team of two people, and hence its breadth is limited and its content biased by our interests and backgrounds. A few of the summarized papers are our own. Please let us know if you think we've missed anything important that could improve the post.

Master List of Summarized Papers.

Theory and Opinion.
- Explanation in Artificial Intelligence: Insights from the Social Sciences
- Chris Olah’s views on AGI safety
- Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?
- The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
- Aligning Faithful Interpretations with their Social Attribution

Evaluation.
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction
- Comparing Automatic and Human Evaluation of Local Explanations for Text Classification
- Do explanations make VQA models more predictable to a human?
- Sanity Checks for Saliency Maps
- A Benchmark for Interpretability Methods in Deep Neural Networks
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
- ERASER: A Benchmark to Evaluate Rationalized NLP Models
- On quantitative aspects of model interpretability
- Manipulating and M...