Machine Learning Street Talk (MLST)

Technology

#114 - Secrets of Deep Reinforcement Learning (Minqi Jiang)

2023-04-16

Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Twitter: https://twitter.com/MLStreetTalk

In this exclusive interview, Dr. Tim Scarfe sits down with Minqi Jiang, a leading PhD student at University College London and Meta AI, as they delve into the fascinating world of deep reinforcement learning (RL) and its impact on technology, startups, and research. Discover how Minqi made the crucial decision to pursue a PhD in this exciting field, and learn from his valuable startup experiences and lessons.

Minqi shares his insights into balancing serendipity and planning in life and research, and explains the role of objectives and Goodhart's Law in decision-making. Get ready to explore the depths of robustness in RL, two-player zero-sum games, and the differences between RL and supervised learning.

As they discuss the role of environment in intelligence, emergence, and abstraction, prepare to be blown away by the possibilities of open-endedness and the intelligence explosion. Learn how language models generate their own training data, where RL falls short, and what the future of Software 2.0 looks like given its interpretability concerns.

From robotics and open-ended learning applications to learning potential metrics and MDPs, this interview is a goldmine of information for anyone interested in AI, RL, and the cutting edge of technology. Don't miss out on this incredible opportunity to learn from a rising star in the AI world!

TOC

Tech & Startup Background [00:00:00]

Pursuing PhD in Deep RL [00:03:59]

Startup Lessons [00:11:33]

Serendipity vs Planning [00:12:30]

Objectives & Decision Making [00:19:19]

Minimax Regret & Uncertainty [00:22:57]

Robustness in RL & Zero-Sum Games [00:26:14]

RL vs Supervised Learning [00:34:04]

Exploration & Intelligence [00:41:27]

Environment, Emergence, Abstraction [00:46:31]

Open-endedness & Intelligence Explosion [00:54:28]

Language Models & Training Data [01:04:59]

RLHF & Language Models [01:16:37]

Creativity in Language Models [01:27:25]

Limitations of RL [01:40:58]

Software 2.0 & Interpretability [01:45:11]

Language Models & Code Reliability [01:48:23]

Robust Prioritized Level Replay [01:51:42]

Open-ended Learning [01:55:57]

Auto-curriculum & Deep RL [02:08:48]

Robotics & Open-ended Learning [02:31:05]

Learning Potential & MDPs [02:36:20]

Universal Function Space [02:42:02]

Goal-Directed Learning & Auto-Curricula [02:42:48]

Advice & Closing Thoughts [02:44:47]

References:

- Why Greatness Cannot Be Planned: The Myth of the Objective by Kenneth O. Stanley and Joel Lehman

https://www.springer.com/gp/book/9783319155234

- General Intelligence Requires Rethinking Exploration

https://arxiv.org/abs/2106.06860

- The Case for Strong Emergence (Sabine Hossenfelder)

https://arxiv.org/abs/2102.07740

- The Game of Life (Conway)

https://www.conwaylife.com/

- Toolformer: Language Models Can Teach Themselves to Use Tools (Meta AI)

https://arxiv.org/abs/2302.04761

- POET: Paired Open-Ended Trailblazer (Uber AI Labs)

https://arxiv.org/abs/1901.01753

- Schmidhuber's Artificial Curiosity

https://people.idsia.ch/~juergen/interest.html

- Gödel Machines

https://people.idsia.ch/~juergen/goedelmachine.html

- PowerPlay

https://arxiv.org/abs/1112.5309

- Robust Prioritized Level Replay: https://openreview.net/forum?id=NfZ6g2OmXEk

- Unsupervised Environment Design: https://arxiv.org/abs/2012.02096

- Excel: Evolving Curriculum Learning for Deep Reinforcement Learning

https://arxiv.org/abs/1901.05431

- Go-Explore: A New Approach for Hard-Exploration Problems

https://arxiv.org/abs/1901.10995

- Learning with AMIGo: Adversarially Motivated Intrinsic Goals

https://www.researchgate.net/publication/342377312_Learning_with_AMIGo_Adversarially_Motivated_Intrinsic_Goals

- PRML: Pattern Recognition and Machine Learning (Bishop)

https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf

- Reinforcement Learning: An Introduction, 2nd ed. (Sutton and Barto)

https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
