The Swyx Mixtape

Technology

Snorkel.ai: Unlocking Subject Matter Experts to make Software 2.0 [Alex Ratner]

2021-07-08

Source: https://www.thecloudcast.net/2021/06/automated-data-labeling-for-ai-apps.html
See also: https://softwareengineeringdaily.com/2020/04/09/snorkel-training-dataset-management-with-braden-hancock/

Software 2.0 is Andrej Karpathy's idea that instead of coding business logic by hand, the applications of the future will be trained by data. In other words, machine learning. But ML is limited by the quality of data available, and there is a lot of unstructured, unlabeled data out there that is still being manually labeled today. Scale.AI is a well-known startup that has done very well offering a scalable manual labeling workforce. However, that approach is still bottlenecked by the number of subject matter experts available for labeling critically important data, like cancer diagnoses and drug trafficking rings. To get labels from subject matter experts, you typically have to put them through a very tedious labeling process to build up a structured dataset upfront before any useful machine learning can be done.

I did some very minor ML work about 5 years ago and found Christopher Ré's work on DeepDive at Stanford. It takes a revolutionary approach by making it easy to write the labeling functions themselves. This turns the labeling process into an iterative, REPL-like experience where subject matter experts can suggest a function, see its impact right away, and continue refining it, assisted by AI. DeepDive is now commercialized in a startup called Snorkel.AI, so I was very excited to find a clear explanation of Snorkel Flow from its CEO, Alex Ratner.
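To make the "suggest a function, see its impact, refine it" loop concrete, here is a minimal sketch in plain Python. It is not the Snorkel or Snorkel Flow API: the spam task, the function names (`lf_link_v1`, `lf_link_v2`, `coverage`), and the sample messages are all hypothetical, chosen only to illustrate the idea.

```python
import re
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function returns this when it has no opinion
SPAM = 1        # hypothetical label for this toy task

LabelingFunction = Callable[[str], Optional[int]]


def lf_link_v1(text: str) -> Optional[int]:
    """First attempt: flag any message that contains a URL."""
    return SPAM if "http" in text else ABSTAIN


def lf_link_v2(text: str) -> Optional[int]:
    """Refinement: only flag URLs that appear alongside a call to action."""
    has_url = re.search(r"https?://\S+", text) is not None
    has_cta = re.search(r"\b(click|claim|win)\b", text, re.IGNORECASE) is not None
    return SPAM if has_url and has_cta else ABSTAIN


def coverage(lf: LabelingFunction, docs: List[str]) -> float:
    """Fraction of documents on which the function casts a vote."""
    if not docs:
        return 0.0
    return sum(lf(d) is not ABSTAIN for d in docs) / len(docs)


# Made-up sample messages for the expert to iterate against.
docs = [
    "Click here to claim your prize: http://example.com/win",
    "Meeting notes are at https://example.com/notes",
    "Lunch at noon?",
    "WIN a free cruise, click http://example.com/cruise",
]

# Immediate feedback after each edit, as in a REPL session.
for lf in (lf_link_v1, lf_link_v2):
    print(lf.__name__, coverage(lf, docs))
```

The second version votes on fewer messages but no longer flags the harmless meeting-notes link, which is the kind of trade-off an expert can see and adjust on each pass.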

Here it is!

Transcript


[00:01:15] Alex Ratner: Snorkel Flow is a platform that's meant to take this process of building machine learning models and AI applications, and again, all starting with building the data that they rely on, that fuels them, and make it, in a nutshell, look more like an iterative software development process than, you know, this kind of 80, 90% upfront hand-labeling exercise.

[00:01:34] And so Snorkel Flow supports that entire iterative loop of actually labeling data. It can be by hand in the platform, but also, most centrally, programmatically, by letting users write what we call labeling functions. The basic idea is that rather than, say, asking your legal associate at a bank or your doctor friends to sit down and label a hundred thousand contracts or a hundred thousand electronic health records, have them write

[00:02:00] heuristics, or bits of their expertise: look for this keyword, or look for this pattern, or look for this, et cetera. It's like a bridge from old expert-knowledge-type input to modern machine learning models, using one to power the other. So Snorkel Flow is an IDE, basically, and has a no-code UI component as well, but it lets people, either via code or by pushing buttons, for even non-developer subject matter experts, say,

[00:02:24] programmatically label their data by writing these labeling functions, and then uses a bunch of modeling techniques, a lot of which was actually the work that the co-founding team and I did in our thesis work, around how you take a bunch of programmatic data and clean it up and turn it into a final

[00:02:41] set of clean training data for machine learning models. And then, actually, in Snorkel Flow you can basically push-button train best-in-class open source models. You can then analyze where they're succeeding or failing and use that to go back and iterate on your data.

[00:02:54] And there's a Python SDK throughout the whole thing, so many of our customers will mix and match. They'll, say, create the training data set and then train the model on some other system, et cetera. But what Snorkel Flow aims to support is this basic iterative development process where, you know, rather than just spending months to label a training set once and then being stuck with it, and having to throw it out and start all over again if anything in the world changes, your upstream input data changes, your downstream objectives

[00:03:18] change, it's making it, again, more like an iterative process where you push some buttons or write some code that labels the data. You compile a model, or train it, but you can think of it like compiling, and then you go back and debug by iterating on your data. Everything in Snorkel Flow centers around looking at your data and iterating on how it's labeled to improve models.
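As a rough illustration of turning several noisy labeling functions into one training set, the sketch below combines their votes with a simple majority vote. This is a stand-in, not the modeling technique Ratner refers to, and nothing here is Snorkel Flow's API: the contract classes, the four keyword rules, and the sample sentences are assumptions made up for the example.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

ABSTAIN = None
EMPLOYMENT, LEASE = 0, 1  # hypothetical contract classes

LabelingFunction = Callable[[str], Optional[int]]


def keyword_rule(word: str, label: int) -> LabelingFunction:
    """Vote `label` when `word` appears as a whole word, else abstain."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return lambda text: label if pattern.search(text) else ABSTAIN


LFS: List[LabelingFunction] = [
    keyword_rule("salary", EMPLOYMENT),
    keyword_rule("employee", EMPLOYMENT),
    keyword_rule("rent", LEASE),
    keyword_rule("tenant", LEASE),
]


def majority_label(votes: Sequence[Optional[int]]) -> Optional[int]:
    """Return the most common non-abstain vote; abstain on ties or no votes."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    ranked = counts.most_common(2)
    if not ranked:
        return ABSTAIN
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def build_training_set(
    docs: Sequence[str], lfs: Sequence[LabelingFunction]
) -> List[Tuple[str, int]]:
    """Keep only documents that receive a clear label."""
    labeled = []
    for doc in docs:
        label = majority_label([lf(doc) for lf in lfs])
        if label is not ABSTAIN:
            labeled.append((doc, label))
    return labeled


docs = [
    "The employee shall receive an annual salary.",
    "The tenant shall pay rent monthly.",
    "The employee may rent equipment from the company.",  # conflicting votes
    "This agreement is governed by the laws of Ohio.",    # no votes
]

training_set = build_training_set(docs, LFS)
print(training_set)
```

Documents with conflicting or missing votes are simply dropped here; inspecting those cases is one way to decide which rule to write or refine next.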

[00:03:38] Brian Gracely: I'm curious. So you mentioned in there that there's a Python SDK, which, for anybody who works in data science or data modeling, right, Python is your lingua franca, sort of the language you use, or R, a couple of them. That's the language you do your programming in. But I'm curious: in today's world, do data scientists consider themselves programmers, or is there still a sense of, "Hey, look, I work on the numbers, I'm good at building models and the numbers, but I don't think of myself as a programmer"?

[00:04:08] Like, how do you bridge those two worlds together, or do you not really have to bridge them together? How much does the data scientist have to go, "I have to focus on numbers and models," versus, "I have to focus on programming something to do stuff"? What does their world look like?

[00:04:21] Alex Ratner: It's a great question. I think I have been, or currently am, part of four or five different data science institutes or something, and I still don't even know. I mean, data science is such a broad umbrella term. There are so many different varietals of us, and types.

[00:04:35] And so I do think there's a very broad spectrum, from the data scientist who's an ML engineer and just loves writing code, to the one that, to your point, really just wants to push some buttons and get back to the numbers and the modeling and the outcome. And we definitely try to support the range through a layered approach.

[00:04:50] And we have [the Python SDK], but on top of that, we have a no-code UI that allows you to write these labeling functions without writing code. So, for example, if you're trying to train a contract classifier in Snorkel Flow, you can write labeling functions based on clicking on keywords, or pressing buttons with kind of templates for types of patterns or signals you want to look for.

[00:05:11] So, you know, we try to support, basically, if you want to move fast and you're a non-developer, or you're just not looking to spend time there, you can just do it in a push-button way. But then if you want to go and customize, or inject custom logic, or really get creative, you can always fall back to the Python SDK.
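One way to picture the layered approach described here is a keyword template that a form or button could fill in, sitting next to a hand-written function for cases the template cannot express. The sketch below is plain Python under that assumption. It does not show Snorkel Flow's UI or SDK, and `keyword_template`, `lf_long_confidentiality_term`, and the NDA label are hypothetical names.

```python
import re
from typing import Callable, Iterable, Optional

ABSTAIN = None
NDA = 1  # hypothetical class for a contract classifier

LabelingFunction = Callable[[str], Optional[int]]


def keyword_template(keywords: Iterable[str], label: int) -> LabelingFunction:
    """Build a labeling function from a keyword list, as a UI form might."""
    pattern = re.compile(
        "|".join(rf"\b{re.escape(k)}\b" for k in keywords), re.IGNORECASE
    )

    def lf(text: str) -> Optional[int]:
        return label if pattern.search(text) else ABSTAIN

    return lf


# "Push-button" path: the user only supplies keywords and a label.
lf_nda_keywords = keyword_template(["confidential", "non-disclosure"], NDA)


# Code path: custom logic that a keyword template cannot express.
def lf_long_confidentiality_term(text: str) -> Optional[int]:
    """Vote NDA when a confidentiality period of two or more years is stated."""
    match = re.search(r"(\d+)\s+years?", text)
    if match and "confiden" in text.lower() and int(match.group(1)) >= 2:
        return NDA
    return ABSTAIN


print(lf_nda_keywords("This Non-Disclosure Agreement is made between the parties."))
print(lf_long_confidentiality_term("Confidentiality obligations survive for 3 years."))
```

Both paths produce the same kind of object, a function from text to a label or an abstention, so templated and custom rules can be mixed freely downstream.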

[00:05:27] And so, I mean, I think a lot of what we're trying to accomplish, from the very beginning, right, is to raise the abstraction level at which you're interfacing with and programming your machine learning model or your AI application. And the first step is the hardest, right?

[00:05:39] If you think of the way that hand-labeled training data is, it's like the machine code, or really, actually, I think of it as like the ones and zeros, literally, for binary classification cases. So a lot of the effort behind the Snorkel project and the company is just, or was just, getting from that layer to the layer of...
