Unstructured Data and LLMs with Crag Wolfe and Matt Robinson

The majority of enterprise data exists in heterogeneous formats such as HTML, PDF, PNG, and PowerPoint. However, large language models do best when trained with clean, curated data. This presents a major data cleaning challenge.

Unstructured is focused on extracting and transforming complex data to prepare it for vector databases and LLM frameworks.

Crag Wolfe is Head of Engineering and Matt Robinson is Head of Product at Unstructured. They join the podcast to talk about data cleaning in the LLM age.

Sean’s been an academic, startup founder, and Googler. He has published works covering a wide range of topics from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Sponsors

Notion isn’t just a platform; it’s a game-changer for collaboration. Whether you’re part of a Fortune 500 company or a freelance designer, Notion brings teams together like never before. Notion AI turns knowledge into action.

It can summarize meeting notes, automatically generate action items, and get you answers to any question in seconds. If you can think it, you can make it. Notion is a place where any team can write, plan, organize, and rediscover the joy of play.

Dive into Notion for free today at notion.com/sed.

This episode of Software Engineering Daily is brought to you by Authlete.

Are you trying to protect your API with OAuth or struggling to build an OAuth server?

Implementing OAuth yourself can be challenging, and even risky. Meanwhile, one-stop identity solutions can be expensive, missing necessary features, or not fit into your existing architecture.

Authlete can help.

Delegate complex OAuth implementation to APIs designed and developed by the experts that authored many of the OAuth standards. With Authlete, you can use your existing authentication system and the language of your choice to quickly build your OAuth server. And you’ll always stay up-to-date with the latest specifications.

Focus on developing applications and shipping features. Leave the complicated OAuth implementation to the experts.

Authlete is the trusted OAuth service for leading financial, healthcare, and media companies.

Get started today with a 90-day extended free trial at Authlete.com/sed.

Flagsmith is open-source feature flag software that lets developers release features with confidence. This lets you test in production, stop monster pull requests, and get more control over deployments. It’s easy to get set up, whether you’re trying feature flags for the first time, are tired of managing them in-house, or are looking to move away from slow development cycles and legacy systems with feature management.

You can get up and running for free on SaaS in less than five minutes to test feature toggling for your app. Once you’re going, click around with out-of-the-box feature flag functionality and easy integrations with tools like Jira without any bloat.

For maximum control and flexibility, you can also choose how to deploy Flagsmith. Options include on-premise, self-hosted, SaaS, and private cloud. Try feature flagging for free by visiting flagsmith.com.

Software Daily

Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.