Open-Weight AI Models

Open-weight models are AI systems whose trained parameters are publicly released, which allows developers to run, fine-tune, and deploy them independently rather than accessing them only through a hosted API. While closed-weight models from companies like OpenAI or Anthropic are delivered as managed services, open-weight models give organizations direct control over how the models are deployed and used. The performance of these models is steadily improving, and they have become credible alternatives for production workloads, with advantages in customization and data privacy.

Fireworks AI is building a platform focused on serving and customizing open-weight models at scale. The platform includes optimized inference infrastructure, multi-hardware support across NVIDIA and AMD, and reinforcement fine-tuning capabilities.

Benny Chen is a Co-Founder of Fireworks AI. In this episode, he joins Gregor Vand to discuss his path from Meta’s ML infrastructure teams to co-founding Fireworks AI, why open-weight models are becoming increasingly competitive, how custom kernels and speculative decoding improve performance, reinforcement fine-tuning, and much more.

Gregor Vand is a security-focused technologist, having previously been a CTO across cybersecurity, cyber insurance and general software engineering companies. He is based in Singapore and can be found via his profile at vand.hk or on LinkedIn.


Please click here to see the transcript of this episode.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Sponsors

turbopuffer is how companies like Anthropic, Cursor, Notion, Atlassian, and Ramp ship their most ambitious search features. turbopuffer is a serverless vector and full-text search engine built on object storage. It’s up to 95% cheaper than traditional search databases, and just as fast. With turbopuffer you can index and search 50 million documents at 10 millisecond p90 query latency for less than 100 dollars a month. Head to turbopuffer.com/sed to get your first month free.

In mobile application security, ‘good enough’ is a risk.

Guardsquare uses advanced, multi-layered code hardening techniques and automated runtime application self-protection and mobile application security testing, combined with real-time threat monitoring, to deliver the highest level of mobile app security.

Discover how Guardsquare brings all these together to provide mobile app security for your Android and iOS apps without compromise at www.guardsquare.com.

Today’s episode of Software Engineering Daily is brought to you by Unblocked.

Your coding agents have access to your codebase, maybe you’ve even connected other tools via MCPs. But access doesn’t mean context. Agents can’t reason across MCPs, they don’t know your architectural decisions, your team’s patterns, or why the API was shaped the way it is. So agents look in the wrong place and deliver bad outputs. Then you spend time correcting—turn after turn.

Unblocked is the context layer your agents are missing. It synthesizes your PRs, docs, Slack, and tickets into organizational context that agents actually understand – so they make better plans, write higher quality code, use fewer tokens, and require fewer correction loops.

If you’re running Claude Code, Cursor, or any agentic workflow, Unblocked is worth a look.

Get a free three-week trial at getunblocked.com/sedaily.

Software Daily

Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.