EPISODE 1846

[INTRODUCTION]

[0:00:00] Announcer: JigsawStack is a startup that develops a suite of custom small models for tasks such as scraping, forecasting, vOCR, and translation. The platform is designed to support collaborative knowledge work, especially in research-heavy or strategy-driven environments. Yoeven Khemlani is the founder of JigsawStack, and he joins the podcast with Gregor Vand to talk about making use of small models for diverse applications. Gregor Vand is a CTO and founder, currently working at the intersection of communication, security, and AI, and is based in Singapore. His latest venture, Wyntk.ai, reimagines what email can be in the AI era. For more on Gregor, find him at vand.hk or on LinkedIn.

[INTERVIEW]

[0:00:59] GV: Hi, welcome to Software Engineering Daily. My guest today is Yoeven Khemlani. Welcome, Yoeven.

[0:01:05] YK: Hey, hey, good to be here.

[0:01:06] GV: Yoeven, you are the founder of JigsawStack. Ultimately, a Singapore company, but you're now in the US, which is a pretty well-trodden path, I think, for this part of the world. Before we get into JigsawStack, which I cannot wait to get into - it's a very interesting product and very timely - I'd love to just hear a bit more about your background. I know this is, as they say, not your first rodeo when it comes to having founded a company. Yeah, just tell us about your journey to JigsawStack.

[0:01:34] YK: Yeah, I think it didn't start very exciting; it's like every other engineer. I'm an engineer like everyone else. I love building products. I love exploring new technologies. And that's kind of where I started. So I started as a game developer, which is a path many choose not to take because it's one of the hardest industries to break into from a revenue standpoint, or from a salary standpoint in your initial career, but it was something I enjoyed. Did that for a bit. Loved building. Did love the industry. Went into the banking industry. Didn't love the corporate industry. Decided to leave that. And as I went to study my master's at Imperial, COVID hit right at the right moment. I didn't want to pay over $200,000 just to sit in my dorm room, so I decided to drop out, and eventually that's when I got inspired to start this company called Stayr way back, I think about four years ago. It was a hotel aggregation company in Southeast Asia that basically sold short-term bookings for the pool, the gym, and different things within the hotel, because domestic markets were growing back then. Then that grew to a point, we went into co-working spaces, and eventually we sold it. And I think that kind of kicked off my path of, "Okay, startups is where I want to be." I love building and I love building for people, right? It's like, how can I make money from the things that I built? And that's when I realized that I specifically love building tech for developers, tools and software, and that's when I went into this rabbit hole of what should I do next. And this was during GPT-3, when it came out. I was like, "The AI space is getting interesting." And a lot of the technology being built is for frontend, human-in-the-loop applications, right? Chatbots or tools that people use in the loop. And I was like, "Can we bring this technology to backend applications where there's no human in the loop? It just works and takes away processes that used to require a lot of manual intervention." The most common thing right now is web scraping, right? Everybody's trying to scrape the web.
Can we structure it in a way where you don't need to write Puppeteer code or Playwright code anymore? You just give a URL and you prompt the fields that you want to extract, and it does the work for you at 98% accuracy. Not just markdown, but actual fields that you can pull out. That's the pain point we realized: can we automate these backend tasks? And we went down this rabbit hole of fine-tuning models and training. How can we increase the accuracy to 97, 98, and get it up there? That's kind of how JigsawStack started, and we started narrowing our field there.

[0:04:03] GV: Nice. I think you've kind of hinted at it there. But JigsawStack, what is - in sort of, I don't know, one sentence - what is JigsawStack?

[0:04:10] YK: JigsawStack is a suite of small models to automate your backend tasks. That's the way I like to phrase it.

[0:04:16] GV: Awesome. Okay, so let's kind of dive into that. You already touched on there being a pain point around pure web scraping. But then I guess, when did that morph into small models versus just going, "Oh, we're just going to make a web scraping model," basically?

[0:04:32] YK: Yeah. The first challenge we tried to tackle was: can we use GPT-3 or GPT-4, one of the big LLMs, the best in class at that point, to solve this problem? Most of the time, we get good solutions for very human-in-the-loop things. You get markdown generated from a website that you can pass into a chatbot and then talk with that markdown. But you don't get structured data that you can actually use. For example, if you want to do pricing comparisons between two products on Amazon, typically a developer would need to write some code to extract that specific price field they need, and OpenAI with any kind of scraper couldn't achieve that at the same accuracy a developer would be able to. That's when we realized we had to fine-tune. When we started fine-tuning, we tried 400B-scale models like Llama 3.1 at that time. And then we started scaling down. We said, "Can we take the same thing and bring it to a 70B model?" Because then we reduce our cost and increase efficiency. And then we started going down even further. Can we bring that down to a 13B model? 13B didn't work really well. Now we're kind of on average at the 70B scale. And that's where we see it's easy to deploy, it's cheap. And since we're so specialized in that one use case, we can get rid of a lot of the other generic use cases built into the model, right? And kind of retrain a lot of it specifically for that one thing. That's where the small model utility really came in.

[0:05:54] GV: Nice. Let's maybe go through a few - well, you have many small models, is my understanding. Could you kind of go through a few of those? You've touched on web scraping, which is still very much in the product today, I believe.

[0:06:06] YK: Yeah.

[0:06:07] GV: What are some of the other ones?

[0:06:08] YK: Yeah. So when we launched the web scraping, that blew up, and it kind of opened this door of, "Where should we go next?" We didn't want to dive super deep into web scraping, because we realized the technology that we built for web scraping was 50% the model, 50% the infrastructure, because we trained the model specifically for Puppeteer, for example. It controls Puppeteer and injects JavaScript code.
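To make the "give a URL and prompt the fields you want" workflow concrete, here is a minimal TypeScript sketch. The package name, client constructor, `web.ai_scrape` method, and `element_prompts` parameter are assumptions for illustration rather than the confirmed JigsawStack API surface; the point is the shape of the call: no hand-written Puppeteer or Playwright selectors, just plain-language field prompts in and structured fields out.

```typescript
// Hypothetical sketch of the prompt-driven scraping flow described above.
// Package, client, method, and parameter names are assumptions, not the documented API.
import { JigsawStack } from "jigsawstack"; // assumed package name

const jigsaw = JigsawStack({ apiKey: process.env.JIGSAWSTACK_API_KEY! });

async function comparePrices(urlA: string, urlB: string) {
  // Describe the fields you want in plain language; the model drives the
  // headless browser and returns structured values instead of raw markdown.
  const [a, b] = await Promise.all(
    [urlA, urlB].map((url) =>
      jigsaw.web.ai_scrape({
        url,
        element_prompts: ["product name", "price", "currency"],
      })
    )
  );
  return { productA: a, productB: b };
}
```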
Now the question is, how can we broaden the scope to do other forms of data extraction? And that's when we explored data extraction as one pillar, right? OCR is an example of pulling out text from things. Traditional OCR, Optical Character Recognition, uses machine learning to recognize the characters - a very traditional way of doing it and a very inaccurate way of doing it. Now we combined it with today's tools, like vision LLMs, right? That gives us a whole new realm of OCR with bounding boxes and those kinds of tool sets. And then we went into things like speech-to-text, which is basically extracting text from audio. That's where we took Whisper v3, optimized the shit out of it, and basically made it one of the fastest speech-to-text models. And we're actually faster than Groq at this point. That's the direction we're taking, so data extraction is just one example. Eventually, we realized that generative, or data transformation, was another big pillar. A lot of the users were asking, "Hey, I scraped this chunk of data from," let's say, "a Mexican passport." They were doing passport verifications with our OCR. And now they need to translate some of the information. Translation became a huge pillar by itself, where people were moving away from Google Translate and the typical translation providers because of the quality of the LLMs that exist today. Can we take a model, train it specifically for translation, and then add languages as we go, right? And then become a specialized translation model. That's what we basically started to do for data transformation as well.

[0:07:56] GV: And I saw you give a presentation a couple weeks ago, and you were talking also about translating text in images. That's a kind of interesting challenge, right?

[0:08:04] YK: It is. I think people have seen this in consumer apps, like Google, where you point the app at an image and it kind of overlays a bunch of text in a different language. Apple and Microsoft have this too. And we were looking into this, and we were surprised that there's no API for that. Literally, all these cloud providers don't provide an API for that specific service. I think one reason could be that it's not high quality. They're actually just overlaying blurred text on top of the current text. What we're exploring is this: we saw that image ads and generated image content are becoming a big thing in the market, right? Is there a way that we can take an existing image, understand the text on that image, then translate it and diffuse it back with the same style, right? Basically, don't affect the style or the shape of the text. And that's something to explore. With diffusion, you can really do a lot of that. You can train a model on glyphs, on text glyphs, to then basically translate the thing. I think we're like two weeks away from version one. And then version two is going to be the full diffusion model that we're going to launch. Right now, we're going for English and Chinese as the two major languages to diffuse against. And then we're going to include Hindi, Arabic, and different languages there.

[0:09:12] GV: Awesome. I'd like to touch on - and I guess it's part of the product offering, and you can correct me on whether this is sort of an amalgamation of bits of the product or totally standalone - but the Prompt Engine is something that we've talked a bit about previously outside the podcast. And I had a sort of specific use case, which I can always talk about.
But could you just describe what Prompt Engine is and what it's solving?

[0:09:34] YK: Prompt Engine solves three big problems. First is prompt management, which a lot of providers in the market do for you. The second is prompt model routing. Again, another suite of products in the market does that for you. And the last is prompt techniques, right? Techniques that you can apply to increase the quality of the output, reduce your token costs, and all these other things. And we see in the market there are so many challenges for developers using three different products just to solve this. They use LangChain, plus they use another product just to track their tokens, and another product just to evaluate their model outputs and so on. What we managed to do is train a really small model, 1B to 13B size, to basically make these decisions for you at runtime. We took different datasets in different domains - law, history, English, mathematics - and we trained a really small model that can make decisions on which model to pick at runtime based on the input of your prompt. Whenever you give an input, say, "Generate a poem, da-da-da-da-da," it will understand, "Okay, it's a writing-focused prompt. It needs to generate poems." Then we have a dataset that has been trained on, "Okay, poems are really good with GPT-4.5. It's really good at writing." It will automatically pick that model for you, but it will also run the same prompt against five other models. And when you get the output, every single time you run that same prompt, it narrows down the best model based on this concept called mixture of agents. Have you heard of that idea? It basically runs it and then uses LLMs as a judge to say, "Hey, three out of five give the same output. Typically, this is going to be the correct output." And then we merge those three into one output, and it gives you that. It's a chain of techniques that basically gives you this thing. From a user perspective, you don't pick Claude 3.7, or GPT-4.5, or Gemini 2. You basically get everything, and the selector, the model, just picks that for you. And you can also store the prompts and then rerun them and so on.

[0:11:38] GV: Yeah. I mean, the example I spoke with you about previously was around time zones. And I think we all know in programming, time and time zones are a hilariously difficult thing to still get right in various scenarios. That was kind of why I at least ran through Prompt Engine initially, to get an understanding of what this product is and how it can actually help us. And it was super interesting, because it was far and away better from a results perspective. I could put it into just pure GPT - I'm talking about GPT over the API - or just go to Claude and put it in there, versus going to Prompt Engine and saying, "Hey, we've got three people in three different time zones. Here are a few rules, though. Don't start a meeting before 6am for any of these participants." Strangely, the other two really struggle with it. You have to keep correcting them when they come up with these answers, like, "Oh, here are the three times. Yeah, one's at 5am." It was like, "I explicitly said not before 6am." "Oh, yes, you're right." And then da-da-da-da-da. Whereas Prompt Engine came back pretty reliably covering that. I think that's a very - I mean, can you sort of help explain why it's able to produce that result?
[0:12:48] YK: When you send your request, what happens behind the scenes is basically that it sends the request across five models, right? And typically, what happens is that maybe two or three out of those five models give the right answer. And the smaller model that we trained acts as that judge, right? Small models are really good at consuming data, but horrible at generating. They can make a decision if you give them a lot of information, like, "Yeah, this is the right one based on its context." But they're really bad at saying, "Hey, I can give you the right answer." If you tell it, "Okay, these are the five answers, pick the best answer," it automatically uses a weighted average. It'll be like, "Okay, these two have answers coming up more often. Plus, from my understanding of the initial prompt, this is the answer," and it will automatically start merging those top two answers together. What happens is that you're always guaranteed the best answer, and you're not going to get the worst answer, because two out of three or three out of five of the models are basically giving you that same answer or similar answers, and it merges the output. Even if one of the models gives - let's say they add AM and the other model adds PM - when you combine it, it overrides based on the initial prompt. That's mixture of agents. It's just one technique that we apply on it. Recently, using the o1 concept of chain of thought is another idea, but implementing an entire model like R1, or o1-mini, or o3-mini in the engine is going to take tons of token cost and take a really long time. But we applied the same technique of chain of thought to our models from scratch way before the whole DeepSeek R1 came out, obviously at a smaller scale. So it runs a lot faster and cheaper for the user. That's why it tends to work more accurately, especially for structured data. If you're asking for a structured response, you're almost guaranteed to get a structured response. As simple as that.

[0:14:36] GV: I think one thing I'm curious about, though: a lot of users who are relying on LLMs aren't actually tuning models themselves. That's usually a sort of secondary exercise once you have proven product-market fit or so forth. And clearly, the next step is to then tune a model and become more deterministic for their use case. At least, where I sit, I kind of get used to understanding the likelihood of what's going to come back from a specific model. For example, if I look at a use case we have - it doesn't really matter the details of it - if I put that use case to a Llama model versus to GPT, again, very different responses come back, and I'm like, "Okay, we're always going to go with GPT for this one, because I'm just confident in what's coming back." In the Prompt Engine example, how does that work in the sense - like, the result might be "better". But I guess the model behind the scenes might have flipped. I'm just trying to get my head around -

[0:15:29] YK: 100%.

[0:15:30] GV: - how much can the same prompts going to Prompt Engine end up then changing from sort of - I don't want to say a quality perspective, because quality clearly is what the product is about. But change from sort of a structure perspective, or like, "Oh, that's completely different to what I was -"

[0:15:44] YK: Expecting.

[0:15:45] GV: Yeah.

[0:15:46] YK: 100%. That's an interesting question.
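Before Yoeven goes on to answer, here is a rough TypeScript sketch of the mixture-of-agents flow he just described: fan one prompt out to several models, see which answers agree, and prefer the consensus. The model list, the injected `callModel` function, and the naive string-match "judge" are all hypothetical simplifications; the actual engine uses a small trained judge model that also merges the top candidates rather than only voting.

```typescript
// Rough sketch of the mixture-of-agents pattern: run one prompt across several
// models, count agreement, and return the consensus answer. Model names, the
// injected callModel function, and the naive judge are illustrative only.
type ModelCall = (model: string, prompt: string) => Promise<string>;

const CANDIDATE_MODELS = ["model-a", "model-b", "model-c", "model-d", "model-e"];

export async function mixtureOfAgents(prompt: string, callModel: ModelCall): Promise<string> {
  // 1. Fan the same prompt out to every candidate model in parallel.
  const answers = await Promise.all(CANDIDATE_MODELS.map((m) => callModel(m, prompt)));

  // 2. Naive "judge": vote by exact answer text. The real system uses a small
  //    trained model as the judge and merges the top answers instead.
  const votes = new Map<string, number>();
  for (const answer of answers) {
    const key = answer.trim();
    votes.set(key, (votes.get(key) ?? 0) + 1);
  }

  // 3. Return the answer the most models agreed on.
  const [consensus] = [...votes.entries()].sort((a, b) => b[1] - a[1])[0];
  return consensus;
}
```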
The way we do that - that's why we always suggest the user work with the two parts of this system. You create a prompt first, and then you execute that prompt. Because that's where the prompt management layer comes in. That's the reason we suggest the user always create the prompt and then run it. With a typical LLM, you just execute the prompt, right? You just generate from the prompt. But in our approach, you create it first. Because you're storing this thing, there's a level of memory that takes place. Meaning, every time that you run your prompt, we store a generated version of that prompt, of how it outputted every single time. Let's say you run this six, seven, eight, 10 times. It will always store the version of the output previous to that next run in the database. And then we use that as a baseline as well. If you keep changing your prompt and you keep tuning your prompt, we know clearly something is wrong. But if you keep running the same execution of the same prompt, we say, "Hey, this is positive, this is positive, this is positive. This is how the structure should kind of look. This is what the output should kind of look like." We give it some form of point system at the back where we're saying, "This is good. This is bad. This is good. This is bad." And the more you run, the more accurate it actually gets, and the fewer models we run against. Initially, it runs against five. By the time you're on your fifth run, it's running against two models. By your tenth run, you're running just one model. And we stick to that model. You don't change your model after your 10th run. Let's say it automatically picks GPT-4o, and GPT-4o keeps being the best model for your answer. It will just stick to GPT-4o throughout the lifetime of that prompt that you created, so that you don't have breaking changes as you scale, unless the model fails, right? That's why we have fallback. Let's say the model is down, then we fall back to another model.

[0:17:26] GV: Nice. Something you touched on earlier, and I know it's quite a big piece of the product when it comes to, I guess, advertising this to developers, is the speed. Can you talk a bit about how that works? Let's talk about small models versus large models first, maybe, and then move on to JigsawStack's infrastructure. Generally, how is that making things so much faster?

[0:17:47] YK: We started with the idea of being GPU-poor. I mean, we don't have a lot of GPUs, right? We don't even have an H100 physically. For a lot of AI companies you see today, the first thing is like, "Let's buy our own GPUs for training." We came up with the idea of, "Hey, we're going to be GPU-poor." As much as possible, we need to work on A100s, A10Gs, any of the smaller GPUs. I mean, at that time, they were not small, but now they're considered small. We have to be restricted to that scale. That was one of the ideas, or the methodology, that we stuck with. The second is deployability. The biggest issue with, I think, LLMs or big providers like OpenAI right now is that distribution has been a big blocker for them. They're distributed with Azure, or people have to use OpenAI directly. And that's why you see a bunch of these enterprise AI companies coming out trying to train open source models, because you can't self-host OpenAI's models anywhere, despite them being the best in class. That's one of the biggest blockers currently. We looked at this problem. We don't want to be in that realm, right?
We cannot afford to have that as a blocker, where enterprises want to use our OCR model and we're like, "Oh, we can't self-host because we rely on 30 different proprietary APIs that require this much work." And that's when the small model idea came in, because most enterprises don't have the resources to run large language models at 400B scale. If we can focus on training the model to be deployable and cheap to run anywhere, it becomes an easier distribution story for us from the get-go. It's hard to build, for sure. But in the long run, the cost pays off, because we own this proprietary model that we can eventually take to the market and be like, "We can distribute on your systems easily. No restrictions." You want AWS, you want Azure, you want GCP, you want your own in-house GPU. Or eventually, when we get big enough, we can even host on Groq to be even faster. That's the kind of opportunity we have with smaller models. Cost efficiency is a big part of it.

[0:19:42] GV: Yeah. Just so I kind of read that back, Groq is behind the scenes from an inference perspective, or multiple, or -

[0:19:49] YK: For Prompt Engine, we use a bunch of Groq under the hood as well. Llama Guard is a feature that they have that's pretty interesting; it does a lot of prompt guarding. And we didn't want to add Llama Guard into our platform without Groq, because it's one of the fastest, and we didn't want that bottleneck. We do speak to Groq. And the goal is, if we can get big enough volumes, we can also host there. Because the cost of hosting on Groq is huge, right? You need millions and millions of users to justify hosting one small model on Groq. We want to reach that scale where we can start hosting other models on infrastructure dedicated for small models.

[0:20:23] GV: Okay. Let's, I guess, then talk about just general DevX here. Obviously, we've got a very technical listener base, and this is precisely who should be using JigsawStack. What's the likely first step? What does the product look like to a developer on day zero when they're getting up and running?

[0:20:39] YK: Yeah. Right now it looks like an npm install, right? You just, honestly, npm install JigsawStack. Get your API key. That's it. Everything else works. We built the libraries in a way where you don't need documentation. As long as you install the library, you should have enough typings to guide you in what you need to do. From day zero, every field needs to be a named field. Everything needs to be descriptive. Everything needs to be typed. Even our Python library is fully typed. As a developer, my best experience is using Stripe or a library like Supabase. Literally, like stripe dot, you get your options, customer dot subscribe. We wanted to be that intuitive, right? You don't need documentation in most scenarios. Obviously, if you need advanced configs, you go there. But you should be able to get started with just the library. And that's the developer experience thought that we had. Can the library just be self-sufficient enough? They npm install? They'll figure it out. They pip install? They'll figure it out. And that's the design structure that we went with, yeah.

[0:21:42] GV: I like that you mentioned both Stripe and Supabase. I think they're both great reference points. Stripe, especially on the API side, and Supabase just for pure, I think, DevX when it comes to, yeah, what that product is.
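As an aside on the Stripe-style developer experience just described: the pattern is a single client object with descriptive, namespaced methods and fully typed, named parameters, so editor autocomplete effectively replaces the documentation. The namespaces, parameter names, and response shape below are assumptions for illustration, not the exact JigsawStack SDK surface.

```typescript
// Illustration of Stripe/Supabase-style discoverability: one client, namespaced
// methods, named and typed fields. Namespaces and parameters here are assumed
// for illustration and may differ from the actual SDK.
import { JigsawStack } from "jigsawstack"; // assumed package name

const jigsaw = JigsawStack({ apiKey: process.env.JIGSAWSTACK_API_KEY! });

async function main() {
  // Typing `jigsaw.` in an editor surfaces the namespaces, the same way
  // `stripe.customers.` guides you without opening the documentation.
  const transcript = await jigsaw.audio.speech_to_text({
    url: "https://example.com/talk.mp3", // consistent key naming across APIs
  });

  const translated = await jigsaw.translate({
    text: transcript.text,
    target_language: "es",
  });

  console.log(translated);
}

main().catch(console.error);
```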
My understanding is the API itself has a pretty consistent structure across all the services. Is that the way to think about it? Yeah.

[0:22:00] YK: Exactly. We have a system in place where we structure every API to be pretty consistent in the way you call it, in the way you structure your body, the JSON you pass that data over with. All the keys - we use the same keys across all the APIs, so you don't get confused. Everything is a URL or a file store key. There's no confusion in that sense. Yeah.

[0:22:20] GV: Nice. I mean, I guess we kind of have to talk about pricing and how you charge for this. And I think for most of these products - we're talking just in AI, and GenAI especially - it's a usage-based model. Could you just talk a bit about how you've determined, at the moment, the best balance between accessibility for developers and sustainable business economics, that kind of thing? How have you looked at it so far, and how do you see that evolving?

[0:22:49] YK: Yeah, I think it's a good time to announce as well: we're changing our pricing.

[0:22:53] GV: Right. Okay. We're recording on the 12th of March today.

[0:22:56] YK: 12th of March. Yeah, exactly. We'll be changing it in a week or two-ish. The reason is that pricing, I think, is an ongoing change. We have been learning a lot. Since the time we launched, we've removed so many products, and we've added a lot of new products. The reason we've done that is because we speak to customers, we develop, and we understand truly what they want. When we started the product, we built the things we wanted. And now it's based on what developers wanted the whole time. We reached a point where we realized that the pricing we have doesn't make sense for a lot of the products that we're offering. The big issue is that, for example, we charge right now 0.05 cents per API call. It made sense for things like the AI scraper and the OCR, because you have a fixed cost that you can scale with, and then obviously get discounted pricing the more you use. And it worked in competition with providers like AWS and GCP, right? It's a very similar pricing model. You have a fixed price per API call. What happened is that we started releasing products like speech-to-text and text-to-speech. And if you're charging 0.05 cents for one hour of transcribing a video, you're still getting charged the same for five seconds. It didn't make sense to a lot of users at scale, right? And managing discounts manually for each customer didn't make sense. A lot of the users came back to us like, "Hey, why not just use token-based pricing?" And I'm like, "Yeah, why did we not just use token-based pricing?" Every LLM provider is using it. Every developer is used to token-based pricing. And if we can just shift to that kind of idea, it gives, one, the flexibility for us to now charge for usage in a more granular way. Meaning, if you use less, we charge you less. If you use more, based on the output, we'll charge you more accordingly. Secondly, it lets us explore more technologies, because now we can allow you to configure specific APIs to run for longer periods, because we're not capped by that 0.05 cents underlying cost behind the scenes. Now we're shifting to this token-based pricing, where we're estimating it to be around $1.40 per million tokens, which is actually pretty good, because we're trying to keep ourselves as cheap as possible for most users, for scale, right? $1.40.
I think Claude is at 15 bucks. $1.40 for infra plus a specialized model is something that we think is fair. Now, obviously, we're still going to test more. We've been getting a lot of feedback. We haven't run a lot of experiments yet because we've not launched it. But when we launch it, we'll get more feedback and figure it out. And we'll keep adjusting pricing to what developers are comfortable with at the end of the day.

[0:25:26] GV: And I guess both - well, I mean, today, but I guess with this future pricing model, there'll still be an intro free tier of some description or - yeah, yeah.

[0:25:33] YK: Oh, yeah, for sure. 100%. It'll be like a million free tokens every month. Yeah.

[0:25:37] GV: This is sort of not pricing exactly, but it is related. I guess, context window. I'm trying to get my head around how context size plays with small models. Because I guess, if I think about it without you saying anything further, I think, well, context window, small model. Surely, it's just a really small context window. But what is it? Yeah.

[0:25:55] YK: Yeah. It is a small context window, but we don't really rely on the context window as much. They're not all language models. A lot of them are models trained to control infrastructure. The AI scraper example I gave basically takes in fields, right? I need the price. I need the description of this place. I need the image on the left side of the - whatever. Very small context input. Huge output. On the output level, that's where you see the model generate Puppeteer JavaScript code that gets injected into Puppeteer, and then Puppeteer outputs a huge context. But it doesn't need to go back to the model, right? That goes to the user, and then that context gets post-processed. Context is not a big problem for us because we're more on the infra side of the problem. If you're building a chatbot, or somewhat of a chat system that requires a huge amount of context, then yes, I think context makes a big difference - typically for most chat applications or human-in-the-loop applications - but not so much for us. We've actually never had a real context problem at this point, yeah.

[0:27:03] GV: Yeah, okay, that's super interesting. I want to talk about the future of the product, as well as the developer community that you have currently. Let's start actually with the current developer community. Who's using it? I'm aware you guys did a hackathon maybe a couple weeks ago, which looked very interesting. What were some of the results of that? What is the current community that's using JigsawStack, and how do they talk to each other?

[0:27:27] YK: I think there are two big communities. One is the typical startups and indie hackers. These are people who love the product, love to play with it, try it out. But I think for a DevTool, like Supabase, Vercel, or any product on the market, we face this kind of problem where there's when you need the product versus when you discover the product, right? A lot of times, you discover the product, but you don't need it at that point. And when you need it, you forget about that product. Having us exist constantly in front of developers, so that when they need it they're reminded of us, is a key thing for us. That's why we target a lot of startups. While they might not need us today, they enjoy the product.
Eventually, when what they build scales, or they build a product that does need JigsawStack, we're the first one they give a try, give a shot. Or they compare us against GCP and AWS, the competitors that we're going against, and then basically go, "Hey, yeah, this is a better product. Let me try it out. Let me go for it." Startups are our key thing - Series A companies and smaller at this point. We do do one or two enterprises, but enterprises are a bit more challenging for us at our scale. We're happy to do it, but we are still a small team. It's like three to five of us at this point. And as we scale, we'll take on more and more larger companies that are Series A and bigger. But they have special needs, like they need it deployed only in Australia, or only in specific parts of the US, and stuff like that, which we want to do at a later stage. Right now, it's mostly startups. And we get better feedback, to be honest, at this point. Like the hackathon - because of the hackathon, on that day, I was not even building. I was handling JigsawStack cases. Everybody came to me, and they were like, "Hey, I have this bug over here. Or, I didn't even know this field existed over here for the AI scraper." And I'm like, "Okay, updating the documentation and fixing this field." The feedback loop is really good in real time, right? When you do stuff like that with the startup community. And that's what I love.

[0:29:20] GV: The startup community has, I think, quite a high bar in some places, and then quite a low bar in other places. When you look at the product, there's a very high bar for, "This should just work. I should be able to just get running in like 10 seconds." And then a fairly low bar when it comes to, "Oh, maybe this piece of documentation isn't great, but that's okay." Is that kind of how it feels?

[0:29:40] YK: Oh, for sure. It depends on where you talk about it. On Reddit, the bar is very high. Reddit is anonymous. They're like, "This is shit." And I'm like, "Oh my God." You speak to the same guy in real life, and he's like, "Oh yeah. No, I love your products." Sometimes you don't know. But yeah, I think you're right, basically. Developers are very intuitive. I think the more senior a developer you speak to, or the more comfortable a developer is with their skills, the more intuitive they are. If the product works, they can figure it out. They love to fix things, right? If they can fix the product for you, they would, right? Especially if they can. That's why, if the documents are broken or something, well, it's not a big problem for most engineers, right? That's what they're used to. So they will go and take screenshots and be like, "Can you fix this? Can you update this?" And even for the biggest of companies like Google, Vertex AI doesn't work half the time. The expectation from a startup is -

[0:30:39] GV: I have experience of that. Just thinking like, "No, surely. It's not actually Google. Just the API doesn't work right now." No, it doesn't.

[0:30:45] YK: Exactly. If Google can do that. The expectation of a startup like JigsawStack - surprisingly, from companies, actually - is a lot higher. In the US, with a lot of the startups that we meet there, the expectation is, "We expect you to be way better than GCP." I think in Southeast Asia, it's a bit different, where they're like, "Yeah, I can expect you to be worse than GCP because you're the small guy," which is good.
I like the expectation in the US, where the expectation is, "Hey, I expect you to be better than GCP. And that's why I'm going with you in the first place, right? I'm not going with GCP." That kind of thing, you know - downtime and these things are very important to us. And the clarity of the feedback that we get from a lot of these companies and engineers is very upfront and very clear. And they only complain about the things that are real problems, right? If it's actually down, they will complain about it. If the docs are wrong, it's not a big issue, right? They will just come and screenshot it, like, "Can you fix that?" And they've solved the problem already. It's a forgiving community when it's the right things that go down.

[0:31:43] GV: And I guess now just looking at the future and, obviously, the roadmap - and obviously it's super early days for you guys. Roadmap is always one of these very difficult things sometimes, I think, to pin down. But, for example, I saw a post I think you guys put out very recently about how you compare to, say, Mistral when it comes to OCR, because Mistral made this big splash with how their OCR is, in theory, far and away better than anyone else's. And you made some good comparisons of why, in many cases, I think you argue that you're actually better. Is the future direction sort of, let's take these cases that the big guys are doing and try and perfect them? Or are there just completely other use cases you're adding, or thinking of adding?

[0:32:23] YK: We want to focus on two big pillars. The first is data extraction. The second is data transformation. We're going to stick in this realm. As long as it falls within this realm, we're competing with any big guy that comes into this market, right? Mistral was never in this market. They were always in the LLM market. And then one day, they dropped an OCR model, and we were like, "Oh, okay, cool." They want to get into the small model space as well. And that's when it got exciting. I'm like, "Let's try it out." And after trying it out, it wasn't as good as they claimed it to be based on the title of their article. Well, they literally used the words world's best. I don't like shitting on certain things. And I love Mistral. I love [inaudible 0:33:00], especially the first few models that they launched. They kind of initiated a lot of the open source scene way before Llama and those guys did. I love them for what they did in the open source world. But then they did this Mistral OCR and they were like, world's best. Like, "Really?" I re-benchmarked it, and it was very clearly far apart from the world's best. And so I just had to put it out there and kind of show, "Okay. Hey, there's a lot of room for growth." And I'm happy that they're doing it. It ups the OCR market. Let's actually be the world's best and then claim that title, right? So this time, we did the benchmark, and we were like, "Hey, yeah." It took us a lot of time to get our OCR model out. And we were surprised that there's a "world's best", especially when Google, with Gemini-level resources, couldn't be the world's best ever. I'm surprised that Mistral did. We were benchmarking it, and we were like, "Oh, yeah. A lot of room. A lot of room for improvement."

[0:33:59] GV: I mean, in terms of the future of the company and team - a very recent announcement, a couple of days before us recording this: funding has increased, which is awesome.
And sort of, what is that going to enable you to do? And how do you see the team going up, or not going up? I mean, obviously, we're in the golden age of small teams now, which I love, because I've actually always been a small team person and was very frustrated through the 2010 to '20 period when you had to keep defending why you had a small team. How does it look for you guys for the next couple of years?

[0:34:29] YK: We raised one and a half million. The goal is to grow the team to a five-man team, including myself. Keep it very lean. And then the goal is to get the product to a solid standpoint, right? We're still in beta. We want to get ourselves out of beta. We're launching two significant products. The first one is the embedding model. It's a multi-modal embedding model. We realized that embedding is only in the text space, but everybody embeds PDFs, images, and a bunch of different documents. We need an embedding model that can support all of these document types natively. We're launching that. And the image-to-image translation that we spoke about. One new idea that we had in the data extraction space: I think Microsoft released a segmentation model where you can segment buttons and different fields on a site. Can we combine that with object detection and other forms of detection into a single model, right? Segmentation, object detection, and a few things in one. That's something that we're exploring. We're going super deep into some of the detection space and the embedding space. The next one year, I think, is diving way deeper into the technologies that we've built, improving the quality of each thing rather than scaling into more products. We're kind of like, "This is where we are stopping the product roadmap, but now we're just going to go deeper." How can we improve the developer experience? How can we make it even more seamless? Can we improve our AI scraper to make it cheaper and faster by updating the engine under the hood? There are so many new Puppeteer-alternative engines coming out that are faster to run. Can we scale that? Make that faster? I think the next one year is really about the quality of the product, the developer experience, how deep we can go, and distribution, rather than scaling the product roadmap. The products I shared with you right now are basically going to be the last three additions. And after that, we're just going deep.

[0:36:13] GV: That's a great strategy. I think, I mean, for anyone listening and getting going with the product, that's nice to know, in the sense that what you're using is just going to improve. And you've already been through that phase of shaking out what bits actually make up the product and what gets people excited and is also useful. Are you going to be hiring? We've got a great technical listener base. It's great if you say yes or no, because then you either don't get flooded with emails or you do.

[0:36:36] YK: Yeah, go to jigsawstack.com/careers. We're always hiring at this point. And the significant role we're hiring for right now is a founding full-stack AI engineer. We only hire A-star players. We have three questions. One of the questions is, "Do you have a side project?" That's my benchmark. As an engineer, if you don't have a side project that you're working on for yourself or for fun, then you don't need to apply. Just don't apply.

[0:37:00] GV: I think that's great.
Probably a caveat there is, if you're already working on a startup but then realize this is a better opportunity, that effectively is your side project, right? Yeah. But if you're perhaps in a sort of regular role, but itching to get into something more interesting, a side project is always looked upon very favorably by people like yourselves. Just a kind of, I guess, closing question generally: I think you founded this effectively solo. Is that correct?

[0:37:25] YK: Yeah, I'm a solo founder. It's not that I wanted to be a solo founder. I think it's being the technical one - because in my previous company, both my co-founders were non-technical. When you go for a non-technical product and build that, it's easier to find a lot of co-founders in that space that specialize in a specific area. I think the same way non-technical founders find it challenging to find a technical founder, it's difficult for a technical founder to find another technical founder. It's even more difficult because the issue is that you have a belief system of how things are built, right? And then you're always in this realm of the way you see things. And finding another technical founder is even more challenging. I've tried. I couldn't find a perfect fit. And that's why I'm like, let's just hire for the specific roles, and then eventually those people become co-founders, right? That's why I'm hiring the founding team that eventually scales. You get equity, and your equity grows. And I can give away, honestly, more equity because I'm a solo founder. That's a good benefit there.

[0:38:22] GV: I think it's great to call out. I mean, there's so much out there basically saying, almost, "Don't even try unless you find your co-founder." And I think there are quite a few examples in the last couple of years where I've seen solo founders doing incredibly well. I just thought it was good to call out.

[0:38:40] YK: 100%. I'm not really solo. I have a really great team, and I don't feel alone in the company. And that's the thing, right? I think solo founders just have to build their team better, and that's the only challenge. So it's not a pain point. I think it's not a big pain point for me, yeah.

[0:38:55] GV: Yeah, absolutely. And yeah, I mean, you sort of founded it effectively in Singapore, I believe, and you're kind of moving - well, is it west or east? It's kind of equal distance, pretty much. Well, let's just say we're going - you're in the eastern states. Is that right? Yeah.

[0:39:08] YK: Yeah. Moving to San Francisco. One big reason is because when we launched in Singapore - naturally, I'm here, so it's easy to launch here. We put it on Hacker News, we put it on Reddit, we put it everywhere we could. But the majority of our customers and users are from the US at this point. At least at the initial stage, we want to focus on selling to US customers, or in that region. Our second biggest market is actually the UK. And so we just think being in that region makes more sense, for me to speak to customers and get a better feedback loop. Just be in that same energy at the forefront of technology in that space, right? Southeast Asia is still a big market, obviously, but we require a lot more capital to tackle Southeast Asia. And that's something that we want to come back to down the line.

[0:39:48] GV: Yeah, for sure. Well, Yoeven, it's been so great to catch up and to hear all about JigsawStack at this stage in the journey.
I'm sure we'll catch up again in time, when JigsawStack is probably 10 times the size in only a couple of years or something like that. Yeah, just wishing you guys all the best, and we'll be following along.

[0:40:04] YK: 100%. Thanks for having me on.

[END]

(c) 2025 Software Engineering Daily