EPISODE 1781

[INTRODUCTION]

[0:00:00] ANNOUNCER: Over the years, Google has released a variety of ML, data science, and AI developer tools and platforms. Prominent examples include Colab, Kaggle, AI Studio, and the Gemini API. Paige Bailey is the Uber technical lead of the developer relations team for Google's ML developer tools, working on the Gemini APIs, Gemma, AI Studio, Kaggle, Colab, and JAX. She joins the podcast to talk about the specialized task of creating developer tools for ML and AI.

This episode of Software Engineering Daily is hosted by Jordi Mon Companys. Check the show notes for more information on Jordi's work and where to find him.

[INTERVIEW]

[0:00:52] JMC: Paige, welcome to Software Engineering Daily.

[0:00:54] PB: Excellent. I am so excited to be here, really excited to have the opportunity to talk to you, and I also love the questions that you were asking before we hit record. I think this is going to be a fun conversation.

[0:01:06] JMC: I do have a point to make at the beginning, because you're the owner of one of the funnest social media handles. You are DynamicWebPaige. But I do have a question about it. Beyond being fun, have you ever done any dynamic web page design, or web page loading, that credits you with the honor of being the owner of such a handle?

[0:01:26] PB: I am not gifted in the web design space, or the web app creation space. For that, I look to all of my dear friends who are working on things like Next.js and all of the JavaScript and TypeScript libraries. I will say that I did have the pleasure, and the honor really, of working with the VS Code team for quite some time when I was at Microsoft. That's not really web design, but it is very much the JavaScript and TypeScript contingent. I love and adore creating VS Code extensions, just because they're super easy to create if folks haven't experimented with them previously, and they are also very, very useful in the sense that you can have VS Code extensions do a broad spectrum and variety of things.

[0:02:07] JMC: We were chatting about the fact that I had been following you for years now and that, in my vision of the industry, you've always been in this AI space. Probably, we would have called it ML, or other terms in the past. I was thinking about my own career, and I've always been in developer tools and DevOps platforms, stuff like that. I did have a short stint way back when, like in 2013, 2014, if I'm not wrong, in what I at the time would call, and probably is still called, the langtech industry. The companies and products whose development I participated in did machine translation, but also sentiment analysis and so forth. I bring this up not only to point out that I'm certainly not the expert in this field, that's why you're here, but also because, from my experience of following this field a bit from afar, and now quite closely, it feels that this AI revolution that LLMs put out there two years ago all stems from language, from written language and spoken language, but mostly written language, right? What are your thoughts on that statement?

[0:03:11] PB: I will say, I'm glad to meet another machine learning veteran. I started building models, I think, around 2009, 2010. It's been a wild and crazy ride since then. I will say that the transformer models and things like GPT-2 and GPT-3 originally started focused just on text and code, the written word, but now we're getting into this really brave new world of multimodal models.
Not just this underpinning language backbone, but also really interesting capabilities in terms of video understanding, audio understanding and transcription, and image understanding, coupled with text and code as well. You can get a lot more out of the models, apart and aside from text and code understanding, which is very exciting. I'm sure you remember, back in the day, even just to get a model to be adept at a single task, it took months of getting the right data in order, experimenting with different model types, and doing hyperparameter tuning, just to get the smallest percentages of improvement. Now, all of these models do relatively well, out of the box, on all of the tasks that we had been using single-task models for.

[0:04:26] JMC: Are we experiencing a step-function evolution of Word2vec, the technology that powered NLP, which would be, I guess, the way in which I would classify that previous stage of text-based AI and ML? Are we experiencing just a natural evolution, or do the underpinnings of what's happening with video, with multimodal models, have a different nature?

[0:04:49] PB: Yeah, it's a great question. I think everything started with the transformer paper around 2017. Then we built a whole bunch of models, building on the concepts expressed in those papers. One of the coolest things now, I think, is that people are building these AI systems that couple together different model types. As an example, if you're using Gemini, you're using a mixture-of-experts model that is really, really good at multimodal use cases. If you want things like audio as output, or video as output, or images as output, the model is not yet capable of that. It can generate text, but it is not going to be giving you the image, or video, outputs that you would get from something like Imagen or Veo. We're starting to see these really novel new approaches. I'm sure you've experimented with NotebookLM.

[0:05:43] JMC: Mm-hmm. Yeah.

[0:05:45] PB: Yup. For folks who might not be familiar, I encourage you to go try it out at notebooklm.google.com. You can input a PDF, or a GitHub repo, or anything, and it suddenly generates a podcast recording of two people discussing it in great detail.

[0:06:01] JMC: We should point out at this point that this is not a NotebookLM conversation. This is a real -

[0:06:07] PB: That would actually be very hilarious, to give it the transcript and see how well the NotebookLM folds the -

[0:06:14] JMC: I was actually fiddling with some ideas to see if we could do something like that. Maybe in the next iteration of this conversation, this interview. But this is a real one that's happening today, the 19th of November.

[0:06:24] PB: Yeah. I love that these AI systems, which are increasingly multimodal systems, give you the ability to create not only text and code as output, but also images and video. It really resonates with, I'm thinking in particular of my cousins, they love watching videos. It's a stretch to ask them to read something a little bit more long form. I think, to really be able to engage with audiences, and to help people learn and understand, and to really hit every single learning style, we're going to need to experiment with different modalities of outputs, not just inputs.

[0:06:59] JMC: Yeah, correct. You work at Google. Google, in a very Google fashion, has joined later than anyone, if anything.
I mean, you've mentioned the papers that revolutionized this field, and these mostly, if not all, came from Google, but it has joined the release of models in an abrupt way, in the sense that it's put so many things out there. Give us a sense of what Google is doing with AI, specifically with this new generation of AI, and what products and models you focus on.

[0:07:29] PB: Yeah. My particular role is that I'm the Uber TL for our ML developer tools, which is a new org that was created at Google just a few months ago. The products on this team are the Gemini APIs, AI Studio, Kaggle, Colab, JAX and the open-source stack for JAX, and also Gemma, our open-source model family. Basically, everything that you can imagine from a 3P-facing ML developer perspective lives in this ML developer org. The things that are top of mind for me for these tools are really growing the number of students, researchers, and also early-stage startups that are incorporating AI into their products. I think for enterprise customers, there are a whole bunch of other great tools that exist within Google Cloud, like the Vertex AI product offerings. But to really be able to move quickly and to experiment with the latest models, the Gemini APIs and AI Studio are the place where you should go to try that out, and really the only place where you can get access to the latest Gemini models.

[0:08:30] JMC: Before we dive into the products, what is the typical persona that you're engaging with? Because I find it fascinating, the fact that we're talking about ML developers. Are there real people, and not one, five, six, but dozens, potentially hundreds and thousands of people, that are able to not only train models, like the ones that you mentioned in the Gemma family, but others, and also able to deploy them in the fashion that software engineers and developers without any first name or surname would? Are they able to do CI/CD with those things? Does such a persona exist?

[0:09:08] PB: It's a really different way of building software, I would say. The personas for each of the tools would be slightly different. As an example, for JAX: JAX, for folks who might not be familiar, is a machine learning framework that Google uses to build all of our models. It took off like gangbusters. I think all of the papers being produced by DeepMind over the last few years are using JAX. It feels very similar to another numerical library that you might be familiar with, called NumPy. But it gives you the ability to build models, or to build physical systems, and to dynamically scale them in a very straightforward way. You can build a JAX model and, with zero code changes, it can run on CPUs, GPUs, TPUs, and any arbitrary hardware backend, as a result of it using this thing called XLA, which is a machine learning compiler that was originally created for Google to be able to deploy models very efficiently on TPUs.

When I think about the canonical JAX user, my brain is just like, "Oh, my God. People who are building large language models, or multimodal models, or who are doing highly complex dynamic physical modeling." That is the group. That cohort is quite small. The number of people who are building models from scratch with JAX is quite small. Then when I think of the Gemma audience, the Gemma audience is slightly different. Gemma is a model that's already been created. You can either fine-tune it, or you could do continued pre-training on it, but you're probably using a high-level Python API to do that.
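[For readers who want a concrete picture of that high-level Python path: a minimal sketch of loading a Gemma 2 checkpoint with the Hugging Face transformers library and generating text. The specific checkpoint name and prompt are illustrative, and the gated model requires accepting Google's license on Hugging Face first.]

    # Minimal sketch: run an instruction-tuned Gemma 2 checkpoint locally.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-2b-it"  # illustrative; other Gemma 2 sizes work the same way
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "Explain what a mixture-of-experts model is in one paragraph."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))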
You could also just take the model checkpoints and deploy them on multiple devices, or deploy them in browsers, and the user groups for both of those aspects are a little bit different. The people who might be fine-tuning Gemma, perhaps they're wanting to create evals, or perhaps they're wanting to do some research on it, or perhaps they want to use it as part of their product, but those are different cohorts from, maybe, the building-models-from-scratch JAX humans.

The beautiful thing about the Gemini APIs is that if you can make a REST API call, then you can call the Gemini model. It's the same with OpenAI and with Anthropic. We just recently released OpenAI library compatibility. If people have been preferring the OpenAI libraries, it's just a three-line code change to get the Gemini models being used instead. The MLOps process, in all honesty, feels a lot simpler than it did when you were having to worry about data versioning, model versioning, etc. If you're just making a REST API call, you do have to worry about which model you're calling, and you have to worry about the format for your prompts, but there is a whole bunch of other machine learning maintenance work that's just taken out of the equation. It actually simplifies the DevOps process in a number of ways, as opposed to building your own models from scratch, deploying them, and maintaining them.

[0:11:59] JMC: I presume those three personas, even the first cohort that you mentioned, are probably very acquainted with low-level programming. Despite the target-architecture agnosticism of JAX that you mentioned, that small cohort must have that knowledge. Have all three of these cohorts been present in the recent Kaggle workshop that you've just come from finalizing? Give us a sense of what happened there, how people can learn about future editions, and whether there's going to be another one.

[0:12:28] PB: Yeah, thank you for the question. We recently did a five-day generative AI intensive course on Kaggle, which is a platform at Google that originally was for competitions, but is now more of a model hosting, dataset hosting, and learning platform. I think we weren't expecting so many folks to be interested in learning about the programming, but we ended up having, I think, around 150,000 students register. Everybody was forking the notebooks, running them, asking great questions on Discord. The content was really around prompting models, retrieval, embeddings, fine-tuning models, and then also implementing evals and these MLOps behaviors. We had students really across the spectrum, from just getting started with the Gemini APIs to just getting started with Gemma. Lots of variation in terms of skill sets and backgrounds. But from what I can see, I think everybody really enjoyed it. I especially loved that the curriculum we designed was focused not just on the model calls, but also on all of the additional features that you need to have around the models in order to make these systems production-ready. Like setting up retrieval, or prompt management, or really designing strong evals. All of these things are very important to get the right outcomes from the models.

[0:13:49] JMC: Let's actually focus on that. Let's double-click on that. This is exclusively Gemma-related, right?

[0:13:55] PB: No, Gemma and Gemini. But the course was predominantly focused on the Gemini APIs.

[0:14:01] JMC: Okay. What about those? Can you give us a broad overview of what the APIs are capable of?

[0:14:07] PB: Yeah.
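[As a reference for the OpenAI library compatibility Paige mentioned earlier: a minimal sketch, assuming the publicly documented OpenAI-compatible endpoint for the Gemini API. The model name and prompt are illustrative, and the key is a Gemini API key from AI Studio, not an OpenAI key.]

    # Point the official openai client at the Gemini OpenAI-compatible endpoint
    # and swap the model name; the rest of the calling code stays the same.
    from openai import OpenAI

    client = OpenAI(
        api_key="GEMINI_API_KEY",
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )

    response = client.chat.completions.create(
        model="gemini-1.5-flash",
        messages=[{"role": "user", "content": "Say hello in three languages."}],
    )
    print(response.choices[0].message.content)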
So, the Gemini APIs are the recommended way to interface with our Gemini models. They support video, audio, text, code, etc., all of those modalities that I was just describing, as inputs. The Gemini 1.5 Pro model has on the order of a 2 million token context window, which means that you can send the model a whole bunch of information right at inference time. That means that you can analyze full videos, multiple code bases simultaneously, all of the above all at once, and be able to get sense out of it without having to go through the process of standing up a vector database, or fine-tuning. Gemini 1.5 Flash is our smaller version of Gemini and has a 1 million token context window, which is still a lot. It's also much, much faster and much, much cheaper than most other models out on the market. I think it's $0.075 per million tokens. We also have a Gemini 1.5 Flash 8B version, which is around 2-ish cents per million tokens, which means that you can record, as an example, everything that you're doing on your laptop screen, 365 days a year, 24 hours a day, and it would still cost less than a cup of fancy coffee to analyze all of the videos and to be able to make sense of all of the things that you're doing. Google has really invested a lot in making sure that our models are performant, efficient, but still very capable, and also not really breaking price points for anyone. If you look at artificialanalysis.ai, the Gemini 1.5 Flash and 1.5 Pro models are always the most cost-effective frontier models on the board.

[0:15:48] JMC: Indeed. Yeah, very affordable. Where does Data Gemma fall into this picture that you're describing?

[0:15:55] PB: Yeah. Our Gemini models are all proprietary, which means that we haven't released the source code, or the data used to train them, or the checkpoints, or anything of that nature. They're just available via these REST APIs. Gemma is a family of open-source models that we've released all the things for. You can look at the code on GitHub. You can download them from HuggingFace. You can experiment with them. You can fine-tune them. Our latest version of Gemma is Gemma 2, which comes in a variety of sizes: 2 billion parameters, 9 billion parameters, and I believe 27 billion parameters. The smaller models are small enough that you can embed them within a browser, so embed them within Chrome, or embed them on a mobile device, like a Pixel. They give you the ability to do a lot of interesting text-only large language model work. You can generate code, you can generate text, and then you can also fine-tune these Gemma models to do a broad spectrum of things. You mentioned Data Gemma, as an example. There's also PaliGemma, which helps with multimodal understanding, so you can understand images. There's ShieldGemma for security use cases. I think the last time I looked, there were tens of thousands of fine-tuned Gemma variants on HuggingFace. Lots and lots of people stretching them, fine-tuning them, making them great for specific use cases.

[0:17:18] JMC: What is the rationale behind releasing Gemma as open source and Gemini as closed source? What is Google's stance behind this rationale?

[0:17:27] PB: Well, I obviously can't speak for Google. But from my perspective, I think it's really nice to have both options: to be able to call a performant proprietary model and send your data to a server, and then for other use cases, you might have different constraints. You might be under different cost constraints.
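[To make the long-context, multimodal claims above concrete: a minimal sketch using the google-generativeai Python SDK, uploading a video through the File API and asking questions about it. The model name, file name, and polling loop are illustrative.]

    # Upload a video once, then prompt against it with a long-context Gemini model.
    import time
    import google.generativeai as genai

    genai.configure(api_key="GEMINI_API_KEY")

    video = genai.upload_file(path="screen_recording.mp4")
    while video.state.name == "PROCESSING":  # wait until the uploaded file is ready
        time.sleep(5)
        video = genai.get_file(video.name)

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        [video, "Summarize this recording and list any logos you see, with timestamps."]
    )
    print(response.text)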
One of the nice things about open-source models is that if you're running them locally, that's free. You're just using your onboard compute. You might want to customize them in ways that you would not be able to with a proprietary model, or you might be operating in an area where, perhaps, you don't have Wi-Fi connectivity, in which case having an open-source model that's onboard for your mobile device, or for your laptop, is mission critical. You can't be sending your data elsewhere. There are some companies that also have data privacy constraints. They don't want to be sending their data off-site, which means that REST APIs are out of the question. Having a version of Gemini that's not a mixture-of-experts approach, but is a much lighter-weight, very efficient model that's also open source, so people can tweak it and customize it to their delight, is really powerful.

[0:18:39] JMC: Of the techniques that are more popular these days, like RAG, can you explain the differences between them? RAG, RIG, I believe, or REIG, I'm not sure how to pronounce that. Which ones are the most popular, and what are the use cases people use them for?

[0:18:54] PB: I think for folks who might not have experience with these different approaches towards retrieval, just think of them as ways that you can get better performance out of your model's outputs, and also ground your model's outputs in data sources, which helps mitigate hallucinations and helps with the accuracy of the model outputs as well. As an example for retrieval, one that I hear quite often from customers is: I would really like to ground the model's outputs in my own company's internal data. If somebody asks a question about HR benefits, or they ask a question about a specific club that is internal to the company, I want the outputs to not just use information that the model might have learned from the Internet somewhere, but to have it extract insights from my company's data sources and use those to guide the outputs. This is nothing really new. I think internal corporate search is something that everybody has been interested in for quite some time. But the retrieval phase is really doing this extraction from sources that might be relevant, giving that to the model, and then having the model summarize those insights as outputs.

There are a couple of approaches for this that have been baked in wholesale for the Gemini APIs, out of the box, if you haven't experimented with them. One is grounding with Google Search. You can turn on grounding with Google Search, and if you ask the model a question, it will first use the top 10, or however many, results from Google and use those to summarize and ground its answers, which gives you higher confidence in the accuracy of the outputs. Then there's another feature that's only available through Vertex, where you can say, "I want the model's responses to be grounded in the data that I have located in this particular GCS bucket." You could say like, "Hey, here's a pointer to all of my company's data. Hey model, if you're going to be giving outputs, use these data sources to help with your summarization," and then have pointers back to those sources. Just think of retrieval as a way to figure out what information to stick in the context window to help the model with more accurate summarization in its outputs.
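[A minimal sketch of that "figure out what to stick in the context window" idea, using the Gemini embedding endpoint and a plain cosine-similarity lookup. The documents, model names, and top-1 selection are illustrative, a toy rather than a production retrieval setup.]

    # Toy retrieval: embed a few internal documents, embed the question, stuff the
    # closest document into the prompt, then ask Gemini to answer only from it.
    import numpy as np
    import google.generativeai as genai

    genai.configure(api_key="GEMINI_API_KEY")

    docs = [
        "HR benefits: employees receive 20 vacation days and a wellness stipend.",
        "The internal climbing club meets Thursdays at 6pm in building C.",
    ]

    def embed(text):
        result = genai.embed_content(model="models/text-embedding-004", content=text)
        return np.array(result["embedding"])

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    doc_vecs = [embed(d) for d in docs]
    question = "When does the climbing club meet?"
    q_vec = embed(question)

    best = max(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]))

    model = genai.GenerativeModel("gemini-1.5-flash")
    answer = model.generate_content(
        f"Answer using only this source:\n{docs[best]}\n\nQuestion: {question}"
    )
    print(answer.text)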
[0:21:26] JMC: Would this technique work for the following use case? I'm a CTO, or I'm a senior developer hiring junior developers, and I want them to be constrained by, influenced by, and hopefully learn the company guidelines. Can I feed those assets into this retrieval technique and, therefore, allow any junior developer to be provided with answers that are tuned to, again, the coding style of the company, the policies that need to be followed, etc.? Would it work in the same way?

[0:22:00] PB: I think you could attempt it with retrieval, but depending on how many guidelines you have at your company, and also your stylistic guidelines for code bases, it might be worthwhile to first experiment with just putting that information into the context window. As an example, with Gemini, I had mentioned before that you can have 1 million tokens, 2 million tokens just given to the model. If you do that with a repo and you say, "Hey, here's my company's code base, now please generate outputs aligned with the conventions in this code base," as well as any style guide or any guidelines that you might have, Gemini should be able to do that out of the box. Then oftentimes, if you have stylistic constraints, you can just add that as a preamble in your prompt: if you're giving me code recommendations, or if you're doing completions, make sure to follow these stylistic conventions. Usually, the model pays pretty close attention without even needing to set up something like retrieval or fine-tuning.

[0:23:02] JMC: What about AI Studio? I haven't used it, but what I get from the name is, is this a playground where I can use all of this?

[0:23:09] PB: Yeah, absolutely. Every time I mention it, I feel I need to open up a browser tab and start showing things. Aistudio.google.com is a place where you can go and experiment with the different Gemini models: the Gemini 1.5 Pro family that I had mentioned before, as well as Flash and some of our newer model versions. You can also experiment with image generation within AI Studio. You can turn on features like function calling. If you want to do tool use, you can turn on code execution and search grounding. You can compare models against each other. You can also fine-tune models. You can generate API keys and track usage over time, all without having to wrangle with the Google Cloud console, which I think can sometimes be quite overwhelming for junior developers.

[0:24:04] JMC: Then, I presume AI Studio is open to all of the cohorts that we've mentioned before, from those that already have extreme expertise in fine-tuning and actually developing models themselves, to those new people that are just starting. Have you seen the most junior people start getting acquainted with AI Studio? What is the main use case they go about resolving?

[0:24:26] PB: Well, it's pretty much everything. Given that there are connectors to Drive, that you can upload files, and that you can record yourself speaking, or record videos, you can basically just use it for any model question that you might have. One example that I always like to show is uploading a video and then asking for: extracting all the logos, along with the timestamps where the logos occur; transcribing all of the audio from the video; identifying all of the different speakers in the video; describing or summarizing the events from the video with timestamps; dividing it into chapters; identifying any electronic equipment.
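[Tying back to the coding-guidelines question above: a minimal sketch of the "just put it in the context window" approach, passing a style guide as a system instruction with the google-generativeai SDK. The file names and review request are hypothetical.]

    # Put the company style guide and the relevant code directly into context,
    # rather than setting up retrieval or fine-tuning.
    import google.generativeai as genai

    genai.configure(api_key="GEMINI_API_KEY")

    style_guide = open("engineering_style_guide.md").read()  # hypothetical internal doc
    module_source = open("relevant_module.py").read()         # code being edited

    model = genai.GenerativeModel(
        "gemini-1.5-pro",
        system_instruction="You are a code reviewer for our team. Follow this style "
                           "guide strictly when suggesting code:\n" + style_guide,
    )

    response = model.generate_content(
        "Here is the module I'm editing:\n" + module_source +
        "\n\nPlease add a retry wrapper around the network call, following our conventions."
    )
    print(response.text)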
All of these are just things that you can ask in natural language within the context of AI Studio. It makes me laugh, because I'm sure you remember, in the before times, there were all of these dedicated single-task models that were sometimes available as things like cognitive services, or other specialized video intelligence APIs. Now, pretty much all of those, you can just use Gemini for. It's just a prompt, as opposed to trying to figure out which API you should be calling and doing API key management for all of them.

[0:25:40] JMC: Of the latest features of the APIs, which ones are your favorite and why?

[0:25:44] PB: I really, really love code execution and function calling. Code execution, for folks who might not be familiar, gives you the ability to say, "Gemini, I'm going to ask you a question, or I'm going to ask you to help me with a task, and you have the ability to write and execute arbitrary Python code in order to solve it." It sets up a sandboxed environment with the Python standard library, as well as a few other additional libraries, and then gives the model the ability to write and execute code for you. If it gets it wrong the first time, it will just keep going and going until it gets the correct answer. This is just available out of the box. It's a one-liner change. All you have to do is say tools equals code execution to turn it on, and you're off to the races.

Function calling, likewise, is very cool. It gives you the ability to identify tools that the model can call. It might be like, "Hey, Gemini, you have access to this database," so it can write SQL code against the database. You have access to this weather API. You have access to this model that can do satellite image segmentation. You could use that as a tool. Then you can ask highly complex questions and get Gemini to select which tools it needs to use in order to answer the question, as well as to execute any arbitrary code for those tools. It's giving the model a lot of flexibility it otherwise would not have.

[0:27:15] JMC: This feels like it's going into the fascinating field of agentic properties. Before we dive into that, in the code execution example that you just gave, how would the model know that it's achieved the right answer? Should the test be provided in the prompt, or?

[0:27:34] PB: You don't have to provide the tests. I think for most code execution, it's just looking for a specific output. I'm going to break the rules for podcast folks and describe what I'm showing on my screen. I've just pulled up AI Studio. I'm selecting our smallest Flash model, Gemini Flash 8B. I'm turning on code execution, which is just a little toggle button that you can select. First, I'm going to show what it looks like without code execution turned on. Then I'll show what it looks like with code execution. You can ask questions like, "Please give me the dates of every single Monday in the year 2026." If I hit run, the model will give me a really troubling response that says, "Unfortunately, I can't provide a complete list of every Monday, because this would require a calendar program, or something similar." If I turn on code execution and then rerun that same prompt, the model recognizes that it needs to write Python code. Then it runs it for me, until it gets the correct response. You can see here that the first iteration of Python code ran and didn't get the correct response. Then it just kept going. It said, "I saw there was a bug in the previous code." Then it was able to get the correct response.
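[The one-liner she describes, in SDK form: a minimal sketch with the google-generativeai Python library, mirroring the on-screen demo. The model name is illustrative.]

    # Code execution as a single tool setting: the model writes and runs Python in a
    # sandbox and iterates until it has an answer it can return.
    import google.generativeai as genai

    genai.configure(api_key="GEMINI_API_KEY")

    model = genai.GenerativeModel("gemini-1.5-flash-8b", tools="code_execution")
    response = model.generate_content(
        "Please give me the dates of every single Monday in the year 2026."
    )
    # The response carries the generated code and its output alongside the final text.
    print(response.text)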
[0:28:57] JMC: Fascinating. Just for the record, we do have a YouTube channel, so this might actually be uploaded there. For those of you intrigued by the interface of AI Studio, you'll find it there. Otherwise, go to the URL that Paige mentioned; it's quite intuitive.

[0:29:10] PB: Excellent.

[0:29:11] JMC: I mean, this is obvious.

[0:29:12] PB: Amazing. There's also one other thing that I adore about AI Studio, which is that after you do all of these really interesting explorations in the UI, if you hit this get code button, it gives you the exact code that you would need in order to rerun the experiment that you just did. For tool use, or for code execution, it's a one-liner that just says tools equals code execution, to give Gemini the ability to write and to debug code over and over again.

[0:29:41] JMC: What's supported? That's always my next question. Go, Kotlin.

[0:29:45] PB: Yeah.

[0:29:46] JMC: Didn't you announce something about Android Studio very recently?

[0:29:50] PB: Yes. Gemini models have been baked into Android Studio as well, for code completion as well as code generation. If you want to be able to use AI assistance within Android Studio, or Colab, or some of our other coding IDEs, that already exists and is powered by Gemini.

[0:30:09] JMC: Yeah. We saw it on screen a minute ago. But for those that are not watching: cURL, Python, plenty of languages are covered. Obviously, there's a myriad of them in the world, but there's a wide range of supported languages at the minute. Following on the questions of agenticness, agency, rather, how do you feel? What's your personal view? I'm not asking about the future now, and I don't want to get you to talk about the roadmap, or stuff that is not shareable, but where do you see these things going, like models being able to act by themselves? This is a very broad way to describe it. But yes, how do you see that?

[0:30:48] PB: Well, I think we're already getting into this world where the most interesting use cases for models, at least from my perspective, are these kinds of write, run, and execute code, do it in a while loop until it works, sort of scenarios. Though, I think that in order to help people have confidence in these use cases, there has to be transparency over every action that the model is taking, as well as an overseer, like a yes-go-ahead stage for folks, if there are any changes to the system that are going to be made.

I will say, one of the - I had mentioned before that we're baking models into the Chrome browser. If you want to try out Gemini Nano within the Chrome Canary release, that's available for you to test today. Gemini Nano is also embedded within Pixel devices. What that means is that -

[0:31:37] JMC: Within the Pixel device itself in the hardware, or rather in Chrome running on a Pixel device, or both?

[0:31:44] PB: It's not embedded within the hardware, but the model is baked into the operating system. It's running on device. If you do have these models that are running onboard, then suddenly, you can start imagining really interesting step-by-step behaviors that the models might be able to take on your behalf. As an example, I would love to be able to say, "Hey, Gemini, please look at my calendar and find the next best time for me to go and do yoga, or something," and have it be able to both look at the calendar for my yoga class, look at my work calendar, and then try to figure it out for me and schedule time.
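[A rough sketch of the step-by-step tool wiring that kind of request implies, using the function-calling support discussed earlier with the google-generativeai SDK. The calendar helpers are hypothetical stand-ins for real calendar APIs.]

    # Expose hypothetical calendar tools to Gemini via function calling; the model
    # decides which functions to call and with what arguments.
    import google.generativeai as genai

    genai.configure(api_key="GEMINI_API_KEY")

    def list_free_slots(day: str) -> list[str]:
        """Return free time slots on the given day (stand-in for a real calendar API)."""
        return ["Friday 12:00-13:00", "Friday 17:00-18:00"]

    def book_event(slot: str, title: str) -> str:
        """Book an event in the given slot (stand-in for a real calendar API)."""
        return f"Booked '{title}' at {slot}"

    model = genai.GenerativeModel(
        "gemini-1.5-flash",
        tools=[list_free_slots, book_event],
    )
    chat = model.start_chat(enable_automatic_function_calling=True)
    reply = chat.send_message("Find my next free slot on Friday and book a yoga class in it.")
    print(reply.text)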
Those are all things that could be done today, in theory. It just takes someone setting up those step-by-step calls to make with the model.

[0:32:33] JMC: I wonder, from a compliance perspective, whether the model eventually, after performing the tasks, and hopefully correctly, will be able to deliver a chain-of-thought proof of what the process has been, so that someone can verify not only the tests, eventually, but also that the process has been logical, or compliant, right? That would probably be something that an enterprise user would be thinking of.

[0:32:59] PB: Or before the model takes action, it could say like, "Hey, Paige. It looks like you have some free time around next Friday at 12. Do you want me to go ahead and book it?" In which case, I could say yes or no.

[0:33:13] JMC: What else has you really excited about what's coming up?

[0:33:16] PB: I'm also very excited about these multimodal paradigms. NotebookLM was really enchanting for a number of reasons, but I think partially because not everybody is a text learner. I love to read. I adore it. I probably read too much. But many people prefer video content, or they enjoy listening to books as opposed to reading them. Giving folks the flexibility of being able to learn in the way that is most effective for them, I think, is really exciting. Then also, just from the perspective of, I could write books all day, or tell stories to my nieces and nephews, but I was never able to - I am not gifted in drawing.

[0:34:00] JMC: Me neither.

[0:34:01] PB: Having the ability. Yeah. It's a very challenging skill to learn. Being able to generate videos, or images, is also pretty magical. We have a video model that is getting released through the API as well, called Veo, which gives you the ability to both describe a video and have it generated in six-second chunks, or to seed the first frame of the video with just a static image. That's been really cool to see.

[0:34:28] JMC: Where else can everyone find you? Where can people actually learn about the releases that pertain to Gemini, the APIs, Google AI Studio, and all the things that we've been talking about?

[0:34:41] PB: Yeah. As I mentioned, I'm DynamicWebPaige pretty much everywhere, but I strongly recommend following our Google devs Twitter handle. That should give you insight into all of the latest new features that are coming for Google developers. Then I would also recommend following some of the other folks on the team: Logan Kilpatrick, Chris Perry, who's the PM for Colab, and Matt Velloso, formerly of Microsoft, who's the lead for the full team. Beyond the Google developers Twitter handle, my assumption is that everybody is going to have similar properties on Threads, or Bluesky, or LinkedIn, but I will send the Google devs Twitter link via chat right now, so you'll have handy access to it.

[0:35:27] JMC: Is there any point in doing the Kaggle course that we've talked about, and that I'll include a link to in the show notes?

[0:35:33] PB: Yeah, I definitely think so. The Kaggle course is a five-day generative AI intensive course. Around one hour a day is the expected time for the coursework, and then a follow-up hour to listen to the live stream. Each day includes Colab notebooks, so you can walk through code examples. It includes podcasts summarizing a whole bunch of white papers, or just the white papers themselves if you would prefer to read, and it includes a live stream. Then there's also a lot of great Discord discussion about the course itself.
If you're interested in generative AI, in function calling, building agents, prompting models, doing retrieval, interacting with embeddings, and writing evals, this should be a really great crash course for you to try.

[0:36:15] JMC: Yeah. The curriculum looks fantastic. I really look forward to actually glancing over it and being able to understand it. At least, I'd be happy with 30% of it.

[0:36:24] PB: Awesome.

[0:36:25] JMC: Because I should point out that I'm not a developer.

[0:36:28] PB: I think everybody can be a developer these days with these generative AI tools, to be honest.

[0:36:32] JMC: That's true. That's very true. Actually, these tools help me understand code bases, like C++ code bases that are way beyond my understanding, and I'm really happy that they've opened the gates of my understanding to these, in this particular case, arcane and low-level programming languages and code bases.

[0:36:50] PB: Awesome.

[0:36:51] JMC: Anything else that we didn't touch upon that you would like to mention before we conclude?

[0:36:55] PB: Nope. Just the takeaway for everybody should be, if you haven't tried out aistudio.google.com, go explore it, test it out on your own data. We have a very generous free tier. I strongly, strongly encourage you to take advantage of it.

[0:37:11] JMC: Well, thanks so much, Paige. Take care. Have a splendid rest of your week.

[0:37:15] PB: Excellent. You too.

[END]