EPISODE 1585

[INTRODUCTION]

[0:00:00] ANNOUNCER: Speech technology has been around for a long time, but in the last 12 months it has undergone a quantum leap. New speech synthesis models can produce speech that's often indistinguishable from real speech. I'm sure many listeners have heard deep fakes where computer speech perfectly mimics the voice of famous actors or public figures. A major factor driving the ongoing advances is generative AI. Speechlab is at the forefront of using new AI techniques for real-time dubbing, the process of converting speech from one language into another. For the interested listener, we recommend hearing the examples of President Obama speaking Spanish or Elon Musk speaking Japanese. Check out the show notes for a link to the video.

Ivan Galea is the co-founder and President at Speechlab, and he joins the show to talk about how we're on the cusp of reaching the holy grail of speech technology, real-time dubbing, and how this will erase barriers to communication and likely transform the world.

This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His bestselling book, Architecting for Scale, is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee is the host of the podcast Modern Digital Business, produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com and see all his content at leeatchison.com.

[INTERVIEW]

[0:01:47] LA: Ivan Galea is the co-founder and President of Speechlab, and he is my guest today. Ivan, welcome to Software Engineering Daily.

[0:01:54] IG: Great to be here, Lee.

[0:01:56] LA: Thank you. Thank you. So, most speech recognition uses AI nowadays, but not the newer generative AI. It's more, like you said, machine learning versus large language models. I'm assuming that's correct. Can you elaborate on what the differences are between those, and why that difference is important?

[0:02:19] IG: Yes, great question. Speech recognition has been around for quite some time. What has really shifted is the magnitude of the models: the scale of both the data and the models themselves, and with that, the capabilities and how close to natural the output of these models sounds. So the underlying technology hasn't fundamentally changed. The technology has been around, and there's been a lot of very interesting research. What has happened over the last 12 months or so is that, from a capability standpoint, like any other aspect of AI, there's been a shift fueled by the size and magnitude of these models.

[0:03:04] LA: Why is it that people have such a love-hate relationship with this? People either love the idea of AI and how it's helping them do all these great new things, or they're afraid of it and want to stay away from it. Why do you think that's the case?

[0:03:22] IG: I think it's like any other new technology over the last several hundred years. Every time there is a new technology, there is fear about what that new technology will bring about. I think, specifically with AI, and speech technology in particular, there will be multiple impacts on existing jobs. The Hollywood strike that ended recently, which you alluded to, is a critical case in point.
Some of these technologies will have an impact on current jobs within Hollywood. If we look back at history to get some sort of forecast of what is likely to come, most likely there will be new jobs created, but some of the existing jobs will either cease or need to be adapted. Every time you have change, it carries with it an element of fear of the unknown, fear of what this means for me and my job, and we need to navigate through that together.

[0:04:27] LA: Yes. I think technology in general has done that in all industries, but certainly in the movie industry, in Hollywood. Graphics technology has done a lot for animation, and it's changed the jobs of people in animation, for instance. So we see technology's influence a lot. Is AI different from just technology? I mean, is it the same sort of thing, where newer technology means newer jobs and older jobs going away, or is it more than that for AI?

[0:04:54] IG: I think we're getting into the subjective realm, so I can give you my opinion. It's hard to determine exactly how things will pan out. I think this is more of a fundamental change in technology than, say, the introduction of a new technology in terms of user interface. This is more akin to the introduction of the Internet back in the day, several decades ago. That was a fundamental shift in both the set of technical capabilities and the business models at play. That's already quite fundamental. I think when it comes to speech, there's also the element that speech and language define who we are as people; they define us as individuals. My voice is unique, your voice is unique. It also defines peoples: language is a big part of the cultural DNA of individual peoples. So speech is a very innate part of who we are as humans. Given that we're talking about the impact of AI within the context of speech technology, this is not just a question of technology. It gets to the heart of who we are and how we will react as we make these developments.

[0:06:26] LA: Right. It's more akin, as you said, to the creation of the Internet than it is to just a new technology. Because it's not just changing business models, it's creating brand new business models, obsoleting old models, and changing how we do business in general, or how we live in general, for that matter.

[0:06:45] IG: Yes. I would go so far as to say that it also poses questions that are more profound, like, who are we as individuals? At the point where we are able to clone somebody's voice to a very high level of fidelity, how do we distinguish actual reality from fake? I think we're at the very, very early cycles of this. But as a people, we have to address these concerns. This goes back to privacy and to the rights that we have. Going back to the Hollywood strike, a lot of it was around economic rights. What should the distribution, an equitable distribution, of the economic rights for a voice actor look like? These are profound questions.

[0:07:34] LA: Right. Right. I meant to get into this later in the episode, but maybe this is the time to talk about it. The ethical implications of AI, and AI speech in particular, are quite profound. The ability to fool people, the ability to create deep fakes, is a real issue for the industry. Now, I know you take an approach where you require opt-in before you use a voice.
The person whose voice you're creating has to approve before you even create the voice. I know that's required in certain jurisdictions under different laws, but it's something that you take very seriously, I know. But as an industry, how do we address this? What do we need to do to make sure that the deep fakes, the dark deep fakes, if you will, are always recognizable?

[0:08:25] IG: As an industry, I think we're still finding our way through this. Legislation is, as expected, not necessarily evolving at the same pace as technology. Speaking specifically from a Speechlab perspective, my company, where I work, we have a number of technologies that we haven't made available in the public domain. Voice cloning, in particular. That's because we're still developing workflows around permissioning, workflows where we feel confident that they are the right things to do and that they meet the criteria for the specific customers we're partnering with. The notion of ethics, the notion of ownership, the notion of permissioning is critical. I don't think anybody has the answer. I think we have to evolve this. We have to communicate. From Speechlab's perspective, we spend a lot of effort creating design partners for the verticals we're participating in, then look at the best in class for those verticals, learn from them in terms of the use cases and how they are looking at our technology, and build products that meet their workflows. I think in general, that's going to be even more critical than the technology itself.

[0:09:48] LA: That makes sense. You mentioned your product now, and you focus on two different terms for speech. You talk about dubbing and you talk about voiceover. Can you start off by telling me what the difference is between the two?

[0:10:03] IG: Dubbing involves speech-to-speech translation. Within the context of, let's say, Hollywood or a television series, dubbing has happened for many, many years. It has heavily involved humans, professional translators, and professional actors. Typically, you have a script, a piece of content, a movie or TV show that is in language one, and you want to translate it and dub it into language two. Dubbing is the act of changing the language from one to two. From a technology perspective, there are multiple steps. But from an output perspective, it's really a question of translation, and then narrating it with a similar voice, emotion, and expressiveness as the source. Voiceover is a little bit different. With voiceover, you essentially have a text and you're narrating the text. So a voice actor reading a book to create an audiobook, that's voiceover.

[0:11:13] LA: Got it. Got it. Now, like you say, dubbing has been around since long before computers were a thing. There's manual dubbing. I grew up with the old Japanese Godzilla movies, so I remember what dubbing was like in the old days. In fact, the technology for dubbing hadn't really changed dramatically until relatively recently. The idea of being able, by computer, by AI, to listen to what's being said and translate it into another voice, in the other language, but keeping the sound, keeping, like you say, the emotion, and in some cases keeping the pacing equivalent and similar so it can be a direct substitution, is an interesting challenge. In fact, one of the problems that I always see with dubbing is lip-syncing. Lip-syncing is a huge problem in the dubbing industry.
That's also an opportunity area where AI can help. We've seen some cases so far where it's been possible to adjust lip positioning to match the real words, versus the words that were originally said that caused the lips to move. Do you do those sorts of changes as well in your AI process?

[0:12:27] IG: From a Speechlab perspective, we're doing early work on lip sync. It's not an area that we've focused on so far, and we don't provide it in production; it's still in research. In general, we view lip sync as a solved problem from a purely technical angle. Our vision in the near to long term is to have real-time automated translation. Imagine you're at the United Nations, somebody is speaking a language that you don't speak, you put in an earphone or an earpiece, and you listen to a high-fidelity translation in real-time or near real-time that captures all the nuances and all the context. That's the mission.

[0:13:17] LA: Audio only. Audio only is the idea.

[0:13:20] IG: Audio and video. The idea is that that's where we'd like to go, so real-time. Real-time, obviously, is going to be both audio and video. So imagine that we're having a Zoom call, and you're having a regular conversation with somebody who doesn't speak your language, and you're listening, and you can continue having that conversation. The question here is the quality level. You can have some of these conversations today, but the quality tends to be poor. Speech in particular, our language, is full of noise. It's incredibly versatile. One of the use cases that my co-founder, Seamus McAteer, had in mind when he was inspired about this particular area: his wife is an occupational therapist in San Francisco, working primarily with a non-English-speaking immigrant community speaking Cantonese. In that case, you want to have a very high-fidelity translation in real-time. Imagine a doctor and a patient; the quality of the translation is critical. The same applies in multiple other areas. Speech is a very nuanced, versatile area, where quality and the number of use cases are really fundamental. Obviously, there is a distinction from a short video clip of a cat doing something on the piano, where quality is not necessarily critical. What we're focused on are these high-quality use cases where we believe there is a lot of value. For that, the quality of the content and, in many cases, the lip sync are critical. So we are addressing lip sync. From a technical perspective, there's been a lot of improvement in the models that are available for lip sync. The bigger challenges, separate from lip sync, are more around capturing the context, the nuances, and the timing. If you can capture the timing for segmentation the right way, lip sync becomes much easier. Obviously, if you don't get the timing right, lip sync becomes off. There is nothing worse than watching a poorly dubbed movie. You mentioned watching some Japanese movies when you were a kid. You want to get immersed in whatever content you are watching, that you're participating in. Lip sync is one aspect of the content, and doing it right is critical. But there are other aspects of the content and the spoken language that are equally critical.

[0:16:03] LA: The human quality of the content, it's not just the words, it's how it's said, the flow, everything like that, intonations.
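To make the shape of that workflow concrete, here is a minimal structural sketch, in Python, of the kind of cascaded speech-to-speech dubbing pipeline Ivan describes: transcribe the source speech, translate it, then re-synthesize it in a matching voice while preserving each segment's timing so pacing, and downstream lip sync, stays close to the original. The asr, translator, and tts components and the mix_onto_timeline helper are hypothetical stand-ins, not Speechlab's actual API.

```python
# A structural sketch (not Speechlab's implementation) of a cascaded
# speech-to-speech dubbing pipeline: transcribe, translate, then re-synthesize
# in a matching voice while preserving each segment's timing.
# The asr, translator, and tts objects and mix_onto_timeline are hypothetical.
from dataclasses import dataclass


@dataclass
class Segment:
    start: float    # seconds into the source audio
    end: float
    text: str       # source-language transcript for this segment
    speaker: str    # speaker label, used to pick a matching voice


def dub(audio_path: str, source_lang: str, target_lang: str,
        asr, translator, tts) -> bytes:
    """Cascade ASR, machine translation, and voice-matched TTS, keeping each
    segment's time window so the dubbed track lines up with the original video."""
    segments = asr.transcribe(audio_path, language=source_lang)  # -> list[Segment]
    dubbed_clips = []
    for seg in segments:
        translated = translator.translate(seg.text, source_lang, target_lang)
        # Ask the synthesizer to fit the utterance into the original time window,
        # which is what keeps pacing (and any later lip-sync step) believable.
        clip = tts.synthesize(translated,
                              voice=seg.speaker,
                              target_duration=seg.end - seg.start)
        dubbed_clips.append((seg.start, clip))
    return mix_onto_timeline(dubbed_clips)


def mix_onto_timeline(clips):
    """Hypothetical helper: place each clip at its start offset and mix to one track."""
    raise NotImplementedError
```

Voiceover, by contrast, would skip the transcription and translation stages and feed prepared text straight to the synthesis step.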
But also with video, it's the lip-syncing, and the position of the head, and the position of the hands, emotions, and things like that as well.

[0:16:20] IG: Just on that, Lee, it reminds me of a Seinfeld episode where they pick up a phrase. This kind of dates me, that I'm using this reference. But there is one particular episode where they pick up a phrase and use it in different contexts, where it means completely different things. You can pick up any word, say, "yes." Said one way, it means yes. Said a different way, it means no. So the content of the word, and the translation, is the same, but the meaning is completely different. That just gives you a sense of the richness of language.

[0:17:00] LA: Yes, exactly. Yes. I love the idea of getting to the point, you mentioned the Zoom conversation, of getting to the point where you can have a fully interactive, high-quality Zoom conversation with a real person that you personally know. But you're speaking in your own language, and they're speaking in your language, or at least it seems to you that that's what they're doing. But from your perspective, it is the person; they're just speaking a different language.

[0:17:26] IG: That's exactly right. That's our vision. Earlier this morning, I was having a conversation with somebody who works at the United Nations. This topic, obviously, is critical there. You have a melting pot of people representing countries, each of them speaking different languages. Being able to do this at scale, at lower cost, and at high quality, we believe, goes a long way toward reducing the barriers created by languages.

[0:17:55] LA: Right. Right. You bring up a lot more use cases than I was initially thinking of when I started doing some research into your company. Your site right now focuses a lot on the creator marketplace, on people who want to create content and then distribute it worldwide in multiple languages. By itself, that's a decent-sized segment of the industry. But what you're describing here is a lot more than that. It's a way of doing business worldwide, not just communicating content, but interactive discussions, and how we communicate with people. It is as revolutionary, as you said, as the initial introduction of the Internet and the ability to communicate worldwide instantaneously. But now it's interactive, with real communications going on in your own language, without the language barrier anymore. When are we going to get to that point? I mean, we're working on that now. Right now, you can do very high-quality communications by spending a lot of time on a translation, or you can do a real-time translation where it's very noticeable that it's not very good, and you can do both of those. What's the changing point here? Is there a point where this is going to suddenly get much better, or is it a slow evolution? When are we going to get to the point where we have truly interactive conversations without it being obvious that you're talking to a fake?

IG: I think we're close. I think it's a question of a few years, not decades. A couple of things about the problem statement. Speech is an area where there are a lot of fat tails, in technical speak, meaning that the exceptions are really important for a given use case.
Going back to the example that I mentioned a short while ago, an occupational therapist, or a nurse, or a healthcare provider, a doctor, talking with a patient who speaks a different language: the exceptions matter. You cannot have a system that misses 5% or 1% of the language that is spoken, because it can be a life-and-death situation. That's one simple example from healthcare. There are other examples. Speaking from a Speechlab perspective, we're working with a number of publishers within the media industry. We're also actively working with multiple global players on the e-learning side. If you're talking about an instructor giving a very technical lecture, you want to make sure that the quality is high. That's roughly the state of the art today. Now, between that and real-time, there are multiple steps that need to be tackled and solved at scale.

One area that we're starting to explore, and this gives you a sense of the versatility of this space, is that when we're talking about real-time, it doesn't necessarily need to be a different language. Let me give you an example. There's one use case that we're exploring with a government entity, which they're interested in for security reasons, and there are a few other commercial applications. Imagine that we're having a conversation, and for some reason, security or otherwise, you need to remain anonymous. Imagine a system that can change and randomize any characteristic of your voice that ties your voice to who you are. For example, we might want to change the gender, we might want to change the age, we might want to randomize any other aspect of your expressiveness. Right now, you're calm; you might want to change that. Right now, you're very excited; you might want to change that in real-time. So we're having a conversation, and I have no idea who I am talking to from an identity perspective, because that identity has been masked in an anonymous manner. Now, the security use cases are obvious. You might also want to use it to get truly objective customer feedback, because you're not bringing in any subjective input when you're collecting it. There are multiple use cases, both from a homeland security perspective and from a commercial perspective. All of these are stepping stones toward the vision that we talked about a few moments ago.

[0:22:41] LA: Yes. I'm trying to think exactly how to put this, but I can imagine cases, for instance in legal proceedings, where jurors are allowed to see a witness, but have their gender and race removed, or randomized, so you don't know who they are or what they represent. That removes it from the equation of whether they trust the individual. I can imagine tons of use cases like that, where it'd be very valuable to be able to do that, but it would have to be real-time and it would have to be believable.

[0:23:15] IG: Lee, with some of these, when you start thinking about it, as an industry we've tried to do it before. Sometimes you're watching a TV show and they want to anonymize a person, so they turn the person's voice into this deep voice that you cannot recognize. You've seen it, I'm sure.

[0:23:32] LA: And the light behind their head so you can't see their face.

[0:23:35] IG: It's completely silhouetted. So we've seen it. I think that we're at this very exciting point in technology, where there are more variables that you can control, and the output is very high quality.
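As a rough illustration of the voice-masking idea described above, and emphatically not Speechlab's technology, a few lines of Python using the librosa library can randomize coarse characteristics such as pitch and speaking rate; a production system would work on far subtler features, and in real time.

```python
# Toy sketch of randomized voice masking (not Speechlab's method): perturb
# coarse voice characteristics with librosa so the speaker is harder to identify.
import random

import librosa
import soundfile as sf


def mask_voice(in_path: str, out_path: str, seed: int | None = None) -> None:
    """Apply a random pitch shift and tempo change to an audio file."""
    rng = random.Random(seed)
    y, sr = librosa.load(in_path, sr=None)  # keep the original sample rate

    # Shift pitch by -4..+4 semitones; large shifts crudely alter perceived age/gender.
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-4.0, 4.0))

    # Stretch or compress tempo by up to 10% to perturb speaking rhythm.
    y = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))

    sf.write(out_path, y, sr)


# Example call (hypothetical file names):
# mask_voice("witness_statement.wav", "witness_statement_masked.wav", seed=42)
```

The file names in the example call are hypothetical; the point is only that randomizing even these two parameters already masks an identity far more convincingly than the "deep TV voice" effect mentioned above.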
[0:23:49] LA: That is an interesting future, and much deeper than what we thought. Of course, I was contemplating a lot of this. But then the next question comes up, and that is the detection aspect. Because if we can do this, then so can bad actors. This is where things like deep fakes come in, and all that sort of stuff. Throughout history, technology changes have required further technology changes to detect them. This is especially true with weapons. You get a bigger gun; we get a better shield for that gun. As time goes on, things usually stay balanced, though there are times in history where that's not the case. Same thing with technology. Technology is a weapon when you start talking about bad actors. So deep fakes are a valuable and very powerful weapon. But is there going to be technology that allows us to always detect them, to the point where we can deflect that weapon, for lack of a better word?

[0:24:55] IG: Lee, if we look back at history and the way that technology has evolved, as you were mentioning, when the gun is created, the police need to have something stronger to counter it. I think it has to happen. When it comes to deep fakes, I think it's happening already. You hear stories about kids being kidnapped. There was a case in Texas, I believe, where somebody called a mom with the voice of her daughter crying, basically saying that she had been kidnapped and that the mom needed to pay a ransom. Obviously, the mom was completely distraught. As it happened, a few moments later, her actual daughter called her, perfectly calm, with no idea what was going on. In this case, a bad actor had cloned her daughter's voice and tried to ask for money. I think we're at the tip of the iceberg of how technologies like this can be misused. This is not novel. With any new technology, there are always a lot of good cases and there are bad cases. It's not specific to AI; it's just that AI creates new possibilities. As an industry, I think it's really critical that we take this seriously. I talked about having permissioning within the workflows themselves and in the technology, especially in terms of identifying when a voice has been cloned. I think that's part of our joint responsibility as an industry. From a regulatory framework perspective, I'm sure there are active discussions in Europe, in the US, and in other jurisdictions about creating and adapting laws to preserve and clarify the rights of an individual. What rights do I have over how my voice is used? That's something that maybe until a few years ago wasn't really a burning issue. With this new technology, it becomes a burning issue. Obviously, there are economic elements to this. We talked about the Hollywood strike. I think there's still a lot of development that needs to happen, including businesses that have to be created within this space. The chance, the possibility, of bad actors using it is critical. I worry about this in multiple domains, including the political domain. A few weeks ago, a very popular actor made the statement that his likeness, both voice and video, was used in an ad without his permission. I think we're at the tip of the iceberg of this over the coming years.

[0:27:46] LA: What role do you think companies like yours have in this? Are you advocating to legal bodies, to congressional bodies? What type of role are you, and your company, playing in making sure that this is addressed?
[0:28:02] IG: I think the first responsibility is to ensure that the products and the technology we're building cannot be misused, and that we provide the right solutions to our customers to ensure that the rights of individuals are addressed. For example, I mentioned that we have very advanced cloning technology. Before we launch it, we want to make sure that we are comfortable, and that the design partners we're working with are very comfortable, that it cannot be misused. It's really funny. With some of these things, from a technology perspective, we get excited about the technology itself. Then you talk with customers, and you get a better awareness of how they are looking at it from the outside in, which is really important. In one particular case, we were very excited that the likeness of an individual who had a very unique accent was really good. He shared it with his parents, and even his parents couldn't tell the difference. For us, that was a big success; we were very proud of ourselves. Then we're talking with a customer, and obviously we're partnering with that customer, and the customer has their own agents. The agent could be a lecturer for an e-learning company, or a voice actor, or somebody within a media company. It could be anybody whose voice has a following. Their primary concern is not necessarily the fidelity of the voice. That's great. Their primary concern is, how do I have control? How can I make sure that this technology using my own voice is not being misused? So the first thing is, let's make sure that we're building technology responsibly, and creating the right tools for our customers, and their customers, to use effectively. Then, as an industry, there's a question of educating the market. There's also a question of building IDs when we're doing voice cloning, to create a distinction, a differentiation, between deep fakes and the products that we're building. Obviously, the third aspect, the regulatory framework, is just as critical. That's also an area where we would expect and welcome a lot of development in the coming years.

[0:30:32] LA: Makes sense. As a creator, I create a lot of content. I've used products that have helped me create scripts from what I say, or, while editing, insert one new word that I didn't record quite right. There are multiple products out there. I'm sure you're familiar with a product called Descript, but there are a lot like that. How do you compare to products like that? Are you a technology company that powers those sorts of products? Do you see yourself competing with those sorts of products? Or are you thinking in a totally different direction?

[0:31:08] IG: Lee, from the angle of the space we're participating in, I can say we're in the speech technology space. If you frame it that way, yes, we are participating in the same space. Now, the reality is that speech technology is so vast and broad that our focus is very distinct from a company that is focused on, for example, videos for marketers or for creators like yourself. This space of speech is really vast. I would imagine that five or ten years from now, there will be many, many businesses that are strong and doing really well, each targeting distinct use cases, starting from anything related to audio, to anything related to video, to audiobooks, and the list goes on and on.

[0:32:00] LA: Right.
Today, then, who would be your biggest competitors? Is there anybody doing the same sorts of things that you're currently doing? Or is the space so big that no one's really colliding with anyone yet?

[0:32:12] IG: I think there's a lot of activity within this space. If I look at the broader market of dubbing, I think there are a few players investing in core tech, and there are many players that are white-labeling technology and focusing more on marketing and other solutions. We have a belief at Speechlab that you need to have ownership of the end-to-end technology, and you need to invest in customer use cases. We're specifically focused on enterprises, and we want to make sure that we're building the technology, and building the right product and use cases, to satisfy their workflows. That's our focus. But the overall space is very active in multiple ways. For example, if I want to dub a short TikTok video, that's not something we focus on. I can mention 20 different startups that are focused on that. It's just not an area of interest for us.

[0:33:21] LA: Right. That makes sense. Like you say, it's a vast space, and there's enough room for all of these companies without a lot of collisions. The collisions will come, but they're not there yet, anyway. Let's look to the future a little bit. I would normally say, tell me what the industry is going to be like in three years. But given what's been happening with AI, we need to start with one year. What's the industry going to look like a year from today? But then, what's it going to look like in three years, and then forever, meaning 10 years? What's the transition going to look like?

[0:33:55] IG: Let me reverse-engineer it. I think, not necessarily forever, but a number of years out, single-digit years out, we will have high-quality translation across multiple pairs of languages.

[0:34:10] LA: Real-time?

[0:34:11] IG: Real-time or near real-time. I think that will shift a lot of the preconditions that we have, both in terms of business models, in terms of where employees are based, and in terms of availability of talent. Setting the time zone question aside, living in the English-speaking world, we tend to believe that all the talent speaks English. That's not necessarily the case. Only a small fraction of the world's population speaks fluent English. I sit on the board of a Brazilian technology startup, and recently I was having a few conversations with a company that provides technology talent in Brazil. They have over 40,000 technology professionals available as contractors or contract-to-hire. Only 10% of those speak fluent English. That's a talent pool in my time zone that I don't have access to. And that's just a tiny piece of a much larger opportunity. If you look at it purely from a business perspective, there are a lot of untapped opportunities that will slowly open up, I'm talking a few years out, through technology like this. That's, let's say, three to five years out. I can keep going; we talked about the healthcare opportunity. Talent is not localized. Talent is available everywhere. Language is still a barrier that, since the beginning of time, we've taken for granted. This is an exciting time, where barriers created by language will slowly disappear. That means that content in one language will be available in other languages.
That's very exciting. Look at entertainment content, like on Netflix. There was a notion, a precondition, that in the US we're not interested in entertainment content from another country. That didn't prove to be the case. I think in specific verticals, there will be new business opportunities that get created. I think, equally or perhaps even more so, there'll be more opportunities at a personal level. Being able to speak in somebody's language, or to listen in somebody's language, will reduce differences. God knows, at the time we're recording this, when there are wars everywhere, how critical that would be. That's a few years out.

Closer to where we are, I think there will be a lot of evolution in this idea of audio and content generated from text. I think it's more natural for us to have something spoken than written. In some cases, including for my son, who is dyslexic, it's also a better way to consume information. This technology will create more opportunities, whether for voices that sound more natural, or for people to consume information based on their own internal preference, the preference for how they learn. There will be more opportunities for such use cases. I think we'll see more of that in the near future.

[0:37:50] LA: So in the very near future, improved text-to-speech, voiceovers, those sorts of technologies, turning written content into audio and video content in real ways. In three years or so, fully interactive conversations where language isn't a barrier. And in less than 10 years, five, 10 years, whatever, in that sort of time period, we're doing the same thing, but where the words are believable and human quality, and you don't know it's a translation, you think it's the real person.

[0:38:26] IG: Yes, I think that's a fair assessment. I would just mention the announcement from Spotify only a few weeks ago about taking their library of podcasts, or some part of their library of podcasts, in English and making it available in other languages. Similar to that, I would imagine there will be other areas of recorded content that will also become available in other languages. I believe in the US, something like 3%, a small single-digit percentage, of books published are available as audiobooks. Globally, it's even smaller than that. That's content that will become easier to find in your own language.

There's an element of personalization here that is sometimes missed. If I look at marketing, and the evolution of marketing from one-size-fits-all a couple of decades ago to digital marketing, where you have an audience of one, that shifted the whole face of marketing. I would imagine that the technologies we're talking about will have a similar impact by creating this level of local personalization. The way that we speak within my community, within my country, is critical to the identity of who we are, so it's very important. I'll tell you a little anecdote. One of the languages that we support today is Spanish. If you don't know better, you might say, well, Spanish is Spanish, whether it's spoken in Mexico, or Colombia, or Uruguay, or Spain. Obviously, that's not the case. Even within a particular region, there are certain phrases that are said in a way that makes them local. It makes it more personable. That's the advantage of technology like this.

[0:40:25] LA: This is great. Thank you. Yes, this is a really exciting space.
I'm really looking forward to seeing how Speechlab and companies like yours actually make this better. I'm anxious to see what happens from the regulation standpoint and the ethics standpoint, because, like you say, we have a long way to go there. But it's critically important, because the technology is not going to go away; it's here. We have to deal with it in an ethical and safe way as much as we can. Thank you. My guest today has been Ivan Galea, the co-founder and President of Speechlab. Ivan, thank you very much for joining me on Software Engineering Daily.

[0:41:06] IG: Thanks, Lee. Really enjoyed our time together.

[END]