Nenshad Bardoliwalla, Director of Product Management for Vertex AI, unpacks the three core layers of Vertex AI: the Model Garden, where users access and evaluate a diverse range of models; the Model Builder, supporting fine-tuning and prompt optimization; and the Agent Builder, designed to develop AI agents capable of performing complex, goal-oriented tasks.

 
 
 

216 Audio.mp3: Audio automatically transcribed by Sonix

Nenshad:

One of the remarkable changes that has happened in the industry when it comes to generative AI is that software developers and even business people are able to interact with these models so easily. They don't necessarily have to train anything to start getting results very quickly, and so a lot of the experimentation to see whether a model is good for my use case can actually be done by software engineers as well as, you know, business analysts and the like, because all they're doing is typing their prompt into a box and seeing, well, does this kind of look like what I was hoping?

Craig:

Imagine a world where our spaces work smarter, not harder, where technology and humans collaborate to create something extraordinary. At JLL, they're not just imagining this future, they're building it. JLL's AI solutions are transforming the real estate landscape, accelerating growth, streamlining operations and unlocking hidden value in properties and portfolios. From predictive analytics to intelligent automation, they're creating smarter buildings, more efficient workplaces and sustainable cities. JLL: shaping the future of real estate with AI. Welcome to a brighter way. To learn more about JLL and AI, visit JLL dot com slash AI. Again, to learn more about how JLL is shaping the future of real estate with AI, visit JLL dot com slash AI.

Nenshad:

So Vertex is a brand that was launched in 2021, and it represents Google Cloud's AI platform: a combination of core machine learning services like training, inference and notebooks, along with pre-built models, including large foundation models, as well as, more recently, capabilities like enterprise search. So the Vertex portfolio has definitely grown over the last few years.

Craig:

Okay, and so you'll talk about what Vertex is and what it does, and does it support different models beyond Google models?

Nenshad:

Absolutely. We have a property at Vertex that we call our Model Garden, and the entire notion behind the Model Garden was that we would support not only Google's leading first-party models, so Imagen and the like, but we would also have a very strong representation for open source models and, most interestingly, third-party models. So we are the only platform that provides that combination of our own models, open source and third-party models, and we allow all of them to compete freely on the platform.

Craig:

Yeah, so Claude and GPT-4o, o1, they're all there?

Nenshad:

So the Claude models from Anthropic are there, Mistral's models are there, Meta's Llama models are on the Model Garden. Today we do not have a partnership with OpenAI, so those models you have to get from somewhere else.

Craig:

Yeah, okay, so we'll start in a minute. Just one thing that I'm really interested in, just in the last month or so. I've interviewed Cerebras a few times, from when they first came out and then with their second wafer engine, and then I ran across Andrew Feldman at a conference and I interviewed him again, and then at that same conference I interviewed Rodrigo Liang from SambaNova. And I'm really curious about, beyond the models that you host, is Google using these other inference chips? Because it sounds to me, I mean, probably you're using Groq, but it sounds to me like there's a lot happening on the inference side. But I'll leave that.

Nenshad:

We've actually invented our own chips in that area. Okay, I'm happy to talk about that.

Craig:

Okay, so why don't we start, Nenshad, by having you introduce yourself, tell us how you got to Google Cloud, what your responsibilities are, you can talk about what the Vertex platform is, and then I'll jump in with questions.

Nenshad:

Great. Well, thank you very much, Craig. I'm Nenshad Bardoliwalla, Director of Product Management for Vertex AI. I've been at Google for almost two years and I am responsible for the foundational platform components of Vertex, which include being able to train and tune models, to experiment with models in notebooks, to evaluate those models, as well as being able to do inference with those models, monitor those models and steer them through their lifecycle in production. So that entire gamut of capabilities, which is probably the largest part of Vertex AI that's carved out as a separate area, is my responsibility. I came to Google. Oh, please go ahead.

Craig:

No, no, no, go ahead, and I'll cut that out.

Nenshad:

I came to Google after a 20-plus-year career in enterprise software and data. Specifically, I have only ever had three intellectual interests in my life: computers (I was a hacker since I was a very young kid, I'm talking Commodore 64 era, so I'm dating myself), the human mind, and music. I'm a very avid and passionate musician. As you might be able to tell, over my shoulder there are a number of noise-making devices there for my guitars. So my entire career has actually been focused on data, analytics and AI: 10 years at large companies like Siebel and SAP and then 12 years in the startup world. A number of those years were spent at one of my companies, called Paxata, which pioneered a self-service data preparation platform and which we then sold to DataRobot, which is a unicorn AI company, and that really set me up for the privilege of being able to be part of the Vertex team here at Google.

Craig:

Yeah, so tell us about Vertex. What exactly is Vertex?

Nenshad:

So Vertex is a lot of things. Let's break it down into a few different layers, right. The first layer of Vertex is what we call our Model Garden, so we have a vision that customers should be able to choose any model that they think will be helpful for them to solve their use case, that meets their regulatory requirements, that meets their latency and performance requirements and so on. And, as you know very well, Craig, there is a massive number of models that are available in the market now, and so the Model Garden is our curated place in Vertex AI where customers can get access to state-of-the-art Google models. So, for example, Gemini 1.5 002, which just came out last week, is available inside the Model Garden. Our Imagen 3 model, which is our text-to-image model, is available in the Model Garden, and a number of other Google-specific models are available there. But we also believed from the outset that choice is extremely important to our customers, and we show that openness by offering in the Model Garden a very rich collection of open-source models. So you can find favorites like Stable Diffusion or Meta's Llama 3.2, which also just came out last week, and over 100 other open source models, including Google open source models like Gemma 2, which has done very well in the market. And then we combine that with our third-party model support, and so we offer models from Anthropic, like the Claude family. We offer models from Mistral, like Mistral Large and their Codestral model. We offer AI21's Jamba models, and there are more coming seemingly each week.

Nenshad:

So the base layer of the Vertex AI platform is the Model Garden and the ability for customers to choose, and this is what's really interesting, Craig, competing models. So if you look on the leaderboards of people who do comparisons of models, you will find Gemini and Llama and Anthropic all competing on the same leaderboard, and on the Vertex AI platform our mentality is that customers should choose the best model for them, whether it's a Google model or not. So foundation layer one is the Model Garden. Layer two is what we call our Model Builder layer. So this is for being able to take a model and fine-tune it, so adding your own data to make the model respond in a way that makes sense for you. This is the layer that allows you to do prompt management, prompt versioning, prompt optimization, prompt revisioning and so forth. This is the layer that allows you to deploy those models with the fine-tuning capability, and it includes capabilities to monitor recitation, or the usage of specific code or word fragments from third-party sources, and to monitor safety, so that we ensure people are using the models in a way that aligns with their brand values, and a host of other capabilities in that regard. So the Model Builder works hand-in-hand with the Model Garden to allow customers to interact with the models, tune them for their needs and then ultimately make them available for their usage.

Nenshad:

Then the third layer, the last part of Vertex that we'll talk about today, is what we call the Agent Builder. So you've probably heard, Craig, agents are kind of the rage right now. Everybody seems to be talking about them, and at Google we have some very unique assets that make agent building very powerful. A library for doing agent building is offered as a managed Vertex service, but we also combine that with a really rich set of componentry from our enterprise search, because if you think of retrieval-augmented generation, or RAG, the retrieval part is search, and so we offer very powerful search capabilities. We also offer the large language models that allow for the generation of the results and, uniquely to Google, we even offer the ability to ground the results of the agents using Google Web Search, so you can actually get very accurate, up-to-the-minute results from Google Search as part of your agent, which, of course, we are uniquely positioned to add. So, three layers of what we call Vertex AI today: the Model Garden, the Model Builder and the Agent Builder.

Craig:

Can you talk about a couple of things? So you say over 100 models, and I was looking at the Hugging Face leaderboard the other day. How do these models differentiate themselves? How do you decide, when you're in the Model Garden, which model you should use? Is it like shopping in the grocery store, where the brand that does the most advertising, that's most familiar to consumers, gets most of the traffic? Or is there true differentiation among models?

Nenshad:

There is. So first, a note about something you said that's a good reminder to me, which is that Model Garden is also directly integrated with Hugging Face. So in fact, even though Model Garden offers, you know, 150, 160 curated models, we also have a one-click deploy, literally one click, from Hugging Face directly into Vertex Model Garden, so you can get access to thousands of other models that way. But let me answer your question about which model should I use, because it's one that I get multiple times a day. You have to ask yourself a few different questions about what you're trying to do. Number one, what is the use case? What am I actually trying to do? Am I trying to summarize content? Am I trying to generate new types of content, for example a marketing brochure? What regions and what regionalization do I need? Do I need to generate this content in Hindi or just in English, for example? I need to understand how quickly I need the response, right. Some models take much longer. I mean, there are physics involved here: the larger the model, the slower it's going to be to respond, right. So you want to be able to choose a model that is the right size for the latency and accuracy that you're looking for in the use case. What is the model good at? Some models are much better at doing reasoning. Other models are much better at being able to invoke different tools in agentic-type workflows. Other models are really strong at certain text classification-type tasks.

Nenshad:

So you have to know what you're trying to do when you're evaluating the model. And then there is, of course, cost, right. Different models cost different amounts. We at Google have been particularly aggressive. Our new 1.5 models are priced at half the cost of some of the competing models in this space, and that obviously makes it very attractive for customers.

Nenshad:

So you have probably about 10 dimensions that you need to look at when you're evaluating what is the right model for me, and the goal of the Model Garden is to surface a lot of that context to make it easier for you to winnow down, like, what are the three models I want to try for this use case, because there usually will be three. And then the next thing you have to do, if I may, Craig, is actually evaluate those models, right. So you have to evaluate the models not just with the leaderboard information that you get from public sites like LMSYS, which are very useful but are generic benchmarks. That doesn't tell me whether, for my enterprise use case in my organization, with my tuned data, the model that I'm considering will actually do a good job. So you have to build your evaluation data sets and use tools.

Nenshad:

For example, Vertex has a generative AI evaluation tool that allows you to actually compare the responses of multiple different models to see how close they are to what you really want to get. And what's really cool is that you can have human-rated information as part of that analysis. You can even have models serve as raters. So customers can take all of these factors into account, run their evaluations and then ultimately say, okay, the most accurate model was Model A, but it also costs a little bit more. Is that additional cost worth the accuracy? The customer can then decide, and if they say yes, they can go with Model A and deploy that in production, or they can choose Model B, which costs a little bit less but maybe is not as accurate, and that might be totally okay. So this is the logic that we help customers with and the tools that we provide customers with in the process.
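
[Editor's sketch] The accuracy-versus-cost decision described above can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the model names, answers, prices and the exact-match metric are all made up for illustration, and nothing here calls or reproduces the Vertex AI evaluation tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str                  # hypothetical model name
    answers: List[str]         # the model's outputs on the evaluation set
    cost_per_1k_calls: float   # hypothetical price, in any currency


def exact_match_accuracy(answers: List[str], references: List[str]) -> float:
    """Fraction of answers equal to the reference, ignoring case and outer spaces."""
    hits = sum(a.strip().lower() == r.strip().lower()
               for a, r in zip(answers, references))
    return hits / len(references)


def choose_model(candidates: List[Candidate], references: List[str],
                 min_accuracy: float) -> Optional[Candidate]:
    """Return the cheapest candidate that is accurate enough, or None."""
    good_enough = [c for c in candidates
                   if exact_match_accuracy(c.answers, references) >= min_accuracy]
    if not good_enough:
        return None
    return min(good_enough, key=lambda c: c.cost_per_1k_calls)


# A tiny, made-up evaluation set: expected labels for four support tickets.
references = ["refund", "billing", "refund", "shipping"]
model_a = Candidate("model-a", ["refund", "billing", "refund", "shipping"], 10.0)
model_b = Candidate("model-b", ["refund", "billing", "refund", "billing"], 4.0)

# If 70% accuracy is acceptable the cheaper model wins; at 90% it does not.
relaxed = choose_model([model_a, model_b], references, min_accuracy=0.7)
strict = choose_model([model_a, model_b], references, min_accuracy=0.9)
print(relaxed.name, strict.name)
```

In practice the evaluation set would come from the organization's own data, and the metric would likely be richer than exact match (human ratings or model-based ratings, as mentioned in the conversation).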

Craig:

Yeah, who is doing the choosing and all of this evaluation within an enterprise? Is it just whatever software engineer is working on whatever project, or is this becoming a specialized job of understanding, keeping up with what models are doing and which ones do what?

Nenshad:

So there are a couple of interesting trends to reflect on towards this question. One is that in previous eras of machine learning, with predictive ML, building regression models, classification models and the like, that was very much a data scientist-type task, and model evaluation is very well understood in the data science world for predictive models. So you do find a lot of data scientists who have the skill set of doing model evaluation now performing that function for organizations for generative AI models too. That being said, one of the remarkable changes that has happened in the industry when it comes to generative AI is that software developers and even business people are able to interact with these models so easily. They don't necessarily have to train anything to start getting results very quickly, and so a lot of the experimentation to see whether a model is good for my use case can actually be done by software engineers as well as business analysts and the like, because all they're doing is typing their prompt into a box and seeing, well, does this kind of look like what I was hoping for? I can tweak that prompt, et cetera.

Nenshad:

So typically we find that a lot of the experimentation is done by people who are not the data scientists. They kind of winnow it down to a couple of models and then, if they're happy with the results, they can just move forward. Otherwise they can turn to their data scientists and say, I think these three models are kind of the right ones, could you please run a more formal, systematic evaluation? And we encourage customers to do that, because literally every day, Craig, a new model comes out. It changes all the leaderboards overnight, and so you need a systematic, programmatic methodology for doing this in a repeatable fashion.

Craig:

Yeah, and you were talking about AI tools to help identify models for particular use cases. When someone goes on the platform, are they clicking on drop-downs that give options, you know, like Airbnb: I need two bedrooms, I need it in this price range? Or do you state your use case to a conversational agent, and it comes up and says these are the three best models, you should try them out?

Nenshad:

So I think we're definitely headed towards that agentic world of helping customers to choose models. Today, what customers do is they get a series of choices they can make, like what modality are you trying to interact with? Right? That's another thing that you have to look at for models, right? So some models only handle text in and text out. Some models only handle code in and code out, right, like programming code. Some models only handle text in and images out, right?

Nenshad:

So you want to allow people to choose what modality they're going to be working with and also what task they want the model to do.

Nenshad:

For example, text classification is very common as a use case and some models are much better at it than others. So we try to make it very easy for customers, using a series of filters and user experience affordances, to narrow down the set of models that meet their criteria. And then, for the individual models, we provide what we call model cards, and those model cards give them really detailed information about, like, where was this model trained, what kind of data was it trained on, what use cases is it really good for? Here are some examples of prompts that work really well to show the power of these models. And so I think of it as a sort of progressive choose-your-own journey, in that you start with the whole field of every model that's possibly available, but then very quickly you start narrowing down the decision space to be able to determine which set of models, based on your criteria, is actually going to be helpful for your use case.
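
[Editor's sketch] The filter-then-read-the-model-card flow described above might look roughly like this in code. It is a minimal sketch, assuming a hypothetical `ModelCard` record and made-up catalog entries; it does not reflect the actual Model Garden data model or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical, simplified stand-in for a model card."""
    name: str
    input_modality: str
    output_modality: str
    tasks: FrozenSet[str]
    languages: FrozenSet[str]


def narrow(catalog: List[ModelCard],
           input_modality: Optional[str] = None,
           output_modality: Optional[str] = None,
           task: Optional[str] = None,
           language: Optional[str] = None) -> List[ModelCard]:
    """Keep only the cards that satisfy every filter that was actually set."""
    shortlist = []
    for card in catalog:
        if input_modality is not None and card.input_modality != input_modality:
            continue
        if output_modality is not None and card.output_modality != output_modality:
            continue
        if task is not None and task not in card.tasks:
            continue
        if language is not None and language not in card.languages:
            continue
        shortlist.append(card)
    return shortlist


# Made-up catalog entries, for illustration only.
catalog = [
    ModelCard("general-text-large", "text", "text",
              frozenset({"summarization", "classification"}), frozenset({"en", "hi"})),
    ModelCard("general-text-small", "text", "text",
              frozenset({"classification"}), frozenset({"en"})),
    ModelCard("code-helper", "code", "code",
              frozenset({"code-completion"}), frozenset({"en"})),
    ModelCard("image-maker", "text", "image",
              frozenset({"image-generation"}), frozenset({"en"})),
]

# Text in, text out, classification, with Hindi support.
shortlist = narrow(catalog, input_modality="text", output_modality="text",
                   task="classification", language="hi")
print([card.name for card in shortlist])
```

Each filter that is set shrinks the decision space, which mirrors the "start with every model, then narrow down" journey described in the conversation.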

Craig:

And as you said, new models appear every day. Is the market big enough? Is the demand large enough that all of these models are getting used, or is it sort of like you've got these beautiful rose bushes in the middle of the garden and then these straggling sunflowers off to the side that no one's really paying attention to?

Nenshad:

So our observation is that the market distribution seems to follow the 80-20 rule.

Nenshad:

80% of the inference that customers are doing is on, let's call it, between half a dozen and a dozen models that are very popular, that get tried very quickly, that have lots of tooling built around them, etc. Those tend to be the dominant models, whereas many of the other models are much more niche. For example, they're only designed or trained on certain types of medical information, or they're only trained on certain financial information. Still extremely valuable, but they're not broad purpose, right. They're only meant for a specific industry and, in some cases, a specific task. So you will definitely find that the majority of the action in the models that we observe is, again, between half a dozen and a dozen models that are very, very regularly tried and used and put in production by customers, and the rest are just less so.

Craig:

And when you say less so, I'm just trying to get a sense of scale, because it's the entire world that's using these models. Is it, you know, there will be a model with like two or three people using it, or enterprises using it? Does it get to that point, or do most models have at least a couple of hundred users? I mean, I have no idea what the scale is.

Nenshad:

There's definitely a long tail. In our Model Garden today you will see models that are used by millions of people daily, and you will see models that are used once a day, right. You will see models that are used by every customer that I could possibly name in Google Cloud, and other models that, because they're much more specific, are only used by a handful of entities. So there is definitely a broad distribution between massive usage and then, I'd say, a very long tail. You know, the curve kind of goes down like this and asymptotically starts to approach zero once you get out of those top six to 12 models.
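
[Editor's sketch] The 80-20 shape and long tail described above can be checked on any usage table with a short calculation. The numbers below are invented purely to illustrate the arithmetic; they are not Google Cloud usage figures.

```python
from typing import Dict


def top_k_share(calls_per_model: Dict[str, int], k: int) -> float:
    """Share of all calls that goes to the k most-used models."""
    ranked = sorted(calls_per_model.values(), reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def models_needed(calls_per_model: Dict[str, int], share: float) -> int:
    """Smallest number of top models whose combined calls reach the given share."""
    ranked = sorted(calls_per_model.values(), reverse=True)
    total = sum(ranked)
    running = 0
    for count, calls in enumerate(ranked, start=1):
        running += calls
        if running / total >= share:
            return count
    return len(ranked)


# Invented daily call counts for ten hypothetical models (they sum to 1000).
usage = {"m01": 500, "m02": 200, "m03": 150, "m04": 60, "m05": 40,
         "m06": 20, "m07": 15, "m08": 10, "m09": 3, "m10": 2}

print(models_needed(usage, 0.8))   # how many models cover 80% of calls
print(top_k_share(usage, 3))       # share handled by the top three
```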

Craig:

Yeah, you mentioned a prompt framework. What was that and can you talk about that?

Nenshad:

Sure. So our customers have told us that they struggle with a couple of pretty hard challenges. One is that they don't see consistency when they try the same prompts between models from one provider to another. For example, I may be a customer using something besides Gemini today, and then I see that Gemini is now 50% of the cost of the model I'm using now, which is a significant savings. And so I would love it if I could literally just pull the plug from model one and then just plug into model two.

Nenshad:

But it doesn't quite work like that because the way every model responds to its individual prompts is different.

Nenshad:

So we have just introduced some new technology, our prompt optimizer, that can actually take your prompts and the outputs from Model A and then feed them, with Model B, into the optimizer, and get very similar results to what you were getting with Model A. So use case number one is that this really supports migration between different models, so that customers can have that choice and flexibility. But that also holds true for models in the same family. For example, we came out with Gemini 1.5 001, and then just last week we came out with Gemini 1.5 002. There will be differences in the way the models respond to prompts, and you can use the prompt optimizer to help in that use case too, with the comparison being between 1.5 001 and 1.5 002. So you can see that this programmatic way to sculpt prompts and the system instructions to help me get the results I'm looking for is very valuable for customers in a world where the models and the cost structures are constantly changing.

Craig:

Yeah, in that prompt optimization, if you're moving from one model to another, so you take the prompt that's been working in one model and you go through the optimization into another model, do you then work with the prompt that has been output by the optimization engine and build on that, or do you keep working through the optimization engine because you're not really familiar with how prompting works in the new model?

Nenshad:

Yeah. So the way it works is actually really fascinating.

Nenshad:

What actually changes is not the prompt itself, but the system instructions that are fed to the second model.

Nenshad:

So what we have found, and this is one of the beauties and privileges of working at Google, is we get to work with the DeepMind research team, you know, arm in arm on a daily basis, and our cloud AI team also has a very strong research board. So as they find interesting customer problems and develop technology, we can really rapidly bring it out to market. And so, in this case, prompt optimization was built by our cloud AI research team, although we have many other examples, for example the Gemini models themselves, that come from our collaboration with DeepMind. This technology allows us to generate a system prompt that continues to learn based on the input prompts that we're giving it and the outputs that we're looking for, and so it kind of iterates the system prompt until it gets a set of instructions that allows Model B to emulate the behavior of Model A. It's pretty fascinating, but the prompts themselves stay the same.
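
[Editor's sketch] The idea described above, keeping the user prompts fixed while iterating on the system instruction until Model B's outputs resemble Model A's, can be illustrated with a toy search loop. This is not the Vertex AI prompt optimizer or its algorithm: `toy_model_b` is a hypothetical stand-in for a real model call, the candidate phrases are made up, and the greedy search and string-similarity score are assumptions chosen only to keep the example small and runnable.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def toy_model_b(system_instruction: str, prompt: str) -> str:
    """Hypothetical stand-in for Model B: its behaviour depends on the instruction."""
    text = prompt
    if "short" in system_instruction:
        text = text.split(".")[0]          # keep only the first sentence
    if "formal" in system_instruction:
        text = "Dear customer, " + text
    return text


def optimize_system_instruction(prompts: List[str], target_outputs: List[str],
                                candidate_phrases: List[str],
                                model: Callable[[str, str], str]) -> Tuple[str, float]:
    """Greedily add phrases to the system instruction while the average
    similarity to the target outputs improves. The prompts never change."""
    def score(instruction: str) -> float:
        outputs = [model(instruction, p) for p in prompts]
        return sum(similarity(o, t) for o, t in zip(outputs, target_outputs)) / len(prompts)

    instruction, best = "", score("")
    improved = True
    while improved:
        improved = False
        for phrase in candidate_phrases:
            if phrase in instruction:
                continue
            trial = f"{instruction} {phrase}".strip()
            trial_score = score(trial)
            if trial_score > best:
                instruction, best, improved = trial, trial_score, True
    return instruction, best


prompts = ["Your order shipped. It arrives Friday.",
           "Your refund is approved. Allow five days."]
# Outputs previously collected from "Model A" for the same prompts (made up).
model_a_outputs = ["Dear customer, Your order shipped",
                   "Dear customer, Your refund is approved"]

instruction, best = optimize_system_instruction(
    prompts, model_a_outputs,
    ["Be formal.", "Keep it short.", "Use emojis."], toy_model_b)
print(instruction, best)
```

The point of the sketch is only the division of labor: the prompts and target outputs are inputs, and the system instruction is the thing being searched over.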

Craig:

Yeah, and do you also have, like, prompt libraries? If someone's trying to do something fairly complicated that someone else has already figured out how to do, does that exist?

Nenshad:

It does. So, remember I told you there are three layers of the Vertex AI platform: the Model Garden, the Model Builder and the Agent Builder. In the Model Builder layer we have a service called Vertex AI Studio, and in Vertex AI Studio we have a number of facilities for allowing customers to save their prompts and version their prompts, so that they can reuse them and share them with other people. We also provide a number of examples, probably 50 or 60, out of the box, depending on the type of use case and the type of modality, to show you different ways to get value quickly out of the models. So, between the pre-built library, which is not static, it evolves over time as we add more examples, and your organization, where people are constantly adding new prompts and new examples, we have this really rich set of data that we can use to provide people with examples for how to prompt in their organization.

Craig:

Now I'm going to skip to the agentic layer. How does that work, and what's the extent of activities that you can build an agent for at this point?

Nenshad:

Great question. So I'll start by saying that agents are still a nascent technology, although we do have customers who are actually deploying agents today very successfully for the use cases that they've designed them for, right. As an example, ADT, you know, the home alarm company. You probably know them. You may even have an ADT device in your house. They've used Vertex AI to build a customer agent that allows millions of their customers to select, order and set up their home security. So, you know, a pretty sophisticated flow.

Nenshad:

What agents allow you to do is set a goal, right. The agent has a goal or a purpose. One example I love to use is, help me plan my trip to Southern California. Right. I need a travel agent, which I know is silly, but it helps me to remember it: an agent to help me plan my travel. So I need a goal, right, which is help me plan this trip. I need a set of tools that the agent can call to help me in that journey. So who would I want to call if I were going to Southern California? What are the places that I'd probably call? I have, you know, reasonably young kids, so I'm probably going to call Disney. I'd love to have a tool or extension to their reservation system. I probably want to call SeaWorld. I probably want a tool that works with Google Maps so I can chart out the distances and look at traffic information and the like. I'd have a tool that potentially could even allow me to enter information about my credit card or other ways that I want to pay. So the agent has to have access to multiple different tools in order to do its work, right.

Nenshad:

And then the third part of the agent is it has to actually be able to plan and reason about how it is going to go solve this problem, right. And when you think about it, if you say I want to book a trip, right, which is sort of the highest-level node: okay, where are you going? How many days? Monday, Tuesday, Wednesday, Thursday, Friday. Okay, so now I have the next layer, the dates that I'm going to go. Well, what do I want to do on each one of those dates? On each one of those dates, I'm going to invoke different tools. I'm going to have different constraints, like what's my budget, how far can I travel on a daily basis, and so forth. And so when you combine those three things, a goal-driven system that has a number of tools available to it and can plan and hierarchically decompose the problem to get you to that goal, you have an extremely powerful set of technologies that I think we're just scratching the surface of today.
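
[Editor's sketch] The three ingredients described above (a goal, a set of tools, and a planner that decomposes the goal hierarchically under constraints) can be sketched in a few lines. Everything here is hypothetical: the tool functions are stubs, the costs are invented, and the planner is a fixed rule rather than a language model, so this shows only the shape of the loop, not how Vertex AI Agent Builder works.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    day: str
    tool: str
    result: str


# Stub tools: real ones would call a reservation system or a maps service.
def book_theme_park(day: str) -> str:
    return f"theme park reserved for {day}"


def check_traffic(day: str) -> str:
    return f"route checked for {day}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "book_theme_park": book_theme_park,
    "check_traffic": check_traffic,
}
COSTS: Dict[str, int] = {"book_theme_park": 300, "check_traffic": 0}  # invented prices


def decompose(goal: str, days: List[str]) -> Dict[str, List[str]]:
    """Top-level goal -> one sub-goal per day -> tool names to call that day.
    A real agent would let a model do this planning; here it is a fixed rule."""
    return {day: ["check_traffic", "book_theme_park"] for day in days}


def run_agent(goal: str, days: List[str], budget: int) -> Tuple[List[Step], int]:
    """Walk the plan day by day, calling tools while respecting the budget."""
    plan = decompose(goal, days)
    spent = 0
    steps: List[Step] = []
    for day, tool_names in plan.items():
        for name in tool_names:
            if spent + COSTS[name] > budget:
                continue  # constraint: skip any tool call that would exceed the budget
            spent += COSTS[name]
            steps.append(Step(day, name, TOOLS[name](day)))
    return steps, spent


steps, spent = run_agent("Plan my trip to Southern California",
                         ["Monday", "Tuesday", "Wednesday"], budget=600)
for step in steps:
    print(step.day, step.tool, step.result)
```

With a budget of 600, the third day's booking is skipped while its free traffic check still runs, which is the kind of constraint handling the conversation describes.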

Craig:

Yeah, you know, it's funny, I saw Yuval Harari on Bill Maher's show the other day and he told a story, which I'd heard a different variation of, sort of trying to emphasize how dangerous or powerful these models are. The AI, he said, was set a task and it ran into a roadblock because it needed to solve a CAPTCHA, you know, and the story goes, I've got to track down this story, that the AI went on Upwork or one of these freelance platforms and hired somebody to solve the CAPTCHA for it, and the excuse the AI gave the person on the other end is that, well, I'm blind, so I can't see the CAPTCHA. I've been hearing that story for a couple of years now, since before there were agents, and I just think it's BS, and I wish I had been on Bill Maher's show because I would have called Harari on it. But what are the modalities that an agent can use at this stage on your platform to interact with tools? Can it make phone calls using a robotic voice? Can it solve a CAPTCHA?

Craig:

You know, the world is a pretty complicated place and the digital world is even more complicated. So what's the extent of the capabilities of these agents that are being built? I understand agents like, you know, you say to Alexa, set my thermometer at 68 degrees. To me that's very straightforward. I understand how that works. But anyway, what's your answer?

Nenshad:

So I think there's maybe a broader point that needs to be addressed, and then I'm happy to answer your direct question. The first thing I would say is we have to be very smart about how we design these systems, and we have to put in not only guardrails, right, for what we want our agents to do and not do, but we also need to put touch points with humans in place, so that we make sure that things don't get automated beyond our level of comfort, right. So, going back to that travel example, right, if an agent could do all of this for me and come back to me with an itinerary, that's awesome. But I absolutely want to approve that itinerary. I absolutely want to say, actually, I don't want to go through Santa Monica on Wednesday. I've already seen Santa Monica, but I didn't tell the agent that. That's why it came up with the plan.

Nenshad:

You want to set up very clear touch points where the agent can interact with a human and, depending on what you're trying to automate, you will increase that or dial it down. Right, there are more innocuous tasks, right, like Alexa and your thermometer. Very simple, very deterministic. Please do this. It goes, does this. Once the agent starts to be able to plan and be able to invoke tools, right, like go generate these images for me or go call this location.

Nenshad:

And we have examples of these technologies at Google which are pretty incredible, and the voices don't sound robotic, they sound like other people.

Nenshad:

And so my general point is that I think, technologically, we have the ability to do many, many things today with agents: with speech, with voice, with image, with video generation. You could have an agent that could actually create a video, because the agent generates a prompt that it submits. So the possibilities are limitless. But I think the question that we always have to ask ourselves is, are humans involved enough in the process, at the right points, for what this agent that I'm currently designing is trying to do? So, you know, I can't attest to the veracity of the story that you shared with me. I have no idea whether it's true or not, but I would say that, for people who are in this space actively designing these systems, we would have put many safeguards in place. We would add a combination of not only the language model being able to plan but also rules in a deterministic way, like thou shalt not do these things.
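
[Editor's sketch] The combination described above, deterministic "thou shalt not" rules plus human touch points for sensitive steps, can be expressed as a small gate that every proposed agent action passes through. The action names and the approval callback are hypothetical; this is a minimal illustration, not a description of any Google safeguard.

```python
from typing import Callable

# Deterministic rules: actions the agent may never take, whatever it plans.
BLOCKED = {"hire_freelancer", "solve_captcha"}
# Touch points: actions that always need a human to say yes first.
NEEDS_APPROVAL = {"book_itinerary", "charge_card"}


def review_action(action: str, ask_human: Callable[[str], bool]) -> str:
    """Decide what happens to an action the agent proposes.

    Returns "blocked", "rejected" or "executed". The rules are checked
    before any human is asked, so a blocked action can never be approved.
    """
    if action in BLOCKED:
        return "blocked"
    if action in NEEDS_APPROVAL:
        return "executed" if ask_human(action) else "rejected"
    return "executed"  # innocuous, deterministic tasks run without a prompt


def always_yes(action: str) -> bool:
    return True


def always_no(action: str) -> bool:
    return False


print(review_action("check_traffic", always_no))     # runs without approval
print(review_action("book_itinerary", always_no))    # human declined
print(review_action("hire_freelancer", always_yes))  # rule wins over approval
```

How many actions sit in the approval set is the "dial" mentioned in the conversation: more for high-stakes automation, fewer for simple, deterministic tasks.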

Craig:

No, and I understand that. My question is, today, could an agent do that? Could it run into a problem where it needs to solve a CAPTCHA? Could it, on its own, without a human saying, oh, it needs Upwork, so here, I'll log into Upwork for it, just on its own, go through all those steps, send the messages, hire someone on Upwork? I just don't believe agents are that capable yet.

Nenshad:

I would say that strikes me as beyond what I think I have seen agentic technology do. Because the process you just described of hiring somebody, first of all, knowing that there is a service like, was it Upwork, did you mention? Knowing that that service exists, knowing that the purpose of that service is to get access to human beings, being able to convince somebody, by the model lying, that it is a human being who is visually impaired, this seems pretty fanciful to me.

Craig:

Yeah, yeah, that sort of thing really bothers me.

Nenshad:

I can understand why.

Craig:

And then everyone's like, oh my god. Okay, so back to Vertex. We were going to talk about the hardware that you have in your data centers that people have access to, because I've been talking to, you know, Cerebras and SambaNova and these guys about these new chip architectures that are built for inference and are 10 times or more faster than traditional GPUs. Obviously, OpenAI and Azure are married to GPUs, but increasingly it seems there should be a market for this faster inference.

Nenshad:

There most certainly is. In fact, this is one of the, I think, crowning achievements of Google's infrastructure, which was born out of a very real practical need. If you look at ads, if you look at search, if you look at Gmail, if you look at Google Maps, we use AI extensively, and we have for more than a decade across all those products. And we determined that, given the cost of inference, if we were to rely on the technologies available at the time, we would not be able to scale offering AI to the world. And so a pioneering group of individuals at Google, more than a decade ago, created the Tensor Processing Unit, or TPU, and TPUs are, if I might be so bold, the precursor of many of the innovations that you are seeing come to the market right now, and quite a few of those companies were started by people who worked on TPUs previously.

Craig:

That's right.

Nenshad:

So we pioneered machine-learning-specific computation, down to the silicon, over a decade ago. We are in our sixth generation of that technology that we've announced publicly. It powers that.

Nenshad:

Those TPUs power Gemini. They power all the consumer sites that I mentioned before. They power Imagen. And I will also tell you that some of the world's largest model provider companies insist on coming to Google Cloud because that is the only place to get TPUs. And when you have to do training at the scale of building your own large language model or other foundation model, there are the economic benefits of a hyper-optimized infrastructure, which include not only the TPUs themselves but the way they're networked, the way they're laid out in the data center, etc. We have a significant head start on the rest of the market because we recognized this opportunity a long time ago, and it's great to see new innovation in the space. By the way, there are some really exciting companies out there, but TPUs have been available in Google Cloud for quite some time, and they are an exceedingly popular offering because people who have used them know how absolutely powerful they are.

Craig:

Yeah, and I understand the lock that NVIDIA has had because of CUDA and, you know, in building applications for a certain chipset. But for inference, it's irrelevant. You're just hitting a model and getting the answer back. So, on inference, how much faster are TPUs than GPUs? Do you have a metric?

Nenshad:

Yeah, there isn't a public metric that I could share with you, and I also want to be very clear that NVIDIA is one of our largest and most strategic partners. We have made the decision, just going back to what I told you about Model Garden: we are willing for innovation to flourish in our company, even if there's competition, and so we view it, and our customers view it, as a great thing that we offer the absolute state-of-the-art NVIDIA GPUs in our platform. We partner very closely with NVIDIA. They have specific requirements around data center, networking, configuration, etc., and we work with them to do that. So it's not really a head-to-head comparison that we do. It's much more about fitting the use case to when the customer thinks that a TPU will do a better job for what they're trying to do than a GPU. But the beauty is they have choice.

Craig:

Right, and you do offer TPUs in the cloud too.

Nenshad:

Yes, yes, absolutely.

