
Yoav Shoham, co-founder of AI21 Labs, shares his insights on the evolution of AI, touching on key advancements such as Jamba and Maestro. From the early days of his career to the latest developments in AI systems, Yoav offers a comprehensive look into the future of artificial intelligence.
Eye on AI #244 - Audio.mp3: Audio automatically transcribed by Sonix
Yoav: 0:00
I think language models, even the latest so-called reasoning models, they have their limits of what they can do and really the world is going towards AI systems now, away from just pure language models, no matter what their specific architecture is. So language models aren't going away, but they really are becoming part of a bigger picture.
Craig: 0:19
This episode of Eye on AI is sponsored by the DFINITY Foundation. The DFINITY Foundation is a Swiss not-for-profit that is home to some of the world's leading cryptographers, computer scientists and experts in distributed computing. Their mission is to shift cloud computing toward a fully decentralized state by supporting the Internet Computer, also known as ICP. If you don't understand anything about the Internet Computer, you can see my episode with Dominic Williams.
Craig: 0:58
ICP's vision is that most of the world's software will be replaced by network-resident software. That's an evolution of traditional smart contracts. To achieve this vision, ICP is designed to make smart contracts as powerful as traditional software, all while remaining tamper-proof, unstoppable, transparent and verifiable. Since DFINITY's launch in 2016, they stand as Switzerland's most extensive blockchain research and development initiative and have been awarded more than 500 research grants worldwide. DFINITY remains steadfast in its mission to drive the advancement of the decentralized internet. Network-resident software can now be used to run AI models and RAG infrastructure, preventing them from becoming quote-unquote hot wallets from where data can be stolen, and increasing resilience. Furthermore, the technology has been designed to allow AI models to spin up and modify running web applications and internet services solo by addressing several key challenges. If you're interested in reading more about the Internet Computer, visit internetcomputer.org. I also encourage you to listen to my episode with Dominic Williams, in which he explains the Internet Computer.
Yoav: 2:33
I got interested in AI when I was in college, when I went to study in the States. I went to Yale, and then I was offered a job at Stanford as an assistant professor there, and we're talking back a long time ago, 1987. I'm an old guy. I honestly didn't know that that's what I wanted to do, but I figured Stanford was a good place and I'd have a good time until I figured out what I really wanted to do. And 28 years later I retired as a full professor, and along the way I ran the AI lab there for a while. My work there was largely theoretical, not-for-profit research.
Yoav: 3:21
I worked on logic and philosophy and game theory as it pertains to computer science and AI. In fact, I have an online game theory course together with some colleagues that more than a million people have viewed already. And that's for a topic that's pretty, but not particularly useful. But I do have an applied side, so I've started several companies, AI21 being the last one. That's me, briefly.
Craig: 3:55
And how did you start AI21? And then we can talk about what you guys are doing there.
Yoav: 4:06
AI21 is seven years old now and the idea germinated.
Yoav: 4:14
I'd sold my last company to Google and was thinking that AI had started to put all the eggs in one basket, which was deep learning. LLMs weren't a thing yet, but deep learning was, and machine learning in general, and it sort of worked; I mean, there's a reason why people gravitated to it. But my feeling was that this is basically statistics, and statistics will never give you the robust reasoning of the kind that one needs, and that you saw in AI early on, and that we should put together good old-fashioned AI with modern-day deep learning-based AI. So that's kind of an odd reason to start a company, but truthfully, that's the reason we started the company. There's me and my co-founder, Ori Goshen, who's half my age and twice my brain, and another guy who's somewhat younger than me and certainly very smart, Amnon Shashua, well known for having started and run Mobileye, the company, but he's also quite an accomplished computer science professor, and I knew him from the academic side of things. That's why we started the company.
Craig: 5:40
And you were combining, going back to symbolic AI and expert systems and infusing them with deep learning. Is that what you were doing in the beginning?
Yoav: 5:57
So we did a lot of experimentation. For the first 20 employees, I recorded lectures on knowledge representation, and I forced the poor guys to learn about logic and about temporal reasoning and about frame systems and so on, because they knew modern AI well, but they didn't know that. But it wasn't so much to take expert systems and glue machine learning onto them, but to think more deeply about what's missing from modern AI and how best to imbue it. And for about three years we did nothing but technology building. And we started the company right around the LLM revolution, because that's the year the Transformer paper came out. It was clear to us that the work was in language, not in vision. Vision is, quote-unquote, an easy problem. Object recognition is obviously not easy, but to know that this is a mug, I don't really care what this pixel over on the side is. There's nothing local in language: take a sentence and change a word over here, and the whole meaning changes. You can't get away from semantics. So it was clear that that's where the work was. And then, coincidentally, transformers came out, and we felt we had to be very good at language models, and we, if I can say so myself, became very good at that.
Yoav: 7:38
When GPT-3 came out, we were, I think, the heaviest, most demanding users in terms of the use cases and how we banked on it, and at some point we decided we needed to build our own. We built our first model, called Jurassic-1. It's not our most innovative model; it's a standard left-to-right autoregressive model, GPT-like. It was slightly bigger and had five times the vocabulary. People who tested us found us a little better than GPT-3. But at the time it was really a novel thing to do, maybe even a daring thing to do, for a small company to build and put out. At the time we were playing around with technology and building this language model, but at that point this was already three years into our existence, or thereabouts, two and a half, maybe two years.
Yoav: 8:33
We didn't want to just be a research lab, and we had, and still have, really an amazing collection of talent. Honestly, I've had the luxury of working with very smart people, but with such a density of brains, I'm always the dumbest person in the room. But we didn't want to just be that, and I always mention DeepMind as an example of a group I really admire, but early on they really didn't have a business; there's not a big business in solving Atari games. So the question was, what business should we be in? It was clear to us we wanted to be in the enterprise, B2B world. But there was no business there then, and so we had to build our own business.
Yoav: 9:24
It was an application in what today is a very busy space of writing assistants, but at the time it was quite novel. There was nothing like it. We had an LLM-powered writing assistant called WordTune, and it really was amazing. It still is. It very quickly crossed 10 million users. It was a freemium model generating a lot of revenue, but it was never meant to be our core business, and it's still there. But of course, the business became the enterprise, and it was somewhat audacious. You know, there we are, we're three years old, we have this application called WordTune, we have our own model. So we're, let's say, competing with Grammarly on writing assistance, we're competing with OpenAI on LLMs, and we're like 30 people, you know. And then we raised some money, but less than OpenAI. So it was a little audacious, but that's what we did, and in time we focused more and more on the enterprise. That's roughly our arc.
Craig: 10:39
Yeah, when you say focus on the enterprise, I mean there are a couple of things I'm interested in. I was doing some reading about Jurassic-2, actually, and I read that you guys used Amazon SageMaker to build that. Is that right?
Yoav: 10:55
Our Jurassic models were trained on both AWS and GCP. We tended to use the bare metal, not much of the infrastructure on top of it; we were quite portable. Our models starting about a year ago were trained only on Amazon. Because they broke new ground, they were no longer a pure transformer architecture.
Yoav: 11:28
The thing with transformers, I mean, why did transformers take off? It's because up until then there were RNNs, things that look at the input token by token and didn't have a long context to look back into. And, like I said, vision is kind of local, so that worked, but language isn't. So the attention mechanism in transformers allowed you to relate distant parts of the input. The thing is, you pay a high price for that: it suddenly becomes quadratic complexity. If your input is like 1,000 tokens, 1,000 squared is fine, but now we're pushing a million, and a million squared is not fine. And so our guys created a new architecture, a model called Jamba. It's a Jamba family, in fact; we've just now released Jamba 1.6. It's amazing.
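The quadratic blow-up Yoav describes is easy to make concrete with back-of-the-envelope arithmetic. This sketch counts only the pairwise attention scores versus per-token state updates; real costs also depend on hidden size and hardware:

```python
def attention_cost(n: int) -> int:
    # Self-attention scores every token against every other token,
    # so the score matrix alone has n * n entries.
    return n * n

def state_scan_cost(n: int) -> int:
    # A state-space model makes one fixed-size state update per token.
    return n

print(attention_cost(1_000))       # a 1,000-token context: 1,000,000 scores
print(attention_cost(1_000_000))   # a 1,000,000-token context: 10**12 scores
print(state_scan_cost(1_000_000))  # the linear scan stays at 1,000,000 updates
```

At a thousand tokens the quadratic term is manageable; at a million tokens it is a trillion pairwise scores, which is exactly the pressure behind hybrid architectures like Jamba.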
Yoav: 12:24
And the way the model is architected, it's mostly a so-called state-space model, specifically based on the Mamba model that came out of academia. We were the first ones to really scale it to a large model. The advantage of state-space models is that they're more efficient; you're back to sort of linear-ish complexity. And you're not badly off, because you try to remember the past in the state you carry along; that's why they're called state-space models. But that's not as effective as the attention mechanism of transformers.
Yoav: 13:01
What our guys did was, every so often, every eight layers in this case, put in an attention layer, and there were a lot of ablations about exactly how to do it.
Yoav: 13:11
But the end of it was a model whose performance, in terms of the quality of the answers, was competitive with the best models of similar size, and in terms of latency and throughput and memory requirements there's no comparison: way, way more efficient. These are mixture-of-experts models, so you have a total number of parameters and a smaller number active at any moment. Our Jamba small was 52 billion parameters in total, of which, I think, 12 billion were active, and the large model was 398 billion, of which 94 billion were active. So the small model would fit on a single GPU, which is mind-boggling, and the large model would fit on a single 8-GPU pod, and the throughput and latency are just no comparison. They're almost linear; not quite linear, because you have a little attention, but almost linear. So those are the models that we're putting out now, and, like I said, we just released Jamba 1.6, which is our latest.
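The mixture-of-experts advantage rests on the gap between total and active parameters. A one-line check using only the figures quoted in the conversation (52B total / 12B active and 398B total / 94B active):

```python
def active_fraction(total_billion: float, active_billion: float) -> float:
    """Share of a mixture-of-experts model's weights used for any one token."""
    return active_billion / total_billion

# Figures as quoted in the interview:
small = active_fraction(52, 12)    # Jamba small: 52B total, ~12B active
large = active_fraction(398, 94)   # Jamba large: 398B total, 94B active
print(f"small: {small:.0%} active, large: {large:.0%} active")
```

Both models touch roughly a quarter of their weights per token, which is where much of the throughput and latency advantage over a dense model of the same total size comes from.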
Craig: 14:22
Can you talk a little bit about Mamba? I haven't done an episode on that, and a lot of people, a lot of the listeners, have heard about it but don't really understand it, its development and how you integrated it.
Yoav: 14:43
So, first of all, I really encourage you to speak with the creators of Mamba. They're nice, smart guys, academics; I'm sure they'll be able to go into as much detail as you want. But, like I said, it's a state-space model. So it's autoregressive: it goes left to right, looking at the input token by token, predicting the next token, and doing stochastic gradient descent to update the weights. The difference is, as it goes along, it carries a state and updates the state. Now, the state doesn't capture the entire history, it can't, but it captures enough of it. So you have this state; it's like a memory aid that the model has. That's how it operates, and, like I said, it has certain advantages of speed over more vanilla left-to-right RNN-type mechanisms, but it's not quite competitive with every token explicitly looking at the entire history through the attention mechanism.
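The left-to-right scan with a carried state can be illustrated with a toy in Python. This is only a schematic of the idea, not Mamba's actual selective state-space equations; a fixed exponential moving average stands in for the learned state update:

```python
def scan(tokens, update, state=0.0):
    """Process a sequence left to right, carrying a fixed-size state.

    The state is a lossy summary of everything seen so far: it cannot
    hold the full history, only "enough of it", which is why the cost
    per token stays constant regardless of sequence length.
    """
    outputs = []
    for tok in tokens:
        state = update(state, tok)  # fold the new token into the state
        outputs.append(state)       # each output depends only on the state
    return outputs

# Toy update rule: an exponential moving average as the "memory aid".
ema = scan([1.0, 2.0, 3.0, 4.0], lambda s, x: 0.5 * s + 0.5 * x)
```

Contrast with attention, where every output may consult every previous token directly; here the past is available only through whatever the state has retained.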
Craig: 16:11
I had Sepp Hochreiter, I can never pronounce his name the way he does, the creator of the LSTM, on fairly recently, and he's pursuing that and is building a company around it, around sort of an improved LSTM. Is Mamba nothing like LSTM, or is it a completely new architecture?
Yoav: 16:38
Completely different. It's hard to say, but I think what Mamba did is it opened the door for people to experiment, to be brave enough to either experiment with totally new kinds of models or with tweaks. Mamba itself had several versions. We ourselves didn't take Mamba as is, we did a lot of ablations. People are now taking a second look at bidirectional models, not these left-to-right autoregressive ones, but, you know, with masking. So I think something very healthy is happening: people aren't intimidated by the success of transformers, and it'll be interesting to see what comes up. I have to say, I think language models, even the latest so-called reasoning models, you know, the o1s and R1s and so on, they have their limits of what they can do, and really the world is going towards AI systems now, away from just pure language models, no matter what their specific architecture is. So language models aren't going away, but they really are becoming part of a bigger picture.
Craig: 17:52
So in this hybrid model, you were saying, you have a Mamba layer and then several layers above that of a transformer. You stack them, Mamba, transformer, Mamba, transformer, is that right?
Yoav: 18:12
Yes, except it's not Mamba, transformer, Mamba, transformer. There's a Jamba block, and, by the way, I really encourage people to go and read about it, because we actually gave quite an explicit description of Jamba, the architecture and how it was built. We actually open-weighted the model; we felt it's particularly novel, and we wanted the community to help improve things. But in the Jamba block there are eight layers, one of which is a transformer layer and the rest are Mamba layers, and there are also mixture-of-experts layers there. Yeah, that's essentially what it looks like.
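The block structure described, eight layers of which one is attention, with mixture-of-experts sublayers mixed in, can be sketched as a layer pattern. The names and the choice to place MoE on alternating layers are illustrative assumptions for this sketch, not the actual Jamba implementation:

```python
def jamba_style_block(layers_per_block: int = 8, attention_slot: int = 0):
    """Sketch the layer types in one hypothetical Jamba-style block:
    one attention layer per eight layers, the rest Mamba (state-space)
    layers, with mixture-of-experts (MoE) feed-forward sublayers."""
    block = []
    for i in range(layers_per_block):
        mixer = "attention" if i == attention_slot else "mamba"
        # Placing MoE on alternating layers is an illustrative choice here.
        ffn = "moe" if i % 2 == 1 else "mlp"
        block.append((mixer, ffn))
    return block

block = jamba_style_block()
# One attention layer and seven Mamba layers per block; MoE on half the layers.
```

Stacking such blocks keeps overall compute close to linear in sequence length, with the occasional attention layer restoring the ability to relate distant tokens directly.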
Craig: 18:50
And so how is it performing on all the various benchmarks that everyone is watching these days?
Yoav: 19:01
So let me first pontificate generally on this topic of benchmarking.
Yoav: 19:10
You know, we are somewhat cynical about the benchmarks. They're well-intentioned and they give you some signal about the quality of the model, but it's a very weak signal, for two reasons. One, which is sort of unavoidable, is that the correlation between the tasks that are encoded in the benchmark and what you see in the real world is not strong. And so you may perform great on GSM8K or on MMLU, or score high on, you know, LMSYS or Arena-Hard or something, and that doesn't necessarily mean that when you go and actually try to work on a real problem in the real world, the model that scored higher will do better. That's one reason we take it with a grain of salt.
Yoav: 20:07
The other is, unfortunately, these are really easy to game. Now, by the way, to answer your question, we score well on these, despite my cynicism. They are a measure, they are an aid in optimizing your training. We never cheat, but we take it with a big grain of salt, and I can tell you that our customers also take it with a big grain of salt. Especially the sophisticated people, you know; they'll always make a face when you discuss these benchmarks. But I'm saying that we're okay.
Craig: 20:58
Yeah, I mean, you can train to those benchmarks, can't you?
Yoav: 21:04
First of all, yes, and I do know about some others where people explicitly gave instructions: we need to win this benchmark. And you could do it even without cheating. I mean, the most blatant thing you could do is take the test data and put it in the training data. But even if you don't do that, if you just spend more cycles in that area, mathematics, logic, whatever it is, whether it's natural data or you produce a lot of synthetic data in that area, you'll do better there, which doesn't necessarily mean you'll be better in real applications. So people do that. If you throw enough flops at it and you throw enough data at it, you'll get good performance on the benchmarks.
Craig: 21:56
Back to DeepMind's AlphaGo. People have been using reinforcement learning directly to train the models, not to refine or fine-tune the models. What do you think about that? Did you use that approach with Jamba?
Yoav: 22:18
By the way, this year's Turing Award winners are the two pioneers of reinforcement learning in AI, Andy Barto and Rich Sutton. Certainly, RL is having its moment now, for a variety of reasons. RLHF is a misnomer; it's really not reinforcement learning, it's sort of reward modeling, in this case by humans. But is RL playing a role in the Jamba models that are out there? We didn't use it; we didn't find the need there. Recently we have. But when you're referring to R1 or o1, those quote-unquote chain-of-thought models, RL plays a particularly important role there. As it happens, when you look closely it's not enough. But that's really the primary way in which, at training time, not at inference time, you try to guide the model towards these chains of inference, chains of tokens, the correct way to say it, that are more likely to be productive than others. That's the role of RL in those models.
Craig: 23:39
But you guys, how did you train Jamba?
Yoav: 23:42
We did, you know, I'd call it standard pre-training, with a lot of quality data, both natural and synthetic. We then did a lot of alignment tuning of various kinds, using SFT, supervised fine-tuning, there, and that's primarily how we trained Jamba.
Craig: 24:11
So, from your point of view, with Jamba, this new hybrid architecture or hybrid model, are you going to continue building larger models? What's the goal? You said at the outset that your interest was in working with the enterprise.
Yoav: 24:33
Right, so I'm not in a position to announce what we'll do in the future, but I will say that we believe language models are and will continue to be important. It's very important for a company like ours to be very good at them, so I don't think that's going to go away. But increasingly the emphasis is on AI systems. In particular, we just announced Maestro, which is a planning-based orchestrator that works with multiple LLMs as well as other tools and code and what have you, and orchestrates them in a way that really will serve the enterprise, giving reliability and predictability. Increasingly we think that will be the emphasis in the industry, and certainly for us.
Craig: 25:33
So are you going to continue coming out with larger and larger Jambas, or is size of both compute and training not your goal here?
Yoav: 25:54
Our goal is to produce the technology that serves the enterprise reliably. That's our goal. Language models are part of it, and the quality of a model is a function of multiple things. Size, for one: unlike with a sandwich, in language models bigger is not always better. And it's not only the amount of training data; a lot of it is the quality of the training data, and the ability to specialize the model quickly, efficiently, reliably in certain domains, and, like I said, the efficiency of it, the way to serve it efficiently. That'll continue to be our focus but, like I said, I can't speak specifically to what kind of models we'll be training.
Craig: 26:52
Yeah, and you were saying with Jamba it is open weight and it has a particularly large context window, is that right? And very high token throughput compared to transformers. You were saying you can fit a model on a single GPU. So what are some of the applications that you imagine with this model?
Yoav: 27:31
Well, I don't need to imagine; we have customers using it for a variety of applications. The types of applications aren't that mysterious; you see it across the board. We are working with a bank in a call-center setting to do question answering and troubleshooting. We're working with a big retailer to do product descriptions. These online retailers have millions of products, and writing product descriptions is time-consuming, costly, error-prone, so we're helping with that. We are working with financial institutions to query financial documents. So the applications really vary.
Yoav: 28:30
And what we're finding is that enterprises realize there's a big gap between a flashy demo and something that actually works. What we see in the industry is that we're in what we like to think of as the second phase of the modern AI revolution. Up until maybe two and a half years ago, you had sporadic experimentation; the CIOs or the CEOs had just finished their migration to the cloud, and the attitude was, go ahead and play with AI if you want. And then suddenly somebody turned on the switch, and today there's no CEO in the world that doesn't say I'm an AI-first company or I want to be one. So we're in the era of mass experimentation. We have customers, partners, with hundreds of use cases, but the drop-off between an experiment, a PoC, to a prototype, to actual deployment is precipitous. There's one study by AWS that shows a drop-off of 94%: only 6% of the projects go to production. And we think we're on the cusp of the transition to the third phase, of mass deployment.
Yoav: 29:56
But for that to happen you need to deal with these two issues. One is to have the model be efficient, because the economics of LLMs are not the economics of traditional software; they can be extremely expensive, certainly with these quote-unquote reasoning models. And the second is accuracy, because these models are probabilistic machines, and sometimes they give you wonderful creative answers and then they give you total garbage. If you're a high school student writing an essay, it's okay if you're occasionally wrong; maybe you have time to correct it, and if you don't, that's okay, worst case you'll get a poor grade. But if you're writing a summary of a patient checkup and you make a mistake, or if you're writing a memo to your client or your boss and you make a mistake, you can't recover from that. Tackling that is required to get to the phase of mass deployment, and this is where AI systems will come in, because no language model, no matter how good, how many parameters, how much data, how many guardrails you put there, will ever escape what we call prompt and pray.
Craig: 31:35
So long as you're in the prompt-and-pray world, enterprise won't adopt it at scale. Are your deployments on-prem, where people have their own GPUs? Or is this primarily being accessed through the cloud? And then, what do you build around the Jamba model to protect against hallucinations?
Yoav: 31:50
So, first of all, our models can be accessed really anywhere. You certainly can get them on the cloud; we're on all the hyperscalers, we have our own SaaS. But we're also, in fact, viewed as the best on-prem solution, and we have it installed in some really significant companies. So it's all of the above. In terms of how you get to high accuracy: it's a fool's errand to try to force the LLM to be deterministic. When you call the LLM twice in a row, you'll get different answers; this is the way it works. And so you need to build a system around it. Also, an LLM alone is not enough. You need to call tools. If I'm going to access my proprietary data, you need some RAG-like system to access that.
Yoav: 32:54
God did not put neural nets on Earth to do arithmetic. HP gave us a calculator in 1970. We don't need to reinvent that wheel, and get a crooked one at that. So you have these pieces of code that do stuff reliably; you want to use those. And so what you want is a system, not an LLM, an AI system that knows how to work with these LLMs, and by the way, not just Jamba. So Maestro, our new orchestrator, will work with Jamba and will work with others as well, and is just smart about when to use which for what, how to coordinate them, how to parallelize things, how to plan ahead, and give some performance guarantees about how much it will cost me.
Yoav: 33:47
And right now, if you go to the so-called reasoning models, they'll go and think, generate these sequences of so-called thinking tokens, and come back after half an hour with an answer, maybe a bad answer, maybe a good one, but you've just spent a dollar. You can't have that, not in the enterprise, not at scale. And so that's the world we're looking at now: these orchestration systems that can work in a production environment and really juggle everything in a way that's transparent to the user, and give the user control when they need it. So that's the world we're in, and this is certainly where Maestro is, but I think where the world is going in general.
Yoav: 34:40
So what I was going to say is, what you see happening now in the enterprise, before such an orchestration tool became available, is that people do manual coding of static chains. They would go and write a script, write a program: we'll call the LLM here, check its output, call another LLM, or the same LLM again, call some custom code. Somebody went and wrote that code, and again, that's manual, it's a one-off for one use case. You have a different use case, a different workflow, you need to start from scratch. And so that was the way, until now, that people controlled for the unruly behavior of LLMs. But now we have AI to help control this, through a very deliberate planning process.
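A manually coded static chain of the kind described typically looks something like this. All function names and the control flow here are hypothetical stand-ins, not AI21 code; the point is that every step, guardrail, and fallback is hard-wired for one use case:

```python
def static_chain(question, call_llm, run_tool):
    """One hard-coded workflow. Every step, check, and fallback is
    written by hand, so a new use case means a new script."""
    draft = call_llm(f"Answer: {question}")
    # Manual guardrail: check the LLM's output before trusting it.
    if not draft.strip():
        draft = call_llm(f"Try again, concisely: {question}")
    # Deterministic work goes to plain code, not the model.
    figures = run_tool(question)
    # A second LLM call stitches the pieces together.
    return call_llm(f"Combine: {draft} / {figures}")

# Stub callables stand in for a real model and tool, to show the control flow.
result = static_chain(
    "What is 2+2?",
    call_llm=lambda prompt: f"[llm:{prompt[:7]}]",
    run_tool=lambda q: 4,
)
```

An orchestrator in the sense Yoav describes would replace this fixed script with a planner that decides at run time which model or tool to call, in what order, and with what cost budget.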
Craig: 35:36
Yeah, what do you think of this GPT-4.5, where you let it run long enough and it comes up with an answer that's more accurate, less likely to be a hallucination? Is that a system, as you're talking about, or is that a model?
Yoav: 36:03
First of all I have to say, to caveat it, they haven't really shared the details behind any of their models. But it clearly is a model. It's a model of the chain-of-thought variety, where they've trained the model to predict not just the very next token but a long sequence of tokens that hopefully represent some useful thinking pattern or reasoning behavior, culminating in the right answer. And sometimes it works. The thing is, it's still a prompt-and-pray regime. It can improve the answers. It's costly.
Yoav: 36:50
Like I said, it'll cost you much more than a single call, even a single call to an LLM, which itself can be expensive, and you get no guarantees. And it's something of a misnomer to call these reasoning models, because the thing that you see along the way sometimes corresponds to what you might think of as logical or coherent reasoning, and sometimes it's just stuff. So, you know, I call them large musing models. One of my colleagues, Rao, who is a professor and also one of our consultants, calls them chains of thoughtlessness. It's a little unkind, because they do serve a useful purpose, but they will not give you the reliability and cost control, and controllability in general, that the enterprise needs.
Craig: 37:58
Maestro is out, Jamba is out. What's next for you personally, in your work?
Yoav: 38:09
Me personally, it's complicated. I try not to interfere with smart people who are doing useful work. I will tell you, I have a pet project, which I do in my spare time on the weekends together with my colleague Kevin Leyton-Brown, who is a brilliant professor from Canada and also happens to be working with the company. This is totally academic work on what we call understanding: what does it mean to understand something, and do LLMs really understand? It gets to interesting philosophical questions, but in my spare time that's what I do.
Yoav: 38:52
But in terms of stuff that actually matters to the world and the company, we think this world of AI systems that are based on explicit planning is in its very early stages. There's a ton of work to do there. What we've just released is a limited version of our system. Here you are, you could call an LLM, Jamba, but you could call Opus, you could call Grok, you could call GPT-4o even; or instead you could call Maestro, which would use those respective systems but amplify the performance dramatically. On average it amplified the accuracy by 50%. But that's the early functionality that we're making available. There's much more work to do in an AI system, how to plan things ahead, both at training time and at inference time, so there's a ton of work to be done there.
Craig: 40:10
I wanted to ask, you mentioned understanding, and, as you said, this is kind of a bottomless philosophical question, partly because no one has defined understanding in any kind of scientific way. But Geoff Hinton certainly believes that LLMs understand and think. What's your view? There's a gradient, or a spectrum, of understanding that ends in awareness. Where do you stand, and where do you think the current LLMs are on that spectrum, and not just LLMs but AI models? And do you think it'll continue to climb that ladder, or have we created something that has something we call understanding, and that's as far as it's going to go?
Yoav: 41:27
A lot of stuff is packed into that. Let me first say that I have the utmost regard for Geoff, and for another guy, Andrew Ng, who's my colleague at Stanford; both brilliant people. But I mention Andrew because Andrew and Geoff had an online conversation about whether LLMs understand, and they agreed between them that they understand something to some extent. I think that is an ungrounded conversation, because they haven't defined understanding. What Kevin and I did was start out by asking, what does it mean to understand and how would you evaluate it? And then we can speak about it meaningfully. It's actually a technical paper with lots of equations, but I'll give you the gist of it.
Yoav: 42:33
In order to demonstrate understanding, first of all, you don't just understand. You understand something, a domain. You understand arithmetic. You understand human biology. You understand motorcycle maintenance. Okay, now, what does it mean to understand within the scope of that domain? There are really two or three things that you have to keep in mind. One is you've got to be competent.
Yoav: 43:13
In other words, if I ask does the system or the person understand arithmetic, you're going to ask it questions and get answers. If, by and large, the answers are wrong, I don't care what else is true; that person or that system doesn't understand arithmetic. That's one baseline requirement: general competence. Let's call it a passing grade. You don't need to score 100 to be qualified as understanding arithmetic, but you can't get a 15. That's one thing. The other thing is you can't be ridiculously wrong, and what's ridiculous is a matter of context here. But let's say I want to know if the system understands arithmetic, or multiplication, and I give it some multiplication problems, and maybe it fails on long ones. But then I ask it to multiply two by two and it says five, and I say, are you sure? And it says, oh yes, and gives me an explanation of why two times two equals five. And maybe, as some recent chatbots do, it tries to gaslight me into believing that two times two equals five. That system does not understand mathematics. You've got to have general competence, and you can't be ridiculous.
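The two criteria just described, a passing grade over sampled problems plus no "ridiculous" misses on trivially easy cases, can be written down as a toy check. This is our own illustrative sketch of the idea, not code from the paper; `answer_fn`, the thresholds, and the problem distribution are all assumptions.

```python
import random

def understands_multiplication(answer_fn, trials=100, passing=0.7):
    """Toy test of two of Yoav's criteria for understanding a domain:
    general competence (a passing grade over sampled problems) and
    not being ridiculously wrong on trivially easy cases.
    answer_fn(a, b) is the system under test; all names are illustrative."""
    random.seed(0)  # fixed seed so the sampled problems are reproducible
    correct = 0
    for _ in range(trials):
        a, b = random.randint(2, 99), random.randint(2, 99)
        if answer_fn(a, b) == a * b:
            correct += 1
    competent = correct / trials >= passing   # baseline "passing grade"
    not_ridiculous = answer_fn(2, 2) == 4     # the easy case must be right
    return competent and not_ridiculous

print(understands_multiplication(lambda a, b: a * b))  # a correct multiplier passes
print(understands_multiplication(lambda a, b: 5))      # "two times two is five" fails
```

A system could in principle pass the competence check yet still fail the easy-case check, which is exactly why the passage treats "not ridiculous" as a separate requirement rather than folding it into the overall score.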
Yoav: 44:35
And the third element is really important, which is explanations. You've got to give good explanations. And that's interesting; it took us a while to figure out why explanations are relevant here. You see, in school, in exams, a teacher will say, explain your answer. Why do they do that?
Yoav: 45:01
You ask the student a small number of questions, three or five or ten. The set of questions you could have asked is huge; not infinite, but there are many millions of problems in arithmetic you could have given them. So what you would like to know is that the answers they gave are representative: if you had asked other questions, you would have confidence that you'd have gotten good answers there also.
Yoav: 45:32
One way is to ask enough questions that the statistics are such that it's unlikely they would have lucked into correct answers. But the way to amplify that is to actually give an explanation. So if I say, solve this one multiplication problem and tell me how you solved it, and I give you my multiplication procedure, and you believe that I didn't just memorize the answer but actually applied the procedure (there's a good question of why you should believe that, but let's say you do), then you don't need to ask me any more questions.
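The claim here, that one verified procedure beats many sampled answers, can be made concrete: if the system hands you its procedure rather than just an answer, you can run that procedure yourself on questions you never asked. A minimal sketch, with all names our own:

```python
# Illustrative sketch: an "explanation" returned as a runnable procedure.
# If the procedure is believed, one worked example generalizes to the
# whole domain, which is the point Yoav makes about exam explanations.

def explain_multiplication(a: int, b: int):
    """Return (answer, procedure). Here the explained procedure is
    multiplication as repeated addition."""
    def procedure(x: int, y: int) -> int:
        return sum(x for _ in range(y))  # add x to itself y times
    return procedure(a, b), procedure

answer, procedure = explain_multiplication(7, 6)
print(answer)  # 42

# Verify that the explanation generalizes to a question we never asked:
assert procedure(12, 11) == 132
```

Statistically, the explanation replaces a large sample of spot checks: instead of estimating competence from many answers, you audit one procedure and trust its outputs everywhere it applies.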
Yoav: 46:15
You could have asked me any others and I would have applied the same procedure, or you believe I would have. So those are some of the elements of what it takes to really understand something. And when you look at language models today and take a domain that's not trivial, you can show that they really don't understand the domain. What would it take for them to understand? That's a longer discussion. But you asked another question, folded in: if they did understand, would that get us to awareness? And now I'll take you really far afield, if I may. Stop me if I go too far.
Yoav: 47:09
When I was a professor at Stanford, I had a freshman seminar that I called Can Computers Think? Can They Feel? And these are kids who haven't been corrupted yet, the smart kids. And I would start with six questions; let's see if I can reconstruct them. Can a computer think? Can it understand? Can it be creative? Can it feel? Can it have free will? And can it be conscious?
Craig: 47:39
There.
Yoav: 47:39
I'd go through the six, and everybody had to vote. No hedging, no "it depends", just yes or no. And at the end of the course, after we spoke about AI and machine learning and everything, they'd vote again, and the answers would be different. But I think the most interesting thing that came out of it is that people realized it's not so much whether the machine could, in principle, be aware, as you asked; they start to question what it means for people to be aware. And that's the thing that's fascinating about AI: it forces us to think about ourselves, not just about machines. I think that may be the most fun thing about AI.
Craig: 48:28
That's fascinating. I should have you on for another episode just to talk about that, because I have more questions. But if somebody wants to try Jamba or Maestro, you say it's available everywhere, but the most direct way, I imagine, is to go to AI21. So what's the URL?
Yoav: 48:59
So if you go to our landing page, ai21.com, all the information will be there. I just wanted to maybe clarify something. Jamba 1.6 is widely available. It's open weights, unless you're a big company with revenues above some threshold, in which case you should pay us, because we paid a lot of money to develop it, so we need to recoup that money somehow. Maestro is in closed preview right now, so we're working with a small number of companies, and there's a waitlist; we encourage people to sign up. We'll gradually bring more people in, and at some point it will be an open preview and we'll announce that.
Craig: 49:48
Okay, and it's AI21.com. Is that the URL?
Yoav: 49:57
That's the landing page, and from there you'll see exactly where to go.
Craig: 50:01
This episode of Eye on AI is sponsored by the DFINITY Foundation. The DFINITY Foundation is a Swiss not-for-profit that is home to some of the world's leading cryptographers, computer scientists and experts in distributed computing. Their mission is to shift cloud computing toward a fully decentralized state by supporting the Internet Computer, also known as ICP. If you don't understand anything about the Internet Computer, you can see my episode with Dominic Williams. ICP's vision is that most of the world's software will be replaced by network-resident software. That's an evolution of traditional smart contracts. To achieve this vision, ICP is designed to make smart contracts as powerful as traditional software, all while remaining tamper-proof, unstoppable, transparent and verifiable.
Craig: 51:06
Since DFINITY's launch in 2016, they stand as Switzerland's most extensive blockchain research and development initiative and have been awarded more than 500 research grants worldwide. DFINITY remains steadfast in its mission to drive the advancement of the decentralized internet. Network-resident software can now be used to run AI models and RAG infrastructure, preventing them from becoming quote-unquote hot wallets from which data can be stolen, and increasing resilience. Furthermore, the technology has been designed to allow AI models to spin up and modify running web applications and internet services solo by addressing several key challenges. If you're interested in reading more about the Internet Computer, visit internetcomputer.org. I also encourage you to listen to my episode with Dominic Williams, in which he explains the Internet Computer.