Meta’s Joe Spisak on Llama 3.1 405B and the Democratization of Frontier Models
As head of Product Management for Generative AI at Meta, Joe Spisak leads the team behind Llama, which just released the new Llama 3.1 405B model. We spoke with Joe just two days after the model’s release to ask what’s new, what it enables, and how Meta sees the role of open source in the AI ecosystem. Joe explains that Llama 3.1 405B focused primarily on pushing scale (it was trained on more than 15 trillion tokens using more than 16,000 GPUs), and he’s excited about the zero-shot tool use it will enable, as well as its role in distillation and in generating synthetic data to teach smaller models. He tells us why he thinks even frontier models will ultimately commoditize, and why that’s a good thing for the startup ecosystem.

Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital

Mentioned in this episode:
- Llama 3.1 405B paper
- Open Source AI Is the Way Forward: Mark Zuckerberg essay released with Llama 3.1
- Mistral Large 2
- The Bitter Lesson by Rich Sutton

00:00 Introduction
01:28 The Llama 3.1 405B launch
05:02 The open source license
07:01 What's in it for Meta?
10:19 Why not open source?
11:16 Will frontier models commoditize?
12:41 What about startups?
16:29 The Mistral team
19:36 Are all frontier strategies comparable?
22:38 Is model development becoming more like software development?
26:34 Agentic reasoning
29:09 What future levers will unlock reasoning?
31:20 Will coding and math lead to unlocks?
33:09 Small models
34:08 7X more data
37:36 Are we going to hit a wall?
39:49 Lightning round
- Published: Jul 30, 2024
- Uploaded: Jun 11, 2026
- File type: POD
Full transcript
AI-generated transcript with timestamped sections.
[00:00] You know, if I was a founder right now, I would absolutely adopt open source. It forces me, though, to [00:05] look at the engineering complexion of my work, right? And think, like, I'm going to need people doing LLM ops and [00:11] things like data fine-tuning and how to build RAG and things. And APIs, there's plenty of APIs that allow you to do this, but ultimately you want control. Like, your moat is your data. Your moat is your interaction with users. [00:41] Hi, everyone. Welcome to Training Data. [00:45] Today, we're excited to welcome Joe Spisak, Director of PM for Generative AI at Meta, where he leads Llama and third-party ecosystem efforts. [00:54] Joe has spent the last decade in AI, leading product at PyTorch and working on initiatives that span protein folding and AI for math, [01:00] many of which have spun out from Meta into their own startups. [01:04] We're speaking to Joe just two days after the Llama 3.1 405B launch, and we're excited to get his view on questions like: [01:11] Where is the open source ecosystem headed? [01:13] Will models commoditize even at the frontier? [01:16] Is model development becoming more like software development? [01:19] And what's next in agents and reasoning, small models, data, and more? [01:24] Joe, thank you so much for being here today. We're so excited to have you just two days after the Llama 3.1 405B launch. It's an incredible gift to the ecosystem. We'd love to learn a little bit more about what specific capabilities you think the 405B is particularly unique at, especially in comparison to the other state-of-the-art models.
[01:47] Oh, thanks so much for having me. This is so much fun. I haven't done a podcast like this since pre-COVID. So it's fun to be in the same room and just, you know, chatting about this cool stuff. Yeah, I mean, we're beyond excited at Meta. This is something that I think a lot of us have been working on for such a long time, months and months and months. And, you know, we kind of put out that [02:07] nice little appetizer, I'll call it, in April with Llama 3. And I was actually like, are people really going to be that excited about these models? And the response was through the roof. Oh my god. [02:20] Everyone's excited, but they really [02:21] don't know what's really coming. And so, yeah, I kind of [02:25] Yeah. [02:26] kind of had to hold that back for a while and keep it to ourselves, and then build up for this launch. And the 405B is a monster. It's a great model, and [02:36] I think the biggest thing we've learned about the 405B is that it's a massive teacher for other models. And we kind of had that plan all along, because when you have a big model, you can use it for improving small models, or just for distillation. And that's how the 8B and 70B became [02:52] the great models that they are. [02:54] I mean, in terms of capabilities, you know, we listen to the community. We listen, obviously, to our own product teams, right, because we've got to build products for Meta. And long context was one of the biggest things people wanted. And we have, you know, much longer context internally even than what we released. But we saw the use cases start to build up. Multilingual: I mean, we're a global company. So we released more languages, with many, many more to come, because, obviously, Meta has billions of people on its platforms.
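Joe describes the 405B as a "teacher" for the 8B and 70B models. The conversation does not spell out Meta's exact recipe (the team also mentions using synthetic data from the large model in post-training), so the following is only a minimal sketch of the classic logit-matching form of knowledge distillation from Hinton et al. (2015), assuming you have teacher and student logits for the same token position. The function names are illustrative, not part of any Llama tooling.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Multiplied by temperature**2 so gradient magnitudes stay comparable
    across temperatures (the convention from Hinton et al., 2015).
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the large "teacher"
    q = softmax(student_logits, temperature)  # predictions of the small "student"
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, [3.9, 1.1, 0.4]))  # small: student agrees with teacher
print(distillation_loss(teacher, [0.5, 1.0, 4.0]))  # large: student disagrees
```

A higher `temperature` softens both distributions, so the student also learns from the teacher's relative ranking of unlikely tokens; in practice this term is usually mixed with an ordinary cross-entropy loss on ground-truth labels.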
[03:24] And so, to me, those are table-stakes things, but they're really done well on the models. I think we spent a lot of time in post-training on our different languages, improving them, and on safety. They're really, really high quality. So we don't just pre-train on a ton of data and say, look at us, we're multilingual. You know, we actually did a lot of work in our SFT phase, supervised fine-tuning, and a lot of safety work. [03:54] Thank you. [03:54] I think one of the coolest things that I'm excited about... well, there are a couple of things I'm excited about. But one is tool use. Yeah. I think the models... oh, my god. Zero-shot tool use. This is going to be crazy for the community. [04:06] We show a few examples, like calling Wolfram or Brave Search or Google Search, and it works really well. [04:13] But zero-shot tool use is going to be a game changer. The ability to call a code interpreter and actually run code, or, you know, build your own plugin for things like RAG, and have that really be state of the art. I think it's going to be a really big game changer. [04:31] And I think just the fact that we released [04:34] the 405B itself, and we changed our license so you can actually use our data... [04:38] That was a big deal. That was a big discussion. We had many meetings with Mark on that. [04:44] And ultimately, we landed on a place where, you know, this had been a pain point for the community for so long. They're like, with these closed models, I can't use the outputs, or maybe I can use them, but maybe I'm using them slightly unscrupulously or whatever. We're actually encouraging people to do it. I'm sure that was a tough...
[05:02] decision to make. Walk us through the things that you had to consider in actually making that leap to open up licensing in that way. Yeah, licensing. It's impermissible. [05:12] Oh, licensing is a huge topic in itself, obviously. We could probably spend a whole podcast talking about it. I don't want to, but we could. I think, number one, we wanted just to unlock new things. [05:24] I think we wanted the 405B and our Llama 3.1 models to differentiate and give people new capabilities. We looked at what people were really excited about in the community, not only in enterprise and products, but also in the research community, because we [05:40] obviously have a research team, and, you know, we work with academia, and we talk to folks. I mean, Percy Liang at Stanford texts me all the time saying, you know, when are you going to release it? When are you going to release it? Can I use it? Can I use it? I'm like, [05:53] Percy, stay patient. But I think we heard them, and we knew what they wanted. And I think ultimately we wanted Llama everywhere. We wanted adoption, you know, maximal adoption, really the world using it and building on it. And I think Mark even used, in the letter he put out, the phrase, you know, "the new standard" or "standardized." So I think to do that, you kind of have to [06:21] enable stuff like that, where you have to unblock [06:25] all these different use cases and [06:27] really look at what the community wants to do and make sure that you don't have these artificial barriers. And that's what the discussion really was, and
[06:35] And so actually, even beyond that, we started working with partners like NVIDIA and AWS, and they started building distillation recipes and products, [06:43] even synthetic data generation services, [06:46] which is pretty cool. I mean, you can start to use those and actually create specialized models from it. [06:51] And the data... I mean, we know how good the data is because we used it in our smaller models. It's really good, and it improves our models significantly. [07:00] I want to pull on the open source thread a little bit more. Sure. And I've read Zuck's manifesto. It was great. But [07:07] I'm still trying to wrap my head around what's in it for Meta. This is a massive investment. [07:14] With open source, in some ways, you're leaving a lot of money on the table, because you now have a state-of-the-art model [07:19] that you're offering to everybody for free. And so I guess, [07:23] is this an offensive move? Is this a defensive move? What's in it for Meta? Yeah. [07:27] I mean, we've... So... [07:30] Well, first of all, our business model doesn't depend on this [07:33] model to make us money directly. So we're not selling a cloud service. We've never been a cloud company. We've always [07:42] worked, I would say, with a partner ecosystem, all the way back to the five years I was helping to lead PyTorch and the ecosystem, the community we built around that. [07:50] We never [07:52] built a service. We probably could have in some way, but it would have been weird. We saw, basically... [07:58] Going back to PyTorch, we kind of saw it as this lingua franca, this bridge, you know, to this area of high entropy. It's kind of a weird way to say it, but there's all this innovation happening. How do we build a bridge to it and actually be able to harness all that innovation?
[08:13] And the way to do that is to be open and to get the world building on your stuff. And I think that ethos has carried over into Llama. And, you know, if you look at PyTorch, that was a huge way for us to pull in innovation. [08:27] At the time when we really started working on PyTorch in earnest, it was computer vision and CNNs and all that, if you remember that. Old times now. [08:37] But we would see these new architectures come out constantly. People would write code and publish it in PyTorch, and we'd take it internally and evaluate it. People would open source models and put them out on model zoos, [08:49] and we'd evaluate them, and we'd see just [08:51] how quickly the community was improving things. And we actually leveraged that, especially for integrity applications, where we released Hateful Memes and some of these other [09:00] datasets. We just saw the improvements week over week, month over month, and it was built on something that [09:06] we were using internally. So it was very easy for us to just take it inside. [09:11] So I think Llama is [09:13] definitely similar in that regard, where, [09:16] you know, when academia and companies start to red team these models or try to jailbreak them, we want people to do that to our models, so we can improve. And I think that's a big reason. And it's like, be careful what you wish for, right? Of course. But it's the same with Linux, right? Linux is open source; the kernel is open source. And, you know, it's much more secure when things are transparent and bugs can be patched faster. And so that helps us a lot. Yeah.
[09:44] I think it's... [09:46] you know, [09:47] there's also the angle that we [09:51] don't want this to turn into a completely closed environment. I think just like today, if you look at Linux and Windows... [10:01] In my opinion, [10:04] there's room for both, right? There's room for closed, room for open, and people use them depending on what they need and the applications. [10:12] I think there's got to be a world of open models, and I think there's going to be a world of closed models, and I think that's totally fine. [10:19] What was the primary argument against open sourcing? Was there one? [10:23] I mean, there were definitely competitive concerns. We talked through, [10:28] you know, do you want to give away your technology, put it out there? [10:32] And I think we're less concerned about that, [10:35] because we're moving really fast. Yeah. If you look back, I mean, I've been at Meta close to, what, six or seven years now. [10:44] In the last [10:45] year or so, we've done... [10:48] We had our Connect launch. We released Purple Llama last December. [10:52] We released Llama 3, then 3.1. Before that, we released Llama 2 in July. Llama 1 was in February. So just think about the pace. The velocity is incredible. The pace of innovation coming out of our team and our company is just crazy right now. [11:10] So I'm not too worried about it. I don't think we're that worried about it. [11:13] So.
[11:15] I'd love to move into your personal views on the broader ecosystem. I think a lot of the questions people have center around what happens to the value of all these models, especially as [11:27] Meta open sources more of them at the state-of-the-art level. With Llama 3.1, with [11:34] OpenAI launching GPT-4o mini, what is your view: do models commoditize even at the state-of-the-art frontier? [11:41] Well, this is a great question. I mean, if you look at just the last two weeks, GPT-4o mini is a really, really good model. [11:49] You know, input... [11:50] I think input is something like 15 cents per million tokens, and 60 cents out. [11:54] So it's incredibly [11:57] cheap to run. [11:58] But it's also an excellent model. They've done an incredible job distilling and getting to something that's really, really performant yet really, really cheap. So I think, you know, Sam is definitely pushing on that. And then if you look at what we've done in the last week, pushing out... [12:15] I would say it's pretty [12:16] compelling to see these models across the spectrum. I do think [12:22] it's rapidly getting to a place where, you know, the model is going to be kind of a commodity. I mean, I think there's this frontier of data where, you know, we can certainly gather data from the Internet. We can license data. [12:34] But at some point, there's some frontier of limitations, I think, that we're all going to have. [12:40] And this goes back to our conversation this week on the bitter lesson, [12:44] of data and scale and compute: is that enough?
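To make the quoted GPT-4o mini prices concrete, here is a minimal cost calculator. The defaults are simply the figures Joe cites in the conversation (roughly 15 cents per million input tokens and 60 cents per million output tokens, as of July 2024) and should be treated as a snapshot rather than current pricing; the function name and the example workload are hypothetical.

```python
def request_cost_usd(input_tokens, output_tokens,
                     usd_per_m_input=0.15, usd_per_m_output=0.60):
    """Cost of a workload billed at per-million-token prices.

    Defaults are the GPT-4o mini list prices quoted in the episode
    (July 2024 snapshot; check the provider's current pricing).
    """
    input_cost = (input_tokens / 1_000_000) * usd_per_m_input
    output_cost = (output_tokens / 1_000_000) * usd_per_m_output
    return input_cost + output_cost


# Hypothetical workload: 10,000 requests, each ~1,500 prompt and ~300 completion tokens.
requests = 10_000
print(round(request_cost_usd(requests * 1_500, requests * 300), 2))
```

That workload is 15 million input tokens ($2.25) plus 3 million output tokens ($1.80), or about $4.05 in total, which illustrates the "incredibly cheap to run" point.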
[12:47] It's probably not quite enough, but [12:50] compute and data [12:51] become... [12:52] if you have enough of both, you can get a first-order approximation of the state of the art without anything else; that's kind of what we've seen. [13:02] So I do think the models are commoditizing. I think the value is elsewhere. [13:06] I look at Meta, I look at our products, I look at what we're building, and that's honestly where the value is. [13:10] For us, it's Meta AI, it's our agent. [13:13] You know, it's all of the technology that we're going to put into Instagram and WhatsApp and all of our end products, where we're actually going to monetize, where we're actually going to add real value. With the model itself, I think we'll definitely keep innovating: new modalities, new languages, new capabilities. That's what research is, right? It's pushing the frontier on emerging capabilities, and then we can leverage those in products. [13:37] But the models are definitely pushing in that direction. If that's the case, and all these existing companies with massive distribution and wonderful applications already out in the wild can just adopt these state-of-the-art models, what advice would you give to the whole wave of new startups trying to make it out there, either building their own models or using other state-of-the-art models, and then trying to build applications on top? [14:02] Yeah, I mean, there are definitely some model companies that are pre-training foundation models. And it's expensive. I think [14:11] I can't say how much Llama 3 cost, but it was very expensive. And Llama 4 is going to be even more expensive.
[14:19] And so, to me, [14:21] given the state of play, it doesn't make that much sense, if I was a startup, to go and do pre-training. I think the Llama models are absolutely incredible as foundations to build on. [14:33] And so, you know, if I was a founder right now, I would absolutely adopt open source. It forces me, though, to [14:42] look at the engineering complexion of my work, right? And think, I'm going to need people doing LLM ops and, you know, things like data fine-tuning and how to build RAG and things. [14:54] And APIs, there's plenty of APIs that allow you to do this, but ultimately you want [14:58] control. [14:59] Your moat is your data; your moat is your interaction with users. [15:03] And you may also want to deploy these things onto a device at some point [15:08] and have kind of a mixed interaction, where you might want to have [15:13] simpler queries running on your device, with very low-latency interactions with your users. You might want to split [15:19] and have a more cloud-based approach for more complex queries, more complex interactions. [15:25] And I think the open source approach gives you that flexibility. [15:29] It gives you the ability to modify the models directly, on the weights. [15:32] You can run the weights; you can distill them yourself. There are going to be distillation services that allow you to take your weights and distill them down to something smaller. That's pretty awesome. We're just now seeing the beginnings of that. So in my mind, control matters a lot.
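Joe's hybrid deployment idea (simple queries on device for low latency, complex ones in the cloud) can be sketched as a small router. Everything here is hypothetical: the word-count threshold, the keyword list, and the `Route` type are placeholders for whatever signal a real product would use, such as a learned classifier or the small model's own confidence.

```python
from dataclasses import dataclass


@dataclass
class Route:
    target: str  # "on_device" or "cloud"
    reason: str


# Hypothetical heuristics; a real system would tune these or learn a classifier.
MAX_ON_DEVICE_WORDS = 64
COMPLEX_HINTS = ("analyze", "compare", "write code", "step by step")


def route_query(query: str, device_model_available: bool = True) -> Route:
    """Send short, simple queries to a small local model for low latency;
    send long or complex ones to a larger cloud-hosted model."""
    if not device_model_available:
        return Route("cloud", "no local model")
    if len(query.split()) > MAX_ON_DEVICE_WORDS:
        return Route("cloud", "long query")
    lowered = query.lower()
    if any(hint in lowered for hint in COMPLEX_HINTS):
        return Route("cloud", "complex query")
    return Route("on_device", "simple query")


print(route_query("What time is it in Paris?"))
print(route_query("Compare these two contracts step by step"))
```

Because open weights let the same model family run in both places, the router only decides where a query goes; it does not need two unrelated vendors.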
[15:49] And ownership of the weights. There are a lot of API services that will fine-tune your model. So you're bringing your own data; you're fine-tuning everything. [15:58] And they use something called low-rank adaptation, or LoRA. And unfortunately, you don't actually have access to those LoRA weights at the end of it. [16:06] You're forced to use their inference. So you're like, hmm, let's see, I'm kind of held hostage here. I've given my data, [16:13] and I don't have access to the actual IP that was generated from that data. And now I'm forced to use their inference service. That's not a good deal. So I think [16:21] open source brings inherent freedom. Yes. [16:24] I think that approach does. So... [16:27] What do you think of Mistral Large? It was announced, I think, maybe a day after Llama 3.1 405B. What do you think of them? And I guess, more broadly, for everybody at the frontier: [16:39] is everyone pursuing the same recipes, the same techniques, the same compute scale, data, etc.? And so is everyone going to be roughly similar at the frontier? Or do you think you guys are doing something very different? [16:53] So first of all, Mistral. I mean, amazing team. It was one of my old teams in FAIR. They were working on theorem proving and AI for mathematics. So Guillaume and Tim and the team, and Marie-Anne, they're incredible people. [17:06] Joe was just talking about [17:08] fun banter.
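On Joe's point about LoRA fine-tuning services: LoRA (Hu et al., 2021) freezes the base weight matrix and trains two small low-rank factors, so the "IP" produced by fine-tuning is essentially those factors. A minimal plain-Python sketch, assuming the common `alpha / r` scaling convention, shows why holding them matters: with the factors in hand you can fold them into the base weights and serve the result with any inference stack. The helper names are illustrative, not any vendor's API.

```python
def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A.

    W: d x k frozen base weight; B: d x r and A: r x k are the trained LoRA factors.
    Merging bakes the adapter into an ordinary weight matrix, which is only
    possible if you actually have A and B.
    """
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * dv for w, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_param_count(d, k, r):
    """Trainable parameters: full fine-tune (d*k) versus LoRA (r*(d+k))."""
    return d * k, r * (d + k)


W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d x r with d=2, r=1
A = [[3.0, 4.0]]     # r x k with k=2
print(merge_lora(W, A, B, alpha=1, r=1))
print(lora_param_count(4096, 4096, 8))
```

For a 4096 x 4096 projection with rank 8, `lora_param_count` gives `(16777216, 65536)`: the adapter is roughly 0.4% of the full matrix, which is why adapters are cheap to train and store.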
[17:12] So, I mean, this was one of the scrappiest teams that I've ever worked with. I mean, the team [17:18] I don't think ever slept. Probably even less now. Basically, by day they would push the state of the art in AI and theorem proving, and we published some work on that, what, a couple of years ago now, geez. [17:37] But by night, they were basically scrappily grabbing compute to train Llama 1. And so we were building large language models several years ago in FAIR. [17:47] You know, that team was basically [17:50] just really ambitious, and they were working by night. And that's really where Llama 1 came from. [17:57] So the team is great. I think they're doing really good work. I think they're definitely challenged in that they're trying to open source models but also make money. And, you know, models like GPT-4o mini are not helping them. And this is, I think, why they changed their license, for example, [18:16] to a research-only license, which kind of makes sense. Because they were open sourcing models, and immediately their own ecosystem was competing with them in a lot of ways: they'll release a model, they'll host it, use this model. But then they have, you know, Together and Fireworks and Lepton and all these companies that [18:34] sometimes provide a lower cost-per-million-tokens offering. So it's a really tough business right now.
[18:43] In terms of Large 2, I think it's a really good model. I mean, just on paper; I haven't evaluated it. We haven't looked at it internally yet. [18:52] I think, [18:53] if you look at Artificial Analysis, I think they added it, and it was a little, [18:58] it was under, I think, the 70B model [19:01] in terms of quality, but that's a blended... [19:04] They blend a bunch of benchmarks to make that determination. But on paper, it looks really good. We're going to evaluate it. [19:11] I think, [19:12] for me anyway, the more the merrier. The more models out there, the more companies doing this, the better. We're not going to be the only one, and I think it's good that we're not the only one. [19:21] And I think, more generally, in the gen AI space you wake up every single day and you kind of expect something like this, right? You expect new models to be released or something groundbreaking to happen, and that's kind of the fun of being in it. So. Totally. Totally. Do you think everyone at the frontier is comparable, though? Are you all pursuing comparable strategies? Yeah. [19:40] This is actually a good question, because, you know, if you read the Llama 3 paper, which I think ended up at 96 pages, right? Lots of citations, obviously. Lots of sharing. Lots of, you know, contributors and core contributors. So it was a detailed paper. [19:59] And Laurens and Angela on the team spearheaded writing that. And I think that was one of the hardest things. Developing the model was relatively easy compared to writing the paper. It was a lot of work putting that paper together. I think if you look at Llama 3, it's...
[20:15] You know, there was a lot of, I would say, innovation that happened, but we also didn't, [20:19] we also didn't, [20:21] I would say, take on a lot of research risk either. Yeah. [20:25] So I would say the primary thing we really did with Llama, with the 405B especially, was [20:30] really pushing scale. [20:32] I mean, we used grouped-query attention, for example. So, you know, GQA, and that improves inference time and, you know, helps with the quadratic attention computational challenge. We trained on, you know, over 15 trillion tokens. In post-training, we used synthetic data, [20:50] which improved the smaller models quite a bit. [20:53] We trained on over 16,000 GPUs on our training runs, which is something we hadn't done before. [20:59] It's really, really hard to do that because GPUs fail, and, you know... It's off the table. Yeah. I mean, everyone's like, "Oh, I'm just going to train on 100,000 GPUs." Good luck, right? You'd better have a really, really great infra team, a really great MLSys team. [21:14] You'd better be ready to innovate at that level, because this is nontrivial. Everyone says it's easy, or says you can do it. It's nontrivial. So I think [21:25] I almost look at Llama 3 as very similar to the GPT-3 paper. If you ever talk to Tom Brown, he was the lead author, now at Anthropic. And there's a reason why Tom was the first author on that paper: a lot of the innovation was really scale. It was really, how do I take something that's, you know, [21:42] an architecture and push it as hard as we can push it.
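Grouped-query attention, which Joe mentions, lets several query heads share one key/value head. Its main payoff is at inference time: the key/value cache shrinks by the ratio of query heads to KV heads, which cuts memory and bandwidth per generated token (the attention-score computation itself still grows quadratically with sequence length). The sketch below shows the head mapping and the cache-size arithmetic; the configuration numbers in the usage example are hypothetical, not Llama 3.1's.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """In grouped-query attention, each block of consecutive query heads shares one K/V head."""
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Memory for cached keys and values; the leading 2 counts both K and V."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical config: 32 layers, 32 query heads, head_dim 128, 8,192-token context, 16-bit values.
mha = kv_cache_bytes(32, 32, 128, 8192)  # one KV head per query head (standard multi-head)
gqa = kv_cache_bytes(32, 8, 128, 8192)   # 8 shared KV heads
print(mha / 2**30, gqa / 2**30)          # cache size in GiB
```

With these assumed numbers the multi-head cache is 4 GiB and the GQA cache is 1 GiB, a 4x reduction that matches the 32:8 ratio of query heads to KV heads.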
[21:46] And that involves a lot at the MLSys layer and the infra layer, like, how do I scale the algorithm? [21:55] And so I think that was really the mentality we had with Llama 3 and Llama 3.1. [22:00] And, I mean, internally, obviously, we have a great research team, we have FAIR, we have research in our org, and we're looking at [22:06] lots of different architectures, and MoE, and other things. [22:10] And so [22:12] who knows what Llama 4 will be? We have a lot of candidate architectures, and we're looking at them. But it's a trade-off between how much risk you take on for research [22:23] and potentially how much reward, or the ceiling of the potential improvements, versus just taking something that's relatively known and pushing scale and getting that to improve even more. So ultimately, this becomes a trade-off. I think this is such an interesting point. I actually also think it makes Llama and Meta quite unique in the strategy they're taking. The words I liked that you used yesterday were: is model development becoming more like software development? [22:53] Yeah. [22:54] Unlike many of the other labs, which have been pushing more on research, you guys have been focused on executing on strategies that you know work. Do you see that as [23:04] representative of your continued strategy as you extend Llama out to 4, 5, 6, 7, 8? And then also, how do you think the other research labs, and maybe some of the other startups in the ecosystem, will react? Will they switch and veer a little bit more toward the strategy that you've been taking?
[23:20] I mean, it's a really great question. We don't have all the answers, for sure. But somewhere in the middle is kind of where I see things landing right now, where, you know, we will continue to push more [23:33] on execution. We'll continue to push models out, because we want our products to iteratively improve, [23:39] and we want Meta AI improving constantly. So there's definitely a software engineering analog here [23:48] that's happening, where you can imagine something like a Llama train: new features, new capabilities get on that train, and we have a model release. [23:57] You know, it's actually much easier when you start to componentize the capabilities, too. We're doing that with safety right now, and [24:03] you saw in the release, we released Prompt Guard and a new Llama Guard, and you can iterate on those components externally, and it's great. Obviously, the core model is much more difficult. [24:16] We'll also start to push on the research side, because the architecture [24:22] is going to evolve. I mean, you've seen, for example, what AI21 has done with Jamba, and, you know, Mamba. And everyone kind of thinks Mamba is a new architecture that could have promise. [24:33] I think what's interesting, though, is that to truly understand the capabilities of an architecture, [24:39] you have to push the scale. And I think that's what's missing right now in the ecosystem. If you look at academia, there are a lot of absolutely brilliant people there, but they don't have a lot of access to compute. And that's a problem, because they have these great ideas, but they have no way to truly execute them at the level...
[24:56] that's needed to really understand: will this actually scale? [25:00] Because the Jamba paper and model were really interesting, and the benchmarks are great, but they didn't scale it beyond, I think, 10 billion parameters. [25:08] So you're like, okay, what happens when we train this at hundreds of billions? Does it actually... [25:13] Do you still see those improvements or not? And no one, at least outside of these labs, really knows the answer yet. [25:19] So I think that's one challenge. So, to me, we're going to get into this hybrid space where [25:24] we are definitely going to push on architecture. We have a very, very smart and accomplished research team. But we're also going to be executing. And I think once we start to get a recipe, [25:38] we're going to push it to the limits, and we're going to continue to release more models on it. But in parallel to that, we have to push on architecture. Yeah. [25:47] And I think it just makes sense, because for the next breakthrough, at some point you're [25:53] at kind of a theoretical limit, and you need to evolve the architecture. [25:56] All right, so [25:58] I see a little bit of an in-between. [26:01] And obviously, we're really good at execution. I think we're pretty good at execution. But we're also good at research. And we just need to marry those two so it makes sense. Because research and products are very different, right? Mm-hmm. [26:11] One, the product side, should be pretty deterministic, and the other is [26:16] inherently non-deterministic, right? It's like, is this going to work? I don't know. It's a really big bet. If it fails, [26:22] it's research. It should have a non-zero chance of completely blowing up in our face, and then we just need to go in another direction. But that's...
[26:30] That's what research is. [26:32] I'm curious about one branch where I think a lot of model research is happening right now: agentic reasoning. And I think you all have announced really great results in reasoning. I'm curious, maybe [26:43] at a very basic level, how do you define reasoning? And then, are you seeing reasoning fall out of scale during pre-training? Is it post-training? And is there a lot of work left to do on the reasoning side? [26:56] Yeah, reasoning is a bit of a loaded area. I mean, you could argue it's things like multi-step. And I think, unfortunately, the best examples we have are the [27:06] sort of semi-gimmicky [27:09] "Bob is driving the bus and he picks up..." kinds of things, right? And if you trawl r/LocalLLaMA, you'll see a billion of those, right? But those actually force the model to take multiple steps, to think through and respond to you logically. [27:25] I think coding is actually really... you know, when you look at pre-training... so, to answer your question directly, reasoning [27:35] improvements come in both post-training and pre-training. [27:38] What we've learned, and now everyone's like, oh, of course this is the case, but definitely in the last year or so everyone's kind of learned that, you know, [27:47] having a lot of code in your pre-training corpus really improves reasoning. But if you think about it, of course, duh. It's step by step. It's very logical. [27:57] You know, code is just logical by nature, and kind of step by step.
[28:01] So if you incorporate a lot of that in your pre-training, your model will reason better. [28:05] And then we, of course, look at examples [28:09] in post-training, like, you know, SFT, to improve it as well. So, you know, we look at the pre-trained model, and [28:17] it kind of depends on how you balance things as well, because you can balance [28:22] how well your model reasons with how well it, you know, [28:26] responds in different languages. Ultimately, in post-training, everything's a little bit of a trade-off. [28:31] You can super-optimize things for coding if you want to. And we did that with Code Llama. [28:35] It was really great, but of course the model will suffer in other areas. [28:39] And so ultimately it becomes a question of what kind of Pareto frontier of capabilities we want to [28:44] bring out if it's a general model. [28:47] And I think... [28:48] Yeah, I mean, ultimately, it's a trade-off. So anyone can pick a benchmark or some capability and say, I'm going to super-optimize for it and say, by the way, I'm better than GPT-4. [28:58] Well, great. Anyone can do that. But is your model as generally capable as GPT-4 or Llama 3.1 or whatever? [29:05] That, I think, is a different story. [29:07] What do you think are the future levers to unlock reasoning, um... [29:11] for anyone going forward? [29:14] Right. [29:15] I mean, the obvious answer is data. The more data, the more code and supervised data that you can get, I think, is the natural... [29:26] answer. [29:30] I mean, I think we need to find applications as well for how we define it. And that would help us. Once you start finding those
kind of killer applications, then you know where to focus in terms of exactly what you're solving for. And this goes back to evals and, like, what is your eval? Because we're starting to saturate evals. As a community, we tend to [29:54] define a [29:56] benchmark or a metric and just optimize the living hell out of it. [30:00] And it's great, but then you actually look at the model in an actual environment, [30:05] and you're like, oh, well, that model has a better MMLU score. Great. But how does it actually respond? Well, it doesn't respond as well, but it has a better MMLU score. And so I think we need better evals and better benchmarks that allow us to [30:20] find a clear line of sight to actual interactions. And I think, [30:26] you know, the live... what is it called? The Abacus benchmark, LiveBench, I think it's called. I can't remember the name of it. It's pretty good. I was looking at that. And of course LMSYS and Chatbot Arena. These are more natural; even though they're still not perfect, they're [30:43] moving in the right direction, toward more human-like interactions [30:49] versus a static dataset or a static prompt set that is not that helpful. [30:55] So I think once we start to find which reasoning use cases make sense, we're going to start to generate more data. [31:01] And you're going to start to improve the model there. And hopefully that has, again, line of sight to a benchmark or an eval
[31:08] that actually... [31:10] feels like it improves the end product. [31:13] And a lot of this actually depends on the end product, of course. What is my application? Yeah. Out of curiosity, within large research labs, coding and math have always been two primary categories in trying to unlock reasoning. In the startup ecosystem now, we're seeing more folks who really want to come at it from the math angle. Do you have a perspective on whether or not that has led to interesting unlocks? [31:37] I mean... [31:38] I mean, the answer is yeah. If you look at our data, or at least our models, coding and math have been, I would say, the primary levers. [31:48] So, uh... [31:50] I mean, having more is obviously better, because math is also very logical and very stepwise. So you can see the pattern here. [31:59] The more data you have that follows that sort of pattern, the more your model is going to be able to reason. And you can see that in how models actually respond. Like, you ask them to respond and say, step me through your thinking process, right? And they'll actually do that. And some models do better than others. So anything like that, I think; scientific papers, [32:21] too. Um, you know, we had some [32:27] projects out of FAIR that trained on, you know, arXiv papers. And you can see it's not only code and math, like pure mathematics, but also scientific papers. Scientists are very logical in how they write things, how they go stepwise, and how they create images of their charts and stuff. And
[32:46] that also, I think we've seen, just general scientific information helps as well. Interesting. [32:52] Sorry, Galactica was our project, yeah. [32:54] Yeah. [32:55] So Robert and Ross from the Papers with Code team led that. [32:58] Still, in my opinion, one of the coolest projects ever. It got a lot of bad press, but wow. They were ahead of their time, in my opinion. [33:07] I'd love to talk a little bit about small models. Given the scale of capital and compute that many startups have, the 8B and 70B models are an incredible gift to the ecosystem. And it's funny that you called them appetizers at the start, because I think they're super powerful for that set. But they're also really powerful for a number of different applications where you want smaller models. [33:37] They're best in class for their size of model. [33:41] So it's interesting, though, when we released... yeah, we released, [33:44] what, in April, Llama 3, we released an 8B and a 70B, [33:48] the appetizers, as we call them. You know, the 8B was actually better than Llama 2 70B. [33:53] By leaps. [33:55] So I had to look at the chart and I was like, [34:00] is this right? Is that really the case? And we're like, yeah, it really is. [34:07] What's the intuition for how that happens? I mean, it was more data. We had 7x more data. [34:13] Yeah. [34:13] Obviously, we put a bunch more compute at it as well.
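To put the "7x more data" and "more compute" remarks into rough numbers, here is a back-of-the-envelope sketch. It assumes Llama 2's roughly 2 trillion pretraining tokens (a figure from the Llama 2 paper, not from this conversation), the 15 trillion tokens Joe cites for Llama 3, rounded parameter counts, and the common rule of thumb of about 6 FLOPs per parameter per training token; it is an approximation, not Meta's own accounting.

```python
# Rough, back-of-the-envelope figures; parameter counts are rounded.
LLAMA2_PRETRAIN_TOKENS = 2e12   # assumption: ~2T tokens, as reported in the Llama 2 paper
LLAMA3_PRETRAIN_TOKENS = 15e12  # ~15T tokens, as stated in this episode


def approx_train_flops(params: float, tokens: float) -> float:
    """Common rule of thumb: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens


# How much more pretraining data Llama 3 saw than Llama 2.
data_ratio = LLAMA3_PRETRAIN_TOKENS / LLAMA2_PRETRAIN_TOKENS

# Estimated training compute for the two models Joe compares.
llama3_8b_flops = approx_train_flops(8e9, LLAMA3_PRETRAIN_TOKENS)
llama2_70b_flops = approx_train_flops(70e9, LLAMA2_PRETRAIN_TOKENS)

if __name__ == "__main__":
    print(f"data ratio: {data_ratio:.1f}x")
    print(f"Llama 3 8B:  ~{llama3_8b_flops:.1e} FLOPs")
    print(f"Llama 2 70B: ~{llama2_70b_flops:.1e} FLOPs")
```

Under these assumptions the data ratio comes out to about 7.5x, and the 8B model's estimated training compute lands in the same order of magnitude as Llama 2 70B's, which is one way to read the remark that a larger model's benchmarks get pushed down into a smaller size.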
[34:19] You know, going back to compute and data, you know, we're pushing on those. So I think [34:24] we just, you know, we saw... [34:27] It's almost like every generation, and again, the generations are accelerating, [34:32] you start to see the benchmarks for a large model basically get [34:37] pushed down into the smaller size regime. And so, you know, the 70B becomes an 8B. And internally, we have models where, [34:46] even much smaller than 8B actually, we're starting to see really nice benchmarks. So you continue to see the models improve at smaller scale. And that, I think, is because we're pushing the architecture, we're pushing scale, and we haven't quite saturated it yet. I think that's really interesting. [35:06] So, you know, for me, one of the biggest places [35:11] where I think a small architecture [35:13] is useful is obviously on device. Everyone loves to talk about on device, and Apple's, you know, [35:19] talking about that, and Google has Gemini models and Gemini running on Android devices. [35:25] So I think on device makes sense. I think safety is kind of interesting, [35:29] because we have our own internal versions of Llama Guard that we use, orchestrated for applications internally at Meta. [35:39] And, you know, today they're built on an 8B model, which is kind of expensive to run if you think about a safety model as kind of the secondary model. And so I do think, you know, internally we've been experimenting with much smaller models in that regard.
[35:53] And it creates efficiency, it lowers latency, [35:57] because really, those models are just classifiers. [36:01] You know, they're not really autoregressive, chat-like interfaces. They really just classify the input, a prompt: you know, does it violate [36:10] whatever category in the taxonomy? And on the output, when the model generates, does it violate it? That kind of stuff. [36:16] So you can actually push those even further. Yeah. I think there are also really interesting cases, though, for [36:24] on device, where you almost have... [36:27] When you think about privacy and you think about data, you want your data to stay on device. [36:33] You can think about, you know, a RAG-like architecture on device. So you have data, you know, even your chat history that's on, say, WhatsApp or [36:40] other things. You can imagine that model having access to data, [36:44] aggregating it, and then running some type of, you know, almost a mini vector database, right, where [36:50] you're using RAG and doing your fuzzy search or fuzzy matching with your small [36:56] model. [36:57] And [36:58] that becomes its own system in itself. And you can basically do things like local summarization. Like, I get so many text messages; you know, "Hey, summarize my last 15 messages, please," because I've been in meetings and I haven't looked at my phone. And that's super useful. And then I don't have to send data up to the cloud or anywhere else. [37:16] So there are those kinds of use cases, I think, where small models are actually going to be really compelling. And then for super complex queries and things, obviously, you have a big model in the cloud
[37:26] that can always service those. [37:28] But for many things, I think on device, or even at the edge and on-prem, these small models can actually do... [37:34] pretty well. You talked about scaling up compute and data as the two fundamental vectors to improve performance. [37:42] I guess there's been a lot of chatter about whether we are going to hit a wall on data, and maybe synthetic data is the answer, etc. I'm curious about your perspective on that. Is there an impending wall of, you know, cheap, accessible data that we're most likely going to hit? What do you think? How do we scale beyond that? [37:58] I mean, I think we've shown with this release that synthetic data does help a lot. In pre-training, you know, we train on 15 trillion tokens, give or take. And in post-training, we generated a ton of, you know, millions of tokens [38:14] of annotated synthetic data, [38:17] a lot of it generated by the 405B. We obviously paid for annotations as well. [38:23] I do think synthetic data is... [38:27] a potential path forward. I think, like, we know now, and the proof is in the models, right? It's great to talk about it. I do think, you know, data is going to be a challenge at some point for us. And this is why I think, [38:40] you know, companies are licensing a lot of data these days to get access. I mean, OpenAI's licensing data, we're certainly licensing data. [38:48] I think having access to services that generate data [38:53] to improve models is, you know, important. So I think that inherently is an advantage for a lot of companies. I mean, Google has YouTube, right? They can...
[39:01] I'm sure that's of value to them. [39:07] Which kind of implies that, you know, bigger companies have an advantage, which is nothing new, right? We've been talking about this for a long time in terms of a data wall. I don't know. I mean, we're not there yet. [39:19] I would say, let's talk again; let's schedule this for a year from now and see where we are next year. You know, I'll save my calendar for one year exactly from now. And, [39:29] Meta AI. But, you know, let's talk in a year and see where we are. But [39:34] we haven't hit it yet. And we're still scaling, and [39:38] you know, we're still gathering a lot of data and we're generating data. And our models [39:43] continue to improve, so... [39:45] Let's close it out with some rapid-fire questions. Sure. Sounds great. In what year do you think we'll surpass the 50% threshold on SWE-bench? [39:52] Good question. [39:57] Um, if I've learned anything, it'll be faster than whatever answer I give you, [40:03] because I think people will zero in on any benchmark. People are going to go and figure it out. So I don't have an answer. It'll be fast, I'm sure. You know, one of the questions we have been asking people is in what year will an open source model surpass the other models on the frontier? And we have to take out that question now, thanks to you all. I mean, it's true. [40:26] We're almost there. I mean, I think 405B is incredible. It's definitely in that class. Yeah, absolutely. Which is incredible.
[40:33] Will Meta always open source [40:34] Llama? [40:35] Huh? [40:36] I mean, I think Mark's pretty committed. You saw his letter... [40:39] I mean, we've [40:41] open sourced for years and years now, back to PyTorch, to FAIR, [40:45] to the Llama models. [40:47] I mean, this isn't a flash in the pan for the company. The company has been committed to open source for a long time. So I would never say never, but, I mean, the company and Mark are really committed. [40:57] Amazing. Joe, thank you so much for being here today, and also for all the work that you're giving to the entire ecosystem. I think the entire AI community is very grateful for all the work that you've done pushing out Llama, and for the advancements to come. It's a huge team. Check out the paper. Look at all the acknowledgments. I think we spent all of yesterday reading it. We need the Star Wars scrolling text of all the contributors, because it was an incredibly... [41:23] big team. I was thinking the same thing. So hats off to the team. This absolutely took a village to get Llama out there. And I'm so proud and excited to represent the team here. So thank you. Thank you. [41:59] So [42:00] Thank you.