Transcript
Hi, Mike Matchett with Small World Big Data. We're here today talking with James about the latest and greatest things going on with, you guessed it, supporting AI and other really cool HPC-level workloads that are coming to enterprises everywhere. Hold on, we're going to dive into what's new. Hey, James, welcome back to our show. Glad to see you here.

Hi, Mike. Yeah, great to be back.

We've looked at what you've been announcing lately, and it's just a lot of really great stuff. There's so much going on in the AI space, so much going on with NVIDIA. GTC just happened, and you rolled out a bunch of announcements there, so we've got a couple of things to touch on. Let's get into it. But first, let me just ask: what kind of partners are you with NVIDIA? Let's get that out of the way. How close are DDN and NVIDIA?

Yeah, it's been quite a tremendous five years now. NVIDIA selected DDN for their large AI supercomputer five years ago, called Selene. That was, at the time, one of the top ten supercomputers in the world, and since then it's been a very tight relationship. We've been supporting NVIDIA and their very large workloads on these large SuperPODs, and developing features to optimize, make more efficient, and accelerate AI frameworks. That's been going on now for at least five years, and it's culminated recently in a new system NVIDIA has launched, which is also running on DDN.

All right, so we're going to talk about that. Just to be clear, if you're trying to catch up here: NVIDIA makes the GPUs that power AI and everything else in the world today. So how does one of the hottest companies end up with DDN? You're a storage company; to be fair, you make a couple of other things, but HPC-class storage has been the story for many, many years. And you're starting to say: look, we've got storage that can come to enterprises and help them power through the workloads they're starting to get, including AI workloads, which they just can't avoid. So it makes sense that DDN, this HPC-class, world-class, supercomputer-class storage company, is working with NVIDIA, who's making all the GPUs doing all these big AI things, and that they come together in a best-of-breed solution. And it would be fair, I think, to just call it the SuperPOD, right? You've got SuperPODs out there. Okay, that's enough of me gushing. Let's get into some of your announcements. First of all, what did you do with EXAScaler recently, your high-end storage solution? What happened there?

Yeah. So EXAScaler is the parallel file system we build, support, and deliver, primarily in the form of appliances, but also into the cloud. Those appliances are what have been sitting behind NVIDIA's systems and the majority of the SuperPODs that have gone out there, the large-scale AI systems. Things happen in the software world of EXAScaler and in the hardware world of EXAScaler, and we are announcing a new boost in performance.
It's already an extremely fast, performant system, but we've added a T, which stands for Turbo, onto the end of the name. With that come hardware improvements, and the result is a 30% boost in performance from that single appliance.

So you're taking HPC-class parallel file system storage and boosting it 30%, just because that's the way the world's going.

Yep. And all the SuperPODs right now are running these appliances and scaling them out, and they can do exactly the same. Nothing else changes. Everything stays super robust, super safe, exactly the same appliances, with some cool tweaks that give you 30% more for your money, for your watts, for your data center space.

Race for space. Okay. So you're adding a turbo to HPC storage, which, again, is storage we're seeing cloud providers using, and some enterprises using, but it is that high-end storage. Let's talk a little bit about cloud providers. What does the ecosystem look like for people using SuperPODs, for people using DDN?

Well, to date, over the past three or four years, we've seen a huge increase in the volume of all-flash storage systems sold to support them. It's been large organizations, social media and finance organizations, and nations building very large SuperPODs: creating data centers, going to NVIDIA, procuring large quantities of DGX systems and networking, and coming to DDN, and we all come together using reference architectures to build these large-scale systems. But now there's a new tier forming. It's been there for a while, but it's really accelerating right now: organizations like Lambda, like Scaleway, like Vultr. What they're doing is pretty much taking these proven reference architectures, a bit like the large NVIDIA systems we mentioned before, and offering them out to end users, typically quite large end users. Now you don't need to build your own data center and go through all that costly time and effort; you can go straight to one of these companies and essentially get AI at scale as a service. And the real differentiator for using these rather specialist AI clouds, which NVIDIA calls NCPs, NVIDIA Cloud Partners, is that for your GPU-hour (you tend to pay per GPU-hour) you're getting the best: the best networking, the best GPUs, the best storage. Compare that to not getting the best: you're still paying for a GPU-hour, but you might be constrained on network, you might be constrained on storage. So you basically get the best-in-class reference architecture, rubber-stamped by NVIDIA, with DDN storage, so it goes fast, and you get lots of productivity for your GPU spend.

Yeah. As a former capacity planner, I would tell people: when you go to one cloud provider and pay for an hour of compute time, and then pay for an hour of compute time at another cloud provider, they're not the same hour. It's not the same amount of compute you're buying; that's just their measurement of what they're giving you, and you're paying for that measurement. So you do have to look a little deeper when you want to do things at this scale.
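To make that "not the same hour" point concrete, here's a rough back-of-envelope sketch. All prices and utilization figures below are hypothetical, not from the interview or any provider's price list; the point is only that GPU time spent stalled on storage or network still gets billed.

```python
# Hypothetical comparison of "effective" GPU-hour cost. If the GPUs sit
# idle waiting on storage or network, the useful fraction of each billed
# hour shrinks and the effective price rises.

def effective_cost(price_per_gpu_hour: float, useful_fraction: float) -> float:
    """Cost per hour of *useful* GPU work, given the fraction of each
    billed hour the GPU actually spends computing."""
    return price_per_gpu_hour / useful_fraction

# Provider A: balanced reference architecture, GPUs rarely stall on I/O.
balanced = effective_cost(price_per_gpu_hour=2.50, useful_fraction=0.90)

# Provider B: same sticker price, but a storage-constrained data path.
constrained = effective_cost(price_per_gpu_hour=2.50, useful_fraction=0.60)

print(f"balanced:    ${balanced:.2f} per useful GPU-hour")     # ~$2.78
print(f"constrained: ${constrained:.2f} per useful GPU-hour")  # ~$4.17
```

Same sticker price, very different effective price; that gap is what the balanced reference-architecture pitch is about.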
And by the way, I think I have a Lambda account. I'm not sure I can tell people how much I've rented out of there, but it's that easy to get your hands on this service as a cloud utility, so that's great. What else do we need to look at when we're thinking about these big SuperPODs? What else do you bring together? We talked about the nodes and the storage; we talked about the hosting providers putting it together with the DPUs and the SuperPODs. What comes next in that stack?

Yeah. So it's getting very sophisticated inside these systems, sophisticated in terms of how the hardware and the software talk together and merge. Basically, the infrastructure is working as a whole for the benefit of application productivity: it's all about how much AI development productivity we can get out of this GPU-hour, out of this space. And while DDN is the preferred supplier for these NCPs, these large cloud providers, because we integrate into that stack, we're also developing the whole ecosystem all the time, plugging into different layers of that stack. The latest piece of that stack to emerge from the NVIDIA ecosystem is the BlueField adapters. They've been around for a while, these data processing units, DPUs, and they've come on the scene with multiple roles. They sit on the edge of the network, in compute and potentially in storage, and they allow you to offload services away from the applications so the applications can run unhindered. They let you add security, add efficiency, add performance, add network stability. Lots of advantages; it's basically a processor and a network card working together in front of your compute system.

Now, we've been talking about integrating those BlueField systems into our storage environments. And by the way, for your audience, we actually have two storage software products. We've talked about EXAScaler, and we've got a new one called Infinia, which we're bringing to market right now. These two pieces of storage software work together to help you address the challenges of the end-to-end AI pipeline: moving data from the edge, labeling, managing metadata, sharing services, all this kind of stuff, as well as very high-efficiency supercomputer-class systems. We're bringing BlueField into these systems for multiple reasons. One, as always with DDN, is maximum performance and maximum efficiency, and when we say performance and efficiency, we want that delivered to the application. It's not about a number we put on a box and shout about; it's about how much the application is accelerated. We do that by taking work away from the compute, leaving the compute to the application, and doing that work on these BlueField systems. The second thing is making the network very efficient. Again, we're storage people, but our job is to make everything else more efficient, and we do that with the DPUs by leveraging the fact that they can handle network congestion and relieve it automatically; it's part of the magic of these BlueField adapters.
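As a rough illustration of that first point, the offload idea, here's a minimal, hypothetical sketch. A real deployment would run the storage-side work on the BlueField's Arm cores through its own software stack; a local worker process just stands in for that here.

```python
# Toy stand-in for DPU offload: the "application" hands storage-side work
# (here, checksumming an outbound write) to a separate worker process so
# the application itself keeps its CPU cycles. On a real BlueField, this
# work would run on the DPU's Arm cores, not a local process pool.

import zlib
from concurrent.futures import ProcessPoolExecutor

def storage_side_work(buf: bytes) -> int:
    # Integrity checksum we'd rather not burn host application cycles on.
    return zlib.crc32(buf)

if __name__ == "__main__":
    payload = b"x" * (8 * 1024 * 1024)  # an 8 MiB "write"
    with ProcessPoolExecutor(max_workers=1) as offload_engine:
        pending = offload_engine.submit(storage_side_work, payload)
        # ... the application keeps training/computing here instead of
        # doing protocol and integrity work on its own cores ...
        checksum = pending.result()
    print(f"offloaded checksum: {checksum:#010x}")
```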
And then thirdly, we're going a bit deeper into these BlueField adapters, using the fact that they can scale across the compute: they provide additional compute resources, and they don't hinder applications. We're using that along with our Infinia storage system, which is a microservices storage system, which means we can take pieces of that storage system and deploy them out on these BlueField systems, taking advantage of that huge scale and that huge performance, with no hindrance to applications. And we can also change the data path, change the way the data moves from application to storage, and make it faster and more efficient. So there's some pretty cool stuff in there. Some of it is part of being part of the ecosystem, and some of it is really kind of revolutionary, changing the shape of storage as we know it.

Yeah. When I first covered BlueField, back when it was with the company NVIDIA bought, it was kind of thought of as a network offload card: just take some of the stuff off the compute. But it's got a lot of power in and of itself, and when it starts to become a host for pieces of the storage system coming the other way, where you're taking little services of the storage and putting those out at the edge, that's a whole other paradigm this card can support. There are some pretty cool opportunities, I think, that are going to come to light there.

This is really just the start. The more you think about microservices-based platforms, and the fact that you can disaggregate them, deploy them on BlueField, and share out this performance that's now available across the network, the more radical you can get. So you can see some good things emerging over the next couple of years from these new architectures.

Yeah. It's like, where's the computer now? I don't even know how to draw that anymore.

It goes a bit back to Sun Microsystems, doesn't it? The network is the computer.

The network is the computer; the computer is all across the network. So if I'm connecting the dots here, what DDN is focusing on isn't simply putting the T in Turbo on the node, but really trying to accelerate, optimize, and enable all the bits in between that the customer needs to get that application running faster. This shows how you're moving away from saying, hey, let's just build a supercomputer in a back room somewhere, to: how do we really take this out to the world? How does Infinia get the data from edge to center? How do we get the AI goodness in the storage and the data out to the edges? Very interesting stuff. Anything else you'd like to say about what you do with NVIDIA?

You summarized it nicely. Obviously, as storage people, we think really deeply about this stuff, and our job in life is to help our customers get the most productive value from their spend, not just their storage spend, the whole infrastructure. There's an awful lot we can do across three different domains. I won't go into detail, but we can work out exactly how we talk to applications and optimize that; we can make the storage systems themselves faster, something everybody does, but we can do that too; and then we can do full-stack optimization: we can spread our software throughout the stack, so it's not storage at the end of a network, far from it. Our storage software is running right across the compute, across the GPUs, across the network, natively integrated. It's really a systems-level, data-center-level platform now. It's not a computer, a network, and some storage anymore; it's a data-center-level concept.
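For readers who want the microservices disaggregation idea above in concrete terms, here's a minimal sketch. Everything in it (the service names, the Placement enum, the write path) is hypothetical illustration, not DDN's Infinia API; it only shows how a storage stack split into small services can have pieces placed on host CPUs or on DPUs without the client's write path changing.

```python
# Hypothetical microservices storage stack: each piece carries a
# placement, so it can live on a host CPU or be pushed out onto a
# BlueField DPU. The write path below is the same either way.

from dataclasses import dataclass, field
from enum import Enum

class Placement(Enum):
    HOST_CPU = "host-cpu"
    BLUEFIELD_DPU = "bluefield-dpu"

@dataclass
class MetadataService:
    placement: Placement
    index: dict = field(default_factory=dict)

    def record(self, key: str, block_id: int) -> None:
        self.index[key] = block_id

@dataclass
class DataService:
    placement: Placement
    blocks: list = field(default_factory=list)

    def store(self, payload: bytes) -> int:
        self.blocks.append(payload)
        return len(self.blocks) - 1  # block id

def write(key: str, payload: bytes, md: MetadataService, data: DataService) -> None:
    # The client-visible write path is identical whether each service
    # runs on the host or has been disaggregated out onto a DPU.
    md.record(key, data.store(payload))

md = MetadataService(placement=Placement.BLUEFIELD_DPU)  # offloaded piece
data = DataService(placement=Placement.HOST_CPU)
write("checkpoint-0001", b"...model state...", md, data)
print(md.index)  # {'checkpoint-0001': 0}
```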
Right. It's like we're not buying components so much as we're buying data-center-scale pieces and composing them as we go. It's kind of cool. And I understand that at GTC, NVIDIA talked about their newest supercomputer, and that's sort of an example of all these pieces we've just been talking about.

Yeah, exactly. The final announcement we've made at GTC is around NVIDIA's EOS, which people might be familiar with. It's been around for a while, but we're finally talking about how we move the data within this top-ten AI supercomputer, which NVIDIA has built to run the latest generation of AI codes at massive scale. They've chosen DDN as the storage system behind it: we've got 48 of our AI400X2 appliances built into that system, and we're talking about how we built it, what comprises it, and how the reference architectures are reflected in it. Essentially, these systems have two purposes. One is to continually improve and evolve the efficiency of the systems from NVIDIA's standpoint. The second is to provide a reference architecture, so that end customers don't have to add risk, don't have to delay productivity, don't have to make difficult choices. They can choose a reference architecture, know that NVIDIA has proven it at massive scale with real-world applications, and just replicate that in their data centers, and they get guaranteed performance levels, so they know exactly how the application is going to behave, they know the data is going to be safe, and they're going to be able to scale that system in the future.

Yeah, I don't remember the exact specifics, but there's something like 500 DGX nodes in there and around 5,000 H100 GPUs. And as you mentioned, you have 48, almost 50, appliances providing the EXAScaler storage for it. So this is some pretty big stuff going on there. And I guess if you're an NVIDIA customer, you can go look at it, right, and touch it?

Yeah. I mean, hats off to NVIDIA for building these systems at massive scale and proving them, the network, the data center, the storage, everything, all the software, running real applications, before customers get exposed to them. It's very impressive what they've done. And this is not the first time: EOS is the latest generation, with the H100 GPUs, and I'm sure there'll be more to come. But it's been great working with them and continuing this evolution of the integrated stack.

Right. So there's a lot of trust there, because they've got this thing and it's running and it's big, and most customers I know aren't ever going to get that big. They've already proven and validated the solution together. Awesome.
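For a sense of the scale here, a quick back-of-envelope on the storage layer. The appliance count (48) is from the interview; the per-appliance read rate is an assumption based on DDN's published AI400X2 ballpark of roughly 90 GB/s, so treat the result as rough arithmetic, not a benchmarked figure.

```python
# Rough aggregate read-bandwidth arithmetic for an EOS-sized storage layer.
# Appliance count from the interview; per-appliance rate is assumed
# (~90 GB/s is DDN's published ballpark for the AI400X2).

appliances = 48
read_gb_per_s_each = 90  # GB/s per appliance, assumed

aggregate_read = appliances * read_gb_per_s_each
print(f"~{aggregate_read:,} GB/s aggregate read")  # ~4,320 GB/s
```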
All right, so that's a lot of stuff. Just to recap a little: you've got the new Turbo appliances, so everything's 30% faster on the storage end. You're starting to see adoption by NVIDIA's cloud partners, especially the specialists offering AI as a service, one of which I apparently have an account with. That's interesting to me: I could be using SuperPODs without even being totally aware of it. BlueField DPUs have been around but are definitely coming into their own, with some great use cases for disseminating storage services to the point where they're most useful. And the example of all this can be seen in NVIDIA's own newest supercomputer, EOS. So, anything else we missed, James?

Well, it sounds like a lot of different little things, but actually there's a big story behind it, and if you want to hear that big story, our strategy is pretty radical. If you want to talk to DDN and find out more about what we're really doing and what this is all about, go to ddn.com. You'll see a button there saying "Talk to a specialist." It might even be me. We'll give you a call and let you know what's really happening now and over the next two or three years in terms of our AI strategy. Because you don't know what's going to happen if you're building AI systems; they might scale dramatically and rapidly over time. So I think it's best to start with a proven partner, and we'd like to give you a bit more information about how things are evolving so you can be prepared for that.

Oh, great. You heard it: if you've got something going on in this world, or you just want more information because you're curious, give that button a push and see how they can help you. Thanks, James, for being here today.

Thanks a lot.

And when are you going to ship me that SuperPOD I keep asking about, anyway?

In the form of Lego.

All right, take care, guys. Check it out.