Transcript
Mike Matchett: Hi, Mike Matchett, Small World Big Data, and another one of my favorite topics: storage, and how storage is evolving to help us run these new generative AI workloads and integrate with some of the biggest and baddest stuff coming along for us to get smarter and smarter with AI. So just hold on, I've got Ryan coming along here in a minute to tell us all about it. Hey, Ryan, welcome back to our show.

Ryan Farris: Hey, thanks for having me. It's great to be here again.

Mike Matchett: Again. So we've talked about Qumulo in the past, about a lot of the great things you were doing. I think last time we talked about ANQ, which is taking Qumulo from what people thought of as more of an on-prem thing into a cloud kind of format. But today we're going to talk a little bit about AI, so let's start in with that. What are your customers looking to do with AI? What are they running up against when they start to consider supporting AI workloads?

Ryan Farris: It's a topic you can't get away from; no matter what article you read or what conference you go to, AI is always top of mind. So we hear a lot about this from our customers. And for those of you who don't know what Qumulo is, we are a data platform that provides file and object storage as software-defined storage for on premises and the cloud, and we allow our customers to scale their data anywhere. So naturally, AI is part of that story.

I want to talk about AI solutions specifically for the cloud, because between on premises and the cloud, our customers are primarily pushing us toward cloud solutions. One of the problem statements our customers raise is this: if they have GPUs they're paying for in the cloud and they're doing ongoing training, it's likely that training data is going to be derived from cloud-native object stores like S3 or maybe Azure Blob. Cloud-native object stores are kind of like rich, nutrient-dense garden soil; they provide all the essential elements for AI models to develop and grow. But the challenge is getting that cloud data from an object store over to the GPUs, and that often requires POSIX-dependent or file-based applications. That's a challenge, because now you have to deal with, say, two or three petabytes sitting in S3 or Azure Blob, and then move that up to a temporary repository so training can actually take place on the GPUs. Part of the challenge is that those GPUs are so dang expensive that you want to keep them lean and super active. Moving that data there efficiently means you end up with two hops: one from the object storage layer to something like temporary locally attached storage, or maybe something like Azure NetApp Files or Azure Files. The general problem is that you're not only paying for two hops and the API transactions from the object layer, you're also limited in how far you can scale out your GPUs because you have this choke point. So it's a little inflexible and a little limiting in that regard.
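To make that two-hop staging problem concrete, here is a minimal Python sketch of the first hop: copying training data out of an object store (Azure Blob in this example) onto local scratch so file-based training code and the GPUs can reach it. The azure-storage-blob SDK calls are standard, but the connection string, container name, prefix, and scratch path are illustrative assumptions; this is the generic pattern being described, not Qumulo's implementation.

```python
# First hop: stage training data out of a cloud object store (Azure Blob here)
# onto fast local scratch so POSIX/file-based training code and the GPUs can
# read it. Connection string, container, prefix, and scratch path are
# illustrative placeholders.
import os

from azure.storage.blob import ContainerClient

SCRATCH_DIR = "/mnt/nvme_scratch/training_set"  # hypothetical local NVMe staging area

container = ContainerClient.from_connection_string(
    conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container_name="training-data",             # hypothetical container name
)

os.makedirs(SCRATCH_DIR, exist_ok=True)
for blob in container.list_blobs(name_starts_with="images/"):
    local_path = os.path.join(SCRATCH_DIR, os.path.basename(blob.name))
    with open(local_path, "wb") as f:
        container.download_blob(blob.name).readinto(f)

# Only after this copy finishes can the file-based training job start reading,
# which is the second hop (and the idle-GPU window) described above.
```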
So last time you had me on, we talked about Azure Native Qumulo; we had just launched our native service on Azure. Azure Native Qumulo generally helps solve that problem. One of our advantages is that we're a solution with performance elasticity baked into the architecture, so we're able to provide the elasticity and performance those hungry GPUs need and keep them 100% busy, and super cost effective in that regard. That kind of brings us to this benchmark we're about to publish. But does that make sense, Mike?

Mike Matchett: Yeah. Before we talk about the benchmarks you're doing, because I think people are going to be really interested in this, do you have some feeling for the scope of this GPU problem for people who are renting them in the cloud? I mean, they rent them by time, not necessarily by how well you utilize them, so obviously they want to drive to 100% utilization, and these are big, expensive resources: H100s, and H200s are now coming out. Do you have a feeling for just how problematic that is?

Ryan Farris: Yeah, well, compared to on prem, if you procure a bunch of equipment, you're kind of stuck with it, and you're obliged to keep all that stuff busy, be it A100s or H100s; it's probably NVIDIA gear. But in the cloud you have the option to pay by the hour, or maybe you have a reserved instance. Either way, you have to have a pretty good feel for how many GPUs you need to run a job, and you kind of have to work up to that level. But you still have this fundamental problem of data gravity: moving that data from object to file so you can feed those GPUs effectively and appropriately.

Mike Matchett: All right. So it becomes a real issue. And this is something you see a lot more people doing these days, right? They don't necessarily want to buy on-premises infrastructure and build that out, and their data might not necessarily be there, ready for them, in the cloud. So they have a couple of hops and steps involved.

Ryan Farris: Yeah, that's right. When enterprise customers talk to us about AI solutions, it's either on prem, in the cloud, or in some kind of hybrid fashion. A lot of data is already kind of stuck in object, or I shouldn't say stuck, but they want to tap into that data and use it effectively. If it's on prem, the conversation is largely around NVIDIA GPUs and NVIDIA SuperPODs. But customers are largely pushing us toward cloud-based solutions because it's much easier to onboard, easier to onboard with things like Copilot and the suite of things that keep coming out month after month. Easier onboarding, faster time to market, and a higher level of agility in that regard.

Mike Matchett: All right, so let's talk about benchmarking. Before we get into your specific tests, what does an AI workload look like when we talk about benchmarking and testing for performance?

Ryan Farris: Great question. There are two predominant benchmarks in the industry. One is called MLPerf, and that's from MLCommons; mlcommons.org, I think, if you want to check that out. The other is SPECstorage, and SPECstorage has a benchmark called AI_IMAGE.
Right. And both of those benchmarks do a decent job of synthesizing common file sizes and I/O patterns from AI workloads, which include things like large ingest operations and that data-discovery phase that requires significant throughput. They model training after, I think, TensorFlow as the framework, and they also do model checkpointing, which includes a bunch of small transactions. That's an important part, because you have to do both things really well: high-throughput, large sequential reads, and smaller transactional 4K writes. You have to do both of those well in order to shine in this benchmark. So what we did...

Mike Matchett: I was going to say, a lot of people don't understand the checkpointing angle to this. They think they just get these large serial reads and we're good. But anytime you're building a cluster of something that's got to run for a long time, one of the inherent needs of that cluster is to be able to checkpoint so that you can recover and restart at any time.

Ryan Farris: Yeah, that's exactly right. Checkpointing: if my AI model job failed, or if the model training stopped or halted for whatever reason, then you have that reference point to go back to.
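As a concrete illustration of the checkpointing pattern just discussed, here is a minimal PyTorch-flavored sketch: periodic checkpoint writes during training, plus a resume path that restores the last saved state after a failure. The paths, interval, and training loop are illustrative assumptions; the SPECstorage AI_IMAGE benchmark synthesizes these I/O patterns rather than running a real framework.

```python
# Illustrative checkpoint/restore loop for a long-running training job:
# large, mostly sequential reads for training batches, plus periodic small
# checkpoint writes so a failed or halted job can resume from the last
# reference point. Model, optimizer, paths, and interval are placeholders.
import os

import torch

CKPT_PATH = "/mnt/anq/checkpoints/model.ckpt"   # hypothetical file-system path

def save_checkpoint(model, optimizer, step):
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        CKPT_PATH,
    )

def load_checkpoint(model, optimizer):
    if not os.path.exists(CKPT_PATH):
        return 0                                 # fresh start, nothing to restore
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"] + 1                      # resume after the last saved step

def train(model, optimizer, data_loader, total_steps, ckpt_every=500):
    step = load_checkpoint(model, optimizer)     # the "reference point to go back to"
    while step < total_steps:
        for batch in data_loader:                # high-throughput sequential reads
            if step >= total_steps:
                break
            loss = model(batch).mean()           # stand-in training step
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if step % ckpt_every == 0:
                save_checkpoint(model, optimizer, step)  # small transactional writes
            step += 1
```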
Mike Matchett: So how did you do on these benchmarks? How does it compare? How should we look at this?

Ryan Farris: The first time we ran it, we were kind of surprised at how well we did. We achieved an overall response time, which is the output SPECstorage reports; they don't talk about it simply as latency in milliseconds, they call it an overall response time and have their own formula for it. Our overall response time was 0.84 milliseconds. What that means is that throughout the length of the run, starting from the first job all the way up to 700 jobs, what you're looking for is linear scalability, and in Azure Native Qumulo's case elastic scalability, and evenness in latency throughout the entire run. Whether the run goes from 0 to 700 jobs, or maybe even 7,000, what I'm looking for as a customer or consumer of this information is evenness throughout the entire benchmark. So that was great. But beyond the sheer performance of the benchmark, which is awesome, I think what is maybe more important is the price point. When we push out this press release, which is coming very soon, we're declaring how much the actual benchmark cost, and what it would cost a customer paying for this burst performance to run such a highly intensive job in the cloud. That price point was $400, and the job itself lasted about five hours. So you're getting about five hours of burst compute capacity in the storage layer, of course, for around 400 bucks. The reason that's significant is that other cloud storage vendors don't offer pricing around performance and price elasticity, so comparatively those solutions end up being exorbitantly expensive: solutions like Azure Files, Azure NetApp Files, Weka, things like that you would find in the cloud. Now, those solutions might perform pretty well; we happen to be faster in this particular benchmark. But if you dig into the pricing, you might actually find 10 or 15x the cost, because you have to provision all of that capacity and you're kind of stuck with it annually. Whereas with Azure Native Qumulo, you just pay for that burst period from when the job starts to when it ends, and when it ends, you're no longer paying for that burst.

Mike Matchett: All right. I just want to unpack that a little and emphasize it. You've gotten the best performance result using Qumulo, so it's not like you're the third- or fourth-tier option down there making a cost trade-off, and you've got a cost that's a tenth of what even the native file services cost, which is kind of amazing. People might say, why would I add another third-party cost on top? And in this case you'd say, yes, but we're cheaper than even the native file system because of how we do this, which is interesting. So tell me what's next on benchmarking. Is this the end of the road, or do you have plans for more comparisons?

Ryan Farris: Yeah, I think once you're in the benchmarking game, you can't really leave it; you have to consistently update. For AI specifically, I think we're going to lean into the aforementioned MLCommons, and we're probably going to start running MLPerf, or maybe we'll try to beat this benchmark. But apart from AI_IMAGE and this AI-specific benchmark, we're also a great HPC play for things like genomics or manufacturing or other highly intensive compute jobs that can be run either in burst capacity or with some level of frequency in the cloud. So you can expect more of those being published as well.

Mike Matchett: All right. So it's not just AI, it's anything where you've got a dense cluster of resources and you need this high-performance storage at a low cost, which is pretty interesting. Now, when you're looking at jobs like that, some of the storage systems you mentioned have a fixed capacity that you invest in, but I understand you can scale a little more readily, in that cloud kind of fashion. Maybe you could describe that a little bit.

Ryan Farris: Sure. The thing that's different, or interesting, about our architecture is that we've disaggregated it so that our compute sits on top of the object layer. What that means is that we have a file-system transactional layer that sits on top of object and can elastically scale out and in. That's where a lot of the performance and cost savings come from: all of your data at rest is persistently maintained and managed in native object stores, and we use Azure Blob for that. Then we use Azure infrastructure, VMs if you will, to run our transactional layer, and every transaction that comes into the file system is served from high-performance NVMe SSDs. So you're always getting that snappy, awesome performance from that layer.
And the customer experience is that if they need to scale up from, say, one gigabyte per second to 100 gigabytes per second, we're handling all of that load on the back end, elastically scaling out the infrastructure needed for that load as demanded by the application layer. Then, once that application starts to complete its job and enters a trough period, the infrastructure scales back down, and we don't charge the customer anything after that burst subsides. So it's very unique in that regard.

Mike Matchett: Yeah, I think that's exceptional, because it's very closely aligned with the cloud utility model, whereas a lot of the other storage solutions we see in the cloud are just capacity-based; you're just paying one way or another regardless.

Ryan Farris: Precisely. It's very akin to an auto scaling group in AWS; it just elastically scales out and back in.

Mike Matchett: Right, so that definitely helps. Now tell me a little bit about some of the ways you're also integrating with native cloud services. I understand you've got some things that help people do better with some of the generative AI programs and services that are native in the cloud.

Ryan Farris: Yep. Gen AI is definitely part of the conversation. It's fun to talk about benchmarking and HPC, but ultimately you might have a customer that just wants to interact with their data in a meaningful way that doesn't actually require HPC-like or tier-one performance. So we're also prioritizing cloud-native integrations within Azure, and AWS for that matter. For Azure specifically, we have a zero-cost Copilot integration. This is kind of akin to a librarian in the cloud, where AI is sifting through your endless repository, some kind of large archive, pulling out and extracting the information needed, and piecing it together to answer questions asked through the Copilot interface. So it's semantic, full-context search, where queries can be made through Copilot and answered right away using natural language. We're doing this with things called Graph connectors. Graph connectors allow customers to create a simple schema with keywords and tags and things they want to use to interact with the data. The end result is that a bunch of customers can now interact with that data through Copilot and ask simple questions like, give me mortgage statements from X customer between the dates of X and Y, and, just like ChatGPT, you get that experience in near real time. So the solution is super flexible and can be customized to read and analyze a bunch of different data types, even at petabyte scale. I think this integration is most relevant for Azure Native Qumulo Cold, our less expensive tier specifically designed for storing and interacting with cold data. And it works where your file data is located, so there's no need to import, migrate, or alter existing files; you can integrate directly with that data repository.
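As a rough illustration of the Graph connector mechanism Ryan describes, here is a minimal Python sketch that registers an external connection and a simple schema of keyword/tag-style properties through the Microsoft Graph API. The connection ID, property names, and token handling are illustrative assumptions; Qumulo's GitHub repository (mentioned later in the conversation) automates this setup for ANQ, so treat this as a generic sketch of the mechanism rather than Qumulo's actual deployment.

```python
# Illustrative registration of a Microsoft Graph connector: create an external
# connection, then register a simple schema of keyword/tag style properties
# that Copilot-style semantic search can query. Connection ID, property names,
# and the bearer token are placeholders (in practice the token comes from an
# Azure AD app registration, e.g. via MSAL).
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
CONNECTION_ID = "anqArchiveDemo"                 # hypothetical connection id
TOKEN = "<bearer token for your Azure AD app>"   # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# 1. Create the external connection (one-time setup).
requests.post(f"{GRAPH}/external/connections", headers=HEADERS, json={
    "id": CONNECTION_ID,
    "name": "ANQ cold archive (demo)",
    "description": "File metadata surfaced to Copilot through a Graph connector",
})

# 2. Register a simple schema: a title, free-form tags, and a couple of
#    queryable attributes such as customer name and creation date.
requests.patch(f"{GRAPH}/external/connections/{CONNECTION_ID}/schema", headers=HEADERS, json={
    "baseType": "microsoft.graph.externalItem",
    "properties": [
        {"name": "title", "type": "string", "isSearchable": True,
         "isRetrievable": True, "labels": ["title"]},
        {"name": "tags", "type": "stringCollection", "isSearchable": True,
         "isRetrievable": True},
        {"name": "customer", "type": "string", "isQueryable": True,
         "isRetrievable": True},
        {"name": "createdDateTime", "type": "dateTime", "isQueryable": True,
         "isRetrievable": True, "labels": ["createdDateTime"]},
    ],
})
```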
One other thing I'll touch on: compared to something like ChatGPT, this is very secure. It's designed to run entirely within your own Azure tenant, and so when you're querying Copilot, the results returned come from the data repository that you keep in Azure, inside your tenant, as a fully managed service. Security-conscious organizations, you can imagine how upset they would be if they were querying public data, but here everything sits inside that walled garden. It's super intuitive, very easy to work with, and very easy to onboard.

Mike Matchett: Yeah, that's great. And you get a lot of other things with Qumulo as well; we're just scratching the surface with the performance side of it, but there's a lot more that comes with this. So, just to summarize: you've got top-notch benchmark results, especially in the cloud use case, which I think is what most people should be doing, and you've got native integration with things like Copilot, and on Amazon as well, to help people who are going that route. It's more for, I wouldn't say mom-and-pop, but the rank-and-file organization building chat and so on, so you've got some advantages there. Anything else coming along that we should be looking at, workloads and so on?

Ryan Farris: I think from a roadmap perspective, you can expect more generative AI, and more interfaces into our product, to start appearing through our UI called Nexus; that's our single pane of glass for interacting with Qumulo storage. That's certainly coming. We're also looking at anomaly detection, ensuring we can detect anomalies that may be coming from bad actors at the lowest level, the storage layer; that's coming too. And then I think we have a couple of surprises toward the end of the year that I won't reveal, but that we're certainly leaning heavily into.

Mike Matchett: Definitely come back, because we're hot on the trail of these AI workloads, and for a lot of people this is the year where they have to do something or provide something. A lot of IT folks are being asked by their businesses to get a generative AI chatbot up and running somehow. So I think this is a high-demand topic. Ryan, if someone wants to find out how you can help them, maybe in one of these clouds you mentioned, or just to learn more about Qumulo and AI, where would you point them?

Ryan Farris: Yeah, Qumulo.com. Go to the solutions page and we have an AI-specific page where we talk about some of the things I mentioned, including Copilot. There's a free GitHub repository: if you're spinning up an Azure Native Qumulo trial, you can use that repository to deploy these Graph connectors and start using Copilot with your ANQ repository. And I'll also note that every customer is eligible for a free trial of Azure Native Qumulo, so they can kick the tires and play with it at absolutely no cost.

Mike Matchett: No cost, I like that. That's great. Stay tuned and check out Azure Native Qumulo for that offer. And come back with news when you have it later this year.

Ryan Farris: Absolutely, yeah. I would love to come back on, Mike. Thanks for having me.

Mike Matchett: All right. Take care, folks.
And if you've got AI workloads, you're doing anything in the cloud, and you need that performance at a lower cost, you want to save some money and get a better result, check out Qumulo. Take care.