Transcript
Mike Matchett: Hi, Mike Matchett with Small World Big Data. We are here today talking about my favorite end of the market, which is high performance computing, supercomputing, which is now coming down into the enterprise in the form of AI, another one of my favorite topics. But how do you manage data in the enterprise for AI? It's got a couple of corresponding challenges having to do with scale and performance, but we have Hammerspace here today to explain how you can tackle that. So just hold on. Welcome, Molly. Welcome to our show today.
Molly Presley: Great to be here. Thanks for having me back, Mike.
Mike Matchett: I know we've talked a couple of times before about a couple of different solutions, but Hammerspace is here today. You're going to tell us about how Hammerspace helps with AI workloads. And I'm curious, because when we talked before about Hammerspace, it's been in terms of metadata management and governance orchestration and getting a handle on storage services across multiple silos of storage. Right? How did Hammerspace suddenly realize it was in the performance space at this part of the market?
Molly Presley: That's a great question. Great observation. So the problem Hammerspace as a company is solving is unifying a customer's data environment into a single environment. So whether they have compute in multiple locations, storage in multiple locations, whether those are clouds, data centers, whatever, you can unify your data into one unified data set which your applications and users connect to no matter where they're located. Your compute connects to it no matter where it's located, and your storage connects to it wherever it's located. So really, a massive simplification of how data sets are utilized in today's world.
But when you say, okay, that doesn't sound like high performance stuff: we just happen to be, from our fundamental foundations of how we are architected, deep in the roots of high performance architectures. And this is around the concepts of how you drive hops out of the data path between the compute environments and the storage devices, essentially. And we've contributed a lot into the community. I'm sure we'll talk a little bit about this, but how we are really focusing now on this high performance side of the world is, you think about, okay, we're connecting data with compute. Compute is extremely expensive. People are spending hundreds of thousands, hundreds of millions, depending on who you are and your compute resources. And you need to make sure you're feeding those with data optimally. And that's really what our job is at Hammerspace.
Mike Matchett: I mean, GPUs, yes. Everybody wants them. My son wants a bigger and bigger GPU in his box here at home, to run not just games, but he wants to run AI models here too. I mean, everybody's doing it. Every enterprise out there is trying to wrestle with this now, I'm sure, asking, how do we support this, either in-house or in partnerships? And they look and say, hey, we buy these expensive Nvidia GPUs or AMD GPUs or GPU farms, right? They're not just buying one or two, they're buying tens, twenties, hundreds of GPUs to do this. And then they look at their storage and go, yeah, it ain't putting it out. That storage was for file systems. It scales, but it doesn't pump it out. And this storage over here, maybe we were doing something fast with it for other workloads, maybe a video workload or a simulation workload, but it doesn't really scale to hundreds of GPUs. And so how do you bring that together? How does Hammerspace then fit in there and provide that sort of key matrixing between speed and scale?
When you look at enterprises' data real estate.
Molly Presley: Yeah, absolutely. So I think for a lot of enterprises who are listening to this conversation, this will resonate with them: a lot of them have been trying to take their IT infrastructure, which was designed for something very different. Home directories, users. Yeah. And that's where all their data sits, largely. And they want to use that data with this high performance computing or GPU computing or AI computing. And those storage systems just can't feed those compute environments. So they have the data, but they can't get it to the GPUs fast enough. So they're in kind of a conundrum. What do I do? Do I have to go buy all new infrastructure? Well, or...
Mike Matchett: Some HPC beast of some academic storage system. Right.
Molly Presley: Which nobody...
Mike Matchett: Different academics and...
Molly Presley: Yeah, yeah. Or they have to go buy a bunch of people out of academia. So it's a conundrum. And what Hammerspace does is really bring in the best of both worlds. We do offer under the covers all the technology of a parallel file system like what's used in HPC. And we're used in HPC as a parallel file system, but it is presented and deployed as an enterprise NAS. And how we do that is because the smarts of the parallel file system (I won't go too deeply into this for the sake of time), instead of being proprietary software you have to deploy around your environment, are built into standard Linux. Our CTO is the Linux NFS kernel maintainer. So if you want the performance of HPC, as long as you're using standard Linux, you can get that with a standard enterprise NAS with Hammerspace, which makes it easy for IT and the enterprise to deploy because it meets their security standards and uses interfaces and technology they're familiar with. It uses the networks they're used to using, but is designed to feed massive compute farms because it is an HPC system inside.
Mike Matchett: Oh yeah.
So when we look at a diagram of this, and I have to apologize myself, I previously looked at Hammerspace. It looked like we were adding another layer of metadata management or orchestration compliance.
Molly Presley: The marketing girl.
Mike Matchett: On top of something. And I think a lot of enterprises are going like, well, I don't need yet another layer of complexity in staffing stuff. But when we dig into it, what's really going on? You mentioned data hops before. With a normal NFS kind of application, where you've got storage and compute and networking and all these things going on, there's a lot of hops that data has to take. I think you used the word hops. We'll use that, between the application and the storage, or the storage and the GPU in this case. And talk about that, what you're doing with Hammerspace: even though from one perspective it looks like you come in and add a layer across the existing storage, you actually shorten the data paths between the storage and the GPU tremendously. I mean, even halve the number of hops in some way. How does someone get their head around that?
Molly Presley: Yeah. And this is a place where a picture speaks a thousand words, but I'll try to draw it with my hands. So think about the hops in a scale-out NAS system: the path goes from the compute across some controllers, sometimes some internal networks, down through the storage system. What Hammerspace does is we come in and pull a whole bunch of those hops out into an out-of-band metadata update. So metadata is updated out of band. So the data path literally goes directly from the compute environment to the storage, with all the learning and metadata creation out of band. So for speed, you're essentially getting near 100% performance of the networks, the GPUs and the storage hardware that's in place, and any of the overhead that has to occur related to metadata.
What are the files, who has access to them, how many copies, blah, blah, blah, is done out of band. So that's how we are making it a much faster data path while still adding the intelligence we need to know who has access to data, is it protected, those types of things.
Mike Matchett: So you're really supercharging someone's existing storage landscape or their existing data footprint, without requiring it to be churned and turned over and rehosted. Right, exactly.
Molly Presley: Very often we'll see customers who have... you know, think of an IT environment. They probably have six or seven different vendors, with data sitting on those different vendors for various performance, cost, and feature-function attributes. What Hammerspace does with our hyperscale NAS is we come in and assimilate the metadata out of those so we know which data exists, and move that out of band. So we get rid of all the bottlenecks of the performance of the file systems that were in place. So now you have faster access to the data and you don't have to move the data. The data can stay on the NVMe or HDDs or whatever you had, and you just get a storage acceleration because the overhead of the software is taken out of band.
Mike Matchett: So acceleration is another good term we could start thinking of here. When we talk about the speed and scale for enterprises, you talk about hyperscale NAS, or hyper NAS. You also talked about hyperscalers. Is this a product for hyperscalers or a product for enterprises?
Molly Presley: Great question. And naming, it's always tough. Naming is one of the hardest things you can do sometimes. I think building...
Mike Matchett: I'm really giving you a hard time today, Molly. But not you personally.
Molly Presley: No, it's a great question. So the analog I like to draw is, think about S3.
The very first S3 was built by AWS, and it was built on the tenets of a hyperscaler that needs massive efficiency, massive scale and massive simplicity. This is the exact same thing. Yes, hyperscalers have absolutely influenced and are using our hyperscale NAS, but everyone, just like with S3 and object storage, needs the benefits of this architecture. They need fast data paths, efficient use of their hardware, and linear scale, and for one customer, that might be linear scale of two storage nodes to ten. For another it might be 100 to 1,000. But no matter which size your environment is or how many GPUs you have, you'll get the same efficiency benefits. So that's the best way I could share why we called it hyperscale NAS: because it uses those tenets that anyone can benefit from.
Mike Matchett: And I think it also brings a little bit of light to the fact that if you're implementing this, you get not just the speed and scale, where I can feed the AI GPUs, but now I get this idea that I can manage the data wherever it is and provide the right kind of enterprise agility and protection for that data, and sort of break down the former silos that I had between different parts of storage, including into hybrid and cloud areas. Right? So it's a hybridizing enablement technology as well, right?
Molly Presley: Mhm. Yeah. I think that AI is a forcing function for some things that have been coming along in our space anyway: data sources or data silos are super prominent, you have them all over the environment for different reasons, and they're difficult to unify. But was it on the C-suite's agenda to solve that problem? Not really. It was like, you know, let IT do some copies, use some rsync, you know, whatever. But now that AI has become so prevalent, there's a driving force to solve this problem, that multiple data sources can cause bad results, they can cause complexity and they can slow down AI.
And that's time to results of whatever your project is. But it's also inefficient use of the GPUs you spent so much money on. So this idea of being able to unify your data into a single data set and then use it where you want it with the compute that's available. Maybe you're using Azure compute. With the AI model you want to use, maybe that's up in Snowflake. And with the storage economics you want, maybe that's in Glacier. You can do that with a single data set, because Hammerspace can place the data where you need it, knows where all the data is located, and creates just a single data set to work with.
Mike Matchett: All right. Really simplifying the task of managing the storage landscape there. Let's just talk real quickly, close this out a little bit, on the value proposition then. We sort of hinted at this. It sounds like it can make your existing NAS solutions work like they're superheroes here, because the metadata is offloaded and you're passing them there, so you can save some money there. Where else would someone look for the value proposition in deploying Hammerspace in an enterprise?
Molly Presley: Yeah, absolutely. I think, you know, there is this ability to, like you say, supercharge the infrastructure you have today. But it's also around the efficiencies you drive from needing fewer ports, needing less hardware to get the same performance. And then ultimately it's around being able to do what you want with your data for your business. So whether you're a research organization, whether you're building movies or, you know, doing backtesting for financial services, your data is no longer hostage to the silo it sits in. Your business can use it where it's needed. Your business units can use it where it's needed. And doing this all in a compliant way, you know, with audit trails of who's used what for where, not allowing people to delete and copy stuff they shouldn't.
So there's a lot of enterprise compliance, but it's mostly around this: if you want to be able to use your data and not have it hostage in a silo, Hammerspace makes that possible.
Mike Matchett: Right, right. So yeah, we're reducing costs. So bringing in Hammerspace can have an ROI to it, as well as enablement and just an acceleration of storage, so you can actually do AI to start with. Right. So you've got a couple of things to consider there. Well, there's a lot here. And I know previously we've looked at some of the architecture of Hammerspace deeper. You guys could go back and look at some of the older videos we've done. But tell me, if someone wants to look at what you're doing now and get up to speed with Hammerspace, what would you point them at?
Molly Presley: I would definitely Google Hammerspace hyperscale NAS. It will show up easily, and we have some cool landing pages. But once you get on to the landing page, which hopefully Google gets you to quickly, there are some really neat technology white papers. If you're a technologist and saying, hmm, I don't really get how this works or how this would plug into my environment, we have some great deep dive technology architectures on that page. If you're more on the business C-suite side and thinking about, you know, how would I calculate the ROI of this in my environment, there are some white papers up there as well. So there are some great resources on the hyperscale NAS page on hammerspace.com.
Mike Matchett: All right. So for more information, check that out. And you know, every time you come back here, I'm learning something new about how we can actually get our storage unlocked and enable it to do greater things. Thank you, Molly, for coming and explaining some of this today to me.
Molly Presley: These conversations are so much fun, Mike. Thanks for having me.
Mike Matchett: All right. And check out Hammerspace.
You know, it's got solutions for some of the things that you're really trying to do today. I can almost guarantee it. Take care and see you next time.