Transcript
Mike Matchett: Hi, I'm Mike Matchett with Small World Big Data, and we are here today talking about, I guess, big data. We are talking about how you might scale your storage and get to petabytes and petabytes of data everywhere in an increasingly hybrid world, and still make it all work. It gets complex when you start building hybrid storage architectures, but we've got Qumulo here today with some solutions, so just hold on. All right. Hey, we've got Ryan here today. Welcome, Ryan.

Ryan Farris: Thanks so much. Good to be here, Mike.

Mike Matchett: So tell us a little bit about what Qumulo is known for before we dive into what's new here, and maybe even a little bit about how you got hooked up with Qumulo.

Ryan Farris: Sure. Yeah. Qumulo has been around since 2012. A lot of the folks came from the world of Isilon, or Dell PowerScale at this point, and started building a massively scalable but improved storage solution. So fast forward to today, we're a much more hybrid friendly, cloud centric company. We still do a lot of on prem business, of course, but Qumulo is software defined storage that sits on top of commodity hardware like HP or Dell, and at some point in the future, Supermicro, things like that. And then we also give customers the choice to run Qumulo in cloud native fashion as well. So we allow customers to spin up an instance and use it just like they would on prem. And the thing that we're releasing on November 9th is a global namespace to tie those worlds together, to give customers a better choice in hybrid cloud infrastructure.

Mike Matchett: All right, so before we dive into that, what do you see when you look at your customers? What are they coming to you for? What problems are they trying to solve when they come to Qumulo?

Ryan Farris: Yeah. Well, one trend that we see quite a bit is geo dispersion.
And just the challenges that come with management of geographically dispersed data. And Mike, you could think about ten petabytes of file data. Perhaps it's all sitting in one single, monolithic location in a data center, and there's a desire to stretch that data out and pull it to the edge, or pull it to wherever it's needed, whether that be a subsidiary that's just been spun up in a different region, or maybe a new group of users. I think enterprises share that general problem around data management at scale, and they want more choice around storing it cost effectively in the cloud, or in smaller footprints at the edge, or in other core data centers, and tying that all together into one big hybrid infrastructure. It's very challenging to do that at scale.

Mike Matchett: Yeah. So when we're talking about hybridizing, we're really saying my data could be here or there or both places, and I've got workloads that need the data in different places. And now we're not just talking about small data sets, right? We're talking about large sets of data, particularly for workloads like AI and the rest of it. There are just these demands on how to put that together. Do you see a lot of folks moving into hybrid architectures these days, or is there a trend one way or another?

Ryan Farris: We do. We see a lot of movement, and a lot of desire to do more movement, from on prem into the cloud, and of course to have some healthy and balanced proportionality between cloud infrastructure and on prem. A couple of challenges in doing that effectively might be cost, either the cost of moving the data there or the cost of re-architecting a workload, if that's what the customer wants to do. Or they might try to model an existing cloud file storage service and find out of the gate that, oh my gosh, that's like 15 times the cost of what I'm paying on prem, and stop right there.
So with Azure Native Qumulo, we have this massive price drop, and with global namespace we allow customers to bidirectionally move data in and out of the cloud. It's kind of a cross-cutting set of capabilities that touches every core industry, but just to touch on two industries that this is useful for. The first is PACS workloads, and PACS are a vendor neutral archive for radiologists that want to store data in hot fashion on primary storage on prem, while a massive archive might be sitting in the cloud at a cost effective price point. What global namespace allows the radiologist and the data owner to do on prem is to present that remote data as if it were local. So if a radiologist needs to do a study on patient data that's a few years old, they can just pull it down automatically to where the data is needed, with that being a completely managed experience. And then the other one that I'll just touch on: there's a much stronger desire, I think, for M&E, or media and entertainment, customers to use more cloud storage, either as burst capacity or as a permanent fixture. In burst, what that looks like is that if there's some part of the world where a group of artists is coming online at 8 a.m. and expecting high performance compute in their region, then you can seamlessly move the data to where it's most needed and where that performance is required. And then, in follow-the-sun fashion, when the New York set of artists spins up online, you replicate that data and allow both sets of artists to work on the same data set. So it's highly applicable across regions and across verticals, but those are two that I would touch on.

Mike Matchett: Yeah, I mean, again, with AI, right, it's touching almost everything. It's almost not even a vertical anymore. AI is a horizontal. Everybody needs to do that. But you know, there are a lot of reasons to be in the cloud, right?
For utility and economy and instant scale. But there are some challenges with being in the cloud. One, you don't own the things. There are also increasing cloud costs. And people want to own the thing. So there's always this back and forth a little bit. In fact, let's just talk about cost for a second. How do you help people rationalize costs between on prem and cloud?

Ryan Farris: Yeah. Just one useful tool out of the gate that I would advertise is that we have a calculator that gives a fair and rich representation of what our cloud costs are comprised of compared to on prem. So I would urge customers to go to qumulo.com and try that out. But the tagline, and kind of the marquee item here that we advertise, is that we're about 75 to 80% less expensive than the nearest competitor for cloud file storage. So the price point is $30 per terabyte per month, in true pay-as-you-go fashion, just for data at rest. That's what you pay for. And a big part of this story around lowering your total cost of ownership in the cloud is paying for the burst capacity and the throughput that you need at the time when you need it. What that means is that if a customer is seeing a bunch of business demand where they need more throughput in their file storage, they're bursting perhaps from 1 GB a second to 50 GB a second and then back down, and they're only paying for the period in which that burst has occurred. That kind of file elasticity is not offered in the cloud anywhere else. And that's a big part of the cost savings, paying only for the burst that is incurred during that period.

Mike Matchett: I mean, you've also taken cost out of the equation.
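To make the pricing Ryan quotes concrete, here is a minimal arithmetic sketch in Python. Only the $30 per terabyte per month rate and the "75 to 80% less expensive" claim come from the interview. The function names, the decimal conversion of 1 PB to 1,000 TB, and the 10 PB example capacity are assumptions for illustration, and this is not the vendor's calculator.

```python
# Rough arithmetic on the figures quoted above. Only the $30 per TB per month
# rate and the "75 to 80% less expensive" claim come from the interview.
AT_REST_USD_PER_TB_MONTH = 30.0  # quoted in the interview
TB_PER_PB = 1000                 # assumption: decimal units


def monthly_at_rest_cost(capacity_pb: float,
                         rate_usd_per_tb: float = AT_REST_USD_PER_TB_MONTH) -> float:
    """Monthly charge for data at rest, ignoring any throughput or burst charges."""
    return capacity_pb * TB_PER_PB * rate_usd_per_tb


def implied_competitor_rate(savings_fraction: float,
                            rate_usd_per_tb: float = AT_REST_USD_PER_TB_MONTH) -> float:
    """Per-TB rate a competitor would charge if the quoted rate is this much cheaper."""
    return rate_usd_per_tb / (1.0 - savings_fraction)


if __name__ == "__main__":
    print(f"10 PB at rest: ${monthly_at_rest_cost(10):,.0f} per month")
    for savings in (0.75, 0.80):
        print(f"{savings:.0%} cheaper implies about "
              f"${implied_competitor_rate(savings):,.0f} per TB per month elsewhere")
```

Under these assumptions, 10 PB at rest works out to $300,000 per month, and the savings claim implies a competing rate of roughly $120 to $150 per terabyte per month. Charges for burst throughput are not modeled, since the interview quotes no rate for them.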
I guess when someone's saying, I've got to architect a hybrid solution, should I put the data on premises somewhere or in the cloud somewhere, and tries to think about cost, you've made the cost almost kind of flush that way, and it's real easy then to architect based on performance or other compliance requirements.

Ryan Farris: Yeah. As a company tenet, I think giving customers the choice around where to spread their data and where to store it is a big part of our mission statement. Being able to scale anywhere, irrespective of where your data sits, you can stretch that data to where it's most needed. And if it's about cost, or about performance, or about moving more data to the edge, we want to allow customers enough choice to send their data to wherever it's needed.

Mike Matchett: And the subtext here, folks: just saying cloud storage, you're thinking object storage. We're really talking file, right? So it's important to keep this in mind. You should be comparing this to having, say, an on prem NetApp filer or something, which is going to be like five times more expensive, it looks like. And you can now get global cloud file storage, wherever you put it, for a very controllable cost. So I like that. But it does bring up this other idea, though. If I start spreading my data around the globe, Ryan, I'm building little islands of data architectures a lot of times. How many copies am I making, where's my data master, and how do I really manage that? That's got to be a big problem for some of your hybrid building customers.

Ryan Farris: Yeah, that's a great point. I think some of the problem statements that we hear today are around replicating data or saving two copies. It's much more expensive.
The management burden is very difficult when you get into scale and try to replicate data to where it's most needed. Global namespace relies heavily on caching, where the customer experience that we produce is presenting remote data as if it were local. So if Mike clicks on some video streaming file, but that file actually happens to be sitting in some distant region, well, we proactively pull all the blocks that are needed so that the time to first byte is really snappy. And once you have your video cached in your local region or your local zone, you enjoy that performance for however long it's cached. It's pretty similar to a content distribution network in that regard. So there's very little data duplication between these sites, but total access to the data wherever you sit in the world.

Mike Matchett: Right. And so if I'm managing the storage here, I can determine where the master copy is living for the footprint, right? And make this Qumulo global namespace available to everyone who's a client, no matter where they are.

Ryan Farris: That's right. Yeah. It doesn't matter if it's 10 PB or 1 PB. If you want to share a folder or an entire namespace out, that level of flexibility is up to you to configure however you'd like. And specifically for archive as well, I think getting at archive data as an active archive is another big workflow that we would unlock with global namespace, where if you have five petabytes of archive, but that cold data might be needed all of a sudden, then you can present that namespace locally from the remote archive and get at it at low latency and with a high level of convenience.

Mike Matchett: I mean, you've also talked about it as caching. And when I think of previous schemes for caching, it's been about storage tiering.
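The caching behavior Ryan describes, where a remote block is pulled on first access and then served locally for as long as it stays cached, resembles a generic read-through cache. The Python sketch below illustrates that general pattern only. It is not Qumulo's implementation, and the class name, the `fetch_remote` callback, the block-index addressing, and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughBlockCache:
    """Generic read-through cache with LRU eviction (illustrative only)."""

    def __init__(self, fetch_remote: Callable[[str, int], bytes],
                 capacity_blocks: int = 1024) -> None:
        self._fetch_remote = fetch_remote  # hypothetical call to the distant site
        self._capacity = capacity_blocks
        self._blocks = OrderedDict()       # (path, block index) -> block bytes
        self.remote_fetches = 0            # counts trips to the remote site

    def read_block(self, path: str, index: int) -> bytes:
        key = (path, index)
        if key in self._blocks:
            # Cache hit: serve locally and mark the block as recently used.
            self._blocks.move_to_end(key)
            return self._blocks[key]
        # Cache miss: pull the block from the remote site, then keep a local copy.
        data = self._fetch_remote(path, index)
        self.remote_fetches += 1
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block to stay within capacity.
            self._blocks.popitem(last=False)
        return data


if __name__ == "__main__":
    def fake_remote(path: str, index: int) -> bytes:
        # Stand-in for a slow cross-region read.
        return f"{path}#{index}".encode()

    cache = ReadThroughBlockCache(fake_remote, capacity_blocks=2)
    cache.read_block("/archive/video.mov", 0)  # miss, fetched remotely
    cache.read_block("/archive/video.mov", 0)  # hit, served locally
    print("remote fetches:", cache.remote_fetches)
```

Only the first read of a block pays the cross-region cost, which is why little data ends up duplicated between sites: each site holds just the blocks its users have recently touched. The proactive prefetching Ryan mentions is not modeled here.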
You know, I've got my memory cache, I've got fast flash and maybe slower flash, and then I've got hard drives and stuff, and you've sort of now extended this into cloud, right? Is that kind of the paradigm?

Ryan Farris: Yeah, that's right. To us, scale anywhere means giving customers that choice. It doesn't matter if your data is meant to be kept on prem or in the cloud. It's true hybrid infrastructure. So as long as it's a Qumulo instance, it can share that namespace no matter where the data sits. Core, cloud, edge.

Mike Matchett: Yeah. You've really stretched the idea of what we would think of as a storage array to be the umbrella across the underlying geographic hosting, the data center, the cloud, the rest of it. And almost in a way, that's what needs to happen. I shouldn't have to always be thinking of all those decisions when I'm saying, how do I deliver the best service? How do I get my data to my customer? How do I feed that workload? The storage system should be dealing with that.

Ryan Farris: Yeah, that's right. Making it as simple as possible to manage data. And I think for customers that have built cloud native applications on hyperscalers, that might be something that they just take for granted, because so much of that data management is built into the hyperscaler. But for Qumulo, we wanted to build the same principles and semantics and ease of use into our file system and into our data plane, so that if a customer wants to access a piece of data, it's a very cloud like experience, irrespective of where those data stores sit.

Mike Matchett: Yeah. And again, not to belabor the point, but when we're talking with Qumulo, usually we're talking about large amounts of data. We're not talking about a ten terabyte database. We're talking about eight petabytes of, like, PACS data, as you mentioned there.
We're talking about the very largest of requirements that would be levied on a file system. And now you've made that native in Azure. You've made that global, so someone can stitch together a hybrid solution. And what else is coming here? What are the good things Qumulo is doing for us?

Ryan Farris: Yeah. Well, there are two major things that are going to keep us busy for a while, and those are Azure Native Qumulo and global namespace. These two things will just continue to evolve. They'll continue to be more feature rich. Data management, telemetry, and visibility of data will just keep getting better and better. I should note also that we have a feature called Nexus that allows customers to visualize and see all of their data over that global footprint that we were describing. That's a core capability as well, this visualization and management layer across the entire footprint. So that capability will get richer and richer as we iterate and push more functionality into that SaaS service.

Mike Matchett: Right. I mean, that just goes along with the idea that hybrid architectures today are complex and can be complicated to manage, but they're an inevitable future. I think people need to take advantage of all the different reasons why data should be in one place or another, again, for security, compliance, performance, capacity, whatever it is, and you're going to make that very easy at this petabyte scale. I think this is great. And one of the things that I think people then want to say is, okay, if I want to get more information on Qumulo, I want to kick the tires, I want to look at it and stretch this out. Do I have to bring in something and spend a long time migrating a petabyte of data into it? What can I do?

Ryan Farris: Right.
Well, we support several different types of applications that either move or migrate or replicate your data for you. You can go to qumulo.com to see who those partners are. And I think replication specifically, from either an Isilon or from whatever the case may be, is a pretty easy step. We do it all the time. It's not a multi-month process. Oftentimes it's a multi-day or multi-week process, depending on your data footprint, of course. And so we try to make that as easy for customers to do as possible.

Mike Matchett: All right. So if someone really wants to push forward on this, you've got a website, I'm sure. Would you recommend they take any particular steps here?

Ryan Farris: Yeah. We've rebranded our website in the last several months, and post launch on November 9th, we're going to be talking about some of these features in depth. You can go to qumulo.com, or you can also discover Qumulo through the Azure Marketplace portal. Either way, you're going to get the same information and ostensibly the same experience. You can play with the calculator. You can learn about what is behind global namespace and how to use it, and how to purchase a Qumulo subscription for on prem storage, or you can just start your own free trial for Azure Native Qumulo in the cloud. It takes ten minutes to instantiate an instance, spin it up, and start using it.

Mike Matchett: I mean, I think that's the approach a lot of people here that are watching this might take. It's like, hey, take the next ten minutes and kick the tires on that, because I think there are some really good things here for people to discover that they probably don't even think are possible. Like, how do I get a multi-petabyte file system and get rid of some of these anchor boxes that I've got sitting next to me, right?

Ryan Farris: Beautifully. Yeah. Beautifully put.
That's right.

Mike Matchett: Yeah. All right. Well, thank you for being here today and explaining this. I'm looking forward to hearing more about what you are getting into at Qumulo and how you're going to take this forward as you work with larger and larger customers and more and more verticals. Like I said, I don't think it's verticals anymore, right? I think it's going to be horizontal needs for this stuff. But thank you for being here today.

Ryan Farris: Great. Thanks so much, Mike. Pleasure.

Mike Matchett: All right. Take care, and check it out at qumulo.com.