Transcript
Good morning, good afternoon, and potentially good evening. Thank you all for joining. We will be getting the session started here in just a couple of minutes, so hang tight and we'll be talking with you soon. Good morning, everyone. I'm Molly Presley, and I will be the host of today's event. We will be joined by David Flynn, the founder and chief executive officer of Hammerspace, as well as Scott Sinclair, a lead analyst at Enterprise Strategy Group. We will be getting started here in about one minute, to give everyone a chance to get logged in. What you can expect from today: we're going to kick off the conversation with about a ten-minute video of David talking through the challenges that come with distributed data sets and the architectural advantages that Hammerspace brings to bear. After that short video with David wraps up, we'll jump immediately into a conversation with David and Scott Sinclair. If you have questions, please type them into either the chat or the Q&A. We'll be monitoring both, and we'll answer them either in real time or in text, or follow up with you after the event if that's what you prefer. So if you would hang tight for about another 60 seconds, we'll kick it off with the video and then follow on with the conversation with Scott and David.

Data orchestration is important for all industries that use data, but especially those where data is their principal product, whether that's making movies, designing drugs and genomics, or designing microchips. In all of these industries, data actually is the product. But even outside of those, every other industry today is also driven by data. And it's not just the structured data, but also unstructured data. And it's not just the most recent data, it's all data. So what we're seeing is a true transformation where all data needs to be orchestrated. All data needs to be available at your fingertips. The need that's driving this next data cycle is the competitive business environment that we live in today. To stay competitive as a business, you have to be able to hire people potentially around the globe. They expect to be able to work from home, and maybe even from their home country. You have to be able to do your computing wherever you have economic resources to do it. You have to be able to retain your data where it's appropriate to do so from a legal and regulatory perspective. So the driver really isn't new; it's the competitive nature. But now it's become all the more urgent to be able to extract as much value from data as possible. The businesses that are going to win in the next data cycle are those that can bring automation to bear to incorporate more data more quickly. So in the next data cycle, data is being generated at the edge and being used in the cloud or on prem. The key here is to be able to decentralize the data to support a decentralized workforce and decentralized computing resources. You have to be able to tap into human resources or compute resources anywhere in the world. And we're seeing this not just in the world of data and compute, but in physical supply chains: the same need to build something that is much more robust through being agile. There's an added urgency to this. When we look at the opportunity that we're presented with in AI and ML, it's going to be the companies who can work with large quantities of data across decentralized resources and bring that to bear with this new class of application that we're seeing today.
It's a game changer, and the companies that can use that are the ones that are going to have the competitive advantage. Data architectures are fundamentally changing to address these new business drivers. We're moving from a world where data is manually stored, copied, merged, and cataloged to a world where data becomes fully orchestrated and is simply available anywhere and everywhere you need it, all of the time. This is different from the world where data is stored in a single centralized silo and all of the computing and work has to be done there, in close proximity to it. What we're talking about here is a fundamentally new architecture. It's an architecture built around the concept of data being an orchestrated asset, not something that's stored and manually copied and merged. This allows data to be itself decentralized and available around the world to anybody who needs to use it. So data architectures are fundamentally changing in this next data cycle. Instead of data being a centralized resource that is stored and maybe manually copied and merged, it is becoming an orchestrated asset that permeates the environment and can be accessed from within any data center, by any application, by any user. And this allows businesses to have the agility to extract more value from their data more quickly. Data architectures until now have been bound by data gravity. With data orchestration, they are no longer bound by data gravity. One of the key defining features of this new data architecture is that you not only have your data wherever you need it through orchestration, but it is delivered at a level of performance that was not achievable before. We have to be able to have our cake and eat it too: have data be where you need it, and have it at the performance levels that you need. This really is a hat trick, in that you are able to have the data everywhere and have it at the extreme performance levels that these new AI/ML analytic systems need. In the past, data architectures have been defined by the storage systems serving that data. In the future, data architectures will be defined by the data orchestration platform. And that's where you get the opportunity to introduce true parallelism and to get to a new level of performance. Not only do we have these new applications, AI, ML, and analytics, that are driving increasing needs for performance, but we also have the opportunity of ever-advancing and ever more specialized hardware for these applications. They need to have data delivered in parallel and with shorter data paths, which data orchestration uniquely can offer. As I was saying, data architectures have in the past been defined by the storage system. This is because the data presentation layer, the thing which is giving you the perception and view of data, the file system, has been embedded within the storage system. Data is a platform-layer construct, and yet you have it being subordinated to infrastructure. Platform is supposed to be on top of infrastructure, not beneath it. And yet that's what you get when you have file systems embedded within the storage. Even worse, when you need to have that data available in a decentralized fashion across different data centers, you're left with having to make copies. And those copies are a fork of the very existence of the data at a specific point in time, where it gets orphaned, abandoned, and is now a different piece of data.
This grows as an exponential challenge as you start introducing more and more places where you need to be working with the data. So we have to move past the store, copy, and merge model for presenting and accessing data. In the next data cycle, data architectures need to be done fundamentally differently. The first thing we have to do is take the file system out from under the storage infrastructure, so that data is no longer subordinate to infrastructure. And this is precisely what we have done at Hammerspace. We have built a file system that is capable of extending across decentralized infrastructure, infrastructure of any type, in any location, and having it still be the same file system and the same data. That allows you to have global access from anywhere, at any time, to any data. More importantly, what that allows us to do is take data management and fundamentally transform it into something very different: data orchestration. The difference is that data orchestration is fully automated using AI and ML, and is able to predictively move data to where it is going to be needed, or where it is currently being asked for, and do that in a way that does not disrupt the ongoing use of that data. Because the movement is now behind the file system, behind the data presentation layer, it no longer disrupts your access. And what that means is that storage can now recede into the background as interchangeable infrastructure, as it should be. This allows data to transcend infrastructure and to become a global asset that is accessible and available everywhere, continuously, without interruption. By having this global data presentation layer, a global file system, it allows us to fundamentally change data management and transform it into something very different. It allows data management to become data orchestration, where the defining difference is that data can now move in a way that doesn't disrupt the view of the data, doesn't disrupt access to the data. Data's movements are transparent to the use of the data. This allows us to, for the first time, truly automate the movement of data using AI and ML. You can now predictively place data where you're going to need to use it, and you can reactively adapt to the use of data. And ultimately, this allows us to take the storage infrastructure and let it recede into the background, where it ought to be. Not because all storage is the same, but precisely the opposite: because storage systems have many diverse design points and serve different functions, and data over its lifetime needs to be stored on different types of storage, sometimes for performance, sometimes for long-term retention, sometimes closer to site A, sometimes closer to site B. And that's why we ultimately have to be able to move from a store, copy, and merge model to a fully orchestrated model. What this ultimately means, very fundamentally, is that we're taking applications that used to be at a very great distance from their data and making it so applications are now adjacent to the data presentation layer, which allows these applications to become smarter about how they use data across a physically distributed infrastructure. So as you can see, this is a very fundamental change in architecture. And that's one of the reasons why it has been so important that this be done as a standard and in open source. This is where I'd like to call out our CTO and many of our advisors.
Trond Myklebust is the kernel maintainer of the storage networking stack in Linux, where this is fundamentally made possible. And Gary Grider is one of the early advocates of this type of architecture, going back over a decade. And then Kai Li, who introduced fundamental architectural changes in how we do disk-to-disk backup and dedupe, and Waldman, who introduced new forms of high-performance networking with InfiniBand. They've all been key to driving this as an open, standards-based solution. Okay, great. Thank you, David. That was a great overview of not just the problems that Hammerspace is looking to tackle, but also a lot of the challenges that the industry is currently experiencing as it tries to figure out not just its architectures, but also the business strategies to address them. I'm really glad that Scott Sinclair joined us today to talk a little more in depth about some of the topics that we covered in the video, so let's just get started. Let's talk about storage silos and data silos. What are the problems with these silos of data, and why do we really need to break down those silos? Scott, I don't know if you want to jump in first here. Yeah, sure. I was going to give the host the advantage, but absolutely, I'll jump in and comment on the research that we're seeing at Enterprise Strategy Group. You know, Molly, you talk about silos, and that's one of the things we've done: in this effort to accelerate access to all the benefits of different clouds and different locations, whether on premises or off premises, we've scaled up resources. But what that's led to is this world of distributed data everywhere, and the silos it creates are inhibiting business. What we see in our research, and this shouldn't surprise anybody, is that nearly every organization is hybrid or multi-cloud nowadays. It's hard to find somebody that is only in the data center or only in one of the public cloud providers. But what's really fascinating out of this, and I think we all know it intuitively, but it doesn't get talked about enough: the world is distributed, and it's only going to get more distributed. We're not seeing some sort of weird consolidation where everyone's moving to one cloud provider or back on premises or anything like that. What's really fascinating is that even as environments scale, as cloud adoption increases, we're starting to see an equilibrium emerge, where for about 50% of businesses, those that have stronger adoption of public cloud resources, when we ask them what their investment will look like moving forward, the investment levels tend to stay about the same on premises, off premises, and everywhere else. So what that really says is, look, we're living in this world where data is going to be spread everywhere for a while, and the silo problem is one we have to fix. I mean, a couple quick stats real quick: 88% of organizations we polled said they believe leveraging multiple cloud providers provides strategic benefits, which means, hey, look, there's no consolidation coming anytime soon. And 87% echoed that by saying their application environments will only become more distributed across more locations over the next two years. So distribution is going to keep happening. And if we don't do something about it, it's only going to lead to more silos, which causes more problems, more risk, more cost as we scale. It reminds me of the laws of thermodynamics and entropy: it doesn't go backwards. Exactly.
You used the term equilibrium; at the same time, it's happening even more with advancements in AI. It's no longer one-to-one. You don't have a storage system supporting an application. You want to get all data from all applications now into a number of different tools, AI tools, that can consume it. So it ends up being a many-to-many problem, whereas in yesteryear it used to be maybe one-to-one. Yeah, absolutely. Your comment on AI is so critical because, you know, I've been in this industry for two and a half decades, and I think one of the biggest differences now versus when I started is that we used to store data. Now we're actually trying to use it, and we're trying to use as much of it as we possibly can. And AI is a perfect example, which means you have to access all your data, or make it accessible. So I think that's an interesting segue into the next topic. I've been reading, as we all have; you can't pick up a computer magazine without hearing about AI. But regularly we see the problem of data silos, and that could be the storage system the data sits in. It could also be organizational, in that a business unit or a person has a data silo that they don't know how to share with the larger AI initiative or organization. And so, now that we want to use data, not just store it, having it in silos is bad, or is certainly going to slow down your initiative. But we also need to figure out, okay, great, I want to put that data in motion and start to move it to the applications and computers that need it. Maybe, David, you can start here and talk a little bit about why data orchestration is so critical, and a little bit about what we're doing at Hammerspace. Well, that's right. I mean, it's one thing to get access, and that's what has been talked about: for these different systems, you need access. But access is meaningless if it doesn't come with high performance. And the only way to get performance is if you actually move the data and have data in close proximity to where you're going to do the computing. And the other corollary here is that the computing is no longer general-purpose processors. It's at least GPUs, if not other future exotic processors. And these have to be run at scale. So really, you have to reposition the data across the infrastructure to be able to use it in these different applications. It's not enough to access it remotely. And that's why I think what we're talking about here is a very fundamental paradigm shift. In eras past, data was made permanent through storage, and then you accessed it over a network. We're talking here about orchestration, where fundamentally the presumption is that data is always in motion, and you need simply to have consistent, continuous access to it even while it's moving. And that movement of data while you're accessing it is what is enabled through a technology like Hammerspace, where the file system can span everything and sits out front of everything. Then the movement is done from behind the facade, from behind the data presentation layer, in a way that doesn't disrupt it. And with that, it can be push-based on policy, proactively positioned where you're going to need it in advance; this is where you can actually use AI to help determine those things in advance. And then it's also reactive to the need for the data. And what enables all of that is the principle of granular encapsulation, the fact that you're moving data down at the level of the individual file.
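To make that push-and-pull idea concrete, here is a minimal, hypothetical sketch of a policy-driven placement loop. This is not Hammerspace's actual engine or API; every name in it (FileState, PlacementPolicy, plan_moves, the site labels, the threshold) is illustrative only, a sketch of the general technique being described.

```python
from dataclasses import dataclass

@dataclass
class FileState:
    """What the orchestrator's metadata plane might track per file (illustrative)."""
    path: str
    sites: set[str]               # sites that currently hold an instantiation of the data
    recent_reads: dict[str, int]  # site -> read ops observed over the last interval

@dataclass
class PlacementPolicy:
    """A 'push' rule: files under `prefix` should be kept at `target_site`."""
    prefix: str
    target_site: str

def plan_moves(files: list[FileState],
               policies: list[PlacementPolicy],
               pull_threshold: int = 100) -> list[tuple[str, str]]:
    """Decide which (file, site) instantiations to create next.

    The key property of orchestration is that these moves happen behind the
    data presentation layer: applications keep reading and writing the same
    path while copies materialize underneath, at per-file granularity.
    """
    moves = []
    for f in files:
        # Push: proactively place data where policy says it will be needed.
        for p in policies:
            if f.path.startswith(p.prefix) and p.target_site not in f.sites:
                moves.append((f.path, p.target_site))
        # Pull: react to observed demand from sites with no local instantiation.
        for site, reads in f.recent_reads.items():
            if reads >= pull_threshold and site not in f.sites:
                moves.append((f.path, site))
    return moves

# Example: one render file, pinned near a GPU cluster by policy, and pulled
# toward a cloud region that has started reading it heavily over the network.
files = [FileState("/proj/render/scene42.exr", {"on-prem"}, {"cloud-east": 250})]
policies = [PlacementPolicy("/proj/render", "gpu-cluster")]
print(plan_moves(files, policies))
# -> [('/proj/render/scene42.exr', 'gpu-cluster'), ('/proj/render/scene42.exr', 'cloud-east')]
```

The point of the sketch is that placement decisions are computed against metadata, not by copying files out from under applications, which is why the movement stays transparent to access.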
And so these concepts of orchestration actually overlap with compute orchestration in the container world. It's the same kind of principles: lightweight, granular encapsulation and the ability to move things. And we have to get to a world where data is orchestrated, versus a world where data is stored. You know, David, I'm just going to jump in, because I absolutely agree with all of that. Just building off of what we're seeing in our research at the Enterprise Strategy Group: for example, around 50% of organizations say they're moving data across locations all the time or regularly; this is a common thing. So not only are application and data environments becoming more distributed, but movement is happening all the time, and organizations need to accelerate that or make it simpler. But the fascinating thing, and this is something that literally we just got in a study, and it blew me away: in addition to the distribution of data, you mentioned this idea of wanting the app close to the data. Absolutely, you do. That reduces risk, it reduces cost, it speeds up performance, it provides a better application experience. I think we can all acknowledge that. But what was fascinating is how few organizations actually have a process in place, or an organizational paradigm in place, where they actually achieve that. So one of the things that I've been tracking is what we call the rise of distributed applications, where an app may reside in, say, one cloud provider and the data resides maybe on premises, or vice versa. 98% of organizations identify that they have some level of distributed apps in their environment, and the majority of organizations we polled have over 100 of these inter-cloud or cross-location connections, where they have an app running in one place and the data somewhere else. And so you just think about all the problems that creates around cost, around management, around performance challenges. And when we looked into why, one of the fascinating elements of it is that 40% of organizations say, look, it's our organizational structure: how we make decisions, where we do things, who owns what. There are just different processes in place. So the point of it is, yes, in this ideal world you need data and apps close together to provide a better experience, reduce cost, reduce risk. And you would think organizations would approach architecture with that in mind, but often, due to the organizational dynamics, it doesn't happen at the beginning, which only increases the importance of being able to move things easily after the fact. Because once you spin something up, you need to be able to move it. You bring up an interesting point, because I've always thought it's interesting that data is the very definition of digital. It's the thing that lives in the virtual, right? And yet the computing side is what's tethered to actually racking and stacking hardware, glued to our physical world by having to be provided energy. And so you really end up needing to move data to the compute, instead of the old conventional wisdom of moving the compute to the data. And now compute has to be at such a large scale, and potentially on exotic processors.
And so it really becomes even more incumbent on us to be able to move data to the compute instead of vice versa. And yet that old wisdom of move the compute to the data prevails. I hadn't really thought of it as an organizational problem, but you have that as well. And then there's the fact that some of these things are only available in certain clouds, so you have to actually put the app in that cloud, because, let's face it, the cloud vendors want to go up the value chain and offer things of more value, offer more applications, and that means you're having to get the data to the respective cloud where the app wants to run. So all of these things are begging for that magic of the co-location of compute and data. We have to free the data from the silos and make it mobile, so that the data can follow where the computing needs to happen, instead of the other way around. Absolutely. And I think one of the reasons why the adage is move the app to the data is because, frankly, you guys have great technology to move the data that's not available in a lot of places, right? So that data gravity mindset, that perception, persists. That's right. And the reason for that is that in the past, to move the data, it's different data. It's a copy. It's a fork in time. It's a different thing. And it's dead; it's no longer the living asset. It's a fork. And that's why you really can't get that many-to-many relationship of the many different data sets to the many different applications that you need. It's no longer one-to-one, where you build a silo to support an app, and you build another silo to support another app. Now we need to get access to all of those silos with a myriad of different applications and tools in different facilities. And that many-to-many means that we can't be managing forks of the existence of the data. Yeah, absolutely. And the reason why I bring up that organizational stat about distributed apps is that, in addition to all the problems, I think there's just a perception where most people believe, oh, that doesn't happen very often in our environment, because we're always going to make sure apps and data are in the same place. And that's not true at all. Organizationally, it's often very difficult, because people can't see six months or two years or five years ahead to know which apps are going to need which data. So the bottom line is, really the only way to solve this is simplifying the movement. Exactly. I think, David, it would be worth you talking a little bit more; I've heard you in other venues talk about this concept of isolating data with an analogy of chopping off a limb. The concept that once we have these forks, there's increased cost, but there are also problems with data quality: which is the correct version of the data to use? Could you talk a little bit more about that? Well, I think of it this way: the tools that are used today, from outside the presentation layer, from outside the file system, copy data, and it's very much like creating Frankenstein's monster. You're literally cutting limbs off of a live being, shipping them around and grafting them in somewhere else to work on them, and then cutting them off and sending them back. So this store, copy, and merge model is fundamentally making Frankenstein's monster.
Ultimately you have arms growing out of your ears, because you can never trust that you've merged it correctly. So you end up keeping every single copy that you've made along the way. And it's even worse if this is between organizational boundaries, between whole companies. And all of this leads to a massive proliferation of old and dead copies of data, with an uneasiness about where is my data. And, you know, the irony here is that when we set out to build Hammerspace, I talked about it as abstracting data from the underlying storage so that you can have continuous access to it even while it moves across the infrastructure. I thought, okay, we're abstracting data; that makes data more abstract. But it's actually the exact opposite: it makes data more concrete, because now you can think, this is my data, and my data exists independent of any point of infrastructure that might hold it from one point in time to the next. So it's actually about introducing an abstraction layer so that we can get more concrete about the concept of what is my data, and in particular what is my curated data, which has been carefully fussed over to make sure that it is my golden copy. And you don't have to make any other copies at that point. Sure, you have snapshots; you can have snapshots that articulate the timeline, where it's frozen in time. But you're not having to do copies as a tool to position data on different infrastructure elsewhere. That happens behind the scenes, in orchestration. Yeah, I just want to jump in on that. We've done multiple studies where we've looked into this, and one of the most transformational things IT organizations can do for the business as a whole, one that impacts revenue and everything else, is to get the right data to the right people as fast as humanly possible. And the word right is very important in that statement. If you get data to the right people but it's not the right data, then essentially what you're doing is slowing down operations, slowing down time to revenue, delaying initiatives, and adding risk to the business. So the ability to know that it's the right data is insanely valuable. And I think that leads to the conversation about data gravity. David, feel free. I was just going to say, I want to make one last point, because this also goes to the cost aspect, and we shouldn't forget that. If the model is store, copy, and merge, it has a tendency to induce you to keep old copies around, because you're not sure which is golden, and you need those copies on different sites. This ends up being a very big cost issue. As soon as you can have a single global namespace and maintain a golden copy that can be in close proximity to everywhere that needs it, so you can consume it with high performance, but it's the same piece of data, now you can reduce those copies and get rid of that cost. So that's a key thing. But all of this, as you were hinting at, Molly, is really about the nullification of data gravity. The concept of data gravity is what's behind all of these evils: the fact that data is massive, and applications get held captive in orbit around it and its storage system.
And, you know, here we're talking about the need for many different applications in different clouds; it's fighting that data gravity problem. Maybe you're having to suck data over the network through a straw, and it's slowing down those applications. You know, it's funny: the further out in the orbit you go, the longer everything takes; the orbital period goes up. So we have this concept of data gravity, and data orchestration can appear to violate the laws of physics, because what it's doing is allowing data to be positioned potentially in multiple places at the same time. It's allowing it to move granularly, proactively with policy-based push, as well as reactively with a pull model. So now you can use those applications in different sites with data sets that are now local, even though it's the same data; it's not a forked copy of the data, because it's got the same metadata plane that unifies it all. So, Scott, you think about the complexities of moving data. David covered some of the technology capabilities and the benefits. Could you cover, from your research, in a little more detail what kinds of complexities you've been seeing in moving data? Yeah, absolutely. I think about this a lot, and I think at a high level, outside of maybe cybersecurity, the challenge of distributed data is probably the biggest problem in IT right now. And cybersecurity and distributed data are actually related, so those aren't mutually exclusive challenges. One of the biggest problems for businesses, really especially digital businesses, is how do you continue to keep up the pace of operations at scale? Well, one of the main factors that inhibits that is that as you scale, data gets more distributed, and how do you understand where the right data is and where it's going? So, for example, in our research, 81% of organizations say, look, we face challenges with application and data portability across locations; that includes data centers, clouds, edge, everywhere. But what's really fascinating is, you know, Enterprise Strategy Group has been around for 20-plus years, and originally we were called the Enterprise Storage Group, because that's what we focused on. So we've been doing research on the challenges of data storage going back multiple decades. Up until just recently, the biggest challenges in storage always had something to do with the cost of infrastructure: keeping up with the growth rate of data, how do we protect it, those sorts of things. Recently, those are still high, but the top challenges in storage environments all have to do with data movement. How do I migrate data to the right place? How do I find out where the right data is? Those have surpassed the traditional challenges of infrastructure. So right now, again, this is a real problem, a tangible problem. And going back to my right-data-at-the-right-time-to-the-right-person concept: we know there's a tangible difference in business outcomes between the companies that are good at this, that are moving things well or can get the right data to the right person more quickly, versus companies that fail at that.
So essentially, as an IT organization or a cloud operations person, if you want to be that hero for your business, one of the best ways you can do it is improving this data mobility aspect. Because at the end of the day, we talked about it, right: it's very difficult for organizations to come in from an architectural standpoint and just know, oh, I know the perfect place for my data to reside for its life cycle. That's ridiculous. Nobody knows that. So the only way to really address it is to simplify that movement problem. A term I've heard used for that is data agility: to be agile, you have to be agile with your data. And often that conflicts with the ability to control the data, secure it, and comply with legal regulation. That introduces another dimension to the problem. Yeah, absolutely. And regulation is always changing, always evolving. And we mentioned the rise of some of these data operations type workloads: analytics, machine learning, and now, as we get into generative AI and other things, organizations are enticed by the business potential of these new technologies. But at the same time, there are insane levels of risk, too, as you open up things like regulated information and customer data to these new technologies and make them accessible everywhere, across all the different location boundaries and the different rules. So again, that just reinforces the importance of mobility and agility. And this, I would say, is another place where data orchestration, versus the traditional model of data management by copying from outside, really comes into play. Sorry for the background noise; my one-year-old is not feeling very well today, she came down with something. The modern world of working from home. But with data orchestration, because the movement is from behind the file system, now that the file system is outside of the infrastructure, you get to maintain ownership and control of the data even while it might be physically distributed across different clouds or different data centers, or even across entirely different organizations. And what this means is that the access controls that are on that golden copy, on your data, the access controls and the audit trail, all of those things are maintained irrespective of where the data is positioned and where it's moved, because that movement is now from behind the gatekeeper, not out in front of the gatekeeper. When you're making copies, you're basically opening Pandora's box, and those things are scattered to the wind. There's no way to audit them; there's no way to tell where they've all gone. But if people are able to access them from within the same file system, then you have the full audit trail, and you have up-to-date, accurate access controls. And you can enforce compliance with regulation by being able to state where that data is allowed to be stored and where it's allowed to be consumed, because all of that is now under the purview of the data orchestration layer, of the global file system. Yeah, I love that, because if I kind of play that back: the controls that you want aligned with the data are now aligned with the data, rather than with the location or the system, which otherwise limits what's possible, or creates complexity as you move.
Because the other big thing that I didn't bring up at the beginning is that all these different locations offer different experiences. They all have different people running them. It's tough to get the training across different environments. So the more commonality you can have in terms of how you access data, the more that pays dividends in operational efficiency as well as value to the business. Yeah, that uniformity. And this is why ultimately we need an agent that represents the data and is independent of the storage systems or storage services or infrastructure providers. That's really one way to think about the data orchestration layer: it's something that represents the interests of the data, and of the application and organization expressed in the data, in a way that's independent of the infrastructure. I think it's worth mentioning, because it may not be clear to everyone who's listening, that that point you just made, David, is really important. As organizations are driving these data initiatives and the topics you and Scott are discussing, in a lot of cases they're still going to use hardware investments they've already made: infrastructure, servers, storage, that type of thing. And in this data orchestration model, they're able to do that. We're representing what to do with the data, but not requiring a big forklift upgrade to a bunch of new infrastructure. Yeah, that's right. You can't do this as an infrastructure piece. It has to be something that gives you the ability to leverage multi-vendor, multi-cloud, anybody's infrastructure. Yeah, it's a great point. I want to add on to that, because I talk a lot with CIOs and buyers, and I talk a lot about infrastructure modernization and transformation, these sorts of themes. One of my favorite comments: someone asked me, he goes, Scott, what's the difference between modernization, transformation, and just buying new stuff? I've been buying new stuff forever. How is this actually different? And I think, David, you hit on it, and Molly, you too hit on a very important aspect of it. It is this idea of, no, we're delivering new capabilities, new transformational ways in which you can access and harness your data, that are abstracted from, not necessarily tied to, the actual infrastructure behind the scenes. So you can leverage your existing investments, and you can integrate faster technology when you need to, but the capabilities, the benefits, are not tied to a net new system that you have to rip and replace everything for. You don't have to rip and replace all your existing investments, which is incredibly valuable; you're able to leverage existing investments. The ability to deliver those transformational business capabilities, in terms of delivering operational efficiency, is really what helps define what becomes truly transformational. And that goes directly to, again, the cost structure, right? You're talking here about efficiency and utilization and being able to put data where it makes sense. And this is another interesting point.
If you want to really put a point on how broken the relationship between data and infrastructure is today, just consider, and this would be an interesting thing for you to poll your readers on: the organizational structure of the data, the way that the data is grouped, the directories you put it in, the groupings or file systems, the bundles you put it in, is, I would argue, more dictated by how you're going to pack it onto the infrastructure. This data needs to go over here, so I have to group it this way; that data goes over there. We are letting the tail wag the dog. The infrastructure is dictating the very organizational structure of the data to us, and it shouldn't. The data ought to be able to be granularly moved behind the scenes, so that specific types of files that don't need those performance levels, even though they might be in the same directory, can be tiered down, and the types of files that are needed for this or that application can sit in this data center or that data center. You don't have to rearrange the logical presentation of the data to get a different physical positioning of the data on the infrastructure. And the fact that we do that today is really indicative of the problem of how we're breaking it down. By having the abstraction layer and being able to use all of your existing resources, you can now get better utilization of their particular strengths and weaknesses. Yeah, I was just going to jump in, because we actually asked that particular question a little while ago, and what was really fascinating is, I think the problem may actually be worse than what you just talked about. Now, in the era of multiple clouds and everything, organizations are doing a little bit more analysis up front, because with cloud, if you put different workloads in the wrong cloud provider, it has different impacts on cost. But in terms of on-premises environments and different technologies, we asked, hey, what type of logic do you apply up front? How do you actually approach this? And I don't remember the exact numbers off the top of my head, but I want to say the dominant answer was: we just throw it on whichever one has the most available capacity. And you're like, you don't look at the performance characteristics? You don't look at the availability? And I know that's not true for everyone; if you're watching this, I'm sure you do apply more logic. But it was really scary how many organizations do that. Well, having enough space is the first-order problem. Exactly: let's just make sure we have enough space. Exactly. Interesting, very interesting. So as we think about that, we've talked a bit about deriving value from data instead of just storing it, and the economics and the efficiencies you can get through the software layer orchestrating and representing the data. But I also think some of those who are listening to this will be thinking about their workflows and their environments and the performance requirements they have. In the past they've used very specialized architectures, parallel file systems, whatever it might be, thinking very carefully about the low-latency aspects of their storage environment. And they probably think, well, I can't compromise on that; I still need performance.
David, maybe you can touch on that a little bit, as far as not just what we're doing, but your concept of performance everywhere. Yeah. You know, everything we've talked about is all well and grand, but generally, when you add an abstraction layer, when we talk about automating movements, all of that implies, oh, it's going to be slower, because now you have an intermediary that is gating it. And that couldn't be further from the truth. And we think that is, by the way, what makes this possible for the first time: we are using at Hammerspace the architecture of a true high-performance parallel file system, where you separate the metadata from the body and contents of the files, the data itself, and that way the data can be routed directly and in parallel. And let me say, this is the first time ever in the world of enterprise NAS that you have a parallel enterprise NAS. We had scale-out NAS introduced after scale-up NAS, but it's something else entirely to go to a true parallel NAS. And while we've seen that in the exotic file systems in the high-performance computing world, like with Lustre, and IBM GPFS, and maybe WEKA and others, those are exotic in the sense that it's a custom protocol, a custom client, a custom storage server. The thing about Hammerspace is that we're using the NFS standard. Yes, we had to introduce a newer standard in NFS 4.2, but it means that it's now built into every version of Linux since RHEL 7. It's built in. And not only is the client built in and the protocol standard, but even the storage node is just anything that speaks NFS, including your existing filers and so forth. And anything that speaks object, and any block device that you can easily put Linux in front of to export as NFS, becomes a storage node. So we're talking here, for the first time, about the client, the protocol, and also the storage node all being open, all industry standard. That has never been done before in the world of high-performance parallel file systems. And I view this as a linchpin necessity, because if you have to put data into a different file system to be able to serve it fast enough, then you're being compelled back to the old world of copy and merge. And so the only way for data orchestration to truly succeed is if it can span even the most high-performance end of the spectrum. And so what we have done is extend the world of enterprise NAS to a new level of performance capability by incorporating parallel file system technologies, to where it can now attain those performance levels. And it couldn't be more timely that we do this, because with AI workloads, you're driving HPC levels of performance. HPC is becoming mainstream because of this need to feed data at such high rates into large arrays of GPUs. So this is what really makes this magical, I think.
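Because the client is the stock NFS client already in the Linux kernel, an application reaches such a mount through ordinary POSIX file I/O rather than a vendor library. Here is a minimal sketch of what that looks like, assuming a share has already been mounted at a hypothetical /mnt/data (for example with the standard mount tooling, something like mount -t nfs -o vers=4.2 server:/share /mnt/data); the mount point and file name are illustrative only.

```python
import os

# Hypothetical mount point for an NFS v4.2 share. The kernel's built-in client
# handles the protocol (and, with pNFS, the parallel data paths), so there is
# no custom client library for the application to link against.
MOUNT = "/mnt/data"

path = os.path.join(MOUNT, "results.csv")

# Ordinary file I/O is all it takes: writes and reads go through the standard
# VFS layer, exactly as they would for a local file system.
with open(path, "w") as f:
    f.write("run,loss\n1,0.42\n")

with open(path) as f:
    print(f.read())
```

That is the practical meaning of "the client is built in": any program that can open a file can use the mount, with no code changes.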
There was a time in the past when we introduced an abstraction layer on the compute side: we added hypervisors and server virtualization. And that came with a very significant overhead, a tax that was well worth paying in the end. We made it more efficient over time, but there was still, especially on the I/O stream, an overhead that people paid to virtualize. Having servers sit there underutilized was a very bad thing, so it was worth doing. But the interesting thing here is that we're adding an abstraction layer, abstracting data from the storage storing it, the same way that abstracted the OS from the server running it, and in this case it's actually unlocking true parallel performance that you didn't have before. And I think that's what makes this really a killer: you get all of these benefits of a unified namespace, a golden copy, data positioned locally everywhere you need to use it, but then you're also putting in something which can feed it at performance levels that have never been seen before in the enterprise world. David, I just want to jump in. I'm glad you brought up the performance aspect, because, and I've written on this a couple of times, I believe one of the most outdated concepts in storage is that old pyramid they used to use to talk about it, that data pyramid where at the very top you had a little bit of a high-performance niche, and then you had the warm data in the middle, and then you had the cold data at the bottom. That's just not really how people do it anymore, especially with the rise of business intelligence, analytics, and machine learning environments. It's not small-and-fast versus big-and-slow anymore; everyone needs big and fast. And to your point, they need highly distributed big and fast, because we're doing so much analysis on data right now that that old pyramid concept just doesn't apply. The other thing, too, is that artificial intelligence is not the only workload where this matters, but I think it's a good example, because we did some specific analysis on machine learning workloads. Often when people approach them, they think about those very hyper-specific environments: well, I have a very high-performance environment for the training aspect, and then maybe I use colder storage on the back end for the data lake or the data pipeline. The bottom line is, one of the most beneficial things you can do for these environments is accelerate the entire data pipeline, because data cleansing and all the different things you have to do to prep the data are resource intensive and active; it's where people spend most of their time. And also, what you're trying to do is improve the utilization of all those highly expensive GPUs that most people had to buy to actually do the training. Exactly; it goes to the cost structure and utilization, if those systems are sitting idle waiting for the data to get cleansed, waiting for the ingest to happen, waiting for a checkpoint to be written out. And the thing about these workloads, two things that I've heard from our customers in this space: number one, when you ask them what kind of performance, bandwidth, IOPS, latency, small packet, large packet, reads versus writes, the answer is all of the above. We're going to need all of it, and it's because they're doing a very diverse set of functions on the data. And then you have the fact that these are extremely bursty workloads, right? When it needs to ingest, every GPU needs to ingest at once; when they need to write out a checkpoint, every GPU needs to write out a checkpoint at once.
So it's kind of the worst-case scenario, in the sense that you've got bursts of activity, outside of which the system is sitting idle and wasting dollars, right? And you have to get access to the full breadth of the data. So it really does mean, like you said, that tiering is no longer it, right? You've got to be much more agile than that and have the large data sets able to be fed at very high speed. Yeah, absolutely. The other interesting trend that we saw in those environments was, we asked people that were building them how much capacity they think they need, and then we asked people that were running them how much capacity they have. And the people that were running these environments typically had about double the capacity of what the people thought they would need when they were building them. So the point is, these environments are going to scale. I called it the massive success problem, because typically AI initiatives tend to be very successful for the business, and what happens when things are successful? Well, we've got to do more of that. So they scale very quickly. Yeah. I think, to wrap up: both of you, as we were prepping for this conversation, mentioned the importance of investing in, or designing for, flexibility. I think that's almost a direct quote of something you said this week, and David, I think, heard it from a customer he was talking to this week. So maybe let's just wrap up with your thoughts in that area. It's a dynamic world; people don't know exactly what their architectures are going to look like, or how many models they'll need to get their data to, and there's the importance of being able to be flexible in the future. Yeah, you know, I can jump in and take a stab at that first. I would say, I do a tremendous amount of research in this space, I talk to a number of CIOs, and if there are things that we know, things that I can say with absolute, 100% certainty: you're going to have more data in the future than you have today. It's going to be as distributed as it is today, or more distributed, in the future. And the next one is: today, you do not know exactly what the access profile is going to look like six months, 12 months, 18 months from now. You don't have a clear view of that. So if we think about just the things we know: you're going to have more, it's going to be as distributed or even more distributed, and we don't have a perfect vision of what the future is going to need. All of that translates into: look, you have to invest in agility, in flexibility of data movement, because essentially that's the only way to address this problem. And the thought I'll leave with here is that with data orchestration, and having that movement for the first time happening behind the data presentation layer, where it doesn't disrupt the ongoing access and use of the data, for the first time that opens up the possibility for the movement, the placement, and the decisions to be made automatically. It's ironic: there's nothing more digital than data; it's the very definition of digital. And yet the work of setting up and copying and merging, the state of the art pre-orchestration, couldn't be more manual. The selection of where to store things, when to take the app downtime to move it around, and all of that. It's been crying out for digital transformation forever.
And here data is. I mean, it's a sin in the IT world that it is such a manual process; even if you use tools to automate it, they are one-offs, right, between this point and that point, between this system and that system. It's been crying out for this to happen. And with Hammerspace and the data orchestration layer, for the first time you can now actually use machine learning to proactively position data across the infrastructure for these applications, and unlock the use of AI and ML to solve one of the biggest challenges: how do we accelerate the cycle of getting benefit from AI and ML? And that comes down to the ability to distribute data. Great. David, Scott, thank you for taking the time today to talk with us about these topics. It's been a really interesting conversation, certainly challenging a lot of the thought processes that have designed today's IT architectures. And the time is right; now is a great time to be talking about this. Thank you for your time. And certainly, for those of you in the audience, if you have any further questions, we'll hang out here for a few minutes; you can just type them in. And we'll be sharing a copy of this recording out with you all after things wrap up. Thank you so much.