Today, we're talking about hyperscale storage that nearly anybody can deploy in just a few minutes with just a few commands. Mike Matchett had the opportunity to talk with Bjorn Kolbeck, founder of Quobyte, a company whose mission is to bring two worlds together: first, to serve demanding scale-out workloads in the enterprise, and second, to bring massive scalability to the operational side of its product.
Mike Matchett: [00:00:00] Hi, Mike Matchett with Small World Big Data. We are here talking to one of the brightest stars in the storage space, and that's Quobyte. We're talking about real hyperscale storage that almost anybody can deploy with just a few commands and get going. There are some really interesting features in there. I've got Bjorn Kolbeck here today. Welcome, Bjorn.
Bjorn Kolbeck: [00:00:23] Hey, Mike, good to be here.
Mike Matchett: [00:00:25] All right, so tell us what your role is at Quobyte, how you got started with this, and what inspired you to create another storage system?
Bjorn Kolbeck: [00:00:34] Sure. I'm one of the co-founders. I started Quobyte together with Felix Hupfeld, and both of us started our careers in high-performance computing. We worked at a supercomputing center and saw how you run very large infrastructures. After finishing our PhDs, both of us worked for Google, and that was a very different experience, also with very large infrastructures. But what Google figured out, and what everyone else is still trying to replicate, is how to manage those very large infrastructures with tiny teams, to really make operations scalable. If you know HPC, high-performance computing, you know the joke: you need an army of PhDs to run an ultrafast system or a supercomputer. In commercial settings you can't afford that, and Google really cracked that problem. That's why we started Quobyte: to bring those two worlds together. On the one hand, serve scale-out workloads that are very demanding and that have moved into the enterprise today, and on the other, bring Google-style operations, the massive scalability on the operational side, into our product. And that's Quobyte today.
Mike Matchett: [00:01:45] So if you bring this kind of high-performance storage to the point where we don't need that many PhDs, what are they going to go do? I'm just teasing you. You could put all these storage PhDs out of work.
Bjorn Kolbeck: [00:01:58] Yeah, if you work in this field, you usually have more work than you have time in a day. So if you can make storage less of a task, less time-consuming, I think everyone wins.
Mike Matchett: [00:02:10] All right. So I think people watching are probably curious about what Quobyte is made up of, so let's lift the hood a little bit. It's a software storage system, right? Some people might call that software-defined or whatever else, but it's basically something that will run on pretty much any server you bring it to, right?
Bjorn Kolbeck: [00:02:29] Yeah, that's why we call it real software storage: you can actually download it, install it on any x86 server, and turn that into a storage system. We believe in the power of software. Why everyone loves software is that you can just download it, install it, and then you have a running storage system, instead of going with appliances that need to be racked and stacked, where you're talking about weeks and shipping physical hardware. With Quobyte, you download it and install it yourself, and you have a working storage system that's highly scalable and reliable and has the security features, and you have that in minutes.
Mike Matchett: [00:03:05] All right, because it's a scale-out system and you've done all this great clustering, you can put it on any hardware, even hardware that's not certified to the nth degree like some of the other guys might require. You can even run this in the cloud, right? You can run it on cloud instances and create petabyte-size clusters.
Bjorn Kolbeck: [00:03:23] Exactly. We don't certify hardware. We recommend hardware, but you can use whatever you prefer. And because we're 100 percent software, we also run on the public clouds, so you can go wherever your workloads need to be, or wherever you decide works best for you, and then move to the cloud and back on-prem. Quobyte gives you the same environment.
Mike Matchett: [00:03:43] All right, I know we're going to touch on performance before we finish this discussion, because that's one of the key things here. But let's stick with infrastructure a little bit more. If I bring in nodes that have disk drives in them, you'll use whatever storage is there, ostensibly. But what if I mix flash and hard drives? Do you create tiers, or do you use one or the other? How do customers set that up?
Bjorn Kolbeck: [00:04:08] You can use both at the same time with Quobyte, and not just as tiers. That's the traditional way, and our software can also tier between the two, but we can also combine the two storage media inside the same file. We call that auto layout, and it solves the problems you have when there's a mix of tiny files and large files in the same file system. Whether it's machine learning or science workloads, you typically have this mix; sometimes you don't even know it. We solve that by automatically switching between the two, even inside a file as it grows, and that combination gives you the best price and performance.
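To make the "flash inside the same file" idea concrete, here is a minimal sketch of a per-offset placement rule, assuming a fixed-size flash prefix followed by round-robin striping over hard drives. The threshold, chunk size, and stripe width are hypothetical values chosen for illustration; the interview does not say how Quobyte's auto layout actually decides where bytes go.

```python
from dataclasses import dataclass

# Hypothetical parameters: the interview does not say where the switch from
# flash to hard drives happens or how many drives a file is striped across.
FLASH_PREFIX_BYTES = 4 * 1024 * 1024   # first 4 MiB of each file on flash
STRIPE_UNIT_BYTES = 1024 * 1024        # 1 MiB chunks on hard drives
HDD_STRIPE_WIDTH = 8                   # hard drives per file stripe


@dataclass(frozen=True)
class Placement:
    tier: str    # "flash" or "hdd"
    device: int  # hard-drive index within the stripe (always 0 for flash)


def place(offset: int) -> Placement:
    """Return where the byte at `offset` of a file lands in this sketch."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if offset < FLASH_PREFIX_BYTES:
        return Placement("flash", 0)
    # Past the flash prefix, chunks go round-robin across the hard drives.
    chunk = (offset - FLASH_PREFIX_BYTES) // STRIPE_UNIT_BYTES
    return Placement("hdd", chunk % HDD_STRIPE_WIDTH)


def tiers_used(file_size: int) -> set:
    """Tiers touched by a file of `file_size` bytes (empty for an empty file)."""
    if file_size == 0:
        return set()
    # Flash always precedes hard drives, so the first and last byte suffice.
    return {place(0).tier, place(file_size - 1).tier}
```

With these assumed numbers, a 100 KB file stays entirely on flash (`tiers_used(100_000)` is `{"flash"}`), while a 1 GiB file starts on flash and spills onto all eight hard drives, which is the mixed-size behavior Bjorn describes.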
Mike Matchett: [00:04:41] I think this is fascinating. So if I have lots of small files and they fit in the flash layer, you put them there. And a file can start in flash and, as it grows bigger, extend out into the hard drive space within the file itself. Now let's touch on performance a bit. How does that help me on performance if the file is partially in flash and partially on hard drives?
Bjorn Kolbeck: [00:05:01] Yeah. In this example, the first part of the file is on flash, and that basically solves the latency problem: the first access to the file will be very fast. Then in the background, we start to load the data from multiple hard drives in parallel, up to, I think, 15 or 16 hard drives for a file. So the time to first byte is reduced by the flash, and then we stream from several hard drives. That's how you get up to two gigabytes per second on a single file.
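The two effects Bjorn describes, a low time to first byte from flash and high sustained bandwidth from parallel hard drives, can be separated with a simple back-of-the-envelope model. This is a sketch under stated assumptions: the per-drive rate, drive count, link speed, and latencies in the usage note are hypothetical round numbers, not Quobyte measurements.

```python
def streaming_throughput_mb_s(n_drives: int, per_drive_mb_s: float,
                              client_link_mb_s: float) -> float:
    """Aggregate read rate for a file striped across `n_drives` hard drives.

    The rate grows with the number of drives read in parallel until the
    client's network link becomes the bottleneck.
    """
    if n_drives < 1:
        raise ValueError("need at least one drive")
    return min(n_drives * per_drive_mb_s, client_link_mb_s)


def read_time_s(file_mb: float, first_byte_latency_s: float,
                throughput_mb_s: float) -> float:
    """Simple model: wait for the first byte, then stream the rest."""
    return first_byte_latency_s + file_mb / throughput_mb_s
```

Assuming roughly 150 MB/s per hard drive and a client link of about 3000 MB/s, 16 drives give `streaming_throughput_mb_s(16, 150.0, 3000.0) == 2400.0`, in the same range as the two gigabytes per second mentioned above. For a small read, the first-byte latency term dominates, which is why putting the start of the file on flash matters; for a large read, the bandwidth term dominates, which is what striping addresses.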
Mike Matchett: [00:05:29] All right. So you get this firehose of data, and to speed up the time to first byte, you layer the flash in on the first part of the file, which is very clever, and that happens automatically, right? You can set that up as a policy, which is cool. And can I adjust my policies on where I store things down to a volume, or more granular?
Bjorn Kolbeck: [00:05:50] You can actually adjust that down to the file level. We have all the metadata, because we process things like the file extension, who owns the file, what the size is, and when it was last accessed, and you can use that to define your own policies. Auto layout makes your life easy, but you can decide that certain users or file types should reside exclusively on flash. Or maybe everything lands on flash and then moves down to hard drives. You can decide that down to the file level and say, OK, Bjorn's MP4 files go to hard drives only, because he creates a lot of data and that's annoying for everyone else. You can go to that level.
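Metadata-driven placement of this kind is essentially an ordered rule list evaluated per file. The sketch below is hypothetical: the `FileMeta` fields, rule format, placement names, and first-match-wins semantics are illustrative assumptions, not Quobyte's policy language.

```python
import fnmatch
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileMeta:
    path: str
    owner: str
    size_bytes: int
    days_since_access: int


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[FileMeta], bool]
    placement: str  # hypothetical labels: "flash-only", "hdd-only", ...


def choose_placement(meta: FileMeta, rules: list,
                     default: str = "auto-layout") -> str:
    """Return the placement of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(meta):
            return rule.placement
    return default


# Hypothetical rules modeled on the examples from the interview.
RULES = [
    Rule("bjorn-videos-to-hdd",
         lambda m: m.owner == "bjorn" and fnmatch.fnmatchcase(m.path, "*.mp4"),
         "hdd-only"),
    Rule("configs-on-flash",
         lambda m: fnmatch.fnmatchcase(m.path, "*.yaml"),
         "flash-only"),
    Rule("cold-data-to-hdd",
         lambda m: m.days_since_access > 90,
         "hdd-only"),
]
```

Ordering matters in this design: a `.yaml` file that hasn't been read in a year still matches the flash rule first, so more specific rules belong near the top, and anything unmatched falls back to the automatic layout.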
Mike Matchett: [00:06:27] All right. And again, for installation, do you just get it off the internet? It's pretty easy to download. And does it support containers, containerization? I'm assuming, being cloud native, you've got a story there.
Bjorn Kolbeck: [00:06:40] Yeah. You know, it's real software storage, so we also run in containers. We have a Helm chart on our website that you can use to deploy the server part in containers on Kubernetes, and for the client part we have a CSI plugin. So you can run your storage system on Kubernetes and deliver storage to Kubernetes, too.
Mike Matchett: [00:06:59] So there's full Kubernetes support for that storage as well, which is really nice. And I saw that once I have the cluster going, it uses some kind of quorum voting between the cluster nodes, so you can take out any node, whether it drops, fails, or you're doing upgrades on it, and the rest of it still works. How did you architect that?
Bjorn Kolbeck: [00:07:27] Yeah. We basically built the system so that failures are OK. And it's not just failures: if you want to do things like non-disruptive updates, hardware maintenance, hardware refreshes, adding nodes, or moving nodes, all without disruption, the system has to be able to tolerate the loss of a node, temporarily or permanently. So we designed the system from the ground up to be OK with that. The nice benefit is that you can do all your admin work and all your maintenance work while the system is running at full speed. So there's no need for maintenance windows or anything like that.
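The ability to lose a node without losing service comes down to majority-quorum arithmetic. This is a generic sketch of majority quorums, not Quobyte's actual replication or leader-election protocol, which the interview doesn't detail; the function names are illustrative.

```python
def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    if n_replicas < 1:
        raise ValueError("need at least one replica")
    return n_replicas // 2 + 1


def tolerated_failures(n_replicas: int) -> int:
    """Replicas that can be down while a majority is still reachable."""
    return n_replicas - majority(n_replicas)


def write_committed(acks: int, n_replicas: int) -> bool:
    """A write counts as durable once a majority has acknowledged it."""
    return acks >= majority(n_replicas)


def quorums_intersect(n_replicas: int) -> bool:
    """Any two majorities share at least one replica, so a leader elected
    by a majority always reaches a replica holding the latest committed write."""
    return 2 * majority(n_replicas) > n_replicas
```

With three replicas, a majority is two, so one node can be down for an upgrade or a failure and writes still commit; with five replicas, two can be down. Note that four replicas tolerate no more failures than three, which is why odd replica counts are the usual choice.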
Mike Matchett: [00:08:00] I think you might have inherited that mindset from Google. That sounds like something a hyperscale cloud provider would want, right?
Bjorn Kolbeck: [00:08:07] Exactly. Think about how difficult it is to coordinate maintenance windows even in small organizations: you need to talk to the application owners and to the people in the data center who own the hardware. It's a lot of coordination. If you tried that at a larger scale, in an organization like Google or a hyperscaler, it basically wouldn't work; you couldn't run at that scale. So that's part of how you run your storage efficiently without spending too much time on it: the coordination, the maintenance windows, and the way you operate your hardware are a big part of this.
Mike Matchett: [00:08:41] All right. Let's circle back now to performance, since we're talking about scaling this as a scale-out system. How big can I go, and does performance scale linearly, or is there a drop-off curve? What can I expect if I'm scaling this?
Bjorn Kolbeck: [00:08:57] We have customers with hundreds of servers in production, and you can easily scale to thousands of Quobyte servers, with linear scaling. We built the architecture in a way that doesn't have bottlenecks. Part of that is the quorum algorithms and how we elect leaders, which you mentioned earlier, and that allows you to have linear scaling. So when you double the number of nodes, you get twice the performance. Whether you go from four to eight or from 100 to 200, you don't have diminishing returns. So it's true linear scaling.
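What "no diminishing returns" means can be illustrated with Amdahl's law, a standard textbook model used here only as an illustration: any fraction of the work that must pass through a shared, non-scaling bottleneck caps the speedup. The serial fractions below are hypothetical inputs; the claim that Quobyte's architecture has no such bottleneck comes from the interview, not from this model.

```python
def speedup(n_nodes: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup over one node when `serial_fraction` of the
    work goes through a shared bottleneck that does not scale."""
    if n_nodes < 1 or not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("invalid arguments")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)


def doubling_gain(n_nodes: int, serial_fraction: float) -> float:
    """Factor gained by going from n_nodes to 2 * n_nodes."""
    return speedup(2 * n_nodes, serial_fraction) / speedup(n_nodes, serial_fraction)
```

With a serial fraction of zero, doubling from 4 to 8 or from 100 to 200 nodes gives a factor of 2.0 both times, which is the linear behavior Bjorn describes. With a hypothetical 5 percent bottleneck, going from 4 to 8 nodes gains about 1.7x, but going from 100 to 200 gains less than 1.1x.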
Mike Matchett: [00:09:26] Which is great: linear scaling. And I can take nodes offline, work on them, and upgrade them without disrupting everybody else. So this isn't RAID, I guess, right? You've got some other kind of replication going on.
Bjorn Kolbeck: [00:09:40] Yeah, we use the quorum replication that you mentioned, so basically synchronous quorum replication, or erasure coding, for data redundancy, and we can also combine the two to get the best performance and the best cost.
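The cost side of the replication versus erasure coding trade-off is simple arithmetic on raw bytes stored per user byte. The 3-way replication, the 8+3 erasure code, and the 10 percent replicated share below are hypothetical parameters chosen for illustration; the interview doesn't state which layouts Quobyte uses or how it splits data between them.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per user byte with n-way replication."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return Fraction(copies)


def erasure_overhead(data_chunks: int, parity_chunks: int) -> Fraction:
    """Raw bytes stored per user byte with a k+m erasure code."""
    if data_chunks < 1 or parity_chunks < 0:
        raise ValueError("invalid code parameters")
    return Fraction(data_chunks + parity_chunks, data_chunks)


def blended_overhead(replicated_share: Fraction, copies: int,
                     data_chunks: int, parity_chunks: int) -> Fraction:
    """Overhead when `replicated_share` of the data is replicated and the
    rest is erasure coded (a hypothetical mix, e.g. small files replicated)."""
    return (replicated_share * replication_overhead(copies)
            + (1 - replicated_share) * erasure_overhead(data_chunks, parity_chunks))
```

Three-way replication stores 3 raw bytes per user byte, while an 8+3 code stores 11/8 = 1.375 and still survives the loss of any three chunks. Replicating 10 percent of the data and erasure coding the rest works out to 123/80, about 1.54, which shows why combining the two can keep replication's performance where it matters while staying close to erasure coding's cost.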
Mike Matchett: [00:09:53] OK, so lots of tuning knobs, not that you need a PhD for them, right? But somebody who cares about costs might be able to go in there and say these big workloads go here, these smaller workloads go there, and line them up by which division or department has budget for them. And licensing, just quickly: there's obviously a free version of this. How far does that take us?
Bjorn Kolbeck: [00:10:18] The free version gives you 150 terabytes of hard drive capacity and 30 terabytes of flash. You can download it from the web and use it on the cloud as well. It comes with all the features, like Kubernetes support, the Hadoop plugin, our native driver, and so on, so it gets you started immediately. If you have a small cluster, that should be enough, and then you can move to our Infrastructure Edition with basically unlimited capacity and some of the enterprise features, like security and Windows drivers.
Mike Matchett: [00:10:55] All right, so I can get started with up to 150 terabytes at no cost, which seems big to me, but I know a lot of your customers really start at a petabyte or more, and that's where they really dig into it. As a file system, that's great. But you also have objects in this file system, so let's talk a little bit about that. It's a single namespace in the same cluster, I understand. What protocols can I access those objects with?
Bjorn Kolbeck: [00:11:25] Yeah, we're one of the few systems where file and object are in the same namespace. So you can use any of the file protocols, like the native Linux driver, Windows, or Mac, and then access the file as an object, and vice versa. That makes data sharing very easy. We also have the Hadoop driver I mentioned, an API for high-performance computing folks, and a TensorFlow plugin, and they all share the same namespace.
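A shared file and object namespace means that a file path and an object key resolve to the same stored bytes. The toy in-memory sketch below assumes a hypothetical mapping in which the first path component acts as the bucket and the remainder as the object key; it illustrates the concept only and is not how Quobyte maps volumes, buckets, or keys.

```python
class UnifiedNamespace:
    """Toy store where a file path and an object key name the same bytes."""

    def __init__(self) -> None:
        # Keyed by (bucket_or_volume, key_or_path); hypothetical mapping.
        self._data: dict = {}

    @staticmethod
    def _split_path(path: str) -> tuple:
        """Map "/<volume>/<path>" to (volume, path) in this sketch."""
        parts = path.lstrip("/").split("/", 1)
        if len(parts) != 2 or not parts[0] or not parts[1]:
            raise ValueError(f"expected /<volume>/<path>, got {path!r}")
        return parts[0], parts[1]

    # File-style access.
    def write_file(self, path: str, data: bytes) -> None:
        self._data[self._split_path(path)] = data

    def read_file(self, path: str) -> bytes:
        return self._data[self._split_path(path)]

    # Object-style access.
    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self._data[(bucket, key)] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._data[(bucket, key)]
```

In this sketch, a sequencer could write `/genomics/run42/sample.fastq` through a file interface, and an analysis job could fetch the same bytes as object `run42/sample.fastq` in bucket `genomics`, with no copy in between. Avoiding that copy is the data-sharing benefit Bjorn mentions.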
Mike Matchett: [00:11:49] So the file and the object are in the same namespace, and I can get at them with different protocols, which is pretty cool. It sounds like you've got some great ways to compete with some of the stodgier, older folks in the industry, and I'm not going to name any names right now. But one of the research reports where you really shined was a life sciences study we recently did with DCIG, so let's talk briefly about that. Workloads in life sciences are generally large, the workflow requires multi-protocol support, and there can be a large number of small files. So these are large files or large numbers of small files, and, like you said, sometimes an army of PhDs is required to go after them. How are you doing with life sciences? Do you have folks using it that way?
Bjorn Kolbeck: [00:12:44] We have customers using it, and they particularly love the auto layout, because life sciences is one of those use cases where you usually have a mix of different file sizes. It also depends on the actual application. If you work with genome sequencers, you have a lot of tiny files; then you often have image files, whether from imaging technology like microscopes or MRI, and those tend to be larger. Often you have to work with both on the same storage system. That's why life sciences customers love scale-out storage with auto layout. And in life sciences, you often work with sensitive data, because it's medical data; sometimes it can identify persons. So security is a big one in this segment as well. This is where we bring end-to-end encryption, stronger access control with certificates, and TLS connections, to make sure you have a higher level of storage security than you get with traditional protocols and access control.
Mike Matchett: [00:13:48] You can even encrypt the data, I understand, so that the storage admins themselves don't see the data; they can only administer the volumes. So it sounds like there are some really great security features there as well. And I suppose that would also apply to other verticals like finance, health care, oil and gas, and so on.
Bjorn Kolbeck: [00:14:09] Yeah. In the end, that applies to anyone who has that kind of problem, and I would say the level of security they need depends a bit on the sector. Life sciences and health care are often more conscious about this; other sectors, like oil and gas, maybe not as much.
Mike Matchett: [00:14:26] OK, so high-performance-computing-level storage with cloud-level scale-out, downloadable as software that runs on anything you want. Bjorn, if someone's interested, where can they get a copy? Tell the audience how they can get their hands on it.
Bjorn Kolbeck: [00:14:43] Go to our website, quobyte.com. There's a big button in the top right corner for the free edition. Click on it, copy one command, and that launches the install process. A few minutes later, you have a working, reliable, scalable Quobyte cluster.
Mike Matchett: [00:14:57] And it's not millions of dollars and six months to acquire and rack into your data center. It's just download and go, and you've got high-end storage. Folks, check it out. Bjorn, thank you for being here today and explaining this to us, and congratulations on that top-five placement in the DCIG report on storage for life sciences.
Bjorn Kolbeck: [00:15:17] Thanks, Mike. Thanks for having me here today.
Mike Matchett: [00:15:19] All right. Thanks, everyone. Check it out, download that storage, and get it going. Take care.