Transcript
Mike: Hi, Mike Matchett with Small World Big Data, and we're here today talking about security. Of course, we're looking at how you do a better job of finding those hackers, the people trying to get into your environments. You've got tons and tons of data these days, bigger and bigger attack surfaces, needles to find in the haystack, and longer and longer dwell times by the attackers, who are of course using AI to do some terrible things to you as well. So the threat environment has gotten pretty hostile. We've got DeepTempo here today talking to us about how they're helping you fight crime.

Hi, Evan, welcome to the show. I think it's been quite some time since we last talked, and we've both moved forward in this marketplace. But you're now with a cybersecurity slash AI slash machine learning company, doing things at the cutting edge and at a very big scale, called DeepTempo. What got you into DeepTempo from what you were doing before?

Evan: Hey, Mike, thanks for having me. What got me here is this: I think we all know cybersecurity has some challenges. We all read the paper, or get our news from TikTok, or whatever the case may be. We know bad things are happening. And when you double-click on it, you see that spending on cybersecurity is up around $200 to $250 billion a year, while losses from cyber attacks are measured in the trillions of dollars. As a long-time entrepreneur who has had five or six companies hit product-market fit, serving almost entirely large enterprises, I was chatting with some of the CIOs and others I know in the industry, and they said, look, if you're taking a run at something, how about taking a run at cybersecurity? And how about shaking up not just the technology, which we can talk about, but the industry itself? Something has to change. As you said, Mike, the attackers are innovating; they're using AI. How can we use AI to defend ourselves better, not just by automating the SOC, which is important, but by finding problems sooner, including more complex attacks and those AI-generated attacks? I've never been more excited. I've also never been older; I guess that's true of all of us, every day is another day. But this is a big one, and I couldn't be more excited about what we're doing at DeepTempo on AI security.

Mike: The scale has gotten big, our perimeters have gotten nebulous, and the hackers have gotten AI as well. It's a huge problem. But there are lots of solutions out there already: event management tools, detection tools. Just quickly, why do you think those aren't up to the job today? Given what you just said, we can draw some lines, but why can't I just use, say, Splunk and get the job done?

Evan: Well, of course we fit into Splunk, but Splunk alone won't enable you to see the sorts of attacks that we can see. We're focused, as a defense-in-depth solution, at the network layer, dealing with billions of records per day for one of our design partners, which is a large bank. That scale of ingestion, and then making sense out of what is happening, is tricky.
So this is not rules, which is the typical approach in security. Rules aren't bad, and hopefully we all already have rules like "this port shouldn't be open" or "this port shouldn't allow this amount of data out of it." What this is: the model is learning sequences of many thousands of events, learning which of those sequences might be concerning in your environment, flagging on that, and then giving you the context. What type of entity is involved? Are these mail servers starting to behave strangely? Is it that everything Evan touches looks odd? Whatever the case may be. And it maps back to MITRE ATT&CK, which is sort of the lingua franca if you're living in the SOC; you know the MITRE ATT&CK patterns, so we give you that mapping. But the model itself is using deep learning and is able to see things at much greater granularity than existing rules-based or even traditional ML-based systems. So the main reason folks look at us and try us out on Snowflake is that it's just more accurate. The secondary reason is probably that they have more threats to fight and a fixed amount of money. Can they use a solution like ours, running upstream from Splunk, literally on their data lake, to reduce spend on those legacy systems while improving accuracy?

Mike: Right, so there are some cost-efficiency arguments to be made here before we're finished. But let's focus a little on that up-front technology, because you mentioned some things that are very big-data-ish. You mentioned running on Snowflake, and you talked about working with sequences that are thousands of events long, which tells me you're not doing what an older intrusion detection system or ordinary event anomaly detection might do. Open up the hood a little: what are some of the things you're doing differently to be able to operate at that scale?

Evan: Well, we are standing on the shoulders of giants, as they say, and in this case the giants are those who invented language models and large language models. We don't use those specifically; we build solutions called log language models. You can think of them as a cousin of an LLM, except it's an LLM, a sad LLM, that has only ever seen network security and flow logs. It never got to read Shakespeare; it only got to read network traffic. But they end up being quite powerful. Because of all the investment and all the computer science that has gone into the underlying layers, they're able to handle massive quantities of data. And the way these systems are built, they're actually quite adaptable to new domains. So part of the value is not just accuracy, but being able to take a model built with 100 billion records or more at this point and adapt it for a new enterprise in Snowflake so that it runs well and gives you indicators right out of the box, or maybe with an hour of fine-tuning. That alone is, full stop, revolutionary. Even if it were only equally accurate to existing machine learning systems, those systems tend to take several months to adapt, and what I just said is that we're talking about an hour. That alone is transformative. And then we believe we actually have better accuracy as well.
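To make the "log language model" idea a bit more concrete, here is a minimal, hypothetical sketch of the kind of preprocessing such a model implies: each flow record collapses to a discrete token, and a window of consecutive flows per entity becomes the "sentence" the model learns to continue. The field names, byte bucketing, and window size are assumptions for illustration, not DeepTempo's actual pipeline.

```python
# Hypothetical sketch: turn raw flow records into token sequences a
# log language model (LogLM) could train on. Field names, bucket
# boundaries, and the 1,000-event window are illustrative assumptions.
import math
from collections import defaultdict

def bucket_bytes(n_bytes: int) -> str:
    """Coarse log-scale bucket so 'amount of data' becomes a discrete symbol."""
    return f"B{int(math.log10(n_bytes + 1))}"

def flow_to_token(flow: dict) -> str:
    """Collapse one flow record into a single vocabulary token."""
    return "|".join([
        flow["protocol"],              # e.g. TCP, UDP
        f"dport={flow['dst_port']}",   # destination port
        bucket_bytes(flow["bytes_out"]),
        bucket_bytes(flow["bytes_in"]),
        flow["flags"],                 # e.g. SYN, SYN-ACK, RST
    ])

def build_sequences(flows: list[dict], window: int = 1000) -> dict[str, list[list[str]]]:
    """Group flows by source entity and cut them into fixed-length
    token windows, the 'sentences' a LogLM would model."""
    per_entity: dict[str, list[str]] = defaultdict(list)
    for flow in sorted(flows, key=lambda f: f["timestamp"]):
        per_entity[flow["src_ip"]].append(flow_to_token(flow))
    return {
        entity: [tokens[i:i + window] for i in range(0, len(tokens), window)]
        for entity, tokens in per_entity.items()
    }
```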
Mike: Just to get clarity here: you say it rides on Snowflake but only takes an hour. I'd think even moving someone's Snowflake data into your service would take longer than that. So how are you implementing this?

Evan: Great question. As experienced folks in and around data, we've always talked about moving the intelligence to the data instead of piping data all over the place. So in the Snowflake implementation, and again you can run us on-prem, you don't have to use Snowflake, but it's a favored way to use us, you literally grab our models and related software as a native app and run it in your environment. There's really no data movement required; in that case we're running in your environment. It's part of this effort by Snowflake and others in the data lake space to turn themselves into a kind of app store for AI, and we're a bit of a favored solution, we think, in that we're one of the first, if not the first, delivering indicators, indications of attack, as what they call a native app running right there in Snowflake. But again, we run on premises quite a bit, especially with the banks and other partners.

Mike: Right, so you're not moving the data; you're bringing the models, in their containerized form, into the environment. That seems to be what you should do, and I can imagine future analogies about bringing the intelligence toward the edge rather than all the data from the edge into the center, because we're just going to have too much data to keep moving it to the middle of anything. So let's talk a little more specifically about what you're doing. It's a network-level interrogation and identification of patterns, and you mentioned it's a log language model looking, if I understand this right, at sequences that could be thousands of events long or more. How does that differ from what people are doing today, and does that give you the advantage I think it might?

Evan: Yeah. I gave a little talk this weekend on this, and the most mind-blowing moment for folks was when we just walked through what a model like this does. At every step it's predicting what should come next in this several-thousand-event sequence, and for that next moment it predicts every possible alternative it has ever seen, each with a likelihood of occurring. The next event could be forward packets on port 80, or backward packets of some size, or "it's this service and somebody flagged it." It predicts all of those. So underneath, just like with a language model, there are attention matrices of, say, 18,000 by 18,000 events at every step of the way, in parallel, across thousands of these sequences. It's sort of mind-blowing what's occurring here. But the "so what" is that it can capture totally non-linear relationships and very long-duration relationships. It can be much more expressive than traditional ML, which relied in part on humans to do the feature engineering.

Mike: Traditional ML really just said it was x of this and y of that, and now you're saying here are these huge probability matrices, using transformers and, like you said, attention matrices. That's a pretty cool application of that technology.
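The practical consequence of those per-step predictions can be boiled down to a surprise score: if the model assigns a low probability to the event that actually occurred, that step is surprising, and a sequence full of surprising steps is worth flagging. A minimal sketch follows, assuming you already have the model's per-step probability for each observed event; the threshold and the toy numbers are illustrative assumptions, not DeepTempo's.

```python
# Hypothetical sketch: score a sequence by how surprising its observed
# events were to the model. step_probs[i] is assumed to be the
# probability the model assigned to the event that actually occurred
# at step i; the flagging threshold is illustrative.
import math

def sequence_surprise(step_probs: list[float]) -> float:
    """Average negative log-likelihood (log base 2) per step.
    This is log2 of the perplexity; higher means more anomalous."""
    eps = 1e-12  # guard against log(0) for events the model never expected
    nll = [-math.log2(max(p, eps)) for p in step_probs]
    return sum(nll) / len(nll)

def flag_sequences(sequences: dict[str, list[float]], threshold: float = 8.0):
    """Return entities whose observed event sequences look anomalous."""
    return {
        entity: score
        for entity, probs in sequences.items()
        if (score := sequence_surprise(probs)) > threshold
    }

# Example: one entity with ordinary-looking traffic, one whose traffic
# the model finds consistently unlikely.
scores = flag_sequences({
    "10.0.4.7": [0.42, 0.31, 0.55, 0.38],
    "10.0.9.9": [0.002, 0.0004, 0.01, 0.0007],
})
print(scores)  # only 10.0.9.9 is flagged
```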
Evan: It's more than hype. Everyone is hearing about AI now, and frankly we're not using it much in defense. The cybersecurity industry tends to be slow to change, because things are working so well, right? Sorry. But the attackers are adapting and adopting AI, so you have to have an approach that is good at picking up novel and subtly different attacks. Rules can't do that, and most traditional ML approaches are really challenged by it as well.

Mike: So when we talk about AI, and probably more specifically machine learning, not the nebulous artificial-intelligence part, we talk about effectiveness in a couple of different ways; we have different scores for it. Could you, at a high level, tell us what people could expect if they put DeepTempo on their data, in terms of how effective they can be at SOC operations and at identifying the things they need to look into?

Evan: Right. It begins with the models themselves, and we're getting F1 scores, which essentially balance false positives and false negatives out of that whole confusion matrix, in the 99 percent range. That is to say, you will get some false positives, but in that 1 percent or sub-1-percent range. Most of our clients like us to tune toward eliminating false negatives if at all possible. So that's the model, and it's much more accurate than prior approaches; you can look at the published work we benchmark against. But once the model flags something, we don't believe it should just be thrown at the humans. We add context on top of it, and that context includes attack patterns, MITRE ATT&CK patterns. So you might decide that if it looks a lot like a known MITRE ATT&CK pattern, you make it a P1, or you escalate if it hits more than three entities in your environment, whatever the rule may be; that's something that can be done in your SIEM. The context we provide is which entities are being impacted, what the MITRE ATT&CK mapping is, and the strength of that mapping. Then, if you want it, you can go all the way down to the underlying sequence, so it's not a black box in that sense; you can see what made the model concerned. Conversely, you can also say, "can you just stop alerting on that type of sequence?" We have a notion of a whitelisted sequence as well. It all eventually shows up, with context, in your SIEM, in your Splunk, let's say, and you can see what else is happening and use all of your workflow and all of the training you've put into your teams to triage from that point forward.
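As a rough illustration of that last step, adding context and honoring a whitelist before anything reaches the SIEM, here is a hypothetical sketch. The MITRE ATT&CK technique ID, field names, and hash-based whitelist are assumptions for illustration, not DeepTempo's actual schema.

```python
# Hypothetical sketch of enriching a raw model detection with context
# before forwarding it to the SIEM (e.g. Splunk). Technique IDs, field
# names, and the whitelist mechanism are illustrative assumptions.
import hashlib
import json

# Fingerprint -> reason; populated when an operator says
# "stop alerting on that type of sequence".
WHITELISTED_SEQUENCES: dict[str, str] = {}

def sequence_fingerprint(tokens: list[str]) -> str:
    return hashlib.sha256("|".join(tokens).encode()).hexdigest()

def enrich(detection: dict):
    """Attach entity and ATT&CK context; drop whitelisted sequences."""
    fp = sequence_fingerprint(detection["tokens"])
    if fp in WHITELISTED_SEQUENCES:
        return None  # operator has already triaged this pattern
    return {
        "entity": detection["entity"],            # e.g. the mail server acting oddly
        "entity_type": detection["entity_type"],
        "surprise_score": detection["score"],
        "attack_mapping": {                       # closest known ATT&CK technique
            "technique": "T1048",                 # illustrative mapping only
            "confidence": 0.87,                   # strength of that mapping
        },
        "evidence": detection["tokens"][-20:],    # tail of the underlying sequence
        "sequence_id": fp,                        # lets an analyst whitelist it later
    }

alert = enrich({"entity": "10.0.9.9", "entity_type": "mail-server",
                "score": 11.4, "tokens": ["TCP|dport=53|B6|B1|SYN"] * 40})
if alert:
    print(json.dumps(alert, indent=2))  # ship to the SIEM from here
```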
Mike: I love the explainability part of that, where you can actually go back through. A lot of neural nets developed over the years were just opaque; they would say "this is a problem," but you couldn't figure out why they thought it was a problem. So to be able to go back and say "this is exactly why we think this is a problem" is really great. There's also that reduction in what somebody has to look at every day: because you're getting those high F1 scores, you can say, instead of looking at 10,000 things, look at these three things; this is what you need to be looking at. That's a savings. And then there's scalability. I think we skipped past it a little when we were talking about Snowflake, but we're talking about a lot of data at global enterprise scale, not simply one web server somewhere generating logs. This is a solution that's going to help the biggest of global conglomerates, if you will.

Evan: Yes, absolutely, and do so in a way that adapts to their different domains. If you pop the hood on any big enterprise, it looks like a lot of enterprises put together. So you want a system that can adapt to retail banking versus wholesale, if those are different, or custodial, which may be a totally air-gapped environment that doesn't look anything like the others. It turns out they do look alike, at least in the eyes of the model, and it is a foundation model that is able to adapt very quickly to a domain it has not seen before. And I want to emphasize something you just said, too: we have a huge burnout problem in cybersecurity. Part of it, generally speaking, and this is from an old open-source guy, is that cybersecurity solutions are sold almost entirely top-down. They're sold through folks who are well-meaning and brilliant, and we like CISOs, don't get me wrong, but they're not actually living in the SOC; they're not the ones having to use the solutions. As a result you have, I think, this feeling of disempowerment if you're an operator. That's one reason we're so excited about the Snowflake relationship. Maybe you're not all-in on Snowflake for your security data lake yet; that's fine. You can still try our solution on Snowflake with a free account, and we have one version with data included, sample data from the benchmark I mentioned, from the Canadian Institute for Cybersecurity. You can grab it and try it in five minutes. So think about that: your team has tried it, they've played with it, and they're already thinking, these IPs are problematic; we pay a lot of money for a threat intelligence service that tells us about problematic IPs; maybe we should do a join, something like the sketch that follows, and look: yes, we think it's that threat actor. You can get your hands dirty in minutes or a day, as opposed to after a long process that comes top-down, and we think that's a piece of what needs to happen in cyber: getting the people who actually do the day-to-day job more empowered. Sorry to rant on that.
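That "do a join" idea is about as simple as it sounds. Here is a minimal sketch using pandas; the file name, column names, and the idea that your threat-intelligence service exports a CSV of known-bad IPs are assumptions for illustration, and in practice you might run the equivalent SQL directly inside Snowflake.

```python
# Hypothetical sketch: join flagged entities against a threat-intelligence
# feed of known-bad IPs. File name, column names, and the CSV export are
# illustrative assumptions.
import pandas as pd

# Entities the model flagged, with their anomaly scores.
detections = pd.DataFrame({
    "entity_ip": ["10.0.9.9", "203.0.113.42", "198.51.100.7"],
    "surprise_score": [11.4, 9.2, 8.6],
})

# Feed from a paid threat-intelligence service; assumed columns: ip, actor, confidence.
threat_intel = pd.read_csv("threat_intel_indicators.csv")

# Inner join: flagged entities that also appear in the intel feed.
confirmed = detections.merge(
    threat_intel, left_on="entity_ip", right_on="ip", how="inner"
)
print(confirmed[["entity_ip", "surprise_score", "actor", "confidence"]]
      .sort_values("surprise_score", ascending=False))
```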
Mike: Let me ask you this. You're already saying this is easy enough to kick the tires on and try out, but just a final question: time to try it, time to implement. If somebody wants to do it, do they just go to your website? Where would they start?

Evan: Yeah, you can Google Snowflake and DeepTempo. You'll find some press, and you'll also find our listing on Snowflake itself. You can go to DeepTempo AI as well to learn more about us, the team, and the approach, and get to the underlying information. And don't be a stranger; reach out. We're building this company on word of mouth, like the last several companies that ended up getting a lot of product-market fit and a lot of usage. How does that happen? Talks like this help a lot, frankly; it's just getting the information out there. And ask questions as you try it out: what about this? How do you map to this attack pattern? How does this fit with my rules? We have some code we work with to assess rules coverage versus our coverage. We're happy to chat with users about all of that and more.

Mike: All right, well, thanks for explaining this today. It's tough in just a few minutes to really get under the hood on what you're doing, because you're doing things at a speed and scale, with some new technology, that's just hard to describe verbally. But I think people should understand that you're really bringing AI tools to the good side of this problem, not large language models but log language models, as you're saying, an analogous technology, so that people can stay ahead of the unethical AI-wielding hackers trying to break in. They can apply some of those technologies to the good side and get ahead of that, possibly saving quite a bit of cost on what they're doing today, certainly saving effort, and focusing their attention on what really needs attention. So lots of good things there, Evan. Check it out, folks: DeepTempo AI. Thanks for being here and explaining that to us.

Evan: Thank you, Mike. This was great.

Mike: All right, take care. Once again, DeepTempo AI, go check it out. Bye.