Transcript
Mike Matchett: Hi, Mike Matchett with Small World Big Data, and I'm here talking about privacy, compliance and things like that. It's really difficult these days, particularly in larger and larger environments with data stored in more and more places in our hybrid world, our SaaS-ified world, with multiple data centers, to really ensure that we can be compliant when someone asks for their information to be deleted, for example. It's getting really tough. We've got Tarique here today, who's going to help explain what they're doing with Chorology and their new product and solution to help people get on top of that. So just hang on for one. Hey, welcome here today, Tarique. How's it going?
Tarique Mustafa: Good, good.
Mike Matchett: Tell me a little bit about how you first got involved in this solution space. I mean, it seems like kind of a corner of privacy and what you would call e-discovery and compliance. What brought you into this world?
Tarique Mustafa: Yeah. Actually, this is a very interesting story. My main company, another company called GhangorCloud, has a very important product in the cybersecurity space: a fourth-generation cybersecurity solution for data leak and exfiltration attack prevention. The product is doing very well. My SVP of North America came to me about three years ago, I think it was 2019, the beginning of 2019, and he was almost in tears. "Oh, what happened?" "Well, I've talked with more than two dozen CSOs, and they are all in tears." "Why are they in tears?" "Well, because of the CCPA," which is the California Consumer Privacy Act, which was, I believe, enacted, enforced in California in 2018. So they spent their money, their budget, buying products and solutions to enforce CCPA. And lo and behold, when they try to turn the engine on, it is not starting. So what happened when they opened the hood? There is no engine in there, literally.
And these are the products coming from the incumbents, who charge a large sum of money for their product. So they went to the vendor and asked, "Okay, why isn't there any engine in there?" And the vendor said, "Well, you need to go and buy a third-party DLP solution or data discovery solution and integrate it with this CCPA solution. And then you have a compliance solution." "Well, you didn't tell me that when you sold me the product." "Well, read the fine print in the contract." That was literally the case at that time. So here we have CISOs and CIOs from Fortune 500 companies. They spend their budget and their time, under a severe time crunch. And at the 11th hour, they find out that the solution they procured at a very heavy cost requires a very crucial piece of technology called a data discovery engine, without which it will not work. So then my VP actually forced me into taking a look at this problem to see if we could do something about it. To me, that sounded very fortuitous, because I did not want to distract ourselves from our main flagship product line called ISE, which is Information Security Enforcer, which is doing very well. We are pretty much the gold standard in that space. So I did not want to take the risk of diverting any of my resources to another product. But then I went on a business trip, and it was constantly in the back of my mind. When I came back from my daily meetings with customers in different countries, I would come back to my hotel room and start thinking about this problem. And then I did some research while on that business trip. And then I decided to actually build this product, because it sounded like a big opportunity.
Actually, for anybody who built a really good technology and a solution for data discovery, that was the missing piece in a very crucial, government-mandated area of compliance. Right. So that motivated me to take a look at that one. And then I started building the product, architecting the product very quickly. I also figured out why those big companies, many of them unicorns now, did not go ahead and build the crucial piece, which is the data discovery component. Because it is a very tough problem. If you want to actually build a good data discovery engine, which solves the problem even partially holistically, which is an oxymoron statement, but that means something, okay, you need to have very solid technology and R&D behind that. Otherwise it's not going to work. So based upon my previous research and the technology that I've been working on, using deep AI, a classical AI based model of this problem, I decided to build this solution. And lo and behold, we just launched the product a few weeks ago, and the company is coming out of stealth mode this week.
Mike Matchett: Oh, it's great. Tell me a little bit about, you know, you talked about the need for data discovery. What are some of the challenges people might face in data discovery? I hinted at some things in terms of scope and complexity in the opening, but from your perspective, what are some of the biggest challenges somebody looks at when they try to really find the data?
Tarique Mustafa: Sure. If you do a little bit of research in this particular area, you will find that data discovery products have actually existed in some shape or form for quite some time, but with very limited scope, very narrow applicability. What does that mean? Those solutions are basically not capable of scaling beyond discovery of very, very elemental, okay, canonical forms of data.
So for example, if you have a keyword, yeah, it may find that keyword in a large corpus of content. Or if you have one particular regex, such as a credit card number, it may discover that in a huge corpus. Anything beyond that, those technologies were not capable of handling. Whereas compliance basically combines structured, unstructured, semi-structured, even unknown data and modalities together. And that's the requirement of CCPA, for example, or GDPR, every single one of these mandates. So that makes the problem very, very difficult. So machine learning would not help you at all. Some people tried to build a machine learning solution. That did not help at all, because machine learning is not the right paradigm for solving this class of problems. So we built our solution upon our technology, which is based upon classical AI, which basically means the knowledge representation, encoding and inferencing mechanisms of classical AI. And it turns out that this is ideally suited for that style of technology. So we built our product and it's working very well.
Mike Matchett: Right, right. Because you don't just want to pinpoint something when you're trying to find the data; you want everything that's contextually relevant and things that are semantically relevant. Exactly. And then I think you were talking about there being structured data and unstructured data, and some other forms of data might all relate to the same compliance object, if we call it that for the moment. Right? Somebody...
Tarique Mustafa: Absolutely, absolutely. For example, look at these things: last name, first name, a very basic example. Okay. PII, personally identifiable information. Right. So this is a very complex field in and of itself. To what extent do you combine different elements, like first name, last name, random first name, random last name, okay, and phone number?
Or you add additional fields in there, such as street address or Social Security number. You keep adding, and that set of data is PII, personally identifiable information, which is a subcomponent of all of the compliance mandates: PCI, the banking industry, you name it. So the thing is that with the databases and those huge data lakes, big data was that old paradigm where companies started profiling their end users and their behavior just to sell them, cross-sell them, upsell them new products or relevant products that they had looked at. Basically behavioral profiling. They were doing that using all of this PII-based information so that they could correlate behavior to a particular individual. Right. So the thing is that those companies have now spent billions and billions of dollars building these fantastic sales tools, which has resulted in a massive revenue increase for those big enterprises and midsize enterprises, right. Now comes a bunch of mandates, formed mostly by the bureaucrats, basically. And they say, no, no, you cannot use customer information. Okay. This is already causing a huge problem for those big enterprises. They spent billions of dollars. Now, when an individual consumer says, for example, delete my information, and if every consumer starts asking, delete my information, those billions and billions of dollars which were spent on profiling those customers are gone. They are useless. The enterprise cannot use that for their sales and marketing anymore, right? Revenue loss and all of those things. So this is a very, very complex problem. It's not just finding, discovering independent elements of information. There comes in, like you mentioned, the question of context. Right. So can you break the context? Will that address the problem? Okay. A lot of questions, very, very difficult questions, arise in there.
So when we talk about discovery, we take into account all of these different issues. Okay. And then, the name of our company is Chorology.ai. And if you look it up in the dictionary, chorology actually means how the sprawl of a phenomenon happens. Like the human population: it spreads into different parts of the world, geographically speaking. Or a disease spreads to different parts of the world. Or data sprawl happens in different parts of the enterprise's world, right? Both on prem and in the cloud. Both sanctioned and unsanctioned sprawl of data. So there are multiple dimensions to this data discovery process. So we have built a solution which addresses all of these multiple issues simultaneously, and in a very methodical and profound way.
Mike Matchett: All right. And so we're talking about a solution. We're talking about something you're calling CAPE. Maybe you could just describe to us what that embodies. What have you brought together for a customer, and how easy is it to apply to a problem?
Tarique Mustafa: Exactly. Yeah. So this is what CAPE does, basically. And it's point and click. It makes usability very, very easy because, you know, compliance folks in the enterprise are not IT guys. They are business people, compliance people, right, the regulatory compliance people. So the solution needed to provide a very robust yet very easy-to-use interface, a human-machine interface, right, talking the language of the domain in which discovery needs to happen. And it needs to provide that facility to the end user at that particular customer's company or enterprise, okay, in the lingo that they are used to speaking. So, for example, in this particular case, the large language model, okay, the LLM, which is the basis for ChatGPT, okay, that will not work.
First of all, machine learning is not going to help. It's not going to work at all, actually. And even the language model: if you are going to generalize the language into an LLM, that will be counterproductive. It has to be based upon basically a domain-specific language model, because somebody in pharmaceuticals, which is a highly compliance-mandated, regulated industry, right, they speak their own language. They have their own lexicon in which they talk; their documentation, everything is based upon that lexicon of medical sciences or pharmaceuticals. Right? They don't care about the financial industry lexicon. Okay. And if you try to mix everything the way a large language model tries to do, it only creates havoc. So on the technologies, we just got very important patents in this space. We are very proud of that. It was a record time of less than three months from application to the issue of the patent, and the Patent Office has written some very, very strong comments there, which I don't know if I should mention here or not, but I will mention it; you can edit it out if you don't like it. The USPTO examiner, for example, cites our competing products and other technologies, and says, and I quote, that none of those 67 technologies or products, either individually or combined together, can address the problem the way our patent, our technology does, unquote. So, yeah, we're excited about this whole thing. And the company is coming out of stealth mode. Tomorrow, I believe, the press release is going to go out. And we already have two Fortune 500 customers where this is deployed. And we have a very healthy pipeline of customers who are waiting for us to come out of stealth mode so that deployments can happen.
Mike Matchett: All right.
So tell me a little bit about, you know, if I'm using CAPE and I'm looking at my data, and it's got particular components that are spread across everywhere, can I interact with those easily? Does it give me some recommendations if, say, someone says delete this particular person from our data estate? How actionable is this?
Tarique Mustafa: Very, very actionable. And it's interesting that you use the words action and actionable. Okay. We have invented a new language model for this technology, which is domain specific. It has its own semantic set and also its own lexicon of meaning for different verticals, different domains of discourse, as well as its own action language. This is hardcore AI stuff. And so it makes taking the action, or remediation, easy. For example, it will tell you that such and such repository where you have your data contains such and such piece of regulated data, okay, regulated by CCPA, GDPR, or whatever, and in such a large number or small number, and that the risk factor associated with that is such and such: high, medium, low, very high, catastrophic, whatever. Okay. And you need to take some action right away. What are the recommended actions? It also recommends actions. And the action language is based upon whichever compliance you want to enforce on your data. So this is a huge problem, by the way. Take GDPR, CCPA or any consumer privacy act. They actually empower the end user to ask a few things from the enterprise. Tell me what you know about me, number one. Don't use my data, don't sell my data, delete my data, and so on and so forth. The right to be forgotten altogether. Right.
So the thing is that if you randomly go about serving those DSR requests, like I mentioned earlier, many of the business applications start failing because of the lack of that data. All of a sudden that data is gone, right? So the analytics and everything starts failing. So how do you do this in a methodical way without causing that catastrophe? So our system intelligently makes an assessment of how different applications are built and implemented, what the correlation is between different data items and applications, which application uses which data item from which repository, and what is the safest possible way of serving the customer's DSR request, okay, where the customer says delete my data. Okay. So did that answer your question? Did that help you understand a little better?
Mike Matchett: Yeah, because there's definitely a lot to do there. I mean, if I'm a CISO and I have gotten some of these requests in and I'm looking at my huge data estate, I have to know where the data is. I have to have it classified. I have to have it mapped together to be able to take action. And I also have to report on it to my board. And I've got stakeholders that need reports on this. So it sounds like you brought that together in this CAPE solution, starting with some deep AI, as you're calling it, which sounds really intriguing in how you put that together. The idea of complex data objects, not complex to understand, but complex. And when I say data object, I mean a person or a piece of PII.
Tarique Mustafa: Interrelated. Many, many interrelated data items. Exactly.
Mike Matchett: And you help map that.
Tarique Mustafa: Exactly, which influence each other semantically, okay, as well as operationally.
Mike Matchett: And then I can take actions on that and report on it.
So it's intriguing what you put together, because I have seen other pieces of this puzzle in various smaller tool sets or various, I won't say narrow, but focused solutions, and someone trying to really run security operations would need a handful of tools to even come close to being able to do something like this. And even then, they're going to miss the idea that if I've got an email with attachments, I can't just delete the email. I have to go get those attachments, right? They're not together. So this is very interesting. If someone wanted to find a little bit more information about this, maybe look at it, get in touch with you guys, would you point them to your website, or is there something more specific?
Tarique Mustafa: Yes, on our website you can actually request a demo or request more information. There are many white papers, technology white papers, as well as use cases posted there. So anybody can just go to the website and download them from there, or request a demo.
Mike Matchett: All right. This is interesting. And I know that at some point you're going to give me a deep dive demo on this, and we'll report back maybe on that too.
Tarique Mustafa: Yeah, absolutely. Absolutely.
Mike Matchett: As you get going, I hope it goes well for you in 2025. There's certainly a lot of data out here, and people are trying to get a handle on it. I mean, I know a handful of CISOs myself, however you want to pronounce that, and they are all struggling to get their hands around their data estate. So it looks like a pretty good future for what you're doing.
Tarique Mustafa: Right, right.
This is a problem that is going to persist for quite some time because, when those enterprises were building their solutions and their analytics applications and whatnot, they were not thinking about compliance. There was no compliance at that time. Right. Compliance is an afterthought, and it is a very tough problem. So there has been a pushback from the enterprise, for good reasons. If you look at it from their point of view, it's a very genuine pushback. So the thing is that it's going to settle somewhere where the enterprise does not lose all their investment, without compromising the privacy of the individual or the consumer. And therein lies a big challenge. And our technology is geared towards taking on those challenges very effectively.
Mike Matchett: Well, thank you for being here today. Thank you for presenting this to us.
Tarique Mustafa: Thank you very much, Mike. I look forward to having the opportunity of showing you a demo of the product.
Mike Matchett: Right. And so check it out. If you've got data, and we all know you do, if you've got it in a number of places, in a number of formats, and you're trying to get a security handle on it for compliance reasons, or for discovery, basically just so you know what you have, check it out. Take care, folks.
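
The point made in the conversation about deletion requests (DSRs), that removing data blindly can break the applications that depend on it, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the repository names, the application names, the `DataItem` type and the `plan_deletion` helper are hypothetical and are not taken from the product discussed in the interview. It only shows the general idea of checking which applications read a data item before deleting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """One piece of personal data, identified by where it lives (hypothetical)."""
    repository: str  # e.g. a database or data lake name
    field: str       # the field that holds the personal data


# Hypothetical inventory: which application reads which data items.
APP_DEPENDENCIES = {
    "marketing_analytics": {
        DataItem("crm_db", "email"),
        DataItem("data_lake", "purchase_history"),
    },
    "billing": {
        DataItem("crm_db", "email"),
        DataItem("billing_db", "card_token"),
    },
}


def plan_deletion(items_to_delete, dependencies):
    """For each item in a deletion request, list the applications that read it."""
    plan = {}
    for item in items_to_delete:
        affected = sorted(app for app, used in dependencies.items() if item in used)
        plan[item] = affected
    return plan


# A deletion request touching three data items for one consumer.
request = [
    DataItem("crm_db", "email"),
    DataItem("data_lake", "purchase_history"),
    DataItem("support_tickets", "phone"),
]

plan = plan_deletion(request, APP_DEPENDENCIES)
for item, apps in plan.items():
    status = "review first" if apps else "safe to delete"
    print(f"{item.repository}.{item.field}: {status} (used by: {', '.join(apps) or 'none'})")
```

A real system would have to discover the inventory rather than hard-code it, and would also need to follow related items such as email attachments; the sketch assumes that mapping already exists.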