Transcript
Hi, Mike Matchett, Small World Big Data. And today we are talking about data, of course, always talking about data at Small World Big Data. But we're going to talk about how you track its lineage. How do you know where it came from? How do you know what's happened to it along the way? And this has a surprising amount of influence and impact on the integrity of all of our business applications, on our compliance issues. And as we're going to find out, everybody who is trying to do AI today really needs to have a good grasp of where their data came from and what's been going on with it until it gets into that model and they use it. So I've got Solidatus here today. They're going to explain to us their data governance solutions that really focus on lineage. Just hang on and we'll dive right into it.

Hey Tina, welcome to our show today.

Thanks for having me, Mike.

Okay, Solidatus. Before we get into what Solidatus does, you know, this data governance, compliance, lineage thing, it's not what most people wake up in the morning and first think about. So what's your sort of background that got you into doing data lineage? How did you get excited about data lineage?

Well, actually, the thing that got me excited about data lineage, and also Solidatus as a company when they recruited me, is that for the five years before I joined, I was actually working on building machine learning and AI products and delivering them to enterprise customers like banks, insurance companies, people who are trying to use AI to help automate their workflows. And one of the most frustrating things when we were doing that was we would identify data sets or data that we would use in the model or to train the model, and all of a sudden we'd realize the quality was bad, or it had information in it that we weren't allowed to be using.
And so we didn't understand where that data was coming from, or if it was high quality, and that would always put a delay in our projects. And even once our projects went live, someone somewhere would make a change five departments away that impacted our AI model, and all of a sudden our automation rates would tank and we'd have no idea why, and our customers would call us and say, your product stopped working. It almost always ended up being a data problem. So when I heard about Solidatus and what this company was doing around tracking data flows and operationalizing it, I was like, wow, I really could have used a product like this when I was an AI practitioner.

Yeah, it sounds like, and I think you're hinting at it already, data governance and this data lineage is not simply a static snapshot, that there's going to be a lot of uses for this throughout the business operational phases of an application. But before we dive into that, let's just talk specifically about data governance a little bit. What's sort of the state of the art traditionally? And where do we need to go? Why does that need to change?

Yeah. So data governance isn't just a technology, it's also a practice as well. And I think until we get both the practice and the technology in the right spot, we're not going to have state of the art. More and more, the demand is to understand data at a granular level. So historically, data governance could have been like, I understand that this table from this database flows into this table in another database. But now you actually need to understand the fields and their transformations, and to be able to query against this, report against this, proactively receive alerts, plan new changes. And so the state of the art is something that they try to call active metadata, which means that you're constantly refreshing and updating and using the information that you've got on your data flows.
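As a rough illustration of the difference between table-to-table lineage and the field-level lineage with transformations described above, here is a minimal Python sketch. The table names, field names, and transformation strings are all hypothetical, and this is not how Solidatus itself represents lineage; it only shows that a coarse table-level view can be derived from field-level records, while the reverse is not possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEdge:
    """One field-level lineage record (hypothetical structure)."""
    source: str          # "table.field" the data comes from
    target: str          # "table.field" the data flows into
    transformation: str  # human-readable description of the transformation


# Hypothetical lineage records, invented for illustration only.
EDGES = [
    FieldEdge("trades.notional", "positions.exposure", "SUM grouped by account"),
    FieldEdge("trades.account_id", "positions.account_id", "copy"),
    FieldEdge("positions.exposure", "margin_report.margin", "exposure * margin_rate"),
    FieldEdge("rates.margin_rate", "margin_report.margin", "exposure * margin_rate"),
]


def table_of(field: str) -> str:
    """Return the table part of a 'table.field' name."""
    return field.split(".", 1)[0]


def coarse_lineage(edges):
    """Collapse field-level edges into table-to-table pairs."""
    return sorted({(table_of(e.source), table_of(e.target)) for e in edges})


def upstream_fields(edges, target):
    """List the direct upstream fields of a target field, with transformations."""
    return [(e.source, e.transformation) for e in edges if e.target == target]


if __name__ == "__main__":
    print(coarse_lineage(EDGES))
    print(upstream_fields(EDGES, "margin_report.margin"))
```

The table-level view only says that `positions` and `rates` feed `margin_report`; the field-level view also says which fields are involved and how they are combined.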
So data lineage and data governance have become dynamic management practices, not simply a static report for compliance, basically.

Exactly, exactly right, Mike.

All right, so it also sounds like some of the interesting parts of what's going on are up-leveling what might be kind of a schema-level view or a metadata-level view of the objects in a record, to really tying it to what's going on with the business applications. You mentioned, like, if the data changes, the AI model might not be accurate anymore. So you've got these concerns that are being escalated higher up. And I mean, you can tell me what you want about that. But it sounds also like, if we get into a hybrid world with data coming from lots of different data sources, the problem just compounds.

Exactly right. And it's also, how do you make the work that you're doing in IT land to understand how your data flows, developing applications, APIs, data pipelines, how do you make understanding that important to your business stakeholders? And so the way Solidatus views data governance really elevates the importance of understanding this information. So it's no longer like a tech IT project to understand data lineage from a technical perspective. Understanding how data flows, and the quality or the ownership, or whether it's fit for purpose, is necessary for key business stakeholders to make their business decisions. For example, how do you know in a bank that the margins that you are calculating are accurate and high quality? That's the end result of a calculation and data from probably multiple disparate sources. So if you're saying, I need to prove this to make sure that my margin calculations are correct, you rely on the data lineage provided by the IT team. And like you said, Mike, it flows through many, many different applications.
If you make a change in a single application, it's probably going to impact the end result. So data lineage and data governance is not only understanding it for your own sake in IT, but also making sure you hit the importance of this for your business stakeholders.

Now, I know a great deal of the value of Solidatus you appreciate just by looking at it, because of the visualizations that are in the product. And we're just talking here, painting the pictures with words. But maybe you could just tell us a little bit about how Solidatus approaches lineage for the user who's responsible for it, and how that differs from some of the other tools they might have used in the past.

Absolutely. So Solidatus has always approached data lineage in a fine-grained and end-to-end format. So we've never thought it was good enough to just show table to table, what people call coarse-grained lineage. We've always understood that it's important to look at the field level within the table, all the transformations, and actually to be able to create and visualize and use that all the way from the source of data to its target of use, which can be across like hundreds of systems, right? And most other practitioners of data lineage or data catalogs aren't really thinking about it in that way. Data lineage is kind of an afterthought, or an additional piece of information that's ancillary, or an add-on. So we've always thought it's quite important to understand this and to also be able to layer over business context. So what data policies are applied to certain technical systems? Do any of these have PII or sensitive data? What kind of regulations are actually impacting certain systems and data flows? And so we've always created this product that lets you view hundreds of systems linked together with the context of policy, quality, sensitivity, etc., and then allowed you to make this not static.
It's constantly updated, and you can use that for your business decisions, for SDLC, even to understand impact analysis. You can query against it. You can report against it. So we've always got...

So you've got this model that's not just the technical flow, but sort of the business perspectives and policies of the data as it's moving through the system, overlaid.

Exactly.

And then it's not simply a static snapshot, which I love because I've done modeling exercises in the IT industry all my life. And once you have models, you're not just looking at what's there, even if they're constantly updating, doing that monitoring part. But you're now able to do some more proactive work. What are some of the proactive things people can now do with their lineage or data mapping? What do they do with it?

Yeah. So there's two ways you can proactively use Solidatus to enable data governance or even business operations. The first is, because you understand how data is used and how it flows, you have the ability to conduct impact analysis. So some of our customers are integrating Solidatus into their SDLC practice. So anytime a change is made to a system, you can find out who's going to be impacted downstream, what system owners are going to be impacted. It's just an automated query that you can run. But more powerfully, Solidatus allows you to model potential future states. So say you're thinking about migrating a system or deprecating a system. You can fork off a version of your data estate and make these changes and then see what's going to happen. And so it's almost like Git, where you can actually fork and merge back what's happening. And it allows you to proactively plan any of the changes that you're making.

I mean, that's something people just don't think about when they think of compliance, first off. Right.
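To make the two proactive uses described above a little more concrete, here is a minimal Python sketch of downstream impact analysis and of a what-if change on a forked copy of a lineage graph. The system names and the graph are hypothetical, and none of this is the Solidatus API; it is only a breadth-first traversal over a small dictionary, under the assumption that lineage can be modeled as a directed graph from each system to its consumers.

```python
import copy
from collections import deque

# Hypothetical lineage graph: each system maps to the systems consuming its data.
LINEAGE = {
    "crm": ["warehouse"],
    "trading": ["warehouse", "risk_engine"],
    "warehouse": ["margin_report", "ml_features"],
    "risk_engine": ["margin_report"],
    "ml_features": ["ai_model"],
    "margin_report": [],
    "ai_model": [],
}


def downstream(graph, start):
    """Impact analysis: every system reachable downstream of `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for consumer in graph.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


def fork_without(graph, system):
    """Return a forked copy of the graph with `system` removed.

    The original graph is left untouched, loosely like working on a branch.
    """
    forked = copy.deepcopy(graph)
    forked.pop(system, None)
    for consumers in forked.values():
        if system in consumers:
            consumers.remove(system)
    return forked


if __name__ == "__main__":
    before = downstream(LINEAGE, "trading")
    what_if = fork_without(LINEAGE, "warehouse")  # model deprecating a system
    after = downstream(what_if, "trading")
    print("Impacted by a change to trading:", sorted(before))
    print("No longer fed by trading if warehouse goes:", sorted(before - after))
```

Comparing the traversal before and after the change on the fork shows which downstream consumers would lose their feed, without touching the current-state graph.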
No, this is a forward-looking tool that I can actually do optimizations with and make a better equation out of the whole thing with, rather than simply, here's a snapshot of what we did last year, right? Which is pretty cool. So I just mentioned compliance. I know there's a couple of use cases here. Just quickly, what are some of the main things that people do with this? I mean, compliance, obviously. But beyond that?

Yeah. Compliance, obviously. BCBS 239 is requiring you to be able to document this kind of information. Lots of regulation around that. Related compliance is in the US: when you're rolling out machine learning models or AI models at a financial institution, you have to conduct model risk management, which means you must understand the source of the data flowing into the model and any transformations. So Solidatus helps you with that. Also the EU compliance, for you to understand what data is flowing into a machine learning model and how that result is being used. Solidatus is great for making sure you comply with not only governance compliance, but AI compliance. I also touched on AI. So I mentioned earlier, as an AI practitioner, I really would have loved to understand basic things about the data sets I was using, such as, where are the origins? How is it calculated? What is the quality? Can I even use this? Like, is this data fit for purpose? And Solidatus provides you that layer to answer those questions, and can also provide the security to make sure that once my project is live, upstream changes are not going to break what I've rolled out.

Awesome, awesome. And I understand, even short of AI, which everyone's trying to do, most people are on some sort of cloud transformation journey and their data starts getting more and more complex. I can imagine this has some advantages for people who are moving data everywhere these days.

You're exactly right.
Like, one of the first steps of cloud transformation or cloud migration is just understanding your current state. What are you using? Where is it going? How does this actually impact my end stakeholder who's maybe using the system? Right. And so Solidatus allows you to go in and scan your current state as is. And then it can also help you plan your future state and make sure that you're not breaking anything downstream as you're making this migration. I think one of the statistics I read is like 80% of cloud transformation issues are because of outages and data issues when you're actually doing the migration. So having something that understands how your data flows, and monitoring it, can really prevent a lot of those easy, preventable mistakes.

You know, AI may be the sizzling thing right now, but we know that most of the money that enterprises have tied up is in their cloud migration projects right now that are sort of stalled or not being effectively done. So that's great. So you can say, not only can you use this for compliance, but you can use it to get your cloud migration going better. And of course, check some boxes on that AI, I think. So when they come knocking on your door saying, why is this hallucinating and making stuff up, you've got some things to go back to. How hard is this to get going? I mean, is this something that takes six months to a year? Is it like an ERP implementation, or is this better than that?

Well, I think customers are used to their data governance programs being very manual, but Solidatus recognizes that that's not feasible if you're actually trying to do this at scale. So we've seen first implementations in as little as three months. The reason is because we have connectors that automatically harvest lineage and metadata from a whole host of systems and technologies.
So you're not manually documenting any of this; you're automatically ingesting it into our platform. And then once it's there, we also have a whole host of what we call auto mappers, which will use the underlying metadata and suggest the data flows for you. So that means that an exercise that's manual, where you're interviewing people, you're maybe inspecting these technologies, is much faster, and it can be federated, because you can just ask your department to provide access to the metadata in their systems, not the underlying data at all, which is easier to get access to, and then automatically scan it. And then the centralized governance person has this picture, which is great. So it makes the logistics of gathering this information a lot easier and more automated.

All right. And if someone wants to find out more information, maybe kick the tires a little bit, read some background material on what Solidatus is doing, figure out if it's a good fit for them or if they should try it out, what would you have them do?

I would encourage them to come hit our website. Not only is this an excellent data lineage tool on its own, we've been selected as the data lineage partner for Microsoft Purview and Fabric. So if you come to our website, you can read a little bit more about that partnership and also see some videos and find out more about our software and some use cases that our customers have.

Yes. I should have mentioned you have a lot of partnerships and technologies going on, and a big one with Microsoft happening right now, in fact. So it's very exciting. Yeah. So good, good things going on there. Thank you, Tina, for being here today. I look forward to having you back. Maybe next time we could do a little bit of a demo or something, because it really is supremely visual when we look at this thing, and people should take a look at it. Okay. So check it out. Thank you guys very much.

Thanks, Mike.