Transcript
Mike Matchett: Hi, I'm Mike Matchett with Small World Big Data, and we are here talking about one of my favorite things, which of course is AI and machine learning. Today we're going to talk about large language models and how you might actually get your hands around them and make them useful in your organization. It's a big challenge, so stay tuned. We've got Data Kinetic coming up here in a second. So welcome back, Nick. You were on our show not that long ago, talking to us from another perspective in the data science realm, and now you're doing this new thing at Data Kinetic. Tell us a little bit about how you evolved into looking at large language models and AI, and why you decided that's the problem you need to sink your teeth into today. Nick King: Yeah, thanks, Mike. It's awesome to be here. You know, I've looked at enterprise applications almost my entire career, tried to solve this in a number of different places, and really felt it wasn't a solved problem. So I set down the path of researching: is there a way to deliver repeatable, applied outcomes for industry? And it came from speaking to a lot of CIOs. A lot of senior folks there say, "Nick, I don't mind what the model is, just help me get to the outcome." And so that sent me down a research path: is it possible to make repeatable, blueprinted applications that deliver specific business outcomes, versus worrying about all the different underlying technology underneath? That's what started my journey. So here we are. Mike Matchett: Almost a value API, like you want to provide that API into the intelligence and let people not have to worry about what's behind it quite as much, right? They can just say what it is they're trying to do, what they're asking for, and then how you deliver it becomes your problem. Nick King: One hundred percent. You know, when I speak to leaders, they know their problems really well.
They're like, "This supply chain problem costs me $7 million a year," or, "If we can just resolve this level of fraud, that affects my bottom line directly." What I love about this approach is we get very specific problem statements, and that is sort of the inverse of how you usually start a data science problem. In a data science problem, you spend all this time collecting data and prepping the models, and then you go to the stakeholder and ask, does this solve your problem? So we really flipped the entire approach on its head: start with the value exchange, and then work out how to build those repeatable models underneath it. Mike Matchett: All right. And just to be clear about where Data Kinetic is at, it's early days. You're at the stage of finding those repeatable models to sink your teeth into, but the intent is eventually to make a platform, or a marketplace, of these models, and deliver them as a kind of pick-your-poison solution, right? Nick King: Yeah. There are a couple of things people always ask for. They want to be platform agnostic, because they've got significant platform investments. They want the models to live inside their data, and they want clear outcomes. So a lot of where we're focusing now is looking for some of the toughest problems in oil and gas, insurance, and health care, really focusing on problems that weren't solvable before. And we did that for a couple of reasons. One, we felt that if we could get to the point of solving these challenging problems, we could validate some of our thinking in the data science. But two, those problems have really impactful value for these organizations.
And genuinely, that allows us to push the limits of today's modern technology, which, to be fair, has been such a gift the last six to nine months, where almost every week there's some advancement or some new insight we've been able to gain from everything that's going on. Mike Matchett: Yeah. What I see is everyone saying, hey, we need to have something AI-ish powering what we're doing, and we're going to talk a little bit later about what they really should be thinking about. But when they start to pick up the pieces and open the hood, there's a lot there, and if they're trying to build it themselves, the DIY approach just isn't going to cut it. What does it take to make a repeatable model? What do you envision? What are you doing on your side of the fence to build something repeatable? Nick King: Yeah, so there are really three layers. The first is what I call a blueprint. The blueprint defines how you think about the application: what are the inputs, and what's the expected output? Let's say that might be fraud detection, or supply chain fraud detection, so you know you're looking for bad actors. Underneath that, there's usually a series of what we call functions, or declaratives: you're looking for shifts in regime, you're looking for particular anomalies in those models, and there are a number of ways we can look at that. So that's layer one and layer two. And then really the gift we've had over the last year is the advent of transformers, and just the ability to consume large amounts of data and turn it into structured outcomes. Being able to extract data from previously complicated data sources and then use it to prime these applications allows us to gather enough data to train the model, but also provide enough insight to prove the value.
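To make the "blueprint" idea concrete, here is a minimal sketch of what a blueprint with declarative checks might look like. Everything below is a hypothetical illustration, not Data Kinetic's actual product: the names `Blueprint`, `zscore_anomalies`, and `regime_shift` are invented, and the checks are deliberately simple stand-ins for the kinds of anomaly and regime-shift functions Nick mentions.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    """Declarative check: flag points far from the mean in standard deviations."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if sigma and abs(v - mu) / sigma > threshold]

def regime_shift(values, window=3, jump=50.0):
    """Declarative check: flag points that jump away from the recent rolling mean."""
    flagged = []
    for i in range(window, len(values)):
        if abs(values[i] - mean(values[i - window:i])) > jump:
            flagged.append(i)
    return flagged

@dataclass
class Blueprint:
    """Named application with an expected output: the union of flagged indices."""
    name: str
    checks: list = field(default_factory=list)

    def run(self, values):
        return sorted({i for check in self.checks for i in check(values)})

# Layer one: the blueprint. Layer two: the declarative functions it composes.
invoice_fraud = Blueprint("supply-chain-fraud", checks=[zscore_anomalies, regime_shift])
amounts = [100, 102, 99, 101, 100, 400, 101]  # one suspicious invoice amount
print(invoice_fraud.run(amounts))  # → [5, 6]
```

The point of the sketch is the shape, not the math: the blueprint names the outcome, and the interchangeable checks underneath it are what make the application repeatable across domains.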
And so it really focuses on shortcutting a lot of the complexity that IT teams and executive teams face in trying to understand the value relationship between these significant technology investments and the desire to change the operating model of the business. Mike Matchett: Right. We talked a little bit about some of the things people are trying to do with this. When people are thinking of deploying it, and I was being a bit facetious earlier that we all want chatbots, what do you actually see people trying to do? They really should be using something repeatable, focusing on their business and not just on the data science part. Nick King: Yeah. So let's take a really common use case like invoice extraction. Imagine you have a supply chain, and you know, most supply chains actually live in people's inboxes and then get keyed into some ERP system. Maybe it's more advanced and there's some digital relationship, but if I look across retail, healthcare, and insurance, there's still a significant number of these transactions. So you have this delay in the loop around how long it takes to process them, and it's very easy to calculate the exposure. If it takes you six weeks to realize that you shouldn't have paid someone, and you're paying on 30-day terms, it's very hard to go ask them for that money back. So when you can deliver high-functioning extraction and get to these outcomes, you're able to really speed up the process. Rather than the investigation team getting hold of it six weeks later, you're able to flag, hey, this is now looking like it could be an issue, or even just provide line of sight for that same supply chain. And the beauty of these models is that what works for a supply chain could also work for health and safety.
It could work for looking at things like resumes. The ability to have a multi-modal extraction technique and then apply it to outcomes is surprisingly portable and immensely powerful. And that's something we've seen as we talk to more customers: it doesn't take us long to identify well-known business problems with very specific outcomes. Mike Matchett: Which brings me to an interesting point. We've got a lot of IT people out there, a lot of business developer people, looking at models and stuck in the weeds, which is obviously part of what we're trying to get them out of. We want them to look up a layer. You're talking about a skill that needs to be developed around how to apply models from a catalog to the problems they have, and then tailoring at that level, right? Really uplifting their perspective to say: I shouldn't be building these lower-level constructs and getting down into the weeds with R&D; I should be applying the technology to my business. Is that kind of where we're headed? Nick King: Yeah, I think so. Just look at the progress we've seen with large language models in the last six months, right? GPT-4 comes out and there are no open source models; then Facebook releases theirs, and now we've got an entire leaderboard of models which may not be as good, but are heading in the right direction. So the pace of those models is moving very fast. But I also think most leadership and business problems don't start with the words "large language model." They start with, hey, here's this specific outcome. And the next questions we usually get asked are: how do you stop it from hallucinating? How do you make sure my data doesn't leave the organization? How do we scale it? And so that's where a lot of our early focus has been.
That focus is really building out a number of best practices for how to get these applications into production. For IT organizations, focusing on how your embeddings are designed and how you can have a scalable embedding engine, and then, on the agent side of the large language model, how you handle concept drift and other techniques that ensure the model behaves the way it's expected to, are actually more scalable and impactful outcomes than building your own custom model today. I'm not saying people shouldn't do that; it's a lot of fun, it burns a lot of compute, machines go brr, we all love that stuff. But being able to take an outcome and then think about how you get it into production is really where we've been focusing, and that has enabled a couple of things. One is getting these models into production to solve very specific use cases. And let's be honest, there's not perfect trust out there. There are still a lot of questions as to whether these models can deliver on the promise, and how you solve that. So in some ways we look at how you structure your investments in these platforms so that you can evolve over time. What we find is that, working with the data science team, we build confidence up and they start to identify use cases. From our side, we effectively deliver the model into the customer's environment. We don't run their platforms; we're platform agnostic. We never see their data, and we never train on our side of the fence. That also allows our customers and the people we work with to have a competitive moat they can start to defend. And I think that's what a lot of executives are thinking about: how do I maintain my competitive moat if I'm sending my data somewhere else?
And I think there are a number of ways of solving that. Not all of them require you to build your own custom large language model, but they definitely require you to trust the system and understand how to get it into production. Mike Matchett: So it's kind of a new way of engaging. You're going to create these repeatable models, but the customer is going to get them, train them, basically put their own IP into them, and take control of them. You're going to help them remotely guardrail those things, manage them, and move them forward, but you're not going to get any insight into the IP that's in those models, right? That's the key to how that system works. You're kind of like a model system manager of sorts. Nick King: Yeah. Look, these architectures are evolving very quickly. If you just look at last week, with all the announcements coming from Databricks and Snowflake, they are building primitives into their platforms very fast, which I think is awesome. So the ability for IT organizations and others to take advantage of those primitives, to build up those platforms but also understand how to get to value delivery, is critical. And there's real friction when an organization tries to buy an external platform and send its data off site: PII, SOC 2, HIPAA, all of these things. The really gnarly problems that have value are usually the ones with the highest level of confidentiality. So it was a very upfront request from a number of customers at the very beginning: we want to make sure it lives in our data center and our VPC, we want to make sure we can ensure security, but we also want to build off your building blocks, right?
And so in some ways we provide some of these building blocks that allow these organizations to move much faster. The conversation in the business also shifts. Rather than debating the right large language model and how to architect it, the conversation is now: okay, we've done invoice extraction and we've managed to identify fraud faster; perhaps we should look at how we do this with our B2B suppliers, and could we automate some of those, or could we use safety logs and attach that? A lot of our vision is that these systems continue to evolve, and rather than being very individualized applications, you can start to create a very large corpus of embeddings that provides a lot of insight for your organization. I think we're just starting to see the power of agents now and how you can build up these systems, and the next 18 months are really going to change the way we think about all the information these organizations process and work with. Mike Matchett: So that's going to change the way people think about this, right? Right now a lot of people just run into a hard wall when they say, I've got to implement an AI model by next Tuesday. They start at the wrong level. So I like where this is going. You talked a little bit about some of the shifts a company might make in their perspective if they start to think about this. Specifically for the IT people who are concerned with data and data protection, or with processing overheads and loads: if they're not going to be building the models themselves, what should they start preparing for today? Where should they start to dig in?
Nick King: Yeah, I think understanding your document extraction and embedding strategy. A document could be JSON, it could be a PDF, but understanding how to craft those embeddings is actually very important, because ultimately what a large language model application does is query the vector store, bring back a whole bunch of text, and feed it to the agent; the agent tries to make a decision and gives you an answer based on the context and the problem. So if those embeddings aren't well organized, or don't include a reference back to the source, if they're just a garbled mess, you can start to get concept drift depending on how the model is being tuned, or it could spit out something unrelated. That's where thinking about your embedding strategy, and how you scale and maintain it, matters, but also where you want to ensure compliance. Sometimes you'll strip information out at the extraction step; sometimes you'll remove PII data at the agent generation step, between search and agent. These are some of the things organizations have to think about. And then also, I think, getting ahead of what looks to be some form of coming regulation. We're starting to talk about how you create a grading level for the different types of applications you're building, so that as regulation does start to emerge, you've already applied a framework that identifies low-impact versus high-impact types of applications. That methodology really helps as you're starting to prioritize, right? Maybe the legal team is not so comfortable with you going and indexing all your customer data, so you choose the safer application. Once you've got that trust, you can move to a higher degree: maybe a level-five application requires CEO sign-off, while a level-two application can just be spun up and managed in different ways.
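The extract-embed-retrieve-answer flow Nick describes can be sketched end to end. This is a toy illustration under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the `agent_answer` function is a stub rather than an actual LLM, and the redaction step shows compliance applied at extraction time, as he suggests.

```python
import math
import re

def redact_pii(text):
    """Compliance at the extraction step: strip email addresses before embedding."""
    return re.sub(r"\S+@\S+", "[REDACTED]", text)

def embed(text):
    """Hypothetical embedding: a word-count vector. A real system uses a model."""
    vec = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        self.docs = []  # (source_ref, text, vector): keep the reference back!

    def add(self, source_ref, raw_text):
        text = redact_pii(raw_text)          # redact before anything is stored
        self.docs.append((source_ref, text, embed(text)))

    def search(self, query, k=1):
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[2]), reverse=True)
        return ranked[:k]

def agent_answer(query, store):
    """Stub 'agent': returns the best-matching context plus its source reference."""
    source_ref, text, _ = store.search(query)[0]
    return f"Based on {source_ref}: {text}"

store = VectorStore()
store.add("invoice-1042", "Invoice 1042 from acme@example.com for 400 units, due net 30")
store.add("safety-log-7", "Forklift incident reported in warehouse B, no injuries")
print(agent_answer("which invoice is due net 30?", store))
```

Note how each stored chunk carries a `source_ref`: that is the "reference back" Nick says well-organized embeddings need, and it is what lets the answer cite where its context came from.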
And so we are looking not just at the technologies, but also at how IT organizations need to adjust the way they do this. Because just by the nature of large language models, you throw lots of data at them, you organize it, and you get something out, so there's lots of room for being overzealous, putting too much data into things, and running into compliance problems. Mike Matchett: Compliance is something we could probably talk about for days, along with governance and the expected challenges. We're going to see people who don't know what they're talking about imposing governance as well; that's already starting to happen, asking for crazy stuff. Okay. So I think we're understanding that there's a shift in mindset people should make, from working at a very low level to working with building blocks, and you're going to work hard at Data Kinetic to provide those smarter building blocks so I can work at, I won't say a plug-and-play level, but certainly a brick-and-mortar level of building something. What does this turn into, though? You've mentioned a couple of intriguing use cases so far around supply chains. Everyone's got chatbots on their mind; everyone's thinking, I want to augment my website because an AI tool does something semi-creative and can add one more level of creativity on the outside. But what are some of the more interesting, more useful things you're seeing people talk about across different verticals and enterprises? Nick King: Yeah. Look, I think as you get into these very large organizations, and even medium-sized organizations, the amount of data the organization has to deal with is huge.
And so we're now starting to see agents that can go through the models and generate instruction sets, or inquiry sets, to simulate outcomes from these large language models. Effectively you could have a protagonist and an antagonist model and say: hey, based on all of my strategic plans right now and what you understand of this industry, go through and research what you think could be a potential disruption for us, based on large amounts of data. As for where these simulation agents can go, the most obvious case is: you pick up news that there's a car accident on a freeway, so you reroute your traffic in one direction, or your supply chain in another. Another is: maybe you pick up that someone is acquiring different facilities in Australia, and that's a competitor of yours based in the US; why would they be acquiring things in Australia? Often the time it takes for those discoveries to come about can be days, weeks, or months, but these agents can be running continuously and providing almost a weekly set of recommendations. We see that in financial services for portfolio optimization and research; we also see it in supply chain. We're beginning to see it in very large organizations that have lots of very smart people, but where the organization itself is not able to summarize developments over time. These agents can quickly come back with: hey, it looks like this competitor has acquired these things this week, we may have an impact here, and our sales are currently dropping in Australia. These are complicated to deliver, but definitely in the realm of possibility. Mike Matchett: All right. So more agents and agencies working for us. Automation to the nth degree, I guess you would say, becoming smarter and smarter as we train it to do that for us, which is great.
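The protagonist/antagonist pattern Nick describes is essentially a control loop that alternates two model roles. The sketch below is purely illustrative: `call_model` is a canned stub standing in for real LLM calls, so the runnable part here is only the loop structure, not any actual reasoning.

```python
def call_model(role, prompt):
    """Hypothetical LLM call. A real system would invoke a hosted or local model
    with the role's system prompt; here we return canned text so the loop runs."""
    canned = {
        "protagonist": "Plan: expand distribution in Australia next quarter.",
        "antagonist": "Risk: a US competitor is acquiring Australian facilities.",
    }
    return canned[role]

def simulate(strategic_plan, rounds=1):
    """Alternate a plan-proposing agent with a disruption-finding agent,
    accumulating a transcript of weekly-style recommendations."""
    transcript = [f"Input plan: {strategic_plan}"]
    for _ in range(rounds):
        plan = call_model("protagonist", strategic_plan)   # propose
        risk = call_model("antagonist", plan)              # attack the proposal
        transcript += [plan, risk]
    return transcript

for line in simulate("Grow APAC revenue 20%"):
    print(line)
```

In a real deployment each round would feed fresh external data (news, filings, logs) into the antagonist's prompt, which is what turns the loop into the continuous discovery process described above.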
So we're not just having a conversation about what we're having for lunch; it's, what do I actually do? And I think we could talk more at some point about strategy versus tactics, where these agents fall, and just how intelligent they can be with the information they have, but we'll defer that for another time. This has been so interesting. Tell us a little bit about what Data Kinetic is doing for folks today, and if people want more information about what you're doing or what you're talking about, where would you point them? Nick King: Yeah, I'd say go to the Data Kinetic website. And honestly, we're spending time with organizations and talking through some of these use cases with them. We're very open about how we achieve some of these things, because we believe in sharing information. So we have a series of advisory services and other applications we're working on, and as folks find really tough problems they can't solve, we're up for that challenge too. We have a very open book on how we do these things; we want to educate people on how we're doing them. As I said, we're platform agnostic, so we don't really mind; we just want the right outcome. So definitely check out our website. My other advice is: Databricks, Snowflake, all the hyperscalers are investing here very quickly, so spend the time to go see what they're doing, particularly in the IT organization; just understand those roadmaps. We're watching as Azure and Google rapidly move along, so I think that's also very helpful. And the last thing I'll say is, start small. Find that first use case and build from it, and if you need some help from us, give us a yell. If you get stuck, give us a yell.
And I think the market's going to keep evolving, and these things will keep getting easier as well, but our ability to deliver more complex use cases will keep going up and to the right. Mike Matchett: Awesome. It's a great time to be alive, as someone says, and I'm looking forward to seeing what comes next. Do come back and let us know what you find out; I'm sure you're learning stuff every month, every week, probably every day even, so this is going to proceed rapidly. Thanks for being here today. Nick King: Yeah, really appreciate it. Thanks for having me, Mike. Mike Matchett: And as I said, do come back, and check out Data Kinetic to keep up to date. It's the best thing you can do for yourself and everyone around you. Bye.