Transcript
Hi, Mike Matchett from Small World Big Data. I'm here today talking about performance, one of my favorite topics of all time, because I used to be a performance and capacity planning guru. At one time I did field consulting, I was a product manager, I built and designed products, and I taught performance training classes to hundreds of people out there. So I love the idea of performance. But the world has moved on since then, and now performance is about how we deal with things like microservices and containerization and still bring all the things we used to do in the data center over into this cloudy world. It's challenging. We have PerfectScale here today to tell us how they're going to help us do that. So hang on.

Okay, well, thanks for being here today, Amir and Elie. You know, the look I got under the hood a little while ago was pretty interesting, and I hope we can convey that same excitement and interest to our audience today in just a few minutes. They might have to dig deeper, though; fair warning. But let's just start with this idea of performance in a Kubernetes world. What drew you into it, and what do you find challenging about it? Elie, do you want to take this one?

Yes, absolutely. Thank you. So one of the biggest challenges that the microservices revolution posed to operational and development teams is ephemeral, large-scale, scattered, colossal environments. And yes, they have all the data. They have monitoring solutions, and those monitoring solutions monitor everything all the time. And eventually what you have is a sea, or an ocean if you will, of data, where you need to find the proper fish to focus on.
And this is a huge problem, because you don't know what you're looking for. And even if you know what you're looking for and you find something, then you need to dive into deep analysis and understand how this thing interferes with the infrastructure and with other pods sitting on the same node. And that's not what you're really looking for. What you really want is a solution: you want someone else to analyze all this data for you, pick out the right thing, and tell you what to do. And this is what PerfectScale solves.

All right, we're going to get into a little bit of that, and you're the right guy to answer some of our technology questions, I'm sure. Let's go over to Amir, though. So when you were looking at going into this market with PerfectScale, what drew you into it? What was exciting to you about Kubernetes operations in particular?

Yeah. So to be frank, nothing specific about Kubernetes; I'm not coming from that space. My background is product management, and I've been doing it for 15 years in different aspects of it: tools for development, for ITSM, and so on. So I knew about Kubernetes, but definitely not to the level I know it now. I'm interested more in opportunities. I'm interested in making a meaningful change in the market and creating software that will delight its users. And we saw here a huge opportunity, because we conducted more than 50 virtual interviews with stakeholders in different companies, and we understood that we had something that could disrupt the market and make a big difference in the way our solution can solve very painful problems. And luckily for me, I met Elie Berger, who is the CTO, and he had exactly the technical perspective and the actual understanding of what these pain points are, because he was a DevOps manager back then.
So together we came to an understanding of what we should do in order to create something that would really make a huge difference in this market.

Yeah. Let's talk about that a little more specifically: what the problem is that we're solving. So, traditional capacity planning and performance monitoring. I really like to think of this as right-sizing, and right-timing in some cases: when do I do something? But really it's about making sure you have the right resources matched to the workload. And we want to do that for a couple of reasons. We don't want to be under-allocated on resources, because then our performance goes bad; and we don't want to be over-allocating resources, because then we're paying a lot more. Which of those problems, by the way, are you finding that your customers mostly experience?

I would say that it is always both. We already have more than 300 clusters that we are optimizing, and once customers onboard a cluster for the first time, we always see in each one of these clusters both wasteful resources and issues that relate to performance, resiliency, and durability. This can be due to throttling, out-of-memories, or even not setting the requests and limits, which also has an effect on the quality of service. So it's a question of both, but the more interesting question is what they care about more. And especially in the last year, where the financial situation affected all companies, from startups to enterprises, all of them are now asked to reduce cloud cost. So this is definitely at the center now, although we do have companies where the main reason they started the trial and became customers is more about reducing the number of breaches, or reducing the time they waste on troubleshooting.
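As a point of reference for readers less familiar with these knobs, the requests and limits mentioned here live in each container's spec. The following is a minimal, hypothetical example; the names and numbers are illustrative, not a PerfectScale recommendation:

```yaml
# Hypothetical container spec illustrating Kubernetes requests and limits.
apiVersion: v1
kind: Pod
metadata:
  name: example-app         # illustrative name
spec:
  containers:
    - name: example-app
      image: example/app:1.0
      resources:
        requests:            # what the scheduler reserves for this pod
          cpu: 500m
          memory: 512Mi
        limits:              # hard caps: exceeding the CPU limit causes
          cpu: "1"           # throttling; exceeding the memory limit
          memory: 1Gi        # causes an OOM kill
```

Because the requests and limits differ, Kubernetes assigns this pod the Burstable quality-of-service class; leaving requests and limits unset entirely, as mentioned above, makes the pod BestEffort, the first to be evicted under node pressure.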
But again, the majority come to us to save cost, and also to get full visibility into what's going on in Kubernetes. And we do have some that are actually more interested in the resiliency improvement.

So resiliency is probably a key word we should be using here. Elie, maybe you could explain why this isn't just automatically done already. We're using the cloud; the cloud should auto-scale up and down. So why do we get into these problems when we start talking about lots of pods and lots of clusters?

Yeah, great question. So what happened is that the industry adopted microservices and Kubernetes in the cloud for one specific reason: to move faster, to allow developers to do what they need to do, like write the code, deploy that code as fast as possible, and make changes as fast as possible. In this shift-left paradigm, developers started to gain control over the resources of the cloud. This is fine; it enabled the organization to move faster. However, you can imagine it a different way: we took all the possible resources, put them in the middle of the company, and gave everyone the opportunity to take as much as they need. But no one controls this, and no one checks later whether it's still a valid assumption. Was it tested properly? Et cetera. So eventually what we have is a situation where particular workloads may be set correctly, but in many situations the main driver for the developer is fear: the fear that something will fail and they will need to address it in the middle of the night or in the middle of the weekend. So people are overprovisioning. That's one problem. The other problem is underprovisioning. Say I'm a developer: I tested my software, I set something, but now someone else changed the query that he sends to me.
And now I need twice or three times the CPU or memory. That ephemeral nature, and the continuous change of everything, because organizations are deploying to Kubernetes multiple times a day, is what makes this a big challenge.

Okay. So when we look at that situation: the DevOps person is probably adding more resources than necessary so they don't get caught out, and the costs go up and out of control. The business side probably doesn't even understand how much wastage there is, but they know their bill keeps going up. At the same time, it's pretty complex sometimes to even get the right amount of resources allocated. From my experience doing data center capacity planning years ago, people would often say things like, oh, you need more memory, and they'd throw millions of dollars at memory on a machine, only to find out that the bottleneck was somewhere else. So even though they were overprovisioning, they were overprovisioning the wrong resource, and then finding out they were under-allocated in the key resource for that workload. And here's a fair technical question. I mean, we don't have the hours we'd probably need to really dive into this, but how granular can someone get if they're using PerfectScale and looking into pods and clusters and things? Are they getting enough information to actually deal with the issues they're facing?

Yeah. So looking at the standard situation, let's say you're using your own data center: this is a big pain, and there are no tools for that, or nearly no tools. Cloud providers provide you with the bill, and this bill is how many machines you used and how much network you used.
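To make that granularity gap concrete, here is a hedged, minimal sketch of how a machine-level bill can be attributed to individual workloads, in the spirit of what is described here. The pod names, node size, and price are hypothetical.

```python
# Hypothetical sketch: a cloud bill prices whole machines, but teams need
# per-workload numbers. Attribute one node's hourly cost to the pods on it,
# proportional to their CPU requests. All names and prices are illustrative.

NODE_HOURLY_COST = 0.20  # assumed price of one 4-vCPU node
NODE_CPUS = 4.0

pods = {                  # pod name -> CPUs requested
    "checkout-api": 2.0,
    "search-worker": 1.0,
    "batch-job": 0.5,
}

def per_pod_cost(pods: dict) -> dict:
    """Split the node's hourly cost by each pod's requested CPU fraction."""
    return {
        name: NODE_HOURLY_COST * cpus / NODE_CPUS
        for name, cpus in pods.items()
    }

costs = per_pod_cost(pods)
print(costs["checkout-api"])  # 0.2 * 2/4 = 0.1 per hour
```

Real attribution is more involved (memory, network, reservations, and actual usage versus requests all matter), but the principle is splitting a machine-level cost across the workloads sharing the machine.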
But when we're running containers, and multiple pods are hosted on the same machine, then you need granularity at the level of the particular workload, because this is the level the development organization is thinking at: I'm developing this application, or I'm running this job. PerfectScale is able to bridge this gap: dive into the particular clusters, extract the fractions used by particular applications, combine them together, adjust them to the different types of cloud, the different types of reservations in the cloud, and the different financial plans the organization has, and give exact and clear visibility into which part of the source code needs to be changed in order to achieve this optimization, and also do it proactively for you. This is what we do.

All right. Is this something where people would look at their environments periodically, like come in once a month and do a plan and a review when they get the bill? Or is this something that, at the other extreme, we should be doing continuously? Where do you land on that opportunity scale?

Yeah. So first of all, we offer continuous, automatic optimization. But that's not the only value. The other value we provide is a very, very granular view into the resiliency problems, which dramatically shortens the MTTR, the mean time to resolve an issue. So our customers use us not only to observe the cost aspects, but also to improve resilience and, in some situations, react much, much faster than they would without us. I would add that the answer also depends on the stakeholder or the user
who is using our platform, because we give value to different types of users. It starts from the developers, goes on to the DevOps or site reliability engineers, and goes even further to the FinOps people or the C-level executives. Each one of them gets value, and the frequency of getting into the system changes based on the value they need. So, for example, the FinOps people or the executives might want to look at it only once a week or once a month, just to look at the over-time trend reports. The developers might look at it only when there is an active Slack message they're getting from our platform, telling them that there is now an out-of-memory in the application they're in charge of. The DevOps might look at it in a more holistic way, to see what's going on, to see if everything is in place, and to make sure that, again, they are in control. So it's very different from one organization to another, and from one persona to another.

Yeah. What I like too is that with a solution like PerfectScale, if there's a very complex environment, and God bless everyone with Kubernetes in production, it gets complex: you have lots of pods, lots of clusters, lots of replicas. What I like is that someone can see where in that complexity, specifically, they're over-allocated or under-allocated. And I believe you even have a column called Waste, or Wastage, right on your report, by pod and cluster and replicas, which to me is just amazing, right? So someone can take their bill, on whatever time increment they're looking at, and ask: where are we overspending, and let's dial that back. But do you really mean that this thing could go automatically setting the dials and helping us keep that alignment in, like, perfect alignment?

Yeah. So I want to echo what you said.
It's not just that we're putting the spotlight where you have waste or where you have resiliency issues, and showing what the impact and the actual waste are. We are also telling you exactly how you should fix it. So the recommendation in the UI includes all the supporting evidence for why we came to this recommendation, so you can easily understand it and take it to the next level, which is remediation. It's up to you how you want this remediation to happen, based on the workflow in your organization. If you want, you can copy the YAML into your infrastructure as code and paste it in the right place. If you want, you can create a ticket so the service owner can tackle it. You can also open a PR. And, as you correctly mentioned, you can also say that from now on this specific workload, or the entire namespace, or the entire cluster will move to a completely autonomous mode. You can say that this automation will only increase resources, or will both increase and decrease resources. You can set a lot of configuration for this automation, so it carries less risk for your production environment. And yes, from there on it can autonomously and continuously do this fine-tuning on your behalf.

All right, let's talk about risk for a second. Are these hard thresholds, or is there a policy that you recommend the customer set? How are we doing this? Someone might call these SLOs or SLAs.

Exactly: we follow the SLAs or SLOs that you set. You set the desired resiliency level. For example: I want to guarantee four nines or three nines; or, this is my development environment and I want to maximize the savings. And from there on we do it completely automatically. We have all the needed safety guards around what exactly is allowed to happen. And the most exciting part is that we also understand the changes in the code.
So every time a developer makes a change and knows that they need more memory or more CPU, there is a way that we will understand it too, so there is no need to inform us or turn off the automation or anything like that. We are completely built into the standard, native workflow of development organizations.

All right. I'd love to talk to you all day about this, because there's lots of stuff going on here, but let me bring it back up a level. You also talk a little about governance and things like that. How does that play in? I mean, we've got performance, we've got this operational thing going. What is the governance aspect?

Yeah, it's a good point. For now we are more about the visibility: giving you, again, full visibility into what's going on. We don't have any gatekeeping or any built-in SLOs that you can set from the system; this is planned for next year. So the governance for now is, again, giving you exact visibility into what's going on, at whatever level of granularity you care about, and you do the gatekeeping, or bake it back into your work process, as you see fit, because we have an open API and you can easily trigger whatever you want from our platform. In the future, again, it will be more out of the box from our platform.

All right, so you've got a lot more on the roadmap coming. But even with what I've been able to see of it from the outside here, and I don't even have a Kubernetes cluster to run it on: just the idea that I can go in there and see, for example, aggregate statistics over all the replicas that comprise a workload, no matter how many of those replicas there are. Because I know lots of customers get their bills, and the bills are all over the place, and just putting those in a spreadsheet and sorting and filtering and aggregating is a challenge.
And so now you can say: oh, this database functionality, this workload, has got this much wastage, this much opportunity, and is set to this resiliency, right there in a nice report that anybody can take to the business and say, here's where we're at. This is kind of what the governance needs to be. How hard is this to deploy? How much instrumentation do we have to put in? What's the effort to get this going?

Yeah, that's one of the advantages of our solution, because we are a SaaS solution, SOC 2 Type II certified, and all you need in order to deploy us is to install a stateless pod in your deployment. That's it. This pod collects all the information that we need, and every couple of minutes it sends aggregated telemetry to our cloud, where the processing and the analysis are taken care of. So the footprint is very minimal: no maintenance on your side, besides updating this agent from time to time. Very easy to onboard; we're talking about a Helm command, and a couple of minutes afterwards you're already starting to see the value.

All right. So you've sort of answered the question I would have asked next, about who's really getting the value out of this, or who's able to use this, because we talked about the FinOps people, the financial managers, the DevOps people, the site reliability engineers, and IT folks. It's just kind of interesting how a solution like this can give a consistent single view of what's going on to all those teams and really become a common language across them. How's that working out for some of the clients you have?

So again, it depends on the organization itself. Some organizations have a FinOps persona; in some, the DevOps needs to answer financial questions to management as well. It depends, again, usually on the size of the organization.
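For context, the "Helm command" style of onboarding described a moment ago generally looks like the sketch below; the repository URL, chart, release name, and flags here are hypothetical placeholders, not PerfectScale's actual installation instructions:

```shell
# Hypothetical agent install via Helm; every name and URL below is a placeholder.
helm repo add example-vendor https://charts.example.com
helm repo update

# Install the stateless telemetry agent into its own namespace.
helm install example-agent example-vendor/agent \
  --namespace example-agent --create-namespace \
  --set apiKey="$EXAMPLE_API_KEY"
```

Updating the agent later is the usual `helm upgrade` with the same release name, which matches the "update the agent from time to time" maintenance described here.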
What we like is the fact that we are not pushing our solution into the organization; our champions do the land-and-expand on our behalf. So we have plenty of customers where we started with the DevOps, and they then introduced the devs and the entire R&D into the process, and they are the ones who did the actual technical education of the rest of their peers. And from there it reached the FinOps or the executives as well. So this is why, again, we are not pushing ourselves there; we are letting our champions in each organization be the best voice on our behalf.

All right. And just as a final question: it's on everybody's mind to try to be more green these days and to be climate conscious, and obviously not over-allocating, or running things and having them break, is key to that. So right-sizing feeds that. But there are a lot of people with ESG initiatives, and I understand that you've got something coming for that. So maybe you can explain it.

Yeah. That's one of the things that we are very happy to announce. It is currently supported for AWS, and in the future for all the other cloud providers, including your own data center as well: carbon footprint awareness and control. So just as we provide the cost, the waste, or the amount of risk that you have, we will provide full visibility into the CO2 consumption in each of your clusters and each of your namespaces. So you will be able, again, to tell what the current situation is, how it compares to the past, and also to kind of predict what it will be in the future. And as you reduce your cost, in most cases you will also reduce your carbon footprint. So you will have trend reports showing how your work has actually impacted the environment for the good.

That's amazing, because we never did that. You know, it was just like, you use power.
That was a totally separate concern, and how much cooling you used was a totally separate concern; someone else had to make sure the data center stayed online. In fact, the real problem back in the day was that they couldn't get enough electricity into some of the data centers, which is why we had to do capacity planning: the walls were too short and the power lines were only so big. And now we're in a completely different world, where we look out there and say, no, we really have to be optimizing on so many parameters. So just trying to sum this up for me, and maybe you could do this in better words: we've got performance, we've got resilience, we've got utilization, we've got carbon footprint. And really it becomes a multivariate optimization problem of cost.

Yeah, cost is one of them. Cost, of course. Right, for the lowest cost. You weren't missing anything there.

All right, this is pretty cool. I can't imagine that there are too many people out there running complex Kubernetes installs, or even simple ones, who have a good grasp of these things, because it's hard to get visibility into them. If someone wants to look at PerfectScale, or kick the tires a little bit, or dig a little deeper, what would you recommend they go look at?

Yeah. So first and foremost is our website, www.perfectscale.io. We have there both blogs with thought-leadership articles, success stories and case studies, and more information about our solution. So that's the first place to go. There is also a demo that I think Elie did, showing exactly what the solution looks like. And from there you can go to the self-onboarding. We are currently offering self-onboarding via the AWS Marketplace, and very soon it will also be available from our website. And we always prefer to meet you in person.
So if you want to have a demo with Elie, myself, or one of the few other engineers who can do that, we will be more than happy to meet you, talk with you, show you a live demo, and continue from there.

All right, great. And there are a lot of Kubernetes installs out there, right? So you could end up getting a lot of people asking you for demos here. But good luck.

Maybe I'll say something else regarding our pricing model and how to get us; you will also see it in the pricing tab on the website. We are offering a 30-day free trial, no commitment, unlimited, so everyone can start using us whenever they want without even reaching out to us. After these 30 days, we offer a very generous package for everyone who is taking their first steps in Kubernetes, or if you are a startup: as long as your monthly Kubernetes compute resources are less than ten K, you're able to use our system for free.

And that's quite a lot. And that's not just for a month, that's free ongoing. So if you're small and getting started or getting going, you can use PerfectScale for free for quite some time, until you get above a certain threshold.

Yeah, above ten K, which again is not low. And afterwards we only charge a very small percentage, between 1 and 3% of your Kubernetes compute cost. And the value, again: we are kind of guaranteeing that you are able to get at least a 30% drop in your Kubernetes cost by using our solution. So compare that to paying 1% for it. So yeah.

Awesome, awesome, awesome. Well, I can't imagine anybody who just heard that set of offers isn't going to go click on the website right now. So check it out if you've got a Kubernetes install and you always wondered what's going on in there, and you need to report back, or you even want to optimize it, maybe save some dollars today.
Certainly this month, it sounds like something that you could ramp up very quickly for free and recoup that cost. So thank you so much for being here today, Amir and Elie.

Thank you, Mike. It was a pleasure.

All right. Take care, guys.