FinOps Focus: Cost Management vs. Cost Optimization
Rethinking Cost Optimization
“Cost Optimization” is a term that has been around for a while in discussions of cloud cost and, more broadly, the practice of FinOps.
It is what most people associate with FinOps when they first hear about it, but is it the correct term to use?
Let’s evaluate the term “optimization”. The dictionary defines it as “the action of making the best or most effective use of a situation or resource”, which implies that actions will be taken to improve a certain state.
When cloud users start to think about their cloud cost, they begin by searching for ways to reduce their spend. Most of the search results they encounter carry the title “Cost Optimization”, but is it really optimization?
Cost Management vs. Cost Optimization
The actions that tend to be associated with these results are listed below:
But do those actions truly optimize cloud cost? Only the fifth point has any correlation with the definition of “optimization”, as it attempts to tailor a resource’s usage to its function. The rest fall into the first step of controlling cloud cost, which is cost management.
Let’s evaluate the action most commonly associated with “Cost Optimization”: reservations. Buying a reservation for an underutilized virtual machine does not make that resource the best or most effective one for the task. What it does is give the user a better handle on their cloud spending, which is cost management.
The Power of True Cost Optimization
Cost management is an important step, and usually the first one companies take when they start the FinOps journey. In 9 out of 10 cases it is the step with the highest impact on reducing cloud cost, as it involves waste elimination (unattached volumes, VMs left running, personal backups in S3, etc.) and gives management the desire to implement FinOps practices.
Cloud vendors advertise that you “pay for what you use”, but that is sometimes inaccurate. Let’s re-evaluate this in the context of cost management and define cloud cost as “you pay for what you provision”: you pay for the resources you created in the cloud regardless of whether you use them (many resources are left behind after POCs when employees forget to terminate them) or, in many cases, underutilize them.
When you clean up unused resources, you do not optimize any usage. Think of it this way: when you tidy up your child’s room and throw away old toys, you manage the space you have to work with and gain shelf space. Removing unattached disks and unused resources in the cloud is the same kind of step.
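As a rough illustration of this clean-up step, the sketch below filters a resource inventory for unattached volumes and idle VMs and totals what they cost. The `Resource` record, the 1% CPU threshold and all names and prices are hypothetical; a real inventory would come from your cloud provider’s billing and monitoring data.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str               # "volume" or "vm" (hypothetical categories)
    attached: bool          # volumes: is it attached to an instance?
    avg_cpu_percent: float  # VMs: average CPU utilization
    monthly_cost: float     # hypothetical monthly cost in USD


def find_waste(resources, idle_cpu_threshold=1.0):
    """Return resources that are provisioned but effectively unused."""
    waste = []
    for r in resources:
        if r.kind == "volume" and not r.attached:
            waste.append(r)  # unattached disk
        elif r.kind == "vm" and r.avg_cpu_percent < idle_cpu_threshold:
            waste.append(r)  # VM left running with no real load
    return waste


# Hypothetical inventory, for illustration only.
inventory = [
    Resource("poc-disk", "volume", False, 0.0, 40.0),
    Resource("prod-disk", "volume", True, 0.0, 80.0),
    Resource("poc-vm", "vm", True, 0.3, 120.0),
    Resource("api-vm", "vm", True, 35.0, 300.0),
]

waste = find_waste(inventory)
monthly_savings = sum(r.monthly_cost for r in waste)
print([r.name for r in waste], monthly_savings)
```

Note that nothing here changes how the remaining resources are sized or used; it only removes what should not have been there, which is why it counts as management rather than optimization.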
Cost optimization is the principle of taking ownership of your cloud spending and making sure you only pay for what you need. That means planning ahead and provisioning lean resources that are tailored to the consumption you expect.
The Crucial Role of Cost Management
Let’s think about it with a day-to-day transport analogy. Your company is tasked with moving cargo from point A to point B (the application). Your transport planner (the developer) states that you need a 16-wheeler truck to ensure you have the capacity for the cargo and a safety margin for the additional pull the haul might need (a 4-core, 16 GB RAM instance). You approve the transport acquisition team’s (Cloud Operations) rental of a 16-wheeler and send the driver (deployment) to the loading dock, where he finds a pallet of pillows. The truck does the job, the driver is content with its performance, and the delivery is completed with no issues.
At the end of the month you receive the invoice for the truck rental and are surprised by the price. You engage with the rental agency (the cloud provider) and negotiate a discounted rate for a yearly contract on the truck (a reservation). By doing so you reduce your cost: you better manage your expenses.
Let’s take the same scenario and apply optimization to it:
You are contracted to deliver cargo from point A to point B. You gather your transport planner, the driver, transport acquisition and the customer in a meeting and start asking: What is the cargo? How much does it weigh? What distance will it travel? What are its dimensions (width, height)? How much time will it need to be on the road? What emissions limits must it comply with (congestion charges)? In the end you adapt the vehicle to the needs of the task, for example a small van like a Citroën Jumpy, so the cost you pay is tailored to the task at hand.
In the cloud, the first scenario is the equivalent of buying reservations: you maintain the same cloud usage and pay a bit less for the same resource, and you are committed to paying for it even if it no longer fits your needs. In the second scenario you engage with multiple departments in the company to understand the needs of the business (product) and tailor the solution to those needs (developer, architect) based on performance and requirements. By doing so you ensure you only pay for what you truly need to get the task done.
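The difference between the two scenarios can be put into numbers with a small sketch. The hourly rates, the 30% reservation discount and the 730-hour month below are made-up assumptions for illustration, not prices from any provider.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def monthly_cost(hourly_rate, discount=0.0, hours=HOURS_PER_MONTH):
    """Monthly cost of one always-on instance at a given discount."""
    return hourly_rate * (1 - discount) * hours


# Hypothetical prices, for illustration only.
OVERSIZED_RATE = 0.20        # the "16-wheeler": 4 cores, 16 GB RAM
RIGHTSIZED_RATE = 0.05       # the "small van" that fits the actual load
RESERVATION_DISCOUNT = 0.30  # negotiated yearly commitment

oversized_on_demand = monthly_cost(OVERSIZED_RATE)
oversized_reserved = monthly_cost(OVERSIZED_RATE, RESERVATION_DISCOUNT)
rightsized_on_demand = monthly_cost(RIGHTSIZED_RATE)

print(f"Oversized, on demand:  {oversized_on_demand:.2f}")
print(f"Oversized, reserved:   {oversized_reserved:.2f}")   # cost management
print(f"Right-sized, on demand: {rightsized_on_demand:.2f}")  # cost optimization
```

Under these assumed prices, the reservation trims the bill for the same oversized resource, while tailoring the resource to the task cuts it further without any commitment. The exact gap depends entirely on the real prices and workload.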
Here is an example that gives a more in-depth view of how this translates to a real-world case:
A company wanted to launch a new feature on its platform to let users communicate with other users in their vicinity. To do that, it planned to use AWS SQS to deliver ~50,000 notifications per second.
When preparing the HLD (High-Level Design), Product and the Architect used the AWS calculator to forecast the monthly cost of SQS; the estimate was around $49,419.
The HLD was then submitted to the FinOps team for evaluation and approval. After reading it, the team gathered everyone working on the feature for a discussion to understand the logic and the demands. When asked about the high cost of the SQS usage, the Architect replied that they were negotiating private pricing with AWS for it (cost management).
After further review of the information and the technologies, the FinOps team suggested using the batch (bulk) feature of SQS to reduce the number of messages in the queue. The developers and Product objected to the change because it would delay the release, but the insistence of the FinOps team and their refusal to sign off on the HLD forced them to re-evaluate the proposed solution. The research took one week and showed the possible benefits, which prompted the development team to change the code. While working on the batch approach, the developers realized that a single notification payload was small enough to pack 15 notifications into one JSON message, and that with the batch feature they could submit 300 notifications in a single call. This reduced the number of calls to SQS from 50,000 to 166 per second while still delivering the ~50,000 feature notifications as needed. Going back to the AWS calculator, the Architect and the developers recalculated the expected cost and found that it had dropped to $174.08/month.
The release was delayed by 4 weeks, and the cost of the developers’ time was far less than the savings once the feature was launched.
This shows a process of optimization in the true sense, as opposed to the definition that prevails in the industry.
Cost management is the first step in the cloud cost journey, and an important one, but it must be named appropriately so that end users are not misled about the actions they take.
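The arithmetic behind this reduction can be sketched in a few lines of Python. The notification rate and the 300 notifications per call come from the example above; the price per million requests and the 730-hour month are illustrative assumptions, so the cost figures will not reproduce the AWS calculator estimates exactly.

```python
import math

NOTIFICATIONS_PER_SECOND = 50_000  # from the example above
NOTIFICATIONS_PER_CALL = 300       # from the example above
SECONDS_PER_MONTH = 730 * 3600     # assumption: 730-hour month
PRICE_PER_MILLION_CALLS = 0.40     # assumption: illustrative USD price, not a quote


def calls_per_second(notifications_per_second, notifications_per_call):
    """Number of queue calls needed each second for a given packing."""
    return notifications_per_second / notifications_per_call


def monthly_cost(calls_per_sec,
                 price_per_million=PRICE_PER_MILLION_CALLS,
                 seconds=SECONDS_PER_MONTH):
    """Estimated monthly request cost at a constant call rate."""
    return calls_per_sec * seconds / 1_000_000 * price_per_million


before = calls_per_second(NOTIFICATIONS_PER_SECOND, 1)
after = calls_per_second(NOTIFICATIONS_PER_SECOND, NOTIFICATIONS_PER_CALL)

# The article rounds the batched rate down to 166 calls per second.
print(f"Calls per second: {before:.0f} -> {math.floor(after)}")
print(f"Estimated monthly cost: {monthly_cost(before):,.2f} -> {monthly_cost(after):,.2f}")
print(f"Reduction factor: {monthly_cost(before) / monthly_cost(after):.0f}x")
```

Because the request cost scales linearly with the number of calls, packing 300 notifications into each call cuts the estimate by the same factor, regardless of the actual price per request.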