{"id":1136,"date":"2023-07-04T16:13:51","date_gmt":"2023-07-04T16:13:51","guid":{"rendered":"http:\/\/oqtacore-blog-473533498.us-east-1.elb.amazonaws.com\/?p=1136"},"modified":"2023-09-22T16:22:54","modified_gmt":"2023-09-22T16:22:54","slug":"how-to-cut-down-aws-costs-by-80","status":"publish","type":"post","link":"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/","title":{"rendered":"Dramatic AWS cost optimization in 5 steps (we saved 80%)"},"content":{"rendered":"<p>This is the story of how we cut our AWS costs by 80% in just under 2 weeks.<\/p>\n<p><!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-flat ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/#AWS_is_a_candy_shop_for_developers\" >AWS is a candy shop for developers<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/#What_did_we_do_to_cut_AWS_costs\" >What did we do to cut AWS costs?<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/#How_do_organizations_approach_cloud_costs_optimization\" >How do organizations approach cloud costs optimization<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/#1_Buying_just_virtual_machines\" >1. Buying just virtual machines<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/#2_Review_every_request\" >2. 
Review every request<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/#3_Hire_a_FinOps_team\" >3. Hire a FinOps team<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/oqtacore.com\/blog\/how-to-cut-down-aws-costs-by-80\/#4_Use_cloud_waste_management_software\" >4. Use cloud waste management software<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"AWS_is_a_candy_shop_for_developers\"><\/span>AWS is a candy shop for developers<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Let me begin with a short introduction. We have used AWS since 2018 for all our projects, and it has worked miracles for us. We are a fully distributed team, and having our own data center somewhere in the world would be problematic. It is much easier to rent resources from AWS and skip all the capital expenses.<\/p>\n<div id=\"attachment_1139\" style=\"width: 454px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1139\" class=\"wp-image-1139\" src=\"http:\/\/blog.oqtacore.com\/wp-content\/uploads\/2023\/07\/photo_2023-07-04_16-27-35.jpg\" alt=\"AWS costs optimization\" width=\"444\" height=\"362\" srcset=\"https:\/\/oqtacore.com\/blog\/wp-content\/uploads\/2023\/07\/photo_2023-07-04_16-27-35.jpg 572w, https:\/\/oqtacore.com\/blog\/wp-content\/uploads\/2023\/07\/photo_2023-07-04_16-27-35-300x244.jpg 300w, https:\/\/oqtacore.com\/blog\/wp-content\/uploads\/2023\/07\/photo_2023-07-04_16-27-35-180x147.jpg 180w\" sizes=\"auto, (max-width: 444px) 100vw, 444px\" \/><p id=\"caption-attachment-1139\" class=\"wp-caption-text\">Negotiate buying new servers with the CFO? Nah! Uncontrollably spend thousands on AWS? Count me in!<\/p><\/div>\n<p>The problem with AWS is that developers can create practically any resource without having to get it approved by our finance department. 
With a traditional data center this is not the case &#8211; buying an additional server would mean getting an invoice from the vendor and asking the finance department to pay for it.<\/p>\n<p>So the root of the problem is that with AWS, developers can buy resources in whatever amounts they want, whenever they want.<\/p>\n<p>&nbsp;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"What_did_we_do_to_cut_AWS_costs\"><\/span><b>What did we do to cut AWS costs?<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">We are not a huge company, and our AWS costs were just a little over $7k per month across all AWS accounts. It is also worth mentioning that we host only DEV and QA environments, as PROD environments are paid for by our customers. Our resources are mostly individual dev machines, test databases, and various custom resources for research projects, such as Kinesis Firehose, SageMaker, etc. So we have a lot of random resources that are hard to categorize, structure, predict and control.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, how did we go about lowering our AWS costs?<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>First<\/strong>, we looked into Cost Explorer and identified the most expensive items:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">We found a Bitcoin node that had been running for the last 4 months and costing us $600\/month, as it required a large SSD with additional provisioned performance. We had done a small research project on <a href=\"http:\/\/blog.oqtacore.com\/what-is-so-great-about-bitcoin-ordinals\/\" target=\"_blank\" rel=\"noopener\">Bitcoin Ordinals<\/a> and never removed the machine afterwards. 
<\/span><\/span><br \/>\nResolution: we archived the volume (costs $6\/month) and terminated the VM.<br \/>\nSavings: $594\/month<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">We found an Nvidia Tesla GPU machine that cost $531\/month. We still use it to this day for generative AI experiments. We are thinking of building our own text-to-video app, so we need this machine.<\/span><br \/>\nResolution: moved the volume to a spot instance.<br \/>\nSavings: $360\/month<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Not the most expensive, but the most surprising finding was that we had forgotten to remove a demo-PROD environment in one of the unused regions, where we had deployed our Terraform scripts to test a rollout of PROD \u201cfrom scratch\u201d. <\/span><br \/>\nSavings: $340\/month.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Many smaller items. <\/span><br \/>\nResolutions: vary.<br \/>\n<span style=\"font-weight: 400;\">Savings: $1700\/month<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\"><strong>Second<\/strong>, we started moving everything possible to spot instances. This is a simple procedure. For an individual machine, you need to shut it down, detach the volume (remember to write down the mount path) and then terminate the machine. Then you create a new spot instance (the AMI does not matter, just make sure that the CPU architecture is compatible with your previous volume). Once the spot instance is created, detach (and don\u2019t forget to delete!) the new volume and attach the previous volume at the same mount path as it was on the original machine. 
For Beanstalk environments, it\u2019s simpler &#8211; we just changed the capacity settings to use only spot instances.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Savings: $1000\/month<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Third<\/strong>, we cleared unused S3 buckets (we had built some auto-trading bots that accumulated a lot of streaming data). We also set up automatic deletion of data in multiple S3 buckets, so that we don\u2019t store trading data for more than a year, as by then it is completely obsolete and useless.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Savings: $300\/month<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Fourth<\/strong>, we shrank some resources. It\u2019s a matter of checking the consumed CPU and RAM, and if usage stays consistently below 50%, we lower the tier.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Savings: $300\/month (would be 3x more on on-demand instances)<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Fifth<\/strong>, we set up auto-shutdown on individual machines. We created multiple Lambda functions for different types of tasks: shutting down a SageMaker Jupyter VM after 1 hour of inactivity, and shutting down individual VMs and the DEV and QA environments for the night, when nobody is working. These Lambda functions are run daily by CloudWatch Events. There are also Lambdas that start the DEV and QA environments again, to make the process easier.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Savings: $500\/month<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">We also implemented some smaller solutions for further savings, but they are not covered in this article.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So far, we have saved about $5500 of our $7000 monthly bill, which is around 80% of all costs! I knew that we were overspending on AWS, but never knew that it was THAT much. 
Over the course of a year, that means about $66,000 in savings.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"How_do_organizations_approach_cloud_costs_optimization\"><\/span><b>How do organizations approach cloud costs optimization<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">After going through cloud cost optimization ourselves, I understood how important it is to track cloud costs carefully. Cloud cost optimization can save enough to boost the business if you put the saved money into marketing. Or you could take it out as dividends and buy a new car. The sum is substantial and there are many things that can be done with it.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since there is no question that cloud cost optimization is necessary, how do companies approach it? Let\u2019s think about ways of implementing cloud waste management, from the simplest to the most advanced.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"1_Buying_just_virtual_machines\"><\/span>1. Buying just virtual machines<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>You could approach the problem in the most traditional way possible. Deny the countless possibilities provided by AWS and just restrict your developers to buying EC2 machines.<\/p>\n<p>SQS? No. DynamoDB? No. Just use EC2 virtual machines and install everything on them.<\/p>\n<p>Pros:<\/p>\n<ul>\n<li>You can predict the spending very well, as there is a flat rate for each type of EC2 VM<\/li>\n<li>The developers will stuff the available machines with the software they need. 
Just like in a traditional on-premises data center, this increases how efficiently the money is spent<\/li>\n<\/ul>\n<p>Cons:<\/p>\n<ul>\n<li>You miss out on the benefits of auto-scaling<\/li>\n<li>Your developers waste time implementing things that already exist<\/li>\n<li>You miss out on software updates that would otherwise be applied automatically<\/li>\n<\/ul>\n<p>All in all, treating the cloud as if you were just renting hosting on GoDaddy is not a good strategy.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"2_Review_every_request\"><\/span>2. Review every request<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>What if you allow the developers to use and scale any resources, but they have to negotiate them with a dedicated department that controls costs? The developers do not have the rights to buy or scale resources themselves, but they can ask a designated person to buy or scale a resource for them.<\/p>\n<p>Let&#8217;s say a developer needs a Kinesis Firehose endpoint (yes, I am deliberately naming a service you have most probably never heard of). Would it be easy for the developer to explain to the controller what they want? The developer would also have to explain the reasoning behind the scaling, and probably even prove that the architecture choice is good and not wasteful in terms of cost management.<\/p>\n<p>With a specific example in front of us, it is clear that it just does not work this way. It could work only if the cost management team consists of experts.<\/p>\n<p>And that&#8217;s just the tip of the iceberg. 
Now consider:<\/p>\n<ul>\n<li>A resource becoming unneeded due to an architecture change<\/li>\n<li>A developer leaving the job and not removing the resources they used for their individual development purposes<\/li>\n<li>An emergency when a resource needs to be scaled quickly to avoid business trouble<\/li>\n<\/ul>\n<p>Pros:<\/p>\n<ul>\n<li>The developers can take full advantage of AWS managed resources<\/li>\n<li>The spending is well controlled<\/li>\n<\/ul>\n<p>Cons:<\/p>\n<ul>\n<li>Cloud waste can still come from unneeded resources that were never removed<\/li>\n<li>The cost management team needs a high level of AWS knowledge<\/li>\n<li>The level of bureaucracy can damage the business<\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" id=\"3_Hire_a_FinOps_team\"><\/span>3. Hire a FinOps team<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>A more advanced way would be to actually find and hire AWS experts who would control the spending. They can use the tools that AWS provides out of the box to control spending. These include:<\/p>\n<ul>\n<li>Cost Explorer<\/li>\n<li>a tagging subsystem<\/li>\n<li>reserved instances<\/li>\n<li>savings plans<\/li>\n<li>cost anomaly detection<\/li>\n<li>much more<\/li>\n<\/ul>\n<p>These tools are not user-friendly and require well-trained personnel who know what to do with them. With them, however, you can actually start controlling your cloud costs. 
This approach requires not only tools and highly skilled workers, but also a framework in which the team works: periodic check-ups of underutilized resources, shrink-and-clean procedures, and so on.<\/p>\n<p>A team that is basically DevOps with a financially conscious approach is called FinOps.<\/p>\n<p>Pros:<\/p>\n<ul>\n<li>The developers have the full power of AWS<\/li>\n<li>Little bureaucratic overhead for the developers<\/li>\n<li>The finance team has full control over the spending in various aspects: per-project, per-team, etc.<\/li>\n<li>The developers consume resources in a conscious manner<\/li>\n<\/ul>\n<p>Cons:<\/p>\n<ul>\n<li>Requires highly skilled staff who mostly do not even exist on the market yet, so you need to train them yourself<\/li>\n<li>Vulnerable to the human factor<\/li>\n<li>The reaction time is only as fast as the period between check-ups &#8211; an unused EC2 machine can stay on for 1-2 weeks or more<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"4_Use_cloud_waste_management_software\"><\/span>4. Use cloud waste management software<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Once you think seriously about hiring (or growing your own) FinOps team, you should also consider third-party cloud cost optimization software, such as <a href=\"https:\/\/infinops.com?utm_source=oqtacore_blog\" target=\"_blank\" rel=\"noopener\">infinops.com<\/a>. It is your automatic FinOps team member that works 24\/7 and is not susceptible to human error. Such software automatically monitors your cloud for underused resources and other known savings opportunities, such as:<\/p>\n<ul>\n<li>Using spot instances<\/li>\n<li>Using reserved instances<\/li>\n<li>Reducing the number of OpenSearch clusters in the QA environment<\/li>\n<li>Disabling personal VMs for the night<\/li>\n<li>Auto-shutting off expensive SageMaker VMs with Jupyter<\/li>\n<li>etc<\/li>\n<\/ul>\n<p>All those tips come automatically, as your system is constantly scanned for changes. 
Such advice can <strong>save you up to 80% of the monthly bill.<\/strong> This usually means saving at least tens of thousands of dollars over the course of a year.<\/p>\n<p>Pros:<\/p>\n<ul>\n<li>Great tool for the FinOps team<\/li>\n<li>Helps FinOps beginners learn optimization techniques<\/li>\n<li>Reduces the human factor<\/li>\n<li>Enforces periodic reviews of resource consumption<\/li>\n<li>Enforces tags, lifecycle management, etc<\/li>\n<li>Allows tracking multiple AWS accounts at once<\/li>\n<\/ul>\n<p>Cons:<\/p>\n<ul>\n<li>Has its own cost (usually much less than it saves)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This is the story of how we cut our AWS costs by 80% in just under 2 weeks.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_mo_disable_npp":"","yasr_overall_rating":0,"yasr_post_is_review":"","yasr_auto_insert_disabled":"","yasr_review_type":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1136","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"acf":{"image":1170},"yasr_visitor_votes":{"number_of_votes":0,"sum_votes":0,"stars_attributes":{"read_only":false,"span_bottom":false}},"_links":{"self":[{"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/posts\/1136","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/comments?post=1136"}],"version-history":[{"count":17,"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/posts\/1136\/revisions"}],"predecessor-version":[{"id":1171,"href":"https:\/\/oqtacore.com\/blog\/wp-json\/
wp\/v2\/posts\/1136\/revisions\/1171"}],"wp:attachment":[{"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/media?parent=1136"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/categories?post=1136"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/oqtacore.com\/blog\/wp-json\/wp\/v2\/tags?post=1136"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}