Our Insights.

Covering strategic and technical aspects of analytics.

By subscribing, you agree to receive a monthly newsletter from 173Tech.

The $1 Analytics Stack.

January 10, 2022 | By Robin Watteaux.

Analytics infrastructure has become significantly cheaper over the past decade. But how cheap can it get? As it turns out, you can get a great analytics stack for less than $1 per month (yes, $1!), or around $25 to $50 per month once you add a visualisation tool for the whole team. How do you achieve that? By leveraging the pricing structure of the various cloud and open source providers in the ecosystem.

About The $1 Analytics Stack

The first time we built this stack, we were amazed that you could get such great tooling for such a low monthly bill. Since then, we have implemented it for several clients. The $1 stack applies mostly to ecosystems with low data volumes (typically in the tens of gigabytes). It works extremely well for e-commerce and B2B businesses, which are by nature less data hungry than other digital businesses, and for early stage businesses, which produce less data. For example, we have e-commerce clients with multi-million yearly revenue running on this stack.

This stack isn't only cost effective. It is a viable long-term option that can take you a long way, and it is close to a no-brainer for e-commerce businesses with low data volumes. Here are a few benefits of this stack:

  • No / little vendor lock-in as the stack mostly uses open source solutions.
  • Full control over the business logic and how you report on metrics.
  • Serverless approach, making the deployment more secure and resource efficient.
  • Extremely cost efficient (obviously!)

The $1 Analytics Stack Tooling

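And now, the great reveal. The $1 stack is built with Google BigQuery, SAYN, Google Cloud Build, Google Cloud Run, Google Cloud Scheduler and GitHub Actions, with Metabase as an optional visualisation layer.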

As mentioned before, we use either open source tools or cloud providers whose billing is advantageous at low data volumes. Read on for more information about each part of the stack.

The Analytics Warehouse

The analytics warehouse is where we centralise all our data. For this, we use Google BigQuery, as it is priced on the number of terabytes (1 terabyte = 1,000 gigabytes) stored, written (when loading data) and read (when running queries) each month. For reference, in the EU multi-region, Google BigQuery charges $5 per terabyte read (the first terabyte each month is free) and $20 per terabyte of active storage per month (the first 10 gigabytes are free). When your data estate is within the tens of gigabytes, you take full advantage of this pricing structure, making your warehousing solution extremely cost effective! Warehouses like Snowflake and Redshift, as well as traditional databases like PostgreSQL, use completely different pricing models, which make it very hard to achieve such low prices even at small volumes. Because BigQuery is the warehouse of choice for this stack, we typically use Google Cloud Platform (GCP) as the overall ecosystem for the analytics deployment.
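
To make the arithmetic concrete, here is a minimal sketch of a monthly cost estimate based only on the read and storage rates quoted above. The function name and the example volumes are illustrative assumptions, the rates are the ones stated in this article rather than a live price list, and other charges (such as writes) are ignored.

```python
# Rates as quoted in this article (EU multi-region); check current pricing
# before relying on them. Other charges, such as writes, are ignored here.
QUERY_USD_PER_TB = 5.0      # price per terabyte read by queries
FREE_QUERY_TB = 1.0         # first terabyte read each month is free
STORAGE_USD_PER_TB = 20.0   # price per terabyte of active storage
FREE_STORAGE_GB = 10.0      # first 10 gigabytes stored are free
GB_PER_TB = 1000.0


def bigquery_monthly_cost(stored_gb: float, scanned_tb: float) -> float:
    """Estimate the monthly bill in USD (hypothetical helper, not a Google API)."""
    billable_scan_tb = max(scanned_tb - FREE_QUERY_TB, 0.0)
    billable_storage_tb = max(stored_gb - FREE_STORAGE_GB, 0.0) / GB_PER_TB
    return (billable_scan_tb * QUERY_USD_PER_TB
            + billable_storage_tb * STORAGE_USD_PER_TB)


if __name__ == "__main__":
    # Example volumes are assumptions: 40 GB stored, 0.5 TB read per month.
    print(f"${bigquery_monthly_cost(stored_gb=40, scanned_tb=0.5):.2f} per month")
```

With 40 gigabytes stored and half a terabyte read, only the 30 gigabytes above the free storage tier are billed, which is how the warehouse stays under the $1 mark.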

ELT

For extract, load and transform (ELT) processes, we use our open source framework SAYN. It supports automated database extracts, allows custom extracts through Python tasks and enables automated data transformations in the warehouse. It is a very simple tool for writing, orchestrating and running ELT infrastructures, and it abstracts a lot of data engineering processes (including building a DAG) to make things simpler for end users. As a result, SAYN ELTs are extremely simple to deploy and maintain! Because SAYN is open source, it is free to use. You can read more about SAYN in this article. The code is hosted on GitHub, which can be free or cost a small fee per user (charged by GitHub), depending on the plan you are on. You can add a data extraction tool (e.g. Stitch) later to scale the number of data sources, although this will take you over the $1 mark :)
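
For readers unfamiliar with the term, the sketch below illustrates what "building a DAG" means: working out a valid execution order from task dependencies. It uses Python's standard `graphlib` module and is not SAYN's API; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on.
TASKS = {
    "extract_orders": set(),
    "extract_customers": set(),
    "model_orders": {"extract_orders"},
    "model_customers": {"extract_customers"},
    "report_revenue": {"model_orders", "model_customers"},
}


def execution_order(tasks):
    """Return the tasks in an order where every dependency runs first."""
    return list(TopologicalSorter(tasks).static_order())


if __name__ == "__main__":
    for name in execution_order(TASKS):
        print(name)
```

An ELT framework does this ordering for you, so you only declare what each task depends on rather than hand-maintaining the run sequence.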

Deployment

For deployment, we go serverless, which has several benefits. As you may have guessed from the topic of this article, it is more cost efficient: the resources that run the code are only up whilst the code is running, so instead of paying for a full-time machine, you only pay for a few minutes of execution time each day. It is also more secure, as there is no machine constantly up and running, which means one less potential entry point into your infrastructure. For this part of the stack, we use Google Cloud Build, Google Cloud Run and Google Cloud Scheduler. The three combined will run for a few pennies a month, given that the overall data processing time is extremely short. With these tools, we effectively build a Docker image and run containers on a daily basis. You can automate the Cloud Build process for Continuous Delivery (CD) through the Google Cloud Run interface with a few clicks.
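
The saving comes from paying only for the minutes the job runs. The sketch below compares the two billing models under a single hypothetical hourly rate; the rate, the run time and the function names are assumptions for illustration, not actual Google Cloud prices.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def always_on_cost(hourly_rate_usd: float) -> float:
    """Monthly cost of a machine that runs around the clock."""
    return hourly_rate_usd * HOURS_PER_MONTH


def on_demand_cost(hourly_rate_usd: float, minutes_per_day: float,
                   days: int = 30) -> float:
    """Monthly cost when billed only while the daily job is running."""
    return hourly_rate_usd * (minutes_per_day / 60) * days


if __name__ == "__main__":
    rate = 0.05  # hypothetical USD per hour, for illustration only
    print(f"Always on: ${always_on_cost(rate):.2f}")
    print(f"On demand: ${on_demand_cost(rate, minutes_per_day=10):.2f}")
```

At the same rate, a ten-minute daily run is billed for five hours a month instead of 730, which is why the deployment cost shrinks to pennies.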

Data Testing & Continuous Integration (CI)

Yes, you can also get CI in the $1 stack! GitHub Actions provides a fairly generous amount of free run time for your CI processes, meaning that your CI can cost you nothing on a monthly basis. GitHub Actions can also replace Cloud Build for CD in order to centralise all CI / CD processes, but this is unnecessary if you want to keep your infrastructure light and simple.
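
As an illustration of the kind of data test a CI run could execute, here is a minimal sketch that checks a key column for duplicates and missing values. It uses an in-memory SQLite database as a stand-in for the warehouse, and the table, columns and helper names are hypothetical.

```python
import sqlite3


def count_duplicates(conn, table, column):
    """Number of distinct values that appear more than once in the column."""
    # Identifiers are interpolated directly, so only pass trusted names.
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {table} GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


def count_nulls(conn, table, column):
    """Number of rows where the column is NULL."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def build_sample_db():
    """Create a small in-memory table standing in for a warehouse model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (2, 25.5), (3, 7.2)],
    )
    return conn


def test_orders_are_unique_and_complete():
    conn = build_sample_db()
    assert count_duplicates(conn, "orders", "order_id") == 0
    assert count_nulls(conn, "orders", "order_id") == 0


if __name__ == "__main__":
    test_orders_are_unique_and_complete()
    print("data tests passed")
```

A test runner such as pytest would pick up the `test_` function automatically, so the same checks can run on every pull request.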

Adding Your Visualisation Layer

Your visualisation layer can also be extremely cost efficient, with great tooling! If you want to keep a low monthly bill, you can use Metabase, an attractive open source reporting solution. If you decide to deploy it yourself, you will only pay for the resources your Metabase instance consumes (likely around $25 to $50 a month at initial levels). Metabase also has a very reasonably priced managed offering for teams, as well as a free desktop app for single users if you are building exploratory projects. Both are worth checking out!

And To Recap

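This is it, you have a top-notch analytics stack for $1 a month! Or $25 to $50 a month when adding the visualisation layer. To recap, here are the tools, using GCP as the main cloud provider:

  • Analytics warehouse: Google BigQuery.
  • ELT: SAYN, with the code hosted on GitHub.
  • Deployment: Google Cloud Build, Google Cloud Run and Google Cloud Scheduler.
  • Data testing and CI: GitHub Actions.
  • Visualisation layer: Metabase.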

Happy deployment!
