One of the projects was to build a time tracker that can handle tens of thousands of users. The first thing we decided was not to use a traditional server-based architecture and to go serverless instead. Serverless is almost always a better fit when you work with IoT devices whose numbers can scale quickly.
As the base for the infrastructure we chose Azure, but the same pipeline can be built on AWS, and we have equivalent experience with AWS as well.
Processing starts when Azure Event Hubs (or AWS Kinesis) consumes the events and saves them in bulk to Azure Blob Storage (or Amazon S3).
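The core idea of this ingestion step is buffering: instead of writing one tiny object per event, events are accumulated and flushed to object storage in bulk once a size or age threshold is reached. The sketch below is a minimal, illustrative model of that behavior; the class name, thresholds, and the `sink` callback are assumptions for the example, not real Event Hubs or Kinesis SDK calls (in production the managed service does this for you).

```python
import json
import time


class EventBatcher:
    """Toy model of bulk ingestion: buffer device events, then flush
    them as one newline-delimited JSON blob once a count or age
    threshold is hit. All names here are illustrative, not SDK APIs."""

    def __init__(self, max_events=500, max_age_seconds=60.0, sink=None):
        self.max_events = max_events
        self.max_age_seconds = max_age_seconds
        # `sink` stands in for an object-storage upload (Blob Storage / S3).
        self.sink = sink or (lambda blob: None)
        self._buffer = []
        self._first_event_at = None

    def add(self, event):
        if self._first_event_at is None:
            self._first_event_at = time.monotonic()
        self._buffer.append(event)
        if self._should_flush():
            self.flush()

    def _should_flush(self):
        age = time.monotonic() - self._first_event_at
        return len(self._buffer) >= self.max_events or age >= self.max_age_seconds

    def flush(self):
        if not self._buffer:
            return
        # One bulk write instead of thousands of tiny ones.
        blob = "\n".join(json.dumps(e) for e in self._buffer)
        self.sink(blob)
        self._buffer = []
        self._first_event_at = None


# Usage sketch: 1,000 events with a 500-event threshold yield 2 bulk writes.
blobs = []
batcher = EventBatcher(max_events=500, max_age_seconds=3600, sink=blobs.append)
for i in range(1000):
    batcher.add({"device_id": i % 10, "ts": i})
```

The batching is what keeps storage and downstream processing cheap: object stores and analytics engines are priced and optimized for fewer, larger files rather than millions of per-event objects.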
In the next step, an event triggers Azure Data Factory (or AWS Glue) to prepare the data for further processing. In our case this pre-processing cut downstream query costs by as much as 95%.
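The cost savings come from shrinking what the analytics engine has to scan: dropping malformed records and de-duplicating events that were delivered more than once (at-least-once ingestion makes duplicates routine). Here is a minimal sketch of such a pass; the field names `device_id` and `ts` are assumptions for the example, not our production schema, and the real pipeline does this inside the managed ETL service.

```python
def preprocess(raw_events):
    """Illustrative pre-processing pass: drop malformed records and
    de-duplicate on (device_id, ts) so downstream queries scan less
    data. Field names are assumed for the example."""
    seen = set()
    clean = []
    for ev in raw_events:
        if not isinstance(ev, dict):
            continue  # malformed record
        if "device_id" not in ev or "ts" not in ev:
            continue  # missing required fields
        key = (ev["device_id"], ev["ts"])
        if key in seen:
            continue  # duplicate delivery from at-least-once ingestion
        seen.add(key)
        clean.append(ev)
    return clean
```

Since most pay-per-query engines bill by bytes scanned, every record removed here is money saved on every query that runs afterwards.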
The data files produced by pre-processing are stored in Avro format on Azure and Parquet format on AWS. Both are binary formats and not human-readable. We preferred the Azure route because Avro is row-oriented, handles schema evolution well, and suited our semi-structured event data better; Parquet is column-oriented and excels at large analytical scans, but that layout bought us little for a write-heavy ingestion workload.
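The row-versus-columnar distinction is the heart of the Avro/Parquet trade-off. A toy illustration, assuming nothing beyond plain Python structures: real Avro and Parquet add schemas, encodings, and compression, but the shape of the data is the part that matters for the trade-off.

```python
def to_columnar(rows):
    """Convert row-oriented records (Avro-like shape) into a columnar
    layout (Parquet-like shape). Toy illustration only: real formats
    add schemas, encodings, and compression on top of this shape."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Row layout: one record per event -- cheap to append as events stream in.
rows = [
    {"user": "alice", "task": "design", "seconds": 1800},
    {"user": "bob", "task": "review", "seconds": 900},
    {"user": "alice", "task": "coding", "seconds": 3600},
]

# Columnar layout: one array per field -- an analytical query that only
# needs `seconds` can read a single column instead of whole records.
cols = to_columnar(rows)
```

Appending a new event to the row layout touches one record; appending to the columnar layout touches every column array, which is why row formats fit streaming writes and columnar formats fit batch analytics.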
After that, we used Azure Data Lake Analytics (on AWS it would be Athena) to query the files, run analytics on the raw data, and save the resulting metrics into a MySQL cluster. Managed MySQL works almost identically on both clouds.
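To make the analytics step concrete, here is a stand-in for the kind of metric the query layer computes before the result lands in MySQL: total tracked hours per user. The field names `user_id` and `duration_seconds` are assumptions for the example, not our production schema; in the real pipeline this is a U-SQL or Athena SQL query over the pre-processed files, not Python.

```python
from collections import defaultdict


def hours_per_user(events):
    """Stand-in for the analytics query: sum tracked seconds per user
    and convert to hours. Field names are assumed for the example."""
    totals = defaultdict(float)
    for ev in events:
        totals[ev["user_id"]] += ev["duration_seconds"]
    return {user: round(seconds / 3600, 2) for user, seconds in totals.items()}
```

The SQL-side equivalent would be roughly `SELECT user_id, SUM(duration_seconds) / 3600 AS hours ... GROUP BY user_id`; only this small aggregated table is written to MySQL, so the dashboard never has to touch the raw event files.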
Once the data is in MySQL, we return to traditional computing: a C# backend and a React frontend serve the dashboard to the customer.
Project length: 4 months
Tech stack: Azure, C#, React