
I hope it can save you the time of diving into a 400-page book, while telling a more cohesive and compact story than the technical documentation does.

Complexity dropped back to the level of the early days, when we could just query the complete raw data set directly, without having to worry about things being too slow or affecting customer transactions.

My text editor counts almost 10,000 words in this document, making it something the average reader can comfortably work through in less than one hour.
Shortly after committing to this goal, I started working on this piece of investigative writing: writing down what we learned, tuning our own setup along the way. By the time I finished writing the last chapter, we had ported most of the functionality. Except for a handful of important schema design decisions, Redshift offered surprisingly little resistance. We were even able to drop the whole intermediary roll-up layer, saving a few thousand lines of code and removing another operational concern.

In comparison to other Big Data products, Redshift supports standard SQL. SQL is not always the most elegant language, but its declarative nature does a good job of hiding most of the complexities of running massively parallel queries on a cluster of machines. It is ubiquitous in the developer world and well understood by most.

All these parameters combined allowed us to gain enough confidence to validate our decision to go with Redshift in just a couple of weeks. We got decent results very early on, but failing to fully understand all of Redshift's core concepts left room for improvement. Putting in the work to get a solid grasp of the fundamentals early on would definitely have been a high-return investment.

The reports queried a much larger data set and got noticeably slower day by day. Here, we introduced an intermediary layer that ran nightly roll-ups so that we could query an already aggregated, highly compressed data set. These changes had a big pay-off and were stable for a good while.

Because the business runs 24/7, the scheduled nightly roll-ups still impacted a small set of customers. Having to tune schemas and indexes so that they can serve both transactional and dashboarding workloads makes things more complex and fragile than they should be. Dashboarding queries had to be implemented and monitored carefully, since a single dashboarding query going rogue could have a negative effect on transactional workloads.

With some big and non-trivial features in the pipeline, we set aside some time to port the existing functionality onto infrastructure that could better handle our ever more demanding needs. We settled on Amazon Redshift quite quickly. Since we have experience running parts of our infrastructure on AWS, it is more often than not the first place we look. Low operational requirements, low cost and room to scale up to petabytes of data significantly reduce the buy-in needed to build a proof of concept.

This approach was extremely simple and worked pretty well when there were only a handful of people using the system. Things started to go awry when the website started serving more and more actual paying customers. The dashboards and reports would take a long time to load, or would even time out. To make things even worse, because business owners were frantically refreshing their dashboards, customers would start experiencing failing transactions.

With production on fire, your first instinct is to extinguish the fire as quickly as possible. We hastily disabled features that were not mission-critical, rewrote problematic queries, fine-tuned indexes and added some caching on top. This was a small investment, but it had a big immediate impact on the performance of the dashboards. It hardly brought any relief for the reports, though.
Amazon Redshift - Fundamentals

When I joined my current employer in early 2014, I was handed ownership of a dashboarding and reporting solution. In the state the software was in, dashboards and reports alike would query the live transactional database directly.
