This post is the second in a series of small groups of two tips on improving the performance of SQL queries, in addition to useful concepts usually overlooked in the modern SQL world:
Similarly to previous iterations, an introductory note is required: I am by no means an SQL expert, feedback and suggestions are more than welcome.
CTEs can be a major help in most queries, simplifying the development process and its maintenance, providing a single source of truth. However, their impact drastically depends on the SQL engine. For instance, PostgreSQL 12+ automatically materializes (allocates the output into memory) CTEs which are called more than once. Microsoft SQL Server is known to not materialize CTEs but, instead, running it as a view each time it is called, which can even result in different outputs per call — depending on the serialization level. But even if the engine materializes the CTE like PostgreSQL, what about its indexes? …
Many consider SQL to be data science’s ugly brother, often dedicating little time exploring its nuances and capabilities. And yet, SQL is often the primary means of extracting and preparing information (ETL) to be consumed further on.
This series of posts attempts to shed light on some of the often misunderstood or unknown details in modern SQL which can severely hinder a query’s performance.
An introductory note is required: I am by no means an SQL expert, feedback and suggestions are more than welcome.
The concept of self-join might be daunting for those not used to SQL’s line of thought. But even some seasoned data professionals forget there are alternatives to self-joins, which effectively prevent its complex cardinality. …
Knowledge on NoSQL databases seems to be an increasing requirement in data science applications, yet, the taxonomy is so diverse and problem-centered that it can be a challenge to grasp them. This post attempts to shed light on some of the concepts, often delving into each design’s specificities.
We start by briefly introducing NoSQL and the reasoning behind its appearance, followed by an analysis of each of the four members of the NoSQL family, their behavior, and main mechanisms, in addition to their advantages, disadvantages, and typical use cases.
Hardcoding. I’m sure at this point you’ve heard all about how using it will take you to coder’s hell. Experts seem to recommend avoiding it whenever possible. So why are most of our queries hardcoded? What if we could dictate the query’s predicate in a centralized, interchangeable fashion? Let us find out just how dynamic (T-)SQL is.
Note, however, that this might not be a good idea, but let's have some fun anyway.
In order to illustrate the possibilities data-driven constraints introduce, we’re going to create a data validator which enforces a set of constraints (not to be confused with SQL’s own constraints), business rules or simply thresholds, which are defined on a table. In addition, the same model can be extended to define your where predicate, or any part of your query. …