Avoiding running on the heap and CTEs vs Temporary Tables

Image for post
Image for post
Photo by Cris Ovalle on Unsplash

This post is the second in a series of small groups of two tips on improving the performance of SQL queries, in addition to useful concepts usually overlooked in the modern SQL world:

Replacing CTEs with indexed temporary tables

CTEs can be a major help in most queries, simplifying the development process and its maintenance, providing a single source of truth. However, their impact drastically depends on the SQL engine. For instance, PostgreSQL 12+ automatically materializes (allocates the output into memory) CTEs which are called more than once. Microsoft SQL Server is known to not materialize CTEs but, instead, running it as a view each time it is called, which can even result in different outputs per call — depending on the serialization level. But even if the engine materializes the CTE like PostgreSQL, what about its indexes? …


Avoiding self joins and join on operations

Image for post
Image for post
Photo by Cris Ovalle on Unsplash

Many consider SQL to be data science’s ugly brother, often dedicating little time exploring its nuances and capabilities. And yet, SQL is often the primary means of extracting and preparing information (ETL) to be consumed further on.

This series of posts attempts to shed light on some of the often misunderstood or unknown details in modern SQL which can severely hinder a query’s performance.

An introductory note is required: I am by no means an SQL expert, feedback and suggestions are more than welcome.

Replacing self joins with unbounded or n-bounded partitioned windows

The concept of self-join might be daunting for those not used to SQL’s line of thought. But even some seasoned data professionals forget there are alternatives to self-joins, which effectively prevent its complex cardinality. …


Data Science

A (long) primer on a growing requirement for Data Scientists

Image for post
Image for post

Knowledge on NoSQL databases seems to be an increasing requirement in data science applications, yet, the taxonomy is so diverse and problem-centered that it can be a challenge to grasp them. This post attempts to shed light on some of the concepts, often delving into each design’s specificities.

We start by briefly introducing NoSQL and the reasoning behind its appearance, followed by an analysis of each of the four members of the NoSQL family, their behavior, and main mechanisms, in addition to their advantages, disadvantages, and typical use cases.


Creating a data validator with dynamic and interchangeable business rules or why (T-) SQL is more powerful than we like to admit

Image for post
Image for post
Image courtesy of PNGJoy

Hardcoding. I’m sure at this point you’ve heard all about how using it will take you to coder’s hell. Experts seem to recommend avoiding it whenever possible. So why are most of our queries hardcoded? What if we could dictate the query’s predicate in a centralized, interchangeable fashion? Let us find out just how dynamic (T-)SQL is.

Note, however, that this might not be a good idea, but let's have some fun anyway.

What?

In order to illustrate the possibilities data-driven constraints introduce, we’re going to create a data validator which enforces a set of constraints (not to be confused with SQL’s own constraints), business rules or simply thresholds, which are defined on a table. In addition, the same model can be extended to define your where predicate, or any part of your query. …

About

Guilherme Banhudo

From finance to nerd

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store