Introduction
The SQL language does not provide an easy way to reuse code. Stored procedures / functions are the closest things which could be used for this task but the development experience is not great. You have to manually deploy stored procedures into database before you can use them and you need to make sure the version deployed there is consistent to what you expect based on your source code. In addition, a lot of database users do not have permissions to deploy stored procedures / functions due to internal policy rules.
CoginitiScript allows users when writing SQL following the same best practices of code organization and reuse as software developers. In addition, being stored inside the Coginiti catalog it allows you to collaborate efficiently and version your assets. By organizing SQL logic into reusable modular pieces (code blocks + packages) as a user you create a single source of truth for your analytics.
This tutorial will show you how to define a SQL code block and reuse it in downstream calculations.
For a full overview of the language features, please refer to the Language Reference documentation.
For this tutorial we’re going to pretend we are a e-commerce shop and we need to calculate RFM segmentation for our customers. Our orders table looks like following:
DROP TABLE IF EXISTS orders; CREATE TABLE orders( transaction_id INT, transaction_date TIMESTAMP, customer_email VARCHAR(255), quantity INT, unit_price DOUBLE PRECISION );
What is RFM analysis?
An RFM analysis is a way of segmenting customers using three metrics: recency (when their last purchase was), frequency (how often they purchase), and monetary value (how much they spent).
As a first step, for each customer we have to calculate those three metrics.
- Open
Catalog
tab and add new folder with nametutorial
by invoking a context menu with right mouse click and selectingNew Folder
item. - Right click on the created
tutorial
folder and selectNew Entry
menu item - Name it
getting_started
and clickCreate
button - Expand
tutorial
folder and double click ongetting_started
to open it in the editor tab. - Paste the following code into the editor and save it by clicking on
File -> Save Tab
menu item in the application menu.
#+src sql rfmValues() #+begin SELECT customer_email AS email, MAX(transaction_date) AS recent_order, COUNT(*) AS total_orders, SUM(quantity * unit_price) AS total_sales FROM orders WHERE customer_email IS NOT NULL AND unit_price != 0 GROUP BY customer_email ORDER BY customer_email #+end
To execute a given block in Coginiti application just place a cursor anywhere inside the block and hit Run at Cursor
button (assuming correspondent connection to Redshift server has been selected in the dropdown).
The query above provides the values we’ll need for our RFM analysis. The data set is now ready for RFM analysis with 5 tier groups for each dimensions (R, F and M). To calculate such quantiles we will create a separate block with a query referencing our first block.
Past following code into the editor just below previous block.
#+src sql rfmQuantiles() #+begin SELECT email, NTILE(5) OVER ( ORDER BY recent_order ) AS rfm_recency, NTILE(5) OVER ( ORDER BY total_orders ) AS rfm_frequency, NTILE(5) OVER ( ORDER BY total_sales ) AS rfm_monetary FROM {{ rfmValues() }} ORDER BY 2 DESC #+end
Please note that syntax of block invocation has changed from ${block()} to {{ block() }} in 22.5.
This block returns a dataset with customer emails alongside the quintile that each customer falls into for recency, frequency, and monetary value.
Now that we know where our customers fall generally, we can put them into correspondent segment:
#+src sql rfmSegments() #+begin SELECT email, rfm_recency, rfm_frequency, rfm_monetary, rfm_recency * 100 + rfm_frequency * 10 + rfm_monetary AS rfm_cell, CASE WHEN rfm_cell IN (355,255) THEN 'Cannot lose' WHEN rfm_cell IN (543,542,453,452) THEN 'Active fans' WHEN rfm_cell IN (525,524,515,514) THEN 'Promising newbies' WHEN rfm_cell IN (335,334,325,324) THEN 'Potential churners' ELSE 'Other' END AS rfm_segment FROM {{ rfmQuantiles() }} ORDER BY 6 DESC #+end
If we want to precalculate this data and store it as a table we could just execute the following query:
CREATE TABLE rfm_segments AS SELECT * FROM {{ rfmSegments() }};
As you can see when you write SQL queries you reference CoginitiScript blocks just like you would reference tables or views (in the same contexts). Code blocks provides a nice way of abstracting certain datasets and reusing them in the downstream calculations.