Varenyk Shop

Varenyk Shop is a fictional e-commerce store that sells varenyky, a traditional Ukrainian dish. In this tutorial we will use CoginitiScript to build a transformation pipeline that turns raw sales data into a model ready for analytics.

 

Our raw (source) data looks like the following:

We would like to load it and prepare the customers and orders datasets that we will use for reporting and analysis:

Please note that the solution proposed in this tutorial is just one recommended way the task could be solved. CoginitiScript does not force users to follow a particular approach; you are free to come up with your own solutions.

Instructions

This tutorial is based on a Postgres database. To get up and running, do the following:

  • Download the Coginiti catalog package from here: varenyk_shop.coginitipkg
  • Import this package into the root folder
  • Open a new query tab, select the Postgres database where you want the demo data to be created, and run the following script:
#+import "tutorials/varenyk_shop/seed"

{{ seed.Run() }};
  • Open tutorials/varenyk_shop/schedule_nightly and execute it against the same Postgres database.

Once you are done with these steps, the customers and orders tables should be created in your Postgres database. You can query those tables to see the final result.
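For example, a quick sanity check might look like this (a sketch; the table and column names come from the publication metadata defined later in this tutorial):

SELECT * FROM customers ORDER BY ltv DESC LIMIT 10;
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;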

Reviewing project structure

Let’s review the imported tutorial project and see how it is implemented.

The overall structure is as follows:

+-- tutorials/
    +-- varenyk_shop/
        +-- reporting/
            +-- customers
            +-- orders
        +-- seed/
            +-- gen/
            +-- person
            +-- clean
            +-- source_data
        +-- sources/
            +-- sources
        +-- schedule_nightly

We break the project down into several packages, each representing a layer. The sources package contains definitions of the source data, and the reporting package contains definitions of the datasets we produce for further analysis. The seed package contains scripts to prepopulate the database with demo data; you have already used it in the instructions above.

Sources

When we work on projects like this, we always create a dedicated layer for the source data. All downstream transformations then reference blocks from this layer rather than using tables directly. This abstracts our transformation logic from the physical sources, which has the following benefits:

  • If your sources change, you only have to change the source layer and do not need to touch the transformation layer at all (which is usually much bigger).
  • Your transformation logic becomes less polluted with cleansing logic, handling of inconsistencies in the source data, etc., because these operations are done in the source layer.
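For example, suppose the src_customers table later renamed its email column to email_address. Only the source block would need to change; every downstream query would keep referencing sources.Customers() unchanged. A hypothetical sketch of the updated block:

#+src sql Customers()
#+begin
  SELECT
    id AS customer_id,
    first_name,
    last_name,
    email_address AS email  -- rename absorbed here; downstream blocks still see "email"
  FROM
    src_customers
  ;
#+end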

For this tutorial we have the following source blocks defined in the tutorials/varenyk_shop/sources/sources asset:

#+src sql Customers()
#+begin
  SELECT
    id AS customer_id,
    first_name,
    last_name,
    email
  FROM
    src_customers
  ;
#+end

#+src sql Products()
#+begin
  SELECT
    id AS product_id,
    name
  FROM
    src_products
  ;
#+end

#+src sql Orders()
#+begin
  SELECT
    id AS order_id,
    order_date,
    customer_id,
    payment_type_id,
    order_status
  FROM
    src_orders
  ;
#+end

#+src sql OrderLines()
#+begin
  SELECT
    id AS order_line_id,
    order_id,
    product_id,
    quantity,
    price
  FROM
    src_order_lines
  ;
#+end

#+src sql OrderStatuses()
#+begin
  SELECT
    id AS order_status_id,
    status_name
  FROM
    src_order_statuses
  ;
#+end

#+src sql PaymentTypes()
#+begin
  SELECT
    id AS payment_type_id,
    payment_type_name
  FROM
    src_payment_types
  ;
#+end

As you can see, these blocks have simple definitions: SELECT statements that extract only the columns we need and rename some of them.
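You can preview any of these source blocks by running it directly from a query tab, in the same way the seed script was run above (a sketch; the import path matches this tutorial's structure):

#+import "tutorials/varenyk_shop/sources"

{{ sources.Customers() }};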

Reporting Models

Customers

For analyzing customer data we want to produce a dataset with the following metrics calculated for each customer:

  • date of the first order
  • date of the most recent order
  • total sales amount
  • total number of orders

The first three metrics represent the so-called RFM values:

Recency - How recently a customer has made a purchase.

Frequency - How often a customer makes a purchase.

Monetary Value - How much money a customer spends on purchases.

We decided to split the calculation logic into two blocks: the rfmValues block calculates the required metrics, and the Customers block produces the final result. The Customers block has a publication strategy defined on it, since we want to materialize the result of this block as a table in the database.

Please pay attention to how we use sources here: we import the sources package first and then reference blocks from that package in our queries.

tutorials/varenyk_shop/reporting/customers

#+import "tutorials/varenyk_shop/sources"

#+src sql rfmValues()
#+meta {
  :doc "Calculates the following metrics for each customer:
  - first order : when their first purchase was
  - recency : when their last purchase was
  - frequency : how often they purchase
  - monetary value : how much they spent
  "
}
#+begin
  SELECT
    customer_id,
    MIN(o.order_date) AS first_order,
    MAX(o.order_date) AS recent_order,
    -- DISTINCT is needed: the join to order lines repeats each order once per line
    COUNT(DISTINCT o.order_id) AS total_orders,
    SUM(ol.price) AS total_sales
  FROM
    {{ sources.Orders() }} AS o
  INNER JOIN {{ sources.OrderLines() }} AS ol
    ON ol.order_id = o.order_id
  GROUP BY
    customer_id
#+end

#+src sql Customers()
#+meta {
  :publication {
    :type "table",
    :name "customers"
  }
}
#+begin
  SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    r.first_order,
    r.recent_order,
    r.total_orders,
    r.total_sales AS ltv
  FROM
    {{ sources.Customers() }} AS c
  LEFT JOIN {{ rfmValues() }} AS r
    ON r.customer_id = c.customer_id
  ;
#+end
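After publication, the materialized customers table can be used for typical RFM-style analysis. For example, a query like the following (a sketch, written against the columns published above) ranks customers by lifetime value:

SELECT
  customer_id,
  first_name,
  last_name,
  recent_order,
  total_orders,
  ltv
FROM
  customers
ORDER BY
  ltv DESC
LIMIT 10;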

Orders

In the same manner we define a block to calculate the orders dataset. The interesting detail here is that we dynamically generate the SQL query to build one column per payment type our system supports. To do this, we first define the list of supported payment types in a constant named paymentTypes. Then we use the CoginitiScript loop statement to iterate over the values in paymentTypes and produce the corresponding column expressions.

tutorials/varenyk_shop/reporting/orders

#+import "tutorials/varenyk_shop/sources"

#+const
  paymentTypes = ["credit_card", "paypal", "wire_transfer"];
#+end

#+src sql Orders()
#+meta {
  :publication {
    :type "table",
    :name "orders"
  }
}
#+begin
  SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name AS order_status,
    #+for t : paymentTypes do
      SUM(CASE WHEN pt.payment_type_name = '{{ t }}' THEN ol.price ELSE 0 END) AS {{ t }}_amount,
    #+end
    SUM(ol.price) AS amount
  FROM
    {{ sources.Orders() }} AS o
  INNER JOIN {{ sources.OrderLines() }} AS ol
    ON ol.order_id = o.order_id
  INNER JOIN {{ sources.PaymentTypes() }} AS pt
    ON pt.payment_type_id = o.payment_type_id
  INNER JOIN {{ sources.OrderStatuses() }} AS os
    ON os.order_status_id = o.order_status
  GROUP BY
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name
  ;
#+end
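With the paymentTypes constant defined above, the #+for loop expands into one aggregate column per payment type, so the generated SELECT list is equivalent to:

    SUM(CASE WHEN pt.payment_type_name = 'credit_card' THEN ol.price ELSE 0 END) AS credit_card_amount,
    SUM(CASE WHEN pt.payment_type_name = 'paypal' THEN ol.price ELSE 0 END) AS paypal_amount,
    SUM(CASE WHEN pt.payment_type_name = 'wire_transfer' THEN ol.price ELSE 0 END) AS wire_transfer_amount,
    SUM(ol.price) AS amount

Adding a new payment type is then a one-line change to the paymentTypes constant; the query regenerates itself.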

Executing publication

To execute the publication we created an asset, tutorials/varenyk_shop/schedule_nightly, where we run the publication from the reporting package.

#+import "std/publication"
#+import "tutorials/varenyk_shop/reporting"

{{ publication.Run(packages = [reporting]) }};

We could schedule this asset to run nightly using the built-in scheduler. This functionality is only available in Coginiti Team and Coginiti Enterprise.
