Reaggregation Scenario 4: Average period value for semi-additive numbers II
Semi-additive numbers may be aggregated across some, but not all, dimensions. They commonly apply to specific time positions. In this scenario, we have daily position values for home loans, and therefore cannot aggregate on the date dimension.
Here, we consider a somewhat different situation than in Reaggregation Scenario 3: Average period value for semi-additive numbers I.
In some financial circumstances, the average daily balance has to be calculated, even if the balance does not exist. For example, if a banking account was opened on the 15th of June, business requirements have to consider all the days in the same month, starting with the 1st of June. Importantly, we cannot add these ‘missing’ data rows to the data set; note that the solution used in Scenario 3 returns an average only for the period that has data, such as June 15th to 30th, not for the entire month of June. The challenge is to ensure that in the daily average formula, the denominator returns the total days in the selected period, not just the days that have transactions:
sum(loans) / sum(days_in_period)
To solve for this, consider the data model:
-
The fact table
transactions
reports the daily position for each account, and uses aloan
column. -
The dimension table
date
tracks information for each date, starting with the very first transaction, all the way through the most recent transaction. This table includes the expecteddate
column, anddays_in_period
column that has a value of 1 in each row. -
Worksheets use the
date
column with keywords such as weekly, monthly, yearly to change the selected period. -
When users run a search with the monthly keyword, the denominator must reflect the number of days in each month.
Valid solution
A valid query that meets our objective may look something like this:
The following code in the denominator definition returns the total number of days for the period, regardless whether there are transactions, or what filters apply:
group_aggregate (sum(days_in_period),{Date},{})
Resolution
-
The
sum(days_in_period)
function aggregates to:-
{Date}
No other search columns appear.
-
{}
We require the entire period, so there are no filters.
The date
keywords yearly, quarterly, monthly, and weekly apply because we use the same column in both the search and the aggregation function.So, the function results in the following output when it runs with the yearly keyword in search:
Year Result 2016
366
2017
365
2018
365
2019
365
2020
366
-
-
This data is not reaggregated because we want to return the result at the appropriate
date
level.
Alternate solution
To return only the number of days that have existing transactions, use the following code in the denominator:
sum(days_in_period)