Last_value_in_period and first_value_in_period functions

The last_value_in_period and first_value_in_period functions augment the base last_value and first_value functions. Their purpose is to provide a solution for when the data engineer is unable to include zero value adjustment transactions as part of the ETL process.

Last_value_in_period and first_value_in_period functions are currently in Early Access. To enable them, contact your administrator.

These functions are the same syntax as last_value and first_valie with the inclusion of an optional fourth parameter. This parameter defines the partition of the comparison partition. That is, return the last value if the value is from the last value in the period. If it is not, then null is returned rather than the actual balance.

These functions included additional window SQL calculations. Therefore, it is anticipated that these may be less performant.

Example formula syntax
last_value(operation(measure or attribute), query_groups(), {column to order by}, [Optional query_groups()])
Without optional fourth parameter
last_value(sum(balance), query_groups(), {transaction date})
With optional fourth parameter
last_value(sum(balance), query_groups()+{partner_id}, {transaction date},{partner_id})

Parameters

You can separate the parameters in a first/last_value_in_period expression into operation, primary partition, "order by" column, and comparison partition.

Operation

For example, (sum(balance)). This operation defines the aggregation type and the measure column.

Primary partition query groups

This parameter defines which columns to partition the data by. Reference the following examples:

query_groups()

Partitions by all grouping columns in the search. This is the default property for the majority of semi-additive measures.

query_groups()+{partner_id}

Partitions by all grouping columns in the search and ensures that partner_id is always included.

{partner_id}

Partitions by specific columns.

query_groups()-{unit price, sales price}

Partitions by all grouping columns in the search with the exception of unit price and sales price.

Order by column

For example, ({transaction date}). Specifies the sorting column to determine the "last" or "first" value, typically a date.

Comparison partition query groups

Contains the columns for the date comparison partition.

Not defined

The comparison partition is the order by column.

{partner_id}

The comparison partition is the order by column and partner_id.

query_groups(fiscalYear, fiscalMonth, fiscalWeek)

The comparison partition is the order by column and optionally the columns: fiscalYear, fiscalMonth, fiscalWeek. These extras are only included if they are included in the search.

query_groups(fiscalYear, fiscalMonth, fiscalWeek)+{partner_id}

The comparison partition is the order by column and partner_id. In addition, include the optional columns.

Formula examples

Simple last date in period

For a simple last_date_in_period function, keep in mind that the requirement is that the result should return the maximum date in the period, defined by a date column. Define the period with a ThoughtSpot keyword such as yearly, monthly, or weekly. The function must verify that the partition’s date is the last date in the period. This signature is the same as the existing last_value function, therefore only the function name must be updated.

last_value_in_period ( sum ( BALANCE ) , query_groups ( ), { SnapshotDate })

Additional partition columns

Let’s say that snapshot files are received from partners at different dates. Therefore, the result should return the value from the maximum date per partner. The fourth parameter includes any mandatory columns that should be included in the comparison dates partition clause. In the majority of circumstances, these would match those defined in the second parameter. Refer to the repeated use of {partnerid}.

last_value_in_period ( sum ( BALANCE ) , query_groups ( )+{partnerid}, { SnapshotDate },{partnerid})

Date table attribute columns

Say that the data model includes a date dimensional table with columns that represent date periods, such as fiscal year, fiscal month, fiscal week. You can leverage these columns instead of the standard ThoughtSpot date periods. The comparison logic should include these attribute columns when they are included in the search. Therefore, they are optional partition-by columns.

last_value_in_period ( sum ( BALANCE ) , query_groups ( ), { SnapshotDate },query_groups(Fiscal Year, Fiscal Month, Fiscal Week))

Additional partition columns and date table attribute columns

Say that the snapshot files are received from partners at different dates. In addition, the data model includes a date dimensional table with columns that represent date periods, like fiscal year, fiscal month, and fiscal week. You want a function that returns the value from the maximum date per partner, while including the date period columns as optional partition columns.

last_value_in_period ( sum ( BALANCE ) , query_groups ( )+{partnerid}, { SnapshotDate },query_groups(Fiscal Year, Fiscal Month, Fiscal Week)+{partnerid})

Extended examples

Opening balances

In this scenario new inventory items are created throughout the year. Their opening balance should only be included if it is the first day of the period.

Formula Definition
fxOpeningBalance = first_value_in_period(sum(balance), query_groups()+{product}, {date})

For the month of March a new product, oranges, was introduced. This was on the 15th March. However, currently there are no transactions for oranges in January or February.

Table 1. Fruit table
Date Product Balance

1st January

Apples

10

1st January

Pears

10

1st January

Grapes

0

2nd January

Apples

10

2nd January

Pears

5

2nd January

Grapes

5

…​

…​

…​

31st January

Apples

5

31st January

Pears

10

31st January

Grapes

15

1st February

Apples

10

1st February

Pears

5

1st February

Grapes

20

…​

…​

…​

28th February

Apples

20

28th February

Pears

5

28th February

Grapes

0

…​

…​

…​

15th March

Apples

20

15th March

Pears

5

15th March

Grapes

0

15th March

Oranges

10

Let’s say you want to use first_value_in_period to find the opening balance from the beginning of the period defined in your table. The formula will give you the result that the opening balance is 20, defining the beginning of the period as January 1st. If you search for the opening balance by product, it will result in the following table:

Product Balance

Apples

10

Pears

10

Grapes

0

Oranges

0

The first date in the table is January 1st. First_value_in_period verifies that the first transaction for Oranges in not the 1st of January. Therefore, null is returned.

Late arriving data files with null assumed as zero

In this scenario, data files are not always received on the same date. The business requirement is to return the inventory balance for the last file received by the partner. In addition, if a product is not received in a subsequent file, it is assumed to be zero.

In the following example, Acme Industries included monitors with a balance of 5 in the snapshot file for March 14th. They provided a new file on March 15th, which did not include a line item for monitors. Therefore, the balance for monitors was assumed to be zero on March 15th. Vandaly Industrial included all the products in both files, and had a zero balance for monitors on March 15th.

Date Partner Product Balance

14th March

Acme Industries

Printers

10

14th March

Acme Industries

Monitors

5

15th March

Acme Industries

Printers

5

13th March

Vandaly Industrial

Printers

10

13th March

Vandaly Industrial

Monitors

10

14th March

Vandaly Industrial

Printers

5

14th March

Vandaly Industrial

Monitors

0

Formula definition
fxClosingBalance = last_value_in_period(sum(balance), query_groups()+{Partner}, {date},{partner})

The last date in the table is March 15th. However, the last value for Vandaly Industrial is March 14th. Last_value_in_period must return the last available date by partner, which is the 15th for Acme Industries and the 14th for Vandaly Industrial.

The following table results, with a total of 10:

Product Balance

Monitors

0

Printers

10

The balance of Monitors is expected to be zero on the 15th March. This is because the balance for Monitors from Acme Industries was provided on the 14th. The last balance value for the partition combination of Acme Industries and Monitor is the 15th, therefore zero should be assumed for Monitors on the 15th. Note that the last_value function would return the balance of 5 for Monitors.

Closing balances not supplied

In this use case, the team receives account balances every day. Closed accounts do not currently have a follow-up transaction which zeroes the account out. If the last snapshot date received for the account is not the last day of data then the balance is considered to be zero.

In the following example the account, Darren, had a final snapshot balance received on March 14th. On the 15th of March, this is considered to be a value of zero.

Table 2. Snapshot table
Date Account Balance

14th March

Darren

10

14th March

Marie

10

15th March

Marie

5

Formula definition
fxClosingBalance = last_value_in_period(sum(balance), query_groups(), {date})

This formula returns a result of 5. The last date in the table is March 15th. Only balances for this date are included in the result. Therefore, the account for Darren is not included in the total.

Table 3. Current inventory balance for each account
Account Balance

Darren

0

Marie

5

Employee headcount with Type II slowly changing attribute dimension

In the following example, the files are received every day. These files are at the employee level and indicate if the employee is a full-time employee.

At the start of the year, Sales and Marketing were separate departments. At the end of the year, there was a single department, Sales and Marketing. Field Tech is a department that exists at the end of the year but was not valid at the start. Human Resources existed at the start of the year but not at the end of the year.

date employee_id full_time department

1st Jan

EMP1001

1

Sales

1st Jan

EMP1002

1

Marketing

1st Jan

EMP1003

0

Engineering

1st Jan

EMP1004

1

Human Resources

…​

…​

…​

…​

31st Dec

EMP1001

1

Sales and Marketing

31st Dec

EMP1005

1

Engineering

31st Dec

EMP1002

1

Sales and Marketing

31st Dec

EMP1010

1

Field Tech

Formula definition
fxFTE = last_value_in_period(sum(FTE), query_groups(), {date})

Using the formula above, you can calculate the yearly headcount by department. The last date in the year is December. Therefore, only full-time employee values from this month should be included. Note that during the year, the departments, Sales, Marketing, and Human Resources were valid, however they have no valid records for December.

The following table results:

Yearly

Engineering

1

Field Tech

1

Human Resources

0

Marketing

0

Sales

0

Sales and Marketing

2

The total tallies to four full-time employees.

Available data date versus strict calendar date

These functions return the last value as determined by the resulting partition or ordering column. With data modeling, it is possible to return the value based upon the partition of available transactions (date from transaction table) or based upon a strict calendar interpretation (date from calendar table).

The example ER Diagram includes a DATE_DIM table. The join from INVENTORY_SNAPSHOT is a RIGHT OUTER JOIN. This ensures that we get all dates from the DATE_DIM table, even when there are no corresponding snapshots. This is critical for scenarios where a strict calendar date should be implemented for semi-additive measures.

Available Data Date

Logic

If no transaction exists on the first day, use the last/first available date’s balance in the period (in this case, month).

Implication

This approach assumes that balances persist and that the first recorded balance in the month is a valid proxy for the opening balance.

Use Case

Common in inventory management, banking, and operational dashboards, where the first known balance in a period is considered the starting point.

Example formula
Inventory Balance = last_value(sum(balance), query_groups(), {snapshot_date})

Strict Calendar Date

Logic

If there are no transactions on the last/first day of the period (in this case, month), then the balance is considered to be 0.

Implication

This approach is useful for businesses that define an opening balance as the value at the exact start of the period (for example, 1st Jan at 00:00:00). If no data exists for that timestamp, then no balance is recorded.

Use Case

This is often used in financial accounting where an explicit balance is required at the start of a period, and missing data means no value exists.

Example formula
Inventory Balance = last_value(sum(balance), query_groups(), {date_id})
Image showing an inventory snapshot table joined to a date_dimension table in a right outer join
Figure 1. ER Diagram

Limitations

  • Semi-additive functions cannot span multiple fact tables.

  • Semi-additive functions cannot contain only constant expressions. For example, if you create a formula with no references to a column, such as last_value(sum(1), {}, {true}), ThoughtSpot will not support the function.

  • You cannot combine different partitioning and ordering clauses in different semi-additive functions from the same table, in the same query. That is, a case where formula 1 partitions on Date and Product, and formula 2 partitions on Date, Product, and Client.

  • Average, Variance, Standard Deviation and Unique Count do not work with semi-additive functions across an attribution query. That is, multiple fact tables with at least one non-shared attribute.

  • We do not support semi-additive functions and unique count functions from the same table. Note that a work-around exists by wrapping the unique count in a group_aggregate function. For example, group_aggregate(unique_count(product),query_groups()+{},query_filters()).

  • Advanced aggregates (group, cumulative, moving and rank) cannot be used within the definition of semi-additive functions. Note they can be used to wrap these functions as outer aggregation.

  • ThoughtSpot will support first_value and last_value functions for Redshift and Google BigQuery beginning in the 10.1.0.cl release.

ThoughtSpot University

For more information, view the Introduction to Semi-additive Measures course on ThoughtSpot U.



Was this page helpful?