Flexible aggregation functions

Use the group_aggregate function in ThoughtSpot to aggregate measures at different granularities than the dimensions used in the search columns.

How aggregation formulas work

Typically, the groupings and filters used in a formula use the same fields as columns returned in the search results. The concept of a grouping equates to an attribute column.

For example, in the search revenue ship mode, revenue is the measure, and ship mode is the attribute, or grouping. The result of this search shows total revenue for each ship mode:

revenue ship mode

$ a

air

$ r

rail

$ t

truck

$ s

sea transport

The aggregation formulas are described in Overview of aggregate formulas.

About flexible aggregation

ThoughtSpot provides flexible aggregation with the group_* functions. You can use group_* formulas when you want to specify columns and filters to include or ignore in your query.

The group_* formulas use a sub-query to perform these custom aggregations. If the sub-query is at a less detailed level than the original query, ThoughtSpot adds the result column to the result of original query. When the sub-query is at a finer detail level than the original query, ThoughtSpot re-aggregates the formula’s results to match the groupings of the original query.

To ensure that the level of aggregation of the sub-query is returned the sub-query grouping columns must be included in the search query and/or visualized chart columns.

To use the groups and filters, specify them using the query_groups and query_filters keywords, respectively. You can also add or exclude groups or filters.

Best practices for flexible aggregations

The group_aggregate function enables you to calculate a result at a specific aggregation level, and then returns it at a different aggregation level. For this reaggregation result to return correctly, follow these syntax guidelines:

  • Wrap group_aggregate in an aggregate function, such as sum or average.

  • The wrapping function must be the immediate preceding function, such as sum(group_aggregate(...)).

Examples

For a search on revenue monthly ship mode, you can add a formula to calculate yearly revenue by ship mode:

group_aggregate(
  sum(revenue),
  {ship mode, year(commit date)},
  {}
)

The same formula can also be written using query_groups() and query_filters() as following:

group_aggregate(
  sum(revenue),
  query_groups() - {commit date} + {year(commit date)},
  {}
)

This is helpful to include the main query groups that are not known at formula creation time. You can use +/- to modify the set of groups included from the query.

Reaggregation enhancement Beta

To reduce confusion when using aggregate formulas, the level of detail defined in group formulas is now respected at query level.

For example, when trying to calculate the contribution of each store’s sales to the entire region, you might use a group_sum or group_aggregate formula, where the sales at store level is divided by the sales at regional level. Your search would include the following columns: Revenue, Customer nation, Regional Revenue formula, Customer region, and Contribution percentage.

Here, the formula used would be either group_sum(Revenue, Customer Region) or group_aggregate(sum (Revenue), (Customer Region), query_filters()), and the Contribution percentage formula would be safe_divide(sum(Revenue), Regional Revenue formula).

According to the old reaggregation behavior, if a customer removed Customer region from the search bar, the level of detail in the formula would no longer be respected and the formula’s denominator would re-aggregate up to the total. In this case, the formula result would display revenue as a percentage of total revenue, rather than as a percentage of regional revenue. In visualizations as well, under the old behavior, charts would implicitly re-aggregate up to a percentage of the total revenue unless the user made sure to include customer region as a measure on the X-axis.

Under the new aggregation behavior, the formula result will continue to display revenue as a percentage of each region, even if you remove Customer region from the search bar. Visualizations will also display the correct aggregation even if the grouping column is not added to the X-axis.

For more information, see our video on the new reaggregation behavior:

 

With the flexibility of groupings for group formulas, the computed column created by a formula can be finer or coarser grained than the search itself. Note that ThoughtSpot by default aggregates formula results to the level of detail defined in the search bar. To increase the level of detail in your results, you must add filters to your search as well as defining them in your formula.

For example, you can have a search that shows total yearly sales and a formula that computes total sales for each month (a finer-grained calculation than the search).

In such cases, if an additional aggregation is specified by the formula, the results get reaggregated.

Reaggregation can be applied in either of these ways:

  • You can add an aggregation keyword just before a formula column in a search. For example, in this search we’ve added the keyword min just before our formula for monthly_sales:

    sum revenue yearly min monthly_sales

    where, the monthly_sales formula is written as:

    group_aggregate(
      sum(revenue),
      {start_of_month(date)},
      {}
    )
  • You can create a separate formula, such as in this search for:

    sum revenue yearly min_monthly_sales

    where, the min_monthly_sales formula is written as:

    min(monthly_sales)

If no aggregation is specified, then the search query aggregates to the level of detailed defined in the search bar. In the following search:

sum revenue yearly monthly_sales

the original query is computed at a yearly grain instead of monthly.

Reaggregation scenarios

Some scenarios require aggregation on an already aggregated result.

For example, computing minimum monthly sales per ship mode, requires two aggregations:

  • the first aggregation of sum to compute total monthly sales per ship mode.

  • the second aggregation of min to compute the minimum sale that happened for any given month for that ship mode.

An example of this is this search:

ship mode min monthly_sales

where the formula monthly_sales is written as:

group_aggregate(
  sum(revenue),
  query_groups() + {start_of_month(date)},
  {}
)

For more extensive examples of using the group-aggregate function, we encourage you to see Reaggregation scenarios in practice.

Query groups and query filters

Flexible group aggregate formulas allow for flexibility in both groupings and filters. The formulas give you the ability to specify only groupings or only filters.

Query groups

With query_groups()+ {attribute_column} or query_groups()-{attribute_column}, you can aggregate results while including/excluding a column from the original search.

The query_groups() function returns all attribute columns defined in the base search.

If, for example, you use the condition query_groups() - {region, sku, name}, this changes the level of detail for the group aggregate formula. In this scenario, region, sku, and name have been removed. If these columns are not included in the base search, then this definition is ignored.

Under the new logic, the {region, sku, name} condition fixes the level of detail for the group by columns to the columns defined.

You can combine options to both add and remove columns like in the following example:

query_groups()+ {region} - {sku, name}.

Regarding dates, when query_groups() is defined, the date period defined in the search will be passed into the group function. Assuming the search is monthly, then the group function will also be at the monthly grain. You can use date functions to change the grain, such as {start_of_year(transaction date)}.

Query filters

With query_filters()+{filter_condition} or query_filters()-{filter_condition}, users will be able to aggregate the results while including/excluding a filter condition.

Including a filter condition

Filter condition: Ship Mode='car'

For a search on Category Customer ID sales by customer id and category Ship Mode='car’, you can add a formula to calculate sales by category for each customer as:

sales by Customer ID and Category= group_aggregate (
  sum(Sales),
  {Category, Customer ID },
  query_filters()+{Ship Mode='air'}
  )

In this case, the results will be aggregated based on the dimensions: ‘Category’ and ‘Customer ID’ and filters: ‘air’ and ‘car’.

Subtracting or removing a filter condition

With query_filters()-{column}, users will be able to aggregate the results while removing any expression related to a column.

Filter condition: Ship Mode='car'

For a search on Customer ID sales by customer id and category Ship Mode='car', you can add a formula to calculate sales for each customer while ignoring the filter on a column as:

sales by Customer ID and Category= group_aggregate (
  sum(Sales),
  {Customer ID, Category },
  query_filters()-{Ship Mode}
  )

In this case, the results will be aggregated based on the dimensions in the search; Customer ID and any filter related to Ship Mode will not be considered while aggregating the results.

Group aggregation filters enhancement Beta

Specify a column in the third argument of a group_aggregate function in order to include all filters defined in the search that pertain to that column, and ignore unrelated filters. This enhancement allows the analyst to specify which fields to accept filters from, improving readability of formulas, and reducing operational change and related errors.

Beta

The group aggregation filters enhancement feature is in Beta and off by default. To enable it, contact ThoughtSpot Support. For more information about Beta features, see ThoughtSpot Software release life cycle.

For example, say that you would like to determine how many stores a certain product was sold in during a given time period, expressed as a ratio of total available stores. In order to calculate the total available stores, you require a group_aggregate function that will accept the filters associated with transaction date (for example, this year, last six months, august), and ignore filters from other fields.

Before 9.0.1.sw, this formula would look like: group_aggregate(count(storid),query_groups(),query_filters-{field1,field2,fieldN}), where users would have to use query_filters - {} to manually remove every field in the search bar that did not relate to transaction date. When adding a new unrelated filter to the search bar, users would then have to edit all existing formulas to ignore the filter.

With this feature enabled, users can simply specify the formula as group_aggregate(count(storied),query_groups, {date}), and the formula will automatically ignore all unrelated filters in the search bar.


Related information


Was this page helpful?