The APPROX_COUNT_DISTINCT function gives the approximate number of unique non-null values in a group.
It is available for the following environments:
SQL Server 2019
Azure SQL Database
Azure SQL Managed Instance
Azure Synapse Analytics
You can use it to return approximate the number of rows that are not duplicated in a large table or result set. It works similar to the COUNT() functions (when using the DISTINCT clause), but rather than showing a precise number, it returns an approximate number for faster calculations.
APPROX_COUNT_DISTINCT( ) is a function that's incredibly useful for large sets of data. It provides approximate figures returns by calculating the approximate number of unique non-null values in a group.
This function is suitable for the following conditions
Data set with millions of rows or higher
Aggregation of columns having many distinct values
Note: According to Microsoft documentation, this function returns approximately 97% probability and up to 2% error rate
Let’s calculate the Actual distinct record counts (using COUNT(DISTINCT)) and approximate count (using APPROX_COUNT_DISTINCT) function.
SELECT
COUNT(SalesOrderID) 'Actual Row Count',
COUNT(DISTINCT SalesOrderID) 'Actual Distinct Count',
APPROX_COUNT_DISTINCT(SalesOrderID) 'Approx Distinct Count'
FROM SalesLT.SalesOrderDetail
As shown below, we got the similar value using the COUNT(DISTINCT) and APPROX_COUNT_DISTINCT functions. In this case, we used SalesOrderID that contains the unique value (due to primary key column) and does not have any duplicates.
Let's run the above query for the column - UnitPrice.
SELECT
COUNT(UnitPrice) 'Actual Row Count',
COUNT(DISTINCT UnitPrice) 'Actual Distinct Count',
APPROX_COUNT_DISTINCT(UnitPrice) 'Approx Distinct Count'
FROM SalesLT.SalesOrderDetail
Here, we see that there is a slight difference in the value returned by the COUNT(DISTINCT) and APPROX_COUNT_DISTINCT functions.
You can refer to the article https://www.sqlshack.com/the-new-sql-server-2019-function-approx_count_distinct/ for more details on the Approx_Count_Distinct in SQL Server 2019.