Accuracy Data Quality Rules
Accuracy rules validate the correctness of values in your data. These checks ensure data values meet expected standards and fall within acceptable ranges.
POST /integration/v1/native_data_quality/monitor
Table-Level Accuracy Metrics
row_count: Validates the total number of rows in a table.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "row_count",
"operator": ">=",
"threshold": "1000",
"category": "accuracy"
}
Numeric Column Metrics
avg(Average): Calculates the average value in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "avg",
"operator": "between",
"threshold": ["100", "500"],
"category": "accuracy"
}
min(Minimum): Validates the smallest value in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "min",
"operator": ">=",
"threshold": "0",
"category": "accuracy"
}
max(Maximum): Validates the greatest value in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "max",
"operator": "<=",
"threshold": "1000000",
"category": "accuracy"
}
percentile: Validates values at a specific percentile within a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "percentile",
"operator": "<",
"threshold": "500",
"configuration_keys": {
"percentile_value": 0.95
},
"category": "accuracy"
}
stddev(Standard Deviation): Calculates the standard deviation of values in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "stddev",
"operator": "\<",
"threshold": "10",
"category": "accuracy"
}
stddev_pop(Population Standard Deviation): Calculates the population standard deviation of values in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "stddev_pop",
"operator": "<=",
"threshold": "15",
"category": "accuracy"
}
stddev_samp(Sample Standard Deviation): Calculates the sample standard deviation of values in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "stddev_samp",
"operator": "between",
"threshold": ["5", "20"],
"category": "accuracy"
}
sum: Validates the calculated sum of values in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "sum",
"operator": ">=",
"threshold": "10000",
"category": "accuracy"
}
variance: Calculates the variance of values in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "variance",
"operator": "<",
"threshold": "100",
"category": "accuracy"
}
var_pop(Population Variance): Calculates the population variance of values in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
- {
"metric_name": "var_pop",
"operator": "<=",
"threshold": "200",
"category": "accuracy"
}
var_samp(Sample Variance): Calculates the sample variance of values in a numeric column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "var_samp",
"operator": "between",
"threshold": ["50", "300"],
"category": "accuracy"
}
Text Column Metrics
avg_length: Validates the average length of text values in a column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "avg_length",
"operator": "between",
"threshold": ["10", "50"],
"category": "accuracy"
}
max_length: Validates the greatest length among text values in a column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "max_length",
"operator": "<=",
"threshold": "255",
"category": "accuracy"
}
min_length: Validates the smallest length among text values in a column.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "min_length",
"operator": ">=",
"threshold": "3",
"category": "accuracy"
}
Time Column Metrics
variance: Calculates the variance of time values.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
Note: Not supported on Redshift databases.
{
"metric_name": "variance",
"operator": "<",
"threshold": "3600",
"category": "accuracy"
}
var_pop(Population Variance): Calculates the population variance of time values.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
Note: Not supported on Redshift databases.
{
"metric_name": "var_pop",
"operator": "<=",
"threshold": "7200",
"category": "accuracy"
}
var_samp(Sample Variance): Calculates the sample variance of time values.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
Note: Not supported on Redshift databases.
{
"metric_name": "var_samp",
"operator": "between",
"threshold": ["1800", "5400"],
"category": "accuracy"
}
Custom Metrics (All Column Types)
sql (Custom SQL Query): Defines custom accuracy metrics using SQL queries.
Supported Operators: =, <, >, <=, >=, !=, <>, between, not between
{
"metric_name": "sql",
"operator": "=",
"threshold": "100",
"configuration_keys": {
"custom_metric": "accuracy_validation",
"query value": "SELECT COUNT(*) FROM schema.table WHERE column > 0"
},
"category": "accuracy"
}
Sample JSON Example
[
{
"monitor_title": "Sales Data Accuracy Monitor",
"monitor_description": "Monitor accuracy of sales data metrics",
"priority": "high",
"cron_schedule": "0 2 * * *",
"assets": [
{
"asset_otype": "DUMMY_OTYPE",
"asset_id": "DUMMY_OID",
"tables": [
{
"table_name": "sales_transactions",
"schema_name": "sales",
"ds_id": 1,
"table_rules": [
{
"metric_name": "row_count",
"operator": ">=",
"threshold": "100",
"category": "accuracy"
}
],
"columns": [
{
"column_name": "amount",
"column_rules": [
{
"metric_name": "avg",
"operator": "between",
"threshold": ["50", "500"],
"category": "accuracy"
}
]
},
{
"column_name": "customer_name",
"column_rules": [
{
"metric_name": "avg_length",
"operator": ">=",
"threshold": "5",
"category": "accuracy"
}
]
}
]
}
]
}
]
}
]
Common Use Cases
- Financial Data: Validate transaction amounts, account balances, interest rates
- Customer Data: Ensure name lengths, age ranges, credit scores are accurate
- Product Data: Validate prices, weights, dimensions within expected ranges
- Performance Metrics: Monitor response times, processing durations, error rates
- Statistical Analysis: Validate data distributions using variance and standard deviation
Updated 5 days ago