Accuracy Data Quality Rules

Accuracy rules validate the correctness of values in your data. These checks ensure data values meet expected standards and fall within acceptable ranges.

POST /integration/v1/native_data_quality/monitor

Table-Level Accuracy Metrics

  • row_count: Validates the total number of rows in a table.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

 {
       "metric_name": "row_count",
       "operator": ">=",
       "threshold": "1000",
       "category": "accuracy"
   }

Numeric Column Metrics

  • avg (Average): Calculates the average value in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "avg",
       "operator": "between",
       "threshold": ["100", "500"],
       "category": "accuracy"
   }
  • min (Minimum): Validates the smallest value in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "min",
       "operator": ">=",
       "threshold": "0",
       "category": "accuracy"
   }
  • max (Maximum): Validates the greatest value in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "max",
       "operator": "<=",
       "threshold": "1000000",
       "category": "accuracy"
   }
  • percentile: Validates values at a specific percentile within a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

 {
       "metric_name": "percentile",
       "operator": "<",
       "threshold": "500",
       "configuration_keys": {
           "percentile_value": 0.95
       },
       "category": "accuracy"
   }
  • stddev (Standard Deviation): Calculates the standard deviation of values in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{  
       "metric_name": "stddev",  
       "operator": "\<",  
       "threshold": "10",  
       "category": "accuracy"  
  }
  • stddev_pop (Population Standard Deviation): Calculates the population standard deviation of values in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

 {
       "metric_name": "stddev_pop",
       "operator": "<=",
       "threshold": "15",
       "category": "accuracy"
  }
  • stddev_samp (Sample Standard Deviation): Calculates the sample standard deviation of values in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "stddev_samp",
       "operator": "between",
       "threshold": ["5", "20"],
       "category": "accuracy"
  }
  • sum: Validates the calculated sum of values in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "sum",
       "operator": ">=",
       "threshold": "10000",
       "category": "accuracy"
   }
  • variance: Calculates the variance of values in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "variance",
       "operator": "<",
       "threshold": "100",
       "category": "accuracy"
   }
  • var_pop (Population Variance): Calculates the population variance of values in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

- {
       "metric_name": "var_pop",
       "operator": "<=",
       "threshold": "200",
       "category": "accuracy"
   }
  • var_samp (Sample Variance): Calculates the sample variance of values in a numeric column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "var_samp",
       "operator": "between",
       "threshold": ["50", "300"],
       "category": "accuracy"
   }

Text Column Metrics

  • avg_length: Validates the average length of text values in a column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "avg_length",
       "operator": "between",
       "threshold": ["10", "50"],
       "category": "accuracy"
   }
  • max_length: Validates the greatest length among text values in a column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "max_length",
       "operator": "<=",
       "threshold": "255",
       "category": "accuracy"
   }
  • min_length: Validates the smallest length among text values in a column.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "min_length",
       "operator": ">=",
       "threshold": "3",
       "category": "accuracy"
   }

Time Column Metrics

  • variance: Calculates the variance of time values.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Note: Not supported on Redshift databases.

{
       "metric_name": "variance",
       "operator": "<",
       "threshold": "3600",
       "category": "accuracy"
   }
  • var_pop (Population Variance): Calculates the population variance of time values.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Note: Not supported on Redshift databases.

{
       "metric_name": "var_pop",
       "operator": "<=",
       "threshold": "7200",
       "category": "accuracy"
   }

  • var_samp (Sample Variance): Calculates the sample variance of time values.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Note: Not supported on Redshift databases.

{
       "metric_name": "var_samp",
       "operator": "between",
       "threshold": ["1800", "5400"],
       "category": "accuracy"
   }

Custom Metrics (All Column Types)

sql (Custom SQL Query): Defines custom accuracy metrics using SQL queries.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "sql",
       "operator": "=",
       "threshold": "100",
       "configuration_keys": {
           "custom_metric": "accuracy_validation",
           "query value": "SELECT COUNT(*) FROM schema.table WHERE column > 0"
       },
       "category": "accuracy"
   }

Sample JSON Example

[
        {
            "monitor_title": "Sales Data Accuracy Monitor",
            "monitor_description": "Monitor accuracy of sales data metrics",
            "priority": "high",
            "cron_schedule": "0 2 * * *",
            "assets": [
                {
                    "asset_otype": "DUMMY_OTYPE",
                    "asset_id": "DUMMY_OID",
                    "tables": [
                        {
                            "table_name": "sales_transactions",
                            "schema_name": "sales",
                            "ds_id": 1,
                            "table_rules": [
                                {
                                    "metric_name": "row_count",
                                    "operator": ">=",
                                    "threshold": "100",
                                    "category": "accuracy"
                                }
                            ],
                            "columns": [
                                {
                                    "column_name": "amount",
                                    "column_rules": [
                                        {
                                            "metric_name": "avg",
                                            "operator": "between",
                                            "threshold": ["50", "500"],
                                            "category": "accuracy"
                                        }
                                    ]
                                },
                                {
                                    "column_name": "customer_name",
                                    "column_rules": [
                                        {
                                            "metric_name": "avg_length",
                                            "operator": ">=",
                                            "threshold": "5",
                                            "category": "accuracy"
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

Common Use Cases

  • Financial Data: Validate transaction amounts, account balances, interest rates
  • Customer Data: Ensure name lengths, age ranges, credit scores are accurate
  • Product Data: Validate prices, weights, dimensions within expected ranges
  • Performance Metrics: Monitor response times, processing durations, error rates
  • Statistical Analysis: Validate data distributions using variance and standard deviation