Validity Data Quality Rules

Validity rules validate that data values conform to expected formats, patterns, and business rules. These checks ensure data meets specific format requirements, falls within acceptable value ranges, and follows defined patterns.

POST /integration/v1/native_data_quality/monitor

Numeric Column Metrics

  • invalid_count: Counts the number of rows containing values that are not valid according to specified criteria.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Basic Usage:

{
       "metric_name": "invalid_count",
       "operator": "=",
       "threshold": "0",
       "category": "validity"
   }

With Valid Min:

 {
     "metric_name": "invalid_count",
     "operator": "<=",
     "threshold": "5",
     "configuration_keys": {
       "valid min": 0
     },
     "category": "validity"
   }

With Valid Max:

 {
     "metric_name": "invalid_count",
     "operator": "<=",
     "threshold": "5",
     "configuration_keys": {
       "valid max": 100
     },
     "category": "validity"
   }

With Valid Values:

{
     "metric_name": "invalid_count",
     "operator": "=",
     "threshold": "0",
     "configuration_keys": {
       "valid values": [1, 2, 3, 4, 5]
     },
     "category": "validity"
   }

With Invalid Values:

{
     "metric_name": "invalid_count",
     "operator": "=",
     "threshold": "0",
     "configuration_keys": {
       "invalid values": [-1, -999, 0]
     },
     "category": "validity"
   }

  • invalid_percent: Calculates the percentage of rows containing invalid values relative to the total row count.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Basic Usage:

{
       "metric_name": "invalid_percent",
       "operator": "<=",
       "threshold": "2.0",
       "category": "validity"
   }

With Valid Min:

{
     "metric_name": "invalid_percent",
     "operator": "<",
     "threshold": "1.0",
     "configuration_keys": {
       "valid min": 18
     },
     "category": "validity"
   }

With Valid Max:

{
     "metric_name": "invalid_percent",
     "operator": "<",
     "threshold": "1.0",
     "configuration_keys": {
       "valid max": 120
     },
     "category": "validity"
   }

With Valid Values:

{
     "metric_name": "invalid_percent",
     "operator": "<",
     "threshold": "1.0",
     "configuration_keys": {
       "valid values": [18, 21, 25, 30]
     },
     "category": "validity"
   }

With Invalid Values:

{
     "metric_name": "invalid_percent",
     "operator": "<",
     "threshold": "1.0",
     "configuration_keys": {
       "invalid values": [0, -1, 999]
     },
     "category": "validity"
   }

Text Column Metrics

  • invalid_count: Counts the number of rows containing values that don't conform to specified formats or patterns.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

With Format Validation:

Optional predefined formats can be used for validation.

{
       "metric_name": "invalid_count",
       "operator": "=",
       "threshold": "0",
      "configuration_keys": {
           "valid format": "email"
       },
       "category": "validity"
   }

With Invalid Format:

 {
       "metric_name": "invalid_count",
       "operator": "=",
       "threshold": "0",
      "configuration_keys": {
           "invalid format": "email"
       },
       "category": "validity"
   }

With Valid Length:

{
       "metric_name": "invalid_count",
       "operator": "=",
       "threshold": "0",
      "configuration_keys": {
           "valid length": 10
       },
       "category": "validity"
   }

With Valid Min Length:

{
       "metric_name": "invalid_count",
       "operator": "<=",
       "threshold": "5",
      "configuration_keys": {
           "valid min length": 3
       },
       "category": "validity"
   }

With Valid Max Length:

{
       "metric_name": "invalid_count",
       "operator": "<=",
       "threshold": "5",
      "configuration_keys": {
           "valid max length": 50
       },
       "category": "validity"
   }

With Regex Validation:

{
       "metric_name": "invalid_count",
       "operator": "=",
       "threshold": "0",
      "configuration_keys": {
           "valid regex": "^[A-Z]{2}[0-9]{4}$"
       },
       "category": "validity"
   }

With Invalid Regex:

{
       "metric_name": "invalid_count",
       "operator": "=",
       "threshold": "0",
      "configuration_keys": {
           "invalid regex": "^[A-Z]{2}[0-9]{4}$"
       },
       "category": "validity"
   }

With Valid Values:

{
     "metric_name": "invalid_count",
     "operator": "<=",
     "threshold": "5",
     "configuration_keys": {
       "valid values": ["Active", "Inactive", "Pending"]
     },
     "category": "validity"
   }

With Invalid Values:

{
     "metric_name": "invalid_count",
     "operator": "<=",
     "threshold": "5",
     "configuration_keys": {
       "invalid values": ["Unknown", "Error", ""]
     },
     "category": "validity"
   }

  • invalid_percent: Calculates the percentage of rows containing invalid text values.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Phone Number Validation:

{
       "metric_name": "invalid_percent",
       "operator": "<=",
       "threshold": "1.0",
      "configuration_keys": {
           "valid format": "phone number"
       },
       "category": "validity"
   }

Credit Card Validation:

{
       "metric_name": "invalid_percent",
       "operator": "=",
       "threshold": "0",
      "configuration_keys": {
           "valid format": "credit card number"
       },
       "category": "validity"
   }

Time Column Metrics

  • invalid_count: Counts the number of rows containing invalid time/date values.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Basic Usage:

{
       "metric_name": "invalid_count",
       "operator": "=",
       "threshold": "0",
      "configuration_keys": {
         "valid values": ["2023-01-01", "2023-12-31"]
       },
       "category": "validity"
     }

With Invalid Values:

{
      "metric_name": "invalid_count",
      "operator": "=",
      "threshold": "0",
    "configuration_keys": {
        "invalid values": ["1900-01-01", "9999-12-31"]
      },
      "category": "validity"
    }

  • invalid_percent: Calculates the percentage of rows containing invalid time/date values.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

{
       "metric_name": "invalid_percent",
       "operator": "<=",
       "threshold": "0.5",
      "configuration_keys": {
           "invalid values": ["0000-00-00", "1900-01-01"]
       },
       "category": "validity"
   }

Custom Metrics (All Column Types)

  • sql (Custom SQL Query): Define custom validity metrics using SQL queries.

Supported Operators: =, <, >, <=, >=, !=, <>, between, not between

Required Configuration:

{
       "metric_name": "sql",
       "operator": "<=",
       "threshold": "10",
       "configuration_keys": {
           "custom_metric": "business_rule_validation",
           "query value": "SELECT COUNT(*) FROM schema.table WHERE price < 0 OR quantity < 0"
       },
       "category": "validity"
   }

Supported Formats for Valid Format and Invalid Format

Financial Formats

Format NameExampleDescription
credit card number1234 5678 9012 3456Various credit card number formats
money$123.45 USDCurrency with symbol and abbreviation
money comma$123,45 USDCurrency with comma decimal separator
money point$123.45 USDCurrency with point decimal separator

Date Formats

Format NameExampleDescription
date eu31/12/2023European date format (dd/mm/yyyy)
date us12/31/2023US date format (mm/dd/yyyy)
date inverse2023/12/31Inverse date format (yyyy/mm/dd)
date iso 86012021-04-28T09:00:00+02:00ISO 8601 standard format

Number Formats

Format NameExampleDescription
integer123Whole numbers
decimal123.45Decimal numbers (. or ,)
percentage50%Percentage values
positive integer123Positive whole numbers only
negative integer-123Negative whole numbers only

Contact Formats

Format NameExampleDescription
email[email protected]Valid email addresses
phone number+1 123-123-1234Various phone number formats

Technical Formats

Format NameExampleDescription
ip address192.168.1.1IPv4 addresses
ipv4 address192.168.1.1IPv4 addresses (explicit)
ipv6 address2001:0db8:85a3::8a2eIPv6 addresses
uuid550e8400-e29b-41d4-a716Universally unique identifiers

Time Formats

Format NameExampleDescription
time 12h02:30:45 PM12-hour format with seconds
time 24h14:30:4524-hour format with seconds
time 12h nosec02:30 PM12-hour format without seconds
time 24h nosec14:3024-hour format without seconds

Configuration Options

Text Validation Options

PropertyDescription
valid formatPredefined format from supported formats list
invalid formatPredefined format that should be considered invalid
valid regexCustom regex pattern for valid values
invalid regexCustom regex pattern for invalid values
valid valuesList of acceptable values
invalid valuesList of unacceptable values
valid lengthExact valid string length
valid min lengthMinimum valid string length
valid max lengthMaximum valid string length

Numeric Validation Options

PropertyDescription
valid valuesList of acceptable numeric values
invalid valuesList of unacceptable numeric values
valid minMinimum acceptable numeric value
valid maxMaximum acceptable numeric value