Lineage & Dataflows Overview

Description

This API provides access to lineage information. An object's lineage describes where its data came from, which process, SQL query, or script extracted the data from the sources, and which downstream objects the object affects.

The Lineage API introduces DataFlow as a new first-class object. A DataFlow captures the fine-grained connections between sources and targets, together with a description of the process that creates those connections.

Note: If an object key cannot be resolved, the result is a TMP object.

Open API 3.0 Specification

Lineage V2 APIs are also described using the Open API 3.0 Specification (OAS). OAS is a broadly adopted industry standard for describing APIs.

To see the specification, replace {AlationInstanceURL} below with your Alation instance's URL and visit the link:

{AlationInstanceURL}/openapi/lineage/

NOTE:

  1. The Open API 3.0 specification for the Lineage APIs is available in Alation
    version 2020.3 and later.
  2. The Swagger UI is not enabled by default on an Alation instance. To enable it, set the flag
    alation.feature_flags.enable_swagger to True using alation_conf.

Upload lineage information

This API lets you upload new lineage information for objects that exist either inside or outside the Alation catalog.

NOTE:

  1. Upload to deleted servers (data source, BI server, filesystem, etc.) is not supported.

  2. An object that doesn't exist in the Alation catalog, or is identified as deleted, is displayed with a TMP badge.

URL

POST /integration/v2/lineage/

Data Format

The POST body is a single JSON object.

{
    "dataflow_objects": [
        {
            "external_id": <unique identifier of dataflow object>,
            "content": <description of the process/SQL query/R script/etc.>
        },
        ...
    ],
    "paths": [
        [
            [
                {"otype": <object type>, "key": <unique key of object>},
                {"otype": <object type>, "key": <unique key of object>},
                ...
            ],
            [
                {"otype": <object type>, "key": <unique key of object>},
                {"otype": "dataflow", "key": <dataflow external id>},
                ...
            ],
            ...,
            [
                {"otype": <object type>, "key": <unique key of object>},
                ...
            ]
        ],
        ...
    ]
}

"dataflow_objects" lists the new DataFlow objects to create. It can be omitted if the request uses only existing DataFlow objects. Each entry has the following fields:

Name         Required  Description
external_id  Yes       Unique identifier of the dataflow object. It SHOULD start with "api/".
content      No        Description of the process/SQL query/R script/etc.

"paths" is an array of paths. Each path describes one sources (-> dataflows) -> targets lineage as an ordered list of steps, called segments. A segment may contain data objects and/or dataflows, but the first and last segments of a path SHOULD NOT contain any dataflows. Each element of a segment has the following fields:

Name   Required  Description
otype  Yes       Object type
key    Yes       Unique key of the object

where "otype" can be any of the following:

otype                 Description
dataflow              Represents a dataflow
table                 Represents a table
column                Represents a column
file                  Represents a file
directory             Represents a directory
bi_report             Represents a report of a BI server
bi_report_column      Represents a column of a report of a BI server
bi_datasource         Represents a data source of a BI server
bi_datasource_column  Represents a column of a data source of a BI server
external              Placeholder for anything Alation doesn't support natively

and "key" is the unique identifier of an object. The format depends on "otype":

otype                 key
dataflow              api/<unique identifier of dataflow>
table                 <datasource_id>.[<db_name>.]<schema_name>.<table_name>
                      NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza.
column                <datasource_id>.[<db_name>.]<schema_name>.<table_name>.<column_name>
                      NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza.
file                  <filesystem_id>.<full path of a file delimited by '/'>
directory             <filesystem_id>.<full path of a directory delimited by '/'>
bi_report             <bi_server_id>.bi_report.<unique identifier of bi_report on the server>
bi_report_column      <bi_server_id>.bi_report_column.<unique identifier of bi_report_column on the server>
bi_datasource         <bi_server_id>.bi_datasource.<unique identifier of bi_datasource on the server>
bi_datasource_column  <bi_server_id>.bi_datasource_column.<unique identifier of bi_datasource_column on the server>
external              <unique identifier/name of external object>
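The key formats listed above can be assembled with a few small helpers. The following is a minimal sketch, not part of the Alation API: the function names (table_key, column_key, bi_key, dataflow_key) are hypothetical, and the helpers do not attempt to handle names that themselves contain a '.' character, since this page does not describe how such names are treated.

```python
def table_key(datasource_id, schema_name, table_name, db_name=None):
    """Build a "table" key: <datasource_id>.[<db_name>.]<schema_name>.<table_name>."""
    parts = [str(datasource_id)]
    if db_name is not None:
        # Required for SQL Server, Redshift and Netezza according to the table above.
        parts.append(db_name)
    parts += [schema_name, table_name]
    return ".".join(parts)


def column_key(datasource_id, schema_name, table_name, column_name, db_name=None):
    """Build a "column" key by appending the column name to the table key."""
    return table_key(datasource_id, schema_name, table_name, db_name) + "." + column_name


def bi_key(bi_server_id, otype, object_id):
    """Build a key for bi_report, bi_report_column, bi_datasource or bi_datasource_column."""
    return f"{bi_server_id}.{otype}.{object_id}"


def dataflow_key(identifier):
    """Return the dataflow key, adding the "api/" prefix if it is missing."""
    return identifier if identifier.startswith("api/") else "api/" + identifier
```

For example, table_key(1, "schema", "table1") yields the key used in the sample request body, "1.schema.table1".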

Sample Request Body

{
    "dataflow_objects": [
        {
            "external_id": "api/df1_external_id",
            "content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5"
        }
    ],
    "paths": [
        [
            [
                {"otype": "table", "key": "1.schema.table1"},
                {"otype": "table", "key": "1.schema.table2"}
            ],
            [
                {"otype": "dataflow", "key": "api/df1_external_id"}
            ],
            [
                {"otype": "table", "key": "1.schema.table4"}
            ]
        ],
        [
            [
                {"otype": "table", "key": "1.schema.table2"},
                {"otype": "table", "key": "1.schema.table3"}
            ],
            [
                {"otype": "dataflow", "key": "api/df1_external_id"}
            ],
            [
                {"otype": "table", "key": "1.schema.table5"}
            ]
        ]
    ]
}
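Before sending a request body like the one above, it can help to check it against the rules stated in the Data Format section. The following is a minimal client-side sketch under that assumption; find_payload_problems and KNOWN_OTYPES are hypothetical names, and the server may apply checks that are not covered here.

```python
# otype values listed in the Data Format section.
KNOWN_OTYPES = {
    "dataflow", "table", "column", "file", "directory",
    "bi_report", "bi_report_column", "bi_datasource",
    "bi_datasource_column", "external",
}


def find_payload_problems(payload):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # "dataflow_objects" is optional, so a missing key is treated as an empty list.
    for i, dataflow in enumerate(payload.get("dataflow_objects", [])):
        external_id = dataflow.get("external_id", "")
        if not external_id:
            problems.append(f"dataflow_objects[{i}]: external_id is required")
        elif not external_id.startswith("api/"):
            problems.append(f'dataflow_objects[{i}]: external_id should start with "api/"')
    for p, path in enumerate(payload.get("paths", [])):
        last = len(path) - 1
        for s, segment in enumerate(path):
            for e, element in enumerate(segment):
                where = f"paths[{p}][{s}][{e}]"
                otype, key = element.get("otype"), element.get("key")
                if not otype or not key:
                    problems.append(f"{where}: otype and key are both required")
                elif otype not in KNOWN_OTYPES:
                    problems.append(f"{where}: unknown otype {otype!r}")
                elif otype == "dataflow" and s in (0, last):
                    # The first and last segments should contain data objects only.
                    problems.append(f"{where}: dataflow in the first or last segment")
    return problems
```

Running the check on the sample request body returns an empty list, because every dataflow sits in a middle segment and its external_id starts with "api/".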

Headers

HTTP Header  Value
TOKEN        <your_token>

Replace <your_token> with a token obtained from the Get Token API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

Body:

{
    "job_id": 1
}

NOTE: The response contains the identifier of a job record that tracks the job triggered by a successful call to the API. This job is responsible for uploading lineage to Alation. To check the status of the job, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).
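Because the upload runs as a background job, a client usually polls the Job Status API until the job finishes. The following is a minimal sketch of such a loop. The names fetch_job and wait_for_job are hypothetical, and the completion test is passed in by the caller because this page does not describe the fields of the Job Status response.

```python
import json
import time
import urllib.request


def fetch_job(base_url, token, job_id):
    """Call the Job Status API once and return the decoded JSON body."""
    url = f"{base_url}/api/v1/bulk_metadata/job/?id={job_id}"
    request = urllib.request.Request(url, headers={"TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(fetch, is_finished, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until is_finished(details) is true, then return the details.

    fetch       -- zero-argument callable returning the job details
    is_finished -- caller-supplied predicate; the response schema is an assumption
    """
    for attempt in range(max_attempts):
        details = fetch()
        if is_finished(details):
            return details
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job did not finish after {max_attempts} attempts")
```

A call might look like wait_for_job(lambda: fetch_job(base_url, token, job_id), is_finished), where is_finished inspects whichever field of the Job Status response indicates completion on your instance.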

Error Response

Invalid Token

Status: 401 UNAUTHORIZED

Body: Authentication failed

Missing Token Header

Status: 401 UNAUTHORIZED

Body: Authentication credentials were not provided.
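The success and error responses above can be handled in one place on the client. The following is a minimal sketch, assuming only the status codes and bodies documented on this page; LineageAuthError and extract_job_id are hypothetical names, and other status codes are reported generically because they are not documented here.

```python
import json


class LineageAuthError(Exception):
    """Raised when the API rejects the token or the TOKEN header is missing."""


def extract_job_id(status_code, body_text):
    """Return the job_id from a 200 response, or raise for anything else."""
    if status_code == 401:
        # Body is "Authentication failed" or
        # "Authentication credentials were not provided."
        raise LineageAuthError(body_text.strip() or "401 Unauthorized")
    if status_code != 200:
        raise RuntimeError(f"Unexpected status {status_code}: {body_text[:200]}")
    return json.loads(body_text)["job_id"]
```

With the requests library, this could be called as extract_job_id(response.status_code, response.text).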

Code Samples

cURL

#!/bin/bash

# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"

BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"

curl -X POST "${BASE_URL}" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" -d $'{"dataflow_objects": [{"content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5", "external_id": "api/df1_external_id"}], "paths": [[[{"otype": "table", "key": "1.schema.table1"}, {"otype": "table", "key": "1.schema.table2"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table4"}]], [[{"otype": "table", "key": "1.schema.table2"}, {"otype": "table", "key": "1.schema.table3"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table5"}]]]}'

Python

import requests
import json

# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}

data = json.dumps({
  'dataflow_objects': [
    {
      'external_id': 'api/df1_external_id',
      'content': 'Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5'
    }
  ],
  'paths': [
    [
      [
        {'otype': 'table', 'key': '1.schema.table1'},
        {'otype': 'table', 'key': '1.schema.table2'}
      ],
      [
        {'otype': 'dataflow', 'key': 'api/df1_external_id'}
      ],
      [
        {'otype': 'table', 'key': '1.schema.table4'}
      ]
    ],
    [
      [
        {'otype': 'table', 'key': '1.schema.table2'},
        {'otype': 'table', 'key': '1.schema.table3'}
      ],
      [
        {'otype': 'dataflow', 'key': 'api/df1_external_id'}
      ],
      [
        {'otype': 'table', 'key': '1.schema.table5'}
      ]
    ]
  ]
})

# Add lineage information. This example also creates a dataflow that describes the process behind the lineage.
response = requests.post('https://alation.yourcompany.com/integration/v2/lineage/', data=data, headers=headers)
response.raise_for_status()  # stop here on 401 and other HTTP errors
job_id = response.json()['job_id']
print("Job id: %s" % job_id)

# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = response.json()
print(job_details)

Delete lineage information

This API lets you delete lineage information of data objects and dataflows.

URL

DELETE /integration/v2/lineage/?<params>

<params>

Name          Required  Description
source_otype  Yes       See otype in Data Format section.
source_key    Yes       See key in Data Format section.
target_otype  Yes       See otype in Data Format section.
target_key    Yes       See key in Data Format section.
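Keys can contain characters such as '/' (for example in dataflow and file keys), so it is safer to let a library encode the query string than to concatenate it by hand. The following is a minimal sketch using the standard library; build_delete_url is a hypothetical helper name, and it assumes the server decodes percent-encoded query parameters in the usual way.

```python
from urllib.parse import urlencode


def build_delete_url(base_url, source_otype, source_key, target_otype, target_key):
    """Return the DELETE URL with all four required parameters percent-encoded."""
    params = {
        "source_otype": source_otype,
        "source_key": source_key,
        "target_otype": target_otype,
        "target_key": target_key,
    }
    return f"{base_url.rstrip('/')}/integration/v2/lineage/?{urlencode(params)}"
```

For plain table keys the result matches a hand-written URL, while a dataflow key such as api/df1_external_id is sent as api%2Fdf1_external_id.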

Headers

HTTP Header  Value
TOKEN        <your_token>

Replace <your_token> with a token obtained from the Get Token API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

Body:

{
    "job_id": 1
}

NOTE: The response contains the identifier of a job record that tracks the job triggered by a successful call to the API. This job is responsible for deleting lineage from Alation. To check the status of the job, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).

Error Response

Invalid Token

Status: 401 UNAUTHORIZED

Body: Authentication failed

Missing Token Header

Status: 401 UNAUTHORIZED

Body: Authentication credentials were not provided.

Code Samples

cURL

#!/bin/bash

# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"

BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"

curl -X DELETE "${BASE_URL}?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2" -H "TOKEN: ${API_TOKEN}"

Python

import requests

# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}

# Delete lineage information.
response = requests.delete('https://alation.yourcompany.com/integration/v2/lineage/?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2', headers=headers)
response.raise_for_status()  # stop here on 401 and other HTTP errors
job_id = response.json()['job_id']
print("Job id: %s" % job_id)

# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = response.json()
print(job_details)

Get lineage information

Refer to the Open API Specification for information on Get Lineage.