Lineage & Dataflows Overview

Description

This API gives you access to lineage information. The lineage information for an object describes where the object's data came from, which process, SQL query, or script was used to extract the data from the sources, and which objects this object affects.

The Lineage API introduces DataFlow as a new first-class object that captures the fine-grained connections between sources and targets, along with a description of the process that creates those connections.

Note: If an object key cannot be resolved, the result is a TMP object.

Interactions with Other Alation Features

When you make changes to dataflows using this API, the changes will appear in Alation Analytics for the affected objects.

Performance Recommendations

For each Lineage API request, we recommend including a maximum of 1,000 dataflows (100 paths and 10 dataflows in each path).

Within these recommendations, POST and PUT calls take an average of 25 seconds to execute. DELETE calls take an average of 3 seconds to execute.
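
One way to stay within the recommended request size is to split a large list of paths into batches before posting them. The following is a minimal sketch, not part of the Alation API: chunk_paths and count_dataflows are hypothetical helper names, and the sketch only splits on the number of paths, so you would still need to check the per-path dataflow count yourself.

```python
def chunk_paths(paths, max_paths=100):
    """Yield successive slices of `paths`, each holding at most `max_paths` paths."""
    if max_paths < 1:
        raise ValueError("max_paths must be at least 1")
    for start in range(0, len(paths), max_paths):
        yield paths[start:start + max_paths]


def count_dataflows(path):
    """Count the dataflow elements across all segments of a single path."""
    return sum(
        1
        for segment in path
        for element in segment
        if element.get("otype") == "dataflow"
    )
```

Each batch returned by chunk_paths can then be sent as the "paths" value of its own request, and count_dataflows can flag any path that exceeds the recommended 10 dataflows.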

Open API 3.0 Specification

Lineage V2 APIs are also described using the Open API 3.0 Specification (OAS). OAS is a broadly adopted industry standard for describing APIs.

To see the specification, replace {AlationInstanceURL} below with your Alation instance's URL and visit the link:

{AlationInstanceURL}/openapi/lineage/

NOTE:

The Open API 3.0 specification for the Lineage APIs is available in Alation version 2020.3 and later.
The Swagger UI is not enabled by default on an Alation instance. To enable it, set the flag alation.feature_flags.enable_swagger to True using alation_conf.

Upload lineage information

This API lets you upload new lineage information for objects that exist inside or outside the Alation catalog.

NOTE:

Uploading to deleted servers (data source, BI server, file system, etc.) is not supported.
An object that doesn't exist in the Alation catalog, or that is identified as deleted, is displayed with a TMP badge.

URL

POST /integration/v2/lineage/

Data Format

The POST body is a single JSON object.

{
    "dataflow_objects": [
        {
            "external_id": <unique identifier of dataflow object>,
            "content": <description of the process/SQL query/R script/etc.>
        },
        ...
    ],
    "paths": [
        [
            [
                {"otype": <object type>, "key": <unique key of object>},
                {"otype": <object type>, "key": <unique key of object>},
                ...
            ],
            [
                {"otype": <object type>, "key": <unique key of object>},
                {"otype": "dataflow", "key": <dataflow external id>},
                ...
            ],
            ...,
            [
                {"otype": <object type>, "key": <unique key of object>},
                ...
            ]
        ],
        ...
    ]
}

"dataflow_objects" contains information about new DataFlow objects to create. It can be omitted if the request uses only existing DataFlow objects:

Name	Required	Description
external_id	Yes	Unique identifier of dataflow object. It SHOULD start with "api/".
content	No	Description of the process/SQL query/R script/etc.

"paths" is an array of paths. Each path describes one sources (-> dataflows) -> targets lineage by listing, in order, the elements of each step, or "segment", of the lineage. A segment may contain data objects and/or dataflows, but the first and last segments of a path SHOULD NOT contain any dataflows. Each element of a segment has the following fields:

Name	Required	Description
otype	Yes	object type
key	Yes	unique key of object

where "otype" can be any of the following:

otype	Description
dataflow	Represents a dataflow
table	Represents a table
column	Represents a column
file	Represents a file
directory	Represents a directory
bi_report	Represents a report of a BI server
bi_report_column	Represents a column of a report of a BI server
bi_datasource	Represents a data source of a BI server
bi_datasource_column	Represents a column of a data source of a BI server
external	Placeholder for anything Alation doesn't support natively

and "key" is the unique identifier of an object. The format depends on "otype":

otype	key
dataflow	api/<unique identifier of dataflow>
table	<datasource_id>.[<db_name>.]<schema_name>.<table_name> NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza.
column	<datasource_id>.[<db_name>.]<schema_name>.<table_name>.<column_name> NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza.
file	<filesystem_id>.<full path of a file delimited by '/'>
directory	<filesystem_id>.<full path of a directory delimited by '/'>
bi_report	<bi_server_id>.bi_report.<unique identifier of bi_report on the server>
bi_report_column	<bi_server_id>.bi_report_column.<unique identifier of bi_report_column on the server>
bi_datasource	<bi_server_id>.bi_datasource.<unique identifier of bi_datasource on the server>
bi_datasource_column	<bi_server_id>.bi_datasource_column.<unique identifier of bi_datasource_column on the server>
external	<unique identifier/name of external object>
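
The key formats for tables, columns, and dataflows can be assembled with small helpers like the ones below. This is a sketch only: table_key, column_key, and dataflow_key are hypothetical names, and the code assumes that none of the name parts contain a '.' character, since how such names should be written is not covered here.

```python
def table_key(datasource_id, schema_name, table_name, db_name=None):
    """Build '<datasource_id>.[<db_name>.]<schema_name>.<table_name>'."""
    parts = [str(datasource_id)]
    if db_name is not None:
        # Needed for SQL Server, Redshift and Netezza.
        parts.append(db_name)
    parts.extend([schema_name, table_name])
    return ".".join(parts)


def column_key(datasource_id, schema_name, table_name, column_name, db_name=None):
    """Build a column key by appending the column name to the table key."""
    return table_key(datasource_id, schema_name, table_name, db_name) + "." + column_name


def dataflow_key(identifier):
    """Return the identifier with the 'api/' prefix, adding it if missing."""
    return identifier if identifier.startswith("api/") else "api/" + identifier
```

For example, table_key(1, "schema", "table1") produces the key "1.schema.table1" used in the sample request body.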

Sample Request Body

{
    "dataflow_objects": [
        {
            "external_id": "api/df1_external_id",
            "content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5"
        }
    ],
    "paths": [
        [
            [
                {"otype": "table", "key": "1.schema.table1"},
                {"otype": "table", "key": "1.schema.table2"}
            ],
            [
                {"otype": "dataflow", "key": "api/df1_external_id"}
            ],
            [
                {"otype": "table", "key": "1.schema.table4"}
            ]
        ],
        [
            [
                {"otype": "table", "key": "1.schema.table2"},
                {"otype": "table", "key": "1.schema.table3"}
            ],
            [
                {"otype": "dataflow", "key": "api/df1_external_id"}
            ],
            [
                {"otype": "table", "key": "1.schema.table5"}
            ]
        ]
    ]
}
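
Before sending a request body like the one above, it can help to check each path against the structural rules described for "paths". The sketch below is a hypothetical client-side check (validate_path is not an Alation function). It verifies that every element has "otype" and "key" and that the first and last segments hold no dataflows. The requirement of at least two segments is an assumption drawn from the sources -> targets description.

```python
def validate_path(path):
    """Return a list of problems found in one path; an empty list means none."""
    problems = []
    if len(path) < 2:
        # Assumption: a path needs at least a source and a target segment.
        problems.append("path has fewer than two segments")
        return problems
    for index, segment in enumerate(path):
        for element in segment:
            for field in ("otype", "key"):
                if field not in element:
                    problems.append(
                        "segment %d: element is missing '%s'" % (index, field)
                    )
    for label, segment in (("first", path[0]), ("last", path[-1])):
        if any(element.get("otype") == "dataflow" for element in segment):
            problems.append("%s segment contains a dataflow" % label)
    return problems
```

Running validate_path over every entry in "paths" and refusing to post when any list is non-empty catches malformed payloads before a job is created.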

Headers

HTTP Header	Value
TOKEN	<your_token>

Replace <your_token> with a token obtained from the Get Token API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

Body:

{
    "job_id": 1
}

NOTE: The response contains the identifier of a job record that tracks the status of the job triggered by a successful call to the API. This job is responsible for uploading lineages to Alation. To check the status of the job, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).
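
Because the upload runs as a background job, a client usually has to check the job status more than once. The sketch below shows one possible polling loop. The format of the Job Status API response is not described in this section, so the loop takes two caller-supplied functions: fetch_status, which returns the latest status, and is_finished, which decides whether that status is final. Both names, and wait_for_job itself, are hypothetical.

```python
import time


def wait_for_job(fetch_status, is_finished, interval_seconds=5.0,
                 max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until is_finished(status) is true, then return status.

    Raises TimeoutError if the job is still unfinished after max_attempts checks.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_finished(status):
            return status
        if attempt < max_attempts - 1:
            # Wait before the next check, but not after the final one.
            sleep(interval_seconds)
    raise TimeoutError("job did not finish after %d checks" % max_attempts)
```

In practice, fetch_status would wrap a GET request to /api/v1/bulk_metadata/job/?id=<job_id> with the TOKEN header, and is_finished would inspect whatever fields your instance returns.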

Error Response

Invalid Token

Status: 401 UNAUTHORIZED

Body: Authentication failed

Missing Token Header

Status: 401 UNAUTHORIZED

Body: Authentication credentials were not provided.
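
A client can separate the documented outcomes before trying to read a job id. The following sketch assumes only what is listed above: a 200 response carries a JSON body with "job_id", and a 401 response carries a plain-text message. extract_job_id is a hypothetical helper, and the handling of other status codes is an assumption.

```python
import json


def extract_job_id(status_code, body_text):
    """Return the job id from a successful response, or raise on failure."""
    if status_code == 401:
        # Covers both an invalid token and a missing TOKEN header.
        raise PermissionError("Authentication problem: %s" % body_text.strip())
    if status_code != 200:
        # Assumption: any other status is treated as an unexpected failure.
        raise RuntimeError("Unexpected status %d: %s" % (status_code, body_text))
    return json.loads(body_text)["job_id"]
```

With the requests library, this could be called as extract_job_id(response.status_code, response.text).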

Code Samples

cURL

#!/bin/bash

# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"

BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"

curl -X POST "${BASE_URL}" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" -d $'{"dataflow_objects": [{"content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5", "external_id": "api/df1_external_id"}], "paths": [[[{"otype": "table", "key": "1.schema.table1"}, {"otype": "table", "key": "1.schema.table2"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table4"}]], [[{"otype": "table", "key": "1.schema.table2"}, {"otype": "table", "key": "1.schema.table3"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table5"}]]]}'

Python

import requests
import json

# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}

data = json.dumps({
  'dataflow_objects': [
    {
      'external_id': 'api/df1_external_id',
      'content': 'Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5'
    }
  ],
  'paths': [
    [
      [
        {'otype': 'table', 'key': '1.schema.table1'},
        {'otype': 'table', 'key': '1.schema.table2'}
      ],
      [
        {'otype': 'dataflow', 'key': 'api/df1_external_id'}
      ],
      [
        {'otype': 'table', 'key': '1.schema.table4'}
      ]
    ],
    [
      [
        {'otype': 'table', 'key': '1.schema.table2'},
        {'otype': 'table', 'key': '1.schema.table3'}
      ],
      [
        {'otype': 'dataflow', 'key': 'api/df1_external_id'}
      ],
      [
        {'otype': 'table', 'key': '1.schema.table5'}
      ]
    ]
  ]
})

# Add lineage information. This example also adds a dataflow describing the process that created the lineage.
response = requests.post('https://alation.yourcompany.com/integration/v2/lineage/', data=data, headers=headers)
job_id = json.loads(response.content)['job_id']
print("Job id: %s" % job_id)

# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = json.loads(response.text)
print(job_details)

Delete lineage information

This API lets you delete lineage information of data objects and dataflows.

URL

DELETE /integration/v2/lineage/?<params>

<params>

Name	Required	Description
source_otype	Yes	See `otype` in Data Format section.
source_key	Yes	See `key` in Data Format section.
target_otype	Yes	See `otype` in Data Format section.
target_key	Yes	See `key` in Data Format section.
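
Keys can contain characters such as '/' (for example in dataflow or file keys), so it is safer to let a library build the query string than to concatenate it by hand. The sketch below uses urlencode from the Python standard library. build_delete_url is a hypothetical helper name, and the assumption is that the server decodes percent-encoded parameters in the usual way.

```python
from urllib.parse import urlencode


def build_delete_url(base_url, source_otype, source_key, target_otype, target_key):
    """Return the DELETE URL with all four required parameters URL-encoded."""
    params = [
        ("source_otype", source_otype),
        ("source_key", source_key),
        ("target_otype", target_otype),
        ("target_key", target_key),
    ]
    return base_url + "?" + urlencode(params)
```

For table keys such as 1.schema.table1 the result is identical to a hand-written URL, while a key such as api/df1_external_id has its '/' encoded as %2F.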

Headers

HTTP Header	Value
TOKEN	<your_token>

Replace <your_token> with a token obtained from the Get Token API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

Body:

{
    "job_id": 1
}

NOTE: The response contains the identifier of a job record that tracks the status of the job triggered by a successful call to the API. This job is responsible for deleting lineages from Alation. To check the status of the job, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).

Error Response

Invalid Token

Status: 401 UNAUTHORIZED

Body: Authentication failed

Missing Token Header

Status: 401 UNAUTHORIZED

Body: Authentication credentials were not provided.

Code Samples

cURL

#!/bin/bash

# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"

BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"

curl -X DELETE "${BASE_URL}?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2" -H "TOKEN: ${API_TOKEN}"

Python

import requests
import json

# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}

# Delete lineage information.
response = requests.delete('https://alation.yourcompany.com/integration/v2/lineage/?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2', headers=headers)
job_id = json.loads(response.content)['job_id']
print("Job id: %s" % job_id)

# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = json.loads(response.text)
print(job_details)

Get lineage information

Refer to the Open API Specification for information on Get Lineage.