Description
This API provides access to lineage information. The lineage information for an object describes where the object's data came from, which process, SQL query, or script was used to extract the data from its sources, and which objects this object affects.
With the Lineage API, we have introduced DataFlow as a new first-class object that captures the fine-grained connections between sources and targets, along with a description of the process that creates those connections.
Note: If an object key cannot be resolved, the result is a TMP object.
Interactions with Other Alation Features
When you make changes to dataflows using this API, the changes will appear in Alation Analytics for the affected objects.
Performance Recommendations
For each Lineage API request, we recommend including a maximum of 1,000 dataflows (100 paths and 10 dataflows in each path).
Within these recommendations, POST and PUT calls take an average of 25 seconds to execute. DELETE calls take an average of 3 seconds to execute.
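To stay within the recommended request size, a large upload can be split into several requests. The following is a minimal sketch, assuming the paths are already available as a Python list. The names `batch_paths`, `count_dataflows`, and `MAX_PATHS_PER_REQUEST` are hypothetical helpers for illustration and are not part of the Lineage API.

```python
from typing import Iterator, List

# Assumed limit, taken from the recommendation above (100 paths per request).
MAX_PATHS_PER_REQUEST = 100


def batch_paths(paths: List[list], batch_size: int = MAX_PATHS_PER_REQUEST) -> Iterator[List[list]]:
    """Yield successive slices of `paths`, each holding at most `batch_size` paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


def count_dataflows(path: list) -> int:
    """Count the dataflow elements across all segments of a single path."""
    return sum(
        1
        for segment in path
        for element in segment
        if element["otype"] == "dataflow"
    )
```

Each batch can then be sent as the "paths" value of its own POST request, and `count_dataflows` can flag paths that exceed 10 dataflows before they are uploaded.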
Open API 3.0 Specification
Lineage V2 APIs are also described using the Open API 3.0 Specification (OAS). OAS is a broadly adopted industry standard for describing APIs.
To see the specification, replace {AlationInstanceURL} below with your Alation instance's URL and visit the link:
{AlationInstanceURL}/openapi/lineage/
NOTE:
- The Open API 3.0 specification for the Lineage APIs is available in Alation version 2020.3 and later.
- The Swagger UI is not enabled by default on an Alation instance. Please set the flag alation.feature_flags.enable_swagger to True using alation_conf.
Upload lineage information
This API lets you upload new lineage information for objects that exist in or outside of the Alation catalog.
NOTE:
- Upload to deleted servers (data source, BI server, file system, etc.) is not supported.
- An object that doesn't exist in the Alation catalog, or is identified as deleted, will be displayed with a TMP badge.
URL
POST
/integration/v2/lineage/
Data Format
The POST body is a single JSON object.
{
  "dataflow_objects": [
    {
      "external_id": <unique identifier of dataflow object>,
      "content": <description of the process/SQL query/R script/etc.>
    },
    ...
  ],
  "paths": [
    [
      [
        {"otype": <object type>, "key": <unique key of object>},
        {"otype": <object type>, "key": <unique key of object>},
        ...
      ],
      [
        {"otype": <object type>, "key": <unique key of object>},
        {"otype": "dataflow", "key": <dataflow external id>},
        ...
      ],
      ...,
      [
        {"otype": <object type>, "key": <unique key of object>},
        ...
      ]
    ],
    ...
  ]
}
"dataflow_objects" contains information about new DataFlow objects to create. It can be omitted, if using existing DataFlow objects only:
Name | Required | Description |
---|---|---|
external_id | Yes | Unique identifier of dataflow object. It SHOULD start with "api/". |
content | No | Description of the process/SQL query/R script/etc. |
"paths" is an array of "path"s. Each "path" specifies the details of sources (-> dataflows) -> targets lineage by listing elements of each step, or "segment", of the lineages in order. Each "segment" may contain data objects and/or dataflows, but the 1st and the last "segment" of a "path" SHOULD NOT contain any dataflows:
Name | Required | Description |
---|---|---|
otype | Yes | object type |
key | Yes | unique key of object |
where "otype" can be any of the following:
otype | Description |
---|---|
dataflow | Represents a dataflow |
table | Represents a table |
column | Represents a column |
file | Represents a file |
directory | Represents a directory |
bi_report | Represents a report of a BI server |
bi_report_column | Represents a column of a report of a BI server |
bi_datasource | Represents a data source of a BI server |
bi_datasource_column | Represents a column of a data source of a BI server |
external | Placeholder for anything Alation doesn't support natively |
and "key" is the unique identifier of an object. The format depends on "otype":
otype | key |
---|---|
dataflow | api/<unique identifier of dataflow> |
table | <datasource_id>.[<db_name>.]<schema_name>.<table_name> NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza. |
column | <datasource_id>.[<db_name>.]<schema_name>.<table_name>.<column_name> NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza. |
file | <filesystem_id>.<full path of a file delimited by '/'> |
directory | <filesystem_id>.<full path of a directory delimited by '/'> |
bi_report | <bi_server_id>.bi_report.<unique identifier of bi_report on the server> |
bi_report_column | <bi_server_id>.bi_report_column.<unique identifier of bi_report_column on the server> |
bi_datasource | <bi_server_id>.bi_datasource.<unique identifier of bi_datasource on the server> |
bi_datasource_column | <bi_server_id>.bi_datasource_column.<unique identifier of bi_datasource_column on the server> |
external | <unique identifier/name of external object> |
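The key formats in the table above can be assembled with small helper functions. This is a minimal sketch that only joins the parts with dots as the table describes. The function names are hypothetical, and the sketch does not handle schema or table names that themselves contain a dot, since this page does not say how such names are written.

```python
from typing import Optional


def table_key(datasource_id: int, schema_name: str, table_name: str,
              db_name: Optional[str] = None) -> str:
    """Build <datasource_id>.[<db_name>.]<schema_name>.<table_name>."""
    parts = [str(datasource_id)]
    if db_name:
        # Needed for SQL Server, Redshift and Netezza, per the table above.
        parts.append(db_name)
    parts.extend([schema_name, table_name])
    return ".".join(parts)


def column_key(datasource_id: int, schema_name: str, table_name: str,
               column_name: str, db_name: Optional[str] = None) -> str:
    """Build the table key and append the column name."""
    return table_key(datasource_id, schema_name, table_name, db_name) + "." + column_name


def dataflow_key(identifier: str) -> str:
    """Return the identifier with the "api/" prefix, adding it if missing."""
    return identifier if identifier.startswith("api/") else "api/" + identifier


def element(otype: str, key: str) -> dict:
    """Build one path element in the shape the request body expects."""
    return {"otype": otype, "key": key}
```

For example, `element("table", table_key(1, "schema", "table1"))` produces the first element used in the sample request body.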
Sample Request Body
{
  "dataflow_objects": [
    {
      "external_id": "api/df1_external_id",
      "content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5"
    }
  ],
  "paths": [
    [
      [
        {"otype": "table", "key": "1.schema.table1"},
        {"otype": "table", "key": "1.schema.table2"}
      ],
      [
        {"otype": "dataflow", "key": "api/df1_external_id"}
      ],
      [
        {"otype": "table", "key": "1.schema.table4"}
      ]
    ],
    [
      [
        {"otype": "table", "key": "1.schema.table2"},
        {"otype": "table", "key": "1.schema.table3"}
      ],
      [
        {"otype": "dataflow", "key": "api/df1_external_id"}
      ],
      [
        {"otype": "table", "key": "1.schema.table5"}
      ]
    ]
  ]
}
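Before sending a request body like the one above, it can help to check each path on the client side against the rules in the Data Format section. The sketch below is an assumption about what a useful pre-flight check looks like, not a description of the server's own validation. `validate_path` and `ALLOWED_OTYPES` are hypothetical names.

```python
from typing import List

# The otype values listed in the Data Format section.
ALLOWED_OTYPES = {
    "dataflow", "table", "column", "file", "directory",
    "bi_report", "bi_report_column", "bi_datasource",
    "bi_datasource_column", "external",
}


def validate_path(path: list) -> List[str]:
    """Return a list of problems found in one path (empty if none were found)."""
    if not path:
        return ["path is empty"]
    problems = []
    for index, segment in enumerate(path):
        for element in segment:
            if "otype" not in element or "key" not in element:
                problems.append("segment %d: element is missing otype or key" % index)
            elif element["otype"] not in ALLOWED_OTYPES:
                problems.append("segment %d: unknown otype %r" % (index, element["otype"]))
    # The first and last segments should hold data objects only.
    for label, segment in (("first", path[0]), ("last", path[-1])):
        if any(element.get("otype") == "dataflow" for element in segment):
            problems.append("%s segment must not contain a dataflow" % label)
    return problems
```

Running `validate_path` over every entry of "paths" and refusing to upload when any list is non-empty catches malformed paths before a job is created.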
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Replace <your_token> with the API token obtained from the Get Token API call (Get API Token).
Success Response
Content-Type: application/json
Status: 200 OK
Body:
{
"job_id": 1
}
NOTE: The response contains the identifier of a job record that tracks the status of the job triggered by a successful call to the API. This job is responsible for uploading lineages to Alation. To check the status of the job, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).
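Because the upload runs as a background job, a client typically polls the Job Status API until the job finishes. The sketch below shows one way to structure that loop. This page does not document the fields of the job status response, so the check for completion is passed in as a caller-supplied function. `job_status_url` and `wait_for_job` are hypothetical helper names, and the polling interval and attempt count are arbitrary defaults.

```python
import time
from typing import Callable


def job_status_url(base_url: str, job_id: int) -> str:
    """Build the Job Status API URL for a job id."""
    return "%s/api/v1/bulk_metadata/job/?id=%s" % (base_url.rstrip("/"), job_id)


def wait_for_job(fetch_status: Callable[[], dict],
                 is_finished: Callable[[dict], bool],
                 interval_seconds: float = 5.0,
                 max_attempts: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `fetch_status` until `is_finished` accepts the result or attempts run out."""
    for attempt in range(max_attempts):
        details = fetch_status()
        if is_finished(details):
            return details
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("job did not finish after %d attempts" % max_attempts)
```

With the requests library, `fetch_status` could be `lambda: requests.get(job_status_url(base, job_id), headers=headers).json()`. The `is_finished` function has to be written against the actual job status response of your Alation instance.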
Error Response
Invalid Token
Status: 401 UNAUTHORIZED
Body: Authentication failed
Missing Token Header
Status: 401 UNAUTHORIZED
Body: Authentication credentials were not provided.
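The success and error responses above can be handled in one place on the client. The following is a minimal sketch that turns a status code and response body into either a job id or an exception. `LineageApiError` and `job_id_from_response` are hypothetical names. Treating any status other than 200 as an error is an assumption, since this page lists only the 401 cases.

```python
import json


class LineageApiError(Exception):
    """Raised when a Lineage API call does not return a job id."""


def job_id_from_response(status_code: int, body: str) -> int:
    """Return the job id from a 200 response, or raise LineageApiError."""
    if status_code == 401:
        # Covers both the invalid-token and the missing-header case.
        raise LineageApiError("authentication failed (401): %s" % body)
    if status_code != 200:
        # Assumption: other status codes are treated as failures.
        raise LineageApiError("unexpected status %d: %s" % (status_code, body))
    return json.loads(body)["job_id"]
```

With the requests library, this would be called as `job_id_from_response(response.status_code, response.text)`.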
Code Samples
cURL
#!/bin/bash
# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"
curl -X POST "${BASE_URL}" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" -d $'{"dataflow_objects": [{"content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5", "external_id": "api/df1_external_id"}], "paths": [[[{"otype": "table", "key": "1.schema.table1"}, {"otype": "table", "key": "1.schema.table2"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table4"}]], [[{"otype": "table", "key": "1.schema.table2"}, {"otype": "table", "key": "1.schema.table3"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table5"}]]]}'
Python
import requests
import json
# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}
data = json.dumps({
    'dataflow_objects': [
        {
            'external_id': 'api/df1_external_id',
            'content': 'Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5'
        }
    ],
    'paths': [
        [
            [
                {'otype': 'table', 'key': '1.schema.table1'},
                {'otype': 'table', 'key': '1.schema.table2'}
            ],
            [
                {'otype': 'dataflow', 'key': 'api/df1_external_id'}
            ],
            [
                {'otype': 'table', 'key': '1.schema.table4'}
            ]
        ],
        [
            [
                {'otype': 'table', 'key': '1.schema.table2'},
                {'otype': 'table', 'key': '1.schema.table3'}
            ],
            [
                {'otype': 'dataflow', 'key': 'api/df1_external_id'}
            ],
            [
                {'otype': 'table', 'key': '1.schema.table5'}
            ]
        ]
    ]
})
# Add lineage information. This example also adds a query that created the lineage.
response = requests.post('https://alation.yourcompany.com/integration/v2/lineage/', data=data, headers=headers)
job_id = json.loads(response.content)['job_id']
print("Job id: %s" % job_id)
# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = json.loads(response.text)
print(job_details)
Delete lineage information
This API lets you delete lineage information of data objects and dataflows.
URL
DELETE
/integration/v2/lineage/?<params>
<params>
Name | Required | Description |
---|---|---|
source_otype | Yes | See otype in Data Format section. |
source_key | Yes | See key in Data Format section. |
target_otype | Yes | See otype in Data Format section. |
target_key | Yes | See key in Data Format section. |
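Keys can contain characters such as "/" (for example in file keys and dataflow keys), so it is safer to URL-encode the query parameters than to concatenate them by hand. This is a minimal sketch using the Python standard library. `delete_lineage_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def delete_lineage_url(base_url: str, source_otype: str, source_key: str,
                       target_otype: str, target_key: str) -> str:
    """Build the DELETE URL with all four required parameters URL-encoded."""
    params = urlencode({
        "source_otype": source_otype,
        "source_key": source_key,
        "target_otype": target_otype,
        "target_key": target_key,
    })
    return "%s/integration/v2/lineage/?%s" % (base_url.rstrip("/"), params)
```

The resulting URL can be passed directly to `requests.delete` together with the TOKEN header.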
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Replace <your_token> with the API token obtained from the Get Token API call (Get API Token).
Success Response
Content-Type: application/json
Status: 200 OK
Body:
{
"job_id": 1
}
NOTE: The response contains the identifier of a job record that tracks the status of the job triggered by a successful call to the API. This job is responsible for deleting lineages from Alation. To check the status of the job, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).
Error Response
Invalid Token
Status: 401 UNAUTHORIZED
Body: Authentication failed
Missing Token Header
Status: 401 UNAUTHORIZED
Body: Authentication credentials were not provided.
Code Samples
cURL
#!/bin/bash
# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"
curl -X DELETE "${BASE_URL}?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2" -H "TOKEN: ${API_TOKEN}"
Python
import requests
import json
# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}
# Delete lineage information.
response = requests.delete('https://alation.yourcompany.com/integration/v2/lineage/?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2', headers=headers)
job_id = json.loads(response.content)['job_id']
print("Job id: %s" % job_id)
# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = json.loads(response.text)
print(job_details)
Get lineage information
Refer to the Open API Specification for information on Get Lineage.