Description
This API allows you to access lineage information in Alation. Lineage information for an object describes where the object's data came from, what process/SQL query/script was used to extract the data from the sources, and which objects this object affects.
With the Lineage API, we have introduced DataFlow as a new first-class object that captures the fine-grained connections between sources and targets, along with a description of the process that creates those connections.
Note: If an object key cannot be resolved, a TMP object is created.
Open API 3.0 Specification
Lineage V2 APIs are also described using the Open API 3.0 Specification (OAS). OAS is a broadly adopted industry standard for describing APIs.
To see the specification, replace {AlationInstanceURL} below with your Alation instance's URL and visit the link:
{AlationInstanceURL}/openapi/lineage/
NOTE:
- The Open API 3.0 specification for the Lineage APIs is available from Alation version 2020.3 and later.
- The Swagger UI is not enabled by default on an Alation instance. Please set the flag alation.feature_flags.enable_swagger to True using alation_conf.
Upload lineage information
This API lets you upload new lineage information for objects that exist inside or outside of the Alation catalog.
NOTE:
- Upload to deleted servers (data source, BI server, file system, etc.) is not supported.
- An object that doesn't exist in the Alation catalog, or is identified as deleted, will be displayed with a TMP badge.
URL
POST
/integration/v2/lineage/
Data Format
The POST body is a single JSON object.
{
"dataflow_objects": [
{
"external_id": <unique identifier of dataflow object>,
"content": <description of the process/SQL query/R script/etc.>
},
...
],
"paths": [
[
[
{"otype": <object type>, "key": <unique key of object>},
{"otype": <object type>, "key": <unique key of object>},
...
],
[
{"otype": <object type>, "key": <unique key of object>},
{"otype": "dataflow", "key": <dataflow external id>},
...
],
...,
[
{"otype": <object type>, "key": <unique key of object>},
...
]
],
...
]
}
"dataflow_objects" contains information about new DataFlow objects to create. It can be omitted, if using existing DataFlow objects only:
Name | Required | Description |
---|---|---|
external_id | Yes | Unique identifier of dataflow object. It SHOULD start with "api/". |
content | No | description of the process/SQL query/R script/etc. |
"paths" is an array of "path"s. Each "path" specifies the details of sources (-> dataflows) -> targets lineage by listing elements of each step, or "segment", of the lineages in order. Each "segment" may contain data objects and/or dataflows, but the 1st and the last "segment" of a "path" SHOULD NOT contain any dataflows:
Name | Required | Description |
---|---|---|
otype | Yes | object type |
key | Yes | unique key of object |
where "otype" can be any of the following:
otype | Description |
---|---|
dataflow | Represents a dataflow |
table | Represents a table |
column | Represents a column |
file | Represents a file |
directory | Represents a directory |
bi_report | Represents a report of a BI server |
bi_report_column | Represents a column of a report of a BI server |
bi_datasource | Represents a data source of a BI server |
bi_datasource_column | Represents a column of a data source of a BI server |
external | Placeholder for anything Alation doesn't support natively |
and "key" is the unique identifier of an object. The format depends on "otype":
otype | key |
---|---|
dataflow | api/<unique identifier of dataflow> |
table | <datasource_id>.[<db_name>.]<schema_name>.<table_name> NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza. |
column | <datasource_id>.[<db_name>.]<schema_name>.<table_name>.<column_name> NOTE: <db_name> has to be specified for SQL Server, Redshift and Netezza. |
file | <filesystem_id>.<full path of a file delimited by '/'> |
directory | <filesystem_id>.<full path of a directory delimited by '/'> |
bi_report | <bi_server_id>.bi_report.<unique identifier of bi_report on the server> |
bi_report_column | <bi_server_id>.bi_report_column.<unique identifier of bi_report_column on the server> |
bi_datasource | <bi_server_id>.bi_datasource.<unique identifier of bi_datasource on the server> |
bi_datasource_column | <bi_server_id>.bi_datasource_column.<unique identifier of bi_datasource_column on the server> |
external | <unique identifier/name of external object> |
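For illustration, here is a minimal Python sketch that assembles keys in these formats. The helper names are hypothetical, not part of the API:
def table_key(datasource_id, schema_name, table_name, db_name=None):
    # Builds <datasource_id>.[<db_name>.]<schema_name>.<table_name>.
    # db_name must be supplied for SQL Server, Redshift and Netezza.
    parts = [str(datasource_id)]
    if db_name:
        parts.append(db_name)
    parts += [schema_name, table_name]
    return ".".join(parts)

def dataflow_key(external_id):
    # A dataflow key is its external_id, which SHOULD start with "api/".
    return external_id

print(table_key(1, "schema", "table1"))                   # 1.schema.table1
print(table_key(7, "dbo", "orders", db_name="sales_db"))  # 7.sales_db.dbo.orders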
Sample Request Body
{
"dataflow_objects": [
{
"external_id": "api/df1_external_id",
"content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5"
}
],
"paths": [
[
[
{"otype": "table", "key": "1.schema.table1"},
{"otype": "table", "key": "1.schema.table2"}
],
[
{"otype": "dataflow", "key": "api/df1_external_id"}
],
[
{"otype": "table", "key": "1.schema.table4"}
]
],
[
[
{"otype": "table", "key": "1.schema.table2"},
{"otype": "table", "key": "1.schema.table3"}
],
[
{"otype": "dataflow", "key": "api/df1_external_id"}
],
[
{"otype": "table", "key": "1.schema.table5"}
]
]
]
}
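To make the path semantics concrete, the sketch below expands the first sample path into the source -> dataflow -> target links it implies. This is a plain illustration of how consecutive segments appear to connect, not an Alation utility:
path = [
    [("table", "1.schema.table1"), ("table", "1.schema.table2")],
    [("dataflow", "api/df1_external_id")],
    [("table", "1.schema.table4")],
]
# Each element of a segment links to each element of the next segment.
for segment, next_segment in zip(path, path[1:]):
    for _, src_key in segment:
        for _, dst_key in next_segment:
            print("%s -> %s" % (src_key, dst_key))
# Prints:
# 1.schema.table1 -> api/df1_external_id
# 1.schema.table2 -> api/df1_external_id
# api/df1_external_id -> 1.schema.table4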
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Replace <your_token> with a token obtained from the Get Token API call (Get API Token).
Success Response
Content-Type: application/json
Status: 200 OK
Body:
{
"job_id": 1
}
NOTE: The response contains the identifier of a job record that tracks the status of the job triggered by a successful call to this API. This job uploads the lineage data to Alation. To check the job's status, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).
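For long-running uploads you may want to poll until the job finishes. Below is a minimal sketch, assuming the job record exposes a 'status' field that reads 'running' while in progress; check your instance's actual response for the authoritative schema:
import time
import requests

def wait_for_job(job_id, headers, poll_secs=5):
    url = 'https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id
    while True:
        details = requests.get(url, headers=headers).json()
        # 'status' and 'running' are assumed names; adjust if your
        # instance reports job state differently.
        if details.get('status') != 'running':
            return details
        time.sleep(poll_secs)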
Error Response
Invalid Token
Status: 401 UNAUTHORIZED
Body: Authentication failed
Missing Token Header
Status: 401 UNAUTHORIZED
Body: Authentication credentials were not provided.
Code Samples
cURL
#!/bin/bash
# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"
curl -X POST "${BASE_URL}" \
  -H 'content-type: application/json' \
  -H "TOKEN: ${API_TOKEN}" \
  -d '{"dataflow_objects": [{"content": "Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5", "external_id": "api/df1_external_id"}], "paths": [[[{"otype": "table", "key": "1.schema.table1"}, {"otype": "table", "key": "1.schema.table2"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table4"}]], [[{"otype": "table", "key": "1.schema.table2"}, {"otype": "table", "key": "1.schema.table3"}], [{"otype": "dataflow", "key": "api/df1_external_id"}], [{"otype": "table", "key": "1.schema.table5"}]]]}'
Python
import requests
import json
# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}
data = json.dumps({
'dataflow_objects': [
{
'external_id': 'api/df1_external_id',
'content': 'Combine table1 and table2, push them to table4. Do the same between table2+table3 and table5'
}
],
'paths': [
[
[
{'otype': 'table', 'key': '1.schema.table1'},
{'otype': 'table', 'key': '1.schema.table2'}
],
[
{'otype': 'dataflow', 'key': 'api/df1_external_id'}
],
[
{'otype': 'table', 'key': '1.schema.table4'}
]
],
[
[
{'otype': 'table', 'key': '1.schema.table2'},
{'otype': 'table', 'key': '1.schema.table3'}
],
[
{'otype': 'dataflow', 'key': 'api/df1_external_id'}
],
[
{'otype': 'table', 'key': '1.schema.table5'}
]
]
]
})
# Add lineage information. This example also uploads the dataflow content describing how the lineage was created.
response = requests.post('https://alation.yourcompany.com/integration/v2/lineage/', data=data, headers=headers)
job_id = json.loads(response.content)['job_id']
print "Job id: %s" % job_id
# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = json.loads(response.text)
print(job_details)
Delete lineage information
This API lets you delete lineage information of data objects and dataflows.
URL
DELETE
/integration/v2/lineage/?<params>
<params>
Name | Required | Description |
---|---|---|
source_otype | Yes | See otype in the Data Format section. |
source_key | Yes | See key in the Data Format section. |
target_otype | Yes | See otype in the Data Format section. |
target_key | Yes | See key in the Data Format section. |
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Replace <your_token> with a token obtained from the Get Token API call (Get API Token).
Success Response
Content-Type: application/json
Status: 200 OK
Body:
{
"job_id": 1
}
NOTE: The response contains the identifier of a job record that tracks the status of the job triggered by a successful call to this API. This job deletes the lineage data from Alation. To check the job's status, refer to the Job Status API (/api/v1/bulk_metadata/job/?id=<job_id>).
Error Response
Invalid Token
Status: 401 UNAUTHORIZED
Body: Authentication failed
Missing Token Header
Status: 401 UNAUTHORIZED
Body: Authentication credentials were not provided.
Code Samples
cURL
#!/bin/bash
# This is an example token. Please replace this with yours.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com/integration/v2/lineage/"
curl -X DELETE "${BASE_URL}?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2" -H "TOKEN: ${API_TOKEN}"
Python
import requests
import json
# This is an example token. Please replace this with yours.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90', 'content-type': 'application/json'}
# Delete lineage information.
response = requests.delete('https://alation.yourcompany.com/integration/v2/lineage/?source_otype=table&source_key=1.schema.table1&target_otype=table&target_key=1.schema.table2', headers=headers)
job_id = json.loads(response.content)['job_id']
print "Job id: %s" % job_id
# Check the status of the job
response = requests.get('https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=%s' % job_id, headers=headers)
job_details = json.loads(response.text)
print(job_details)
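Since all four parameters are required, each call removes a single source/target link; to delete several links, issue one request per pair. A minimal sketch with an illustrative pair list:
import requests

headers = {'TOKEN': '<your_token>'}
# Illustrative pairs: (source_otype, source_key, target_otype, target_key).
pairs = [
    ('table', '1.schema.table1', 'table', '1.schema.table4'),
    ('table', '1.schema.table2', 'table', '1.schema.table5'),
]
for source_otype, source_key, target_otype, target_key in pairs:
    response = requests.delete(
        'https://alation.yourcompany.com/integration/v2/lineage/',
        params={
            'source_otype': source_otype, 'source_key': source_key,
            'target_otype': target_otype, 'target_key': target_key,
        },
        headers=headers,
    )
    print(response.json()['job_id'])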
Get lineage information
Refer to the Open API Specification for information on Get Lineage.