Virtual Data Source

Upload Virtual Data Source

Description

This API allows users to update a data source's technical metadata. The technical metadata includes names and types for schemas, tables, columns and indices.

Users can perform the following operations:

  1. Add Technical Metadata
  2. Delete Technical Metadata

To upload logical metadata (titles, descriptions, stewards, and custom fields), see Upload Logical Metadata.

Note: Only admin users can update a data source's metadata using this API.

Add Technical Metadata

This API allows you to add a list of schemas, tables, columns to the data source and indices to the table objects.

URL

POST /api/v1/bulk\_metadata/extraction/**<ds\_id>**?**<params>**

Replace <params> with your list of parameters. Multiple parameters can be combined as PARAM1&PARAM2

Replace <ds_id> with the unique identifier of the data source inside Alation catalog.

Note: Chunked encoding is not supported.

URL Parameters

NameDescription
set_title_descsThis parameter specifies if the title and description of the object passed should be updated. Passing set_title_descs=true, will add a title and description to the specified object, if the object is being newly added to a database's metadata. If the object already exists, they will be updated.
The value for this parameter should be one of the following:
true
false
NOTE: This parameter is available to use from version 4.16.0. Previous versions should use get_title_descs parameter instead of set_title_descs for same functionality.

Sample Request Body

{"key": "7.schema_a.table_a", "table_type": "TABLE"}
{"key": "7.schema_a.table_b", "table_type": "VIEW"}
{"key": "7.schema_a.table_a.column1", "column_type": "int"}
{"key": "7.schema_a.table_b.column1", "column_type": "char"}
{"key": "7.schema_a.table_b.index1", "index_type": "PRIMARY", "column_names": ["column1"]}

NOTE: The request body can have multiple JSON objects split by a new line. Each JSON object on a line specifies an object. To avoid JSON parsing errors, please put the entire information for an object in a single line.

Fields applicable for all Object Types

NameRequiredValue
keyYesPeriod delimited, Fully Qualified Name of a schema, table, column or an index.

Fields applicable for Table/View

NameRequiredValue
table_typeNoThe type of the table.
The value for this parameter can be one of the following:
TABLE
VIEW
NOTE: Specifying {"table_type": "VIEW"} creates a view. If no table_type is provided, it is assumed that the object passed is a table.
data_locationNoA URI or file path to the location of the underlying data, such as an HDFS URL for a Hive table.
NOTE: Use this parameter when the object specified is a TABLE.
db_ownerNoName of the database account that owns this table.
definition_sqlNoCREATE TABLE statement which was used to create the table.
NOTE: Use this parameter only when the object specified is a TABLE.
view_sqlNoCREATE VIEW statement which was used to create the view.
NOTE: Use this parameter only when the object specified is a VIEW.
view_sql_expandedNoCREATE VIEW statement with fully qualified object references.
NOTE: Use this parameter only when the object specified is a VIEW.
constraint_textNoConstraint statements which are enforced by the DB.
NOTE: Use this parameter when the object specified is a TABLE.
ts_createdNoTimestamp at which the table or view was created.
Example: 2018-03-13T22:09:33Z
Please ensure the timestamp is in UTC and follows the format:
YYYY-MM-DDThh:mm:ssZ
ts_last_alteredNoTimestamp of the last ALTER statement executed against this table.
Example: 2018-03-13T22:09:33Z
Please ensure the timestamp is in UTC and follows the format:
YYYY-MM-DDThh:mm:ssZ
partitioning_attributesNoAn array of columns which are used to partition the table.
Example: ["column1", "column2"]
bucket_attributesNoAn array of columns which are used to bucket the table (in data sources like Hive, bucketing is an alternative mechanism to partitioning for grouping similar data together: LanguageManualDDL-BucketedTables).
Example: ["column1", "column2"]
sort_attributesNoAn array of columns used to sort the table (in Hive, used with bucketing to store data for faster computation: LanguageManualDDL-BucketedSortedTables).
Example: ["column1", "column2"]
synonymsNoAn array of other names that can be used to refer to this table. Each synonym is represented as a JSON comprising schema_name and table_name.
Example: [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}]
skews_infoNoA JSON of the skew column names to an array of their respective skewed column values that appear very often.
Example:{"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}
table_commentNoA comment field that stores a description of the table which is ingested from the source system.
Example: "This Table is created by ELT"

Fields applicable for Column

NameRequiredValue
column_typeNoThe type of the column. The value for this parameter can be any of the column types supported by the underlying database.
NOTE: When specifying a column, please make sure the table it corresponds to is already a part of the database's metadata.
positionNoPosition of the column in the table which contains it.
NOTE:
1) This value needs to be a positive integer.
2) When specifying a column, please make sure the table it corresponds to is already a part of the database's metadata.
column_commentNoA comment field that stores a description of the column which is ingested from the source system.
Example: "Column has a default value"
nullableNoField to indicate if the column can be nullable.
Set this to true if the column is a nullable field, false otherwise.
Default value: true.
Example: true

Fields applicable for Index

NameRequiredValue
index_typeYesThe type of the index. The value for this field can be one among: ‘PRIMARY’, ‘SECONDARY’, ‘PARTITIONED_PRIMARY’, ‘UNIQUE’, ‘OTHER’.
Example: "PRIMARY"
The presence of this field distinguishes index object from a column.
NOTE:
1) When specifying an index, please make sure the table it corresponds to is already a part of the database's metadata.
2) Even in case of index upsert(details below), this field is required.
column_namesYesAn array of column names on which the index is defined.
Example: ["column1"]
If the index is composite, this array will have multiple column names.
NOTE:
1) This cannot be an empty array.
2) When specifying an index, please make sure the columns it corresponds to is already a part of the database's metadata.
3) In case of index upsert(details below), this field can be optional.
4) The order in which the column names are specified is important. Since this implies the sequencing of the column in case of composite indices.
data_structureNoThe underlying data structure used by the index. The value for this field can be one among: ‘BTREE’, ‘HASH’, ’BITMAP’, ‘DENSE’, ‘SPARSE’, ‘REVERSE’, ‘OTHER’, ‘NONE’.
Example: "BTREE"
Default: "NONE"
index_type_detailNoA string containing custom detailed information about the index.
Example: "MULTI_COLUMN_STATISTICS"
is_ascendingNoSet this boolean to True, if the index is created in ascending order, else set False.
NOTE: This is not valid for composite index.
filter_conditionNoFilter condition used while creating index for a portion of rows in the table.
Example: "([filteredIndexCol]>(0))"
NOTE: This is not valid for composite index.
is_foreign_keyNoSet this boolean to True, if the index is a foreign key.
NOTE: When this is true, fields: ‘foreign_key_table_name‘ and ‘foreign_key_column_names‘ are required.
foreign_key_table_nameNoThe key of the parent table object which the foreign index refers to.
Example: "7.schema_a.table_a"
NOTE: This is required only if ‘is_foreign_key‘ is set to True.
Please make sure the table it corresponds to is already a part of the database's metadata.
foreign_key_column_namesNoAn array of column names on the parent table object which the foreign index refers to.
Example: ["column1"]
NOTE:
1) This is required only if ‘is_foreign_key‘ is set to True.
2) Please make sure the columns it corresponds to is already a part of the database's metadata.
3) The number of columns here should match the number of columns in ‘column_names’ field.

NOTE: You can have newline within the text values of the above field by enclosing the newline character \n wherever needed.
Example: "view_sql": "CREATE VIEW view_name AS \n SELECT column1, column2 FROM table_name WHERE condition;"

Headers

HTTP HeaderValue
TOKEN<your_token>

Replace <your_token> with API Token which can be obtained from getToken API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

Body

{
    "job_name":"MetadataExtraction#1_Virtual_2017-01-12-23-43-00-876997-00-00-140650af-37d5-4cbe-a644-bb7ee781ddb2"
}

NOTE: The response returned is a job_name that represents an update metadata job triggered after a successful call to the API. This job is responsible for updating the technical metadata. To know the status of a job, please refer Job Status API.

Add/Update/Delete Properties per Technical Metadata Object

Users will be able to upload only some of the properties for a key and can later update the same key with new properties/update existing property values/delete existing property values.
So you might do one upload just to add view_sql to all views and another upload to add ts_created, this upload will retain the previously added view_sql if you omit that column.

Add properties:

Let us upload a TABLE, COLUMN and INDEX object with some of their properties.

Sample Request Body

{"key": "7.schema_a.table_a", "table_type": "TABLE", "db_owner": "alation", "definition_sql": "create table schema_a.table_a(column1 int);", "constraint_text": "column_a UNIQUE", "ts_created": "2018-03-13T22:09:33Z", "ts_last_altered": "2018-03-13T22:09:33Z", "partitioning_attributes": ["column1", "column2"], "bucket_attributes": ["column1", "column2"], "sort_attributes": ["column1", "column2"], "synonyms":  [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}], "skews_info": {"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}}
{"key": "7.schema_a.table_a.column1", "column_type": "int"}
{"key": "7.schema_a.table_b", "table_type": "TABLE", "db_owner": "alation", "definition_sql": "create table schema_a.table_a(column1 int);", "constraint_text": "column_a UNIQUE", "ts_created": "2018-03-13T22:09:33Z", "ts_last_altered": "2018-03-13T22:09:33Z", "partitioning_attributes": ["column1", "column2"], "bucket_attributes": ["column1", "column2"], "sort_attributes": ["column1", "column2"], "synonyms":  [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}], "skews_info": {"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}}
{"key": "7.schema_a.table_b.column1", "column_type": "int"}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "column_names": ["column1"], "data_structure": "BTREE", "index_type_detail": "MULTI_COLUMN_STATISTICS", "is_ascending": true, "is_foreign_key": true, "foreign_key_table_name": "7.schema_a.table_a", "foreign_key_column_names": ["column1"]}

Update properties of an existing object:

While creating the above TABLE object we have not mentioned data_location property, we can now add that property and update existing property db_owner's value. For the COLUMN object, we can add position and update column_type.
Similarly for index, we can add filter_condition and update existing property data_structure as follows:

Sample Request Body

{"key": "7.schema_a.table_b", "table_type": "TABLE", "data_location": "hdfs:///user/hive/warehouse/table_a", "db_owner": "new_owner"}
{"key": "7.schema_a.table_b.column1", "column_type": "map<int,string>", "position": 2}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "data_structure": "HASH", "filter_condition": "([filteredIndexCol]>(0))"}

NOTE:
All the properties mentioned while adding the object for the first time are retained.
Index type and is_foreign_key cannot be changed once the index is created.
foreign_key_table_name and foreign_key_ciolumn_names are required if is_foreign_key is part of the request. However, is_foreign_key can be skipped in order to retain the existing foreign key reference values.

Delete properties of an existing object:

For already uploaded properties of an object, you can delete the properties by uploading the same object with the property value set to null.

NOTE: table_type, index_type, column_names, is_foreign_key, foreign_key_table_name and foreign_key_column_names property values cannot be deleted.

Sample Request Body

{"key": "7.schema_a.table_b", "data_location": null, "db_owner": null}
{"key": "7.schema_a.table_b.column1", "column_type": null, "position": null}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "data_structure": null, "filter_condition": null}

The above upload removes the data_location and db_owner properties for the TABLE object, removes column_type and position properties for the COLUMN object, removes data_structure and filter_condition for the INDEX object.

Delete Technical Metadata

Deleting individual objects is NOT supported. However, the above Add Technical Metadata API deletes all unmentioned objects when remove_not_seen=true parameter is passed.

unmentioned objects are the objects which are not a part of the request's body. Please refer to the below sections to learn more about the API usage.

URL

POST /api/v1/bulk\_metadata/extraction/**<ds\_id>**?remove_not_seen=true

Replace <ds_id> with the unique identifier of the data source inside Alation catalog.

URL parameters

NameRequiredDescription
remove_not_seenYesThis parameter specifies if the technical metadata of a data source should be deleted.
NOTE: To delete a data source's technical metadata, the value for this parameter must be set to true.

Sample Request Body

{"key": "7.schema_a.table_a"}
{"key": "7.schema_a.table_b"}
{"key": "7.schema_a.table_a.column1"}

NOTE: The request body can have multiple JSON objects split by a new line. Each JSON object on a line specifies an object. To avoid JSON parsing errors, please put the entire information for an object in a single line.

NameRequiredValue
keyYesPeriod delimited, Fully Qualified Name of a schema, table or column.

Headers

HTTP HeaderValue
TOKEN<your_token>

Replace <your_token> with API Token which can be obtained from getToken API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

Body

{
    "job_name":"MetadataExtraction#1_Virtual_2017-01-12-23-43-00-876997-00-00-140650af-37d5-4cbe-a644-bb7ee781ddb"
}

NOTE: The response returned is a job_name that represents a delete metadata job triggered after a successful call to the API. This job is responsible for deleting the technical metadata of a data source. To know the status of a job, please refer Job Status API.

Error Response

Invalid Token

Status: 401 UNAUTHORIZED

Body

{
   "detail": "Authentication failed"
}

Missing Token Header

Status: 401 UNAUTHORIZED

Body

{
    "detail": "Authentication credentials were not provided."
}

NOTE: The Error responses are common to all of the requests above.

Code Samples

cURL

#!/bin/bash

# This is an example token. Please replace this with your token.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"

BASE_URL="https://alation.yourcompany.com/api/v1/bulk_metadata/extraction"

# Add metadata to a data source with unique identifier as 1
# The example below adds a schema 'emp_schema' and two tables 'emp_table' and 'salary_table' to the metadata
curl -X POST "${BASE_URL}/1" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @body.txt

# Contents of body.txt
# {"key":"1.emp_schema.emp_table", "table_type":"TABLE"}
# {"key":"1.emp_schema.salary_table"}

# Add technical metadata along with title and description
# The example below adds a schema 'dept_schema' and a table 'dept_table' to the metadata
# The title and description of dept_table is set to the values passed
curl -X POST "${BASE_URL}/1?set_title_descs=true" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" -d '{"key":"1.dept_schema.dept_table", "table_type":"TABLE", "title":"Department Table", "description":"This is department table"}'

# Delete technical metadata of a data source
# The example below removes all objects from the data source with id as 1 except 'emp_table' and 'dept_table'
curl -X POST "${BASE_URL}/1?remove_not_seen=true" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @delete.txt

# Contents of body.txt
# {"key":"1.dept_schema.dept_table"}
# {"key":"1.emp_schema.emp_table"}

Python

import requests
import json

# This is an example token. Please replace this with your token.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90'}
data = '{"key":"3.emp_schema.emp_table", "table_type":"VIEW"}\n{"key":"3.emp_schema.emp_table.emp_name", "column_type":"varchar"}'

# Add a schema (emp_schema) that has a table (emp_table) with a column (emp_name)
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3', data=data, headers=headers)
job_name = json.loads(response.text)
print "Job name: %s" % (job_name['job_name'])

data = '{"key":"3.dept_schema.dept_table"}\n{"key":"3.dept_schema.dept_table.dept_name"}'

# Delete all objects from data source's metadata but keep dept_schema, dept_table and dept_name
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3?remove_not_seen=true', data=data, headers=headers)
job_name = json.loads(response.text)
print "Job name: %s" % (job_name['job_name'])

data = '{"key":"3."}'

# Delete all objects from data source's metadata
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3?remove_not_seen=true', data=data, headers=headers)
job_name = json.loads(response.text)
print "Job name: %s" % (job_name['job_name'])