Upload Virtual Data Source
Description
This API allows users to update a data source's technical metadata. The technical metadata includes names and types for schemas, tables, columns and indices.
Users can perform the following operations:
- Add Technical Metadata
- Delete Technical Metadata
To upload logical metadata (titles, descriptions, stewards, and custom fields), see Upload Logical Metadata.
Note: Only admin users can update a data source's metadata using this API.
Add Technical Metadata
This API allows you to add a list of schemas, tables, columns to the data source and indices to the table objects.
URL
POST
/api/v1/bulk\_metadata/extraction/**<ds\_id>**?**<params>**
Replace with your list of parameters. Multiple parameters can be combined as PARAM1&PARAM2
Replace <ds_id> with the unique identifier of the data source inside Alation catalog.
Note: Chunked encoding is not supported.
URL Parameters
Name | Description |
---|---|
set_title_descs | This parameter specifies if the title and description of the object passed should be updated. Passing set_title_descs=true, will add a title and description to the specified object, if the object is being newly added to a database's metadata. If the object already exists, they will be updated. The value for this parameter should be one of the following: true false NOTE: This parameter is available to use from version 4.16.0. Previous versions should use get_title_descs parameter instead of set_title_descs for same functionality. |
Sample Request Body
{"key": "7.schema_a.table_a", "table_type": "TABLE"}
{"key": "7.schema_a.table_b", "table_type": "VIEW"}
{"key": "7.schema_a.table_a.column1", "column_type": "int"}
{"key": "7.schema_a.table_b.column1", "column_type": "char"}
{"key": "7.schema_a.table_b.index1", "index_type": "PRIMARY", "column_names": ["column1"]}
NOTE: The request body can have multiple JSON objects split by a new line. Each JSON object on a line specifies an object. To avoid JSON parsing errors, please put the entire information for an object in a single line.
Fields applicable for all Object Types
Name | Required | Value |
---|---|---|
key | Yes | Period delimited, Fully Qualified Name of a schema, table, column or an index. |
Fields applicable for Table/View
Name | Required | Value |
---|---|---|
table_type | No | The type of the table. The value for this parameter can be one of the following: TABLE VIEW NOTE: Specifying {"table_type": "VIEW"} creates a view. If no table_type is provided, it is assumed that the object passed is a table. |
data_location | No | A URI or file path to the location of the underlying data, such as an HDFS URL for a Hive table. NOTE: Use this parameter when the object specified is a TABLE. |
db_owner | No | Name of the database account that owns this table. |
definition_sql | No | CREATE TABLE statement which was used to create the table. NOTE: Use this parameter only when the object specified is a TABLE. |
view_sql | No | CREATE VIEW statement which was used to create the view. NOTE: Use this parameter only when the object specified is a VIEW. |
view_sql_expanded | No | CREATE VIEW statement with fully qualified object references. NOTE: Use this parameter only when the object specified is a VIEW. |
constraint_text | No | Constraint statements which are enforced by the DB. NOTE: Use this parameter when the object specified is a TABLE. |
ts_created | No | Timestamp at which the table or view was created. Example: 2018-03-13T22:09:33Z Please ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ |
ts_last_altered | No | Timestamp of the last ALTER statement executed against this table. Example: 2018-03-13T22:09:33Z Please ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ |
partitioning_attributes | No | An array of columns which are used to partition the table. Example: ["column1", "column2"] |
bucket_attributes | No | An array of columns which are used to bucket the table (in data sources like Hive, bucketing is an alternative mechanism to partitioning for grouping similar data together: LanguageManualDDL-BucketedTables). Example: ["column1", "column2"] |
sort_attributes | No | An array of columns used to sort the table (in Hive, used with bucketing to store data for faster computation: LanguageManualDDL-BucketedSortedTables). Example: ["column1", "column2"] |
synonyms | No | An array of other names that can be used to refer to this table. Each synonym is represented as a JSON comprising schema_name and table_name. Example: [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}] |
skews_info | No | A JSON of the skew column names to an array of their respective skewed column values that appear very often. Example:{"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]} |
table_comment | No | A comment field that stores a description of the table which is ingested from the source system. Example: "This Table is created by ELT" |
Fields applicable for Column
Name | Required | Value |
---|---|---|
column_type | No | The type of the column. The value for this parameter can be any of the column types supported by the underlying database. NOTE: When specifying a column, please make sure the table it corresponds to is already a part of the database's metadata. |
position | No | Position of the column in the table which contains it. NOTE: 1) This value needs to be a positive integer. 2) When specifying a column, please make sure the table it corresponds to is already a part of the database's metadata. |
column_comment | No | A comment field that stores a description of the column which is ingested from the source system. Example: "Column has a default value" |
nullable | No | Field to indicate if the column can be nullable. Set this to true if the column is a nullable field, false otherwise. Default value: true. Example: true |
Fields applicable for Index
Name | Required | Value |
---|---|---|
index_type | Yes | The type of the index. The value for this field can be one among: ‘PRIMARY’, ‘SECONDARY’, ‘PARTITIONED_PRIMARY’, ‘UNIQUE’, ‘OTHER’. Example: "PRIMARY" The presence of this field distinguishes index object from a column. NOTE: 1) When specifying an index, please make sure the table it corresponds to is already a part of the database's metadata. 2) Even in case of index upsert(details below), this field is required. |
column_names | Yes | An array of column names on which the index is defined. Example: ["column1"] If the index is composite, this array will have multiple column names. NOTE: 1) This cannot be an empty array. 2) When specifying an index, please make sure the columns it corresponds to is already a part of the database's metadata. 3) In case of index upsert(details below), this field can be optional. 4) The order in which the column names are specified is important. Since this implies the sequencing of the column in case of composite indices. |
data_structure | No | The underlying data structure used by the index. The value for this field can be one among: ‘BTREE’, ‘HASH’, ’BITMAP’, ‘DENSE’, ‘SPARSE’, ‘REVERSE’, ‘OTHER’, ‘NONE’. Example: "BTREE" Default: "NONE" |
index_type_detail | No | A string containing custom detailed information about the index. Example: "MULTI_COLUMN_STATISTICS" |
is_ascending | No | Set this boolean to True, if the index is created in ascending order, else set False. NOTE: This is not valid for composite index. |
filter_condition | No | Filter condition used while creating index for a portion of rows in the table. Example: "([filteredIndexCol]>(0))" NOTE: This is not valid for composite index. |
is_foreign_key | No | Set this boolean to True, if the index is a foreign key. NOTE: When this is true, fields: ‘foreign_key_table_name‘ and ‘foreign_key_column_names‘ are required. |
foreign_key_table_name | No | The key of the parent table object which the foreign index refers to. Example: "7.schema_a.table_a" NOTE: This is required only if ‘is_foreign_key‘ is set to True. Please make sure the table it corresponds to is already a part of the database's metadata. |
foreign_key_column_names | No | An array of column names on the parent table object which the foreign index refers to. Example: ["column1"] NOTE: 1) This is required only if ‘is_foreign_key‘ is set to True. 2) Please make sure the columns it corresponds to is already a part of the database's metadata. 3) The number of columns here should match the number of columns in ‘column_names’ field. |
NOTE: You can have newline within the text values of the above field by enclosing the newline character \n
wherever needed.
Example: "view_sql": "CREATE VIEW view_name AS \n SELECT column1, column2 FROM table_name WHERE condition;"
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Replace <your_token> with API Token which can be obtained from getToken API call (Get API Token).
Success Response
Content-Type: application/json
Status: 200 OK
Body
{
"job_name":"MetadataExtraction#1_Virtual_2017-01-12-23-43-00-876997-00-00-140650af-37d5-4cbe-a644-bb7ee781ddb2"
}
NOTE: The response returned is a job_name
that represents an update metadata job triggered after a successful call to the API. This job is responsible for updating the technical metadata. To know the status of a job, please refer Job Status API.
Add/Update/Delete Properties per Technical Metadata Object
Users will be able to upload only some of the properties for a key and can later update the same key with new properties/update existing property values/delete existing property values.
So you might do one upload just to add view_sql to all views and another upload to add ts_created, this upload will retain the previously added view_sql if you omit that column.
Add properties:
Let us upload a TABLE, COLUMN and INDEX object with some of their properties.
Sample Request Body
{"key": "7.schema_a.table_a", "table_type": "TABLE", "db_owner": "alation", "definition_sql": "create table schema_a.table_a(column1 int);", "constraint_text": "column_a UNIQUE", "ts_created": "2018-03-13T22:09:33Z", "ts_last_altered": "2018-03-13T22:09:33Z", "partitioning_attributes": ["column1", "column2"], "bucket_attributes": ["column1", "column2"], "sort_attributes": ["column1", "column2"], "synonyms": [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}], "skews_info": {"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}}
{"key": "7.schema_a.table_a.column1", "column_type": "int"}
{"key": "7.schema_a.table_b", "table_type": "TABLE", "db_owner": "alation", "definition_sql": "create table schema_a.table_a(column1 int);", "constraint_text": "column_a UNIQUE", "ts_created": "2018-03-13T22:09:33Z", "ts_last_altered": "2018-03-13T22:09:33Z", "partitioning_attributes": ["column1", "column2"], "bucket_attributes": ["column1", "column2"], "sort_attributes": ["column1", "column2"], "synonyms": [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}], "skews_info": {"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}}
{"key": "7.schema_a.table_b.column1", "column_type": "int"}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "column_names": ["column1"], "data_structure": "BTREE", "index_type_detail": "MULTI_COLUMN_STATISTICS", "is_ascending": true, "is_foreign_key": true, "foreign_key_table_name": "7.schema_a.table_a", "foreign_key_column_names": ["column1"]}
Update properties of an existing object:
While creating the above TABLE object we have not mentioned data_location property, we can now add that property and update existing property db_owner's value. For the COLUMN object, we can add position and update column_type.
Similarly for index, we can add filter_condition and update existing property data_structure as follows:
Sample Request Body
{"key": "7.schema_a.table_b", "table_type": "TABLE", "data_location": "hdfs:///user/hive/warehouse/table_a", "db_owner": "new_owner"}
{"key": "7.schema_a.table_b.column1", "column_type": "map<int,string>", "position": 2}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "data_structure": "HASH", "filter_condition": "([filteredIndexCol]>(0))"}
NOTE:
All the properties mentioned while adding the object for the first time are retained.
Index type and is_foreign_key cannot be changed once the index is created.
foreign_key_table_name and foreign_key_ciolumn_names are required if is_foreign_key is part of the request. However, is_foreign_key can be skipped in order to retain the existing foreign key reference values.
Delete properties of an existing object:
For already uploaded properties of an object, you can delete the properties by uploading the same object with the property value set to null.
NOTE: table_type, index_type, column_names, is_foreign_key, foreign_key_table_name and foreign_key_column_names property values cannot be deleted.
Sample Request Body
{"key": "7.schema_a.table_b", "data_location": null, "db_owner": null}
{"key": "7.schema_a.table_b.column1", "column_type": null, "position": null}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "data_structure": null, "filter_condition": null}
The above upload removes the data_location and db_owner properties for the TABLE object, removes column_type and position properties for the COLUMN object, removes data_structure and filter_condition for the INDEX object.
Delete Technical Metadata
Deleting individual objects is NOT supported. However, the above Add Technical Metadata API deletes all unmentioned objects when remove_not_seen=true
parameter is passed.
unmentioned objects are the objects which are not a part of the request's body. Please refer to the below sections to learn more about the API usage.
URL
POST
/api/v1/bulk\_metadata/extraction/**<ds\_id>**?remove_not_seen=true
Replace <ds_id> with the unique identifier of the data source inside Alation catalog.
URL parameters
Name | Required | Description |
---|---|---|
remove_not_seen | Yes | This parameter specifies if the technical metadata of a data source should be deleted. NOTE: To delete a data source's technical metadata, the value for this parameter must be set to true. |
Sample Request Body
{"key": "7.schema_a.table_a"}
{"key": "7.schema_a.table_b"}
{"key": "7.schema_a.table_a.column1"}
NOTE: The request body can have multiple JSON objects split by a new line. Each JSON object on a line specifies an object. To avoid JSON parsing errors, please put the entire information for an object in a single line.
Name | Required | Value |
---|---|---|
key | Yes | Period delimited, Fully Qualified Name of a schema, table or column. |
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Replace <your_token> with API Token which can be obtained from getToken API call (Get API Token).
Success Response
Content-Type: application/json
Status: 200 OK
Body
{
"job_name":"MetadataExtraction#1_Virtual_2017-01-12-23-43-00-876997-00-00-140650af-37d5-4cbe-a644-bb7ee781ddb"
}
NOTE: The response returned is a job_name
that represents a delete metadata job triggered after a successful call to the API. This job is responsible for deleting the technical metadata of a data source. To know the status of a job, please refer Job Status API.
Error Response
Invalid Token
Status: 401 UNAUTHORIZED
Body
{
"detail": "Authentication failed"
}
Missing Token Header
Status: 401 UNAUTHORIZED
Body
{
"detail": "Authentication credentials were not provided."
}
NOTE: The Error responses are common to all of the requests above.
Code Samples
#!/bin/bash
# This is an example token. Please replace this with your token.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com/api/v1/bulk_metadata/extraction"
# Add metadata to a data source with unique identifier as 1
# The example below adds a schema 'emp_schema' and two tables 'emp_table' and 'salary_table' to the metadata
curl -X POST "${BASE_URL}/1" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @body.txt
# Contents of body.txt
# {"key":"1.emp_schema.emp_table", "table_type":"TABLE"}
# {"key":"1.emp_schema.salary_table"}
# Add technical metadata along with title and description
# The example below adds a schema 'dept_schema' and a table 'dept_table' to the metadata
# The title and description of dept_table is set to the values passed
curl -X POST "${BASE_URL}/1?set_title_descs=true" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" -d '{"key":"1.dept_schema.dept_table", "table_type":"TABLE", "title":"Department Table", "description":"This is department table"}'
# Delete technical metadata of a data source
# The example below removes all objects from the data source with id as 1 except 'emp_table' and 'dept_table'
curl -X POST "${BASE_URL}/1?remove_not_seen=true" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @delete.txt
# Contents of body.txt
# {"key":"1.dept_schema.dept_table"}
# {"key":"1.emp_schema.emp_table"}
import requests
import json
# This is an example token. Please replace this with your token.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90'}
data = '{"key":"3.emp_schema.emp_table", "table_type":"VIEW"}\n{"key":"3.emp_schema.emp_table.emp_name", "column_type":"varchar"}'
# Add a schema (emp_schema) that has a table (emp_table) with a column (emp_name)
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3', data=data, headers=headers)
job_name = json.loads(response.text)
print ("Job name: %s" % (job_name['job_name']))
data = '{"key":"3.dept_schema.dept_table"}\n{"key":"3.dept_schema.dept_table.dept_name"}'
# Delete all objects from data source's metadata but keep dept_schema, dept_table and dept_name
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3?remove_not_seen=true', data=data, headers=headers)
job_name = json.loads(response.text)
print ("Job name: %s" % (job_name['job_name']))
data = '{"key":"3."}'
# Delete all objects from data source's metadata
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3?remove_not_seen=true', data=data, headers=headers)
job_name = json.loads(response.text)
print ("Job name: %s" % (job_name['job_name']))