Virtual Data Source

Upload Virtual Data Source

Description

This API allows users to update a data source's technical metadata. The technical metadata includes names and types for schemas, tables, columns and indices.

Users can perform the following operations:

  1. Add Technical Metadata
  2. Delete Technical Metadata
    To upload logical metadata (titles, descriptions, stewards, and custom fields), see Upload Logical Metadata.

Note:

  • Only admin users can update a data source's metadata using this API.
  • To ensure optimal performance for VDS creation, we recommend increasing the delay between the concurrent API requests instead of scaling the worker count.

Add Technical Metadata

This API allows you to add a list of schemas, tables, columns to the data source and indices to the table objects.

URL

POST /api/v1/bulk_metadata/extraction/<ds_id>?<params>

Replace <params> with your list of parameters. Multiple parameters can be combined as PARAM1&PARAM2.

Replace <ds_id> with the unique identifier of the data source inside Alation catalog.

Note: Chunked encoding is not supported.

URL Parameters

NameDescription
set_title_descsSpecifies if the title and description of the object passed should be updated. Passing set_title_descs=true will add a title and description to the specified object if the object is being newly added to a database's metadata. If the object already exists, they will be updated.
The value for this parameter should be one of the following:
true
false
NOTE: This parameter is available from version 4.16.0. Previous versions should use get_title_descs instead for the same functionality.

Sample Request Body

{"key": "7.schema_a.table_a", "table_type": "TABLE"}
{"key": "7.schema_a.table_b", "table_type": "VIEW"}
{"key": "7.schema_a.table_a.column1", "column_type": "int"}
{"key": "7.schema_a.table_b.column1", "column_type": "char"}
{"key": "7.schema_a.table_b.index1", "index_type": "PRIMARY", "column_names": ["column1"]}

NOTE: The request body can have multiple JSON objects split by a new line. Each JSON object on a line specifies an object. To avoid JSON parsing errors, please put the entire information for an object in a single line.

Fields applicable for all Object Types

NameRequiredValue
keyYesPeriod delimited, Fully Qualified Name of a schema, table, column or an index.

Fields applicable for Table/View

NameRequiredValue
table_typeNoThe type of the table. The value can be one of the following:
TABLE
VIEW
NOTE: Specifying {"table_type": "VIEW"} creates a view. If no table_type is provided, the object is assumed to be a table.
data_locationNoA URI or file path to the location of the underlying data, such as an HDFS URL for a Hive table.
NOTE: Use this parameter when the object specified is a TABLE.
db_ownerNoName of the database account that owns this table.
definition_sqlNoCREATE TABLE statement which was used to create the table.
NOTE: Use this parameter only when the object specified is a TABLE.
view_sqlNoCREATE VIEW statement which was used to create the view.
NOTE: Use this parameter only when the object specified is a VIEW.
view_sql_expandedNoCREATE VIEW statement with fully qualified object references.
NOTE: Use this parameter only when the object specified is a VIEW.
constraint_textNoConstraint statements which are enforced by the DB.
NOTE: Use this parameter when the object specified is a TABLE.
ts_createdNoTimestamp at which the table or view was created.
Example: 2018-03-13T22:09:33Z
Please ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ
ts_last_alteredNoTimestamp of the last ALTER statement executed against this table.
Example: 2018-03-13T22:09:33Z
Please ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ
partitioning_attributesNoAn array of columns which are used to partition the table.
Example: ["column1", "column2"]
bucket_attributesNoAn array of columns which are used to bucket the table (in data sources like Hive, bucketing is an alternative mechanism to partitioning for grouping similar data together: LanguageManualDDL-BucketedTables).
Example: ["column1", "column2"]
sort_attributesNoAn array of columns used to sort the table (in Hive, used with bucketing to store data for faster computation: LanguageManualDDL-BucketedSortedTables).
Example: ["column1", "column2"]
synonymsNoAn array of other names that can be used to refer to this table. Each synonym is represented as a JSON comprising schema_name and table_name.
Example: [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}]
skews_infoNoA JSON of the skew column names to an array of their respective skewed column values that appear very often.
Example: {"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}
table_commentNoA comment field that stores a description of the table which is ingested from the source system.
Example: "This Table is created by ELT"

Fields applicable for Column

NameRequiredValue
column_typeNoThe type of the column. The value can be any of the column types supported by the underlying database.
NOTE: When specifying a column, please make sure the table it corresponds to is already a part of the database's metadata.
positionNoPosition of the column in the table which contains it.
NOTE:
1) This value needs to be a positive integer.
2) When specifying a column, please make sure the table it corresponds to is already a part of the database's metadata.
column_commentNoA comment field that stores a description of the column which is ingested from the source system.
Example: "Column has a default value"
nullableNoField to indicate if the column can be nullable.
Set this to true if the column is a nullable field, false otherwise.
Default value: true.
Example: true

Fields applicable for Index

NameRequiredValue
index_typeYesThe type of the index. The value can be one of: 'PRIMARY', 'SECONDARY', 'PARTITIONED_PRIMARY', 'UNIQUE', 'OTHER'.
Example: "PRIMARY"
The presence of this field distinguishes an index object from a column.
NOTE:
1) When specifying an index, please make sure the table it corresponds to is already a part of the database's metadata.
2) Even in case of index upsert, this field is required.
column_namesYesAn array of column names on which the index is defined.
Example: ["column1"]
If the index is composite, this array will have multiple column names.
NOTE:
1) This cannot be an empty array.
2) When specifying an index, please make sure the columns it corresponds to are already a part of the database's metadata.
3) In case of index upsert, this field can be optional.
4) The order in which column names are specified is important, as it implies the sequencing in composite indices.
data_structureNoThe underlying data structure used by the index. The value can be one of: 'BTREE', 'HASH', 'BITMAP', 'DENSE', 'SPARSE', 'REVERSE', 'OTHER', 'NONE'.
Example: "BTREE"
Default: "NONE"
index_type_detailNoA string containing custom detailed information about the index.
Example: "MULTI_COLUMN_STATISTICS"
is_ascendingNoSet this boolean to True if the index is created in ascending order, else set False.
NOTE: This is not valid for composite index.
filter_conditionNoFilter condition used while creating index for a portion of rows in the table.
Example: "([filteredIndexCol]>(0))"
NOTE: This is not valid for composite index.
is_foreign_keyNoSet this boolean to True if the index is a foreign key.
NOTE: When this is true, fields: 'foreign_key_table_name' and 'foreign_key_column_names' are required.
foreign_key_table_nameNoThe key of the parent table object which the foreign index refers to.
Example: "7.schema_a.table_a"
NOTE: This is required only if 'is_foreign_key' is set to True. Please make sure the table it corresponds to is already a part of the database's metadata.
foreign_key_column_namesNoAn array of column names on the parent table object which the foreign index refers to.
Example: ["column1"]
NOTE:
1) This is required only if 'is_foreign_key' is set to True.
2) Please make sure the columns it corresponds to are already a part of the database's metadata.
3) The number of columns here should match the number of columns in the 'column_names' field.

NOTE: You can have a newline within text values by enclosing the newline character \n wherever needed.
Example: "view_sql": "CREATE VIEW view_name AS \n SELECT column1, column2 FROM table_name WHERE condition;"

Headers

HTTP HeaderValue
TOKEN<your_token>

Replace <your_token> with your API Token, which can be obtained from the getToken API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

{
    "job_name": "MetadataExtraction#1_Virtual_2017-01-12-23-43-00-876997-00-00-140650af-37d5-4cbe-a644-bb7ee781ddb2"
}

NOTE: The response returned is a job_name that represents an update metadata job triggered after a successful call to the API. This job is responsible for updating the technical metadata. To know the status of a job, please refer Job Status API.

Add/Update/Delete Properties per Technical Metadata Object

Users can upload only some of the properties for a key and can later update the same key with new properties, update existing property values, or delete existing property values. For example, you might do one upload just to add view_sql to all views and another upload to add ts_created — the second upload will retain the previously added view_sql if you omit that column.

Add properties:

Let us upload a TABLE, COLUMN and INDEX object with some of their properties.

Sample Request Body

{"key": "7.schema_a.table_a", "table_type": "TABLE", "db_owner": "alation", "definition_sql": "create table schema_a.table_a(column1 int);", "constraint_text": "column_a UNIQUE", "ts_created": "2018-03-13T22:09:33Z", "ts_last_altered": "2018-03-13T22:09:33Z", "partitioning_attributes": ["column1", "column2"], "bucket_attributes": ["column1", "column2"], "sort_attributes": ["column1", "column2"], "synonyms": [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}], "skews_info": {"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}}
{"key": "7.schema_a.table_a.column1", "column_type": "int"}
{"key": "7.schema_a.table_b", "table_type": "TABLE", "db_owner": "alation", "definition_sql": "create table schema_a.table_a(column1 int);", "constraint_text": "column_a UNIQUE", "ts_created": "2018-03-13T22:09:33Z", "ts_last_altered": "2018-03-13T22:09:33Z", "partitioning_attributes": ["column1", "column2"], "bucket_attributes": ["column1", "column2"], "sort_attributes": ["column1", "column2"], "synonyms": [{"schema_name": "schema_a","table_name": "table_a"}, {"schema_name": "schema_b","table_name": "table_b"}], "skews_info": {"column1": ["column1_value1", "column1_value2"], "column2": ["column2_value1", "column2_value2"]}}
{"key": "7.schema_a.table_b.column1", "column_type": "int"}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "column_names": ["column1"], "data_structure": "BTREE", "index_type_detail": "MULTI_COLUMN_STATISTICS", "is_ascending": true, "is_foreign_key": true, "foreign_key_table_name": "7.schema_a.table_a", "foreign_key_column_names": ["column1"]}

Update properties of an existing object:

While creating the above TABLE object we have not mentioned the data_location property. We can now add that property and update the existing db_owner value. For the COLUMN object, we can add position and update column_type. Similarly for the index, we can add filter_condition and update data_structure:

Sample Request Body

{"key": "7.schema_a.table_b", "table_type": "TABLE", "data_location": "hdfs:///user/hive/warehouse/table_a", "db_owner": "new_owner"}
{"key": "7.schema_a.table_b.column1", "column_type": "map<int,string>", "position": 2}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "data_structure": "HASH", "filter_condition": "([filteredIndexCol]>(0))"}

NOTE: All the properties mentioned while adding the object for the first time are retained. Index type and is_foreign_key cannot be changed once the index is created. foreign_key_table_name and foreign_key_column_names are required if is_foreign_key is part of the request. However, is_foreign_key can be skipped in order to retain the existing foreign key reference values.

Delete properties of an existing object:

For already uploaded properties of an object, you can delete them by uploading the same object with the property value set to null.

NOTE: table_type, index_type, column_names, is_foreign_key, foreign_key_table_name, and foreign_key_column_names property values cannot be deleted.

Sample Request Body

{"key": "7.schema_a.table_b", "data_location": null, "db_owner": null}
{"key": "7.schema_a.table_b.column1", "column_type": null, "position": null}
{"key": "7.schema_a.table_b.index1", "index_type": "SECONDARY", "data_structure": null, "filter_condition": null}

The above upload removes the data_location and db_owner properties for the TABLE object, removes column_type and position properties for the COLUMN object, and removes data_structure and filter_condition for the INDEX object.

Delete Technical Metadata

Individual schemas, tables, or columns can be deleted from a Virtual Data Source, or all unmentioned objects can be removed in bulk using the remove_not_seen=true parameter.

Delete Individual Objects

Individual schemas, tables, or columns can be deleted using the Asset API:

DELETE /integration/v2/asset/<object_type>/<object_id>/

⚠️ Rate Limiting: This API processes one object per request. Space out your requests (recommended: no more than 5 requests per second). For bulk removal of many objects, use the remove_not_seen=true approach described below.

Supported Object Types

Object TypeURL PatternCascade Behavior
SchemaDELETE /integration/v2/asset/schema/<id>/Deletes the schema and all its child tables and columns
TableDELETE /integration/v2/asset/table/<id>/Deletes the table and all its child columns
ColumnDELETE /integration/v2/asset/attribute/<id>/Deletes only the specified column

URL

DELETE /integration/v2/asset/<object_type>/<object_id>/

Replace <object_type> with schema, table, or attribute.

Replace <object_id> with the unique identifier of the object. You can obtain object IDs using the Integration APIs:

  • Schemas: GET /integration/v2/schema/?ds_id=<ds_id>
  • Tables: GET /integration/v2/table/?ds_id=<ds_id>&schema_name=<schema_name>
  • Columns: GET /integration/v2/column/?ds_id=<ds_id>&table_name=<table_name>&schema_name=<schema_name>

Headers

HTTP HeaderValue
TOKEN<your_token>

Success Response

Content-Type: application/json

Status: 200 OK

{
    "job_id": "628",
    "delete_method": "soft"
}

NOTE: The deletion is a soft delete — objects are removed from the catalog UI, search, and API listings but not permanently erased from the database. To restore a soft-deleted object, re-ingest it using the same key via the Add Technical Metadata API.

NOTE: Deletion is recursive. Deleting a schema will automatically delete all tables and columns under it. Deleting a table will delete all its columns.

Code Sample

#!/bin/bash
 
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com"
 
# Find the ID of the table you want to delete
curl -H "TOKEN: ${API_TOKEN}" "${BASE_URL}/integration/v2/table/?ds_id=1&schema_name=emp_schema"
 
# Delete a specific table (and all its columns)
curl -X DELETE -H "TOKEN: ${API_TOKEN}" "${BASE_URL}/integration/v2/asset/table/12345/"
 
# Check the job status
curl -H "TOKEN: ${API_TOKEN}" "${BASE_URL}/api/v1/bulk_metadata/job/?id=628"

Delete by Omission (Bulk)

The Add Technical Metadata API deletes all unmentioned objects when the remove_not_seen=true parameter is passed.

Unmentioned objects are the objects which are not a part of the request body.

💡 When to use: Use this approach when you need to remove many objects at once, or when you want to replace the entire metadata of a data source with a new set.

URL

POST /api/v1/bulk_metadata/extraction/<ds_id>?remove_not_seen=true

Replace <ds_id> with the unique identifier of the data source inside Alation catalog.

URL Parameters

NameRequiredDescription
remove_not_seenYesSpecifies if the technical metadata should be deleted. Value must be set to true.

Sample Request Body

{"key": "7.schema_a.table_a"}
{"key": "7.schema_a.table_b"}
{"key": "7.schema_a.table_a.column1"}

NOTE: The request body can have multiple JSON objects split by a new line. Each JSON object on a line specifies an object.

NameRequiredValue
keyYesPeriod delimited, Fully Qualified Name of a schema, table or column.

Headers

HTTP HeaderValue
TOKEN<your_token>

Replace <your_token> with your API Token, which can be obtained from the getToken API call (Get API Token).

Success Response

Content-Type: application/json

Status: 200 OK

{
    "job_name": "MetadataExtraction#1_Virtual_2017-01-12-23-43-00-876997-00-00-140650af-37d5-4cbe-a644-bb7ee781ddb"
}

NOTE: The response returned is a job_name that represents a delete metadata job triggered after a successful call to the API. To know the status of a job, please refer Job Status API.

Error Response

Invalid Token

Status: 401 UNAUTHORIZED

{
    "detail": "Authentication failed"
}

Missing Token Header

Status: 401 UNAUTHORIZED

{
    "detail": "Authentication credentials were not provided."
}

NOTE: The error responses are common to all of the requests above.

Comparison: Deletion Methods

MethodScopeTypeBest For
DELETE /integration/v2/asset/<type>/<id>/Single object (with cascade)SoftRemoving a specific schema, table, or column (up to ~50 objects)
POST ...?remove_not_seen=trueBulk (by omission)SoftReplacing entire metadata or removing many objects at once
DELETE /integration/v2/datasource/<ds_id>/Entire data sourceHardPermanently removing the VDS and all its objects

Code Samples

#!/bin/bash
 
# This is an example token. Please replace this with your token.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com/api/v1/bulk_metadata/extraction"
 
# Add metadata to a data source with unique identifier 1
# The example below adds a schema 'emp_schema' and two tables 'emp_table' and 'salary_table'
curl -X POST "${BASE_URL}/1" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @body.txt
 
# Contents of body.txt
# {"key":"1.emp_schema.emp_table", "table_type":"TABLE"}
# {"key":"1.emp_schema.salary_table"}
 
# Add technical metadata along with title and description
# The example below adds a schema 'dept_schema' and a table 'dept_table'
# The title and description of dept_table is set to the values passed
curl -X POST "${BASE_URL}/1?set_title_descs=true" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" -d '{"key":"1.dept_schema.dept_table", "table_type":"TABLE", "title":"Department Table", "description":"This is department table"}'
 
# Delete technical metadata - removes all objects except those listed
# The example below removes all objects from data source 1 except 'emp_table' and 'dept_table'
curl -X POST "${BASE_URL}/1?remove_not_seen=true" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @delete.txt
 
# Contents of delete.txt
# {"key":"1.dept_schema.dept_table"}
# {"key":"1.emp_schema.emp_table"}
import requests
import json
 
# This is an example token. Please replace this with your token.
headers = {'Token': '2abcd-4c04-4c21-8692-eda27a877f90'}
data = '{"key":"3.emp_schema.emp_table", "table_type":"VIEW"}\n{"key":"3.emp_schema.emp_table.emp_name", "column_type":"varchar"}'
 
# Add a schema (emp_schema) that has a table (emp_table) with a column (emp_name)
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3', data=data, headers=headers)
job_name = json.loads(response.text)
print("Job name: %s" % (job_name['job_name']))
 
data = '{"key":"3.dept_schema.dept_table"}\n{"key":"3.dept_schema.dept_table.dept_name"}'
 
# Delete all objects from data source's metadata but keep dept_schema, dept_table and dept_name
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3?remove_not_seen=true', data=data, headers=headers)
job_name = json.loads(response.text)
print("Job name: %s" % (job_name['job_name']))
 
data = '{"key":"3."}'
 
# Delete all objects from data source's metadata
response = requests.post('https://alation.yourcompany.com/api/v1/bulk_metadata/extraction/3?remove_not_seen=true', data=data, headers=headers)
job_name = json.loads(response.text)
print("Job name: %s" % (job_name['job_name']))