Legacy API
This version of the API will be deprecated in 2024.3.2 in early November 2024. Please use the newer NoSQL API instead.
Description
This API allows users to upload and maintain the metadata details of a NoSQL database that are then displayed in the Alation Catalog.
Using this API, you can:
- Create new NoSQL catalog pages for folders, collections, and schemas.
- Update the details of an existing NoSQL catalog.
- Delete objects within a NoSQL catalog. The given objects will be marked soft-deleted, and can be restored.
- It is not possible to delete the entire metadata using the API. To completely delete everything you have uploaded, please delete the data source via the Alation Catalog. Visit the General Settings tab of its settings page and click on the Delete link in the Delete Data Source section.
Once the NoSQL catalog has been created using the API, metadata details of the catalog can be viewed by visiting its catalog pages.
Required NoSQL Metadata structure
A NoSQL database is modeled as a collection of top level folders, each containing one or more collections. Each collection is a set of documents. The metadata details of a NoSQL database follow the same structure as above at the folder and collection level. However, each collection will contain a list of schemas instead of a list of documents. The schemas in a collection describe the structure of the documents in the collection.
The metadata structure for a given NoSQL database instance depends on the structure of the database itself. For example, a Mongodb instance has top level databases, with each database containing a number of collections and each collection containing one or more documents. Its metadata structure would have top level folders, with each folder containing a number of collections and each collection containing one or more schemas. In this way, we would think of a database in Mongodb as a folder in our metadata structure, that contains only collections with schemas.
This metadata tree structure of folders, collections in folders, and schemas in each collection is represented by a JSON object, with schemas themselves represented in the JSON or AVRO format.
Restrictions on the Structure
Β Β - Folders cannot contain other folders. They can only contain collections.
- Collections cannot contain collections. They can only contain schemas.
Defining Folders, Collections, and Schemas
Your NoSQL metadata JSON object should be defined as follows:
{
"folder": [
{
"name": "$folder_name",
"collections": [
{
"name": "$Collection_name",
"schemata": [...]
}
]
}
]
}
Key | IfΒ required | Type | Value |
---|---|---|---|
Folders | yes | array | An array of Folder objects, with each object containing two keys: "name" and "collections". |
name | yes | string | The name property of a folder or a collection object |
collections | yes | array | An array of Collection objects, with each object containing two keys: "name" and "schemata". The value of "name" denotes the name of this collection. The value of "schemata" is an array of Schema objects. |
schemata | yes | array | An array of schemas describing the uploaded metadata. The properties of this key depend on the format of the schema you are uploading: JSON or Avro. |
Describing Schemata
The properties of the "schemata" key depend on the format of the schema you are uploading. JSON or Avro.
JSON Schemas
JSON Schema is a web standard used to specify the format of a piece of JSON. This API uses JSON Schema Draft 7 to specify the schemas of documents in a collection when they are being added to Alation Catalog.
The βschemataβ property for a JSON schema must be an array of schema objects with each schema object containing two keys: "name" and "definition". The value of "name" denotes the name of this schema as it would appear on its catalog page, and the value of "definition" is a JSON schema object with schema properties.
JSON Schema Key | Purpose |
---|---|
type | to specific the type of a JSON field. |
required (optional) | to note the input fields that are required (fields not in the list are not required). Not used on output of the Resource. |
properties | to declare the contents of an object. |
items | to declare the contents of a list. This can be a single item if the list is all of a single type, or a list of objects if the list is comprised of multiple types. |
examples (optional) | to denote a list of examples for non-object, non-list types (string, number, character types). |
The Definition Object
Key | IfΒ required | Purpose |
---|---|---|
type | yes | to specify the type of a JSON field |
required/optional | yes | to note the input fields that are required (fields not in the list are not required). Not used on output of the Resource. |
properties | yes | to declare the contents of an object |
items | yes, if applicable | to declare the contents of a list. This can be a single item if the list is all of a single type, or a list of objects if the list is comprised of multiple types |
examples | No | to denote a list of examples for non-object, non-list types (string, number, character types) |
JSON Schemas do not have to be written by hand! There are many tools to generate a schema given a sample piece of JSON, such as this one. You can paste the output of the API you are documenting into the generator to create the JSON Schema needed. Take caution with sensitive data when taking this approach, as the contents of your sample piece of json will be used as example values. For local programmatic generation of JSON schemas, there are libraries in various languages. One in particular is GenSON, which is a Python library.
Each field in a JSON Schema will have its own page in the Catalog where sample values will be displayed along with logical metadata such as a title, description and custom fields.
Sample JSON Schema Structure
{
"folders": [{
"name": "$folder_name",
"collections": [{
"name": "$collection_name",
"schemata": [{
"name":"$schema_name",
"definition": {
"title": "$title",
"type": "$type",
"required": ["$attribute1", "$attribute2"],
"properties": {
"$attribute1": {
"type": "$data_type"
},
"$attribute2": {
"type": "$data_type"
}
}
}
}]
}]
}]
}
AVRO schemas
Avro schemas are used for serializing and deserializing data written to topics in the event streaming service Kafka. You will still need to wrap your Avro schema in a Folder and Collection. Alation will support all types found in an Avro schema. You can find an example of how to upload Avro schemas do this in the Upload your Metadata into the NoSQL catalog section.
The "schemata" key for ingesting Avro schemas must contain Avro schemas .
Sample Avro Schema Structure
{
"folders": [{
"name": "$folder_name",
"collections": [{
"name": "$collection_name",
"schemata": [{
"type": "record",
"name": "$value",
"namespace": "$value",
"fields": [
{"name": "$value", "type": "string"},
{"name": "$value", "type": "int"}
]
}]
}]
}]
}
Important When posting Avro schemas to Alation, please make sure that the name of an Avro object comes before the type. The ingestion process will fail otherwise.
Supported Avro Data Types
We use Avro 1.9.0 to specify schemas in collections. Alation supports all Avro data types and their attributes. For details on Avro data type properties, refer to this link.
Uploading Metadata
Create a NoSQL catalog
Before uploading the metadata via the API, make sure you have created a Virtual Data Source of Generic NoSQL type in your Alation catalog.
If you are the user who created the Virtual Data Source to which you are uploading, then you are already its Data Source Admin. Otherwise, make sure you have Data Source Admin privileges to the Data Source you are uploading metadata into.
β
Your Data Source has been created. Navigate to the General Settings tab and copy the URL from the "Bulk Upload API" section. You will use this URL to upload your metadata structure into the catalog.
You will also see a switch to turn on versioning for this data source in the General Settings tab. This is only recommended for data sources that you know will only contain a small amount of data. If you have the switch turned on and load a large number schemas, you may find that Alation slows down. Versioning was built for clients who wish to version their Avro schemas in Alation. This switch cannot be turned off without the help of Alation Customer Support.
Upload your Metadata into the NoSQL catalog
This section describes the API with which you upload your metadata structure into the catalog. The metadata structure describes all the folders, collections and schemas that need to be uploaded.
If you created the Data Source yourself as described in the previous section, you are already its admin. Otherwise, make sure you have admin privileges to the Data Source you are uploading metadata into.
URL for JSON schemas
POST
/integration/v1/data/**<datasource_id>**/parse_docstore/
Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the previous section.
Sample Request Body JSON
{
"folders": [
{
"name": "testDB",
"collections": [
{
"name": "testTable",
"schemata": [
{
"name": "restaurant",
"definition": {
"title": "Restaurant",
"type": "object",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": ["string", "number", "integer"]
}
}
}
}
]
}
]
}
]
}
Key | Required | Value | Type |
---|---|---|---|
folders | Yes | This is a top-level key-value pair and the value is an array of Folder objects, with each object containing two keys: "name" and "collections". The value of "name" denotes the name of this folder and the value of "collections" is an array of Collection objects, described below. | array of Folder objects |
collections | Yes | An array of Collection objects, with each object containing two keys: "name" and "schemata". The value of "name" denotes the name of this collection and the value of "schemata" is an array of Schema objects, described below. | array of Collection objects |
schemata | Yes | An array of Schema objects, with each object containing two keys: "name" and "definition". The value of "name" denotes the name of this schema as it would appear on its catalog page and the value of "definition" is a JSON schema object. | array of object Schema objects |
definition | Yes | A JSON Schema object describing the structure of one or more documents in a collection, using keys such as "title", "type", "required" and "properties". | JSON Schema object |
URL for Avro schemas
POST
/integration/v1/data/**<datasource_id>**/parse_docstore/?json_type=avro
Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the previous section.
Sample Request Body AVRO
{
"folders": [{
"name": "Avro-folder",
"collections": [{
"name": "avro-collection",
"schemata": [{
"name": "restaurant",
"type": "record",
"namespace": "com.restaurant.datasource",
"fields": [
{"name": "owner", "type": "string"},
{"name": "address", "type": "string"}
]
}]
}]
}]
}
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Content-Type | application/json |
Replace <your_token> with API Token, which can be obtained from a call to Generate Tokens for the Alation API.
Success Response
Status: 202 ACCEPTED
Content-Type: application/json
Response Body
Object containing the ID of the job that has started ingesting the metadata.
{
"job_id": ...
To get the job status and a detailed response once it finishes, use the Jobs APIs by passing it the job ID. On successful completion of an ingestion job, the Job Status API should return a response like the sample below.
{
"status": "successful",
"msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
"result": "{'collections_deleted': 0, 'schemas_deleted': 0, 'folders_added': 1, 'folders_deleted': 0, 'collections_added': 1, 'schemas_added': 1, 'schemas_updated': 0}"
}
Update a NoSQL catalog
The URL request is the same as used for the initial upload, except the request body can contain additional or updated details. Updates (appends or replaces) can be done only at the level of the folder, collection or schema using the name of the folder, collection or schema respectively as the identifying key.
URL
POST
/integration/v1/data/**<datasource_id>**/parse_docstore/
Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the "Create" section above.
Sample Append Request Body
{
"folders": [
{
"name": "testDB",
"collections": [
{
"name": "testTable",
"schemata": [
{
"name": "airline",
"definition": {
"definitions": {
"metric_length": {
"type": "number"
},
"airplane_type": {
"type": "object",
"properties": {
"num_wheels": {
"type": "integer"
}
}
}
},
"type": "object",
"required": [
"name",
"price"
],
"properties": {
"checked": {
"type": "boolean"
},
"dimensions": {
"type": "object",
"properties": {
"width": {
"type": "integer"
},
"height": {
"type": "object",
"properties": {
"super_height": {
"type": "object",
"properties": {
"jumbo_height": {
"type": "integer"
}
}
}
}
}
}
},
"id": {
"type": "integer"
},
"name": {
"type": [
"string",
"number",
"integer"
]
},
"price": {
"type": "number"
},
"airport_percentage": {
"$ref": "metric_length"
},
"tags": {
"type": "array",
"items": {
"type": "string"
}
},
"airplane_dimensions": {
"type": "object",
"properties": {
"girth": {
"$ref": "metric_length"
},
"legroom": {
"$ref": "metric_length"
}
}
},
"airplane_info": {
"type": "array",
"items": [
{
"type": "string"
},
{
"type": "number"
},
{
"$ref": "airplane_type"
}
]
}
}
}
}
]
}
]
}
]
}
This JSON body uses the same keys as in the initial upload in the previous section. Note that a new schema "airline" has been added. Since the "restaurant" schema from the earlier request has not been mentioned, it would be kept in the catalog as it is. Details on uploading Avro are later in this section.
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Content-Type | application/json |
Replace <your_token> with API Token which can be obtained from a call to Generate Tokens for the Alation API.
Success Response
Status: 202 ACCEPTED
Content-Type: application/json
Body
Object containing the ID of the job that has started ingesting the metadata.
{
"job_id": ...
}
To get the job status and a detailed response once it finishes, use the Jobs APIs by passing it the job ID. On successful completion of an ingestion job, the Job Status API should return a response like the sample below.
{
"status": "successful",
"msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
"result": "{'collections_deleted': 0, 'schemas_deleted': 0, 'folders_added': 0, 'folders_deleted': 0, 'collections_added': 0, 'schemas_added': 1, 'schemas_updated': 0}"
}
Sample Replace Request Body
{
"folders": [
{
"name": "testDB",
"collections": [
{
"name": "testTable",
"schemata": [
{
"name": "restaurant",
"definition": {
"title": "Restaurant",
"type": "object",
"required": ["name", "employees", "equipment"],
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": ["string", "number", "integer"]
},
"stars": {
"description": "Stripe's evaluation of the riskiness of the payment",
"type": "integer"
},
"categories": {
"description": "Restaurant Category",
"type": "array",
"items": {
"type": "string"
}
},
"employees": {
"description": "Restaurant Employees",
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"salary": {
"type": "integer"
},
"tax": {
"type": "object",
"properties": {
"bracket": {
"type": "string"
},
"withheld": {
"type": "integer"
}
}
}
}
}
},
"equipment": {
"description": "Restaurant Equipment",
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"price": {
"type": "integer"
},
"location": {
"type": "string"
}
}
}
}
}
}
}
]
}
]
}
]
}
This JSON body also uses the same keys as in the initial upload in the previous section. Note that a new schema "airline" just added has not been mentioned, so it would be kept in the catalog as it is. Since the "restaurant" schema from the earlier request has been modified with additional details, the old copy in the catalog will be replaced with this new schema definition.
On successful ingestion of this JSON body, the Job Status API would return a response like the one below.
{
"status": "successful",
"msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
"result": "{'collections_deleted': 0, 'schemas_deleted': 0, 'folders_added': 0, 'folders_deleted': 0, 'collections_added': 0, 'schemas_added': 0, 'schemas_updated': 1}"
}
Uploading Avro Schemas.
The URL for uploading Avro schemas is the same as for uploading JSON schemas. However you will need to add an extra parameter to your POST header.
headers = {
# ...
"json_type": "avro",
# ...
}
Example Avro Post body
from collections import OrderedDict
payload_avro = OrderedDict([
("folders", [
OrderedDict([
("name", "Kafka Schemas"),
("collections", [
OrderedDict([
("name", "Collection 1"),
("schemata", [
{
"type": "record",
"namespace": "MyRecord",
"name": "Employee",
"fields": [
{"name": "Name", "type": "string"},
{"name": "Age", "type": "int"}
]
}
])
])
])
])
])
])
Delete sub-objects in a NoSQL catalog
NOTE: This will mark the deleted objects without actually erasing it. If the same object is created again or updated, the object's description and custom field values will be restored.
The URL request is the same as earlier, except a new boolean URL parameter remove_not_seen
is accepted. The parameter is optional and defaults to false
.
URL
POST
/integration/v1/data/**<datasource_id>**/parse_docstore/?remove_not_seen=true
Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the "Create" section above.
Sample Delete Request Body
{
"folders": [
{
"name": "testDB",
"collections": [
{
"name": "testTable",
"schemata": [
{
"name": "airline",
"definition": {
"type": "object",
"properties": {
"checked": {
"type": "boolean"
},
"dimensions": {
"type": "object",
"properties": {
"width": {
"type": "integer"
},
"height": {
"type": "object",
"properties": {
"super_height": {
"type": "object",
"properties": {
"jumbo_height": {
"type": "integer"
}
}
}
}
}
}
},
"id": {
"type": "integer"
},
"name": {
"type": ["string", "number", "integer"]
},
"price": {
"type": "number"
}
}
}
}
]
}
]
}
]
}
This JSON metadata body uses the same keys as in the earlier requests. Note that since the "restaurant" schema is not mentioned and the remove_not_seen
parameter is true, it is (soft-)deleted from the catalog. The "airline" schema has updates to it in this JSON body, and updating it in the catalog is not affected by the remove_not_seen
parameter.
Headers
HTTP Header | Value |
---|---|
TOKEN | <your_token> |
Content-Type | application/json |
Replace <your_token> with API Token which can be obtained from a call to Generate Tokens for the Alation API.
Success Response
Status: 202 ACCEPTED
Content-Type: application/json
Body
Object containing the ID of the job that has started ingesting the metadata.
{
"job_id": ...
}
To get the job status and a detailed response once it finishes, use the Jobs APIs by passing it the job ID. On successful completion of an ingestion job, the Job Status API should return a response like the sample below.
{
"status": "successful",
"msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
"result": "{'collections_deleted': 0, 'schemas_deleted': 1, 'folders_added': 0, 'folders_deleted': 0, 'collections_added': 0, 'schemas_added': 0, 'schemas_updated': 1}"
}
Error Response
Invalid Token
Status: 401 UNAUTHORIZED
Body
{
"detail": "Authentication failed"
}
Missing Token Header
Status: 401 UNAUTHORIZED
Body
{
"detail": "Authentication credentials were not provided."
}
NOTE: The error responses are common to all of the requests above.
Code Samples
cURL
#!/bin/bash
# This is an example token. Please replace this with your token.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"
BASE_URL="https://alation.yourcompany.com/integration/v1/data"
# Add metadata to a data source with unique identifier as 1
# The example below adds a folder '&FCS t@Γ¦st' with a collection 'C_FCS' in it containing two schemas 'Schemata_FCS' and 'TBD_Schemata_FCS' to the metadata
curl -X POST "${BASE_URL}/1/parse_docstore/?remove_not_seen=true&json_type=json" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @body.txt
# Contents of body.txt
# {
# "folders": [
# {
# "name": "&FCS t@Γ¦st",
# "collections": [
# {
# "name": "C_FCS",
# "schemata": [
# {
# "name": "Schemata_FCS",
# "definition": {
# "title": "Restaurant",
# "type": "object",
# "required": ["name", "employees", "equipment"],
# "properties": {
# "id": {
# "type": "integer"
# },
# "name": {
# "type": ["string", "number", "integer"]
# },
# "stars": {
# "description": "Stripe'\''s evaluation of the riskiness of the payment",
# "type": "integer"
# },
# "categories": {
# "description": "Restaurant Category",
# "type": "array",
# "items": {
# "type": "string"
# }
# },
# "employees": {
# "description": "Restaurant Employees",
# "type": "array",
# "items": {
# "type": "object",
# "properties": {
# "name": {
# "type": "string"
# },
# "salary": {
# "type": "integer"
# },
# "tax": {
# "type": "object",
# "properties": {
# "bracket": {
# "type": "string"
# },
# "withheld": {
# "type": "integer"
# }
# }
# }
# }
# }
# },
# "equipment": {
# "description": "Restaurant Equipment",
# "type": "array"
# }
# }
# }
# },
# {
# "name": "TBD_Schemata_FCS",
# "definition": {
# "title": "Restaurant",
# "type": "object",
# "required": ["name", "employees", "equipment"],
# "properties": {
# "id": {
# "type": "integer"
# }
# }
# }
# }
# ]
# }
# ]
# }
# ]
# }
Python
import requests
import json
from collections import OrderedDict
import time
headers = {
# this is an example API token, make sure to replace it with your own.
'TOKEN': "e9532638-2732-4873-a71d-2ff8c9925f5b",
'Content-Type': "application/json",
'cache-control': "no-cache",
}
# Add json schemas to a data source with unique identifier as 1
url = "https://alation.yourcompany.com/integration/v1/data/1/parse_docstore/"
# Setting the remove_not_seen parameter to true will delete any object in the data source that does not appear in the payload below
querystring = {"remove_not_seen":"true"}
# The example payload below adds a folder 'Example JSON' with a collection 'Collection JSON' in it containing two schemas 'Restaurant' and 'Airline' to the metadata
payload = OrderedDict([
("folders", [
OrderedDict([
("name", "Example JSON"),
("collections", [
OrderedDict([
("name", "Collection JSON"),
("schemata", [
OrderedDict([
("name", "Restaurant"),
("definition", {
"type": "object",
"required": ["name", "employees", "id"],
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": ["string", "number", "integer"]
},
"employees": {
"description": "Restaurant Employees",
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"salary": {
"type": "integer"
},
}
}
},
}
})
]),
OrderedDict([
('name', 'Airline'),
('definition', {
"name": "Airline",
"definition": {
"type": "object",
"required": ["checked", "dimensions", "name"],
"properties": {
"checked": {
"type": "boolean"
},
"dimensions": {
"type": "object",
"properties": {
"width": {
"type": "integer"
},
"height": {
"type": "integer",
}
}
},
"name": {
"type": ["string", "number", "integer"]
},
}
}
})
])
])
])
])
])
])
])
response = requests.request("POST", url, data=json.dumps(payload), headers=headers, params=querystring)
print(response.text)
# Should be enough time for the ingestion job to finish
time.sleep(5)
# Get the job status
get_response_job = requests.request("GET", url="https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=" + str(json.loads(response.text)['job_id']), headers=headers)
print get_response_job.text