NoSQL Apis

NoSQL API

Description

This API allows users to upload and maintain the metadata details of a NoSQL database that are then displayed in the Alation Catalog.

Using this API, you can:

  • Create new NoSQL Catalog Pages for folders, collections and schemas.
  • Update the details of an existing NoSQL catalog.
  • Delete objects within a NoSQL catalog. The given objects will be marked soft-deleted, and can be restored.
    • It is not lossible to delete the entire metadata using the API. To completely delete everything you have uploaded, please delete the data source via the Alation Catalog.

Once the NoSQL catalog has been created using the API, metadata details of the catalog can be viewed by visiting its catalog pages. To delete the entire catalog, visit the General Settings tab of its settings page and click on the Delete link in the Delete Data Source section.

Required NoSQL Metadata structure

A NoSQL database is modeled as a collection of top level folders, each containing one or more collections. Each collection is a set of documents. The metadata details of a NoSQL database follow the same structure as above at the folder and collection level. However, each collection will contain a list of schemas instead of a list of documents. The schemas in a collection describe the structure of the documents in the collection.

The metadata structure for a given NoSQL database instance depends on the structure of the database itself. For example, a Mongodb instance has top level databases, with each database containing a number of collections and each collection containing one or more documents. Its metadata structure would have top level folders, with each folder containing a number of collections and each collection containing one or more schemas. In this way, we would think of a database in Mongodb as a folder in our metadata structure, that contains only collections with schemas.

This metadata tree structure of folders, collections in folders, and schemas in each collection is represented by a JSON object, with schemas themselves represented in the JSON or AVRO format.

Restrictions on the Structure

  - Folders cannot contain other folders. The can only contain collections
 - COllections cannot contain collections. They can only contain schemas

Defining Folders, Collections, and Schemas.

Your NoSQL metadata JSON object should be defined as follows:

{
  "folder": [
    {
      "name": "$folder_name",
      "collections": [
        {
          "name": "$Collection_name",
          "schemata": [...]
        }
      ]
    }
  ]
}
KeyIf requiredTypeValue
FoldersyesarrayAn array of Folder objects, with each object containing two keys: "name" and "collections".
nameyesstringThe name property of a folder or a collection obejct
collectionsyesarrayAn array of Collection objects, with each object containing two keys: "name" and "schemata". The value of "name" denotes the name of this collection. The value of "schemata" is an array of Schema objects.
schematayesarrayAn array of schemas describing the uploaded metadata. The properties of this key depend on the format of the schema you are uploading: JSON or Avro.

Describing Schemata

The properties of the "schemata" key depend on the format of the schema you are uploading. JSON or Avro.

JSON Schemas

JSON Schema is a web standard used to specify the format of a piece of JSON. This API uses JSON Schema Draft 7 to specify the schemas of documents in a collection when they are being added to Alation Catalog.

The “schemata” property for a JSON schema must be an array of schema objects with each schema object containing two keys: "name" and "definition". The value of "name" denotes the name of this schema as it would appear on its catalog page, and the value of "definition" is a JSON schema object with schema properties.

JSON Schema KeyPurpose
typeto specific the type of a JSON field.
required (optional)to note the input fields that are required (fields not in the list are not required). Not used on output of the Resource.
propertiesto declare the contents of an object.
itemsto declare the contents of a list. This can be a single item if the list is all of a single type, or a list of objects if the list is comprised of multiple types.
examples (optional)to denote a list of examples for non-object, non-list types (string, number, character types).

The Definition Object

KeyIf requiredPurpose
typeyesto specify the type of a JSON field
required/optionalyesto note the input fields that are required (fields not in the list are not required). Not used on output of the Resource.
propertiesyesto declare the contents of an object
itemsyes, if applicableto declare the contents of a list. This can be a single item if the list is all of a single type, or a list of objects if the list is comprised of multiple types
examplesNoto denote a list of examples for non-object, non-list types (string, number, character types)

JSON Schemas do not have to be written by hand! There are many tools to generate a schema given a sample piece of JSON, such as this one. You can paste the output of the API you are documenting into the generator to create the JSON Schema needed. Take caution with sensitive data when taking this approach, as the contents of your sample piece of json will be used as example values. For local programmatic generation of JSON schemas, there are libraries in various languages. One in particular is GenSON, which is a Python library.

Each field in a JSON Schema will have its own page in the Catalog where sample values will be displayed along with logical metadata such as a title, description and custom fields.

Sample JSON Schema Structure

{
  "folders": [{
    "name": "$folder_name",
    "collections": [{
      "name": "$collection_name",
      "schemata": [{
        "name":"$schema_name",
        "definition": {
          "title": "$title",
          "type": "$type",
          "required": ["$attribute1", "$attribute2"],
          "properties": {
            "$attribute1": {
              "type": "$data_type"
            },
            "$attribute2": {
              "type": "$data_type"
            }
          }
        }
      }]
    }]
  }]
}

AVRO schemas

Avro Schemas are used for serializing and deserializing data written to topics in the event streaming service Kafka. You will still need to wrap your avro schema in a Folder and Collection. Alation will support all types found in an Avro Schema. You can find an example of how to upload avro schemas do this in the Upload your Metadata into the NoSQL catalog section.

The "schemata" key for ingesting Avro schemas must contain avro schemas .

Sample Avro Schema Structure

{
  "folders": [{
    "name": "$folder_name",
    "collections": [{
      "name": "$collection_name",
      "schemata": [{
        "type": "record",
        "name": "$value",
        "namespace": "$value",
        "fields": [
          {"name": "$value", "type": "string"},
          {"name": "$value", "type": "int"}
        ]
      }]
    }]
  }]
}

Important When posting Avro schemas to Alation, please make sure that the name of an avro object comes before the type. The ingestion process will fail otherwise.

Supported Avro Data Types

We use Avro 1.9.0 to specify schemas in collections. Alation supports all Avro data types and their attributes. For details on Avro data type properties, refer to this link.

Uploading Metadata

Create a NoSQL catalog

Before uploading the metadata via the API, make sure you have created a Virtual Data Source of Generic NoSQL type in your Alation catalog. For steps, please consult alation zendesk.

If you are the user who created the Virtual Data Source to which you are uploading, then you are already its Data Source Admin. Otherwise, make sure you have Data Source Admin privileges to the Data Source you are uploading metadata into.

Your Data Source has been created. Navigate to the General Settings tab and copy the URL from the "Bulk Upload API" section. You will use this URL to upload your metadata structure into the catalog.

You will also see a switch to turn on versioning for this datasource in the General Settings tab. This is only recommended for datasources that you know will only contain a small amount of data. If you have the switch turned on and load a large number schemas, you may find that Alation slows down. Versioning was built for clients who wish to version their avro schemas in Alation. This switch cannot be turned off without the help of Alation Customer Support.

Upload your Metadata into the NoSQL catalog

This section describes the API with which you upload your metadata structure into the catalog. The metadata structure describes all the folders, collections and schemas that need to be uploaded.

If you created the Data Source yourself as described in the previous section, you are already its admin. Otherwise, make sure you have admin privileges to the Data Source you are uploading metadata into.

URL for JSON schemas

POST /integration/v1/data/**<datasource_id>**/parse_docstore/

Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the previous section.

Sample Request Body JSON

{
  "folders": [
    {
      "name": "testDB",
      "collections": [
        {
          "name": "testTable",
          "schemata": [
            {
              "name": "restaurant",
              "definition": {
                "title": "Restaurant",
                "type": "object",
                "properties": {
                  "id": {
                    "type": "integer"
                  },
                  "name": {
                    "type": ["string", "number", "integer"]
                  }
                }
              }
            }
          ]
        }
      ]
    }
  ]
}
KeyRequiredValueType
foldersYesThis is a top-level key-value pair and the value is an array of Folder objects, with each object containing two keys: "name" and "collections". The value of "name" denotes the name of this folder and the value of "collections" is an array of Collection objects, described below.array of Folder objects
collectionsYesAn array of Collection objects, with each object containing two keys: "name" and "schemata". The value of "name" denotes the name of this collection and the value of "schemata" is an array of Schema objects, described below.array of Collection objects
schemataYesAn array of Schema objects, with each object containing two keys: "name" and "definition". The value of "name" denotes the name of this schema as it would appear on its catalog page and the value of "definition" is a JSON schema object.array of object Schema objects
definitionYesA JSON Schema object describing the structure of one or more documents in a collection, using keys such as "title", "type", "required" and "properties".JSON Schema object

URL for AVRO for avro schemas

POST /integration/v1/data/**<datasource_id>**/parse_docstore/?json_type=avro

Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the previous section.

Sample Request Body AVRO

{
  "folders": [{
    "name": "Avro-folder",
    "collections": [{
      "name": "avro-collection",
      "schemata": [{
        "name": "restaraunt",
        "type": "record",
        "namespace": "com.restaraunt.datasource",
        "fields": [
          {"name": "owner", "type": "string"},
          {"name": "address", "type": "string"}
        ]
      }]
    }]
  }]
}

Headers

HTTP HeaderValue
TOKEN<your_token>
Content-Typeapplication/json

Replace <your_token> with API Token which can be obtained from getToken API call (Get API Token).

Success Response

Status: 202 ACCEPTED

Content-Type: application/json

Response Body

Object containing the ID of the job that has started ingesting the metadata.

{
    "job_id": ...

To get the job status and a detailed response once it finishes, use the Job Status API by passing it the job ID. On successful completion of an ingestion job, the Job Status API should return a response like the sample below.

{
    "status": "successful",
    "msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
    "result": "{'collections_deleted': 0, 'schemas_deleted': 0, 'folders_added': 1, 'folders_deleted': 0, 'collections_added': 1, 'schemas_added': 1, 'schemas_updated': 0}"
}

Update a NoSQL catalog

The URL request is the same as used for the initial upload, except the request body can contain additional or updated details. Updates (appends or replaces) can be done only at the level of the folder, collection or schema using the name of the folder, collection or schema respectively as the identifying key.

URL

POST /integration/v1/data/**<datasource_id>**/parse_docstore/

Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the "Create" section above.

Sample Append Request Body

{
    "folders": [
        {
            "name": "testDB",
            "collections": [
                {
                    "name": "testTable",
                    "schemata": [
                        {
                            "name": "airline",
                            "definition": {
                                "definitions": {
                                    "metric_length": {
                                        "type": "number"
                                    },
                                    "airplane_type": {
                                        "type": "object",
                                        "properties": {
                                            "num_wheels": {
                                                "type": "integer"
                                            }
                                        }
                                    }
                                },
                                "type": "object",
                                "required": [
                                    "name",
                                    "price"
                                ],
                                "properties": {
                                    "checked": {
                                        "type": "boolean"
                                    },
                                    "dimensions": {
                                        "type": "object",
                                        "properties": {
                                            "width": {
                                                "type": "integer"
                                            },
                                            "height": {
                                                "type": "object",
                                                "properties": {
                                                    "super_height": {
                                                        "type": "object",
                                                        "properties": {
                                                            "jumbo_height": {
                                                                "type": "integer"
                                                            }
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    },
                                    "id": {
                                        "type": "integer"
                                    },
                                    "name": {
                                        "type": [
                                            "string",
                                            "number",
                                            "integer"
                                        ]
                                    },
                                    "price": {
                                        "type": "number"
                                    },
                                    "airport_percentage": {
                                        "$ref": "metric_length"
                                    },
                                    "tags": {
                                        "type": "array",
                                        "items": {
                                            "type": "string"
                                        }
                                    },
                                    "airplane_dimensions": {
                                        "type": "object",
                                        "properties": {
                                            "girth": {
                                                "$ref": "metric_length"
                                            },
                                            "legroom": {
                                                "$ref": "metric_length"
                                            }
                                        }
                                    },
                                    "airplane_info": {
                                        "type": "array",
                                        "items": [
                                            {
                                                "type": "string"
                                            },
                                            {
                                                "type": "number"
                                            },
                                            {
                                                "$ref": "airplane_type"
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    ]
                }
            ]
        }
    ]
}

This JSON body uses the same keys as in the initial upload in the previous section. Note that a new schema "airline" has been added. Since the "restaurant" schema from the earlier request has not been mentioned, it would be kept in the catalog as it is. Details on uploading avro are later in this section.

Headers

HTTP HeaderValue
TOKEN<your_token>
Content-Typeapplication/json

Replace <your_token> with API Token which can be obtained from getToken API call (Get API Token).

Success Response

Status: 202 ACCEPTED

Content-Type: application/json

Body

Object containing the ID of the job that has started ingesting the metadata.

{
    "job_id": ...
}

To get the job status and a detailed response once it finishes, use the Job Status API by passing it the job ID. On successful completion of an ingestion job, the Job Status API should return a response like the sample below.

{
    "status": "successful",
    "msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
    "result": "{'collections_deleted': 0, 'schemas_deleted': 0, 'folders_added': 0, 'folders_deleted': 0, 'collections_added': 0, 'schemas_added': 1, 'schemas_updated': 0}"
}

Sample Replace Request Body

{
    "folders": [
        {
            "name": "testDB",
            "collections": [
                {
                    "name": "testTable",
                    "schemata": [
                        {
                            "name": "restaurant",
                            "definition": {
                                "title": "Restaurant",
                                "type": "object",
                                "required": ["name", "employees", "equipment"],
                                "properties": {
                                    "id": {
                                      "type": "integer"
                                    },
                                    "name": {
                                      "type": ["string", "number", "integer"]
                                    },
                                    "stars": {
                                      "description": "Stripe's evaluation of the riskiness of the payment",
                                      "type": "integer"
                                    },
                                    "categories": {
                                      "description": "Restaurant Category",
                                      "type": "array",
                                      "items": {
                                        "type": "string"
                                      }
                                    },
                                    "employees": {
                                      "description": "Restaurant Employees",
                                      "type": "array",
                                      "items": {
                                          "type": "object",
                                          "properties": {
                                            "name": {
                                              "type": "string"
                                            },
                                            "salary": {
                                              "type": "integer"
                                            },
                                            "tax": {
                                              "type": "object",
                                              "properties": {
                                                "bracket": {
                                                  "type": "string"
                                                },
                                                "withheld": {
                                                  "type": "integer"
                                                }
                                              }
                                            }
                                          }
                                      }
                                    },
                                    "equipment": {
                                      "description": "Restaurant Equipment",
                                      "type": "array",
                                      "items": {
                                          "type": "object",
                                          "properties": {
                                            "name": {
                                              "type": "string"
                                            },
                                            "price": {
                                              "type": "integer"
                                            },
                                            "location": {
                                              "type": "string"
                                            }
                                          }
                                      }
                                    }
                                }
                            }
                        }
                     ]
                }
            ]
        }
    ]
}

This JSON body also uses the same keys as in the initial upload in the previous section. Note that a new schema "airline" just added has not been mentioned, so it would be kept in the catalog as it is. Since the "restaurant" schema from the earlier request has been modified with additional details, the old copy in the catalog will be replaced with this new schema definition.

On successful ingestion of this JSON body, the Job Status API would return a response like the one below.

{
    "status": "successful",
    "msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
    "result": "{'collections_deleted': 0, 'schemas_deleted': 0, 'folders_added': 0, 'folders_deleted': 0, 'collections_added': 0, 'schemas_added': 0, 'schemas_updated': 1}"
}

Uploading Avro Schemas.

The URL for uploading Avro schemas is the same as for uploading JSON schemas. However you will need to add an extra parameter to your POST header.

headers = {
  # ...
  "json_type": "avro",
  # ...
}

Example Avro Post body

from collections import OrderedDict
payload_avro = OrderedDict([
  ("folders", [
    OrderedDict([
      ("name", "Kafka Schemas"),
      ("collections", [
        OrderedDict([
          ("name", "Collection 1"),
          ("schemata", [
              {
               "type": "record",
               "namespace": "MyRecord",
               "name": "Employee",
               "fields": [
                  {"name": "Name", "type": "string"},
                  {"name": "Age", "type": "int"}
                ]
              }
          ])
        ])
      ])
    ])
  ])
])

Delete sub-objects in a NoSQL catalog

NOTE: This will mark the deleted objects without actually erasing it. If the same object is created again or updated, the object's description and custom field values will be restored.

The URL request is the same as earlier, except a new boolean URL parameter remove_not_seen is accepted. The parameter is optional and defaults to false.

URL

POST /integration/v1/data/**<datasource_id>**/parse_docstore/?remove_not_seen=true

Replace <datasource_id> with the ID of your Data Source, or use the URL you copied in the "Create" section above.

Sample Delete Request Body

{
    "folders": [
        {
            "name": "testDB",
            "collections": [
                {
                    "name": "testTable",
                    "schemata": [
                        {
                            "name": "airline",
                            "definition": {
                                "type": "object",
                                "properties": {
                                    "checked": {
                                      "type": "boolean"
                                    },
                                    "dimensions": {
                                      "type": "object",
                                      "properties": {
                                            "width": {
                                              "type": "integer"
                                            },
                                            "height": {
                                              "type": "object",
                                              "properties": {
                                                "super_height": {
                                                  "type": "object",
                                                  "properties": {
                                                    "jumbo_height": {
                                                      "type": "integer"
                                                    }
                                                  }
                                                }
                                              }
                                            }
                                      }
                                    },
                                    "id": {
                                      "type": "integer"
                                    },
                                    "name": {
                                      "type": ["string", "number", "integer"]
                                    },
                                    "price": {
                                      "type": "number"
                                    }
                                }
                            }
                        }
                    ]
                }
            ]
        }
    ]
}

This JSON metadata body uses the same keys as in the earlier requests. Note that since the "restaurant" schema is not mentioned and the remove_not_seen parameter is true, it is (soft-)deleted from the catalog. The "airline" schema has updates to it in this JSON body, and updating it in the catalog is not affected by the remove_not_seen parameter.

Headers

HTTP HeaderValue
TOKEN<your_token>
Content-Typeapplication/json

Replace <your_token> with API Token which can be obtained from getToken API call (Get API Token).

Success Response

Status: 202 ACCEPTED

Content-Type: application/json

Body

Object containing the ID of the job that has started ingesting the metadata.

{
    "job_id": ...
}

To get the job status and a detailed response once it finishes, use the Job Status API by passing it the job ID. On successful completion of an ingestion job, the Job Status API should return a response like the sample below.

{
    "status": "successful",
    "msg": "Job finished in 0.114268 seconds at 2018-12-20 00:53:43.324103+00:00",
    "result": "{'collections_deleted': 0, 'schemas_deleted': 1, 'folders_added': 0, 'folders_deleted': 0, 'collections_added': 0, 'schemas_added': 0, 'schemas_updated': 1}"
}

Error Response

Invalid Token

Status: 401 UNAUTHORIZED

Body

{
   "detail": "Authentication failed"
}

Missing Token Header

Status: 401 UNAUTHORIZED

Body

{
    "detail": "Authentication credentials were not provided."
}

NOTE: The error responses are common to all of the requests above.

Code Samples

cURL

#!/bin/bash

# This is an example token. Please replace this with your token.
API_TOKEN="2abcd-4c04-4c21-8692-eda27a877f90"

BASE_URL="https://alation.yourcompany.com/integration/v1/data"

# Add metadata to a data source with unique identifier as 1
# The example below adds a folder '&FCS [email protected]æst' with a collection 'C_FCS' in it containing two schemas 'Schemata_FCS' and 'TBD_Schemata_FCS' to the metadata
curl -X POST "${BASE_URL}/1/parse_docstore/?remove_not_seen=true&json_type=json" -H 'content-type: application/json' -H "TOKEN: ${API_TOKEN}" --data-binary @body.txt

# Contents of body.txt
# {
#     "folders": [
#         {
#             "name": "&FCS [email protected]æst",
#             "collections": [
#                 {
#                     "name": "C_FCS",
#                     "schemata": [
#                         {
#                             "name": "Schemata_FCS",
#                             "definition": {
#                                 "title": "Restaurant",
#                                 "type": "object",
#                                 "required": ["name", "employees", "equipment"],
#                                 "properties": {
#                                     "id": {
#                                       "type": "integer"
#                                     },
#                                     "name": {
#                                       "type": ["string", "number", "integer"]
#                                     },
#                                     "stars": {
#                                       "description": "Stripe'\''s evaluation of the riskiness of the payment",
#                                       "type": "integer"
#                                     },
#                                     "categories": {
#                                       "description": "Restaurant Category",
#                                       "type": "array",
#                                       "items": {
#                                         "type": "string"
#                                       }
#                                     },
#                                     "employees": {
#                                       "description": "Restaurant Employees",
#                                       "type": "array",
#                                       "items": {
#                                           "type": "object",
#                                           "properties": {
#                                             "name": {
#                                               "type": "string"
#                                             },
#                                             "salary": {
#                                               "type": "integer"
#                                             },
#                                             "tax": {
#                                               "type": "object",
#                                               "properties": {
#                                                 "bracket": {
#                                                   "type": "string"
#                                                 },
#                                                 "withheld": {
#                                                   "type": "integer"
#                                                 }
#                                               }
#                                             }
#                                           }
#                                       }
#                                     },
#                                     "equipment": {
#                                       "description": "Restaurant Equipment",
#                                       "type": "array"
#                                     }
#                                 }
#                             }
#                         },
#                         {
#                             "name": "TBD_Schemata_FCS",
#                             "definition": {
#                                 "title": "Restaurant",
#                                 "type": "object",
#                                 "required": ["name", "employees", "equipment"],
#                                 "properties": {
#                                     "id": {
#                                       "type": "integer"
#                                     }
#                                 }
#                             }
#                         }
#                     ]
#                 }
#             ]
#         }
#     ]
# }

Python

import requests
import json
from collections import OrderedDict
import time

headers = {
    # this is an example API token, make sure to replace it with your own.
    'TOKEN': "e9532638-2732-4873-a71d-2ff8c9925f5b",
    'Content-Type': "application/json",
    'cache-control': "no-cache",
}

# Add json schemas to a data source with unique identifier as 1
url = "https://alation.yourcompany.com/integration/v1/data/1/parse_docstore/"

# Setting the remove_not_seen parameter to true will delete any object in the datasource that does not appear in the payload below
querystring = {"remove_not_seen":"true"}

# The example payload below adds a folder 'Example JSON' with a collection 'Collection JSON' in it containing two schemas 'Restaurant' and 'Airline' to the metadata
payload =  OrderedDict([
  ("folders", [
    OrderedDict([
      ("name", "Example JSON"),
      ("collections", [
        OrderedDict([
          ("name", "Collection JSON"),
          ("schemata", [
            OrderedDict([
              ("name", "Restaurant"),
              ("definition", {
                "type": "object",
                "required": ["name", "employees", "id"],
                "properties": {
                  "id": {
                    "type": "integer"
                  },
                  "name": {
                    "type": ["string", "number", "integer"]
                  },
                  "employees": {
                    "description": "Restaurant Employees",
                    "type": "array",
                    "items": {
                      "type": "object",
                      "properties": {
                        "name": {
                          "type": "string"
                        },
                        "salary": {
                          "type": "integer"
                        },
                      }
                    }
                  },
                }
              })
            ]),
            OrderedDict([
              ('name', 'Airline'),
              ('definition', {
                "name": "Airline",
                "definition": {
                  "type": "object",
                  "required": ["checked", "dimensions", "name"],
                  "properties": {
                    "checked": {
                      "type": "boolean"
                    },
                    "dimensions": {
                      "type": "object",
                      "properties": {
                        "width": {
                          "type": "integer"
                        },
                        "height": {
                          "type": "integer",
                        }
                      }
                    },
                    "name": {
                      "type": ["string", "number", "integer"]
                    },
                  }
                }
              })
            ])
          ])
        ])
      ])
    ])
  ])
])

response = requests.request("POST", url, data=json.dumps(payload), headers=headers, params=querystring)

print(response.text)
# should be enough time for the ingestion job to finish
time.sleep(5)

# get the job status
get_response_job = requests.request("GET", url="https://alation.yourcompany.com/api/v1/bulk_metadata/job/?id=" + str(json.loads(response.text)['job_id']), headers=headers)
print get_response_job.text