Alation Help Center

Upload NoSQL Metadata to Alation

Overview

You can catalog a NoSQL database in Alation using the virtual data source functionality and the NoSQL API.

Note

Alation does not provide automated metadata extraction (MDE) for virtual data sources, and you will need to use the API to upload and maintain the metadata.

Before using the API, you must enable support for NoSQL and create a virtual NoSQL data source in Alation. See Virtual NoSQL Data Sources for help with these steps.

Once you have a virtual NoSQL data source in Alation, you'll need to create a JSON object that contains the metadata for your NoSQL data. This JSON object will be passed in the body of a POST call to upload the metadata to the catalog. The JSON object is described below.

NoSQL API Versions

There are two versions of the NoSQL API:

  • Version 1 requires the metadata for a virtual NoSQL data source to be represented as a single, large JSON object. Every API request uses the POST method, and the entire JSON object must be uploaded every time, even if you're only updating or deleting one small item.
  • Version 2 of the NoSQL API has new GET, PATCH, and DELETE methods, allowing you to retrieve, update, and delete specific objects within the NoSQL structure.

Because version 1 requires re-uploading the entire JSON object for every change, we recommend using version 2 of the API. This guide focuses on version 2.

Add a New Folder and Its Children

To add a new folder and all its children to the Alation catalog, use the Create NoSQL metadata endpoint and submit a JSON object as the request body. The JSON object represents the folder and its contents and is described below. Make sure you understand how NoSQL structures are represented in Alation so you can construct the JSON object correctly.

JSON Object Structure

Here's an example of the required JSON structure for POST calls.

{
  "folders": [
    {
      "name": "$folder_name",
      "collections": [
        {
          "name": "$collection_name",
          "schemata": [...]
        }
      ]
    }
  ]
}

The details of each object in the JSON structure are given below.

Root object

The root JSON object contains a single key named folders. The value of folders is an array of Folder objects.

| Key | Required | Type | Value |
|---|---|---|---|
| folders | yes | array | An array of Folder objects. |

Folder object

A Folder object contains two keys: name and collections. The name key must come before the collections key.

| Key | Required | Type | Value |
|---|---|---|---|
| name | yes | string | The name of the folder. |
| collections | yes | array | An array of Collection objects. |

Collection object

A Collection object contains two keys: name and schemata. The name key must come before the schemata key.

| Key | Required | Type | Value |
|---|---|---|---|
| name | yes | string | The name of the collection. |
| schemata | yes | array | An array of either JSON Schema objects or Avro Schema objects. |
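
The key-order requirement above is easy to satisfy when the payload is built in code. The following is a minimal Python sketch, not an official client: the helper build_payload, the folder and collection names, and the placeholder schema are all hypothetical and only illustrate the nesting described above.

```python
import json


def build_payload(folder_name, collection_name, schemata):
    """Return the root object: folders -> collections -> schemata."""
    # Python dicts keep insertion order, so "name" is written before
    # "collections" and "schemata" when the payload is serialized.
    collection = {"name": collection_name, "schemata": list(schemata)}
    folder = {"name": folder_name, "collections": [collection]}
    return {"folders": [folder]}


# Hypothetical names and a placeholder schema, for illustration only.
schema = {"name": "order_schema", "definition": {"type": "object", "properties": {}}}
payload = build_payload("sales", "orders", [schema])

# Avoid sort_keys=True: alphabetical sorting would move "collections"
# ahead of "name" and break the required key order.
body = json.dumps(payload, indent=2)
print(body)
```

Building the dictionaries in the required order and serializing without key sorting keeps name ahead of collections and schemata in the request body.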

Example of a schemata key containing a list of JSON Schema objects:

{
  "folders": [
    {
      "name": "$folder_name",
      "collections": [
        {
          "name": "$collection_name",
          "schemata": [
            {
              "name": "JSON Schema 1",
              "definition":
              {...}
            },
            {
              "name": "JSON Schema 2",
              "definition":
              {...}
            }
          ]
        }
      ]
    }
  ]
}

Example of a schemata key containing a list of Avro Schema objects:

{
  "folders": [
    {
      "name": "$folder_name",
      "collections": [
        {
          "name": "$collection_name",
          "schemata": [
            {
              "name": "Avro Schema 1",
              "type": "record",
              "namespace": "$value",
              "fields": [...]
            },
            {
              "name": "Avro Schema 2",
              "type": "record",
              "namespace": "$value",
              "fields": [...]
            }
          ]
        }
      ]
    }
  ]
}

JSON Schema object

A JSON Schema object describes the structure of a set of documents in this collection. It should be included under the schemata key. A JSON Schema object contains two keys: name and definition. The name key must come before the definition key.

| Key | Required | Type | Value |
|---|---|---|---|
| name | yes | string | The name of the schema. |
| definition | yes | object | A Definition object. |

Example JSON Schema object with a Definition object:

{
   "name": "schema_sample",
   "definition":
   {
      "title": "schema_sample",
      "type": "object",
      "description": "schema_sample",
      "required": ["$attribute1", "$attribute2"],
      "properties":
      {
        "$attribute1": {
          "type": "$data_type"
        },
        "$attribute2": {
          "type": "$data_type"
        },
        "$attribute3": {
          "type": "$data_type"
        }
      }
   }
}
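
A JSON Schema object like the one above can also be assembled programmatically. This is a hedged Python sketch: the helper make_json_schema_object and the attribute names are hypothetical, and reusing the schema name for title and description simply mirrors the sample rather than any documented requirement.

```python
def make_json_schema_object(name, attribute_types, required=()):
    """Build a {"name", "definition"} pair from an attribute -> type mapping."""
    missing = [attr for attr in required if attr not in attribute_types]
    if missing:
        raise ValueError(f"required attributes not in properties: {missing}")
    definition = {
        "title": name,
        "type": "object",
        "description": name,
        "required": list(required),
        # One entry per attribute, each carrying only its JSON Schema type.
        "properties": {attr: {"type": t} for attr, t in attribute_types.items()},
    }
    # "name" is inserted first so it precedes "definition" when serialized.
    return {"name": name, "definition": definition}


# Hypothetical attributes, for illustration only.
customer_schema = make_json_schema_object(
    "customer",
    {"id": "integer", "email": "string", "nickname": "string"},
    required=["id", "email"],
)
```

The check on required catches a common mistake early: listing an attribute as required without describing it under properties.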

Definition object

A Definition object is a JSON Schema that describes the content of your documents. Each property in the JSON schema will have its own catalog page in Alation.

JSON Schema is a web standard used to specify the format of a piece of JSON. This API uses JSON Schema Draft 7 to specify the schemas of documents in a collection when they are being added to Alation Catalog.

Using $ref

In your JSON schema, if you want to reference an object with $ref, make sure the definitions property for this object appears in the JSON code before you use $ref.
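
One way to catch an ordering problem before uploading is to walk the schema in document order. The sketch below is an assumption about how such a check could look, not part of the API. The function definitions_precede_refs is hypothetical, and it treats any key named definitions as the definitions block, which is a simplification.

```python
def definitions_precede_refs(schema):
    """Return True if no "$ref" appears before the "definitions" key."""
    seen_definitions = False

    def walk(node):
        nonlocal seen_definitions
        if isinstance(node, dict):
            # json.loads and dict literals both preserve document order.
            for key, value in node.items():
                if key == "definitions":
                    seen_definitions = True
                if key == "$ref" and not seen_definitions:
                    return False
                if not walk(value):
                    return False
        elif isinstance(node, list):
            for item in node:
                if not walk(item):
                    return False
        return True

    return walk(schema)


# "definitions" comes first, so the later "$ref" passes the check.
ordered = {
    "definitions": {"address": {"type": "object"}},
    "type": "object",
    "properties": {"home": {"$ref": "#/definitions/address"}},
}
print(definitions_precede_refs(ordered))
```

If the check fails, moving the definitions key to the top of the schema before serializing it is usually the simplest fix.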

You do not need to write JSON schemas by hand. Your database management platform may include an export tool that retrieves database collections in JSON format. If one is available, export all the collections you want to include in the catalog, then feed the JSON into a JSON schema generator to infer the schemas.

JSON Schema generator examples

You can use https://jsonschema.net/ to generate a JSON schema from a sample JSON document.

If you use Python, you can use the Genson library to generate a JSON schema.

Use caution with the sample JSON that you feed to the JSON schema generator. The contents of the sample JSON, including any sensitive data it contains, may be used as example values in the resulting schema.
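
To illustrate the caution above, here is a minimal standard-library sketch of schema inference that records only types and never copies sample values into the schema. It is not Genson and not the jsonschema.net generator. The function infer_schema and the sample document are hypothetical, and arrays are typed from their first element only.

```python
import json


def infer_schema(value):
    """Infer a types-only JSON Schema fragment from a sample value."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
        }
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# Made-up sample document containing a sensitive-looking value.
sample = {"name": "Ada", "ssn": "123-45-6789", "age": 36, "tags": ["vip"]}
inferred = infer_schema(sample)
print(json.dumps(inferred, indent=2))
```

Whichever generator you use, it is worth searching the generated schema for sample values before uploading it.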

See MongoDB below for an example of how to export a database collection in JSON format.

Avro Schema object

Avro schemas are used for serializing and deserializing data written to topics in the event streaming service Kafka. Alation supports all types found in an Avro schema.

To use Avro schemas, you must include the query parameter ?json_type=avro in your POST call. When you include this parameter, all your schemas must be Avro schemas. The Avro schema object should be included under the schemata key.

When posting Avro schemas to Alation, the name of an Avro field must come before the type.

Supported Avro Data Types

We use Avro 1.9.0 to specify schemas in collections. Alation supports all Avro data types and their attributes. For details on Avro data type properties, see the Apache Avro documentation.

Sample Avro Schema Structure

{
   "name":"$value",
   "type":"record",
   "namespace":"$value",
   "fields":[
      {
         "name":"$value",
         "type":"string"
      },
      {
         "name":"$value",
         "type":"int"
      }
   ]
}
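
The two Avro requirements (the json_type=avro query parameter and name before type in each field) can be checked before posting. The Python sketch below rests on assumptions: the function fields_name_before_type, the sample record, and the endpoint URL are placeholders, and only top-level fields are inspected.

```python
import json
from urllib.parse import urlencode


def fields_name_before_type(avro_schema):
    """Check that each top-level field lists "name" before "type"."""
    for field in avro_schema.get("fields", []):
        keys = list(field)  # key order as written in the JSON text
        if "name" not in keys or "type" not in keys:
            return False
        if keys.index("name") > keys.index("type"):
            return False
    return True


# Hypothetical record, for illustration only.
avro_schema = json.loads("""
{
  "name": "user_event",
  "type": "record",
  "namespace": "com.example.events",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "clicks", "type": "int"}
  ]
}
""")

# Placeholder URL: substitute the real Create NoSQL metadata endpoint.
endpoint = "https://alation.example.com/PLACEHOLDER_ENDPOINT/"
url = endpoint + "?" + urlencode({"json_type": "avro"})

print(fields_name_before_type(avro_schema), url)
```

Nested record types would need a recursive version of the same check.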

MongoDB Example

The following steps show how to create the JSON object for loading metadata into a MongoDB virtual data source:

  1. Identify the MongoDB instance you want to catalog, including authentication details (username/password), server network address, and the database name.
  2. Connect to the MongoDB server with the Mongo client (usually /usr/bin/mongo).
  3. List all the databases on the server:
    show dbs
    
  4. To verify you have the right database, run:
    use <dbname>
    show collections
    
  5. Exit the Mongo client console and go back to the OS shell.
  6. A Mongo installation includes an export tool (usually /usr/bin/mongoexport) that can be used to retrieve the collections in JSON format. Run the mongoexport command on all collections you want to include in the catalog. For example, the following command will export the contents of the some_collection collection in the local database into the schema_file file.
    mongoexport -d local -c some_collection > schema_file
    
  7. Use a JSON schema generator to infer the schema from the JSON objects of the collections. If you are using the Python library Genson, from the Genson installation directory, run the following command to output the schema of the objects in schema_file:
    python bin/genson.py schema_file
    
  8. Fill in the required JSON object structure with the JSON schemas as described above.
  9. When the JSON is ready, make a POST request to the upload API to load the metadata into the virtual data source.
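
Steps 6 through 8 can be sketched in Python using only the standard library, as an alternative to Genson. The sketch makes several assumptions: the export contains one JSON document per line, the database name maps to the folder name, the helper infer_properties and the schema name are hypothetical, and only top-level property types are inferred (the first type seen for a key wins). The inline sample stands in for the exported file.

```python
import io
import json

# Exact Python type -> JSON Schema type name.
JSON_TYPES = {
    str: "string", bool: "boolean", int: "integer", float: "number",
    dict: "object", list: "array", type(None): "null",
}


def infer_properties(lines):
    """Collect top-level property types from newline-delimited JSON."""
    properties = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for key, value in json.loads(line).items():
            properties.setdefault(key, {"type": JSON_TYPES[type(value)]})
    return properties


# Stand-in for open("schema_file"); the documents are made up.
exported = io.StringIO(
    '{"_id": {"$oid": "5f1d7f3e2c8b4a0012345678"}, "name": "Ada", "age": 36}\n'
    '{"_id": {"$oid": "5f1d7f3e2c8b4a0012345679"}, "name": "Lin", "active": true}\n'
)
properties = infer_properties(exported)

mongo_payload = {"folders": [{
    "name": "local",
    "collections": [{
        "name": "some_collection",
        "schemata": [{
            "name": "some_collection_schema",
            "definition": {"type": "object", "properties": properties},
        }],
    }],
}]}
print(json.dumps(mongo_payload, indent=2))
```

Because only types are recorded, no document values from the export end up in the uploaded schema.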

Add a New NoSQL Object

You can add new NoSQL objects to an existing folder hierarchy with the Update NoSQL data objects endpoint. To add a collection, schema, or schema property under an existing folder, you must include the entire hierarchy of objects above the one you are adding. Use the same JSON structure as described above.

Update Metadata on a NoSQL Schema

You can update the type, title, or description property on a NoSQL schema with the Update NoSQL data objects endpoint. In this case, just provide the necessary query parameters to uniquely identify the schema, and submit a JSON payload consisting of the property you want to change.
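
As a rough illustration, the update payload can be built and validated before it is sent. This Python sketch assumes the payload is a flat JSON object holding only the properties being changed. The helper build_schema_update is hypothetical, and the query parameters that identify the schema are left out because their names are not given here.

```python
import json

# The three schema properties this section says can be updated.
UPDATABLE_KEYS = {"type", "title", "description"}


def build_schema_update(**changes):
    """Serialize a payload containing only updatable schema properties."""
    if not changes:
        raise ValueError("provide at least one property to change")
    unknown = set(changes) - UPDATABLE_KEYS
    if unknown:
        raise ValueError(f"not updatable on a schema: {sorted(unknown)}")
    return json.dumps(changes)


# Hypothetical description text, for illustration only.
update_body = build_schema_update(description="Orders placed through the web store")
print(update_body)
```

Rejecting unknown keys locally makes it clear that other properties, such as the schema name, are outside the scope of this kind of update.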