Azure Cosmos DB

The Cosmos DB data source connector fetches data from Azure Cosmos DB, a NoSQL database, using the SQL API. If your Cosmos DB instance uses the MongoDB API, use the MongoDB data source connector instead.

Configuration

Create a new configuration of the Cosmos DB data source connector.

Fill in the Endpoint, Database ID and Key. Then click Save.

Screenshot - Extractor configuration

Click Add Row to add one or more Configuration Rows.

Screenshot - Extractor configuration

Fill in the name, and optionally the description. Then click Add Row.

Screenshot - Extractor configuration

In the Configuration Row, fill in the Configuration Parameters. Then click Save.

Screenshot - Extractor configuration

Configuration Parameters

  • containerId: string (required); the ID of the Cosmos DB container
  • output: string (required); the name of the output table in your bucket
  • incremental: boolean (optional); enables incremental loading; the default is false
  • incrementalFetchingKey: string (optional); the name of the key for incremental fetching, e.g., c.id
  • mode: enum (optional)
    • mapping (default) – items are exported using specified mapping
    • raw – items are exported as plain JSON strings; the table will contain id and data columns
  • mapping: object; required when mode is mapping (see the mapping mode example below)
  • maxTries: integer (optional); the max number of tries if an error occurs; the default is 5
  • ignoredKeys: array (optional)
    • Cosmos DB automatically adds some metadata keys when an item is inserted.
    • By default, the following keys are ignored: ["_rid", "_self", "_etag", "_attachments", "_ts"]
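
As a rough illustration of the optional parameters above, the following configuration row lowers the retry count and supplies its own ignoredKeys list so that the _ts metadata key is kept. This is a sketch only: the container and table names are hypothetical, and it assumes that a custom ignoredKeys list replaces the default list rather than extending it.

```json
{
  "containerId": "my-container",
  "output": "my-table",
  "mode": "raw",
  "maxTries": 3,
  "ignoredKeys": ["_rid", "_self", "_etag", "_attachments"]
}
```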

By default, the connector exports all documents using a generated SQL query, SELECT * FROM c. You can modify the generated query with the following parameters:

  • select: string (optional), e.g., c.name, c.date; the default is *
    • In raw mode, the id field must be present in the query results.
  • from: string (optional), e.g., Families f; the default is c
  • sort: string (optional), e.g., c.date
  • limit: integer (optional), e.g., 500

Alternatively, you can set a custom query using the following parameter:

  • query: string (optional), e.g., SELECT f.name FROM Families f
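
For illustration, a configuration row that narrows the generated query with select, from, sort and limit might look like the following. The container, table and field names are hypothetical, and the exact SQL text the connector builds from these parameters is not shown here. Because the row uses raw mode, the select list includes c.id.

```json
{
  "containerId": "my-container",
  "output": "my-table",
  "mode": "raw",
  "select": "c.id, c.name, c.date",
  "from": "c",
  "sort": "c.date",
  "limit": 500
}
```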

Examples

Raw mode – full load:

{
  "containerId": "my-container",
  "output": "my-table",
  "mode": "raw"
}

Mapping mode – full load:

{
  "containerId": "my-container",
  "output": "my-table",
  "mode": "mapping",
  "mapping": {
    "id": {
      "type": "column",
      "mapping": {
        "destination": "id",
        "primaryKey": true
      }
    },
    "business_name": "name",
    "result": "result",
    "address": {
      "type": "table",
      "destination": "city",
      "tableMapping": {
        "city": "name"
      }
    }
  }
}

Raw mode – incremental load:

{
  "containerId": "my-container",
  "output": "my-table",
  "mode": "raw",
  "incremental": true,
  "incrementalFetchingKey": "c.id"
}

Raw mode – custom query:

{
  "containerId": "my-container",
  "output": "my-table",
  "mode": "raw",
  "query": "SELECT f.id, f.name FROM Families f"
}