Connectors Overview

Katalogue uses Connectors to pull/extract metadata from datasources.

This section lists all connectors in Katalogue and connector-specific information.

Connector	Category	Connector Type
Databricks	database	primary
dbt	transformation	secondary
IBM DB2	database	primary
Microsoft SQL Server	database	primary
ODBC	database	primary
Oracle	database	primary
PostgreSQL	database	primary
Snowflake	database	primary

Connection Requirements

Getting a connection to a datasource to work is fairly straightforward, but there are a few things to keep in mind:

Network connectivity. Make sure that the appropriate firewall openings between the Katalogue backend API service and the datasource are in place.
User permissions. Make sure that each datasource that is to be synced has a user with the right permissions. Common to all connectors is that Katalogue only reads metadata from the information schema or similar. For security reasons, it is strongly recommended to give each user read access only to the required resources. Each connector requires its own connection credentials and permissions; see the page for each connector for specifics.
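The network connectivity requirement can be checked before a connection is created. The following is a minimal sketch, not part of Katalogue: the function name `can_reach` is hypothetical, and it only verifies that a TCP connection can be opened from the machine it runs on, so it should be run from the host where the backend API service is deployed.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example call (hypothetical host and port, replace with the datasource's):
# can_reach("warehouse.example.internal", 5432)
```

A `False` result usually points to a missing firewall opening or a wrong host or port. A `True` result says nothing about user permissions, which must be verified separately.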

See the Datasource Syncing section for more details on how to create a connection in the GUI.

Primary and Secondary Connectors

The concept of primary and secondary connectors is only relevant when setting up multiple connections in a System that refer to the same database. One such scenario is a Snowflake-based data warehouse that uses dbt for transformations. In that scenario, it is possible to use only a dbt connection or only a Snowflake connection. Using only the dbt connection will create the database structure (schemas, tables, columns with datatypes etc.) along with some additional dbt-related metadata, like the raw, uncompiled dataset definitions. Some metadata, like the number of rows in a table, will not be available, though, since it does not exist in dbt. The opposite is true when using only a Snowflake connection: all metadata that exists only in dbt will be missing.

However, having two connections that update the same asset might lead to unwanted behaviour like flip-flop updates of table descriptions (if there are different descriptions in Snowflake and dbt, the description will change on each sync).

Enter the concept of primary and secondary connectors in Katalogue. It is a way to combine metadata from multiple sources in a controlled manner. Primary connectors are dominant when it comes to schema metadata, and secondary connectors are dominant when it comes to descriptions. This means that when there are two connections that refer to the same database, schema-related information like table names, datatypes etc. will only be updated by the primary connector, whilst descriptions will only be updated by the secondary connector.
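The precedence rule can be illustrated with a small sketch. This is not Katalogue's implementation: the function `apply_sync`, the field groups and the dictionary-based asset model are all hypothetical, and the handling of fields that are neither schema-related nor descriptions (here, accepted from either connector) is an assumption.

```python
from typing import Dict

# Hypothetical field groups; the real split in Katalogue may differ.
SCHEMA_FIELDS = {"name", "datatype"}
DESCRIPTION_FIELDS = {"description"}


def apply_sync(asset: Dict, incoming: Dict, connector_type: str) -> Dict:
    """Return a copy of `asset` updated by one sync, for the case where a
    primary and a secondary connection refer to the same database."""
    updated = dict(asset)
    for field, value in incoming.items():
        if field in SCHEMA_FIELDS:
            if connector_type == "primary":
                updated[field] = value  # schema metadata: primary wins
        elif field in DESCRIPTION_FIELDS:
            if connector_type == "secondary":
                updated[field] = value  # descriptions: secondary wins
        else:
            updated[field] = value  # assumption: other fields come from either
    return updated


asset = {"name": "ORDERS", "description": ""}
asset = apply_sync(
    asset,
    {"name": "ORDERS", "description": "from Snowflake", "row_count": 100},
    "primary",
)
asset = apply_sync(asset, {"name": "orders", "description": "from dbt"}, "secondary")
print(asset)
```

Because each field group has exactly one connector type allowed to write it, repeated syncs in any order converge on the same values, which is what avoids the flip-flop updates described above.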

If a table exists only in dbt, it will show up in Katalogue with the dbt connection as its owner “under the hood”, but the asset will be “claimed” by the Snowflake connection if/when it appears in Snowflake. In the opposite case, where a table exists only in Snowflake and not in dbt, the dbt-related metadata will be added to the asset once it appears in dbt, but the Snowflake connection will continue to be the owner of the asset.
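The ownership behaviour described above can be summarised as a small decision function. This is a sketch under assumptions: `resolve_owner` is a hypothetical name, and the owner is modelled simply as the connector type ("primary" or "secondary") of the owning connection.

```python
from typing import Optional


def resolve_owner(current_owner: Optional[str], syncing_type: str) -> str:
    """Return the connector type that owns an asset after a sync.

    current_owner is None for an asset that has not been seen before.
    """
    if current_owner is None:
        # A new asset is owned by whichever connection finds it first.
        return syncing_type
    if current_owner == "secondary" and syncing_type == "primary":
        # The primary connection claims an asset first seen by the secondary.
        return "primary"
    # In all other cases ownership does not change.
    return current_owner
```

For the Snowflake and dbt scenario, a table first seen by dbt starts out as `"secondary"`-owned and becomes `"primary"`-owned once Snowflake syncs it, while a table first seen by Snowflake stays `"primary"`-owned when dbt later adds its metadata.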

This property is only visible to users as a filter option in the GUI. It is possible to filter on “Connector” to find assets that exist in only one of the connections, or in both. This way, it is possible to find e.g. “dead” tables in Snowflake whose counterparts have been removed from dbt.
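The idea behind the “Connector” filter can be sketched in code. The data model below is hypothetical (each asset carries the set of connectors it was found in) and is not Katalogue's API; in the product, the same result is obtained through the GUI filter.

```python
from typing import Dict, List


def find_primary_only_assets(
    assets: List[Dict], primary: str, secondary: str
) -> List[str]:
    """Names of assets found by the primary connection but not the secondary."""
    return [
        asset["name"]
        for asset in assets
        if primary in asset["connectors"] and secondary not in asset["connectors"]
    ]


# Hypothetical sample data for a Snowflake + dbt System.
assets = [
    {"name": "ORDERS", "connectors": {"snowflake", "dbt"}},
    {"name": "OLD_ORDERS", "connectors": {"snowflake"}},
    {"name": "STG_NEW", "connectors": {"dbt"}},
]
dead_tables = find_primary_only_assets(assets, "snowflake", "dbt")
print(dead_tables)  # ['OLD_ORDERS']
```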

Supported combinations of connections are these:

One primary connection
One secondary connection
One primary and one secondary connection

Even though it is technically possible to set up other combinations, such as multiple primary connections or one primary connection with multiple secondary connections, doing so will probably result in unexpected behaviour.
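A check for the supported combinations could look like the sketch below. The function name is hypothetical and Katalogue itself is not described as enforcing this rule; the sketch only encodes the three combinations listed above.

```python
from collections import Counter
from typing import Iterable


def is_supported_combination(connector_types: Iterable[str]) -> bool:
    """True for one primary, one secondary, or one of each; False otherwise."""
    counts = Counter(connector_types)
    if set(counts) - {"primary", "secondary"}:
        return False  # unknown connector type
    total = counts["primary"] + counts["secondary"]
    return total >= 1 and counts["primary"] <= 1 and counts["secondary"] <= 1
```

For example, `is_supported_combination(["primary", "secondary"])` is true, whereas two primary connections, or one primary with two secondary connections, are rejected.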

Disabling a Connector

On some occasions, it might be useful to completely disable a connector in Katalogue. The primary use case for this is to simplify installation in environments where the connector is never going to be used.

Disabling a connector has the following effects:

The Node.js package(s) used for the connector will not be loaded (“required”) by the backend API service
The connector cannot be selected when adding/editing a connection in the GUI
Previously created Datasource sync tasks for any connections that use the connector will fail with an error which says that the connector is disabled

Disabling a connector DOES NOT affect the following (hence, these must be handled manually to completely disable the connector):

/api/package.json, meaning that running npm install will still try to install the Node.js package(s) used for the connector
Any other connector-specific steps in the /api/Dockerfile

To disable a connector, set the <CONNECTOR_CODE>_IS_ENABLED config parameter to false, either as an environment variable or in the config file. For example, to disable the Oracle connector, add the key config.connectors.oracle.ORACLE_IS_ENABLED=false to the config file.
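The naming pattern of the parameter can be illustrated with a short sketch that reads the flag from environment variables. It is not Katalogue's code: the function name is hypothetical, and both the default (enabled when the variable is unset) and the parsing (only the string "false" disables) are assumptions.

```python
import os
from typing import Mapping, Optional


def is_connector_enabled(
    connector_code: str, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Look up <CONNECTOR_CODE>_IS_ENABLED in the given environment."""
    env = os.environ if environ is None else environ
    raw = env.get(f"{connector_code.upper()}_IS_ENABLED")
    if raw is None:
        return True  # assumption: connectors are enabled unless disabled
    return raw.strip().lower() != "false"  # assumption: only "false" disables


# With ORACLE_IS_ENABLED=false set, the Oracle connector counts as disabled.
print(is_connector_enabled("oracle", {"ORACLE_IS_ENABLED": "false"}))  # False
```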