Connectors Overview
Katalogue uses Connectors to pull/extract metadata from datasources.
This section lists all connectors in Katalogue and connector-specific information.
| Connector | Category | Connector Type |
|---|---|---|
| Databricks | database | primary |
| dbt | transformation | secondary |
| IBM DB2 | database | primary |
| Microsoft SQL Server | database | primary |
| ODBC | database | primary |
| Oracle | database | primary |
| PostgreSQL | database | primary |
| Snowflake | database | primary |
Connection Requirements
Getting a connection to a datasource to work is fairly straightforward, but there are a few things to keep in mind:
- Network connectivity. Make sure that the appropriate firewall openings between the Katalogue backend API service and the datasource are in place.
- User permissions. Make sure that each datasource to be synced has a user with the right permissions. Common to all connectors is that Katalogue only reads metadata from the information schema or similar. For security reasons, it is strongly recommended to limit each user to read access on the required resources only. Each connector requires its own connection credentials and permissions; see the page for each connector for specifics.
See the Datasource Syncing section for more details on how to create a connection in the GUI.
Primary and Secondary Connectors
The concept of primary and secondary connectors is only relevant when setting up multiple connections in a System that refer to the same database. One such scenario is a Snowflake-based data warehouse that uses dbt for transformations. In that scenario, it is possible to use only a dbt connection or only a Snowflake connection. Using only the dbt connection will create the database structure with schemas, tables and columns with datatypes etc., plus some additional dbt-related metadata like the raw, uncompiled dataset definitions. Some metadata, like the number of rows in a table, will not be available though, since that metadata does not exist in dbt. The opposite is true when setting up only a Snowflake connection: all metadata that exists only in dbt will be missing.
However, having two connections that update the same asset might lead to unwanted behaviour like flip-flop updates of table descriptions (if there are different descriptions in Snowflake and dbt, the description will change on each sync).
Enter the concept of primary and secondary connectors in Katalogue. It is a way to combine metadata from multiple sources in a controlled manner. Primary connectors are dominant when it comes to schema metadata, and secondary connectors are dominant when it comes to descriptions. This means that when there are two connections that refer to the same database, schema-related information like table names, datatypes etc. will only be updated by the primary connector, whilst descriptions will only be updated by the secondary connector.
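The precedence rule above can be sketched in a few lines of TypeScript. This is only an illustration of the described behaviour when both a primary and a secondary connection exist: the type names, field names and the applySync function are hypothetical and do not reflect Katalogue's actual data model or code.

```typescript
// Hypothetical shapes for illustration only; not Katalogue's real data model.
type ConnectorRole = "primary" | "secondary";

interface TableMetadata {
  name: string;
  datatypes: Record<string, string>; // column name -> datatype
  description: string;
}

interface SyncResult {
  role: ConnectorRole;
  table: TableMetadata;
}

// Apply one sync result to the stored asset, assuming both a primary and a
// secondary connection refer to the same database:
// - the primary connection may only update schema-related fields
// - the secondary connection may only update the description
function applySync(stored: TableMetadata, incoming: SyncResult): TableMetadata {
  if (incoming.role === "primary") {
    return {
      ...stored,
      name: incoming.table.name,
      datatypes: { ...incoming.table.datatypes },
    };
  }
  return { ...stored, description: incoming.table.description };
}

// Example: a Snowflake (primary) sync followed by a dbt (secondary) sync.
const storedOrders: TableMetadata = {
  name: "orders",
  datatypes: { id: "NUMBER" },
  description: "",
};
const afterPrimary = applySync(storedOrders, {
  role: "primary",
  table: { name: "orders", datatypes: { id: "NUMBER", amount: "FLOAT" }, description: "Snowflake comment" },
});
const afterSecondary = applySync(afterPrimary, {
  role: "secondary",
  table: { name: "orders", datatypes: { id: "text" }, description: "dbt description" },
});
```

Because each role writes to a disjoint set of fields, the order of the two syncs does not matter, which is what prevents the flip-flop updates of descriptions.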
If there is a table that exists only in dbt, it will show up in Katalogue and have the dbt connection as owner “under the hood”, but the asset will be “claimed” by the Snowflake connection if/when it appears in Snowflake. If the opposite is true, and a table exists only in Snowflake and not in dbt, the dbt-related metadata will be added to the asset once it appears in dbt, but the Snowflake connection will continue to be the owner of the asset.
This property is only visible to users as a filter option in the GUI. It is possible to filter on “Connector” to find assets that only exist in one or the other connection, or both. This way, it is possible to find e.g. “dead” tables in Snowflake whose counterparts have been removed from dbt.
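The ownership behaviour described above can be summarised as a small decision function. This is a sketch under the assumption that the Snowflake/dbt example generalises to any primary/secondary pair; the resolveOwner function and the OwnerRole type are hypothetical names, not part of Katalogue.

```typescript
// Hypothetical type for illustration only.
type OwnerRole = "primary" | "secondary";

// Decide which connection owns an asset after a sync, given the current owner
// (undefined if the asset is new) and the role of the connection that just
// reported the asset.
function resolveOwner(current: OwnerRole | undefined, seenBy: OwnerRole): OwnerRole {
  // The primary connection claims the asset as soon as it sees it, and keeps
  // ownership even if the secondary connection reports the asset later.
  if (current === "primary" || seenBy === "primary") {
    return "primary";
  }
  return "secondary";
}

// A table first found only in dbt, then later in Snowflake.
const ownerAfterDbt = resolveOwner(undefined, "secondary");
const ownerAfterSnowflake = resolveOwner(ownerAfterDbt, "primary");
```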
Supported combinations of connections are these:
- One primary connection
- One secondary connection
- One primary and one secondary connection
Even though it is technically possible to set up multiple primary connections, one primary connection with multiple secondary connections etc., doing so will probably result in unexpected behaviour.
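As a rough aid, the supported combinations can be expressed as a check over the connections that refer to the same database. The function below is a hypothetical helper, not something Katalogue provides; it only encodes the rule of at most one primary and at most one secondary connection.

```typescript
// Hypothetical type for illustration only.
type ConnectionKind = "primary" | "secondary";

// Returns true when the connections referring to the same database form one
// of the supported combinations: at most one primary and at most one secondary.
// An empty list also returns true, since there is nothing that can conflict.
function isSupportedCombination(connections: ConnectionKind[]): boolean {
  const primaries = connections.filter((kind) => kind === "primary").length;
  const secondaries = connections.filter((kind) => kind === "secondary").length;
  return primaries <= 1 && secondaries <= 1;
}

// Snowflake (primary) together with dbt (secondary) is supported.
const snowflakeAndDbt = isSupportedCombination(["primary", "secondary"]);
// Two primary connections to the same database are not.
const twoPrimaries = isSupportedCombination(["primary", "primary"]);
```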
Disabling a Connector
On some occasions, it might be useful to completely disable a connector in Katalogue. The primary use case for this is to simplify installation in environments where the connector is never going to be used.
Disabling a connector has the following effects:
- The Node.js package(s) used for the connector will not be loaded (“required”) by the backend API service
- The connector cannot be selected when adding/editing a connection in the GUI
- Previously created Datasource sync tasks for any connections that use the connector will fail with an error which says that the connector is disabled
Disabling a connector DOES NOT affect the following (hence, these must be handled manually to completely disable the connector):
- /api/package.json, meaning that running npm install will still try to install the Node.js package(s) used for the connector
- Any other connector-specific steps in the /api/Dockerfile
To disable a connector, set the <CONNECTOR_CODE>_IS_ENABLED config parameter to false, either as an environment variable or in the config file. E.g. to disable the Oracle connector, add the key config.connectors.oracle.ORACLE_IS_ENABLED=false to the config file.
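To make the naming convention concrete, here is a minimal sketch of how a <CONNECTOR_CODE>_IS_ENABLED flag could be evaluated from a set of environment variables. How Katalogue actually parses the parameter is not documented here, so the isConnectorEnabled function, the default of "enabled when unset" and the case-insensitive comparison are all assumptions.

```typescript
// Hypothetical helper; Katalogue's real config handling may differ.
function isConnectorEnabled(
  connectorCode: string,
  env: Record<string, string | undefined>,
): boolean {
  // Build the parameter name, e.g. "oracle" -> "ORACLE_IS_ENABLED".
  const key = `${connectorCode.toUpperCase()}_IS_ENABLED`;
  const raw = env[key];
  // Assumption: a connector is enabled unless the flag is explicitly "false".
  return raw === undefined || raw.trim().toLowerCase() !== "false";
}

// With ORACLE_IS_ENABLED=false set, the Oracle connector is treated as disabled.
const oracleEnabled = isConnectorEnabled("oracle", { ORACLE_IS_ENABLED: "false" });
```

In a Node.js service the second argument would typically be process.env; it is passed in explicitly here so the sketch stays self-contained.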