File

The File connector is a generic connector that bulk-creates datasources and their metadata from a CSV, JSON, or XSD file. Use it when a direct network connection to the datasource is not an option, or to quickly onboard exported schema snapshots.

The file can either be manually uploaded or referred to using a URL (if it is hosted and accessible over HTTP).

The connector accepts a UTF-8 encoded delimited text file that uses either a comma (,) or a semicolon (;) as the delimiter; choose one delimiter and use it consistently throughout the file. Fields that contain the delimiter must be enclosed in double quotes.

The first row must be a header row, where the values represent the field names specified in the Custom Import Query and Custom Relationship Import Query. In addition, there is a column parent_field_path that accepts the full hierarchical field path within the same dataset, e.g. <field_name_1>::<field_name_2>. Any additional columns will be ignored.
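As a rough illustration, the sketch below writes such a file with Python's standard csv module. The column names are the ones listed on this page; the row values, the column order, and the contrived field name `first;last` (used only to show quoting) are assumptions made for the example.

```python
import csv
import io

# Column names as listed on this page; the row values are made-up samples.
COLUMNS = [
    "datasource_name", "dataset_group_name", "dataset_name",
    "dataset_type_name", "field_name", "datatype_name", "parent_field_path",
]

rows = [
    ["sales_db", "public", "orders", "table", "payload", "json", ""],
    ["sales_db", "public", "orders", "table", "customer", "json", "payload"],
    ["sales_db", "public", "orders", "table", "first;last", "varchar", "payload::customer"],
]

buffer = io.StringIO()
# QUOTE_MINIMAL wraps only the values that contain the delimiter or a quote.
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(COLUMNS)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

When writing to disk instead of a string buffer, open the file with `encoding="utf-8"` and `newline=""` so the output stays UTF-8 and the csv module controls line endings.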

The connector accepts a JSON file using the same schema as a system export returned by the Katalogue REST API.

That is, the JSON file should describe a system with a list of associated datasources. The JSON file must have at least one datasource.

// REST API system export (`/api/system/export/<system_id>`)
// System declaration
{
"system_id": "...",
"datasources": [] // required
}

Each datasource is processed for its name (required), description (optional), and associated dataset groups (required).

// Datasource declaration
{
"datasource_id": "...",
"datasource_name": "...", // required
"datasource_description": "...", // optional
"dataset_groups": [] // required
}

Each dataset group is described by a name (required), description (optional), and associated datasets (required).

// Dataset group declaration
{
"dataset_group_id": "...",
"dataset_group_name": "...", // required
"dataset_group_description": "...", // optional
"datasets": [] // required
}

Each dataset is described by a name (required), description (optional), type (optional), and associated fields (required).

// Dataset declaration
{
"dataset_id": "...",
"dataset_name": "...", // required
"dataset_type_name": "...", // defaults to `table`
"dataset_description": "...", // optional
"fields": [] // required
}

Each field within a dataset is described by its name, type, and optional metadata.

// Field declaration
{
"field_id": 1, // optional — used as a stable reference for parent_field_id and relationships
"parent_field_id": null, // optional — field_id of the parent field (for nested/hierarchical fields)
"field_name": "...", // required
"field_source_description": "...", // optional
"field_datatype": "...", // optional
"field_datatype_length": null, // optional
"field_datatype_precision": null, // optional
"field_datatype_scale": null, // optional
"field_is_primary_key": false, // optional
"field_is_nullable": true, // optional
"field_default_value": null, // optional
"field_ordinal_position": null, // optional
"field_to_relationships": [] // optional — see below
}

Fields can be nested by setting parent_field_id to the field_id of another field in the same file. The matching happens after all fields from the import have been written to the database, so the parent does not need to pre-exist — it just needs to be present in the same import file.

field_id can be any string or integer that is unique within the file; it does not need to be a database ID.
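The matching can be pictured as a two-pass lookup. The sketch below is only an illustration of that idea, not the connector's actual code; the flat `fields` list and its sample values are assumptions, and a reference that matches nothing is simply left without a parent.

```python
# Fields as they might appear in a "fields" array (sample values).
fields = [
    {"field_id": "addr", "parent_field_id": None, "field_name": "address"},
    {"field_id": "city", "parent_field_id": "addr", "field_name": "city"},
    {"field_id": 3, "parent_field_id": "missing", "field_name": "orphan"},
]

# Pass 1: index every field by its file-local field_id.
by_id = {f["field_id"]: f for f in fields if f.get("field_id") is not None}

# Pass 2: resolve each parent reference against that index.
parents = {}
for f in fields:
    parent = by_id.get(f.get("parent_field_id"))
    parents[f["field_name"]] = parent["field_name"] if parent else None

print(parents)
# {'address': None, 'city': 'address', 'orphan': None}
```

Because the index is built before any reference is resolved, the order of fields in the file does not matter.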

field_to_relationships is an array of outgoing field-level relationships from this field to fields in other datasets. Each entry must identify the target fully:

{
"to_datasource_name": "...", // required
"to_dataset_group_name": "...", // required
"to_dataset_name": "...", // required
"to_field_name": "...", // required
"relationship_name": "...", // optional — auto-generated if omitted
"relationship_ordinal_position": 1 // optional — defaults to 1
}
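Putting the declarations together, a complete import file could be assembled as in the sketch below. Only the key names come from this page; every value (`example-system`, `sales_db`, `orders`, and so on) is sample data. Standard JSON has no comment syntax, so the `//` annotations shown in the declarations are left out of the generated file.

```python
import json

# One relationship from orders.customer_id to customers.id (sample names).
relationship = {
    "to_datasource_name": "sales_db",
    "to_dataset_group_name": "public",
    "to_dataset_name": "customers",
    "to_field_name": "id",
}

system = {
    "system_id": "example-system",
    "datasources": [{
        "datasource_name": "sales_db",
        "dataset_groups": [{
            "dataset_group_name": "public",
            "datasets": [
                {
                    "dataset_name": "customers",
                    "dataset_type_name": "table",
                    "fields": [
                        {"field_id": 1, "field_name": "id", "field_datatype": "integer"},
                    ],
                },
                {
                    "dataset_name": "orders",
                    "dataset_type_name": "table",
                    "fields": [
                        {
                            "field_id": 2,
                            "field_name": "customer_id",
                            "field_datatype": "integer",
                            "field_to_relationships": [relationship],
                        },
                    ],
                },
            ],
        }],
    }],
}

document = json.dumps(system, indent=2)
print(document)
```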

The connector accepts a standard XML Schema Definition (XSD) file and imports it as a single dataset. Unlike CSV and JSON, the XSD format is not Katalogue-specific — any well-formed XSD can be used as-is.

Because an XSD file describes a single dataset, the datasource and dataset group it belongs to must be specified explicitly in the connection form:

| Connection field | Purpose |
| --- | --- |
| File Type | Set to XSD |
| Import File URL | URL or local path to the XSD file. Also used as the dataset name. |
| Datasource Name | Name of the datasource the dataset will be placed under. |
| Dataset Group Name | Name of the dataset group the dataset will be placed under. |

The parser traverses the XSD element tree recursively and produces one Katalogue field for each xs:element and xs:attribute it encounters. Nested elements produce a hierarchy: a child element b inside a parent element a appears with field_name = b and its parent field path set to a. Attributes are included with an @ prefix (e.g. @id).

The following XSD constructs are supported and expanded inline:

  • xs:sequence, xs:all, xs:choice
  • xs:group and xs:attributeGroup (resolved by reference)
  • Named and inline xs:complexType and xs:simpleType
  • xs:complexContent and xs:simpleContent with xs:extension and xs:restriction
  • Circular type references are detected and skipped

A field is marked nullable when its minOccurs attribute is "0". Fields without a minOccurs attribute are treated as non-nullable.
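The traversal and the nullability rule can be approximated in a few lines with Python's standard xml.etree.ElementTree. This is a deliberately simplified sketch, not the connector's parser: it handles only inline types, ignores references, named types, and groups, and assumes that parent paths are joined with `::` as in the CSV format. The sample schema is made up.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="b" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:int"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def walk(node, parent_path, out):
    """Collect (field_name, parent_path, nullable) tuples in document order."""
    for child in node:
        if child.tag == XS + "element":
            name = child.get("name")
            # Nullable only when minOccurs is explicitly "0".
            out.append((name, parent_path, child.get("minOccurs") == "0"))
            path = f"{parent_path}::{name}" if parent_path else name
            walk(child, path, out)
        elif child.tag == XS + "attribute":
            # Attributes carry no minOccurs, so they come out non-nullable here.
            out.append(("@" + child.get("name"), parent_path, False))
        else:
            # Containers such as xs:complexType or xs:sequence add no field of their own.
            walk(child, parent_path, out)
    return out

fields = walk(ET.fromstring(XSD), "", [])
print(fields)
# [('a', '', False), ('b', 'a', True), ('@id', 'a', False)]
```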

Descriptions are extracted from xs:annotation/xs:documentation elements. As a convenience, XML comments (<!-- ... -->) placed on the line immediately before an xs:element, xs:complexType, xs:simpleType, or xs:attribute tag are automatically treated as documentation.

If the field’s type is an xs:simpleType with enumerated values, those values are appended to the description: Enums: val1, val2, val3.

<!-- This comment becomes the field description -->
<xs:element name="status" type="StatusType"/>
<xs:simpleType name="StatusType">
<xs:restriction base="xs:string">
<xs:enumeration value="active"/>
<xs:enumeration value="inactive"/>
</xs:restriction>
</xs:simpleType>
<!-- Result: field_description = "This comment becomes the field description Enums: active, inactive" -->
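One way to reproduce this behavior outside the connector is sketched below with Python's standard library (Python 3.8 or later, for `insert_comments`). It is an approximation under stated assumptions: it treats the comment node directly preceding an element as its description rather than checking physical line positions, it only looks at top-level elements, and the sample comment text is made up.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Current state of the record -->
  <xs:element name="status" type="StatusType"/>
  <xs:simpleType name="StatusType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="inactive"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

# insert_comments=True keeps XML comments in the parsed tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
schema = ET.fromstring(XSD, parser=parser)

# Map each named simple type to its enumerated values.
enums = {
    st.get("name"): [e.get("value") for e in st.iter(XS + "enumeration")]
    for st in schema.iter(XS + "simpleType")
}

descriptions = {}
previous_comment = None
for node in schema:
    if node.tag is ET.Comment:
        previous_comment = (node.text or "").strip()
        continue
    if node.tag == XS + "element":
        parts = [previous_comment] if previous_comment else []
        values = enums.get(node.get("type"), [])
        if values:
            parts.append("Enums: " + ", ".join(values))
        descriptions[node.get("name")] = " ".join(parts)
    # A comment only applies to the node that directly follows it.
    previous_comment = None

print(descriptions)
# {'status': 'Current state of the record Enums: active, inactive'}
```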
  • One XSD file imports exactly one dataset. To import multiple datasets, create one connection per XSD file.
  • Relationships are not extracted from XSD files.
  • The dataset type is always set to table and the datasource type to Folder.

Here are a few downloadable example datasources expressed in different import formats to be used as template files.

  • Sample database with relationships [CSV, JSON]
  • Hierarchical dataset with nested fields [CSV]
  • This connector imports only structural metadata (such as datasources, dataset groups, datasets, fields, and their basic descriptions) and field-level relationships between datasets. It does not import lineage, associated field descriptions, custom attributes, or owners.
  • Use JSON instead of CSV if a single field has multiple relationships.

Import is all-or-nothing. If any record in the file fails validation, the entire import is rejected and no data is written.

| Format | Checks |
| --- | --- |
| CSV | Delimiter must be , or ;. The header row must contain all required columns (see below). Quoted fields must have a matching closing quote. |
| JSON | File must be a JSON object ({...}). Must contain at least one datasource. Must produce at least one field record. |
| XSD | File must have an XML declaration (<?xml version="1.0" encoding="..."?>). Must have an <xs:schema> root element. |

The following fields are required and must be non-empty for every record in the file, regardless of format. Validation fails immediately if any of these are missing.

| Field | CSV column | JSON field |
| --- | --- | --- |
| Datasource name | datasource_name | datasource_name |
| Dataset group name | dataset_group_name | dataset_group_name |
| Dataset name | dataset_name | dataset_name |
| Dataset type | dataset_type_name | dataset_type_name |
| Field name | field_name | field_name |
| Data type | datatype_name | field_datatype |
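Since one bad record rejects the whole import, it can be worth checking a CSV file locally before uploading it. The helper below is a hypothetical pre-check, not part of Katalogue; `find_problems` and the sample data are names invented for this sketch, and line numbers assume no multi-line quoted values.

```python
import csv
import io

REQUIRED = [
    "datasource_name", "dataset_group_name", "dataset_name",
    "dataset_type_name", "field_name", "datatype_name",
]

def find_problems(csv_text, delimiter=","):
    """Return a list of human-readable problems; an empty list means the file passes."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in header]
    if problems:
        return problems
    # Data rows start on line 2, right after the header row.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
    return problems

sample = (
    "datasource_name,dataset_group_name,dataset_name,dataset_type_name,field_name,datatype_name\n"
    "sales_db,public,orders,table,id,integer\n"
    "sales_db,public,orders,table,,varchar\n"
)
print(find_problems(sample))
# ['line 3: empty field_name']
```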

The following fields must match a value that exists in the Katalogue database (case-insensitive). If the value is not found, the import fails with an error listing the allowed values.

| Field | Allowed values | Default if omitted |
| --- | --- | --- |
| datasource_type_name | Values from the Datasource Types list (e.g. Database, Folder, API, …) | Other |
| dataset_type_name | Values from the Dataset Types list (e.g. table, view, …) | (none; required) |

CSV: parent_field_path must match the full path of an existing field in the same file. Paths are built by joining ancestor field names with :: (e.g. root::child). A reference to a non-existent parent fails the import.
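The CSV rule can be checked ahead of the import with a small script. The sketch below is a hypothetical helper (`unresolved_parents` is not a Katalogue function): it builds every field's full path from its own `parent_field_path` plus its name, scopes paths to one dataset, and reports rows whose parent path matches nothing.

```python
def dataset_key(row):
    """Identify the dataset a row belongs to."""
    return (row["datasource_name"], row["dataset_group_name"], row["dataset_name"])

def unresolved_parents(rows):
    """Return rows whose parent_field_path matches no field path in the same dataset."""
    known = set()
    for row in rows:
        parent = row.get("parent_field_path") or ""
        path = f"{parent}::{row['field_name']}" if parent else row["field_name"]
        known.add((dataset_key(row), path))
    return [
        row for row in rows
        if row.get("parent_field_path")
        and (dataset_key(row), row["parent_field_path"]) not in known
    ]

base = {"datasource_name": "sales_db", "dataset_group_name": "public", "dataset_name": "orders"}
rows = [
    {**base, "field_name": "root", "parent_field_path": ""},
    {**base, "field_name": "child", "parent_field_path": "root"},
    {**base, "field_name": "leaf", "parent_field_path": "root::child"},
    {**base, "field_name": "stray", "parent_field_path": "root::nope"},
]

print([row["field_name"] for row in unresolved_parents(rows)])
# ['stray']
```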

JSON: parent_field_id is resolved after the import by matching it against field_id values from the same file. There is no pre-import validation — an unresolvable reference is silently ignored rather than causing the import to fail.

For CSV and JSON, the import fails if a relationship target is only partially specified. All four target fields must be present together:

  • to_datasource_name
  • to_dataset_group_name
  • to_dataset_name
  • to_field_name
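A simple way to catch partially specified targets before importing is to classify each record as having no target, a complete target, or a partial one. The function below is a hypothetical sketch; `target_status` and the sample records are invented for illustration.

```python
TARGET_KEYS = (
    "to_datasource_name",
    "to_dataset_group_name",
    "to_dataset_name",
    "to_field_name",
)

def target_status(record):
    """Classify a record's relationship target as 'none', 'complete', or 'partial'."""
    present = [key for key in TARGET_KEYS if record.get(key)]
    if not present:
        return "none"
    if len(present) == len(TARGET_KEYS):
        return "complete"
    return "partial"

complete = {
    "to_datasource_name": "sales_db",
    "to_dataset_group_name": "public",
    "to_dataset_name": "customers",
    "to_field_name": "id",
}
partial = {"to_dataset_name": "customers", "to_field_name": "id"}

print(target_status({}), target_status(complete), target_status(partial))
# none complete partial
```

Only records classified as `partial` would be expected to fail the import under the rule above.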