File

The File connector is a generic connector for bulk-creating datasources and their metadata from a CSV, JSON, or XSD file. Use it when a direct network connection to the datasource is not an option, or to quickly onboard exported schema snapshots.

The file can be uploaded manually or referenced by URL (if it is hosted and accessible over HTTP).

CSV

The connector accepts a UTF-8 encoded delimited text file that uses either a comma (,) or a semicolon (;) as the delimiter; choose one delimiter and use it consistently throughout the file. Fields that contain the delimiter must be enclosed in double quotes.

The first row must be a header row whose values are the field names specified in the Custom Import Query and Custom Relationship Import Query. In addition, the parent_field_path column accepts the full hierarchical path of the parent field within the same dataset, with levels joined by ::, e.g. <field_name_1>::<field_name_2>. Any additional columns are ignored.
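As a minimal sketch, the following Python snippet produces such a file with the standard library csv module. The column names are the required columns listed under Required Fields Per Record plus parent_field_path; the columns your instance actually expects depend on the Custom Import Query, and all row values (sales_db, public, orders, and so on) are hypothetical sample data.

```python
import csv
import io

# Column names follow the required columns named in this document;
# every row value below is made-up sample data.
COLUMNS = [
    "datasource_name", "dataset_group_name", "dataset_name",
    "dataset_type_name", "field_name", "datatype_name", "parent_field_path",
]

rows = [
    ["sales_db", "public", "orders", "table", "address", "json", ""],
    # "city" is nested under "address", so its parent path is "address".
    ["sales_db", "public", "orders", "table", "city", "varchar", "address"],
    # This field name contains the delimiter, so csv.writer quotes it.
    ["sales_db", "public", "orders", "table", "net;gross", "numeric", ""],
]


def build_csv(rows, delimiter=";"):
    """Return CSV text with a header row and one consistent delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_csv(rows)
```

To save the result for upload, write csv_text to a file opened with encoding="utf-8" so that the encoding requirement above is met.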

JSON

The connector accepts a JSON file using the same schema as a system export returned by the Katalogue REST API.

That is, the JSON file describes a system with a list of associated datasources, and it must contain at least one datasource.

// REST API system export (`/api/system/export/<system_id>`)
// System declaration
{
  "system_id": "...",
  "datasources": [] // required
}

Datasources

Each datasource is processed for its name (required), description (optional), and associated dataset groups (required).

// Datasource declaration
{
  "datasource_id": "...",
  "datasource_name": "...", // required
  "datasource_description": "...", // optional
  "dataset_groups": [] // required
}

Dataset Groups

Each dataset group is described by a name (required), description (optional), and associated datasets (required).

// Dataset group declaration
{
  "dataset_group_id": "...",
  "dataset_group_name": "...", // required
  "dataset_group_description": "...", // optional
  "datasets": [] // required
}

Datasets

Each dataset is described by a name (required), description (optional), type (optional), and associated fields (required).

// Dataset declaration
{
  "dataset_id": "...",
  "dataset_name": "...", // required
  "dataset_type_name": "...", // defaults to `table`
  "dataset_description": "...", // optional
  "fields": [] // required
}

Fields

Each field within a dataset is described by its name, type, and optional metadata.

// Field declaration
{
  "field_id": 1,                      // optional — used as a stable reference for parent_field_id and relationships
  "parent_field_id": null,            // optional — field_id of the parent field (for nested/hierarchical fields)
  "field_name": "...",                // required
  "field_source_description": "...",  // optional
  "field_datatype": "...",            // optional
  "field_datatype_length": null,      // optional
  "field_datatype_precision": null,   // optional
  "field_datatype_scale": null,       // optional
  "field_is_primary_key": false,      // optional
  "field_is_nullable": true,          // optional
  "field_default_value": null,        // optional
  "field_ordinal_position": null,     // optional
  "field_to_relationships": []        // optional — see below
}

Nested fields

Fields can be nested by setting parent_field_id to the field_id of another field in the same file. The matching happens after all fields from the import have been written to the database, so the parent does not need to pre-exist — it just needs to be present in the same import file.

field_id can be any unique string or integer within the file; it does not need to be a database ID.
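The following Python sketch illustrates the idea of resolving parent_field_id links inside one file. It is an illustration only, not the connector's implementation: the helper field_paths is hypothetical, the sample names are made up, and joining levels with :: simply borrows the path notation used for CSV.

```python
import json

# Hypothetical minimal import document; all names are sample values.
document = {
    "system_id": "...",  # placeholder, as in the declarations above
    "datasources": [{
        "datasource_name": "sales_db",
        "dataset_groups": [{
            "dataset_group_name": "public",
            "datasets": [{
                "dataset_name": "orders",
                "dataset_type_name": "table",
                "fields": [
                    {"field_id": "a", "parent_field_id": None,
                     "field_name": "address", "field_datatype": "json"},
                    {"field_id": "b", "parent_field_id": "a",
                     "field_name": "city", "field_datatype": "varchar"},
                ],
            }],
        }],
    }],
}


def field_paths(fields):
    """Map each field_id to its full path by following parent_field_id links."""
    by_id = {f["field_id"]: f for f in fields}

    def path(field, seen=()):
        parent = by_id.get(field.get("parent_field_id"))
        # Stop at a root field, an unknown parent, or a detected cycle.
        if parent is None or parent["field_id"] in seen:
            return field["field_name"]
        return path(parent, seen + (field["field_id"],)) + "::" + field["field_name"]

    return {field_id: path(f) for field_id, f in by_id.items()}


fields = document["datasources"][0]["dataset_groups"][0]["datasets"][0]["fields"]
paths = field_paths(fields)
json_text = json.dumps(document, indent=2)  # text that could be saved as the import file
```

Because the lookup table is built from the whole field list first, the order of parent and child in the file does not matter in this sketch, which mirrors the behaviour described above.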

Relationships

field_to_relationships is an array of outgoing field-level relationships from this field to fields in other datasets. Each entry must identify the target fully:

{
  "to_datasource_name": "...",       // required
  "to_dataset_group_name": "...",    // required
  "to_dataset_name": "...",          // required
  "to_field_name": "...",            // required
  "relationship_name": "...",        // optional — auto-generated if omitted
  "relationship_ordinal_position": 1 // optional — defaults to 1
}

XSD (experimental)

The connector accepts a standard XML Schema Definition (XSD) file and imports it as a single dataset. Unlike CSV and JSON, the XSD format is not Katalogue-specific — any well-formed XSD can be used as-is.

Connection Configuration

Because an XSD file describes a single dataset, the datasource and dataset group it belongs to must be specified explicitly in the connection form:

Connection field	Purpose
File Type	Set to `XSD`
Import File URL	URL or local path to the XSD file. Also used as the dataset name.
Datasource Name	Name of the datasource the dataset will be placed under.
Dataset Group Name	Name of the dataset group the dataset will be placed under.

How Fields Are Extracted

The parser traverses the XSD element tree recursively and produces one Katalogue field for each xs:element and xs:attribute it encounters. Nested elements produce a hierarchy: a child element b inside a parent element a appears with field_name = b and its parent field path set to a. Attributes are included with an @ prefix (e.g. @id).

The following XSD constructs are supported and expanded inline:

xs:sequence, xs:all, xs:choice
xs:group and xs:attributeGroup (resolved by reference)
Named and inline xs:complexType and xs:simpleType
xs:complexContent and xs:simpleContent with xs:extension and xs:restriction
Circular type references are detected and skipped
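To make the traversal more concrete, here is a heavily simplified Python sketch using xml.etree.ElementTree. It is not the connector's parser: it only follows inline xs:complexType, xs:sequence, xs:all, and xs:choice, and it does not resolve named types, groups, extensions, or circular references. The function name extract_fields and the sample schema are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
# Constructs that this sketch simply walks through without creating a field.
CONTAINERS = {XS + "complexType", XS + "sequence", XS + "all", XS + "choice"}


def extract_fields(node, parent_path=""):
    """Yield (field_name, parent_path) for inline elements and attributes."""
    for child in node:
        if child.tag == XS + "element":
            name = child.get("name") or child.get("ref", "")
            yield name, parent_path
            child_path = f"{parent_path}::{name}" if parent_path else name
            yield from extract_fields(child, child_path)
        elif child.tag == XS + "attribute":
            # Attributes get an @ prefix and share the enclosing element's path.
            yield "@" + (child.get("name") or child.get("ref", "")), parent_path
        elif child.tag in CONTAINERS:
            yield from extract_fields(child, parent_path)


xsd = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="b" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

fields = list(extract_fields(ET.fromstring(xsd.encode("utf-8"))))
```

For this sample, fields contains a with an empty parent path, followed by b and @id, both with parent path a.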

Nullability

A field is marked nullable when its minOccurs attribute is "0". Fields without a minOccurs attribute are treated as non-nullable.
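The rule can be expressed in a single comparison. The Python sketch below assumes an element parsed with xml.etree.ElementTree; is_nullable is a hypothetical helper name, and the sample elements omit the xs: namespace for brevity.

```python
import xml.etree.ElementTree as ET


def is_nullable(element):
    """Nullable only when minOccurs is explicitly "0"; absent means non-nullable."""
    return element.get("minOccurs") == "0"


optional = ET.fromstring('<element name="note" minOccurs="0"/>')
implicit = ET.fromstring('<element name="id"/>')
explicit = ET.fromstring('<element name="code" minOccurs="1"/>')
```

Here is_nullable(optional) is True, while implicit and explicit are both non-nullable.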

Field Descriptions

Descriptions are extracted from xs:annotation/xs:documentation elements. As a convenience, XML comments (<!-- ... -->) placed on the line immediately before an xs:element, xs:complexType, xs:simpleType, or xs:attribute tag are automatically treated as documentation.

If the field’s type is an xs:simpleType with enumerated values, those values are appended to the description: Enums: val1, val2, val3.

<!-- This comment becomes the field description -->
<xs:element name="status" type="StatusType"/>

<xs:simpleType name="StatusType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="active"/>
    <xs:enumeration value="inactive"/>
  </xs:restriction>
</xs:simpleType>
<!-- Result: field_description = "This comment becomes the field description Enums: active, inactive" -->
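The same composition can be sketched in Python with xml.etree.ElementTree. This is an assumption-laden illustration rather than the connector's code: it reads only xs:annotation/xs:documentation (the comment convenience is left out), matches the type by its unprefixed name among top-level simple types, and the helper name describe and the sample text "Order status" are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}


def describe(element, schema):
    """Combine xs:documentation text with the enum values of the element's type."""
    doc = element.find("xs:annotation/xs:documentation", NS)
    text = (doc.text or "").strip() if doc is not None else ""
    type_name = element.get("type")
    for simple in schema.findall("xs:simpleType", NS):
        if simple.get("name") != type_name:
            continue
        values = [e.get("value") for e in simple.findall(".//xs:enumeration", NS)]
        if values:
            text = (text + " Enums: " + ", ".join(values)).strip()
    return text


xsd = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="status" type="StatusType">
    <xs:annotation><xs:documentation>Order status</xs:documentation></xs:annotation>
  </xs:element>
  <xs:simpleType name="StatusType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="inactive"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

schema = ET.fromstring(xsd)
description = describe(schema.find("xs:element", NS), schema)
```

For the sample schema, description is "Order status Enums: active, inactive".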

Limitations

One XSD file imports exactly one dataset. To import multiple datasets, create one connection per XSD file.
Relationships are not extracted from XSD files.
The dataset type is always set to table and the datasource type to Folder.

Template Files

The following example datasources are available for download in different import formats and can be used as template files.

Sample database with relationships [CSV, JSON]
Hierarchical dataset with nested fields [CSV]

Filtering

The full file is read first, then datasource filters are applied within Katalogue before import. See Datasource Filters.

Limitations

This connector imports only structural metadata (datasources, dataset groups, datasets, fields, and their basic descriptions) and field-level relationships between datasets. It does not import lineage, associated field descriptions, custom attributes, or owners.
Use JSON instead of CSV if a single field has multiple relationships.

Validations

Import is all-or-nothing. If any record in the file fails validation, the entire import is rejected and no data is written.

Format Checks

Format	Checks
CSV	Delimiter must be `,` or `;`. The header row must contain all required columns (see below). Quoted fields must have a matching closing quote.
JSON	File must be a JSON object (`{...}`). Must contain at least one datasource. Must produce at least one field record.
XSD	File must have an XML declaration (`<?xml version="1.0" encoding="..."?>`). Must have an `<xs:schema>` root element.
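Because a failed check rejects the whole import, it can help to run similar checks locally before uploading. The Python sketch below approximates the JSON and XSD checks from the table; it is not Katalogue's validator, the function names check_json and check_xsd are hypothetical, and it assumes that the xs:schema root is in the standard XML Schema namespace.

```python
import json
import xml.etree.ElementTree as ET

XS_SCHEMA = "{http://www.w3.org/2001/XMLSchema}schema"


def check_json(text):
    """Return a list of problems found in a JSON import file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    if not data.get("datasources"):
        return ["at least one datasource is required"]
    return []


def check_xsd(text):
    """Return a list of problems found in an XSD import file."""
    problems = []
    if not text.startswith("<?xml"):
        problems.append("missing XML declaration")
    try:
        root = ET.fromstring(text.encode("utf-8"))
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != XS_SCHEMA:
        problems.append("root element must be xs:schema")
    return problems
```

An empty list means the file passed these approximate checks; the connector may still reject it for reasons that this sketch does not cover, such as a file that produces no field records.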

Required Fields Per Record

The following fields are required and must be non-empty for every record in the file, regardless of format. Validation fails immediately if any of these are missing.

Field	CSV column	JSON field
Datasource name	`datasource_name`	`datasource_name`
Dataset group name	`dataset_group_name`	`dataset_group_name`
Dataset name	`dataset_name`	`dataset_name`
Dataset type	`dataset_type_name`	`dataset_type_name`
Field name	`field_name`	`field_name`
Data type	`datatype_name`	`field_datatype`
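A local pre-check for the CSV case might look like the following Python sketch. The column names come from the table above; the function name missing_required and the sample rows are hypothetical, and the real validator may treat whitespace-only values differently.

```python
import csv
import io

REQUIRED = [
    "datasource_name", "dataset_group_name", "dataset_name",
    "dataset_type_name", "field_name", "datatype_name",
]


def missing_required(csv_text, delimiter=","):
    """Return (row_number, column) pairs where a required value is absent or empty."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                problems.append((row_number, column))
    return problems


sample = (
    "datasource_name,dataset_group_name,dataset_name,dataset_type_name,field_name,datatype_name\n"
    "sales_db,public,orders,table,id,int\n"
    "sales_db,public,orders,,total,\n"
)
problems = missing_required(sample)
```

For the sample, the second data row (file row 3) is reported twice: once for dataset_type_name and once for datatype_name.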

Controlled Vocabulary Fields

The following fields must match a value that exists in the Katalogue database (case-insensitive). If the value is not found, the import fails with an error listing the allowed values.

Field	Allowed values	Default if omitted
`datasource_type_name`	Values from the Datasource Types list (e.g. `Database`, `Folder`, `API`, …)	`Other`
`dataset_type_name`	Values from the Dataset Types list (e.g. `table`, `view`, …)	(none — required)
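The lookup described above can be sketched in Python as follows. The allowed-value lists here are only the sample values named in the table, not the complete lists from a Katalogue database, and resolve_type is a hypothetical helper; trimming surrounding whitespace is an assumption of this sketch.

```python
def resolve_type(value, allowed, default=None):
    """Return the canonical spelling of value, matched case-insensitively."""
    if value is None or not value.strip():
        if default is None:
            raise ValueError("a value is required")
        return default
    canonical = {name.casefold(): name for name in allowed}
    key = value.strip().casefold()
    if key not in canonical:
        raise ValueError(
            f"unknown value {value!r}; allowed values: {', '.join(allowed)}"
        )
    return canonical[key]


# Sample subsets only; the real lists live in the Katalogue database.
DATASOURCE_TYPES = ["Database", "Folder", "API", "Other"]
DATASET_TYPES = ["table", "view"]

datasource_type = resolve_type("database", DATASOURCE_TYPES, default="Other")
dataset_type = resolve_type("VIEW", DATASET_TYPES)
```

Passing no default models a required field such as dataset_type_name, where an omitted value raises an error instead of falling back.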

Parent Field References

CSV: parent_field_path must match the full path of an existing field in the same file. Paths are built by joining ancestor field names with :: (e.g. root::child). A reference to a non-existent parent fails the import.

JSON: parent_field_id is resolved after the import by matching it against field_id values from the same file. There is no pre-import validation — an unresolvable reference is silently ignored rather than causing the import to fail.
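Since unresolvable JSON references are dropped silently, a local check before upload can be useful for both formats. The Python sketch below is a hypothetical pre-check, not part of Katalogue: unresolved_parent_paths expects the CSV rows of a single dataset as dictionaries, and unresolved_parent_ids expects the fields array of a single dataset.

```python
def unresolved_parent_paths(rows):
    """CSV: return parent_field_path values that match no field path in rows."""
    known = set()
    for row in rows:
        parent = row.get("parent_field_path") or ""
        name = row["field_name"]
        known.add(f"{parent}::{name}" if parent else name)
    referenced = {row["parent_field_path"] for row in rows if row.get("parent_field_path")}
    return sorted(referenced - known)


def unresolved_parent_ids(fields):
    """JSON: return names of fields whose parent_field_id matches no field_id."""
    ids = {f["field_id"] for f in fields if f.get("field_id") is not None}
    return [
        f["field_name"]
        for f in fields
        if f.get("parent_field_id") is not None and f["parent_field_id"] not in ids
    ]


csv_rows = [
    {"field_name": "root", "parent_field_path": ""},
    {"field_name": "child", "parent_field_path": "root"},
    {"field_name": "leaf", "parent_field_path": "root::child"},
    {"field_name": "orphan", "parent_field_path": "missing"},
]
json_fields = [
    {"field_id": 1, "parent_field_id": None, "field_name": "root"},
    {"field_id": 2, "parent_field_id": 1, "field_name": "child"},
    {"field_id": 3, "parent_field_id": 99, "field_name": "orphan"},
]
```

In both samples only the orphan entry is reported: the CSV check returns the path "missing", and the JSON check returns the field name "orphan".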

Relationship References

For CSV and JSON, the import fails if a relationship target is only partially specified. All four target fields must be present together:

to_datasource_name
to_dataset_group_name
to_dataset_name
to_field_name
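A partially specified target can be detected with a short Python check like the one below. The helper missing_target_keys is hypothetical. It treats a record with none of the four keys as having no relationship, which suits CSV rows; entries inside a JSON field_to_relationships array are expected to carry all four.

```python
TARGET_KEYS = (
    "to_datasource_name",
    "to_dataset_group_name",
    "to_dataset_name",
    "to_field_name",
)


def missing_target_keys(record):
    """Return the target keys that are absent or empty in a partial target."""
    present = [key for key in TARGET_KEYS if record.get(key)]
    if not present or len(present) == len(TARGET_KEYS):
        return []  # no relationship on this record, or a fully specified one
    return [key for key in TARGET_KEYS if key not in present]


complete = {
    "to_datasource_name": "sales_db",
    "to_dataset_group_name": "public",
    "to_dataset_name": "customers",
    "to_field_name": "id",
}
partial = {"to_dataset_name": "customers", "to_field_name": "id"}
```

For partial, the check reports to_datasource_name and to_dataset_group_name as missing; complete and an empty record both return an empty list.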