File
The File connector is a generic connector that can be used to bulk-create datasources with corresponding metadata using a CSV, JSON, or XSD file. Use this connector when direct datasource network connectivity is not an option, or to quickly onboard exported schema snapshots.
The file can either be manually uploaded or referred to using a URL (if it is hosted and accessible over HTTP).
CSV
The connector accepts a UTF-8 encoded delimited text file using either comma (,) or semicolon (;) as the delimiter; choose one delimiter and use it consistently throughout the file. Fields that contain the delimiter must be enclosed in double quotes.
The first row must be a header row, where the values represent the field names specified in the Custom Import Query and Custom Relationship Import Query.
In addition, there is a column parent_field_path that accepts the full hierarchical field path within the same dataset, e.g. <field_name_1>::<field_name_2>. Any additional columns will be ignored.
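For example, a minimal comma-delimited file (all values invented for illustration; the columns shown are the required ones listed under Validations, plus parent_field_path) might look like:

```csv
datasource_name,dataset_group_name,dataset_name,dataset_type_name,field_name,datatype_name,parent_field_path
Sales DB,Core,orders,table,order_id,integer,
Sales DB,Core,orders,table,customer,record,
Sales DB,Core,orders,table,email,varchar,customer
```

The last row nests email under customer by giving the full path of its parent field.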
JSON
The connector accepts a JSON file using the same schema as a system export returned by the Katalogue REST API.
That is, the JSON file should describe a system with a list of associated datasources. The JSON file must have at least one datasource.
```jsonc
// REST API system export (`/api/system/export/<system_id>`)
// System declaration
{
  "system_id": "...",
  "datasources": []  // required
}
```

Datasources
Each datasource is processed for its name (required), description (optional), and associated dataset groups (required).
```jsonc
// Datasource declaration
{
  "datasource_id": "...",
  "datasource_name": "...",         // required
  "datasource_description": "...",  // optional
  "dataset_groups": []              // required
}
```

Dataset Groups
Each dataset group is described by a name (required), description (optional), and associated datasets (required).
```jsonc
// Dataset group declaration
{
  "dataset_group_id": "...",
  "dataset_group_name": "...",         // required
  "dataset_group_description": "...",  // optional
  "datasets": []                       // required
}
```

Datasets
Each dataset is described by a name (required), description (optional), type (optional), and associated fields.
```jsonc
// Dataset declaration
{
  "dataset_id": "...",
  "dataset_name": "...",         // required
  "dataset_type_name": "...",    // defaults to `table`
  "dataset_description": "...",  // optional
  "fields": []                   // required
}
```

Fields
Each field within a dataset is described by its name, type, and optional metadata.
```jsonc
// Field declaration
{
  "field_id": 1,                      // optional — used as a stable reference for parent_field_id and relationships
  "parent_field_id": null,            // optional — field_id of the parent field (for nested/hierarchical fields)
  "field_name": "...",                // required
  "field_source_description": "...",  // optional
  "field_datatype": "...",            // optional
  "field_datatype_length": null,      // optional
  "field_datatype_precision": null,   // optional
  "field_datatype_scale": null,       // optional
  "field_is_primary_key": false,      // optional
  "field_is_nullable": true,          // optional
  "field_default_value": null,        // optional
  "field_ordinal_position": null,     // optional
  "field_to_relationships": []        // optional — see below
}
```

Nested fields
Fields can be nested by setting parent_field_id to the field_id of another field in the same file. The matching happens after all fields from the import have been written to the database, so the parent does not need to pre-exist — it just needs to be present in the same import file.
field_id can be any unique string or integer within the file; it does not need to be a database ID.
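For example, this (illustrative) fields array nests email under customer:

```jsonc
"fields": [
  { "field_id": 1, "parent_field_id": null, "field_name": "customer" },
  { "field_id": 2, "parent_field_id": 1, "field_name": "email" }  // nested under "customer"
]
```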
Relationships
field_to_relationships is an array of outgoing field-level relationships from this field to fields in other datasets. Each entry must identify the target fully:
```jsonc
{
  "to_datasource_name": "...",        // required
  "to_dataset_group_name": "...",     // required
  "to_dataset_name": "...",           // required
  "to_field_name": "...",             // required
  "relationship_name": "...",         // optional — auto-generated if omitted
  "relationship_ordinal_position": 1  // optional — defaults to 1
}
```

XSD (experimental)
The connector accepts a standard XML Schema Definition (XSD) file and imports it as a single dataset. Unlike CSV and JSON, the XSD format is not Katalogue-specific — any well-formed XSD can be used as-is.
Connection Configuration
Because an XSD file describes a single dataset, the datasource and dataset group it belongs to must be specified explicitly in the connection form:
| Connection field | Purpose |
|---|---|
| File Type | Set to XSD |
| Import File URL | URL or local path to the XSD file. Also used as the dataset name. |
| Datasource Name | Name of the datasource the dataset will be placed under. |
| Dataset Group Name | Name of the dataset group the dataset will be placed under. |
How Fields Are Extracted
The parser traverses the XSD element tree recursively and produces one Katalogue field for each xs:element and xs:attribute it encounters. Nested elements produce a hierarchy: a child element b inside a parent element a appears with field_name = b and its parent field path set to a. Attributes are included with an @ prefix (e.g. @id).
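As an illustration (the schema is invented for this example), the following XSD would yield a field order, a field customer with parent field path order, and a field @id with parent field path order:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- extracted as field "customer", parent field path "order" -->
        <xs:element name="customer" type="xs:string"/>
      </xs:sequence>
      <!-- extracted as field "@id", parent field path "order" -->
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```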
The following XSD constructs are supported and expanded inline:
- xs:sequence, xs:all, xs:choice
- xs:group and xs:attributeGroup (resolved by reference)
- Named and inline xs:complexType and xs:simpleType
- xs:complexContent and xs:simpleContent with xs:extension and xs:restriction
- Circular type references are detected and skipped
Nullability
A field is marked nullable when its minOccurs attribute is "0". Fields without a minOccurs attribute are treated as non-nullable.
Field Descriptions
Descriptions are extracted from xs:annotation/xs:documentation elements. As a convenience, XML comments (<!-- ... -->) placed on the line immediately before an xs:element, xs:complexType, xs:simpleType, or xs:attribute tag are automatically treated as documentation.
If the field’s type is an xs:simpleType with enumerated values, those values are appended to the description as Enums: val1, val2, val3.
```xml
<!-- This comment becomes the field description -->
<xs:element name="status" type="StatusType"/>

<xs:simpleType name="StatusType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="active"/>
    <xs:enumeration value="inactive"/>
  </xs:restriction>
</xs:simpleType>

<!-- Result: field_description = "This comment becomes the field description Enums: active, inactive" -->
```

Limitations
- One XSD file imports exactly one dataset. To import multiple datasets, create one connection per XSD file.
- Relationships are not extracted from XSD files.
- The dataset type is always set to table and the datasource type to Folder.
Template Files
Here are a few downloadable example datasources expressed in the different import formats, to be used as template files.
Limitations
- This connector only imports structural metadata (such as datasources, dataset groups, datasets, fields, and their basic descriptions) as well as field-level relationships between datasets. It does not import lineage, associated field descriptions, custom attributes, or owners.
- Use JSON instead of CSV if a single field has multiple relationships.
Validations
Import is all-or-nothing. If any record in the file fails validation, the entire import is rejected and no data is written.
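The all-or-nothing behavior can be sketched as follows. This is a minimal illustration, not the connector's actual code; the function names are hypothetical, and the required keys mirror the columns documented in the tables below.

```python
# Minimal sketch of all-or-nothing import validation (illustrative only).
REQUIRED = ("datasource_name", "dataset_group_name", "dataset_name",
            "dataset_type_name", "field_name")

def validate(records):
    """Collect every validation error; the import is accepted only if there are none."""
    errors = []
    for i, rec in enumerate(records, start=1):
        for key in REQUIRED:
            if not rec.get(key):
                errors.append(f"record {i}: missing required field '{key}'")
    return errors

def import_records(records, write):
    """Write records only when the whole file validates; otherwise write nothing."""
    errors = validate(records)
    if errors:
        raise ValueError("; ".join(errors))  # entire import rejected
    for rec in records:
        write(rec)
```

The point of validating the whole file before the first write is that a failure in record 500 cannot leave records 1 through 499 half-imported.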
Format Checks
Section titled “Format Checks”| Format | Checks |
|---|---|
| CSV | Delimiter must be , or ;. The header row must contain all required columns (see below). Quoted fields must have a matching closing quote. |
| JSON | File must be a JSON object ({...}). Must contain at least one datasource. Must produce at least one field record. |
| XSD | File must have an XML declaration (<?xml version="1.0" encoding="..."?>). Must have an <xs:schema> root element. |
Required Fields Per Record
The following fields are required and must be non-empty for every record in the file, regardless of format. Validation fails immediately if any of these are missing.
| Field | CSV column | JSON field |
|---|---|---|
| Datasource name | datasource_name | datasource_name |
| Dataset group name | dataset_group_name | dataset_group_name |
| Dataset name | dataset_name | dataset_name |
| Dataset type | dataset_type_name | dataset_type_name |
| Field name | field_name | field_name |
| Data type | datatype_name | field_datatype |
Controlled Vocabulary Fields
The following fields must match a value that exists in the Katalogue database (case-insensitive). If the value is not found, the import fails with an error listing the allowed values.
| Field | Allowed values | Default if omitted |
|---|---|---|
| datasource_type_name | Values from the Datasource Types list (e.g. Database, Folder, API, …) | Other |
| dataset_type_name | Values from the Dataset Types list (e.g. table, view, …) | (none — required) |
Parent Field References
CSV: parent_field_path must match the full path of an existing field in the same file. Paths are built by joining ancestor field names with :: (e.g. root::child). A reference to a non-existent parent fails the import.
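The CSV check can be sketched like this (a minimal illustration with hypothetical function names; the connector's internal logic may differ):

```python
# Sketch: validate CSV parent_field_path references within one file.
def full_paths(rows):
    """Build the set of full field paths ("a::b") present in the file."""
    paths = set()
    for row in rows:
        parent = row.get("parent_field_path") or ""
        path = f"{parent}::{row['field_name']}" if parent else row["field_name"]
        paths.add(path)
    return paths

def check_parent_refs(rows):
    """Fail the import if any parent_field_path does not name a field in the same file."""
    paths = full_paths(rows)
    for row in rows:
        parent = row.get("parent_field_path")
        if parent and parent not in paths:
            raise ValueError(f"unknown parent_field_path: {parent}")
```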
JSON: parent_field_id is resolved after the import by matching it against field_id values from the same file. There is no pre-import validation — an unresolvable reference is silently ignored rather than causing the import to fail.
Relationship References
For CSV and JSON, if a relationship target is partially specified the import will fail. All four target fields must be present together:
- to_datasource_name
- to_dataset_group_name
- to_dataset_name
- to_field_name
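The all-four-or-none rule can be sketched as (illustrative only; the key names match the documented relationship fields, the function name is hypothetical):

```python
# Sketch: a relationship target must specify all four keys or none of them.
TARGET_KEYS = ("to_datasource_name", "to_dataset_group_name",
               "to_dataset_name", "to_field_name")

def check_relationship_target(rel):
    """Raise if the target is only partially specified."""
    present = [k for k in TARGET_KEYS if rel.get(k)]
    if present and len(present) != len(TARGET_KEYS):
        missing = [k for k in TARGET_KEYS if not rel.get(k)]
        raise ValueError(f"partially specified relationship target, missing: {missing}")
```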