📁 Datasets

A dataset is a collection of files that are all redacted and synthesized in the same way. Datasets help you to track a collection of files and how sensitive data is removed from those files.

Datasets are typically configured from the Textual UI. For convenience, the SDK also supports many dataset operations. However, some operations can only be performed from the Textual UI.

Creating a dataset

To create a dataset:

from tonic_textual.redact_api import TonicTextual

textual = TonicTextual("https://textual.tonic.ai")

dataset = textual.create_dataset('my_dataset')

Retrieving an existing dataset

To retrieve an existing dataset by the dataset name:

dataset = textual.get_dataset('my_dataset')

Editing a dataset

You can use the SDK to edit a dataset. However, not all properties of the dataset can be edited from the SDK.

The following snippet renames the dataset and disables modification of entities that are tagged as ORGANIZATION.

dataset.edit(name='new_dataset_name', generator_config={'ORGANIZATION': 'Off'})

Uploading files to a dataset

You can upload files to your dataset from the SDK. Provide the complete path to the file, and the complete name of the file as you want it to appear in Textual.

dataset.add_file('<path to file>','<file name>')
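To upload several files at once, one option is to loop over a directory and call add_file for each file. The following is a minimal sketch, not part of the SDK. The helper name upload_directory and its pattern parameter are hypothetical, and the sketch assumes only that the dataset object exposes add_file with the two arguments described above (file path, then file name).

```python
from pathlib import Path
from typing import List


def upload_directory(dataset, directory: str, pattern: str = "*") -> List[str]:
    """Upload every regular file in `directory` that matches `pattern`.

    Hypothetical helper. Assumes only that `dataset.add_file(path, name)`
    behaves as described above. Returns the uploaded file names, sorted.
    """
    uploaded = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():  # skip subdirectories
            # Complete path first, then the name to display in Textual
            dataset.add_file(str(path.resolve()), path.name)
            uploaded.append(path.name)
    return uploaded
```

For example, upload_directory(dataset, '<path to directory>', '*.txt') would upload only the .txt files directly inside that directory.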

Viewing the list of files in a dataset

To get the list of files in a dataset, view the files property of the dataset.

To filter dataset files based on their processing status, call:

  • get_failed_files

  • get_running_files

  • get_queued_files

  • get_processed_files
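As a quick way to check progress, the four status methods can be combined into a single summary. The following is a minimal sketch. The helper name count_files_by_status is hypothetical, and the sketch assumes that each of the four methods listed above takes no arguments and returns an iterable of files.

```python
from typing import Dict

# Maps a short label to each status method named in the list above
STATUS_METHODS = {
    "failed": "get_failed_files",
    "running": "get_running_files",
    "queued": "get_queued_files",
    "processed": "get_processed_files",
}


def count_files_by_status(dataset) -> Dict[str, int]:
    """Return the number of files that the dataset reports for each status.

    Hypothetical helper. Assumes each status method returns an iterable.
    """
    return {
        label: len(list(getattr(dataset, method_name)()))
        for label, method_name in STATUS_METHODS.items()
    }
```

Calling count_files_by_status(dataset) returns a dictionary with the keys failed, running, queued, and processed.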

Downloading a redacted dataset file

To download the redacted or synthesized version of the file, get the specific file from the dataset, then call the download function.

For example:

files = dataset.get_processed_files()
for file in files:
    file_bytes = file.download()
    # Write each file under its own name so that one download does not overwrite another
    with open(file.name, 'wb') as f:
        f.write(file_bytes)
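To keep downloaded files together, the same loop can write into a dedicated output directory. The following is a minimal sketch. The helper name download_processed_files is hypothetical, and the sketch assumes that each file object has a name attribute and that download returns bytes, as in the examples on this page.

```python
from pathlib import Path
from typing import List


def download_processed_files(dataset, output_dir: str) -> List[Path]:
    """Download every processed file in the dataset into `output_dir`.

    Hypothetical helper. Assumes `file.name` is a string and
    `file.download()` returns bytes. Returns the paths that were written.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    written = []
    for file in dataset.get_processed_files():
        # Keep only the final path component of the reported name
        target = out / Path(file.name).name
        target.write_bytes(file.download())
        written.append(target)
    return written
```

If two files in the dataset share a name, the later one overwrites the earlier one in this sketch.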

To download a specific file that you look up by its name:

file = list(filter(lambda x: x.name == '<file to download>', dataset.files))[0]
file_bytes = file.download()
with open('<file name>', 'wb') as f:
    f.write(file_bytes)
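The lookup above raises an IndexError when no file in the dataset has the requested name. The following is a minimal sketch of a more forgiving lookup. The helper names find_file_by_name and download_file_by_name are hypothetical, and the sketch relies only on the files property, the name attribute, and the download function used above.

```python
def find_file_by_name(dataset, file_name: str):
    """Return the first file in `dataset.files` with a matching name, or None.

    Hypothetical helper, not part of the SDK.
    """
    return next((f for f in dataset.files if f.name == file_name), None)


def download_file_by_name(dataset, file_name: str, destination: str) -> bool:
    """Write the named file to `destination`. Return False if it is not found."""
    match = find_file_by_name(dataset, file_name)
    if match is None:
        return False
    with open(destination, "wb") as f:
        f.write(match.download())
    return True
```

A False return value lets the caller decide whether a missing file is an error or whether the file has simply not been uploaded yet.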