Redact API

TonicTextual Redact Class

Dataset Class

class tonic_textual.classes.dataset.Dataset(id: str, name: str, files: List[Dict[str, Any]], client: HttpClient)

Class to represent and provide access to a Tonic Textual dataset.

Parameters:
  • id (str) – Dataset identifier.

  • name (str) – Dataset name.

  • files (List[Dict[str, Any]]) – Serialized DatasetFile objects representing the files in the dataset.

  • client (HttpClient) – The HTTP client to use.

describe() → str

Returns a string of the dataset name, identifier, and the list of files.

Examples

>>> dataset.describe()
Dataset: your_dataset_name [dataset_id]
Number of Files: 2
Number of Rows: 1000

fetch_all_df() → pd.DataFrame

Fetches all of the data in the dataset as a pandas dataframe.

Returns:

Dataset data in a pandas dataframe.

Return type:

pd.DataFrame

fetch_all_json() → str

Fetches all of the data in the dataset as JSON.

Returns:

Dataset data in JSON format.

Return type:

str
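
Because fetch_all_json() returns a str, the result usually has to be decoded before it can be used. The following is a minimal sketch that assumes only that the string is valid JSON. The shape of the decoded payload is not specified in this reference, and load_dataset_json is a hypothetical helper, not part of the library.

```python
import json
from typing import Any


def load_dataset_json(dataset: Any) -> Any:
    """Decode the JSON string returned by Dataset.fetch_all_json().

    Hypothetical helper: `dataset` is expected to be a Dataset instance.
    No assumption is made about the structure of the decoded value.
    """
    raw = dataset.fetch_all_json()  # documented to return a str
    return json.loads(raw)
```

Inspect the decoded value (for example with type() or len()) before relying on a particular layout.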

get_failed_files() → List[DatasetFile]

Gets all of the files in the dataset that encountered an error when they were processed. These files are effectively ignored.

Returns:

The list of files that had processing errors.

Return type:

List[DatasetFile]

get_processed_files() → List[DatasetFile]

Gets all of the files in the dataset for which processing is complete. The data in these files is returned when data is requested.

Returns:

The list of processed dataset files.

Return type:

List[DatasetFile]

get_queued_files() → List[DatasetFile]

Gets all of the files in the dataset that are waiting to be processed.

Returns:

The list of dataset files that await processing.

Return type:

List[DatasetFile]

get_running_files() → List[DatasetFile]

Gets all of the files in the dataset that are currently being processed.

Returns:

The list of files that are being processed.

Return type:

List[DatasetFile]
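
The four file getters above can be combined to report how far processing has progressed. This is a sketch, not a library feature: processing_summary and is_fully_processed are hypothetical helpers that call only the methods documented above on whatever Dataset object is passed in.

```python
from typing import Any, Dict


def processing_summary(dataset: Any) -> Dict[str, int]:
    """Count the files in each processing state (hypothetical helper)."""
    return {
        "processed": len(dataset.get_processed_files()),
        "failed": len(dataset.get_failed_files()),
        "queued": len(dataset.get_queued_files()),
        "running": len(dataset.get_running_files()),
    }


def is_fully_processed(dataset: Any) -> bool:
    """Return True when no file is still queued or running."""
    summary = processing_summary(dataset)
    return summary["queued"] == 0 and summary["running"] == 0
```

Note that a dataset can be fully processed and still contain failed files. Because failed files are ignored when data is fetched, it may be worth checking the "failed" count before calling fetch_all_df() or fetch_all_json().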

reset()

upload_then_add_file(file_path: str, file_name: str | None = None)

Uploads a file to the dataset.

Parameters:
  • file_path (str) – The absolute path of the file to upload.

  • file_name (str | None) – The name to save the file under in Tonic Textual. Optional; defaults to None.

Raises:

DatasetFileMatchesExistingFile – Raised if the file content matches an existing file.
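
A caller that uploads files in bulk may want to treat a duplicate as a skip instead of a failure. The sketch below is one way to do that. upload_unless_duplicate is a hypothetical helper, and because this reference does not state which module exports DatasetFileMatchesExistingFile, the exception class is passed in as the duplicate_error argument instead of being imported.

```python
from pathlib import Path
from typing import Any, Optional, Type


def upload_unless_duplicate(
    dataset: Any,
    path: str,
    duplicate_error: Type[BaseException],
    file_name: Optional[str] = None,
) -> bool:
    """Upload a file, returning False if its content is already in the dataset.

    Hypothetical helper. `duplicate_error` should be the
    DatasetFileMatchesExistingFile exception class.
    """
    # upload_then_add_file expects an absolute path, so resolve it first.
    absolute_path = str(Path(path).resolve())
    try:
        dataset.upload_then_add_file(absolute_path, file_name)
    except duplicate_error:
        return False
    return True
```

Any other exception raised during the upload is left to propagate to the caller.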

DatasetFile Class

class tonic_textual.classes.datasetfile.DatasetFile(id: str, name: str, num_rows: int | None, num_columns: int, processing_status: str, processing_error: str | None)

Class to store the metadata for a dataset file.

Parameters:
  • id (str) – The identifier of the dataset file.

  • name (str) – The file name of the dataset file.

  • num_rows (int | None) – The number of rows in the dataset file.

  • num_columns (int) – The number of columns in the dataset file.

  • processing_status (str) – The status of the dataset file in the processing pipeline. Possible values are ‘Completed’, ‘Failed’, ‘Cancelled’, ‘Running’, and ‘Queued’.

  • processing_error (str | None) – If the dataset file processing failed, a description of the issue that caused the failure.

  • uploaded_timestamp (str) – The UTC timestamp of when the dataset file was uploaded to the dataset.

describe() → str

Returns the dataset file metadata as a string. Includes the identifier, file name, number of rows, and number of columns.

Redaction Response

class tonic_textual.classes.redact_api_responses.redaction_response.RedactionResponse(original_text: str, redacted_text: str, usage: int, de_identify_results: List[Replacement])

Redaction response object.

Variables:
  • original_text (str) – The original text.

  • redacted_text (str) – The redacted or synthesized text.

  • usage (int) – The number of words used.

  • de_identify_results (List[SingleDetectionResult]) – The list of named entities found in original_text.

class tonic_textual.classes.common_api_responses.single_detection_result.SingleDetectionResult(start: int, end: int, label: str, text: str, score: float, json_path: str | None = None)

A span of text that has been detected as a named entity.

Variables:
  • start (int) – The start index of the entity in the original text.

  • end (int) – The end index of the entity in the original text. The end index is exclusive.

  • label (str) – The label of the entity.

  • text (str) – The substring of the original text that was detected as an entity.

  • score (float) – The confidence score of the detection.

  • json_path (Optional[str]) – The JSON path of the entity in the original JSON document. This is only present if the input text was a JSON document.
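
Because end is exclusive, start and end can be used directly as a Python slice. The sketch below illustrates this with a hypothetical stand-in class, DetectionSpan, that mirrors the fields listed above. The sample text, label, and score are made up for illustration, and the example assumes that the indices are character offsets into the original string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionSpan:
    """Hypothetical stand-in with the same fields as SingleDetectionResult."""

    start: int
    end: int
    label: str
    text: str
    score: float
    json_path: Optional[str] = None


# Illustrative values only; not output from the Textual service.
original_text = "My name is Adam."
span = DetectionSpan(start=11, end=15, label="NAME_GIVEN", text="Adam", score=0.9)

# With an exclusive end index, a plain slice recovers the entity text.
recovered = original_text[span.start:span.end]
print(recovered == span.text)
```

The same slice works for each item in de_identify_results when paired with the original_text of the response.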