Redact API

TonicTextual Redact Class

Dataset Class

class tonic_textual.classes.dataset.Dataset(id: str, name: str, files: List[Dict[str, Any]], client: HttpClient)

Class to represent and provide access to a Tonic Textual dataset.

Parameters:

id (str) – Dataset identifier.
name (str) – Dataset name.
files (List[Dict[str, Any]]) – Serialized DatasetFile objects representing the files in a dataset.
client (HttpClient) – The HTTP client to use.

describe() → str

Returns a string that contains the dataset name, identifier, and the list of files.

Examples

>>> dataset.describe()
Dataset: your_dataset_name [dataset_id]
Number of Files: 2
Number of Rows: 1000

fetch_all_df()

Fetches all of the data in the dataset as a pandas dataframe.

Returns: Dataset data in a pandas dataframe.
Return type: pd.DataFrame

fetch_all_json() → str

Fetches all of the data in the dataset as JSON.

Returns: Dataset data in JSON format.
Return type: str

get_failed_files() → List[DatasetFile]

Gets all of the files in the dataset that encountered an error when they were processed. These files are effectively ignored.

Returns: The list of files that had processing errors.
Return type: List[DatasetFile]

get_processed_files() → List[DatasetFile]

Gets all of the files in the dataset for which processing is complete. The data in these files is returned when data is requested.

Returns: The list of processed dataset files.
Return type: List[DatasetFile]

get_queued_files() → List[DatasetFile]

Gets all of the files in the dataset that are waiting to be processed.

Returns: The list of dataset files that await processing.
Return type: List[DatasetFile]

get_running_files() → List[DatasetFile]

Gets all of the files in the dataset that are currently being processed.

Returns: The list of files that are being processed.
Return type: List[DatasetFile]

reset()

upload_then_add_file(file_path: str, file_name: str | None = None)

Uploads a file to the dataset.

Parameters:

file_path (str) – The absolute path of the file to upload.
file_name (str | None) – Optional. The name of the file to save to Tonic Textual.

Raises:

DatasetFileMatchesExistingFile – Raised if the file content matches an existing file.
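One way to handle the duplicate-content error is to wrap the upload in a small helper. This is a sketch under stated assumptions: upload_if_new and _StubDataset are hypothetical, and the exception class defined here is a local stand-in for the SDK exception of the same name (its import path is not given in this reference). The stub detects duplicates by path only, which is a simplification of the content match described above.

```python
import os
from typing import Any, Type


def upload_if_new(dataset: Any, path: str, duplicate_error: Type[Exception]) -> bool:
    """Upload a file; return False if the dataset reports it as a duplicate."""
    absolute_path = os.path.abspath(path)  # file_path is documented as absolute
    try:
        dataset.upload_then_add_file(absolute_path, os.path.basename(absolute_path))
    except duplicate_error:
        return False
    return True


class DatasetFileMatchesExistingFile(Exception):
    """Local stand-in for the SDK exception of the same name."""


class _StubDataset:
    """Hypothetical stand-in for Dataset; compares paths, not file content."""

    def __init__(self) -> None:
        self.uploaded = set()

    def upload_then_add_file(self, file_path, file_name=None):
        if file_path in self.uploaded:
            raise DatasetFileMatchesExistingFile(file_path)
        self.uploaded.add(file_path)


stub = _StubDataset()
first = upload_if_new(stub, "report.txt", DatasetFileMatchesExistingFile)
second = upload_if_new(stub, "report.txt", DatasetFileMatchesExistingFile)
print(first, second)
```

Passing the exception class as an argument keeps the helper independent of where the SDK defines it.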

DatasetFile Class

class tonic_textual.classes.datasetfile.DatasetFile(id: str, name: str, num_rows: int | None, num_columns: int, processing_status: str, processing_error: str | None)

Class to store the metadata for a dataset file.

Parameters:

id (str) – The identifier of the dataset file.
name (str) – The file name of the dataset file.
num_rows (int | None) – The number of rows in the dataset file.
num_columns (int) – The number of columns in the dataset file.
processing_status (str) – The status of the dataset file in the processing pipeline. Possible values are ‘Completed’, ‘Failed’, ‘Cancelled’, ‘Running’, and ‘Queued’.
processing_error (str | None) – If the dataset file processing failed, a description of the issue that caused the failure.
uploaded_timestamp (str) – Timestamp in UTC when the dataset file was uploaded to the dataset.

describe() → str

Returns the dataset file metadata as a string. Includes the identifier, file name, number of rows, and number of columns.
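The processing_status values listed above can be used to bucket files on the client side. The sketch below is illustrative only: FileInfo is a hypothetical stand-in that mirrors a few DatasetFile fields, group_by_status is not a library function, and the file names and error text are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileInfo:
    """Hypothetical stand-in mirroring a few DatasetFile fields."""
    id: str
    name: str
    processing_status: str
    processing_error: Optional[str] = None


def group_by_status(files: List[FileInfo]) -> Dict[str, List[str]]:
    """Map each processing_status value to the names of the files that have it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for f in files:
        groups[f.processing_status].append(f.name)
    return dict(groups)


files = [
    FileInfo("1", "a.csv", "Completed"),
    FileInfo("2", "b.csv", "Failed", "placeholder error text"),
    FileInfo("3", "c.csv", "Queued"),
    FileInfo("4", "d.csv", "Completed"),
]
groups = group_by_status(files)
print(groups)
```

The Dataset getters (get_processed_files, get_failed_files, get_queued_files, get_running_files) presumably apply a similar filter, although this reference does not state how they are implemented.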

Redaction Response

class tonic_textual.classes.redact_api_responses.redaction_response.RedactionResponse(original_text: str, redacted_text: str, usage: int, de_identify_results: List[Replacement])

Redaction response object.

Variables:

original_text (str) – The original text.
redacted_text (str) – The redacted or synthesized text.
usage (int) – The number of words used.
de_identify_results (List[SingleDetectionResult]) – The list of named entities found in original_text.

class tonic_textual.classes.common_api_responses.single_detection_result.SingleDetectionResult(start: int, end: int, label: str, text: str, score: float, json_path: str | None = None)

A span of text that has been detected as a named entity.

Variables:

start (int) – The start index of the entity in the original text.
end (int) – The end index of the entity in the original text. The end index is exclusive.
label (str) – The label of the entity.
text (str) – The substring of the original text that was detected as an entity.
score (float) – The confidence score of the detection.
json_path (Optional[str]) – The JSON path of the entity in the original JSON document. This is only present if the input text was a JSON document.
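Because the end index is exclusive, original_text[start:end] should equal the detected text exactly. The sketch below illustrates that index convention. It is not the library's redaction logic: Span is a hypothetical stand-in with the same fields as SingleDetectionResult (minus json_path), mask_spans is a made-up helper that assumes non-overlapping spans, and the labels and scores are placeholder values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for SingleDetectionResult (json_path omitted)."""
    start: int
    end: int
    label: str
    text: str
    score: float


def mask_spans(original_text: str, spans: List[Span]) -> str:
    """Replace each span with [LABEL], right to left so earlier indices stay valid."""
    result = original_text
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        # Exclusive end index: the slice [start:end] is exactly the entity text.
        if original_text[span.start:span.end] != span.text:
            raise ValueError(f"span {span.start}:{span.end} does not match {span.text!r}")
        result = result[:span.start] + f"[{span.label}]" + result[span.end:]
    return result


original = "My name is Adam and I live in Atlanta."
spans = [
    Span(start=11, end=15, label="NAME_GIVEN", text="Adam", score=0.9),
    Span(start=30, end=37, label="LOCATION_CITY", text="Atlanta", score=0.9),
]
masked = mask_spans(original, spans)
print(masked)
```

Processing the spans from right to left means that a replacement never shifts the indices of a span that has not been handled yet.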

Overview