Redact API
TonicTextual Redact Class
Dataset Class
- class tonic_textual.classes.dataset.Dataset(id: str, name: str, files: List[Dict[str, Any]], client: HttpClient)
Class to represent and provide access to a Tonic Textual dataset.
- Parameters:
id (str) – Dataset identifier.
name (str) – Dataset name.
files (List[Dict[str, Any]]) – Serialized DatasetFile objects representing the files in the dataset.
client (HttpClient) – The HTTP client to use.
- describe() → str
Returns a string of the dataset name, identifier, and the list of files.
Examples
>>> dataset.describe()
Dataset: your_dataset_name [dataset_id]
Number of Files: 2
Number of Rows: 1000
- fetch_all_df()
Fetches all of the data in the dataset as a pandas DataFrame.
- Returns:
Dataset data in a pandas DataFrame.
- Return type:
pd.DataFrame
- fetch_all_json() → str
Fetches all of the data in the dataset as JSON.
- Returns:
Dataset data in JSON format.
- Return type:
str
- get_failed_files() → List[DatasetFile]
Gets all of the files in the dataset that encountered an error during processing. These files are effectively ignored.
- Returns:
The list of files that had processing errors.
- Return type:
List[DatasetFile]
- get_processed_files() → List[DatasetFile]
Gets all of the files in the dataset for which processing is complete. The data in these files is returned when data is requested.
- Returns:
The list of processed dataset files.
- Return type:
List[DatasetFile]
- get_queued_files() → List[DatasetFile]
Gets all of the files in the dataset that are waiting to be processed.
- Returns:
The list of dataset files that await processing.
- Return type:
List[DatasetFile]
- get_running_files() → List[DatasetFile]
Gets all of the files in the dataset that are currently being processed.
- Returns:
The list of files that are being processed.
- Return type:
List[DatasetFile]
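Together, the four getters above show where a dataset stands in the processing pipeline. The following is a minimal sketch, not part of the library: `summarize_processing` is a hypothetical helper that accepts any object exposing these four methods and reads only the `name` and `processing_error` attributes documented on DatasetFile. The stand-in objects in the demo are for illustration only.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_processing(dataset: Any) -> Dict[str, Any]:
    """Hypothetical helper: count files per processing state and collect failures.

    Assumes `dataset` exposes get_queued_files(), get_running_files(),
    get_processed_files(), and get_failed_files(), each returning a list
    of DatasetFile objects.
    """
    failed = dataset.get_failed_files()
    return {
        "queued": len(dataset.get_queued_files()),
        "running": len(dataset.get_running_files()),
        "processed": len(dataset.get_processed_files()),
        "failed": len(failed),
        # Failed files are effectively ignored, so surface why they failed.
        "errors": {f.name: f.processing_error for f in failed},
    }


if __name__ == "__main__":
    # Stand-in objects for illustration only; a real Dataset comes from the client.
    ok = SimpleNamespace(name="a.txt", processing_error=None)
    bad = SimpleNamespace(name="b.pdf", processing_error="example error")
    fake = SimpleNamespace(
        get_queued_files=lambda: [],
        get_running_files=lambda: [],
        get_processed_files=lambda: [ok],
        get_failed_files=lambda: [bad],
    )
    print(summarize_processing(fake))
```

Because the helper only calls the documented getters, it works the same whether it receives a real Dataset or a test double.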
- upload_then_add_file(file_path: str, file_name: str | None = None)
Uploads a file to the dataset.
- Parameters:
file_path (str) – The absolute path of the file to upload.
file_name (str | None) – Optional. The name under which to save the file in Tonic Textual.
- Raises:
DatasetFileMatchesExistingFile – Raised if the file content matches an existing file.
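When uploading several files, a duplicate should usually be skipped rather than abort the whole batch. The sketch below is a hypothetical helper, not a library function. It passes the DatasetFileMatchesExistingFile class in as `duplicate_error` because this page does not document that exception's import path, and it converts each path with `os.path.abspath` because `upload_then_add_file` expects an absolute path. Saving each file under its base name is an illustrative choice.

```python
import os
from typing import Any, Iterable, List, Tuple, Type


def upload_all(
    dataset: Any,
    paths: Iterable[str],
    duplicate_error: Type[Exception],
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: upload each file, skipping content that already exists.

    `duplicate_error` should be the library's DatasetFileMatchesExistingFile
    class; it is a parameter because its import path is not documented here.
    Returns (uploaded_paths, skipped_paths).
    """
    uploaded: List[str] = []
    skipped: List[str] = []
    for path in paths:
        # upload_then_add_file expects an absolute path.
        abs_path = os.path.abspath(path)
        try:
            dataset.upload_then_add_file(abs_path, file_name=os.path.basename(abs_path))
            uploaded.append(abs_path)
        except duplicate_error:
            # Same content is already in the dataset; keep going.
            skipped.append(abs_path)
    return uploaded, skipped
```

Any other exception still propagates, so only the documented duplicate case is silently skipped.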
DatasetFile Class
- class tonic_textual.classes.datasetfile.DatasetFile(id: str, name: str, num_rows: int | None, num_columns: int, processing_status: str, processing_error: str | None)
Class to store the metadata for a dataset file.
- Parameters:
id (str) – The identifier of the dataset file.
name (str) – The file name of the dataset file.
num_rows (int | None) – The number of rows in the dataset file.
num_columns (int) – The number of columns in the dataset file.
processing_status (str) – The status of the dataset file in the processing pipeline. Possible values are ‘Completed’, ‘Failed’, ‘Cancelled’, ‘Running’, and ‘Queued’.
processing_error (str | None) – If processing of the dataset file failed, a description of the issue that caused the failure.
uploaded_timestamp (str) – The UTC timestamp of when the dataset file was uploaded to the dataset.
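The five documented processing_status values split into in-progress states and final states. The hypothetical helper below assumes that ‘Completed’, ‘Failed’, and ‘Cancelled’ are final and that ‘Running’ and ‘Queued’ are not. It compares the status strings exactly as documented, including capitalization, and rejects anything else so that an unexpected value is noticed rather than misclassified.

```python
from typing import FrozenSet

# Assumption: files in these statuses will not progress further on their own.
TERMINAL_STATUSES: FrozenSet[str] = frozenset({"Completed", "Failed", "Cancelled"})
IN_PROGRESS_STATUSES: FrozenSet[str] = frozenset({"Running", "Queued"})


def is_finished(processing_status: str) -> bool:
    """Hypothetical helper: return True if a processing_status value is final."""
    if processing_status in TERMINAL_STATUSES:
        return True
    if processing_status in IN_PROGRESS_STATUSES:
        return False
    # Fail loudly on values outside the five documented ones.
    raise ValueError(f"Unexpected processing_status: {processing_status!r}")
```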
Redaction Response
- class tonic_textual.classes.redact_api_responses.redaction_response.RedactionResponse(original_text: str, redacted_text: str, usage: int, de_identify_results: List[Replacement])
Redaction response object.
- Variables:
original_text (str) – The original text.
redacted_text (str) – The redacted or synthesized text.
usage (int) – The number of words used.
de_identify_results (List[SingleDetectionResult]) – The list of named entities found in original_text.
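A common first step with a RedactionResponse is to see which kinds of entities were found. The sketch below is a hypothetical helper that reads only the `label` attribute of each item in `de_identify_results`. The demo builds stand-in objects with made-up labels, which are not necessarily the label names that Tonic Textual uses.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Dict


def count_entities_by_label(response: Any) -> Dict[str, int]:
    """Hypothetical helper: count detected entities per label in a RedactionResponse."""
    return dict(Counter(entity.label for entity in response.de_identify_results))


if __name__ == "__main__":
    # Stand-in response with made-up labels, for illustration only.
    fake = SimpleNamespace(
        de_identify_results=[
            SimpleNamespace(label="NAME"),
            SimpleNamespace(label="EMAIL"),
            SimpleNamespace(label="NAME"),
        ]
    )
    print(count_entities_by_label(fake))
```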
- class tonic_textual.classes.common_api_responses.single_detection_result.SingleDetectionResult(start: int, end: int, label: str, text: str, score: float, json_path: str | None = None)
A span of text that has been detected as a named entity.
- Variables:
start (int) – The start index of the entity in the original text.
end (int) – The end index of the entity in the original text. The end index is exclusive.
label (str) – The label of the entity.
text (str) – The substring of the original text that was detected as an entity.
score (float) – The confidence score of the detection.
json_path (Optional[str]) – The JSON path of the entity in the original JSON document. This is only present if the input text was a JSON document.
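Because `end` is exclusive, `original_text[start:end]` is exactly the detected `text`. The sketch below illustrates those index semantics by replacing each span with its label. It is not how Tonic Textual produces `redacted_text`; it is a plain left-to-right span-replacement technique applied to plain-text input only (the JSON case with `json_path` is not handled). `Detection` is a hypothetical stand-in with the same fields as SingleDetectionResult, and the sketch assumes that spans do not overlap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical stand-in with the same fields as SingleDetectionResult."""
    start: int
    end: int
    label: str
    text: str
    score: float
    json_path: Optional[str] = None


def mask_entities(original_text: str, detections: List[Detection]) -> str:
    """Replace each detected span with "[LABEL]".

    Assumes non-overlapping spans. Because `end` is exclusive,
    original_text[start:end] is exactly the detected substring.
    """
    pieces: List[str] = []
    cursor = 0
    for d in sorted(detections, key=lambda x: x.start):
        if d.start < cursor:
            raise ValueError(f"Overlapping span at index {d.start}")
        # Sanity check: the reported text should match the slice.
        if original_text[d.start:d.end] != d.text:
            raise ValueError(f"Span {d.start}:{d.end} does not match {d.text!r}")
        pieces.append(original_text[cursor:d.start])  # untouched text before the span
        pieces.append(f"[{d.label}]")
        cursor = d.end  # exclusive end: resume right after the entity
    pieces.append(original_text[cursor:])  # trailing text after the last span
    return "".join(pieces)
```

Sorting by `start` lets the function build the output in a single pass without shifting indices. Validating each slice against `text` catches off-by-one mistakes, such as treating `end` as inclusive.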