The Extraction and Imputation Module of the T-Rx package transforms raw prescription records into clean, ready-to-analyse data frames.
A standard extraction and imputation pipeline consists of the following functions:
Figure 1: Overview of the Extraction & Imputation Module
Extraction
The algorithm uses regular expression (regex) patterns to extract strength / dosage and quantity information from prescription records.
In this module, users are required to specify:
Using this information, the numbers preceding the user-specified dosage units and dosage forms are extracted as dose and quantity, respectively.
Figure 2: Regular expression pattern matching in the Extraction Module
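The matching step described above can be sketched in a few lines of Python. This is only an illustration of the general technique and not the T-Rx implementation: the function name `extract_number_before`, the example record, and the unit and form lists are all hypothetical.

```python
import re


def extract_number_before(text, terms):
    """Return the first number directly preceding any of `terms`, or None.

    Hypothetical helper for illustration; it is not part of the T-Rx API.
    """
    # Try longer terms first so that "mcg" is attempted before "mg" or "g".
    alternatives = "|".join(
        re.escape(term) for term in sorted(terms, key=len, reverse=True)
    )
    # A number (optionally with decimals), optional whitespace, then a term.
    pattern = re.compile(
        rf"(\d+(?:\.\d+)?)\s*(?:{alternatives})\b", re.IGNORECASE
    )
    match = pattern.search(text)
    return float(match.group(1)) if match else None


# Made-up record; the unit and form lists stand in for the user-specified ones.
record = "Amoxicillin 500mg capsules, 21 capsules"
dose = extract_number_before(record, ["mg", "mcg", "g"])           # 500.0
quantity = extract_number_before(record, ["tablets", "capsules"])  # 21.0
```

A record with no matching unit or form returns `None`, which leaves the value missing so that a later imputation step can handle it.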
Imputation
Missing values (for example, dosage and quantity) are common in prescription records.
In addition, pharmaceutical products (especially combination products) are often coded by brand name rather than by active ingredient, occasionally without dosage information.
Figure 3: Overview of the Imputation Module (using dose as an example)
The imputation functions infer dosage / quantity information from cleaned prescription records (either the outputs of the extraction functions or records supplied independently), assigning the most commonly occurring dosage / quantity to records where it is missing.
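As a rough illustration of assigning the most commonly occurring value, the sketch below fills missing doses with the most frequent dose observed for the same drug. It is not the T-Rx implementation. Grouping by drug name is an assumption made for this example, and the function `impute_most_common`, the field names, and the records are hypothetical.

```python
from collections import Counter, defaultdict


def impute_most_common(records, key, field):
    """Fill missing `field` values with the most common value per `key`.

    Hypothetical helper for illustration; it is not part of the T-Rx API.
    Grouping by `key` (here, the drug name) is an assumption of this sketch.
    """
    # Count the observed (non-missing) values within each group.
    counts = defaultdict(Counter)
    for rec in records:
        if rec[field] is not None:
            counts[rec[key]][rec[field]] += 1

    imputed = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records are left unchanged
        group_counts = counts[new_rec[key]]
        # Impute only when the value is missing and the group has observations.
        if new_rec[field] is None and group_counts:
            new_rec[field] = group_counts.most_common(1)[0][0]
        imputed.append(new_rec)
    return imputed


# Made-up records for illustration only.
records = [
    {"drug": "metformin", "dose": 500.0},
    {"drug": "metformin", "dose": 500.0},
    {"drug": "metformin", "dose": 850.0},
    {"drug": "metformin", "dose": None},
    {"drug": "brandx", "dose": None},
]
result = impute_most_common(records, key="drug", field="dose")
```

Here the missing metformin dose becomes 500.0, while the `brandx` record stays missing because no dose was ever observed for it. When two values are equally common, `Counter.most_common` returns the one encountered first.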
Please post questions as an issue on the T-Rx GitHub repo here.
The T-Rx package is currently under beta testing. Most functions should have adequate documentation on possible errors.
Please reach out to Chris Lo (chris.lowh@kcl.ac.uk) with feedback on the documentation.