Overview

Background

PINES (Progressive Inference Networked Episodic Service) is a natural language processing (NLP) package for detecting clinical events in the electronic health record (EHR). It includes specialized functions and a dedicated application programming interface (API) so that it can run as a service integrated with a CEDARS instance, although it can also be used as a standalone tool. PINES is an open-source Python package released under the GPL-3 license. The latest version and prior versions can be cloned from GitHub. Full documentation is available here. Please see the Terms of Use before using this software. PINES is provided as-is, with no guarantee whatsoever, and users agree to be held responsible for compliance with their local government and institutional regulations.

General Requirements

Local installation

  • Python 3.9 or later
  • Poetry

Docker installation

  • Docker

Installation

Local

To install the package locally, run the following commands:

git clone https://github.com/CEDARS-NLP/PINES.git
cd PINES
poetry install # this will install all required packages
poetry run python pines.py # this will run the package

Docker

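git clone https://github.com/CEDARS-NLP/PINES.git
cd PINES # docker build must run from the repository root
docker build -t pines-api . # build the image and tag it pines-api
docker run -dp 127.0.0.1:8036:8036 pines-api # run detached, exposing the API on localhost port 8036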
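`docker run -d` returns as soon as the container has started, so the service may not yet be accepting requests. A minimal sketch, assuming only the 127.0.0.1:8036 port mapping above, that polls the port until it accepts TCP connections. The README documents no health-check endpoint, so an open port does not prove the model has finished loading; `wait_for_port` is a hypothetical helper, not part of PINES.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8036, timeout: float = 60.0) -> bool:
    """Poll host:port until it accepts a TCP connection or `timeout` seconds elapse.

    Hypothetical helper: it only confirms that something is listening on the
    port, not that the PINES model is ready to serve predictions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # back off briefly before retrying
    return False
```

For example, call `wait_for_port()` right after `docker run` and start sending requests only once it returns `True`.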

Basic Concepts

Input: Clinical Note

Output: Label, Score

We fine-tuned the Clinical-Longformer model [1] on our dataset. Clinical-Longformer was initialized from the Longformer checkpoint and further pre-trained on the MIMIC-III dataset. After fine-tuning, the model is used to predict whether a label is present in a new clinical note. Along with the predicted label, the model outputs a score that reflects its confidence in the prediction.
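The README does not specify how the score is computed. The `LABEL_1`/score output shown under Sample Code resembles the Hugging Face `transformers` text-classification pipeline, which reports the softmax probability of the highest-scoring class. A minimal sketch of that step, assuming a classification head whose raw logits are already available; `logits_to_prediction` is a hypothetical helper, and PINES may compute its score differently.

```python
import math


def logits_to_prediction(logits: list[float]) -> dict:
    """Turn raw classifier logits into a {"label", "score"} pair.

    Assumption: score = softmax probability of the top class, as in the
    Hugging Face text-classification pipeline.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (first one wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": f"LABEL_{best}", "score": probs[best]}
```

For two logits `[-2.0, 3.0]`, this returns `LABEL_1` with a score of about 0.993.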

Note: The trained models are not open-source and are not included in the repository. Please email the authors for access to the trained models.

Model Card

  • VTE Detection Model

    Property         Value
    Model Name       vte-longformer-4k-cedars
    Model Version    1.0
    Model Type       Longformer
    Context Length   4096
    Training Data    Internal MSKCC dataset

  • Metastatic Disease Detection Model

    Property         Value
    Model Name       mets-longformer-4k-pycedars
    Model Version    1.0
    Model Type       Longformer
    Context Length   4096
    Training Data    Internal MSKCC dataset
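The VTE model card lists a context length of 4096 tokens, and the README does not say what PINES does with notes longer than that. One common approach, shown here purely as an illustrative sketch, is to split the token sequence into overlapping windows that each fit the context. The `stride` overlap and the helper name `chunk_tokens` are hypothetical, and real token IDs would come from the model's tokenizer rather than a plain list.

```python
def chunk_tokens(token_ids: list[int], max_len: int = 4096, stride: int = 512) -> list[list[int]]:
    """Split token IDs into windows of at most `max_len`, overlapping by `stride` tokens.

    Hypothetical helper for illustration; PINES may instead truncate long notes.
    """
    if max_len <= stride:
        raise ValueError("max_len must be greater than stride")
    if len(token_ids) <= max_len:
        return [token_ids]  # the whole note fits in one window
    step = max_len - stride  # how far each new window advances
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the note
    return chunks
```

A 10,000-token note yields three windows of 4096, 4096 and 2832 tokens, with consecutive windows sharing 512 tokens.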

Operational Schema

[Figure: PINES Operational Schema]

PINES can be run as a standalone service or as part of a CEDARS deployment. The standalone service can be run as a Docker container or as a local installation.

In all deployments, the service can be accessed via a REST API.

Sample Code

The examples below detect metastatic disease in a clinical note.

Using HTTPie

http POST http://localhost:8036/predict text="The patient had metastates."

Using curl

curl -X POST "http://localhost:8036/predict" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{\"text\":\"The patient had metastates.\"}"
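
Using Python

The same request can be sent from Python with only the standard library. A minimal sketch, assuming the service is running on localhost:8036 and accepts the JSON body shown in the curl example; `build_predict_request` and `predict` are hypothetical helper names, not part of the PINES package.

```python
import json
import urllib.request

PINES_URL = "http://localhost:8036/predict"  # host and port from the Docker example


def build_predict_request(text: str, url: str = PINES_URL) -> urllib.request.Request:
    """Build a POST request with the same JSON body and headers as the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, url: str = PINES_URL, timeout: float = 120.0) -> dict:
    """Send the note to a running PINES service and return the decoded JSON response."""
    with urllib.request.urlopen(build_predict_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `predict("The patient had metastases.")` returns the parsed response as a dictionary.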

Output

{
    "model": "mets-longformer-4k-pycedars",
    "prediction": {
        "label": "LABEL_1",
        "score": 0.9969003200531006
    }
}
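
The response nests the label and score under `prediction`. A minimal sketch of reading it, assuming (from the example above, not from a documented label map) that `LABEL_1` denotes a positive finding and that the caller applies a threshold to its score; `is_positive` and its 0.5 default are hypothetical.

```python
import json


def is_positive(response: dict, threshold: float = 0.5) -> bool:
    """Return True if the response reports LABEL_1 with a score at or above `threshold`.

    Assumption: LABEL_1 marks the positive class; confirm against the model's label map.
    """
    pred = response["prediction"]
    return pred["label"] == "LABEL_1" and pred["score"] >= threshold


# The sample response shown above, parsed from JSON.
sample = json.loads("""
{
    "model": "mets-longformer-4k-pycedars",
    "prediction": {"label": "LABEL_1", "score": 0.9969003200531006}
}
""")
print(sample["model"], is_positive(sample))  # prints: mets-longformer-4k-pycedars True
```

Raising the threshold trades sensitivity for specificity; a suitable value would need to be chosen against validated data.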

Future Development

We are currently documenting the performance of PINES, with a focus on hematology and oncology clinical research. Please contact the package author, Simon Mantha, MD, MPH (smantha@cedars.io), to discuss new features or the use of this software in your clinical research.

References


  1. Yikuan Li, Ramsey M Wehbe, Faraz S Ahmad, Hanyin Wang, and Yuan Luo. A comparative study of pretrained language models for long clinical text. Journal of the American Medical Informatics Association, 30(2):340–347, 2023.