Converters - EDS-NLP #446

2025-09-12T16:51:30Z

giscus[bot]
bot Sep 12, 2025

Converters - EDS-NLP

https://aphp.github.io/edsnlp/latest/data/converters/

lucas-sedran · 2025-09-12T16:51:31Z


Add columns in input and output

By default, when reading a dataframe (for example one saved as .parquet), only the note_datetime column is kept in addition to note_text, the column containing the text.
To keep other columns when loading the data, we can list them in doc_attributes:

import edsnlp

docs = edsnlp.data.read_parquet(DATA_PATH,
                                converter="omop",
                                doc_attributes=["note_id", "person_id", "note_type", "note_datetime"])

These columns are then available on each document (for example as doc._.note_id) in the pipelines that we apply to our data.
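To make the effect of doc_attributes concrete, here is a minimal plain-Python sketch of the selection described above. It is only an illustration and not EDS-NLP's actual implementation: row_to_doc and the SimpleNamespace objects are hypothetical stand-ins for the input converter and the resulting document, and the row values are made up.

```python
from types import SimpleNamespace


def row_to_doc(row, doc_attributes=("note_datetime",)):
    """Hypothetical stand-in for an input converter.

    Keeps the text column plus only the columns listed in doc_attributes,
    exposing them under `doc._` like document extensions.
    """
    kept = {name: row[name] for name in doc_attributes}
    return SimpleNamespace(text=row["note_text"], _=SimpleNamespace(**kept))


# One made-up row of the input dataframe
row = {
    "note_id": 1,
    "person_id": 42,
    "note_type": "CR",
    "note_datetime": "2025-09-12",
    "note_text": "Patient admitted.",
}

# Default: only note_datetime is carried over next to the text
default_doc = row_to_doc(row)

# Explicit list: the extra columns become available on the document
full_doc = row_to_doc(
    row,
    doc_attributes=["note_id", "person_id", "note_type", "note_datetime"],
)
```

With the default, default_doc._ only holds note_datetime, whereas full_doc._.person_id and the other listed columns can be read by later steps.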

If we then want to write some of these columns to the output, for example note_id and person_id, we have 2 options (assuming we apply the pipelines and then save the data as parquet):

Either we reuse doc_attributes in edsnlp.data.write_parquet():

edsnlp.data.write_parquet(docs,
                          SAVE_PATH,
                          doc_attributes=["note_id", "person_id"],
                          )

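Or we use a custom converter, for example a get_entities() function such as:

def get_entities(doc):
    # Build one row per entity, carrying the document-level columns along
    entities = []

    for ent in doc.ents:
        d = dict(note_id=ent.doc._.note_id,
                 person_id=ent.doc._.person_id,
                 label=ent.label_,
                 sentence=ent.sent.text  # needs sentence boundaries to be set
                )
        entities.append(d)

    return entities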

Then:

edsnlp.data.write_parquet(docs,
                          SAVE_PATH,
                          converter=get_entities,
                          )
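Before running a full pipeline, a converter like this can be checked on stand-in objects to see the rows it would produce. The sketch below is only an assumption-laden illustration: make_fake_doc and the SimpleNamespace objects are hypothetical stand-ins for processed documents and entities (they are not part of EDS-NLP or spaCy), and the IDs, labels and sentences are made up.

```python
from types import SimpleNamespace


def get_entities(doc):
    """Return one row (dict) per entity, carrying the document-level IDs."""
    rows = []
    for ent in doc.ents:
        rows.append(
            dict(
                note_id=doc._.note_id,
                person_id=doc._.person_id,
                label=ent.label_,
                sentence=ent.sent.text,
            )
        )
    return rows


def make_fake_doc(note_id, person_id, ents):
    """Hypothetical stand-in for a processed document.

    `ents` is a list of (label, sentence) pairs.
    """
    return SimpleNamespace(
        _=SimpleNamespace(note_id=note_id, person_id=person_id),
        ents=[
            SimpleNamespace(label_=label, sent=SimpleNamespace(text=sentence))
            for label, sentence in ents
        ],
    )


fake_doc = make_fake_doc(
    note_id=1,
    person_id=42,
    ents=[
        ("covid", "Patient admitted for covid."),
        ("diabetes", "History of diabetes."),
    ],
)

# One dict per entity, each repeating the note_id and person_id of its document
rows = get_entities(fake_doc)
```

Each entity yields its own row, so a document without entities contributes no rows to the output.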
