Widespread adoption of electronic chart systems has led to the accumulation of an unprecedented amount of medical data, which provides a platform for conducting large-scale statistical studies and developing new point-of-care medical support systems. To make these electronic records more useful, they should be processed using NLP techniques for de-identification, extraction of medical information (e.g., dates, diseases, medications), and normalization. To assist research that uses electronic healthcare data, we have created and distributed a database of hypothetical patient records for research and educational purposes.
Hypothetical Patient Narrative Corpus
This database provides a body of hypothetical patient narrative text, developed by the Project for the Establishment and Implementation of an Educational Electronic Chart System and Database for the Training of Paramedical Staff Members, a project funded under the Strategic Networking Support Program for Enhancing Higher Education Initiative.
The patient narrative refers to free-text clinical documentation on the patient's conditions, written by a healthcare provider (e.g., physician, nurse).
Individuals interested in this corpus are invited to visit the website of the non-profit organization Gengo-Shigen-Kyokai (GSK) for more details.
|Description||This database provides annotated patient narrative text developed by the Project for the Establishment and Implementation of an Educational Electronic Chart System and Database for the Training of Paramedical Staff Members. For more details, please refer to the User Manual.|
|Creator||Eiji Aramaki (Educational Electronic Chart Joint Use Council, Kyoto University Design School, the University of Tokyo Center for Knowledge Structuring)|
|Sample||Samples are available.|
MedNLP-2 Medical Chart Corpus
This corpus contains the Japanese-language electronic chart text distributed at the MedNLP-2 Shared Task Meeting.
Individuals who are interested in clinical NLP, who hope to take a look at the corpus, or who wish to investigate whether their electronic system is compatible with others are encouraged to apply for a copy of the corpus.
For more information, please visit the NTCIR website.
|Description||This electronic corpus provides a basis for testing and evaluating clinical NLP techniques. The recent rapid expansion of electronic chart systems has made it possible to process medical information far more efficiently and extensively than when paper medical charts were dominant. Raw patient data should be processed using NLP techniques for de-identification and data extraction (e.g., diseases, medications), and efficient processing requires standardized disease codes, medication codes, and data reporting formats. This corpus contains annotations on diseases, times, and other factual data and helps evaluate the effectiveness of different Japanese-language data extraction methods.|
|Contents (data size)||82 documents (172KB)|
A 64-year-old male factory worker.
<t>Around August 2, 2025 (5 days before visit),</t> the patient began to experience <c icd="R104">abdominal pain</c>, <c icd="R630">anorexia</c>, <c icd="R11_">nausea</c> and <c icd="R11_">vomiting</c>.
The body trunk surface temperature was normal, <c icd="R579">and the patient presented with shock</c>.
No apparent <c icd="G839" modality="negation">motor paralysis</c> was observed.
On the next day, <c icd="R402">disturbance of consciousness</c> developed, <c icd="N289">and renal impairment</c> worsened. <t>At 18:10, August 9,</t> <c icd="I469">cardiopulmonary arrest</c> was noted.
<t>At 21:44, August 9,</t> <c icd="R99_">death was confirmed</c>.
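In the sample above, time expressions are wrapped in t tags, and other annotated spans are wrapped in c tags that carry an icd attribute and, in one case, a modality attribute. The following is a minimal sketch of how such lines could be read into structured records. It assumes the tags are never nested and that attributes are always double-quoted, as in the sample; the function name extract_annotations is hypothetical and is not part of the corpus distribution.

```python
import re

# Matches <t>...</t> and <c ...>...</c> spans as they appear in the sample.
# Assumption: tags are not nested and attribute values are double-quoted.
TAG_RE = re.compile(
    r'<(?P<tag>[ct])\b(?P<attrs>[^>]*)>(?P<text>.*?)</(?P=tag)>', re.S
)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_annotations(line):
    """Return one dict per annotated span found in a line of corpus text."""
    spans = []
    for match in TAG_RE.finditer(line):
        attrs = dict(ATTR_RE.findall(match.group("attrs")))
        spans.append({
            "tag": match.group("tag"),            # "t" or "c", kept as written
            "text": match.group("text").strip(),  # annotated surface text
            "icd": attrs.get("icd"),              # None for <t> spans
            "modality": attrs.get("modality"),    # e.g. "negation", else None
        })
    return spans


sample = ('No apparent <c icd="G839" modality="negation">motor paralysis </c> '
          'was observed.')
for span in extract_annotations(sample):
    print(span)
```

A regular expression is enough for flat annotations like these; if the full corpus nests tags or uses other attribute styles, a proper XML parser would be the safer choice.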
NAIST-ARS Corpus Guideline
Annotation Guideline for the Twitter Surveillance System
Aramaki, Eiji; Wakamiya, Shoko (2016): NAIST-ARS Guideline Ver. 1 (in Japanese). figshare.
Japanese Elders' Narrative Corpus
Narrative Corpus of Japanese Elders, JELiCo (Japanese Elders' Language Index Corpus)
This corpus has been prepared to assist research on elderly narratives.
This database consists of written text and speech transcription.
It also contains the results of a cognitive test (Hasegawa Dementia Scale).
|Description||This database consists of written text and speech transcription, collected to enrich the elderly narrative corpus. Specifically, speech (mean duration: 20 minutes) and written text (mean length: 500 characters) were collected from 30 elderly persons (mean age: 78 years, including 7 individuals with mild cognitive impairment).|
|Price||Handling and shipping fees|
|Sample||Voice samples (in preparation), text samples (in preparation).|
Electronic Chart Analysis Tool
Clinical NLP Application Plus an Excel Add-in
This program allows the user to extract date and disease data and to automatically code the extracted data.
Instructions for Use
In the text area (B in the figure below), enter the text to analyze.
Then, press the Execution button (A) to start the analysis.
The results will be shown in the Results area (C).
Please click the link here to download.
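The internals of the tool are not described on this page. Purely as an illustration of what extracting and coding date and disease data can mean, the sketch below uses a regular expression for English dates and a tiny lookup table. The analyze function, the DISEASE_CODES dictionary, and the date pattern are all hypothetical; the four codes are simply the ones shown in the MedNLP-2 sample on this page, and the real tool presumably works on Japanese text with a far larger dictionary.

```python
import re

# Hypothetical mini-dictionary; terms and codes are taken from the
# MedNLP-2 sample on this page, not from the tool itself.
DISEASE_CODES = {
    "abdominal pain": "R104",
    "anorexia": "R630",
    "nausea": "R11_",
    "vomiting": "R11_",
}

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches dates such as "August 9" or "August 2, 2025" (assumed format).
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}(?:, \d{{4}})?")


def analyze(text):
    """Return the dates and the coded disease terms found in the text."""
    dates = DATE_RE.findall(text)
    lowered = text.lower()
    diseases = [(term, code) for term, code in DISEASE_CODES.items()
                if term in lowered]
    return {"dates": dates, "diseases": diseases}


result = analyze("Around August 2, 2025, the patient began to experience "
                 "abdominal pain and nausea.")
print(result["dates"])
print(result["diseases"])
```

Plain substring matching is the simplest possible coding strategy and ignores negation, spelling variants, and context, all of which a production system would need to handle.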
Measuring Speech-Language Ability
This NLP-based smartphone application helps assess the speech-language ability of a user who speaks into the phone. It rates the user's cognitive-linguistic abilities in four dimensions, each on a 4-point scale.
- A simple and ready-to-use design allows for immediate measurement and diagnosis.
- Cognitive-linguistic abilities are classified on a 4-point scale in terms of redundancy, working and potential vocabulary size, word complexity, and standard usage.
Creating an Interactive and Georeferenced Map of Kyoto
Kyoto University School of Design has hosted, on several occasions, an event entitled "100 Walkers to Create a Map of Kyoto," in which 100 participants walked around the city of Kyoto and entered georeferenced comments and observations. Until recently, the resulting interactive, georeferenced map of Kyoto was accessible only to the participants. The map-based application is now open to anyone interested in adding their own comments to personalize the map.
|Onomatopoeia Hoichi (2013)||Homunculus Genkyu (2012)|
|Image of a human figure visualizing the association between common Japanese onomatopoeias and body locations.||Image of a human figure representing the frequencies of reference to different body parts made on the Web.|
|[PDF Download] [JPG Download]||[Color download] [B&W download]|