Widespread use of electronic chart systems has led to the accumulation of an unprecedented amount of medical data, which provides a platform for large-scale statistical studies and for the development of new point-of-care medical support systems. To make these electronic records more useful, they should be processed with natural language processing (NLP) techniques for de-identification, extraction of medical information (e.g., dates, diseases, medications), and normalization. To support research that uses electronic healthcare data, we have created and distributed a database of hypothetical patient records for research and educational purposes.

Hypothetical Patient Narrative Corpus

    This database provides a body of hypothetical patient narrative text, developed by the Project for the Establishment and Implementation of an Educational Electronic Chart System and Database for the Training of Paramedical Staff Members, a project funded under the Strategic Networking Support Program for Enhancing Higher Education Initiative. A patient narrative is free-text clinical documentation of a patient's condition, written by a healthcare provider (e.g., a physician or nurse). Anyone interested in this corpus is invited to visit the website of the non-profit organization Gengo-Shigen-Kyokai (GSK) for more details.
    Description: This database provides annotated patient narrative text developed by the Project for the Establishment and Implementation of an Educational Electronic Chart System and Database for the Training of Paramedical Staff Members. For more details, please refer to the User Manual.
    Creator: Eiji Aramaki (Educational Electronic Chart Joint Use Council, Kyoto University Design School, and the University of Tokyo Center for Knowledge Structuring)
    Sample: Samples are available.

MedNLP-2 Medical Chart Corpus

    This corpus contains the Japanese-language electronic chart text distributed at the MedNLP-2 Shared Task Meeting. Anyone who is interested in clinical NLP, would like to look at the corpus, or wishes to check whether their electronic system is compatible with others is encouraged to apply for a copy. For more information, please visit the NTCIR website.
    Description: This electronic corpus provides a basis for testing and evaluating clinical NLP techniques. The recent rapid spread of electronic chart systems makes it possible to process medical information far more efficiently and extensively than when paper charts were dominant. Raw patient data should be processed with NLP techniques for de-identification and information extraction (e.g., diseases, medications), and efficient processing requires standardized disease and medication codes and data reporting formats. The corpus contains annotations of diseases, times, and other factual information, and can be used to evaluate the effectiveness of different Japanese-language information extraction methods.
    Creator: MedNLP Secretariat
    Contents (data size): 82 documents (172 KB)
    Sample: A 64-year-old male factory worker.
    <t>Around August 2, 2025 (5 days before visit),</t> the patient began to experience <c icd="R104">abdominal pain</c>, <c icd="R630">anorexia</c>, <c icd="R11_">nausea</c> and <c icd="R11_">vomiting</c>.
    The body trunk surface temperature was normal, <c icd="R579">and the patient presented with shock</c>.
    No apparent <c icd="G839" modality="negation">motor paralysis</c> was observed.
    On the next day, <c icd="R402">disturbance of consciousness</c> developed, <c icd="N289">and renal impairment</c> worsened. <t>At 18:10, August 9,</t> <c icd="I469">cardiopulmonary arrest</c> was noted.
    <t>At 21:44, August 9,</t> <c icd="R99_">death was confirmed</c>.
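To make the annotation format concrete, here is a minimal Python sketch that parses a fragment of the sample above (in its English rendering) with the standard-library `xml.etree.ElementTree` module and lists the tagged time expressions (`<t>`) and disease mentions (`<c>`) with their `icd` codes and negation status. It assumes the distributed files use well-formed `<t>` and `<c>` tags as in the sample; the `<doc>` wrapper element and the `extract_annotations` helper are hypothetical and not part of the corpus.

```python
import xml.etree.ElementTree as ET

# Fragment of the English-rendered sample: <t> marks time expressions,
# <c> marks disease/symptom mentions with an icd code and an optional modality.
SAMPLE = """
<t>Around August 2, 2025 (5 days before visit),</t> the patient began to experience
<c icd="R104">abdominal pain</c>, <c icd="R630">anorexia</c>, <c icd="R11_">nausea</c>
and <c icd="R11_">vomiting</c>.
No apparent <c icd="G839" modality="negation">motor paralysis</c> was observed.
<t>At 21:44, August 9,</t> <c icd="R99_">death was confirmed</c>.
"""


def extract_annotations(text):
    """Return (times, diseases) found in an annotated fragment.

    The fragment has no single root element, so it is wrapped in a
    hypothetical <doc> element before parsing.
    """
    root = ET.fromstring("<doc>" + text + "</doc>")
    # Collapse internal whitespace so line breaks in the source do not leak in.
    times = [" ".join(t.text.split()) for t in root.iter("t")]
    diseases = [
        {
            "text": " ".join(c.text.split()),
            "icd": c.get("icd"),
            # Assumption: a mention without a modality attribute is positive.
            "negated": c.get("modality") == "negation",
        }
        for c in root.iter("c")
    ]
    return times, diseases


times, diseases = extract_annotations(SAMPLE)
for d in diseases:
    print(d["icd"], d["text"], "(negated)" if d["negated"] else "")
```

Using `ElementTree` rather than regular expressions keeps attribute handling (such as `modality="negation"`) robust to attribute order and spacing, at the cost of requiring the input to be well-formed.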

NAIST-ARS Corpus Guideline

Annotation Guideline for the Twitter Surveillance System

Japanese Elders' Narrative Corpus

Narrative Corpus of Japanese Elders, JELiCo (Japanese Elders' Language Index Corpus)

    This corpus was prepared to support research on the narratives of elderly people. It consists of written text and speech transcriptions, together with the results of a cognitive test (Hasegawa Dementia Scale).
    Description: This database consists of written text and speech transcriptions, collected to enrich the corpus of elderly narratives. Specifically, speech (mean duration: 20 minutes) and written text (mean length: 500 characters) were collected from 30 elderly persons (mean age: 78 years, including 7 individuals with mild cognitive impairment).
    Creator: MedNLP Secretariat
    Price: Handling and shipping fees
    Language: Japanese
    Format: MP3 (audio), UTF-8 (text)
    Sample: Voice samples (in preparation), text samples (in preparation).
    Anyone who wishes to obtain one or more of the resources above should complete the User Request and Declaration Form and e-mail a PDF copy of the completed form to the MedNLP Secretariat. The requested data will be sent by e-mail or on CD-R. Delivery on CD-R requires prepayment of handling and shipping fees; upon receiving a request for a CD-R, the Secretariat will send the requester an invoice. The shipping destination must be specified at the time of the request. Anyone who wishes to use any of the above databases for purposes other than education or research should consult the MedNLP Secretariat about terms and conditions.


Electronic Chart Analysis Tool

Clinical NLP Application Plus an Excel Add-in

    This program extracts dates and diseases from text and automatically codes the extracted data.
Instructions for Use
    In the text area (B in the figure below), enter the text to analyze. Then, press the Execution button (A) to start the analysis. The results will be shown in the Results area (C).
    Please click the link here to download.


Measuring Speech-Language Ability

This NLP-based smartphone application assesses the speech-language ability of a user who speaks into the phone. It evaluates the user's cognitive-linguistic abilities in four dimensions on a four-point scale.

  • A simple, ready-to-use design allows immediate measurement and diagnosis.
  • Cognitive-linguistic abilities are rated on a four-point scale in terms of redundancy, working and potential vocabulary size, word complexity, and standard usage.


Please visit the to download.


Creating an Interactive and Georeferenced Map of Kyoto

Kyoto University School of Design has hosted an event entitled "100 Walkers to Create a Map of Kyoto" several times, in which 100 participants walked around the city of Kyoto and entered georeferenced comments and observations. Until recently, the resulting interactive, georeferenced map of Kyoto was accessible only to the participants. The map-based application is now open to anyone who wishes to add comments and personalize the map.


Please visit the to download.


Onomatopoeia Hoichi (2013): an image of a human figure visualizing the association between common Japanese onomatopoeias and body locations.
Homunculus Genkyu (2012): an image of a human figure representing how frequently different body parts are referred to on the Web.
[PDF Download] [JPG Download] [Color download] [B&W download]