AI COMMONS HEALTH & WELLBEING HACKATHON SOLUTIONS

OVERVIEW

The AI Commons Project is a proof of concept for a new methodology of developing Artificial Intelligence solutions that allows anyone, anywhere, to benefit from the possibilities that AI can provide. The project aims to improve the accessibility, reproducibility, contextualization and enhancement of Artificial Intelligence solutions globally, and especially in emerging markets.

The project aims to demonstrate how a global community of AI experts can learn from each other and co-create mutually beneficial solutions, with the opportunity for incremental cross-country enhancement.

NAMED ENTITY RECOGNITION (NER)

STATEMENT OF PURPOSE

INTRODUCTION

PROBLEM DEFINITION

Conversations on social media are known to be casual and informal, and they span many topics, including medical ones. Even though social media are good sources of information, their informal nature and the use of pseudonyms make information extraction difficult for medical applications.

This problem is faced by anyone building a solution that requires medical (clinical) data from social media.

SOLUTION

This solution extracts medical information (entities) such as symptoms, diseases, drugs, organisms and other medical-related entities from social media text. The extracted entities can be used in NLP applications such as information extraction, summarization and data mining.

The output of the solution is either the input text with all health-related words highlighted, or a list of those words, together with the class of information (entity) each one belongs to.
The solution was trained to identify 14 entity types, namely:
PERSON (Any Human)
SYMPTOM (Symptom of any disease)
MEDICAL FIELD (Medical speciality)
DRUG (Medicinal product)
FOOD (Edible and source of Nutrients)
DOSAGE (Dosage of Medication)
BODY PART (Part of the body)
PLACE (Location, Town, City)
MEDICAL PROCEDURE (Medical Procedure and processes)
DISEASE (Illnesses)
ORGANISM (Causative organism or disease vector)
INJURY (Breakage in skin continuity)
PHYSIOLOGIC PROCESS (Biological Processes)
ADVERSE REACTION (Unintended consequences of medication or food).
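The "list" form of the output can be illustrated with a small sketch. It assumes the model returns entities as (word, label) pairs and uses hypothetical upper-case label identifiers for the 14 classes above (the identifiers used inside the trained model are not stated here). It simply groups the extracted words by entity class.

```python
from collections import defaultdict

# Hypothetical label identifiers for the 14 entity classes listed above.
ENTITY_LABELS = {
    "PERSON", "SYMPTOM", "MEDICAL_FIELD", "DRUG", "FOOD", "DOSAGE",
    "BODY_PART", "PLACE", "MEDICAL_PROCEDURE", "DISEASE", "ORGANISM",
    "INJURY", "PHYSIOLOGIC_PROCESS", "ADVERSE_REACTION",
}


def group_by_label(entities: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (word, label) pairs into a label -> words mapping.

    Unknown labels raise ValueError so that typos in label names surface early.
    Words keep their order of first appearance; repeated words are listed once.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for word, label in entities:
        if label not in ENTITY_LABELS:
            raise ValueError(f"unknown entity label: {label}")
        if word not in grouped[label]:
            grouped[label].append(word)
    return dict(grouped)


if __name__ == "__main__":
    # Entities as a model might return them for a post such as
    # "I took paracetamol for my headache but the headache got worse".
    found = [("paracetamol", "DRUG"), ("headache", "SYMPTOM"), ("headache", "SYMPTOM")]
    print(group_by_label(found))  # {'DRUG': ['paracetamol'], 'SYMPTOM': ['headache']}
```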

Health practitioners
Data scientists/machine learning engineers
Social media users.

Increasing the number of labeled entities
Enriching the data with more diverse sources and formats

USAGE

The intended use is to extract medical information from social media text.

Builders of AI solutions in health.

A user feeds a text to the model, and the text is returned on the screen with all the medical-related entities highlighted and labelled with their entity class (such as Person, Symptom, Drug, etc.).
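The highlighting step can be sketched in plain Python, assuming the model returns character-offset spans (start, end, label), which is the information spaCy exposes for each entity in doc.ents through ent.start_char, ent.end_char and ent.label_. The bracket markup used here is a hypothetical text-only stand-in for the on-screen highlight; with a real spaCy Doc, spacy.displacy.render(doc, style="ent") produces an HTML highlight instead.

```python
def highlight(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Return text with each entity span wrapped as [surface form](LABEL).

    `spans` are (start_char, end_char, label) triples and are assumed not to
    overlap, as is the case for spaCy's doc.ents.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor or end > len(text) or start >= end:
            raise ValueError(f"invalid or overlapping span: {(start, end, label)}")
        pieces.append(text[cursor:start])               # plain text before the entity
        pieces.append(f"[{text[start:end]}]({label})")  # highlighted entity
        cursor = end
    pieces.append(text[cursor:])                        # remainder after the last entity
    return "".join(pieces)


if __name__ == "__main__":
    message = "My son has malaria and I gave him chloroquine"
    # Offsets derived from the message itself, purely for illustration.
    ents = [
        (message.index("malaria"), message.index("malaria") + len("malaria"), "DISEASE"),
        (message.index("chloroquine"), message.index("chloroquine") + len("chloroquine"), "DRUG"),
    ]
    print(highlight(message, ents))
    # My son has [malaria](DISEASE) and I gave him [chloroquine](DRUG)
```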

The solution can also be set up to read a user’s incoming text automatically and return an appropriate notification.
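One way to wire up automatic reading and notification is sketched below. The extractor is passed in as a function so the sketch runs without the trained model; in practice it would wrap the spaCy pipeline, for example [(e.text, e.label_) for e in nlp(text).ents]. The set of labels that trigger a notification (ALERT_LABELS), the label identifiers and the message wording are all hypothetical choices, not part of the original solution.

```python
from typing import Callable, Iterable, Optional

Entity = tuple[str, str]  # (surface text, entity label)

# Hypothetical choice of entity classes worth notifying the user about.
ALERT_LABELS = frozenset({"ADVERSE_REACTION", "SYMPTOM", "DISEASE"})


def notification_for(text: str, extract: Callable[[str], Iterable[Entity]]) -> Optional[str]:
    """Return a notification string if `text` mentions an alert-worthy entity, else None."""
    hits = [(word, label) for word, label in extract(text) if label in ALERT_LABELS]
    if not hits:
        return None
    summary = ", ".join(f"{word} ({label})" for word, label in hits)
    return f"Health-related mention detected: {summary}"


def process_inbox(messages: Iterable[str], extract: Callable[[str], Iterable[Entity]]) -> list[str]:
    """Run every incoming message through the extractor and collect the notifications."""
    notes = []
    for message in messages:
        note = notification_for(message, extract)
        if note is not None:
            notes.append(note)
    return notes


if __name__ == "__main__":
    # Toy keyword lookup standing in for the trained NER model.
    toy_lexicon = {"rash": "ADVERSE_REACTION", "ibuprofen": "DRUG", "fever": "SYMPTOM"}

    def toy_extract(text: str) -> list[Entity]:
        return [(w, toy_lexicon[w]) for w in text.lower().split() if w in toy_lexicon]

    inbox = ["I got a rash after ibuprofen", "See you at the match tonight"]
    for note in process_inbox(inbox, toy_extract):
        print(note)  # Health-related mention detected: rash (ADVERSE_REACTION)
```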

DOMAIN AND APPLICATIONS

DATASET

COMPOSITION

A total of 12029 text documents were used for training and evaluation.

The dataset is self-contained. Although it was scraped online, it does not rely on the external sources from which it was obtained in order to be used. The collected dataset is static: it is a snapshot captured at the date and time it was scraped. There are no restrictions, such as licenses or fees of any kind, associated with any of the external resources.

COLLECTION PROCESS

PREPROCESSING/CLEANING/LABELLING

USES

MAINTENANCE

A message can be sent by filling in the form at https://backup.datasciencenigeria.org/contact-us/

DATASET PUBLICLY AVAILABLE

MODEL

MODEL DETAILS

Model date: 2019. The model was trained using spaCy, an open-source software library for advanced NLP, with all default training parameters.

The health-related entities in the data scraped from social media were labelled using the TagEditor (v1.5) annotation tool. The annotated data was converted to spaCy's gold format, and the data format was then confirmed using the spaCy command-line debugger: “!python -m spacy debug-data en” (the leading ! runs the shell command from a notebook cell).
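A lightweight pre-check on the annotations, run before spaCy's own debugger, could look like the sketch below. It assumes the annotations are held as character-offset spans in the (text, {"entities": [(start, end, label)]}) tuple layout used in spaCy v2 training examples, and it reuses hypothetical label identifiers for the 14 classes. It only flags a few common annotation problems (offsets outside the text, overlapping spans, whitespace at span edges, unknown labels) and is not a reimplementation of debug-data.

```python
# Hypothetical label identifiers for the 14 entity classes.
LABELS = {
    "PERSON", "SYMPTOM", "MEDICAL_FIELD", "DRUG", "FOOD", "DOSAGE",
    "BODY_PART", "PLACE", "MEDICAL_PROCEDURE", "DISEASE", "ORGANISM",
    "INJURY", "PHYSIOLOGIC_PROCESS", "ADVERSE_REACTION",
}


def span_problems(text: str, entities: list[tuple[int, int, str]]) -> list[str]:
    """Return human-readable problems found in one example's entity spans."""
    problems = []
    previous_end = 0
    for start, end, label in sorted(entities):
        where = f"({start}, {end}, {label!r})"
        if not (0 <= start < end <= len(text)):
            problems.append(f"{where}: offsets outside text")
            continue
        if start < previous_end:
            problems.append(f"{where}: overlaps previous span")
        span = text[start:end]
        if span != span.strip():
            problems.append(f"{where}: leading/trailing whitespace in {span!r}")
        if label not in LABELS:
            problems.append(f"{where}: unknown label")
        previous_end = max(previous_end, end)
    return problems


if __name__ == "__main__":
    train_data = [
        ("I have malaria", {"entities": [(7, 14, "DISEASE")]}),
        ("Took 500mg panadol ", {"entities": [(5, 10, "DOSAGE"), (10, 19, "DRUG")]}),
    ]
    for text, annotations in train_data:
        for problem in span_problems(text, annotations["entities"]):
            print(problem)  # (10, 19, 'DRUG'): leading/trailing whitespace in ' panadol '
```

Spans containing whitespace at their edges are worth fixing before training, since they usually cannot be aligned with spaCy's token boundaries.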

SAFETY

GENERAL

EXPLAINABILITY

FAIRNESS

CONCEPT DRIFT

This project is brought to you by

Copyright © 2020 Data Science Nigeria.