A System for Health Document Classification Using Machine Learning

Dept: COMPUTER SCIENCE; Chapters: 1-5

Abstract

The volume of medical documents grows massively every day (books, journals, blogs, articles, doctors’ instructions and prescriptions, emails from patients, etc.), and handling and categorizing them manually is becoming very challenging. Extracting information from unstructured text, including medical document classification, is one of the most challenging tasks in information systems. Discovering knowledge from medical datasets is important for effective medical diagnosis. The primary aim of this research is to develop a classification algorithm that classifies a medical document by analyzing its content and categorizing it under predefined topics. In this project, we successfully applied Natural Language Processing (NLP) and machine learning to the classification of health-related documents. We used the OpenNLP Application Programming Interface (API), a Java API, for training a model and classifying the doc
CHAPTER ONE

1.0 INTRODUCTION

This chapter introduces the topic of the project work, A System for Health Document Classification Using Machine Learning. In this chapter, we consider the background of the study, the statement of the problem, the aim and objectives, the methodology used to design the system, the scope of the study, its significance, and the definition of terms, and we conclude with the organization of the project work.

1.1 BACKGROUND OF THE STUDY

Today, most hospitals, medical laboratories and other health facilities make use of some kind of information system, such as a hospital management system or a pharmacy management system. Among the other functions they provide, these systems are mainly used to collect patient records, which they store in digital format. Large volumes of patient data are recorded every day, forming a large data set popularly referred to as “Big Data”.

Every day, physicians and other health workers are required to work with this “Big Data” in order to provide solutions. Their everyday tasks include information retrieval and data mining. Retrieving information from big data can be very laborious and time-consuming. This has given rise to the study of text or document classification as an aid to retrieving information from big data. Today, text classification is a necessity because of the very large number of text documents that we have to deal with daily.

Document classification is the task of grouping documents into categories based upon their content. It is a significant learning problem at the core of many information management and retrieval tasks, and it plays an essential role in applications that deal with organizing, classifying, searching and concisely representing large amounts of information. Document classification is a longstanding and well-studied problem in information retrieval (Russell, 2018).
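
As a rough illustration of grouping documents into categories by content, the Java sketch below learns word statistics from a handful of pre-labelled texts and assigns a new text to the most probable category. It is a minimal multinomial Naive Bayes classifier written only for illustration; it is not the OpenNLP model used in this project, and the class name, category names and sample sentences are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative multinomial Naive Bayes text classifier (not the project's OpenNLP model). */
class HealthDocClassifierSketch {
    // Number of training documents seen per category.
    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    // Word counts per category: category -> (word -> count).
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // Total number of word tokens per category.
    private final Map<String, Integer> tokensPerCategory = new HashMap<>();
    // All distinct words seen during training.
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    /** Adds one pre-classified document to the training counts. */
    void train(String category, String text) {
        totalDocs++;
        docsPerCategory.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue; // split() can yield an empty leading token
            counts.merge(word, 1, Integer::sum);
            tokensPerCategory.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the category with the highest log-probability, or null if nothing was trained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docsPerCategory.keySet()) {
            // Start from log P(category), estimated from document frequencies.
            double score = Math.log((double) docsPerCategory.get(category) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int tokens = tokensPerCategory.getOrDefault(category, 0);
            for (String word : tokenize(text)) {
                if (word.isEmpty()) continue;
                // Add log P(word | category) with add-one (Laplace) smoothing.
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (tokens + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        HealthDocClassifierSketch clf = new HealthDocClassifierSketch();
        // Hypothetical training documents and category names, for illustration only.
        clf.train("cardiology", "patient reports chest pain and irregular heartbeat");
        clf.train("cardiology", "echocardiogram shows reduced heart function");
        clf.train("dermatology", "itchy skin rash on both arms");
        clf.train("dermatology", "topical cream prescribed for skin eczema");
        System.out.println(clf.classify("rash and dry skin on the arms"));
        System.out.println(clf.classify("heart palpitations with chest discomfort"));
    }
}
```

With these four sample documents, the first query is assigned to the dermatology category and the second to cardiology. A real system would need far more training data, and a library such as OpenNLP takes care of model training and persistence.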

Classifiers are usually constructed automatically using machine learning, statistical pattern recognition, or neural network approaches. Machine learning approaches build classifiers automatically by induction over pre-classified sample documents. In this project work, we employ machine learning to classify health documents.

1.2 STATEMENT OF THE PROBLEM

With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all the incoming data, or even to classify it into categories. In the health sector, too, numerous patient records are collected every day and used for analysis. How do we efficiently classify or categorize these health documents to support easy retrieval?

1.3 AIM AND OBJECTIVES OF THE STUDY

The aim of this project is to develop A System for Health Document Classification Using Machine Learning.

The specific objectives are to:

1. Study the various machine learning classification algorithms.
2. Implement a classification algorithm in Java.

1.4 SCOPE OF THE STUDY

As stated earlier, machine learning, statistical pattern recognition, and neural network approaches can all be used to classify documents. This project work concentrates on using a machine learning algorithm to classify documents.

1.5 SIGNIFICANCE OF THE STUDY

The software delivered by this project work will greatly reduce the time that doctors, physicians and other health workers spend searching for and retrieving documents.

Other benefits of this project work include:

1. It will help students and other interested individuals who want to develop a similar application.
2. It will serve as a source of material for those interested in investigating the processes involved in developing a document classification system using machine learning.
3. It will serve as a source of material for students who are interested in studying machine learning.

1.6 DEFINITION OF TERMS

Document Classification: the task of grouping documents into categories based upon their content.

Health Document: a document with health-related content. One example is a health certificate, which is written by a doctor and displays the official results of a physical examination.

Machine Learning: the study and construction of algorithms that can learn from and make predictions on data.

JSP: Java Server Pages, a Java technology for creating dynamic web pages.

HTML: Hyper Text Markup Language, the language used for creating web pages.

MySQL: a database management system for creating, storing and manipulating databases.

Servlet: a small, pluggable extension to a server that enhances the server’s functionality.

Bootstrap: a sleek, intuitive, and powerful mobile-first front-end framework for faster and easier web development. It uses HTML, CSS and JavaScript.

1.7 ORGANIZATION OF WORK

Chapter one introduces the background of the project and sets out the statement of the problem, the objectives of the project, its significance, its scope, and its constraints.

Chapter two reviews the literature on machine learning and document classification, together with related work.

Chapter three discusses system investigation and analysis. It deals with the detailed investigation and analysis of the existing system and with problem identification. It also presents the proposal for the new system.

Chapter four covers the system design and implementation.

Chapter five presents the summary and conclusion of the project.

