[1911.01600v1] Integrating Dictionary Feature into A Deep Learning Model for Disease Named Entity Recognition
Instead of hand-engineering features for tokens, character-level embeddings are used to represent the orthographic features of tokens, in addition to word embeddings and dictionary information.
Abstract: In recent years, Deep Learning (DL) models have become important because of their demonstrated success on complex learning problems. DL models have been applied effectively to various Natural Language Processing (NLP) tasks such as Part-of-Speech (PoS) tagging and Machine Translation (MT). Disease Named Entity Recognition (Disease-NER) is a crucial task that aims to extract disease Named Entities (NEs) from text. In this paper, a DL model for Disease-NER using dictionary information is proposed and evaluated on the National Center for Biotechnology Information (NCBI) disease corpus and the BC5CDR dataset. Word embeddings trained over general-domain texts as well as biomedical texts are used to represent the input to the proposed model. This study also compares two Segment Representation (SR) schemes, IOB2 and IOBES, for Disease-NER. The results show that using dictionary information, pre-trained word embeddings, character embeddings, and a CRF with a global score improves the performance of the Disease-NER system.
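To make the two segment representation schemes named in the abstract concrete, the sketch below converts a tag sequence from IOB2 to IOBES. It follows the standard definitions of the two schemes (in IOBES, a single-token entity is tagged `S-` and the last token of a multi-token entity is tagged `E-`); the function name `iob2_to_iobes`, the `Disease` label, and the example sentence are illustrative assumptions, not taken from the paper.

```python
from typing import List


def iob2_to_iobes(tags: List[str]) -> List[str]:
    """Convert IOB2 tags to IOBES tags (hypothetical helper, standard scheme definitions)."""
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # B- stays B- inside a longer entity, else becomes S- (single token).
            iobes.append(f"B-{label}" if continues else f"S-{label}")
        elif prefix == "I":
            # I- stays I- in the middle, else becomes E- (end of entity).
            iobes.append(f"I-{label}" if continues else f"E-{label}")
        else:
            raise ValueError(f"Unexpected IOB2 tag: {tag!r}")
    return iobes


# Illustrative tokens: "cystic fibrosis and asthma"
iob2 = ["B-Disease", "I-Disease", "O", "B-Disease"]
print(iob2_to_iobes(iob2))
```

IOBES encodes entity boundaries explicitly, so a tagger sees distinct labels for the end of a mention and for single-token mentions; whether that helps on a given corpus is an empirical question of the kind the study's comparison addresses.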
Figure 1: An abstract with disease mentions highlighted (Introduction)
Figure 2: Classical and deep learning flows (Introduction)
Figure 3: Structure of a biological neuron and a simple neuron model (Background)
Figure 4: Structure of a simple feed-forward ANN (Background)
Figure 5: A simple RNN structure (Background)
Figure 6: Gradient vanishing problem for RNN (Background)
Figure 7: Structure of LSTM unit (Background)
Figure 8: Structure of Bidirectional LSTM (Background)
Figure 9: Example of word vector calculations (Word Embeddings)
Figure 10: General structure of the proposed model (Methodology)
Figure 11: The contextual representation learning model (Methodology)
Figure 12: An example of decoding a text fragment (Decoding)
Figure 13: Character representation learning model (Pre-processing)