Abstract

Named Entity Recognition (NER) is the task of identifying person, organization, and location names, as well as other information such as numbers, dates, and measures, in text. In this paper, we describe the development of an NER system for the Urdu language using a Hidden Markov Model (HMM). We first compare the IOB2 and IOE2 tagging schemes. We then describe how Urdu text is preprocessed before it is fed to the HMM for training with the IOE2 tagging scheme. Finally, we use Part-of-Speech (POS) information, gazetteers, and rules to improve the accuracy of the system. Our system achieves a precision of 66.71%, a recall of 71.70%, and an F-measure of 69.12%.
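To illustrate the difference between the IOB2 and IOE2 tagging schemes, the sketch below encodes the same entity spans both ways. In IOB2, every entity begins with a B- tag; in IOE2, every entity ends with an E- tag, and all other tokens inside an entity carry I-. The function name `spans_to_tags`, the romanized example tokens, and the PERSON/LOCATION labels are hypothetical illustrations, not taken from the system described here.

```python
def spans_to_tags(n_tokens, spans, scheme):
    """Encode entity spans as IOB2 or IOE2 tags (illustrative sketch).

    spans: list of (start, end, label) tuples, end exclusive, non-overlapping.
    scheme: "IOB2" or "IOE2".
    """
    if scheme not in ("IOB2", "IOE2"):
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_tokens  # tokens outside any entity
    for start, end, label in spans:
        # Mark every token of the entity as inside (I-) first.
        for i in range(start, end):
            tags[i] = f"I-{label}"
        if scheme == "IOB2":
            tags[start] = f"B-{label}"    # IOB2: each chunk starts with B-
        else:
            tags[end - 1] = f"E-{label}"  # IOE2: each chunk ends with E-
    return tags


# Hypothetical romanized tokens in logical (reading) order.
tokens = ["Imran", "Khan", "visited", "Lahore", "yesterday"]
spans = [(0, 2, "PERSON"), (3, 4, "LOCATION")]
print(spans_to_tags(len(tokens), spans, "IOB2"))
# ['B-PERSON', 'I-PERSON', 'O', 'B-LOCATION', 'O']
print(spans_to_tags(len(tokens), spans, "IOE2"))
# ['I-PERSON', 'E-PERSON', 'O', 'E-LOCATION', 'O']
```

Both encodings carry the same span information; they differ only in which boundary of an entity is explicitly marked, which changes the tag transitions an HMM has to learn.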