MNLP Project

Maithili Natural Language Processing (MNLP) Toolkit


riturajsingh

Overview

Maithili is an Indo-Aryan language native to the Indian subcontinent, spoken mainly in India and Nepal. Its speakers are spread across a large part of Bihar and the eastern Tarai region of Nepal. There are roughly 30 to 35 million speakers of Maithili.


The goal of MNLP is to build a natural language processing toolkit for the Maithili language. The toolkit will support tokenizing Maithili text, training Maithili word embeddings, part-of-speech (POS) tagging, named entity recognition (NER), and building neural models for Maithili. Maithili is a low-resource language with a very small digital footprint, which makes data collection, annotation, and building machine learning models difficult. This is an open-source project hosted on GitHub.
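Tokenization is the first capability listed above. As a rough illustration only, here is a minimal rule-based tokenizer for Maithili written in Devanagari. It assumes that sentences end with the danda (।), double danda (॥), "?" or "!", and that words are separated by whitespace. The function names `sentence_split` and `word_tokenize` are hypothetical and are not part of any published MNLP API.

```python
import re

# Hypothetical sketch, not the MNLP toolkit's API.
# Assumptions: Devanagari-script Maithili, sentences end with the danda
# (U+0964), double danda (U+0965), "?" or "!", and words are space-separated.

# A run of non-terminator characters followed by an optional terminator.
_SENTENCE = re.compile(r"[^।॥?!]+[।॥?!]?")
# Punctuation marks that should become standalone tokens.
_PUNCT = re.compile(r"([।॥?!,;:()])")


def sentence_split(text: str) -> list[str]:
    """Split text into sentences, keeping each sentence's final punctuation."""
    sentences = _SENTENCE.findall(text)
    return [s.strip() for s in sentences if s.strip()]


def word_tokenize(sentence: str) -> list[str]:
    """Split a sentence into word tokens and punctuation tokens."""
    # Pad each punctuation mark with spaces, then split on whitespace.
    spaced = _PUNCT.sub(r" \1 ", sentence)
    return spaced.split()


if __name__ == "__main__":
    for sent in sentence_split("हम मैथिली बजैत छी। अहाँ केना छी?"):
        print(word_tokenize(sent))
```

Because vowel signs and the virama are never separated from their base consonant by a space, whitespace splitting keeps them attached to the word. Handling clitics, compounds, or text in other scripts such as Tirhuta would need language-specific rules or a trained model.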


Open Source Project

All code, data, and APIs will be publicly available, with the broader aim of increasing the digital footprint of the Maithili language. Note that this is a volunteer project; no paid employment or internships are available. All research will be published on arXiv/HAL with contributors listed as authors.

How You Can Contribute