Researching on Analysis and creating Corpus from Primary level Sindhi language Book for Sindhi

Naveen Talpur; Mir Jahanzeb Talpur; Timotheous Samar

Authors

Naveen Talpur (Student)
Mir Jahanzeb Talpur
Timotheous Samar

Keywords:

Sindhi corpus, UOPS, Sentiment analysis, Document-term matrix

Abstract

Sindhi is a rich language with a large body of literary and non-literary works. Although several books, newspapers, magazines, and internet resources are available for developing Sindhi text corpora, no suitable and effective textual corpus has yet been generated and made accessible for investigation, research on language characteristics, semantic analysis, and information-gathering systems. The scarcity of tools for computational linguistics research and of NLP applications for Sindhi currently creates difficulties. To address this, we have built Sindhi text libraries that provide computational linguists, NLP specialists, and academics with text resources. The Sindhi text corpus is created from primary school textbooks of the Sindh Text Book Board. A Sindhi sentiment text dataset is produced and evaluated using the 2-gram (bigram) approach of the n-gram model together with the Document Term Matrix and TF-IDF models. The dataset may be useful for research on language recommendation work, topic detection, and aspect-based sentiment classification.
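The abstract names three techniques: 2-grams, a document-term matrix, and TF-IDF. The sketch below shows one common way these fit together. It is not the authors' implementation: the function names are hypothetical, tokens are obtained by plain whitespace splitting (a simplification for Sindhi script), term frequency is normalised by document length, and the inverse document frequency is taken as log(N / df), which is only one of several variants in use.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return the adjacent token pairs (2-grams) of a token list."""
    return list(zip(tokens, tokens[1:]))


def document_term_matrix(docs):
    """Count bigrams per document.

    Returns (vocab, matrix): vocab is the sorted list of distinct bigrams,
    matrix has one row per document and one column per bigram.
    Assumption: whitespace tokenisation is good enough for illustration.
    """
    counts = [Counter(bigrams(doc.split())) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(matrix):
    """Weight raw counts by TF-IDF, with tf = count / row total
    and idf = log(N / df). This is an assumed variant, not the paper's."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Document frequency: number of documents containing each bigram.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in matrix:
        total = sum(row) or 1  # avoid division by zero for empty documents
        weighted.append(
            [(row[j] / total) * math.log(n_docs / df[j]) for j in range(n_terms)]
        )
    return weighted


if __name__ == "__main__":
    # Placeholder tokens stand in for Sindhi words from textbook sentences.
    toy_docs = ["a b c", "a b d", "c d"]
    vocab, dtm = document_term_matrix(toy_docs)
    for term, column in zip(vocab, zip(*tf_idf(dtm))):
        print(term, [round(w, 3) for w in column])
```

A bigram that appears in every document receives weight zero under this idf definition, while a bigram confined to a single document is weighted most heavily, which is the property that makes TF-IDF useful for tasks such as topic detection.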

Published

2023-03-10

How to Cite

Talpur, N., Talpur, M. J., & Samar, T. (2023). Researching on Analysis and creating Corpus from Primary level Sindhi language Book for Sindhi. Repertus: Journal of Linguistics, Language Planning and Policy, 2(1), 37–48. Retrieved from https://rjllp.muet.edu.pk/index.php/repertus/article/view/24

Issue

Repertus: Journal of Linguistics, Language Planning and Policy, March 2023, Vol 2, Issue 1

Section

Articles
