India Languages, asked by radharadhabai32, 7 months ago

Split the word karycharana. What is the name of the sandhi?

Answers

Answered by aadhya825
Answer:

Kannada Spell Checker with Sandhi Splitter

Akshatha A N

Department of ISE

RVCE, Bangalore

Chandana G Upadhyaya

Department of ISE

RVCE, Bangalore

Rajashekara Murthy S

Associate Professor, Department of ISE

RVCE, Bangalore

Abstract—Spelling errors are introduced in text either

during typing, or when the user does not know the correct

phoneme or grapheme. If a language contains complex

words like sandhi where two or more morphemes join

based on some rules, spell checking becomes very tedious.

In such situations, having a spell checker with sandhi

splitter which alerts the user by flagging the errors and

providing suggestions is very useful. A novel algorithm of

sandhi splitting is proposed in this paper. The sandhi

splitter can split about 7000 of the most common sandhi words in

Kannada language used as test samples. The sandhi splitter

was integrated with a Kannada spell checker and a

mechanism for generating suggestions was added. A

comprehensive, platform independent, standalone spell

checker with sandhi splitter application software was thus

developed and tested extensively for its efficiency and

correctness. A comparative analysis of this spell checker

with sandhi splitter was made, and the results showed that

the Kannada spell checker with sandhi splitter has an

improved performance. It is twice as fast, 200 times more

space efficient, and it is 90% accurate in case of complex

nouns and 50% accurate for complex verbs. Such a spell

checker with sandhi splitter will be of foremost significance

in machine translation systems, voice processing, etc. This

is the first sandhi splitter in Kannada and the advantage of

the novel algorithm is that it can be extended to all Indian

languages.

Keywords— Natural language processing; Morphology;

Computational linguistics; Sandhi splitter; Spell checker.

I. INTRODUCTION

Kannada is an agglutinative language. It is one of the

Dravidian languages, and by the nature of the Dravidian

languages it has very clear rules defined for every aspect

of its structure. Kannada has roughly 40 million native

speakers and it is one of the 40 most spoken languages in

the world [1]. It is influenced greatly by Sanskrit, and

therefore we can find an overlap of words, structure and

grammar rules including the sandhi and lexicon between

the two languages. Like any other language, Kannada has

grown and will continue to grow and change with the

intervention of other languages and accents, and by

people who want to make the language and its words easy

to pronounce, spell and write. There is no specific

boundary to the words in it. In a language like Kannada,

where there are abundant complex structures and

compound words, a spell checker demands a sandhi

splitter for two reasons. First, since any database of

Kannada words cannot store every sandhi word without

huge redundancy, the sandhi splitter would hugely reduce

the dictionary size. Second, sandhi splitters are critical for

recognizing spelling errors arising due to an erroneous

morpheme or an erroneous segment at the morpheme

boundary of a sandhi word.
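
To illustrate the first reason, here is a minimal Python sketch of a spell check that stores only root morphemes and accepts a complex word when it splits into two known entries. It is not the algorithm proposed in the paper: it handles only plain concatenation with no sound change at the boundary, and the lexicon entries (mane, mara, gaLu), the romanization, and the function names are illustrative assumptions.

```python
from typing import Optional, Set, Tuple

# Hypothetical mini-lexicon of romanized morphemes (assumed for this sketch).
LEXICON: Set[str] = {"mane", "mara", "gaLu"}


def split_compound(word: str, lexicon: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (left, right) if word is a plain concatenation of two lexicon entries."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in lexicon and right in lexicon:
            return left, right
    return None


def is_spelled_correctly(word: str, lexicon: Set[str]) -> bool:
    """Accept a word that is in the lexicon or that splits into two known morphemes."""
    return word in lexicon or split_compound(word, lexicon) is not None


print(split_compound("manegaLu", LEXICON))
print(is_spelled_correctly("manegalu", LEXICON))
```

With three stored entries the checker accepts both "manegaLu" and "maragaLu" without listing either joined form, which is the redundancy saving described above. A misspelled segment such as "galu" makes the split fail, so the word is flagged.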

A morpheme is the smallest meaningful unit in a

language. Joining morphemes to derive complex and

meaningful words without changing the spelling or the

phonetics of the constituent morphemes is called

agglutination. Inflection, on the other hand, is the refitting

of the words to express various grammatical aspects like

gender, tense, mood and number.

In the processing of any language, morphological

analysis, sentence structure analysis and recognition

become the founding pillars. In processing Indian

languages, in addition to the aforementioned factors,

several factors such as sandhis, samaasas, and inflections

specific to gender and tense also play a role. In Kannada,

there are three ways of forming complex words: samaasa,

jodi pada and sandhi.

A. Samaasa and Jodi Pada

A samaasa is also known as a nominal compound.

Morphologically, a samaasa has each noun or adjective in

its stem form with only the last element obtaining the case

inflection. Examples of samaasa include “peetaMbara”

and “vRukoodara”. A jodi pada is a phonemic binding of

two unrelated morphemes separated by a hyphen used in

the Kannada dialect. Examples of jodi pada include

“mane-maTha” and “deevaru-diMDaru”.

B. Sandhi

Sandhi means ‘to join’. In sandhi formation at the

word boundary, several phonological processes take

place to produce the complex word or the sandhi word.

During this process of joining, one or both of the following

operations occur at the word boundary:

• A new letter will appear at the word boundary.
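
The first operation, a new letter appearing at the word boundary, can be sketched in Python as follows. This is an illustrative assumption rather than the paper's splitter: it assumes the inserted letter is 'y' or 'v', and the example words (maneyannu from mane + annu, guruvannu from guru + annu), their romanization, and the function name are hypothetical choices made for the sketch.

```python
from typing import Optional, Set, Tuple

# Hypothetical romanized lexicon and inserted letters (assumptions for this sketch).
LEXICON: Set[str] = {"mane", "guru", "annu"}
AUGMENTS: Tuple[str, ...] = ("y", "v")


def split_with_augment(word: str, lexicon: Set[str]) -> Optional[Tuple[str, str, str]]:
    """Try each interior 'y' or 'v' as an inserted boundary letter.

    Returns (left, inserted_letter, right) when dropping that letter leaves
    two lexicon entries, otherwise None.
    """
    for i in range(1, len(word) - 1):
        if word[i] in AUGMENTS:
            left, right = word[:i], word[i + 1:]
            if left in lexicon and right in lexicon:
                return left, word[i], right
    return None


print(split_with_augment("maneyannu", LEXICON))
print(split_with_augment("guruvannu", LEXICON))
```

The sketch checks both halves against the lexicon after dropping the candidate letter, so an error in either morpheme or in the boundary letter itself causes the split to fail.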
