LanguageNeutralInterfaces

From IndLinux

This is an attempt to make applications language-neutral on the user-facing side, thereby enabling users to mix languages in a single document without selecting language options before using common language tools such as spell checkers.

Problem

  1. It is common to have English text inside text written in Indic languages. Currently, language tools such as spell checkers check the spelling of only one language and mark words from any other language as wrong. The user has to run the spell checker twice: first for the xx_IN language and then for English.
  2. It is even possible to have two Indic languages present in the same text.
  3. The above problem is not limited to Indic languages.

Affects

Spell checkers

Text to speech systems

  • Dhvani already has a language detection feature, but it fails when English text appears inside Indic text.
  • SpeechDispatcher is designed to handle this problem. We need to study its features.

Suggested Solutions

  • For each xx_IN spell checker, provide a variant that combines xx_IN and en_US (or en_GB?).
    1. We need to test whether such a variant is feasible by trying out a combined hi_IN and en_US spell checker.
  • Applications should be capable of tokenizing text (which they already do) and detecting the language of each word, then passing that language as an option to the back-end spell checker.
    1. This might require a study of the existing spell-check framework, the single-dictionary approach used in Fedora, and tools like Enchant.
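
The second suggestion can be sketched roughly as follows. This is only an illustration, not an existing framework: the `DICTIONARIES` word sets, `tokenize`, `detect_language`, and `misspelled` are hypothetical names, and the assumption that any Devanagari word belongs to hi_IN is a deliberate simplification.

```python
import string

# Hypothetical word lists standing in for real hi_IN and en_US dictionaries.
DICTIONARIES = {
    "hi_IN": {"यह", "एक", "वाक्य", "है"},
    "en_US": {"this", "is", "a", "test"},
}

# ASCII punctuation plus the Devanagari danda (sentence terminator).
PUNCTUATION = string.punctuation + "\u0964"


def tokenize(text):
    """Split on whitespace and trim punctuation from both ends of each word."""
    words = (chunk.strip(PUNCTUATION) for chunk in text.split())
    return [w for w in words if w]


def detect_language(word):
    """Guess a dictionary tag from the script of the word.

    Assumption: any character in the Devanagari block means hi_IN,
    everything else is treated as en_US.
    """
    if any("\u0900" <= ch <= "\u097f" for ch in word):
        return "hi_IN"
    return "en_US"


def misspelled(text):
    """Return (word, language) pairs rejected by the matching dictionary."""
    errors = []
    for word in tokenize(text):
        lang = detect_language(word)
        if word.lower() not in DICTIONARIES[lang]:
            errors.append((word, lang))
    return errors
```

With this sketch, a mixed sentence is checked in one pass: each word goes only to the dictionary for its detected language, so a correctly spelled English word inside Hindi text is no longer flagged. A real implementation would hand the language tag to the back-end spell checker instead of looking words up in a set.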

Language Detection

  1. A code-point-based approach is the most widely used one.
  2. What should be done about languages that share code points, e.g. Marathi and Hindi?
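
A minimal sketch of the code-point-based approach, and of where it stops being sufficient, might look like this. The ranges are the Unicode blocks for each script; the `SCRIPT_LANGUAGES` mapping and all function names are illustrative assumptions and far from complete.

```python
from collections import Counter

# Unicode block ranges (inclusive code points) for a few scripts.
SCRIPT_RANGES = {
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A)],  # basic A-Z, a-z only
    "Devanagari": [(0x0900, 0x097F)],
    "Bengali": [(0x0980, 0x09FF)],
    "Tamil": [(0x0B80, 0x0BFF)],
    "Malayalam": [(0x0D00, 0x0D7F)],
}

# Illustrative, incomplete mapping from script to candidate languages.
SCRIPT_LANGUAGES = {
    "Latin": ["en"],
    "Devanagari": ["hi", "mr"],  # shared script: code points cannot decide
    "Bengali": ["bn", "as"],
    "Tamil": ["ta"],
    "Malayalam": ["ml"],
}


def script_of(ch):
    """Return the script whose range contains this character, or None."""
    code = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= code <= hi for lo, hi in ranges):
            return script
    return None


def detect_script(word):
    """Return the most frequent known script among the word's characters."""
    counts = Counter(s for s in map(script_of, word) if s is not None)
    return counts.most_common(1)[0][0] if counts else None


def candidate_languages(word):
    """Return every language that the word's script could belong to."""
    return SCRIPT_LANGUAGES.get(detect_script(word), [])
```

For a Devanagari word, `candidate_languages` returns both `hi` and `mr`, which makes the open question concrete: code points narrow the choice to a script, and something else (a dictionary lookup in each candidate language, the document locale, or a statistical model) would have to pick between languages that share it.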

TODO

  1. Is there already a library for language detection?

References

  1. Fedora wiki: Fix the dictionary proliferation problem