indic_transliteration.detect
Example usage:
from indic_transliteration import detect
detect.detect('pitRRIn') == detect.Scheme.ITRANS
detect.detect('pitRRn') == detect.Scheme.HK
When handling a Sanskrit string, it’s almost always best to explicitly
state its transliteration scheme. This avoids embarrassing errors with
words like pitRRIn. But most of the time, it’s possible to infer the
encoding from the text itself.
detect.py automatically detects a string’s transliteration scheme:
detect('pitRRIn') == Scheme.ITRANS
detect('pitRRn') == Scheme.HK
detect('pitFn') == Scheme.SLP1
detect('पितॄन्') == Scheme.Devanagari
detect('পিতৄন্') == Scheme.Bengali
Supported schemes
All schemes are attributes on the Scheme class. You can also just
use the scheme name:
Scheme.IAST == 'IAST'
Scheme.Devanagari == 'Devanagari'
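The equalities above hold when each constant's value is simply its own name. A minimal sketch of that idea, using a hypothetical stand-in class rather than the library's own definition (the `is_romanization` helper is likewise invented for illustration):

```python
class Scheme:
    """Hypothetical stand-in: every attribute's value is its own name."""
    Devanagari = "Devanagari"
    HK = "HK"
    IAST = "IAST"


# Because the constants are plain strings, attribute access and the bare
# name are interchangeable in comparisons and set membership tests.
ROMANIZATIONS = {Scheme.HK, Scheme.IAST}


def is_romanization(scheme):
    """Return True if the scheme (attribute or plain name) is a romanization."""
    return scheme in ROMANIZATIONS
```

With this layout, `is_romanization(Scheme.IAST)` and `is_romanization('IAST')` give the same answer.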
Scripts:
- Bengali ('Bengali')
- Devanagari ('Devanagari')
- Gujarati ('Gujarati')
- Gurmukhi ('Gurmukhi')
- Kannada ('Kannada')
- Malayalam ('Malayalam')
- Oriya ('Oriya')
- Tamil ('Tamil')
- Telugu ('Telugu')
Romanizations:
- Harvard-Kyoto ('HK')
- IAST ('IAST')
- ITRANS ('ITRANS')
- Kolkata ('Kolkata')
- SLP1 ('SLP1')
- Velthuis ('Velthuis')
indic_transliteration.detect.BLOCKS = [('malayalam', 3328), ('kannada', 3200), ('telugu', 3072), ('tamil', 2944), ('oriya', 2816), ('gujarati', 2688), ('gurmukhi', 2560), ('bengali', 2432), ('devanagari', 2304)]
    Script names paired with the first code point of their Unicode block, sorted by code point in descending order. Schemes with no Unicode block defined are omitted.
indic_transliteration.detect.BRAHMIC_FIRST_CODE_POINT = 2304
    Start of the Devanagari block (U+0900).
indic_transliteration.detect.BRAHMIC_LAST_CODE_POINT = 3455
    End of the Malayalam block (U+0D7F).
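The three constants above are enough to map a character to its script. The following is a minimal sketch of one way to do that, not necessarily how detect.py does it; the helper names `script_of_char` and `script_of_text` are hypothetical, and the values are copied from the constants documented here.

```python
# Values copied from the documented module constants.
BLOCKS = [
    ("malayalam", 3328), ("kannada", 3200), ("telugu", 3072),
    ("tamil", 2944), ("oriya", 2816), ("gujarati", 2688),
    ("gurmukhi", 2560), ("bengali", 2432), ("devanagari", 2304),
]
BRAHMIC_FIRST_CODE_POINT = 2304
BRAHMIC_LAST_CODE_POINT = 3455


def script_of_char(ch):
    """Hypothetical helper: name of the block containing ch, or None."""
    cp = ord(ch)
    if not BRAHMIC_FIRST_CODE_POINT <= cp <= BRAHMIC_LAST_CODE_POINT:
        return None
    # BLOCKS is in descending order, so the first start <= cp is the match.
    for name, start in BLOCKS:
        if cp >= start:
            return name
    return None


def script_of_text(text):
    """Hypothetical helper: script of the first Brahmic character, or None."""
    for ch in text:
        name = script_of_char(ch)
        if name is not None:
            return name
    return None
```

Keeping the table in descending order means a single linear scan finds the right block without storing each block's end point. A string with no character in the Brahmic range yields None, which a caller could treat as a cue to try the romanization checks.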
class indic_transliteration.detect.Regex
    IAST_OR_KOLKATA_ONLY = <_sre.SRE_Pattern object>
        Match on special Roman characters
    ITRANS_ONLY = <_sre.SRE_Pattern object>
        Match on ITRANS-only
    ITRANS_OR_VELTHUIS_ONLY = <_sre.SRE_Pattern object>
        Match on chars shared by ITRANS and Velthuis
    KOLKATA_ONLY = <_sre.SRE_Pattern object>
        Match on Kolkata-specific Roman characters
    SLP1_ONLY = <_sre.SRE_Pattern object>
        Match on SLP1-only characters and bigrams
    VELTHUIS_ONLY = <_sre.SRE_Pattern object>
        Match on Velthuis-only characters
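The pattern names suggest a narrowing strategy: test for characters that only one scheme uses before falling back to characters shared by several. The sketch below illustrates that idea for the two diacritic-based schemes only. The character classes are simplified assumptions, since the library's actual expressions are not shown on this page, and `guess_diacritic_scheme` is a hypothetical helper.

```python
import re
import unicodedata

# Hypothetical, simplified character classes; the library's real patterns
# may differ.
KOLKATA_ONLY = re.compile("[\u0113\u014d]")  # e and o with macron
IAST_OR_KOLKATA_ONLY = re.compile(
    "[\u0101\u012b\u016b"   # a, i, u with macron
    "\u1e5b\u1e5d"          # vocalic r, short and long
    "\u1e45\u00f1\u1e47"    # velar, palatal, retroflex nasals
    "\u1e6d\u1e0d"          # retroflex t and d
    "\u015b\u1e63"          # palatal and retroflex sibilants
    "\u1e25\u1e43]"         # visarga and anusvara
)


def guess_diacritic_scheme(text):
    """Hypothetical helper: 'Kolkata', 'IAST', or None if no diacritics match."""
    # Normalize so decomposed input (letter + combining mark) still matches.
    text = unicodedata.normalize("NFC", text)
    # Check the narrower pattern first: Kolkata-only characters settle it.
    if KOLKATA_ONLY.search(text):
        return "Kolkata"
    if IAST_OR_KOLKATA_ONLY.search(text):
        return "IAST"
    return None
```

Order matters here: a string containing a Kolkata-only character would also match the shared class, so the more specific test has to run first.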
indic_transliteration.detect.Scheme
    Enum for Sanskrit schemes.
    alias of indic_transliteration.detect.Enum