indic_transliteration.detect
Example usage:
from indic_transliteration import detect
detect.detect('pitRRIn') == detect.Scheme.ITRANS
detect.detect('pitRRn') == detect.Scheme.HK
When handling a Sanskrit string, it’s almost always best to explicitly
state its transliteration scheme. This avoids embarrassing errors with
words like pitRRIn. But most of the time, it’s possible to infer the
scheme from the text itself.
detect.py automatically detects a string’s transliteration scheme:
detect('pitRRIn') == Scheme.ITRANS
detect('pitRRn') == Scheme.HK
detect('pitFn') == Scheme.SLP1
detect('पितॄन्') == Scheme.Devanagari
detect('পিতৄন্') == Scheme.Bengali
Supported schemes
All schemes are attributes on the Scheme class. You can also just
use the scheme name:
Scheme.IAST == 'IAST'
Scheme.Devanagari == 'Devanagari'
Scripts:
- Bengali ('Bengali')
- Devanagari ('Devanagari')
- Gujarati ('Gujarati')
- Gurmukhi ('Gurmukhi')
- Kannada ('Kannada')
- Malayalam ('Malayalam')
- Oriya ('Oriya')
- Tamil ('Tamil')
- Telugu ('Telugu')
Romanizations:
- Harvard-Kyoto ('HK')
- IAST ('IAST')
- ITRANS ('ITRANS')
- Kolkata ('Kolkata')
- SLP1 ('SLP1')
- Velthuis ('Velthuis')
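The equality between Scheme attributes and plain scheme names can be pictured with a small stand-in class. This is only a sketch: SchemeSketch and is_roman are hypothetical names, and the code assumes each attribute is simply a string holding its own name, which is consistent with the comparisons above but is not confirmed by this page.

```python
# Hypothetical stand-in for Scheme; assumes each attribute is a plain string.
SCRIPTS = ['Bengali', 'Devanagari', 'Gujarati', 'Gurmukhi', 'Kannada',
           'Malayalam', 'Oriya', 'Tamil', 'Telugu']
ROMANIZATIONS = ['HK', 'IAST', 'ITRANS', 'Kolkata', 'SLP1', 'Velthuis']

# Build a class whose attribute names and values are both the scheme name.
SchemeSketch = type(
    'SchemeSketch', (), {name: name for name in SCRIPTS + ROMANIZATIONS}
)


def is_roman(scheme):
    """Return True if the scheme name is one of the romanizations listed above."""
    return scheme in ROMANIZATIONS
```

Under that assumption, SchemeSketch.IAST and the literal 'IAST' are interchangeable wherever a scheme is expected.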
indic_transliteration.detect.BLOCKS = [('malayalam', 3328), ('kannada', 3200), ('telugu', 3072), ('tamil', 2944), ('oriya', 2816), ('gujarati', 2688), ('gurmukhi', 2560), ('bengali', 2432), ('devanagari', 2304)]
    Schemes sorted by descending Unicode code point; each number is the first code point of that script's block. Ignore schemes with none defined.
indic_transliteration.detect.BRAHMIC_FIRST_CODE_POINT = 2304
    Start of the Devanagari block.
indic_transliteration.detect.BRAHMIC_LAST_CODE_POINT = 3455
    End of the Malayalam block.
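One way these three constants could work together is sketched below: find the first character inside the Brahmic range, then walk the descending block list until a block start at or below that code point is found. The function name guess_script is hypothetical, and the lookup strategy is an assumption; the constant values are copied from the entries above.

```python
# Values copied from the documented constants above.
BLOCKS = [('malayalam', 3328), ('kannada', 3200), ('telugu', 3072),
          ('tamil', 2944), ('oriya', 2816), ('gujarati', 2688),
          ('gurmukhi', 2560), ('bengali', 2432), ('devanagari', 2304)]
BRAHMIC_FIRST_CODE_POINT = 2304
BRAHMIC_LAST_CODE_POINT = 3455


def guess_script(text):
    """Hypothetical helper: name the block of the first Brahmic character, or None."""
    for ch in text:
        code_point = ord(ch)
        if BRAHMIC_FIRST_CODE_POINT <= code_point <= BRAHMIC_LAST_CODE_POINT:
            # BLOCKS is sorted from highest start to lowest, so the first
            # start that does not exceed the code point is the right block.
            for name, start in BLOCKS:
                if code_point >= start:
                    return name
    return None
```

Because the list is in descending order, the inner loop needs no upper bounds: each block implicitly ends where the previous entry begins.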
class indic_transliteration.detect.Regex
    IAST_OR_KOLKATA_ONLY = <_sre.SRE_Pattern object>
        Match on special Roman characters
    ITRANS_ONLY = <_sre.SRE_Pattern object>
        Match on ITRANS-only
    ITRANS_OR_VELTHUIS_ONLY = <_sre.SRE_Pattern object>
        Match on chars shared by ITRANS and Velthuis
    KOLKATA_ONLY = <_sre.SRE_Pattern object>
        Match on Kolkata-specific Roman characters
    SLP1_ONLY = <_sre.SRE_Pattern object>
        Match on SLP1-only characters and bigrams
    VELTHUIS_ONLY = <_sre.SRE_Pattern object>
        Match on Velthuis-only characters
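The page does not show the patterns themselves, so the following is only a rough sketch of how scheme-specific patterns like these could drive detection of a romanized string. The patterns are deliberately tiny and hypothetical, the order of the checks is an assumption, and the HK fallback is inferred from the 'pitRRn' example rather than stated by the library. Velthuis handling is omitted for brevity.

```python
import re


class RegexSketch:
    """Hypothetical, much-reduced patterns; not the library's real ones."""
    # Diacritic letters that only IAST or Kolkata would contain.
    IAST_OR_KOLKATA_ONLY = re.compile('[āīūṛṝṃḥṅñṭḍṇśṣēō]')
    # Long e and o with macron, assumed here to be Kolkata-specific.
    KOLKATA_ONLY = re.compile('[ēō]')
    # A few sequences assumed to occur only in ITRANS.
    ITRANS_ONLY = re.compile(r'RR[iI]|\^[iI]|~N|Ch')
    # A few letters assumed to occur only in SLP1.
    SLP1_ONLY = re.compile('[fFxXwWqQ]')


def guess_roman_scheme(text):
    """Hypothetical helper: return a scheme name for a romanized string."""
    if RegexSketch.IAST_OR_KOLKATA_ONLY.search(text):
        if RegexSketch.KOLKATA_ONLY.search(text):
            return 'Kolkata'
        return 'IAST'
    if RegexSketch.ITRANS_ONLY.search(text):
        return 'ITRANS'
    if RegexSketch.SLP1_ONLY.search(text):
        return 'SLP1'
    # Nothing scheme-specific was found; fall back to HK (an assumption).
    return 'HK'
```

With these reduced patterns, the three romanized examples from the top of the page resolve to ITRANS, HK, and SLP1 respectively, which is why the most specific patterns are tested before the fallback.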
indic_transliteration.detect.Scheme
    Enum for Sanskrit schemes.
    alias of indic_transliteration.detect.Enum