Commit 470746b

move to uv_build and modernize cli
1 parent d5e8d7c commit 470746b

10 files changed, with 409 additions and 202 deletions.

.gitattributes

Lines changed: 0 additions & 1 deletion
This file was deleted.

.github/workflows/docs.yml

Lines changed: 3 additions & 3 deletions
@@ -27,18 +27,18 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        uses: astral-sh/setup-uv@v3
+        uses: astral-sh/setup-uv@v7.1.4
         with:
           version: "latest"

       - name: Install dependencies
         run: |
-          uv pip install --system -e ".[docs]"
+          uv sync --group docs

       - name: Build documentation
         run: |
           cd docs
-          make html
+          uv run make html

       - name: Setup Pages
         if: github.ref == 'refs/heads/master' || github.ref == 'refs/heads/main'

.github/workflows/python-publish.yml

Lines changed: 8 additions & 10 deletions
@@ -30,23 +30,21 @@ jobs:
       - name: Checkout repository
         uses: actions/checkout@v4

-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v7.1.4
         with:
-          python-version: '3.11'
+          version: "latest"

-      - name: Install build dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build
+      - name: Set up Python
+        run: uv python install 3.11

       - name: Build package
-        run: python -m build
+        run: uv build

       - name: Verify package
         run: |
-          python -m pip install twine
-          python -m twine check dist/*
+          uv tool install twine
+          uvx twine check dist/*

       - name: Publish package to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1

.github/workflows/test.yml

Lines changed: 14 additions & 15 deletions
@@ -18,33 +18,32 @@ jobs:
         python-version: ['3.10', '3.11', '3.12']
     steps:
       - uses: actions/checkout@v4
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v4
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v7.1.4
         with:
-          python-version: ${{ matrix.python-version }}
+          version: "latest"
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}

       - name: Install dependencies
         run: |
-          python -m pip install --upgrade pip
-          pip install -e .[dev,test]
+          uv sync --group dev --group test

       - name: Run tests
-        run: python -m pytest -v
+        run: uv run python -m pytest -v
   yamllint:
     name: Yaml Linting
     runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        python-version: ['3.10']
     steps:
       - uses: actions/checkout@v4
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v4
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v7.1.4
         with:
-          python-version: ${{ matrix.python-version }}
+          version: "latest"
+      - name: Set up Python
+        run: uv python install 3.10
       - name: Install Dependencies
         run: |
-          python -m pip install --upgrade pip setuptools wheel
-          pip install yamllint
+          uv tool install yamllint
       - name: Run yamllint
-        run: yamllint */
+        run: uvx yamllint */

README.md

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
# Indicate: Transliterate Indic Languages to English

[![Notary Badge](https://notarypy.soodoku.workers.dev/badge/indicate/0.2.1/indicate-0.2.1-py3-none-any.whl)](https://pypi.org/integrity/indicate/0.2.1/indicate-0.2.1-py3-none-any.whl/provenance)
[![PyPI Version](https://img.shields.io/pypi/v/indicate.svg)](https://pypi.python.org/pypi/indicate)
[![Downloads](https://static.pepy.tech/badge/indicate)](https://pepy.tech/project/indicate)
[![Tests](https://github.com/in-rolls/indicate/workflows/test/badge.svg)](https://github.com/in-rolls/indicate/actions?query=workflow%3Atest)
[![Documentation](https://img.shields.io/badge/docs-github.io-blue)](https://in-rolls.github.io/indicate/)

Transliterations to and from Indian languages are still generally low quality. One problem is access to data. Another is that there is no standard transliteration.

For Hindi--English, we build a novel dataset of names using ESPNcricinfo. For instance, see [here](https://www.espncricinfo.com/hindi/series/pakistan-tour-of-england-2021-1239529/england-vs-pakistan-1st-odi-1239537/full-scorecard) for the Hindi version of the [English scorecard](https://www.espncricinfo.com/series/pakistan-tour-of-england-2021-1239529/england-vs-pakistan-1st-odi-1239537/full-scorecard).

We also create a dataset from [election affidavits](https://affidavit.eci.gov.in/CandidateCustomFilter) and draw on the [Google Dakshina dataset](https://github.com/google-research-datasets/dakshina).

Because there is no single standard way to transliterate, we provide k-best transliterations.

## Install

We strongly recommend installing `indicate` inside a Python virtual environment (see the [venv documentation](https://docs.python.org/3/library/venv.html#creating-virtual-environments)).

**Requirements:** Python 3.10 or higher

```bash
pip install indicate
```

## Usage

### Python API

```python
from indicate import transliterate
english_transliterated = transliterate.hindi2english("हिंदी")
print(english_transliterated)
# Output: hindi
```
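
When many names need transliterating, a thin wrapper around the function above can skip blanks and avoid repeating work for duplicates. The following is a minimal sketch, not part of the package: `transliterate_many` and its `transliterate_fn` parameter are hypothetical names, and the only `indicate` call assumed is `transliterate.hindi2english`, as documented above.

```python
from typing import Callable, Iterable


def transliterate_many(
    names: Iterable[str],
    transliterate_fn: Callable[[str], str],
) -> dict[str, str]:
    """Map each distinct, non-empty input string to its transliteration.

    `transliterate_fn` is injected so the helper does not depend on any
    particular model; pass `transliterate.hindi2english` to use indicate.
    """
    results: dict[str, str] = {}
    for name in names:
        cleaned = name.strip()
        if not cleaned or cleaned in results:
            continue  # skip blank strings and strings already transliterated
        results[cleaned] = transliterate_fn(cleaned)
    return results


# Hypothetical usage, assuming `indicate` is installed:
#   from indicate import transliterate
#   transliterate_many(["हिंदी", "हिंदी", " "], transliterate.hindi2english)
```

Passing the function in as an argument keeps the helper easy to test with a stand-in and leaves the choice of transliterator to the caller.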

### Command Line Interface

The package provides both a modern and a legacy command-line interface:

#### Modern CLI (Recommended)

```bash
# Basic usage
indicate hindi2english "राजशेखर चिंतालपति"

# From file
indicate hindi2english --input hindi.txt --output english.txt

# From stdin
echo "गौरव सूद" | indicate hindi2english

# Batch processing for large files
indicate hindi2english --input large_file.txt --batch --quiet

# Get help
indicate hindi2english --help

# Package information
indicate info
```
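
The three input modes shown above (a positional argument, `--input`/`--output` files, and stdin) could be wired together roughly as follows. This is a hypothetical sketch using `argparse`, not the package's actual CLI code: `build_parser`, `run`, and the injected `transliterate_fn` are names invented for illustration, and the `--batch` and `--quiet` flags are left out.

```python
import argparse
import sys
from typing import Callable, Sequence, TextIO


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single `hindi2english` subcommand (illustrative only)."""
    parser = argparse.ArgumentParser(prog="indicate")
    subcommands = parser.add_subparsers(dest="command", required=True)
    h2e = subcommands.add_parser("hindi2english")
    h2e.add_argument("text", nargs="?", help="text to transliterate")
    h2e.add_argument("--input", help="read lines from this file")
    h2e.add_argument("--output", help="write results to this file")
    return parser


def run(
    argv: Sequence[str],
    transliterate_fn: Callable[[str], str],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
) -> int:
    """Resolve the input source, transliterate line by line, and emit results."""
    args = build_parser().parse_args(argv)
    if args.text is not None:
        lines = [args.text]  # positional argument wins
    elif args.input:
        with open(args.input, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    else:
        lines = stdin.read().splitlines()  # fall back to piped input

    # Blank lines are dropped; each result is written on its own line.
    text = "".join(transliterate_fn(line) + "\n" for line in lines if line.strip())
    if args.output:
        with open(args.output, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        stdout.write(text)
    return 0
```

Checking the positional argument first, then `--input`, then stdin gives one predictable precedence order, which is a design assumption of this sketch rather than documented behavior.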

#### Legacy CLI (Backward Compatibility)

```bash
# Still supported for backward compatibility
hindi2english --type hin2eng --input "हिंदी"
```

## Functions

We expose one function, which takes Hindi text and transliterates it to English.

- **transliterate.hindi2english(input)**
  - What it does: Converts the given Hindi text into the English alphabet
  - Output: Returns the text in English

## Testing Locally

To test the package locally, follow these steps:

1. **Clone the repository**:
   ```bash
   git clone https://github.com/in-rolls/indicate.git
   cd indicate
   ```

2. **Install with uv (recommended)**:
   ```bash
   uv sync
   ```

   Or with pip:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -e .
   ```

3. **Run tests**:
   ```bash
   # Run all tests
   python -m unittest discover tests/

   # Run specific test
   python -m unittest tests.test_010_hindi_translate
   ```

4. **Test the transliteration**:
   ```bash
   # Modern CLI
   indicate hindi2english "हिंदी"

   # Legacy CLI
   hindi2english --type hin2eng --input "हिंदी"

   # Python usage
   python -c "from indicate import transliterate; print(transliterate.hindi2english('हिंदी'))"
   ```

## Data

The datasets used to train the model:

- [Indian Election affidavits](https://affidavit.eci.gov.in/CandidateCustomFilter)
- [Google Dakshina dataset](https://github.com/google-research-datasets/dakshina)
- [ESPNcricinfo](https://www.espncricinfo.com/hindi/series/pakistan-tour-of-england-2021-1239529/england-vs-pakistan-1st-odi-1239537/full-scorecard), the Hindi version of the [English scorecard](https://www.espncricinfo.com/series/pakistan-tour-of-england-2021-1239529/england-vs-pakistan-1st-odi-1239537/full-scorecard)
- [IIT Bombay English-Hindi Corpus](https://www.cfilt.iitb.ac.in/iitb_parallel/)

## Evaluation

The model was evaluated on the test set of the Google Dakshina dataset, where it predicted 73.64% exact matches.
[Indic-trans](https://github.com/libindic/indic-trans) predicted 63.12% exact matches on the Google Dakshina dataset.

Below are the edit distance metrics on the test dataset (0.0 means an exact match; the farther a value is from 0.0, the greater the difference between the predicted and actual text):

![Edit distance metrics of model on Google Dakshina test dataset](https://github.com/in-rolls/indicate/raw/master/images/h2e_ed.png)
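
The two metrics reported above can be computed along the following lines. This is a minimal sketch, not the project's evaluation script: the function names are hypothetical, and dividing the Levenshtein distance by the longer string's length is an assumption, since the README does not say whether the plotted distances are normalized.

```python
from typing import Iterable, Tuple


def edit_distance(predicted: str, actual: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(actual) + 1))
    for i, p_char in enumerate(predicted, start=1):
        current = [i]
        for j, a_char in enumerate(actual, start=1):
            substitution = previous[j - 1] + (p_char != a_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted: str, actual: str) -> float:
    """Edit distance scaled by the longer string (assumed normalization)."""
    longest = max(len(predicted), len(actual))
    return edit_distance(predicted, actual) / longest if longest else 0.0


def exact_match_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (predicted, actual) pairs that are identical."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    matches = sum(1 for predicted, actual in pairs if predicted == actual)
    return 100.0 * matches / len(pairs)
```

Under either normalization an exact match scores 0.0, which agrees with the reading of the plot given above.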

## Authors

Rajashekar Chintalapati and Gaurav Sood

## Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the [Contributor Code of Conduct](http://contributor-covenant.org/version/1/0/0/).

## License

The package is released under the [MIT License](https://opensource.org/licenses/MIT).

README.rst

Lines changed: 0 additions & 141 deletions
This file was deleted.
