Commit 724d7c4

docs: update readme for installation method (#7)
Signed-off-by: Mingzhuo Yin <yinmingzhuo@gmail.com>
1 parent 0976471 commit 724d7c4

2 files changed

+155
-5
lines changed

README.md

Lines changed: 34 additions & 2 deletions
@@ -1,10 +1,42 @@
1-
# pg_tokenizer (WIP)
1+
# pg_tokenizer
22

33
A PostgreSQL extension that provides tokenizers for full-text search.
44

55
## Quick Start
6+
The official `ghcr.io/tensorchord/vchord_bm25-postgres` Docker image comes pre-configured with several complementary extensions:
7+
- `pg_tokenizer` - This extension
8+
- [`VectorChord-bm25`](https://github.com/tensorchord/VectorChord-bm25) - Native BM25 Ranking Index
9+
- [`VectorChord`](https://github.com/tensorchord/VectorChord) - Scalable, high-performance, and disk-efficient vector similarity search
10+
- [`pgvector`](https://github.com/pgvector/pgvector) - Popular vector similarity search
11+
12+
Simply run the Docker container as shown below:
13+
14+
```bash
docker run \
  --name vectorchord-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d ghcr.io/tensorchord/vchord_bm25-postgres:pg17-v0.2.0
```
21+
22+
Once everything’s set up, you can connect to the database using the `psql` command line tool. The default username is `postgres`, and the default password is `mysecretpassword`. Here’s how to connect:
23+
24+
```sh
psql -h localhost -p 5432 -U postgres
```
27+
28+
After connecting, run the following SQL to make sure the extension is enabled:
629

7-
TODO
30+
```sql
CREATE EXTENSION pg_tokenizer;
```
33+
34+
Then, don’t forget to add `tokenizer_catalog` to your `search_path`:
35+
36+
```sql
ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog;
SELECT pg_reload_conf();
```
840

941
## Example
1042

docs/01-installation.md

Lines changed: 121 additions & 3 deletions
@@ -2,15 +2,133 @@
22

33
## Docker
44

5-
TODO
5+
The official `ghcr.io/tensorchord/vchord_bm25-postgres` Docker image comes pre-configured with several complementary extensions:
6+
- `pg_tokenizer` - This extension
7+
- [`VectorChord-bm25`](https://github.com/tensorchord/VectorChord-bm25) - Native BM25 Ranking Index
8+
- [`VectorChord`](https://github.com/tensorchord/VectorChord) - Scalable, high-performance, and disk-efficient vector similarity search
9+
- [`pgvector`](https://github.com/pgvector/pgvector) - Popular vector similarity search
10+
11+
Simply run the Docker container as shown below:
12+
13+
```bash
docker run \
  --name vectorchord-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d ghcr.io/tensorchord/vchord_bm25-postgres:pg17-v0.2.0
```
20+
21+
Once everything’s set up, you can connect to the database using the `psql` command line tool. The default username is `postgres`, and the default password is `mysecretpassword`. Here’s how to connect:
22+
23+
```sh
psql -h localhost -p 5432 -U postgres
```
26+
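Right after `docker run`, the server may still be starting, so the `psql` connection above can be refused for a few seconds. The following is a minimal sketch of a retry helper, not part of this commit. `wait_for` is a hypothetical name, and the commented usage line assumes the PostgreSQL client tool `pg_isready` is installed on the host.

```sh
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retry COMMAND until it succeeds or ATTEMPTS runs out.
# (Hypothetical helper for illustration.)
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the command quietly; stop as soon as it succeeds.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Assumed usage: poll the server for up to about 30 seconds, then connect.
# wait_for 30 1 pg_isready -h localhost -p 5432 && psql -h localhost -p 5432 -U postgres
```

The helper only checks the exit status of the command it is given, so it can wrap any readiness probe.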
27+
After connecting, run the following SQL to make sure the extension is enabled:
28+
29+
```sql
CREATE EXTENSION pg_tokenizer;
```
32+
33+
Then, don’t forget to add `tokenizer_catalog` to your `search_path`:
34+
35+
```sql
ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog;
SELECT pg_reload_conf();
```
639

740
## From Debian package
841

9-
TODO
42+
> Installation from the Debian package requires `GLIBC >= 2.35`, e.g.:
43+
> - `Ubuntu 22.04` or later
44+
> - `Debian Bookworm` or later
45+
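To check the `GLIBC >= 2.35` requirement on a given machine before downloading anything, a small script along these lines may help. It is a sketch, not part of this commit: `version_ge` and `glibc_version` are hypothetical helper names, and it assumes a `sort` that supports `-V` (GNU coreutils) and a glibc-based system where `getconf GNU_LIBC_VERSION` is available.

```sh
# version_ge A B: succeed when dotted version A is >= version B.
# Relies on `sort -V` (version sort), assumed to be GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the glibc version number, or nothing on non-glibc systems.
# `getconf GNU_LIBC_VERSION` prints a line such as "glibc 2.35".
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

current=$(glibc_version)
if [ -n "$current" ] && version_ge "$current" 2.35; then
    echo "glibc $current: OK"
else
    echo "glibc ${current:-not detected}: older than 2.35 or not a glibc system"
fi
```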
46+
Debian packages (`.deb`) are used by Debian-based distributions such as Ubuntu. They can be installed with `dpkg` or `apt-get`.
47+
48+
1. Download the deb package from [the release page](https://github.com/tensorchord/pg_tokenizer.rs/releases/latest), then run `sudo apt install ./postgresql-17-pg-tokenizer_*.deb` in the download directory to install it.
49+
50+
2. Configure PostgreSQL by adding the extension to `shared_preload_libraries` and `tokenizer_catalog` to `search_path`.
51+
52+
```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "pg_tokenizer.so"'
psql -U postgres -c 'ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog'
# Restart the PostgreSQL cluster for the changes to take effect.
sudo systemctl restart postgresql.service # for PostgreSQL running under systemd
```
58+
59+
3. Connect to the database and enable the extension.
60+
61+
```sql
DROP EXTENSION IF EXISTS pg_tokenizer;
CREATE EXTENSION pg_tokenizer CASCADE;
```
1065

1166
## From ZIP package
1267

13-
TODO
68+
> Installation from the ZIP package requires `GLIBC >= 2.35`, e.g.:
69+
> - `RHEL 9` or later
70+
71+
For systems that are not Debian-based and cannot run a Docker container, follow these steps to install:
72+
73+
1. Before installing, make sure the required packages are present: `PostgreSQL`, `pg_config`, `unzip`, and `wget`.
74+
75+
```sh
# Example for RHEL 9 (dnf)
# Adjust the commands for your package manager
sudo dnf install -y unzip wget libpq-devel
sudo dnf module install -y postgresql:15/server
sudo postgresql-setup --initdb
sudo systemctl start postgresql.service
sudo systemctl enable postgresql.service
```
84+
85+
2. Verify that `pg_config` reports values for `$pkglibdir` and `$sharedir`.
86+
87+
```sh
pg_config --pkglibdir
# Prints something similar to:
# /usr/lib/postgresql/15/lib or
# /usr/lib64/pgsql

pg_config --sharedir
# Prints something similar to:
# /usr/share/postgresql/15 or
# /usr/share/pgsql
```
98+
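Before copying files in the later steps, it can be worth confirming that the directories reported by `pg_config` actually exist. The sketch below is not part of this commit, and `check_dir` is a hypothetical helper name. The only `pg_config` options it uses are `--pkglibdir` and `--sharedir`, as in the step above.

```sh
# check_dir LABEL PATH: report whether PATH is an existing directory.
# (Hypothetical helper for illustration.)
check_dir() {
    if [ -n "$2" ] && [ -d "$2" ]; then
        echo "$1: $2"
    else
        echo "$1: missing or not a directory: '$2'" >&2
        return 1
    fi
}

if command -v pg_config >/dev/null 2>&1; then
    # Library directory, and the extension directory under sharedir.
    check_dir pkglibdir "$(pg_config --pkglibdir)"
    check_dir extension "$(pg_config --sharedir)/extension"
else
    echo "pg_config not found on PATH" >&2
fi
```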
99+
3. Download the zip package from [the release page](https://github.com/tensorchord/pg_tokenizer.rs/releases/latest) and extract it to a temporary directory.
100+
101+
```sh
# Replace `*` with the exact version string from the release page:
# wget does not expand wildcards in HTTP(S) URLs.
wget https://github.com/tensorchord/pg_tokenizer.rs/releases/download/0.1.0/postgresql-17-pg-tokenizer_*_x86_64-linux-gnu.zip -O pg_tokenizer.zip
unzip pg_tokenizer.zip -d pg_tokenizer
```
105+
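The download URL in this step has three varying parts: the release tag, the PostgreSQL major version, and the package version string where the `*` appears. A sketch of assembling it from variables follows. It is not part of this commit: `release_zip_url` is a hypothetical helper, and `PKG_VERSION` is a placeholder whose real value must be copied from the asset name on the release page. The PostgreSQL major version in the file name presumably has to match the installed server. The example in step 1 installs 15, while the URL names 17.

```sh
# release_zip_url TAG PG_MAJOR PKG_VERSION
# Assemble the asset URL following the file-name pattern used in this step.
# (Hypothetical helper; it does not verify that the asset exists.)
release_zip_url() {
    printf 'https://github.com/tensorchord/pg_tokenizer.rs/releases/download/%s/postgresql-%s-pg-tokenizer_%s_x86_64-linux-gnu.zip\n' "$1" "$2" "$3"
}

# Assumed usage, with PKG_VERSION set to the value shown on the release page:
# wget "$(release_zip_url 0.1.0 17 "$PKG_VERSION")" -O pg_tokenizer.zip
```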
106+
4. Copy the extension files to the PostgreSQL directory.
107+
108+
```sh
# Copy library to `$pkglibdir`
sudo cp pg_tokenizer/pg_tokenizer.so "$(pg_config --pkglibdir)/"
# Copy schema and control files to `$sharedir`
sudo cp pg_tokenizer/pg_tokenizer--*.sql "$(pg_config --sharedir)/extension/"
sudo cp pg_tokenizer/pg_tokenizer.control "$(pg_config --sharedir)/extension/"
```
115+
116+
5. Configure PostgreSQL by adding the extension to `shared_preload_libraries` and `tokenizer_catalog` to `search_path`.
117+
118+
```sh
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "pg_tokenizer.so"'
psql -U postgres -c 'ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog'
# Restart the PostgreSQL cluster for the changes to take effect.
sudo systemctl restart postgresql.service # for PostgreSQL running under systemd
```
124+
125+
6. Connect to the database and enable the extension.
126+
127+
```sql
DROP EXTENSION IF EXISTS pg_tokenizer;
CREATE EXTENSION pg_tokenizer CASCADE;
```
131+
14132

15133
## From Source
16134
