|
2 | 2 |
|
3 | 3 | ## Docker |
4 | 4 |
|
5 | | -TODO |
| 5 | +The official `ghcr.io/tensorchord/vchord_bm25-postgres` Docker image comes pre-configured with several complementary extensions: |
| 6 | +- `pg_tokenizer` - This extension |
| 7 | +- [`VectorChord-bm25`](https://github.com/tensorchord/VectorChord-bm25) - Native BM25 Ranking Index |
| 8 | +- [`VectorChord`](https://github.com/tensorchord/VectorChord) - Scalable, high-performance, and disk-efficient vector similarity search |
| 9 | +- [`pgvector`](https://github.com/pgvector/pgvector) - Popular vector similarity search |
| 10 | + |
| 11 | +Simply run the Docker container as shown below: |
| 12 | + |
| 13 | +```bash |
| 14 | +docker run \ |
| 15 | + --name vectorchord-demo \ |
| 16 | + -e POSTGRES_PASSWORD=mysecretpassword \ |
| 17 | + -p 5432:5432 \ |
| 18 | + -d ghcr.io/tensorchord/vchord_bm25-postgres:pg17-v0.2.0 |
| 19 | +``` |
| 20 | + |
| 21 | +Once everything’s set up, you can connect to the database using the `psql` command line tool. The default username is `postgres`, and the default password is `mysecretpassword`. Here’s how to connect: |
| 22 | + |
| 23 | +```sh |
| 24 | +psql -h localhost -p 5432 -U postgres |
| 25 | +``` |
| 26 | + |
| 27 | +After connecting, run the following SQL to make sure the extension is enabled: |
| 28 | + |
| 29 | +```sql |
| 30 | +CREATE EXTENSION pg_tokenizer; |
| 31 | +``` |
| 32 | + |
| 33 | +Then, don’t forget to add `tokenizer_catalog` to your `search_path`: |
| 34 | + |
| 35 | +```sql |
| 36 | +ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog; |
| 37 | +SELECT pg_reload_conf(); |
| 38 | +``` |
6 | 39 |
|
7 | 40 | ## From Debian package |
8 | 41 |
|
9 | | -TODO |
| 42 | +> Installation from the Debian package requires a dependency on `GLIBC >= 2.35`, e.g: |
| 43 | +> - `Ubuntu 22.04` or later |
| 44 | +> - `Debian Bullseye` or later |
| 45 | +
|
| 46 | +Debian packages(.deb) are used in distributions based on Debian, such as Ubuntu and many others. They can be easily installed by `dpkg` or `apt-get`. |
| 47 | + |
| 48 | +1. Download the deb package in [the release page](https://github.com/tensorchord/pg_tokenizer.rs/releases/latest), and type `sudo apt install postgresql-17-pg-tokenizer_*.deb` to install the deb package. |
| 49 | + |
| 50 | +2. Configure your PostgreSQL by modifying the `shared_preload_libraries` and `search_path` to include the extension. |
| 51 | + |
| 52 | +```sh |
| 53 | +psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "pg_tokenizer.so"' |
| 54 | +psql -U postgres -c 'ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog' |
| 55 | +# You need restart the PostgreSQL cluster to take effects. |
| 56 | +sudo systemctl restart postgresql.service # for pg_tokenizer running with systemd |
| 57 | +``` |
| 58 | + |
| 59 | +3. Connect to the database and enable the extension. |
| 60 | + |
| 61 | +```sql |
| 62 | +DROP EXTENSION IF EXISTS pg_tokenizer; |
| 63 | +CREATE EXTENSION pg_tokenizer CASCADE; |
| 64 | +``` |
10 | 65 |
|
11 | 66 | ## From ZIP package |
12 | 67 |
|
13 | | -TODO |
| 68 | +> Installation from the ZIP package requires a dependency on `GLIBC >= 2.35`, e.g: |
| 69 | +> - `RHEL 9` or later |
| 70 | +
|
| 71 | +For systems that are not Debian based and cannot run a Docker container, please follow these steps to install: |
| 72 | + |
| 73 | +1. Before install, make sure that you have the necessary packages installed, including `PostgreSQL`, `pg_config`, `unzip`, `wget`. |
| 74 | + |
| 75 | +```sh |
| 76 | +# Example for RHEL 9 dnf |
| 77 | +# Please check your package manager |
| 78 | +sudo dnf install -y unzip wget libpq-devel |
| 79 | +sudo dnf module install -y postgresql:15/server |
| 80 | +sudo postgresql-setup --initdb |
| 81 | +sudo systemctl start postgresql.service |
| 82 | +sudo systemctl enable postgresql.service |
| 83 | +``` |
| 84 | + |
| 85 | +2. Verify whether `$pkglibdir` and `$shardir` have been set by PostgreSQL. |
| 86 | + |
| 87 | +```sh |
| 88 | +pg_config --pkglibdir |
| 89 | +# Print something similar to: |
| 90 | +# /usr/lib/postgresql/15/lib or |
| 91 | +# /usr/lib64/pgsql |
| 92 | + |
| 93 | +pg_config --sharedir |
| 94 | +# Print something similar to: |
| 95 | +# /usr/share/postgresql/15 or |
| 96 | +# /usr/share/pgsql |
| 97 | +``` |
| 98 | + |
| 99 | +3. Download the zip package in [the release page](https://github.com/tensorchord/pg_tokenizer.rs/releases/latest) and extract it to a temporary directory. |
| 100 | + |
| 101 | +```sh |
| 102 | +wget https://github.com/tensorchord/pg_tokenizer.rs/releases/download/0.1.0/postgresql-17-pg-tokenizer_*_x86_64-linux-gnu.zip -O pg_tokenizer.zip |
| 103 | +unzip pg_tokenizer.zip -d pg_tokenizer |
| 104 | +``` |
| 105 | + |
| 106 | +4. Copy the extension files to the PostgreSQL directory. |
| 107 | + |
| 108 | +```sh |
| 109 | +# Copy library to `$pkglibdir` |
| 110 | +sudo cp pg_tokenizer/pg_tokenizer.so $(pg_config --pkglibdir)/ |
| 111 | +# Copy schema to `$shardir` |
| 112 | +sudo cp pg_tokenizer/pg_tokenizer--*.sql $(pg_config --sharedir)/extension/ |
| 113 | +sudo cp pg_tokenizer/pg_tokenizer.control $(pg_config --sharedir)/extension/ |
| 114 | +``` |
| 115 | + |
| 116 | +5. Configure your PostgreSQL by modifying the `shared_preload_libraries` and `search_path` to include the extension. |
| 117 | + |
| 118 | +```sh |
| 119 | +psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "pg_tokenizer.so"' |
| 120 | +psql -U postgres -c 'ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog' |
| 121 | +# You need restart the PostgreSQL cluster to take effects. |
| 122 | +sudo systemctl restart postgresql.service # for pg_tokenizer running with systemd |
| 123 | +``` |
| 124 | + |
| 125 | +6. Connect to the database and enable the extension. |
| 126 | + |
| 127 | +```sql |
| 128 | +DROP EXTENSION IF EXISTS pg_tokenizer; |
| 129 | +CREATE EXTENSION pg_tokenizer CASCADE; |
| 130 | +``` |
| 131 | + |
14 | 132 |
|
15 | 133 | ## From Source |
16 | 134 |
|
|
0 commit comments