33 changes: 33 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,33 @@
name: Deploy Documentation

on:
  push:
    branches:
      - main
    paths:
      - 'backend/docs/**'
      - 'backend/mkdocs.yml'
      - '.github/workflows/docs.yml'

jobs:
  build-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Install Dependencies
        run: |
          cd backend
          pip install mkdocs mkdocs-material mkdocstrings[python] pymdown-extensions

      - name: Build and Deploy
        run: |
          cd backend
          mkdocs gh-deploy --force
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -48,4 +48,4 @@ jobs:

- name: Run unit tests with pytest
run: |
cd backend && pytest --color=yes tests
cd backend && pytest --color=yes tests
43 changes: 33 additions & 10 deletions README.md
@@ -10,10 +10,8 @@ Our goal is to provide a familiar, spreadsheet-like interface for business users

For a limited demo, check out the [Knowledge Table Demo](https://knowledge-table-demo.whyhow.ai/).


https://github.com/user-attachments/assets/8e0e5cc6-6468-4bb5-888c-6b552e15b58a


To learn more about WhyHow and our projects, visit our [website](https://whyhow.ai/).

## Table of Contents
@@ -102,11 +100,13 @@ The frontend can be accessed at `http://localhost:3000`, and the backend can be
4. **Install the dependencies:**

For basic installation:

```sh
pip install .
```

For installation with development tools:

```sh
pip install .[dev]
```
@@ -180,6 +180,7 @@ To set up the project for development:
black .
isort .
```

---

## Features
@@ -189,12 +190,12 @@ To set up the project for development:
- **Chunk Linking** - Link raw source text chunks to the answers for traceability and provenance.
- **Extract with natural language** - Use natural language queries to extract structured data from unstructured documents.
- **Customizable extraction rules** - Define rules to guide the extraction process and ensure data quality.
- **Custom formatting** - Control the output format of your extracted data.
- **Custom formatting** - Control the output format of your extracted data. Knowledge Table currently supports text, list of text, number, list of numbers, and boolean formats.
- **Filtering** - Filter documents based on metadata or extracted data.
- **Exporting as CSV or Triples** - Download extracted data as CSV or graph triples.
- **Chained extraction** - Reference previous columns in your extraction questions using `@`, e.g. "What are the treatments for `@disease`?".
- **Split Cell Into Rows** - Split a single cell containing a List of Numbers or List of Values into individual rows to enable more complex Chained Extraction.
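
As a rough illustration of chained extraction, the sketch below substitutes `@column` references in a question with values from the current row. The function name `resolve_references` and the row layout are assumptions made for this example, not the project's actual implementation.

```python
import re


def resolve_references(question: str, row: dict) -> str:
    """Replace each @column reference with that column's value from the row.

    Hypothetical helper for illustration; references to unknown columns
    are left untouched.
    """

    def substitute(match):
        column = match.group(1)
        # Fall back to the original "@column" text when the column is missing.
        return row.get(column, match.group(0))

    return re.sub(r"@(\w+)", substitute, question)


row = {"disease": "asthma"}  # made-up row for the example
print(resolve_references("What are the treatments for @disease?", row))
```

A cell that holds a list could be split into rows first, so that each resulting row feeds a single value into the follow-up question.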

---

## Concepts
@@ -211,6 +212,15 @@ Each **document** is an unstructured data source (e.g., a contract, article, or

A **Question** is the core mechanism for guiding extraction. It defines what data you want to extract from a document.

### Rule

A **Rule** guides how the LLM extracts data. You can add rules at the column level or at the global level. Currently, the following rule types are supported:

- **May Return** rules give the LLM example answers to guide the extraction. They are a good way to tell the LLM what kinds of values to look for.
- **Must Return** rules give the LLM an exhaustive list of the answers it is allowed to return. They act as guardrails that ensure only certain terms are returned.
- **Allowed # of Responses** rules provide guardrails when there may be a range of potential ‘grey-area’ answers and you want to restrict the output to a set number of the top responses.
- **Resolve Entity** rules allow you to resolve values to a specific entity. This is useful for ensuring output conforms to a specific entity type. For example, you can write rules that ensure "blackrock", "Blackrock, Inc.", and "Blackrock Corporation" all resolve to the same entity - "Blackrock".
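
To make the effect of these rule types concrete, here is a minimal sketch that applies them as post-processing on a list of raw answers. The function `apply_rules` and its parameters are hypothetical, and the sketch assumes answers arrive ordered from most to least relevant. Knowledge Table may enforce rules differently, for example inside the LLM prompt. May Return rules are omitted because they only guide the LLM and do not filter its output.

```python
from typing import Dict, List, Optional


def apply_rules(
    answers: List[str],
    must_return: Optional[List[str]] = None,
    max_responses: Optional[int] = None,
    resolve_entity: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Apply hypothetical rule post-processing to raw answers."""
    # Resolve Entity: case-insensitive lookup from variant to canonical name.
    lookup = {k.lower(): v for k, v in (resolve_entity or {}).items()}
    result: List[str] = []
    for answer in answers:
        answer = lookup.get(answer.lower(), answer)
        # Must Return: drop anything outside the allowed list.
        if must_return is not None and answer not in must_return:
            continue
        # Skip duplicates created by entity resolution.
        if answer not in result:
            result.append(answer)
    # Allowed # of Responses: keep only the first N answers.
    if max_responses is not None:
        result = result[:max_responses]
    return result


raw = ["blackrock", "Blackrock, Inc.", "Vanguard", "Acme"]
variants = {
    "blackrock": "Blackrock",
    "Blackrock, Inc.": "Blackrock",
    "Blackrock Corporation": "Blackrock",
}
print(apply_rules(raw, must_return=["Blackrock", "Vanguard"],
                  max_responses=2, resolve_entity=variants))
```

Here the two Blackrock variants collapse into one entity, "Acme" is dropped by the Must Return list, and the response limit caps the result at two values.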

---

## Practical Usage
@@ -225,12 +235,25 @@ Once you've set up your questions, rules, and documents, the Knowledge Table pro
- **Metadata Generation**: Classify and tag information about your documents and files by running targeted questions against the files (i.e. "What project is this email thread about?")

---

## Export to Triples

To create the schema for the triples, we use an LLM that considers the entity type of each column, the question used to generate its cells, and the cell values themselves. The document name is inserted as a node property. The vector chunk IDs are also included in the JSON file of the triples and tied to the triples created.
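
The sketch below shows one possible shape for an exported triple, with the document name stored as a node property and the chunk IDs attached to the triple. All class and field names (`Node`, `Triple`, `chunk_ids`, and so on) are assumptions for illustration. The actual JSON layout is not specified here, and the LLM-driven schema step is replaced by hard-coded values.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    label: str  # entity type of the column (hypothetical field name)
    name: str   # cell value
    properties: dict = field(default_factory=dict)


@dataclass
class Triple:
    head: Node
    relation: str
    tail: Node
    chunk_ids: list = field(default_factory=list)  # source chunk IDs


def build_triple(head, relation, tail, document_name, chunk_ids):
    """Build a triple from (label, value) pairs; the document name goes on each node."""
    return Triple(
        head=Node(head[0], head[1], {"document": document_name}),
        relation=relation,
        tail=Node(tail[0], tail[1], {"document": document_name}),
        chunk_ids=list(chunk_ids),
    )


# Made-up values for the example.
triple = build_triple(
    ("Disease", "asthma"), "treated_by", ("Treatment", "albuterol"),
    document_name="report.pdf", chunk_ids=["chunk-1"],
)
print(json.dumps(asdict(triple), indent=2))
```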

---

## Rules

There are now three types of [Rules](https://medium.com/enterprise-rag/rules-extraction-guardrails-knowledge-table-studio-e84999ade353) you can incorporate into your processes:

- **Entity Resolution Rules**: Resolving discrepancies between Entities or imposing a common terminology on top of Entities

- **Entity Extraction Rules**: Imposing Guardrails and Context for the Entities that should be detected and returned across Documents

- **Entity Relationship Rules**: Imposing Guardrails on the types of Patterns that should be returned on the Relationships between the extracted Entities

---

## Extending the Project

Knowledge Table is built to be flexible and customizable, allowing you to extend it to fit your workflow:
@@ -263,15 +286,15 @@ To use the Unstructured API integration:

When the `UNSTRUCTURED_API_KEY` is set, Knowledge Table will automatically use the Unstructured API for document processing. If the key is not set or if there's an issue with the Unstructured API, the system will fall back to the default document loaders.

Note: Usage of the Unstructured API may incur costs based on your plan with Unstructured.io.
---
## Note: Usage of the Unstructured API may incur costs based on your plan with Unstructured.io.

## Roadmap

- [ ] Expansion of Rules System
- [ ] Upload Extraction Rules via CSV
- [ ] Entity Resolution Rules
- [ ] Rules Dashboard
- [x] Expansion of Rules System
- [x] Upload Extraction Rules via CSV
- [x] Entity Resolution Rules
- [x] Rules Dashboard
- [x] Rules Log
- [ ] Support for more LLMs
- [ ] Azure OpenAI
- [ ] Llama3
7 changes: 7 additions & 0 deletions backend/CHANGELOG.md
@@ -7,11 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### Added

- Backend documentation

## [v0.1.6] - 2024-11-04

### Added

- Added support for queries without source data in vector database
- Graceful failure of triple export when no chunks are found
- Tested Qdrant vector database service
- Added resolve entity rule

### Changed

1 change: 1 addition & 0 deletions backend/docs/CONTRIBUTING.md
59 changes: 59 additions & 0 deletions backend/docs/api/overview.md
@@ -0,0 +1,59 @@
# Knowledge Table API Overview

Welcome to the Knowledge Table API! This summary provides a quick overview of key endpoints, usage guidelines, and how to access the interactive API documentation.

---

**Base URL**

All API requests should be made to the following base URL for version 1:

```
https://api.example.com/v1
```

---

**Documentation**

Explore and test all API endpoints through the interactive docs provided by FastAPI:

- **Swagger UI**: [http://localhost:8000/docs](http://localhost:8000/docs) – A user-friendly interface for API exploration.
- **ReDoc**: [http://localhost:8000/redoc](http://localhost:8000/redoc) – A clean reference for detailed API information.

---

Knowledge Table currently offers the following backend endpoints for document management, graph export, and query processing:

**Document**
Upload and manage documents within the Knowledge Table system.

- **POST** `/document` – Uploads and processes a document.
- **DELETE** `/document/{document_id}` – Deletes a document by its ID.

For details, refer to [Document Endpoints](v1/endpoints/document.md).

**Graph**
Export structured data from processed documents in the form of triples.

- **POST** `/graph/export-triples` – Exports triples (subject, predicate, object) based on table data.

More information is available at [Graph Endpoints](v1/endpoints/graph.md).

**Query**
Run queries to interact with documents using natural language or structured queries.

- **POST** `/query` – Submits a query and receives a structured response with relevant document data.

See [Query Endpoints](v1/endpoints/query.md) for further details.
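
As a small example of how an endpoint path combines with the base URL, the sketch below builds, but does not send, a `DELETE /document/{document_id}` request using only the Python standard library. The helper name `delete_document_request` and the document ID `doc-123` are made up for illustration, and the base URL is the placeholder shown earlier in this overview.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.com/v1"  # placeholder base URL from this overview


def delete_document_request(document_id: str) -> Request:
    """Build (without sending) a DELETE request for /document/{document_id}."""
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/document/{quote(document_id, safe='')}"
    return Request(url, method="DELETE")


request = delete_document_request("doc-123")  # hypothetical document ID
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Request bodies for the `POST` endpoints are described in the linked endpoint pages and are not reproduced here.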

---

**Error Codes**

Standard HTTP status codes are used to indicate request success or failure:

| Status Code | Error | Description |
| ----------- | ----------------------- | -------------------------------- |
| `200` | `OK` | Successful request |
| `400` | `Bad Request` | Invalid request parameters |
| `401` | `Unauthorized` | Authentication failed or missing |
| `404` | `Not Found` | Resource not found |
| `500` | `Internal Server Error` | Server encountered an error |
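
A client might turn these status codes into readable messages along the following lines. This is a minimal sketch: the function `describe_status` and the `DESCRIPTIONS` mapping are hypothetical and simply mirror the table above, while the reason phrases come from Python's standard `http.HTTPStatus`.

```python
from http import HTTPStatus

# Descriptions copied from the error code table above.
DESCRIPTIONS = {
    200: "Successful request",
    400: "Invalid request parameters",
    401: "Authentication failed or missing",
    404: "Resource not found",
    500: "Server encountered an error",
}


def describe_status(code: int) -> str:
    """Return a readable message for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a status code known to the standard library.
        phrase = "Unknown"
    detail = DESCRIPTIONS.get(code, "Not listed in the table")
    return f"{code} {phrase}: {detail}"


print(describe_status(404))
```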