whyhow-ai · tomsmoker · Oct 14, 2024
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,33 @@
+name: Deploy Documentation
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'backend/docs/**'
+      - 'backend/mkdocs.yml'
+      - '.github/workflows/docs.yml'
+
+jobs:
+  build-deploy:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout Code
+        uses: actions/checkout@v2
+
+      - name: Set up Python
+        uses: actions/setup-python@v2
+        with:
+          python-version: '3.10'
+
+      - name: Install Dependencies
+        run: |
+          cd backend
+          pip install mkdocs mkdocs-material "mkdocstrings[python]" pymdown-extensions
+
+      - name: Build and Deploy
+        run: |
+          cd backend
+          mkdocs gh-deploy --force
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -48,4 +48,4 @@ jobs:
 
       - name: Run unit tests with pytest
         run: |
-          cd backend && pytest --color=yes tests
+          cd backend && pytest --color=yes tests
diff --git a/README.md b/README.md
@@ -10,10 +10,8 @@ Our goal is to provide a familiar, spreadsheet-like interface for business users
 
 For a limited demo, check out the [Knowledge Table Demo](https://knowledge-table-demo.whyhow.ai/).
 
-
 https://github.com/user-attachments/assets/8e0e5cc6-6468-4bb5-888c-6b552e15b58a
 
-
 To learn more about WhyHow and our projects, visit our [website](https://whyhow.ai/).
 
 ## Table of Contents
@@ -102,11 +100,13 @@ The frontend can be accessed at `http://localhost:3000`, and the backend can be
 4. **Install the dependencies:**
 
    For basic installation:
+
    ```sh
    pip install .
    ```
 
    For installation with development tools:
+
    ```sh
    pip install .[dev]
    ```
@@ -180,6 +180,7 @@ To set up the project for development:
    black .
    isort .
    ```
+
 ---
 
 ## Features
@@ -189,12 +190,12 @@ To set up the project for development:
 - **Chunk Linking** - Link raw source text chunks to the answers for traceability and provenance.
 - **Extract with natural language** - Use natural language queries to extract structured data from unstructured documents.
 - **Customizable extraction rules** - Define rules to guide the extraction process and ensure data quality.
-- **Custom formatting** - Control the output format of your extracted data.
+- **Custom formatting** - Control the output format of your extracted data. Knowledge Table currently supports text, list of text, number, list of numbers, and boolean formats.
 - **Filtering** - Filter documents based on metadata or extracted data.
 - **Exporting as CSV or Triples** - Download extracted data as CSV or graph triples.
 - **Chained extraction** - Reference previous columns in your extraction questions using @ i.e. "What are the treatments for `@disease`?".
 - **Split Cell Into Rows** - Turn outputs within a single cell from List of Numbers or List of Values and split it into individual rows to do more complex Chained Extraction
- 
+
 ---
 
 ## Concepts
@@ -211,6 +212,15 @@ Each **document** is an unstructured data source (e.g., a contract, article, or
 
 A **Question** is the core mechanism for guiding extraction. It defines what data you want to extract from a document.
 
+### Rule
+
+A **Rule** guides the extraction from the LLM. You can add rules on a column level or on a global level. Currently, the following rule types are supported:
+
+- **May Return** rules give the LLM examples of answers that can be used to guide the extraction. This is a great way to give more guidance for the LLM on the type of things it should keep an eye out for.
+- **Must Return** rules give the LLM an exhaustive list of answers that are allowed to be returned. This is a great way to give guardrails for the LLM to ensure only certain terms are returned.
+- **Allowed # of Responses** rules provide guardrails when there may be a range of potential ‘grey-area’ answers and you want to guarantee that only a certain number of the top responses are returned.
+- **Resolve Entity** rules allow you to resolve values to a specific entity. This is useful for ensuring output conforms to a specific entity type. For example, you can write rules that ensure "blackrock", "Blackrock, Inc.", and "Blackrock Corporation" all resolve to the same entity - "Blackrock".
+
 ---
 
 ## Practical Usage
@@ -225,12 +235,25 @@ Once you've set up your questions, rules, and documents, the Knowledge Table pro
 - **Metadata Generation**: Classify and tag information about your documents and files by running targeted questions against the files (i.e. "What project is this email thread about?")
 
 ---
+
 ## Export to Triples
 
 To create the Schema for the Triples, we use an LLM to consider the Entity Type of the Column, the question that was used to generate the cells, and the values themselves, to create the schema and the triples. The document name is inserted as a node property. The vector chunk ids are also included in the JSON file of the triples, and tied to the triples created.
 
 ---
 
+## Rules
+
+We now support three types of [Rules](https://medium.com/enterprise-rag/rules-extraction-guardrails-knowledge-table-studio-e84999ade353) that you can incorporate into your processes:
+
+- **Entity Resolution Rules**: Resolving discrepancies between Entities or imposing a common terminology on top of Entities
+
+- **Entity Extraction Rules**: Imposing Guardrails and Context for the Entities that should be detected and returned across Documents
+
+- **Entity Relationship Rules**: Imposing Guardrails on the types of Patterns that should be returned on the Relationships between the extracted Entities
+
+---
+
 ## Extending the Project
 
 Knowledge Table is built to be flexible and customizable, allowing you to extend it to fit your workflow:
@@ -263,15 +286,15 @@ To use the Unstructured API integration:
 
 When the `UNSTRUCTURED_API_KEY` is set, Knowledge Table will automatically use the Unstructured API for document processing. If the key is not set or if there's an issue with the Unstructured API, the system will fall back to the default document loaders.
 
-Note: Usage of the Unstructured API may incur costs based on your plan with Unstructured.io.
----
+**Note:** Usage of the Unstructured API may incur costs based on your plan with Unstructured.io.
 
 ## Roadmap
 
-- [ ] Expansion of Rules System
-  - [ ] Upload Extraction Rules via CSV
-  - [ ] Entity Resolution Rules
-  - [ ] Rules Dashboard
+- [x] Expansion of Rules System
+  - [x] Upload Extraction Rules via CSV
+  - [x] Entity Resolution Rules
+  - [x] Rules Dashboard
+  - [x] Rules Log
 - [ ] Support for more LLMs
   - [ ] Azure OpenAI
   - [ ] Llama3

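The rule types described in the README's new "Rule" section can be illustrated with a small Python sketch. This is not the project's implementation: the README says rules guide the LLM during extraction, whereas the sketch below applies comparable constraints as plain post-processing on a list of already-extracted values. The function names (`resolve_entities`, `must_return`, `limit_responses`) and the `ALIASES` table are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def resolve_entities(values: List[str], aliases: Dict[str, str]) -> List[str]:
    """Map each value to its canonical entity, matching aliases case-insensitively."""
    lookup = {alias.lower(): canonical for alias, canonical in aliases.items()}
    # Values with no known alias are returned unchanged.
    return [lookup.get(value.strip().lower(), value) for value in values]


def must_return(values: List[str], allowed: List[str]) -> List[str]:
    """Keep only values that appear in the exhaustive list of allowed answers."""
    allowed_set = set(allowed)
    return [value for value in values if value in allowed_set]


def limit_responses(values: List[str], max_responses: int) -> List[str]:
    """Keep at most max_responses values, preserving their original order."""
    return values[:max_responses]


# Hypothetical alias table mirroring the "Blackrock" example in the README.
ALIASES = {
    "blackrock": "Blackrock",
    "Blackrock, Inc.": "Blackrock",
    "Blackrock Corporation": "Blackrock",
}

raw = ["blackrock", "Blackrock, Inc.", "Vanguard", "Blackrock Corporation"]
resolved = resolve_entities(raw, ALIASES)
print(resolved)
print(must_return(resolved, ["Blackrock"]))
print(limit_responses(resolved, 2))
```

Applying the constraints in this order (resolve first, then filter, then truncate) means the allowed list only needs to contain canonical names rather than every alias.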
diff --git a/backend/CHANGELOG.md b/backend/CHANGELOG.md
@@ -7,11 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## Unreleased
 
+### Added
+
+- Backend documentation
+
+## [v0.1.6] - 2024-11-04
+
 ### Added
 
 - Added support for queries without source data in vector database
 - Graceful failure of triple export when no chunks are found
 - Tested Qdrant vector database service
+- Added resolve entity rule
 
 ### Changed
 

diff --git a/backend/docs/CONTRIBUTING.md b/backend/docs/CONTRIBUTING.md
@@ -0,0 +1 @@
+../../CONTRIBUTING.md
diff --git a/backend/docs/api/overview.md b/backend/docs/api/overview.md
@@ -0,0 +1,59 @@
+# Knowledge Table API Overview
+
+Welcome to the Knowledge Table API! This summary provides a quick overview of key endpoints, usage guidelines, and how to access the interactive API documentation.
+
+---
+
+**Base URL**
+
+All API requests should be made to the following base URL for version 1:
+
+```
+https://api.example.com/v1
+```
+
+---
+
+**Documentation**
+
+Explore and test all API endpoints through the interactive docs provided by FastAPI:
+
+- **Swagger UI**: [http://localhost:8000/docs](http://localhost:8000/docs) – A user-friendly interface for API exploration.
+- **ReDoc**: [http://localhost:8000/redoc](http://localhost:8000/redoc) – A clean reference for detailed API information.
+
+---
+
+Knowledge Table currently offers the following backend endpoints for document management, graph export, and query processing:
+
+**Document**  
+ Upload and manage documents within the Knowledge Table system.
+
+- **POST** `/document` – Uploads and processes a document.
+- **DELETE** `/document/{document_id}` – Deletes a document by its ID.  
+  For details, refer to [Document Endpoints](v1/endpoints/document.md).
+
+**Graph**  
+ Export structured data from processed documents in the form of triples.
+
+- **POST** `/graph/export-triples` – Exports triples (subject, predicate, object) based on table data.  
+  More information is available at [Graph Endpoints](v1/endpoints/graph.md).
+
+**Query**  
+ Run queries to interact with documents using natural language or structured queries.
+
+- **POST** `/query` – Submits a query and receives a structured response with relevant document data.  
+  See [Query Endpoints](v1/endpoints/query.md) for further details.
+
+---
+
+**Error Codes**
+
+Standard HTTP status codes are used to indicate request success or failure:
+
+| Status Code | Error                   | Description                      |
+| ----------- | ----------------------- | -------------------------------- |
+| `200`       | `OK`                    | Successful request               |
+| `400`       | `Bad Request`           | Invalid request parameters       |
+| `401`       | `Unauthorized`          | Authentication failed or missing |
+| `404`       | `Not Found`             | Resource not found               |
+| `500`       | `Internal Server Error` | Server encountered an error      |
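As a rough illustration of the overview above, the following Python sketch builds (but does not send) a request for the `DELETE /document/{document_id}` endpoint and looks up the description of a status code from the error table. It uses only the standard library. The base URL is the placeholder from the overview, and the helper names (`delete_document_request`, `describe_status`) and the document ID `abc123` are hypothetical. Request bodies for the `POST` endpoints are not shown because the overview does not specify them.

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder base URL taken from the overview; replace with a real deployment.
BASE_URL = "https://api.example.com/v1"

# Status codes and descriptions copied from the "Error Codes" table.
STATUS_DESCRIPTIONS = {
    200: "Successful request",
    400: "Invalid request parameters",
    401: "Authentication failed or missing",
    404: "Resource not found",
    500: "Server encountered an error",
}


def delete_document_request(document_id: str) -> Request:
    """Build a DELETE request for a document without sending it."""
    # Percent-encode the ID so that characters such as "/" cannot alter the path.
    url = f"{BASE_URL}/document/{quote(document_id, safe='')}"
    return Request(url, method="DELETE")


def describe_status(code: int) -> str:
    """Return the overview's description for a status code, if it lists one."""
    return STATUS_DESCRIPTIONS.get(code, "Status code not listed in the overview")


request = delete_document_request("abc123")  # hypothetical document ID
print(request.get_method(), request.full_url)
print(describe_status(404))
```

To actually send the request, the `Request` object could be passed to `urllib.request.urlopen`, which raises `urllib.error.HTTPError` for 4xx and 5xx responses.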