Commit 79fae30

Merge branch 'main' into dependabot/pip/ai/generative-ai-service/hr-goal-alignment/files/aiohttp-3.12.14
2 parents 5214df9 + 6f17cc6 commit 79fae30

210 files changed

+30007
-1182
lines changed

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -39,4 +39,8 @@ terraform.rc
 .DS_Store
 
 #VSC files
-.vscode
+.vscode
+
+# Exclude cached Python binary files
+*.pyc
+__pycache__
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+The Universal Permissive License (UPL), Version 1.0
+
+Subject to the condition set forth below, permission is hereby granted to any
+person obtaining a copy of this software, associated documentation and/or data
+(collectively the "Software"), free of charge and under any and all copyright
+rights in the Software, and any and all patent rights owned or freely
+licensable by each licensor hereunder covering either (i) the unmodified
+Software as contributed to or provided by such licensor, or (ii) the Larger
+Works (as defined below), to deal in both
+
+(a) the Software, and
+(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
+one is included with the Software (each a "Larger Work" to which the Software
+is contributed by such licensors),
+
+without restriction, including without limitation the rights to copy, create
+derivative works of, display, perform, and distribute the Software and make,
+use, sell, offer for sale, import, export, have made, and have sold the
+Software and the Larger Work(s), and to sublicense the foregoing rights on
+either these or other terms.
+
+This license is subject to the following condition:
+The above copyright notice and either this complete permission notice or at
+a minimum a reference to the UPL must be included in all copies or
+substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
+# MCP Document Understanding Invoice Agent
+
+The **Document Understanding Agent** is an AI-powered assistant designed to extract and understand text from documents (e.g., PDFs, images) using Oracle Cloud Infrastructure (OCI) Generative AI Agents and Document Understanding services.
+
+This tool demonstrates an end-to-end workflow involving:
+
+- File upload (via React frontend)
+- File storage in OCI Object Storage
+- Text extraction with OCI Document Understanding
+- Summary and reasoning via a GenAI Agent powered by MCP (Model Context Protocol)
+
+The architecture is modular and can be extended by adding tools directly from the OCI Console, such as a RAG (Retrieval-Augmented Generation) tool or any other custom MCP-compatible tool. This enables workflows beyond document extraction, such as contextual question answering, validation, enrichment, or classification.
+
+---
+
+## When to use this asset?
+
+Use this assistant when you want to:
+
+- Automatically extract text from scanned documents (images, PDFs)
+- Invoke OCI Document Understanding tools through an AI agent
+- Demonstrate AI-based document orchestration and validation on OCI
+
+### Ideal for:
+
+- AI developers building document understanding pipelines
+- Oracle Cloud users integrating Generative AI with Object Storage
+- Showing document AI capabilities
+
+---
+
+## How to use this asset?
+
+### Start the Backend
+
+Navigate to the backend directory and run:
+
+```bash
+cd backend
+python mcp_server_docunderstandingobjectextract.py
+# In a different terminal:
+uvicorn apiserverdocunderstandingobjectextract:app --reload --port 8001
+```
+
+This does the following:
+
+- Starts a local MCP server with a tool (`ocr_extract_from_object_storage2`) that wraps OCI Document Understanding.
+- Starts a FastAPI server that handles file uploads and routes them to Object Storage and the agent.
+
+### Start the Frontend
+
+In a separate terminal:
+
+```bash
+cd oci-genai-agent-llama-react-frontend
+npm install
+npm run dev
+```
+
+You will see a chat interface at [http://localhost:3000](http://localhost:3000), with support for file uploads (PDF, PNG, etc.).
+
+When you send a file:
+
+- It is shown as a preview
+- It is uploaded to the backend
+- It is saved in OCI Object Storage
+- It is routed to the GenAI Agent with an instruction like:
+
+```
+Extract text from object storage. Namespace: <namespace>, Bucket: <bucket>, Name: <filename>
+```
+
+---
+
+## ⚙️ Setup Instructions
+
+### 1. OCI Config
+
+Set the following in `~/.oci/config`:
+
+```ini
+[DEFAULT]
+user=ocid1.user.oc1..exampleuniqueID
+fingerprint=xx:xx:xx:xx
+key_file=~/.oci/oci_api_key.pem
+tenancy=ocid1.tenancy.oc1..exampleuniqueID
+region=us-chicago-1
+```
+
+### 2. Object Storage Setup
+
+- Create a bucket (e.g., `bucket-20250714-1419`)
+- Make sure the user has permission to call `put_object` and `get_namespace`
+
+### 3. MCP Tooling
+
+- `mcp_server_docunderstandingobjectextract.py` exposes a tool named `ocr_extract_from_object_storage2`
+- The agent picks this tool up automatically when the FastAPI server starts
+
+---
+
+## ✨ Key Features
+
+| Feature                | Description                                                       |
+| ---------------------- | ----------------------------------------------------------------- |
+| File Upload            | Upload images or PDFs through the React UI                        |
+| OCI Object Storage     | All files are stored securely in your OCI bucket                  |
+| OCR Extraction         | Uses OCI Document Understanding to extract text from scanned docs |
+| GenAI Agent            | Routes and responds to user requests intelligently                |
+| Tool Orchestration     | Agent can invoke tools dynamically (via MCP)                      |
+| Natural Language Reply | AI explains extracted results in human-readable format            |
+
+---
+
+## Prompt Customization
+
+The main agent prompt is set as:
+
+```
+If the user wants to extract text from a document in Object Storage,
+call the `ocr_extract_from_object_storage2` tool with the `namespace`, `bucket`, and `name`.
+```
+
+You can modify this in `apiserverdocunderstandingobjectextract.py` under `Agent(...)` setup.
+
+---
+
+## Useful for:
+
+- Demos of OCI Document Understanding + GenAI Agents
+- Building your own document processing pipeline
+- AI chatbots that take file input and analyze content
+
+---
+
+## Directory Structure
+
+```bash
+backend/
+├── apiserverdocunderstandingobjectextract.py     # FastAPI app
+├── mcp_server_docunderstandingobjectextract.py   # MCP server with document OCR tool
+
+oci-genai-agent-llama-react-frontend/
+├── src/
+│   └── app/
+│       └── contexts/
+│           └── ChatContext.js                    # Hooks into backend API
+```
+
+---
+
+## License
+
+Copyright (c) 2025 Oracle and/or its affiliates.
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+
+See LICENSE for more details.
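The upload flow described in the README ends with the backend sending the agent a fixed-format instruction. The following is a minimal sketch of how that string could be assembled. The helper name `build_extraction_prompt` and the sample values are hypothetical and not part of the commit; only the message format is taken from the README.

```python
def build_extraction_prompt(namespace: str, bucket: str, name: str) -> str:
    """Format the instruction sent to the agent after a file upload.

    Hypothetical helper: the wording mirrors the README example, and the
    three fields map onto the tool's `namespace`, `bucket`, and `name`.
    """
    return (
        "Extract text from object storage. "
        f"Namespace: {namespace}, Bucket: {bucket}, Name: {name}"
    )


if __name__ == "__main__":
    # Sample values are placeholders, not real OCI identifiers.
    print(build_extraction_prompt("example-namespace", "example-bucket", "invoice.pdf"))
```

Keeping the format in one function would make it easier to keep the instruction and the agent prompt in sync if either one changes.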
@@ -0,0 +1,86 @@
+# backend/apiserver.py
+import logging
+from fastapi import FastAPI, UploadFile, File, Form
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import JSONResponse
+from rich.logging import RichHandler
+from rich.markup import escape
+
+import oci
+from mcp.client.session_group import StreamableHttpParameters
+from oci.addons.adk import Agent, AgentClient
+from oci.addons.adk.mcp import MCPClientStreamableHttp
+
+# — Logging —
+logging.basicConfig(level=logging.INFO, format="%(message)s", handlers=[RichHandler()])
+logger = logging.getLogger(__name__)
+
+# — FastAPI Setup —
+app = FastAPI()
+app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])
+
+BUCKET_NAME = "bucket-20250714-1419"
+
+@app.on_event("startup")
+async def startup_event():
+    logger.info("Connecting to MCP…")
+    mcp_params = StreamableHttpParameters(url="http://localhost:8000/mcp")
+    mcp_client = await MCPClientStreamableHttp(params=mcp_params, name="Smart Toolbox MCP").__aenter__()
+
+    client = AgentClient(auth_type="api_key", debug=False, region="us-chicago-1")
+    agent = Agent(
+        client=client,
+        agent_endpoint_id="ocid1.genaiagentendpoint",
+        instructions=(
+            "If the user wants to extract text from a document in Object Storage, "
+            "call the `ocr_extract_from_object_storage2` tool with the `namespace`, `bucket`, and `name`."
+
+        ),
+        tools=[await mcp_client.as_toolkit()],
+    )
+    agent.setup()
+    app.state.agent = agent
+    logger.info(" Agent ready.")
+
+@app.post("/chat")
+async def chat(message: str = Form(...), file: UploadFile = File(None)):
+    if file:
+        try:
+            file_bytes = await file.read()
+            file_name = file.filename
+
+            config = oci.config.from_file()
+            obj_client = oci.object_storage.ObjectStorageClient(config)
+            namespace = obj_client.get_namespace().data
+
+            # Upload file
+            obj_client.put_object(namespace, BUCKET_NAME, file_name, file_bytes)
+            logger.info(f" Uploaded {file_name} to {BUCKET_NAME}")
+
+            # Inject the correct call instruction to the agent
+            message = f"Extract text from object storage. Namespace: {namespace}, Bucket: {BUCKET_NAME}, Name: {file_name}"
+
+        except Exception as e:
+            logger.exception(" Upload failed")
+            return JSONResponse({"error": f"Upload failed: {str(e)}"}, status_code=500)
+
+    try:
+        result = await app.state.agent.run_async(message)
+
+        if isinstance(result.output, dict) and result.output.get("type") == "function":
+            name = result.output["name"]
+            args = result.output["parameters"]
+            logger.info(" Agent calls tool: %s(%r)", name, args)
+
+            tool_out = await app.state.agent.invoke_tool(name, args)
+            followup = await app.state.agent.run_async({"type": "tool", "name": name, "output": tool_out})
+            out = followup.output if not isinstance(followup.output, dict) else followup.output.get("text", "")
+        else:
+            out = result.output if not isinstance(result.output, dict) else result.output.get("text", "")
+
+        logger.info(" Replying: %s", escape(out))
+        return JSONResponse({"text": out})
+
+    except Exception:
+        logger.exception(" Chat error")
+        return JSONResponse({"error": "internal error"}, status_code=500)
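The `/chat` handler above repeats the same "dict or plain value" check in two branches when it turns the agent output into reply text. The sketch below factors that check into two small helpers, using only the standard library. The names `output_to_text` and `is_tool_call` are hypothetical and do not exist in the commit. Unlike the original expression, this variant also coerces non-string values (including `None`) to a string, which may be safer if the result is later passed to a function that expects `str`.

```python
from typing import Any


def is_tool_call(output: Any) -> bool:
    """Return True when the output looks like a function-call request.

    Mirrors the handler's check: a dict whose "type" key equals "function".
    """
    return isinstance(output, dict) and output.get("type") == "function"


def output_to_text(output: Any) -> str:
    """Return displayable text for an agent output.

    Dict outputs contribute their "text" field (empty string if absent);
    any other value is converted with str(), and None becomes "".
    """
    if isinstance(output, dict):
        return str(output.get("text", ""))
    return "" if output is None else str(output)


if __name__ == "__main__":
    # Hand-written sample outputs, not captured from a real agent run.
    samples = [{"text": "Invoice total is 42"}, "plain reply", None]
    for sample in samples:
        print(repr(output_to_text(sample)))
```

With helpers like these, both branches of the handler could call `output_to_text(...)` on whichever result they end up with.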
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+[DEFAULT]
+user=ocid1.user....
+fingerprint=c6:4f:6
+tenancy=ocid1.tenan...
+region=eu-frankfurt-1
+key_file=~/.oc
+
+
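A profile like the one above only works if every expected key is present and non-empty. The sketch below checks that with the standard-library `configparser`. It assumes the required keys are the five shown in the README example (`user`, `fingerprint`, `key_file`, `tenancy`, `region`). The function name `missing_oci_keys` is hypothetical, and the check is a standalone illustration, not how the OCI SDK itself validates a config.

```python
import configparser
from typing import List

# Assumption: the keys listed in the README's example profile are the required ones.
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")


def missing_oci_keys(config_text: str, profile: str = "DEFAULT") -> List[str]:
    """Return the required keys that are absent or empty in the given profile."""
    # Interpolation is disabled so a literal "%" in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser[profile]  # raises KeyError if a non-default profile is missing
    return [key for key in REQUIRED_KEYS if not section.get(key, fallback="").strip()]


if __name__ == "__main__":
    # Placeholder values only; these are not real credentials.
    sample = (
        "[DEFAULT]\n"
        "user=ocid1.user.oc1..exampleuniqueID\n"
        "fingerprint=xx:xx:xx:xx\n"
        "region=us-chicago-1\n"
    )
    print(missing_oci_keys(sample))
```

Running a check like this at startup would surface an incomplete profile before the first request reaches Object Storage.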
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+"""
+OCI models configuration and general config
+"""
+
+DEBUG = False
+
+MODEL_ID = "meta.llama-3.3-70b-instruct"
+
+AUTH = "API_KEY"
+SERVICE_ENDPOINT = "https://inference.generativeai.eu-frankfurt-1.oci.oraclecloud.com"
+
+TEMPERATURE = 0.1
+MAX_TOKENS = 1024
+TOP_P = 0.9
+
+# OCI general
+COMPARTMENT_ID = "ocid1.compartment."
+
+# history management
+MAX_MSGS_IN_HISTORY = 10
+# kept low because we're generating code
+MAX_ROWS_IN_SAMPLE = 10
+
+# integration with RAG
+# RAG_AGENT_ID = "ocid1.genaiagentendpoint.oc1"
+# RAG_AGENT_ENDPOINT = "https://agent-runtime.generativeai.us-chicago-1.oci.oraclecloud.com"
+# switched to Ali Agent
+RAG_AGENT_ID = "ocid1.genaiagentend"
+RAG_AGENT_ENDPOINT = (
+    "https://agent-runtime.generativeai.uk-london-1.oci.oraclecloud.com"
+)
+
+
@@ -0,0 +1,63 @@
+import os
+import logging
+from rich.logging import RichHandler
+from rich.markup import escape
+
+import oci
+from oci.ai_document.models import (
+    AnalyzeDocumentDetails,
+    ObjectLocation,
+    ObjectStorageDocumentDetails,
+    DocumentTextExtractionFeature,
+)
+
+from mcp.server.fastmcp import FastMCP
+
+# — Logging —
+logging.basicConfig(level=logging.INFO, format="%(message)s", handlers=[RichHandler()])
+logger = logging.getLogger(__name__)
+
+# — MCP Setup —
+os.environ["MCP_PORT"] = "8000"
+mcp = FastMCP("Smart Toolbox")
+
+@mcp.tool()
+def ocr_extract_from_object_storage2(namespace: str, bucket: str, name: str) -> str:
+    """
+    Extract text from a document in Object Storage using Document Understanding.
+    """
+    try:
+        cfg = oci.config.from_file()
+        du_client = oci.ai_document.AIServiceDocumentClient(cfg)
+
+        doc = ObjectStorageDocumentDetails(
+            source="OBJECT_STORAGE",
+            namespace_name=namespace,
+            bucket_name=bucket,
+            object_name=name
+        )
+
+        feats = [DocumentTextExtractionFeature(feature_type="TEXT_EXTRACTION")]
+
+        details = AnalyzeDocumentDetails(
+            compartment_id="ocid1.compartment.oc1..aaa",
+            document=doc,
+            features=feats
+        )
+
+        resp = du_client.analyze_document(analyze_document_details=details)
+
+        text = "\n".join(
+            line.text for page in resp.data.pages or [] for line in page.lines or []
+        ).strip()
+
+        logger.info(" Extracted %d characters from %s", len(text), name)
+        return text or "No text found in document."
+
+    except Exception as e:
+        logger.exception(" Document extraction failed")
+        return f"Error: {str(e)}"
+
+if __name__ == "__main__":
+    print(" MCP server running at http://localhost:8000/mcp")
+    mcp.run(transport="streamable-http")
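The tool above flattens the analysis result into plain text by walking pages, then lines, and tolerating `None` at either level. The sketch below isolates that step so it can be exercised without calling OCI. The `SimpleNamespace` objects are stand-ins whose shape (`pages` → `lines` → `text`) is inferred only from the code above, and the helper name `join_page_lines` is hypothetical.

```python
from types import SimpleNamespace


def join_page_lines(pages) -> str:
    """Join the text of every line on every page with newlines.

    `pages` and each page's `lines` may be None; both are treated as empty,
    matching the `or []` guards in the tool's generator expression.
    """
    return "\n".join(
        line.text for page in pages or [] for line in page.lines or []
    ).strip()


if __name__ == "__main__":
    # Stand-in objects that mimic the assumed response shape; not real SDK models.
    pages = [
        SimpleNamespace(lines=[SimpleNamespace(text="Invoice 42"),
                               SimpleNamespace(text="Total: 10 EUR")]),
        SimpleNamespace(lines=None),  # a page with no detected lines
    ]
    print(join_page_lines(pages))
```

Separating the flattening from the API call would let the empty-document and missing-lines cases be unit tested offline.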