Commit 1c96cf9

Implement comprehensive NDP MCP server for dataset search and discovery
This implementation provides a robust Model Context Protocol (MCP) server for the National Data Platform API, enabling seamless dataset search and discovery across multiple CKAN instances.

Key features:
- Multi-server support (local, global, pre-production CKAN instances)
- Organization discovery with filtering capabilities
- Comprehensive dataset search (simple term-based and advanced field-specific)
- Dataset detail retrieval with complete metadata
- Robust error handling with retry logic for network failures
- Automatic result limiting to prevent context overflow (default: 20 results)
- String/integer parameter handling for MCP client compatibility

Tools implemented:
- list_organizations: Discover available data organizations
- search_datasets: Search datasets with flexible filtering options
- get_dataset_details: Retrieve complete dataset metadata by ID or name

Technical improvements:
- FastMCP framework integration with proper type annotations
- Pydantic models for data validation (Dataset model)
- Comprehensive test suite with 30 tests covering edge cases
- UV package manager integration for modern Python workflow
- Extensive documentation following IOWarp MCP standards

The server addresses common issues like token overflow by implementing smart limiting and provides clear workflow guidance for effective dataset discovery.
1 parent 655b67e commit 1c96cf9

24 files changed, with 3243 additions and 820853 deletions.

mcps/NDP/README.md

Lines changed: 155 additions & 500 deletions
Large diffs are not rendered by default.

mcps/NDP/docs/mcp_development_guide.md

Lines changed: 660 additions & 0 deletions
Large diffs are not rendered by default.

mcps/NDP/docs/ndp_api.md

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
# API Search Documentation

This document describes the search functionality in the API, specifically the `/organization` and `/search` endpoints.

## Organizations Endpoint

### GET /organization - List Organizations

**Purpose**: Retrieve a list of all organizations with optional filtering.

**Method**: `GET`

**Parameters**:
- `name` (query, optional): Filter organizations by name
- `server` (query, optional): Specify server to list organizations from
  - Options: `local`, `global`, `pre_ckan`
  - Default: `global`

**Response**:
- **200 OK**: Returns an array of organization names
- **400 Bad Request**: Error message explaining the bad request
- **422 Validation Error**: Validation error details

**Example Request**:
```
GET /organization?name=example&server=global
```

**Example Response**:
```json
["org1", "org2", "org3"]
```
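
A minimal Python sketch of how a client might assemble this request URL. The base URL `https://ndp.example.org` and the helper name `build_organization_url` are hypothetical and used only for illustration; the parameter names and the `global` default come from the list above.

```python
from urllib.parse import urlencode


def build_organization_url(base_url, name=None, server="global"):
    """Return the GET /organization URL for an optional name filter."""
    params = {}
    if name:
        # The name filter is optional, so it is only sent when provided.
        params["name"] = name
    params["server"] = server
    return f"{base_url.rstrip('/')}/organization?{urlencode(params)}"


# Hypothetical base URL, for illustration only.
print(build_organization_url("https://ndp.example.org", name="example"))
```

Building the query string with `urlencode` rather than by hand keeps names containing spaces or other reserved characters correctly escaped.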

### POST /organization - Create Organization

**Purpose**: Create a new organization.

**Method**: `POST`

**Parameters**:
- `server` (query, optional): Specify server (`local` or `pre_ckan`, defaults to `local`)

**Request Body**:
```json
{
  "name": "example_org_name",
  "title": "Example Organization Title",
  "description": "This is an example organization."
}
```

Request body fields:
- `name` (required): Unique organization name
- `title` (required): Organization title
- `description` (optional): Description

**Response**:
- **201 Created**: Organization created successfully
- **400 Bad Request**: Organization name already exists
- **422 Validation Error**: Validation error details

**Example Response**:
```json
{
  "id": "305284e6-6338-4e13-b39b-e6efe9f1c45a",
  "message": "Organization created successfully"
}
```
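
As a rough illustration, the request body could be built and checked on the client side before sending. The helper name `organization_payload` is hypothetical; the required and optional fields follow the request body described above, and the client-side check is an assumption about what a caller might want, not part of the API.

```python
import json


def organization_payload(name, title, description=None):
    """Serialize the JSON body for POST /organization."""
    # name and title are documented as required; reject empty values early.
    if not name or not title:
        raise ValueError("both 'name' and 'title' are required")
    body = {"name": name, "title": title}
    if description is not None:
        # description is optional and omitted when not supplied.
        body["description"] = description
    return json.dumps(body)


print(organization_payload("example_org_name", "Example Organization Title"))
```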

## Search Endpoint

### GET /search - Search Datasets by Terms

**Purpose**: Search CKAN datasets using a list of search terms.

**Method**: `GET`

**Parameters**:
- `terms` (query, required): Array of search terms
- `keys` (query, optional): Array of keys corresponding to each term (use `null` for global search)
- `server` (query, optional): Server to search on
  - Options: `local`, `global`
  - Default: `global`
  - Note: If 'local' CKAN is disabled, it cannot be used

**Response**:
- **200 OK**: Returns array of matching datasets
- **400 Bad Request**: Error message explaining the bad request
- **422 Unprocessable Entity**: Validation error details

**Example Request**:
```
GET /search?terms=climate&terms=data&server=global
```
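
The example request passes the `terms` array by repeating the query parameter. A small sketch of that encoding follows; the helper name `search_query` is hypothetical, and sending the literal string `null` for a key that should match globally is an assumption about how the server reads the `keys` array.

```python
from urllib.parse import urlencode


def search_query(terms, keys=None, server="global"):
    """Encode GET /search parameters, repeating 'terms' once per term."""
    if not terms:
        raise ValueError("at least one search term is required")
    if keys is not None and len(keys) != len(terms):
        raise ValueError("'keys' must have one entry per term")
    params = [("terms", term) for term in terms]
    if keys is not None:
        # Assumption: a None key is sent as the literal string "null".
        params += [("keys", "null" if key is None else key) for key in keys]
    params.append(("server", server))
    return urlencode(params)


print(search_query(["climate", "data"]))
```

Passing a list of pairs to `urlencode` preserves the order of the terms, which matters because each key is matched to the term at the same position.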

### POST /search - Advanced Dataset Search

**Purpose**: Search datasets using various parameters with more granular control.

**Method**: `POST`

**Request Body Schema**:
All fields are optional and can be used in combination:

#### Common Registration-Matching Parameters:
- `dataset_name`: The name of the dataset
- `dataset_title`: The title of the dataset
- `owner_org`: The name of the organization
- `resource_url`: The URL of the dataset resource
- `resource_name`: The name of the dataset resource
- `dataset_description`: The description of the dataset
- `resource_description`: The description of the resource
- `resource_format`: The format of the dataset resource

#### User-Defined Search Parameters:
- `search_term`: Comma-separated list of terms to search across all fields
- `filter_list`: Array of field filters in the form `key:value`
- `timestamp`: Filter on the timestamp field of results
- `server`: Server selection (`local`, `global`, or `pre_ckan`, defaults to `global`)

**Example Request Body**:
```json
{
  "dataset_name": "climate_data",
  "resource_format": "CSV",
  "search_term": "temperature,weather",
  "filter_list": ["type:sensor", "location:europe"],
  "server": "global"
}
```

**Response**:
- **200 OK**: Returns array of matching datasets
- **400 Bad Request**: Error occurred during search
- **422 Validation Error**: Request validation failed
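
Because every field is optional, a client only needs to send the fields it actually sets. The sketch below shows one way to assemble such a body; the helper name `advanced_search_body` and its `search_terms`/`filters` arguments are hypothetical conveniences, while the field names and the comma-separated and `key:value` formats come from the lists above.

```python
import json

# Field names taken from the request body schema above.
ALLOWED_FIELDS = {
    "dataset_name", "dataset_title", "owner_org", "resource_url",
    "resource_name", "dataset_description", "resource_description",
    "resource_format", "timestamp", "server",
}


def advanced_search_body(search_terms=None, filters=None, **fields):
    """Build a POST /search body containing only the fields that were set."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown search fields: {sorted(unknown)}")
    body = {key: value for key, value in fields.items() if value is not None}
    if search_terms:
        # search_term is a single comma-separated string.
        body["search_term"] = ",".join(search_terms)
    if filters:
        # filter_list is an array of "key:value" strings.
        body["filter_list"] = [f"{key}:{value}" for key, value in filters.items()]
    return body


body = advanced_search_body(
    search_terms=["temperature", "weather"],
    filters={"type": "sensor", "location": "europe"},
    dataset_name="climate_data",
    resource_format="CSV",
)
print(json.dumps(body, indent=2))
```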

## Response Schema

Both search endpoints return an array of `DataSourceResponse` objects:

```json
[
  {
    "id": "12345678-abcd-efgh-ijkl-1234567890ab",
    "name": "example_dataset_name",
    "title": "Example Dataset Title",
    "owner_org": "example_org_name",
    "notes": "This is an example dataset.",
    "resources": [
      {
        "id": "abcd1234-efgh5678-ijkl9012",
        "url": "http://example.com/resource",
        "name": "Example Resource Name",
        "description": "This is an example resource.",
        "format": "CSV"
      }
    ],
    "extras": {
      "key1": "value1",
      "key2": "value2",
      "mapping": {
        "field1": "qeadw2",
        "field2": "gw4aw34",
        "time": "gw4aw34"
      },
      "processing": {
        "data_key": "",
        "info_key": "key_with_info"
      }
    }
  }
]
```

### DataSource Response Fields:
- `id` (string, required): Unique dataset identifier
- `name` (string, required): Unique dataset name
- `title` (string, required): Dataset title
- `owner_org` (string, optional): Organization ID that owns the dataset
- `notes` (string, optional): Dataset description
- `resources` (array, required): List of associated resources
- `extras` (object, optional): Additional metadata

### Resource Object Fields:
- `id` (string, required): Unique resource identifier
- `url` (string, optional): Resource URL
- `name` (string, required): Resource name
- `description` (string, optional): Resource description
- `format` (string, optional): Resource format (e.g., CSV, JSON, etc.)
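
To make the required and optional fields concrete, here is a sketch that maps one decoded response object onto plain dataclasses. The class and function names (`Resource`, `DataSource`, `parse_data_source`) are illustrative and are not the models used by the server itself; only the field names and their required or optional status come from the lists above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Resource:
    id: str
    name: str
    url: Optional[str] = None
    description: Optional[str] = None
    format: Optional[str] = None


@dataclass
class DataSource:
    id: str
    name: str
    title: str
    resources: list[Resource]
    owner_org: Optional[str] = None
    notes: Optional[str] = None
    extras: dict[str, Any] = field(default_factory=dict)


def parse_data_source(raw: dict[str, Any]) -> DataSource:
    """Convert one decoded JSON object into a DataSource.

    Required keys are read with [] so a missing one raises KeyError;
    optional keys fall back to None or an empty dict.
    """
    resources = [
        Resource(
            id=item["id"],
            name=item["name"],
            url=item.get("url"),
            description=item.get("description"),
            format=item.get("format"),
        )
        for item in raw["resources"]
    ]
    return DataSource(
        id=raw["id"],
        name=raw["name"],
        title=raw["title"],
        resources=resources,
        owner_org=raw.get("owner_org"),
        notes=raw.get("notes"),
        extras=raw.get("extras") or {},
    )
```

A search response would then be parsed with `[parse_data_source(item) for item in response_json]`, where `response_json` is the decoded array.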

## Search Examples

### 1. Simple term search:
```bash
curl -X GET "http://155.101.6.191:8003/search?terms=climate&server=global" \
  -H "Authorization: Bearer <token>"
```

### 2. Advanced search with multiple criteria:
```bash
curl -X POST "http://155.101.6.191:8003/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "dataset_name": "sensor_data",
    "resource_format": "CSV",
    "search_term": "temperature,humidity",
    "server": "global"
  }'
```

### 3. List organizations with filtering:
```bash
curl -X GET "http://155.101.6.191:8003/organization?name=research&server=global" \
  -H "Authorization: Bearer <token>"
```

## Server Options

The API supports multiple server configurations:
- **local**: Search in local CKAN instance (may be disabled)
- **global**: Search in global CKAN instance (default for most endpoints)
- **pre_ckan**: Search in pre-production CKAN instance (available for some endpoints)

Choose the appropriate server based on your data access requirements and instance availability.
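
Since the accepted servers and the default differ by endpoint, a client may want to validate the choice before sending a request. The table below is transcribed from the parameter descriptions in this document; the names `SERVERS_BY_ENDPOINT` and `resolve_server` are hypothetical, and the sketch cannot detect whether a `local` instance is disabled on a given deployment.

```python
# (allowed servers, default server) per endpoint, as documented above.
SERVERS_BY_ENDPOINT = {
    ("GET", "/organization"): (("local", "global", "pre_ckan"), "global"),
    ("POST", "/organization"): (("local", "pre_ckan"), "local"),
    ("GET", "/search"): (("local", "global"), "global"),
    ("POST", "/search"): (("local", "global", "pre_ckan"), "global"),
}


def resolve_server(method, path, server=None):
    """Return the server to use for an endpoint, applying its default."""
    allowed, default = SERVERS_BY_ENDPOINT[(method.upper(), path)]
    if server is None:
        return default
    if server not in allowed:
        raise ValueError(
            f"{method.upper()} {path} does not accept server {server!r}; "
            f"expected one of {allowed}"
        )
    return server


print(resolve_server("POST", "/organization"))
```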
