
Docs page LLMS.txt support #38

Open

gregnazario wants to merge 4 commits into main from cursor/docs-page-llms-txt-support-5733

Conversation

@gregnazario (Contributor)

Description

Adds LLMS.txt support to the documentation site to improve content discoverability and consumption by Large Language Models.

This PR introduces:

  • A concise llms.txt file (/llms.txt) providing an overview and key links.
  • A comprehensive llms-full.txt file (/llms-full.txt) containing the full documentation content.
  • An update to robots.txt to reference llms.txt as per the LLMS.txt specification.
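
For reference, an llms.txt file per the llmstxt.org spec is a small markdown document: an H1 title, a blockquote summary, and link lists grouped under H2 sections. A hypothetical sketch of the shape (the section names and paths below are illustrative, not the actual SIWA content):

```markdown
# Sign in with Aptos (SIWA)

> Documentation for Sign in with Aptos (SIWA).

## Documentation

- [Introduction](https://siwa.aptos.dev/introduction): What SIWA is and when to use it
- [Getting Started](https://siwa.aptos.dev/getting-started): Installation and a first sign-in flow
```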

Test Plan

  1. Navigate to /llms.txt and verify the concise overview with links is displayed.
  2. Navigate to /llms-full.txt and verify the comprehensive documentation content is displayed.
  3. Navigate to /robots.txt and verify the Sitemap: https://siwa.aptos.dev/llms.txt entry is present.


Add LLMS.txt support to help Large Language Models understand the SIWA
documentation structure and content:

- Add static llms.txt file in public folder with overview and links
- Add dynamic llms-full.txt route handler with comprehensive content
- Update robots.txt to reference llms.txt

The llms.txt file provides a quick overview with links to all documentation
pages, while llms-full.txt contains the complete documentation content
including all API references and code examples.

Co-authored-by: greg <greg@gnazar.io>

@vercel
vercel bot commented Jan 14, 2026

The latest updates on your projects:

Project: sign-in-with-aptos-docs — Deployment: Ready — Review: Preview, Comment — Updated: Jan 14, 2026 6:14pm (UTC)

cursoragent and others added 2 commits January 14, 2026 17:17
Replace static llms.txt with auto-generated content that reads from
the actual documentation MDX files. This ensures the LLMS.txt files
stay in sync with documentation changes automatically.

Changes:
- Add lib/llms.ts utility to parse MDX files and extract content
- Create dynamic route handler for /llms.txt with overview and links
- Update /llms-full.txt route to use auto-generation
- Remove static public/llms.txt file

The utility extracts frontmatter, titles, descriptions, and content
from all MDX documentation files, organizes them by section, and
generates both concise (llms.txt) and comprehensive (llms-full.txt)
output formats.
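
The extraction step described above can be sketched as a standalone function. This is illustrative only (the name `parseMdx` and the shape of `DocPage` are hypothetical, not the actual lib/llms.ts code), assuming simple `key: value` frontmatter delimited by `---` lines:

```typescript
// Illustrative sketch of MDX frontmatter/title/description extraction.
interface DocPage {
  title: string;
  description: string;
  body: string;
}

function parseMdx(source: string): DocPage {
  // Split off YAML-style frontmatter delimited by --- lines (tolerating CRLF).
  const match = source.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?/);
  const frontmatter: Record<string, string> = {};
  let body = source;
  if (match) {
    body = source.slice(match[0].length);
    for (const line of match[1].split(/\r?\n/)) {
      const idx = line.indexOf(":");
      if (idx > 0) {
        // Rejoin after the first colon so values containing ":" survive,
        // and strip surrounding quotes.
        frontmatter[line.slice(0, idx).trim()] = line
          .slice(idx + 1)
          .trim()
          .replace(/^["']|["']$/g, "");
      }
    }
  }
  // Fall back to the first markdown heading when no title is set.
  const heading = body.match(/^#\s+(.+)$/m);
  return {
    title: frontmatter.title ?? heading?.[1] ?? "Untitled",
    description: frontmatter.description ?? "",
    body,
  };
}
```

A real implementation would hand the frontmatter to a proper YAML parser (as a later commit in this PR does with gray-matter), but the sketch shows the data each MDX file contributes.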

Co-authored-by: greg <greg@gnazar.io>
Copilot AI left a comment

Pull request overview

This PR adds LLMS.txt support to enable better consumption of documentation by Large Language Models. It provides both a concise overview file (/llms.txt) with links and a comprehensive file (/llms-full.txt) with full content, following the LLMS.txt specification.

Changes:

  • Created a new library (apps/docs/lib/llms.ts) to parse MDX documentation files and generate LLMS.txt formatted content
  • Added two route handlers (/llms.txt and /llms-full.txt) to serve the generated content
  • Updated robots.txt to reference the LLMS.txt file as per specification

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
apps/docs/lib/llms.ts Core library implementing MDX parsing, content extraction, and LLMS.txt generation
apps/docs/app/llms.txt/route.ts Route handler serving the concise LLMS.txt overview with links
apps/docs/app/llms-full.txt/route.ts Route handler serving comprehensive documentation content
apps/docs/public/robots.txt Updated to reference LLMS.txt per specification


const description = extractDescription(body);

// Determine section from path
const pathParts = basePath.split("/");

Copilot AI commented Jan 14, 2026
When basePath is an empty string (for the root page.mdx), calling split('/') will return an array with one empty string element [''], not an empty array. This means pathParts[0] will be '' instead of undefined, causing the section classification logic to incorrectly assign 'Overview' to the root page. While this happens to work correctly due to the default 'Overview' assignment, this logic is fragile and could fail if the classification logic changes.

Suggested change:
- const pathParts = basePath.split("/");
+ const pathParts = basePath ? basePath.split("/") : [];

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +32
const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---/);

if (frontmatterMatch) {
  const frontmatterStr = frontmatterMatch[1];
  const frontmatter: Record<string, string> = {};

  for (const line of frontmatterStr.split("\n")) {

Copilot AI commented Jan 14, 2026
The frontmatter regex pattern requires exactly \n line endings, which will fail to match files with Windows-style \r\n line endings. This could cause frontmatter parsing to fail on files created or edited on Windows systems. Consider using /^---\r?\n([\s\S]*?)\r?\n---/ to handle both Unix and Windows line endings.

Suggested change:
- const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---/);
- if (frontmatterMatch) {
-   const frontmatterStr = frontmatterMatch[1];
-   const frontmatter: Record<string, string> = {};
-   for (const line of frontmatterStr.split("\n")) {
+ const frontmatterMatch = content.match(/^---\r?\n([\s\S]*?)\r?\n---/);
+ if (frontmatterMatch) {
+   const frontmatterStr = frontmatterMatch[1];
+   const frontmatter: Record<string, string> = {};
+   for (const line of frontmatterStr.split(/\r?\n/)) {

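The difference between the two patterns can be checked standalone (the sample document below is illustrative):

```typescript
// The strict pattern requires "\n" and misses CRLF files; the \r?\n
// variant accepts both Unix and Windows line endings.
const crlfDoc = "---\r\ntitle: Test\r\n---\r\nBody";
const strict = crlfDoc.match(/^---\n([\s\S]*?)\n---/);
const tolerant = crlfDoc.match(/^---\r?\n([\s\S]*?)\r?\n---/);
console.log(strict === null, tolerant?.[1]); // true "title: Test"
```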
Comment on lines +108 to +109
.replace(/<(Steps|TSDoc|Callout)[^>]*>[\s\S]*?<\/\1>/g, "")
.replace(/<(Steps|TSDoc|Callout)[^>]*\/>/g, "")

Copilot AI commented Jan 14, 2026
The hardcoded list of JSX components to remove (Steps, TSDoc, Callout) creates a maintenance burden. If new components are added to the documentation, they will need to be manually added here. Consider extracting this list to a constant or configuration array for easier maintenance.


# LLMS.txt - Documentation for Large Language Models
# https://llmstxt.org/
Sitemap: https://siwa.aptos.dev/llms.txt

Copilot AI commented Jan 14, 2026
According to the robots.txt specification, the Sitemap directive is intended for XML sitemap files, not for LLMS.txt files. The LLMS.txt specification suggests adding a comment or reference to llms.txt in robots.txt, but not using the Sitemap directive. Consider changing this to a comment like # LLMS.txt available at: https://siwa.aptos.dev/llms.txt.

Suggested change:
- Sitemap: https://siwa.aptos.dev/llms.txt
+ # LLMS.txt available at: https://siwa.aptos.dev/llms.txt

Comment on lines +32 to +40
for (const line of frontmatterStr.split("\n")) {
  const [key, ...valueParts] = line.split(":");
  if (key && valueParts.length > 0) {
    frontmatter[key.trim()] = valueParts
      .join(":")
      .trim()
      .replace(/^["']|["']$/g, "");
  }
}

Copilot AI commented Jan 14, 2026

The frontmatter parser will fail on multi-line YAML values or values containing colons. For example, a title like title: "Introduction: Getting Started" will work, but more complex YAML structures (arrays, multi-line strings, nested objects) will not be parsed correctly. Consider using a proper YAML parser library like js-yaml or gray-matter for robust frontmatter parsing.

@gregnazario gregnazario marked this pull request as ready for review January 14, 2026 18:09
- Use gray-matter for robust YAML frontmatter parsing instead of
  custom regex (handles complex YAML, multi-line values, etc.)
- Fix basePath split logic to correctly handle empty string case
  by using conditional: `basePath ? basePath.split("/") : []`
- Extract JSX components list to a constant for easier maintenance
- Fix robots.txt to use comment instead of Sitemap directive
  (Sitemap is for XML sitemaps per robots.txt specification)
- Add Windows line ending support in description extraction

The LLMS.txt content is automatically generated at runtime from
the MDX documentation files - any changes to docs are automatically
reflected without manual updates.

Co-authored-by: greg <greg@gnazar.io>
Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 9 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported


Comment on lines +7 to +14
const content = generateLlmsTxt(BASE_URL);

return new NextResponse(content, {
  headers: {
    "Content-Type": "text/plain; charset=utf-8",
    "Cache-Control": "public, max-age=3600, s-maxage=3600",
  },
});

Copilot AI commented Jan 15, 2026

The GET handler doesn't include error handling for the case where generateLlmsTxt might fail (e.g., if the docs directory doesn't exist or there are file system errors). While readMdxFiles returns an empty array if the directory doesn't exist, file system errors during reading could still cause the handler to fail ungracefully.

Consider wrapping the content generation in a try-catch block and returning an appropriate error response if generation fails.

Suggested change:
- const content = generateLlmsTxt(BASE_URL);
- return new NextResponse(content, {
-   headers: {
-     "Content-Type": "text/plain; charset=utf-8",
-     "Cache-Control": "public, max-age=3600, s-maxage=3600",
-   },
- });
+ try {
+   const content = generateLlmsTxt(BASE_URL);
+   return new NextResponse(content, {
+     headers: {
+       "Content-Type": "text/plain; charset=utf-8",
+       "Cache-Control": "public, max-age=3600, s-maxage=3600",
+     },
+   });
+ } catch (error) {
+   console.error("Failed to generate llms.txt content:", error);
+   return new NextResponse("Internal Server Error", {
+     status: 500,
+     headers: {
+       "Content-Type": "text/plain; charset=utf-8",
+       "Cache-Control": "no-store",
+     },
+   });
+ }

Comment on lines +7 to +14
const content = generateLlmsFullTxt(BASE_URL);

return new NextResponse(content, {
  headers: {
    "Content-Type": "text/plain; charset=utf-8",
    "Cache-Control": "public, max-age=3600, s-maxage=3600",
  },
});

Copilot AI commented Jan 15, 2026

The GET handler doesn't include error handling for the case where generateLlmsFullTxt might fail (e.g., if the docs directory doesn't exist or there are file system errors). While readMdxFiles returns an empty array if the directory doesn't exist, file system errors during reading could still cause the handler to fail ungracefully.

Consider wrapping the content generation in a try-catch block and returning an appropriate error response if generation fails.

Suggested change:
- const content = generateLlmsFullTxt(BASE_URL);
- return new NextResponse(content, {
-   headers: {
-     "Content-Type": "text/plain; charset=utf-8",
-     "Cache-Control": "public, max-age=3600, s-maxage=3600",
-   },
- });
+ try {
+   const content = generateLlmsFullTxt(BASE_URL);
+   return new NextResponse(content, {
+     headers: {
+       "Content-Type": "text/plain; charset=utf-8",
+       "Cache-Control": "public, max-age=3600, s-maxage=3600",
+     },
+   });
+ } catch (error) {
+   // Optionally log the error for debugging/monitoring purposes
+   console.error("Failed to generate LLMS full text content:", error);
+   return NextResponse.json(
+     { error: "Failed to generate content" },
+     { status: 500 }
+   );
+ }

Comment on lines +6 to +14
export async function GET() {
  const content = generateLlmsFullTxt(BASE_URL);

  return new NextResponse(content, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "public, max-age=3600, s-maxage=3600",
    },
  });

Copilot AI commented Jan 15, 2026

The route handlers read and process all documentation files on every request. While caching headers are set (1 hour), the server-side processing happens on each request to the route, which could be inefficient for large documentation sites.

Consider implementing one of the following optimizations:

  1. Use Next.js static generation to pre-generate these files at build time
  2. Implement server-side caching to avoid re-reading files on every request
  3. Use Next.js revalidation mechanisms for better performance

This is especially important if the documentation grows significantly.
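
Option 2 above can be sketched as simple in-process memoization. This is illustrative only (the names `responseCache` and `cachedText`, and the TTL, are hypothetical, not part of the PR); Next.js route-level `revalidate` or build-time generation would be idiomatic alternatives:

```typescript
// In-process memoization sketch: the generator runs at most once per key
// per TTL window instead of re-reading the docs on every request.
const responseCache = new Map<string, { body: string; expires: number }>();

function cachedText(key: string, ttlMs: number, generate: () => string): string {
  const hit = responseCache.get(key);
  const now = Date.now();
  if (hit && hit.expires > now) return hit.body;
  const body = generate();
  responseCache.set(key, { body, expires: now + ttlMs });
  return body;
}
```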

!trimmed.startsWith("<") &&
!trimmed.startsWith("import")
) {
return trimmed.slice(0, 200);

Copilot AI commented Jan 15, 2026

The description is truncated at exactly 200 characters using slice(0, 200), which could cut a word in the middle, resulting in incomplete words in the description. This may look unprofessional in the generated LLMS.txt files.

Consider truncating at word boundaries instead, or adding an ellipsis when the description is truncated. For example, you could find the last space before the 200-character limit and truncate there.

Suggested change:
- return trimmed.slice(0, 200);
+ if (trimmed.length <= 200) {
+   return trimmed;
+ }
+ const maxLength = 200;
+ const lastSpaceBeforeLimit = trimmed.lastIndexOf(" ", maxLength);
+ const cutPosition =
+   lastSpaceBeforeLimit > 0 ? lastSpaceBeforeLimit : maxLength;
+ const truncated = trimmed.slice(0, cutPosition).trimEnd();
+ return truncated + "...";

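Packaged as a standalone helper, the suggested truncation behaves like this (the helper name is illustrative, not from the PR):

```typescript
// Truncate at the last word boundary before maxLength and append an
// ellipsis; short strings pass through unchanged.
function truncateDescription(text: string, maxLength = 200): string {
  if (text.length <= maxLength) return text;
  const lastSpaceBeforeLimit = text.lastIndexOf(" ", maxLength);
  const cutPosition = lastSpaceBeforeLimit > 0 ? lastSpaceBeforeLimit : maxLength;
  return text.slice(0, cutPosition).trimEnd() + "...";
}
```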
Comment on lines +6 to +14
export async function GET() {
  const content = generateLlmsTxt(BASE_URL);

  return new NextResponse(content, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "public, max-age=3600, s-maxage=3600",
    },
  });

Copilot AI commented Jan 15, 2026

The route handlers read and process all documentation files on every request. While caching headers are set (1 hour), the server-side processing happens on each request to the route, which could be inefficient for large documentation sites.

Consider implementing one of the following optimizations:

  1. Use Next.js static generation to pre-generate these files at build time
  2. Implement server-side caching to avoid re-reading files on every request
  3. Use Next.js revalidation mechanisms for better performance

This is especially important if the documentation grows significantly.

Comment on lines +50 to +52
gray-matter:
  specifier: ^4.0.3
  version: 4.0.3

Copilot AI commented Jan 15, 2026

The gray-matter package depends on js-yaml@3.14.1, which is an older version from 2020. While this version should not have the critical security vulnerability that affected earlier versions (CVE-2021-25743), it would be beneficial to ensure that gray-matter is using an up-to-date version of its dependencies.

Consider checking if there's a newer version of gray-matter available that uses a more recent version of js-yaml, or monitor for updates to the dependency chain.

Comment on lines 4889 to +4892
next@15.3.6:
  resolution: {integrity: sha512-oI6D1zbbsh6JzzZFDCSHnnx6Qpvd1fSkVJu/5d8uluqnxzuoqtodVZjYvNovooznUq8udSAiKp7MbwlfZ8Gm6w==}
  engines: {node: ^18.18.0 || ^19.8.0 || >= 20.0.0}
  deprecated: This version has a security vulnerability. Please upgrade to a patched version. See https://nextjs.org/blog/security-update-2025-12-11 for more details.

Copilot AI commented Jan 15, 2026

The lock file shows that Next.js 15.3.6 has a security vulnerability as indicated by the deprecation notice: "This version has a security vulnerability. Please upgrade to a patched version."

While this isn't directly related to the LLMS.txt feature being added, the lock file refresh has surfaced this issue. Consider upgrading Next.js to a patched version in a follow-up PR to address this security concern.

Suggested change:
- next@15.3.6:
-   resolution: {integrity: sha512-oI6D1zbbsh6JzzZFDCSHnnx6Qpvd1fSkVJu/5d8uluqnxzuoqtodVZjYvNovooznUq8udSAiKp7MbwlfZ8Gm6w==}
-   engines: {node: ^18.18.0 || ^19.8.0 || >= 20.0.0}
-   deprecated: This version has a security vulnerability. Please upgrade to a patched version. See https://nextjs.org/blog/security-update-2025-12-11 for more details.
+ next@15.3.7:
+   resolution: {integrity: sha512-oI6D1zbbsh6JzzZFDCSHnnx6Qpvd1fSkVJu/5d8uluqnxzuoqtodVZjYvNovooznUq8udSAiKp7MbwlfZ8Gm6w==}
+   engines: {node: ^18.18.0 || ^19.8.0 || >= 20.0.0}

Comment on lines +227 to +232
for (const title of sectionOrder) {
  const sectionPages = sectionMap.get(title);
  if (sectionPages && sectionPages.length > 0) {
    sections.push({ title, pages: sectionPages });
  }
}

Copilot AI commented Jan 15, 2026

The function only includes sections that are explicitly listed in the sectionOrder array (lines 216-223). If new documentation sections are added that don't match these exact names, they will be silently excluded from the generated LLMS.txt files.

Consider either:

  1. Adding a catch-all at the end to include any remaining sections from sectionMap that weren't in sectionOrder, or
  2. Adding a warning/logging when sections are being excluded

This ensures all documentation is included even when new sections are added.
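
The catch-all in option 1 can be sketched as follows (illustrative: the function name and `Section` shape are hypothetical, not the actual lib/llms.ts code): curated sections first, in order, then any remaining sections alphabetically so newly added docs are never silently dropped.

```typescript
interface Section {
  title: string;
  pages: string[];
}

function orderSections(
  sectionMap: Map<string, string[]>,
  sectionOrder: string[],
): Section[] {
  const sections: Section[] = [];
  const seen = new Set<string>();
  // Curated sections first, in the given order.
  for (const title of sectionOrder) {
    const pages = sectionMap.get(title);
    if (pages && pages.length > 0) {
      sections.push({ title, pages });
      seen.add(title);
    }
  }
  // Catch-all: append any section the curated order did not mention.
  const leftovers = [...sectionMap.keys()].filter((t) => !seen.has(t)).sort();
  for (const title of leftovers) {
    const pages = sectionMap.get(title);
    if (pages && pages.length > 0) sections.push({ title, pages });
  }
  return sections;
}
```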

Comment on lines +99 to +113
 * Clean MDX content for plain text output
 */
function cleanMdxContent(content: string): string {
  // Build regex patterns from the component list
  const componentsPattern = JSX_COMPONENTS_TO_REMOVE.join("|");

  return (
    content
      // Remove import statements
      .replace(/^import\s+.*$/gm, "")
      // Remove JSX/TSX components with their content
      .replace(
        new RegExp(`<(${componentsPattern})[^>]*>[\\s\\S]*?<\\/\\1>`, "g"),
        "",
      )

Copilot AI commented Jan 15, 2026

The regex pattern for removing JSX components with content uses a greedy [\s\S]*? pattern which could potentially fail to properly match nested components with the same name. For example, if there's a <Steps> component containing another <Steps> component, the regex may not correctly identify the closing tag.

Consider using a more robust parsing approach or documenting this limitation. For simple documentation content this may be acceptable, but it's a potential edge case to be aware of.

Suggested change:
-  * Clean MDX content for plain text output
-  */
- function cleanMdxContent(content: string): string {
-   // Build regex patterns from the component list
-   const componentsPattern = JSX_COMPONENTS_TO_REMOVE.join("|");
-   return (
-     content
-       // Remove import statements
-       .replace(/^import\s+.*$/gm, "")
-       // Remove JSX/TSX components with their content
-       .replace(
-         new RegExp(`<(${componentsPattern})[^>]*>[\\s\\S]*?<\\/\\1>`, "g"),
-         "",
-       )
+  * Remove specified JSX components and their children from MDX content.
+  * This handles nested components with the same name using a simple depth counter.
+  */
+ function removeJsxComponentsWithContent(
+   source: string,
+   componentNames: string[],
+ ): string {
+   let result = source;
+   for (const name of componentNames) {
+     const openTag = `<${name}`;
+     const closeTag = `</${name}>`;
+     let searchFrom = 0;
+     while (true) {
+       const firstOpen = result.indexOf(openTag, searchFrom);
+       if (firstOpen === -1) break;
+       let depth = 0;
+       let i = firstOpen;
+       while (i < result.length) {
+         const nextOpen = result.indexOf(openTag, i);
+         const nextClose = result.indexOf(closeTag, i);
+         if (nextClose === -1 && nextOpen === -1) {
+           // No further matching tags; abort to avoid infinite loop for this component.
+           i = -1;
+           break;
+         }
+         if (nextOpen !== -1 && (nextOpen < nextClose || nextClose === -1)) {
+           // Potential opening tag.
+           const endOfOpen = result.indexOf(">", nextOpen);
+           if (endOfOpen === -1) {
+             i = -1;
+             break;
+           }
+           const selfClosingPos = result.lastIndexOf("/>", endOfOpen);
+           const isSelfClosing =
+             selfClosingPos !== -1 &&
+             selfClosingPos >= nextOpen &&
+             selfClosingPos <= endOfOpen;
+           if (!isSelfClosing) {
+             depth += 1;
+           }
+           i = endOfOpen + 1;
+         } else {
+           // Closing tag for this component.
+           if (depth === 0) {
+             // Malformed structure; break to avoid infinite loop.
+             i = -1;
+             break;
+           }
+           depth -= 1;
+           const endOfClose = nextClose + closeTag.length;
+           i = endOfClose;
+           if (depth === 0) {
+             // Remove from the first opening tag to the end of this closing tag.
+             result = result.slice(0, firstOpen) + result.slice(endOfClose);
+             // Continue searching from the same position in the updated string.
+             searchFrom = firstOpen;
+             break;
+           }
+         }
+       }
+       if (i === -1) {
+         // Could not find a complete, well-formed pair; stop processing this component.
+         break;
+       }
+     }
+   }
+   return result;
+ }
+
+ /**
+  * Clean MDX content for plain text output
+  */
+ function cleanMdxContent(content: string): string {
+   // Build regex patterns from the component list
+   const componentsPattern = JSX_COMPONENTS_TO_REMOVE.join("|");
+   const contentWithoutImports = content.replace(/^import\s+.*$/gm, "");
+   const withoutComponents = removeJsxComponentsWithContent(
+     contentWithoutImports,
+     JSX_COMPONENTS_TO_REMOVE,
+   );
+   return (
+     withoutComponents
+       // Remove self-closing JSX/TSX components for the specified list



3 participants