A curated collection of research papers, blog posts, tools, and documentation for building agentic systems - AI agents that can reason, plan, use tools, and collaborate autonomously.
EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning
- Authors: Chuanrui Hu, Xingze Gao, Zuyi Zhou, Dannong Xu, Yi Bai, Xintong Li, Hui Zhang, Tong Li, Chong Zhang, Lidong Bing, Yafeng Deng
- Publication: January 8, 2026
- URL: https://arxiv.org/abs/2601.02163
- Why Relevant: Implements an engram-inspired lifecycle for computational memory, converting dialogue streams into MemCells that capture episodic traces, atomic facts, and time-bounded Foresight signals.
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
- Authors: Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Zhuo Li, Yujie Zheng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, Muning Wen
- Publication: January 12, 2026
- URL: https://arxiv.org/abs/2601.03192
- Why Relevant: Proposes a non-parametric approach that evolves via reinforcement learning on episodic memory, decoupling stable reasoning from plastic memory.
Semantic XPath: Structured Agentic Memory Access for Conversational AI
- Authors: Yifan Simon Liu, Ruifan Wu, Liam Gallagher, Jiazhou Liang, Armin Toroghi, Scott Sanner
- Publication: March 1, 2026
- URL: https://arxiv.org/abs/2603.01160
- Why Relevant: Introduces a tree-structured memory module for conversational AI that improves over flat-RAG baselines by 176.7% while using only 9.1% of the tokens required by in-context memory.
HiMeS: Hippocampus-inspired Memory System for Personalized AI Assistants
- Authors: Hailong Li, Feifei Li, Wenhui Que, Xingyu Fan
- Publication: January 6, 2026
- URL: https://arxiv.org/abs/2601.06152
- Why Relevant: Proposes an AI assistant architecture that fuses short-term and long-term memory, inspired by biological hippocampus-neocortex memory mechanisms.
LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment
- Authors: Rohan Wandre, Yash Gajewar, Namrata Patel, Vivek Dhalkari
- Publication: November 4, 2025
- URL: https://arxiv.org/abs/2511.02371
- Why Relevant: Presents a lifelong multimodal agent architecture with streaming, multi-tier memory system that dynamically spills embeddings from hot tier to compressed tier under strict memory budgets.
MIRIX: Multi-Agent Memory System for LLM-Based Agents
- Authors: Yu Wang, Xi Chen
- Publication: July 10, 2025
- URL: https://arxiv.org/abs/2507.07957
- Why Relevant: Introduces a modular, multi-agent memory system with six distinct memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault.
ID-RAG: Identity Retrieval-Augmented Generation for Long-Horizon Persona Coherence in Generative Agents
- Authors: Daniel Platnick, Mohamed E. Bengueddache, Marjan Alirezaie, Dava J. Newman, Alex ''Sandy'' Pentland, Hossein Rahnama
- Publication: September 29, 2025
- URL: https://arxiv.org/abs/2509.25299
- Why Relevant: Addresses identity drift in long-horizon agents by introducing a mechanism to ground agent personas in dynamic, structured identity models.
A Survey of Context Engineering for Large Language Models
- Authors: Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, Shenghua Liu
- Publication: July 21, 2025
- URL: https://arxiv.org/abs/2507.13334
- Why Relevant: Comprehensive 166-page survey analyzing over 1400 research papers, establishing context engineering as a formal discipline for optimizing LLM information payloads.
ReAct: Synergizing Reasoning and Acting in Language Models
- Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
- Publication: October 6, 2022 (Revised March 10, 2023)
- URL: https://arxiv.org/abs/2210.03629
- Why Relevant: Foundational paper introducing ReAct pattern, enabling LLMs to generate both reasoning traces and task-specific actions in an interleaved manner.
LangChain: "How we built Agent Builder's memory system"
- Publication: February 21, 2026
- URL: https://blog.langchain.dev/how-we-built-agent-builders-memory-system/
- Topics: Production memory system using virtual filesystem backed by Postgres, mapping to COALA paper's procedural/semantic/episodic memory taxonomy.
LangChain: "How to Use Memory in Agent Builder"
- Author: Jacob Talbot
- Publication: February 19, 2026
- URL: https://blog.langchain.dev/how-to-use-memory-in-agent-builder/
- Topics: Short-term (per-thread) and long-term (persistent filesystem) memory, skills as specialized context.
Weaviate: "Context Engineering - LLM Memory and Retrieval for AI Agents"
- Authors: Femke Plantinga, Prajjwal Yadav, Victoria Slocum
- Publication: December 9, 2025
- URL: https://weaviate.io/blog/context-engineering
- Topics: Six pillars of context engineering, short-term vs. long-term agent memory architecture, failure modes (context poisoning/distraction/confusion/clash).
Weaviate: "The Limit in the Loop"
- Authors: Charles Pierse, Yaru Lin
- Publication: February 4, 2026
- URL: https://weaviate.io/blog/limit-in-the-loop
- Topics: Memory as infrastructure -- write control, deduplication, reconciliation, amendment, and purposeful forgetting for production agent memory.
Zep: "Zep Is The New State of the Art In Agent Memory"
- Authors: Preston Rasmussen, Daniel Chalef
- Publication: January 22, 2025
- URL: https://blog.getzep.com/state-of-the-art-agent-memory/
- Topics: Temporal knowledge graph architecture, benchmarking against MemGPT on DMR and LongMemEval, up to 100% accuracy gains.
Zep: "The Retrieval Tradeoff: What 50 Experiments Taught Us About Context Engineering"
- Author: Daniel Chalef
- Publication: December 9, 2025
- URL: https://blog.getzep.com/the-retrieval-tradeoff-what-50-experiments-taught-us-about-context-engineering/
- Topics: Recall vs. precision vs. efficiency tradeoff for one-shot agent memory retrieval across 50 experiments on LoCoMo benchmark.
Zep: "Context Templates: Context Engineering Made Simple"
- Author: Jack Ryan
- Publication: December 17, 2025
- URL: https://blog.getzep.com/context-templates-context-engineering-made-simple/
- Topics: Declarative context templates for controlling knowledge graph retrieval without writing retrieval code.
Letta: "Sleep-time Compute"
- Publication: April 21, 2025
- URL: https://letta.com/blog/sleep-time-compute
- Topics: Agents using idle periods to reorganize memory and reason proactively, creating Pareto improvement in performance.
Letta: "Agent Memory: How to Build Agents that Learn and Remember"
- Publication: July 7, 2025
- URL: https://letta.com/blog/agent-memory
- Topics: Building agents with persistent memory that learn across interactions, stateless-to-stateful transition.
Letta: "Memory Blocks: The Key to Agentic Context Management"
- Publication: May 14, 2025
- URL: https://letta.com/blog/memory-blocks
- Topics: Structured memory blocks as abstraction for context window management.
Letta: "RAG is not Agent Memory"
- Publication: February 13, 2025
- URL: https://letta.com/blog/rag-vs-agent-memory
- Topics: Why traditional RAG is insufficient for agent memory -- differences between document retrieval and true agent memory.
Letta: "Stateful Agents: The Missing Link in LLM Intelligence"
- Publication: February 6, 2025
- URL: https://letta.com/blog/stateful-agents
- Topics: Persistent memory and learning during deployment as the missing capability for production LLM intelligence.
Letta: "Anatomy of a Context Window: A Guide to Context Engineering"
- Publication: July 3, 2025
- URL: https://letta.com/blog/guide-to-context-engineering
- Topics: Designing and managing context windows for maximum agent effectiveness.
Letta: "Continual Learning in Token Space"
- Publication: December 11, 2025
- URL: https://letta.com/blog/continual-learning
- Topics: Agents carrying memories across model generations via learning in token space.
Letta: "Introducing Context Repositories: Git-based Memory for Coding Agents"
- Publication: February 12, 2026
- URL: https://letta.com/blog/context-repositories
- Topics: Programmatic context management with git-based versioning for agent memory.
Letta: "Conversations: Shared Agent Memory across Concurrent Experiences"
- Publication: January 21, 2026
- URL: https://letta.com/blog/conversations
- Topics: Conversations API for agents maintaining shared memory across parallel user experiences.
Letta: "Letta Leaderboard: Benchmarking LLMs on Agentic Memory"
- Publication: May 29, 2025
- URL: https://letta.com/blog/letta-leaderboard
- Topics: Benchmark suite evaluating LLM effectiveness at managing agentic memory tasks.
Letta: "Benchmarking AI Agent Memory: Is a Filesystem All You Need?"
- Publication: August 12, 2025
- URL: https://letta.com/blog/benchmarking-ai-agent-memory
- Topics: Filesystem-based memory scoring 74.0% on LoCoMo, beating specialized memory tool libraries.
LlamaIndex: "Files Are All You Need"
- Author: Jerry Liu
- Publication: January 15, 2026
- URL: https://www.llamaindex.ai/blog/files-are-all-you-need
- Topics: Agents converging on files/filesystems as primary context management interface.
Weaviate: "Building A Legal RAG App in 36 Hours"
- Author: Femke Plantinga, Victoria Slocum
- Publication: February 26, 2026
- URL: https://weaviate.io/blog/legal-rag-app
- Topics: Practitioner guide on building a production-ready end-to-end RAG application using Weaviate's Query Agent, relevant to RAG memory and context management.
Weaviate: "Introducing Weaviate Agent Skills"
- Author: Femke Plantinga, Prajjwal Yadav, Victoria Slocum
- Publication: February 18, 2026
- URL: https://weaviate.io/blog/weaviate-agent-skills
- Topics: Introduces agent skills library for building production-ready agent workflows with Weaviate, directly relevant to agentic AI memory and context engineering.
Letta: "Agent Memory: How to Build Agents that Learn and Remember"
- Publication: July 07, 2025
- URL: https://www.letta.com/blog/agent-memory
- Topics: Comprehensive guide on building agents with persistent memory, covering stateless vs stateful paradigms and memory architectures for learning agents.
Letta: "Anatomy of a Context Window: A Guide to Context Engineering"
- Publication: July 03, 2025
- URL: https://www.letta.com/blog/guide-to-context-engineering
- Topics: Deep-dive into designing and managing agent context windows, covering context engineering techniques for AI agents.
Letta: "Memory Blocks: The Key to Agentic Context Management"
- Publication: May 14, 2025
- URL: https://www.letta.com/blog/memory-blocks
- Topics: Engineering deep-dive on memory block abstractions for structuring agent context windows into discrete, functional memory units.
Letta: "RAG is not Agent Memory"
- Publication: February 13, 2025
- URL: https://www.letta.com/blog/rag-vs-agent-memory
- Topics: Explains why traditional RAG is insufficient for agent memory and how persistent agent memory differs from retrieval-augmented generation.
Letta: "Stateful Agents: The Missing Link in LLM Intelligence"
- Publication: February 06, 2025
- URL: https://www.letta.com/blog/stateful-agents
- Topics: Introduces stateful agents that maintain persistent memory and learn during deployment, covering the architecture for memory-enabled AI systems.
Letta: "Conversations: Shared Agent Memory across Concurrent Experiences"
- Publication: January 21, 2026
- URL: https://www.letta.com/blog/conversations
- Topics: Product deep-dive on the Conversations API enabling agents to maintain shared memory across parallel concurrent user interactions.
Letta: "Letta Code: A Memory-First Coding Agent"
- Publication: December 16, 2025
- URL: https://www.letta.com/blog/letta-code
- Topics: Introduces a memory-first coding agent that persists state and learns over time, demonstrating persistent agent memory in a coding context.
Letta: "Letta Evals: Evaluating Agents that Learn"
- Publication: October 23, 2025
- URL: https://www.letta.com/blog/letta-evals
- Topics: Open-source evaluation framework for testing stateful agents with persistent memory, measuring how well agents learn and remember.
Letta: "Rearchitecting Letta's Agent Loop: Lessons from ReAct, MemGPT, & Claude Code"
- Publication: October 14, 2025
- URL: https://www.letta.com/blog/letta-v1-agent
- Topics: Engineering deep-dive on Letta's agent architecture redesign incorporating memory management lessons from MemGPT for stateful agents.
Letta: "New course on Letta with DeepLearning.AI"
- Publication: November 07, 2024
- URL: https://www.letta.com/blog/deeplearning-ai-llms-as-operating-systems-agent-memory
- Topics: Practitioner course on agent memory concepts and implementation using Letta, covering LLMs as operating systems with memory.
Letta: "The AI agents stack"
- Publication: November 14, 2024
- URL: https://www.letta.com/blog/ai-agents-stack
- Topics: Overview of the AI agents stack landscape including memory and state management layers for building production agents.
LlamaIndex: "Build Better Context Graphs: Custom Instructions, Search Filters, and Webhooks"
- Publication: March 2026
- URL: https://blog.getzep.com/build-better-context-graphs-custom-instructions-search-filters-and-webhooks/
- Topics: Engineering guide on custom extraction instructions, property-level search filters, and webhooks for building and querying context graphs for agent memory.
LlamaIndex: "How Zep Works: A Visual Guide to Knowledge Graphs for AI Agents"
- Publication: December 2025
- URL: https://blog.getzep.com/a-visual-guide-to-knowledge-graphs-for-ai-agents/
- Topics: Visual deep-dive into how temporal knowledge graphs provide persistent context and memory for AI agents, comparing to chat memory and static RAG.
LlamaIndex: "Building Voice Agents with Memory: Zep x LiveKit"
- Publication: September 2025
- URL: https://blog.getzep.com/zep-livekit/
- Topics: Production implementation guide for adding long-term persistent memory to voice agents using Zep and LiveKit integration.
LlamaIndex: "Agents That Always Remember What Matters"
- Publication: November 2025
- URL: https://blog.getzep.com/agents-that-always-remember-what-matters/
- Topics: Steerable user summaries for agent memory using temporal knowledge graphs, with implementation guidance for persistent context.
LlamaIndex: "How We Scaled Zep 30x in 2 Weeks (and Made It Faster)"
- Publication: November 2025
- URL: https://blog.getzep.com/scaling-agent-memory-zep-30x/
- Topics: Engineering deep-dive on scaling agent memory infrastructure from thousands to millions of requests, addressing context retrieval latency.
LlamaIndex: "Zep v3: Context Engineering Takes Center Stage"
- Publication: August 2025
- URL: https://blog.getzep.com/zep-v3-context-engineering-takes-center-stage/
- Topics: Major platform release focused on context engineering for agent memory, including knowledge graph capabilities for assembling right context.
LlamaIndex: "Graphiti Adds FalkorDB Support as Project Approaches 14,000 Stars"
- Publication: July 2025
- URL: https://blog.getzep.com/graphiti-knowledge-graphs-falkordb-support/
- Topics: Knowledge graph memory system adds FalkorDB backend support, expanding options for persistent agent memory with temporal knowledge graphs.
LlamaIndex: "What is Context Engineering, Anyway?"
- Publication: June 2025
- URL: https://blog.getzep.com/what-is-context-engineering/
- Topics: Practitioner guide explaining context engineering as the systematic assembly of right information for agent context management.
LlamaIndex: "The Private Agent Memory Fallacy"
- Publication: June 2025
- URL: https://blog.getzep.com/the-ai-memory-wallet-fallacy/
- Topics: Analysis of portable agent memory concepts, exploring data strategy and persistent memory architecture for AI agents.
LlamaIndex: "Stop Using RAG for Agent Memory"
- Publication: June 2025
- URL: https://blog.getzep.com/stop-using-rag-for-agent-memory/
- Topics: Practitioner talk on why RAG is insufficient for agent memory and alternatives using knowledge graphs for persistent memory.
LlamaIndex: "Introducing Entity Types: Smarter, Structured Memory for Agents"
- Publication: May 2025
- URL: https://blog.getzep.com/entity-types-structured-agent-memory/
- Topics: Production feature for structuring domain-specific entity types in agent memory knowledge graphs for precise recall.
LlamaIndex: "Lies, Damn Lies, & Statistics: Is Mem0 Really SOTA in Agent Memory?"
- Publication: May 2025
- URL: https://blog.getzep.com/lies-damn-lies-statistics-is-mem0-really-sota-in-agent-memory/
- Topics: Benchmark analysis comparing agent memory systems (Zep vs Mem0) on LoCoMo, relevant to evaluating persistent memory approaches.
LlamaIndex: "GPT-4.1 and o4-mini: Is OpenAI Overselling Long-Context?"
- Publication: April 2025
- URL: https://blog.getzep.com/gpt-4-1-and-o4-mini-is-openai-overselling-long-context/
- Topics: LongMemEval benchmark evaluation showing why raw context window size is insufficient for agent memory, arguing for structured memory.
LlamaIndex: "The One-Token Trick"
- Publication: April 2025
- URL: https://blog.getzep.com/the-one-token-trick/
- Topics: Engineering technique using single-token LLM requests to improve RAG memory search quality at minimal cost and latency.
LlamaIndex: "Cursor IDE: Adding Memory With Graphiti MCP"
- Publication: March 2025
- URL: https://blog.getzep.com/cursor-adding-memory-with-graphiti-mcp/
- Topics: Practitioner guide for adding persistent memory to Cursor IDE using Graphiti MCP knowledge graph for cross-session recall.
LlamaIndex: "Building a Memory Agent with the OpenAI Agents SDK and Zep"
- Publication: March 2025
- URL: https://blog.getzep.com/building-a-memory-agent-with-the-openai-agents-sdk-and-zep/
- Topics: Implementation walkthrough for building an agent with persistent memory using OpenAI Agents SDK integrated with Zep.
Firecracker Specification Document
- URL: https://github.com/firecracker-microvm/firecracker/blob/main/SPECIFICATION.md
- Description: Technical specification with performance characteristics enforced via CI testing.
- Why Relevant: Defines microVM architecture for secure, lightweight virtualization.
Firecracker Design Document
- URL: https://github.com/firecracker-microvm/firecracker/blob/main/docs/design.md
- Description: Architecture and component design documentation.
- Why Relevant: Comprehensive design documentation for understanding microVM internals.
RACK (Recent ACKnowledgement) TCP Loss-Detection Algorithm (RFC 8985)
- Standard: RFC 8985
- URL: https://datatracker.ietf.org/doc/html/rfc8985
- Implementation: gVisor Network Stack
- Reference: https://gvisor.dev/blog/2021/08/31/gvisor-rack/
- Why Relevant: Foundation for gVisor's optimized networking stack.
gVisor: "Safe Ride into the Dangerzone: Reducing attack surface with gVisor"
- Authors: Alexis Métaireau, Alex Pyrgiotis, Etienne Perot
- Publication: September 23, 2024
- URL: https://gvisor.dev/blog/2024/09/23/safe-ride-into-the-dangerzone/
- Cross-posted: https://dangerzone.rocks/news/2024-09-23-gvisor
- Topics: Dangerzone collaboration, document conversion security.
E2B: "How Perplexity implemented advanced data analysis for Pro users in 1 week"
- URL: https://www.e2b.dev/blog/how-perplexity-implemented-advanced-data-analysis-for-pro-users-in-1-week
- Topics: Data analysis, advanced features implementation.
E2B: "How Hugging Face Is Using E2B to Replicate DeepSeek-R1"
- URL: https://www.e2b.dev/blog/how-hugging-face-is-using-e2b-to-replicate-deepseek-r1
- Topics: Model replication, AI workloads.
E2B: "How Manus Uses E2B to Provide Agents With Virtual Computers"
- URL: https://www.e2b.dev/blog/how-manus-uses-e2b-to-provide-agents-with-virtual-computers
- Topics: AI agents, virtual computers.
E2B: "Groq's Compound AI Models Are Powered by E2B"
- URL: https://www.e2b.dev/blog/groqs-compound-ai-models-are-powered-by-e2b
- Topics: Compound AI systems.
E2B: "Lindy Powers AI Workflows With E2B Code Action"
- URL: https://www.e2b.dev/blog/lindy-powers-ai-workflows-with-e2b-code-action
- Topics: AI workflows, code execution.
gVisor: "Optimizing seccomp usage in gVisor"
- Author: Etienne Perot
- Publication: February 1, 2024
- URL: https://gvisor.dev/blog/2024/02/01/seccomp/
- Topics: seccomp-bpf optimization, syscall filtering performance.
gVisor: "Faster filesystem access with Directfs"
- Author: Ayush Ranjan
- Publication: June 27, 2023
- URL: https://gvisor.dev/blog/2023/06/27/directfs/
- Originally posted: https://opensource.googleblog.com/2023/06/optimizing-gvisor-filesystems-with-directfs.html
- Topics: Direct filesystem access, performance improvements.
gVisor: "Running Stable Diffusion on GPU with gVisor"
- Author: Etienne Perot
- Publication: June 20, 2023
- URL: https://gvisor.dev/blog/2023/06/20/gpu-pytorch-stable-diffusion/
- Related: https://stability.ai/blog/stable-diffusion-public-release
- Topics: GPU support, NVIDIA CUDA, AI/ML workloads.
gVisor: "Rootfs Overlay"
- Author: Ayush Ranjan
- Publication: May 8, 2023
- URL: https://gvisor.dev/blog/2023/05/08/rootfs-overlay/
- Originally posted: https://opensource.googleblog.com/2023/04/gvisor-improves-performance-with-root-filesystem-overlay.html
- Topics: Filesystem performance, tmpfs overlay.
gVisor: "Releasing Systrap - A high-performance gVisor platform"
- Author: Konstantin Bogomolov
- Publication: April 28, 2023
- URL: https://gvisor.dev/blog/2023/04/28/systrap-release/
- Topics: Systrap platform, performance improvements.
gVisor: "How we Eliminated 99% of gVisor Networking Memory Allocations with Enhanced Buffer Pooling"
- Author: Lucas Manning
- Publication: October 24, 2022
- URL: https://gvisor.dev/blog/2022/10/24/buffer-pooling/
- Topics: Netstack, memory optimization, userspace networking.
gVisor: "Threat Detection in gVisor"
- Author: Fabricio Voznika
- Publication: August 31, 2022
- URL: https://gvisor.dev/blog/2022/08/01/threat-detection/
- Topics: Runtime monitoring, threat detection, Falco integration: https://falco.org.
gVisor: "Running gVisor in Production at Scale in Ant"
- Authors: Jianfeng Tan, Yong He (Ant Group)
- Publication: December 2, 2021
- URL: https://gvisor.dev/blog/2021/12/02/running-gvisor-in-production-at-scale-in-ant/
- Related: https://www.antgroup.com/
- Topics: Production deployment, large-scale infrastructure.
gVisor: "gVisor RACK"
- Author: Nayana Bidari
- Publication: August 31, 2021
- URL: https://gvisor.dev/blog/2021/08/31/gvisor-rack/
- Topics: TCP loss detection, networking improvements.
gVisor: "Platform Portability"
- Authors: Ian Lewis, Michael Pratt
- Publication: October 22, 2020
- URL: https://gvisor.dev/blog/2020/10/22/platform-portability/
- Topics: Hardware virtualization alternatives, cross-platform support.
gVisor: "Containing a Real Vulnerability"
- Author: Fabricio Voznika
- Publication: September 18, 2020
- URL: https://gvisor.dev/blog/2020/09/18/containing-a-real-vulnerability/
- Related CVE: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-14386
- Topics: Container escape vulnerabilities, gVisor security case study.
gVisor: "gVisor Networking Security"
- Author: Ian Gudger
- Publication: April 2, 2020
- URL: https://gvisor.dev/blog/2020/04/02/gvisor-networking-security/
- Topics: Network stack architecture, security principles.
gVisor: "gVisor Security Basics - Part 1"
- Authors: Jeremiah Spradlin, Zach Koopmans
- Publication: November 18, 2019
- URL: https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/
- Topics: Security design principles, container-native security.
AWS Blog: "Firecracker – Lightweight Virtualization for Serverless Computing"
- Author: Jeff Barr
- Publication: November 26, 2018
- URL: https://aws.amazon.com/blogs/aws/firecracker-lightweight-virtualization-for-serverless-computing/
- Description: Announcement of Firecracker technology, security features, performance characteristics (125ms startup, 5 MiB memory overhead).
AWS Open Source Blog: "Announcing the Firecracker Open Source Technology: Secure and Fast microVM for Serverless Computing"
- Authors: Arun Gupta, Linda Lian
- Publication: November 27, 2018
- URL: https://aws.amazon.com/blogs/opensource/firecracker-open-source-secure-fast-microvm-serverless/
- Description: Open source announcement, technical architecture, Rust language choice, integration with AWS Lambda and Fargate.
E2B: "Firecracker vs QEMU"
- Publication: March 2025
- URL: https://www.e2b.dev/blog/firecracker-vs-qemu
- Topics: Comparison of microVM technologies underpinning sandbox isolation for AI code execution.
E2B: "How I taught an AI to use a computer"
- Publication: January 2025
- URL: https://www.e2b.dev/blog/how-i-taught-an-ai-to-use-a-computer
- Topics: Giving AI agents full sandboxed computer access (browser, filesystem, terminal).
E2B: "Code Interpreter Sandbox"
- Publication: November 2023
- URL: https://www.e2b.dev/blog/e2b-sandbox
- Topics: Sandboxed code interpreter architecture for safe AI-generated code execution.
E2B: "Up to 5x Faster Sandboxes"
- Publication: February 2024
- URL: https://www.e2b.dev/blog/up-to-5x-faster-sandboxes
- Topics: Optimizing sandbox boot times using Firecracker microVMs for faster agent code execution.
E2B: "LLM-powered Code Interpreters"
- Publication: February 2024
- URL: https://www.e2b.dev/blog/llm-powered-code-interpreters
- Topics: Survey of LLM code interpreter architectures and why sandboxed execution is critical.
E2B: "Limitations of Running AI Agents Locally"
- Publication: February 2024
- URL: https://www.e2b.dev/blog/limitations-of-running-ai-agents-locally
- Topics: Why local execution is insufficient and cloud sandboxes provide better isolation.
Fly.io: "Fly Machines: an API for fast-booting VMs"
- Author: Kurt Mackey
- Publication: May 2022
- URL: https://fly.io/blog/fly-machines/
- Topics: Firecracker-based microVM API with 300ms boot times, scale-to-zero, per-tenant isolation.
Fly.io: "The Design & Implementation of Sprites"
- Author: Thomas Ptacek
- Publication: 2026
- URL: https://fly.io/blog/design-and-implementation/
- Topics: Architecture of Fly's Sprites system for running isolated execution environments.
Fly.io: "Code And Let Live"
- Author: Kurt Mackey
- Publication: 2026
- URL: https://fly.io/blog/code-and-let-live/
- Topics: Isolated code environments for the agent era, trust and isolation in computing.
Fly.io: "Phoenix.new -- The Remote AI Runtime"
- Author: Chris McCord
- Publication: 2025
- URL: https://fly.io/blog/phoenix-new-the-remote-ai-runtime/
- Topics: Remote AI runtime for coding agents on Fly Machines with sandboxed execution.
Fly.io: "Our Best Customers Are Now Robots"
- Author: Kurt Mackey
- Publication: 2025
- URL: https://fly.io/blog/fuckin-robots/
- Topics: AI agents as primary users driving demand for programmatically-created isolated environments.
Fly.io: "The Serverless Server"
- Author: Will Jordan
- Publication: 2022
- URL: https://fly.io/blog/the-serverless-server/
- Topics: Firecracker-based VM architecture providing stronger isolation than containers.
Daytona: "Sandbox Firewall"
- Author: Ivan Burazin
- Publication: August 2025
- URL: https://www.daytona.io/dotfiles/sandbox-firewall
- Topics: Fine-grained network firewall for AI sandboxes with zero-trust code execution.
Daytona: "Securing AI Code: Building Safe Sandboxes with Daytona SDK"
- Author: Nikola Balic
- Publication: February 2025
- URL: https://www.daytona.io/dotfiles/securing-ai-code-building-safe-sandboxes-with-daytona-sdk
- Topics: Building isolated sandboxes for AI-generated code with resource and security controls.
Daytona: "Sandboxing AI Development with Agent-Agnostic Infrastructure"
- Author: Nikola Balic
- Publication: October 2024
- URL: https://www.daytona.io/dotfiles/sandboxing-ai-development-with-agent-agnostic-infrastructure
- Topics: Agent-agnostic sandbox middleware for running AI coding agents in parallel isolated environments.
Daytona: "Harnessing AI through Standardization and Isolation"
- Author: Nikola Balic
- Publication: November 2023
- URL: https://www.daytona.io/dotfiles/harnessing-ai-through-standardization-and-isolation
- Topics: Standardized dev environments as AI sandboxes combining isolation with reproducibility.
Daytona: "Daytona Raises $24M Series A to Give Every Agent a Computer"
- Author: Ivan Burazin
- Publication: February 2026
- URL: https://www.daytona.io/dotfiles/daytona-raises-24m-series-a-to-give-every-agent-a-computer
- Topics: Providing isolated "computers" (sandboxes) for millions of AI agents as compute infrastructure.
Cloudflare: "Mitigating Spectre and Other Security Threats: The Cloudflare Workers Security Model"
- Author: Kenton Varda
- Publication: July 2020
- URL: https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model
- Topics: Workers' multi-layer isolation: V8 isolates, process sandboxing, capability-based API design, Spectre mitigations.
Cloudflare: "Introducing Moltworker: a self-hosted personal AI agent"
- Authors: Celso Martinho, Brian Brunner, Sid Chatterjee, Andreas Jansson
- Publication: January 2026
- URL: https://blog.cloudflare.com/moltworker-self-hosted-ai-agent/
- Topics: Self-hosted AI agent running on Cloudflare's Sandbox SDK with container isolation.
Cloudflare: "Containers are available in public beta"
- Publication: June 2025
- URL: https://blog.cloudflare.com/containers-are-available-in-public-beta-for-simple-global-and-programmable/
- Topics: Programmable, isolated container workloads alongside Workers for AI compute.
Meta Engineering: "Building Private Processing for AI tools on WhatsApp"
- Publication: April 2025
- URL: https://engineering.fb.com/2025/04/29/security/whatsapp-private-processing-ai-tools/
- Topics: Secure isolated execution environment for processing AI tool calls with privacy guarantees.
Meta Engineering: "Scaling Privacy Infrastructure for GenAI Product Innovation"
- Publication: October 2025
- URL: https://engineering.fb.com/2025/10/23/security/scaling-privacy-infrastructure-for-genai-product-innovation/
- Topics: Privacy isolation infrastructure for GenAI products at Meta scale.
Cursor: "Implementing a secure sandbox for local agents"
- Publication: February 18, 2026
- URL: https://www.cursor.com/blog/agent-sandboxing
- Topics: Building agent sandboxing on macOS, Linux, and Windows for secure autonomous execution.
E2B: "Introducing Build System 2.0"
- Publication: October 16, 2025
- URL: https://www.e2b.dev/blog/introducing-build-system-2-0
- Topics: E2B product update on their sandbox build system for AI agent code execution environments
E2B: "Replicating Cursor's Agent Mode with E2B and AgentKit"
- Publication: February 24, 2025
- URL: https://www.e2b.dev/blog/replicating-cursors-agent-mode-with-e2b-and-agentkit
- Topics: Practitioner guide on building an AI coding agent with sandboxed code execution using E2B
E2B: "JavaScript Guide: Run OpenAI Codex in an E2B Sandbox"
- Publication: August 25, 2025
- URL: https://www.e2b.dev/blog/javascript-guide-run-openai-codex-in-an-e2b-sandbox
- Topics: Practitioner guide on running OpenAI Codex agent in an isolated E2B sandbox environment
E2B: "Python Guide: Run OpenAI Codex in an E2B Sandbox"
- Publication: August 25, 2025
- URL: https://www.e2b.dev/blog/python-guide-run-openai-codex-in-an-e2b-sandbox
- Topics: Practitioner guide on running OpenAI Codex agent in an isolated E2B sandbox environment using Python
Daytona: "Riza and Daytona Partner to Power AI-Generated Code"
- Publication: None
- URL: https://www.daytona.io/dotfiles/riza-and-daytona-partner-to-power-ai-generated-code
- Topics: Partnership between Riza and Daytona for secure AI-generated code execution in sandboxed environments.
Daytona: "LangChain's Open SWE Runs on Daytona — Here's Why"
- Publication: None
- URL: https://www.daytona.io/dotfiles/langchain-s-open-swe-runs-on-daytona-here-s-why
- Topics: Production implementation of LangChain's Open SWE agent running on Daytona sandboxes for isolated code execution.
Daytona: "Snap, Sandbox, Summarize: Safe Visual LLMs with Daytona"
- Author: Lovleen Kaur
- Publication: October 29, 2025
- URL: https://www.daytona.io/dotfiles/snap-sandbox-summarize-safe-visual-llms-with-daytona
- Topics: Using Daytona sandboxes for safe execution of visual LLM workloads, demonstrating sandbox isolation for AI agents.
Daytona: "Computer Use – macOS (Early Access)"
- Author: Ivan Burazin
- Publication: October 17, 2025
- URL: https://www.daytona.io/dotfiles/computer-use-macos
- Topics: Sandboxed macOS computer use environments for AI agents, relevant to secure runtime isolation.
Daytona: "Computer Use – Windows (Early Access)"
- Author: Ivan Burazin
- Publication: October 16, 2025
- URL: https://www.daytona.io/dotfiles/computer-use-windows-early-access
- Topics: Sandboxed Windows computer use environments for AI agents, relevant to secure runtime isolation.
Daytona: "Single Tenant Deployment"
- Author: Ivan Burazin
- Publication: October 14, 2025
- URL: https://www.daytona.io/dotfiles/single-tenant
- Topics: Single tenant deployment model for Daytona sandboxes, relevant to isolation and security architecture for AI agent infrastructure.
Daytona: "Running LLM-Generated Code Safely: LangChain + Daytona Demo"
- Publication: None
- URL: https://www.daytona.io/dotfiles/running-llm-generated-code-safely-langchain-daytona-demo
- Topics: Practitioner guide demonstrating safe execution of LLM-generated code using LangChain and Daytona sandboxes.
Daytona: "Run AI-Generated Code Safely with Daytona Sandboxes"
- Publication: None
- URL: https://www.daytona.io/dotfiles/run-ai-generated-code-safely-with-daytona-sandboxes-part-1
- Topics: Guide on running AI-generated code safely in Daytona's sandboxed environments with isolation.
Daytona: "Managing Files in AI Sandbox Environments"
- Publication: None
- URL: https://www.daytona.io/dotfiles/managing-files-in-ai-sandbox-environments
- Topics: Guide on file management within AI sandbox environments, relevant to secure code execution infrastructure.
Daytona: "Winning Daytona's Hacksprint with an A/B Testing Agent"
- Author: Kuba Rogut
- Publication: October 30, 2025
- URL: https://www.daytona.io/dotfiles/winning-daytona-s-hacksprint-with-an-a-b-testing-agent
- Topics: Building an AI agent using Daytona sandboxes for isolated code execution in a hackathon context.
Daytona: "PTY Support in Daytona"
- Author: Ivan Burazin
- Publication: October 15, 2025
- URL: https://www.daytona.io/dotfiles/pty-support-in-daytona
- Topics: Engineering deep-dive on PTY support in Daytona sandboxes, enabling interactive terminal sessions for AI agents.
MCP Specification (2025-06-18 Revision)
- URL: https://spec.modelcontextprotocol.io/
- Description: Complete protocol specification with June 18, 2025 revision.
- Why Relevant: Defines the open standard for connecting AI applications to external systems.
MCP Authorization Specification
- URL: https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization
- Description: Authorization flow specification for HTTP-based transports.
- Publication: June 18, 2025 (Protocol Revision).
MCP Schema (TypeScript)
- URL: https://github.com/modelcontextprotocol/specification/blob/main/schema/2025-11-25/schema.ts
- Date: November 25, 2025
- Description: Schema defined in TypeScript.
MCP Schema (JSON)
- URL: https://github.com/modelcontextprotocol/specification/blob/main/schema/2025-11-25/schema.json
- Date: November 25, 2025
- Description: JSON Schema for wider compatibility.
OAuth 2.1 IETF DRAFT
- URL: https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-13
- Type: Standard (Draft)
- Description: OAuth 2.1 Authorization Framework.
OAuth 2.0 Authorization Server Metadata (RFC 8414)
- URL: https://datatracker.ietf.org/doc/html/rfc8414
- Type: Standard (RFC)
- Description: Authorization Server Metadata for OAuth 2.0.
OAuth 2.0 Dynamic Client Registration Protocol (RFC 7591)
- URL: https://datatracker.ietf.org/doc/html/rfc7591
- Type: Standard (RFC)
- Description: Dynamic Client Registration for OAuth 2.0.
OAuth 2.0 Protected Resource Metadata (RFC 9728)
- URL: https://datatracker.ietf.org/doc/html/rfc9728
- Type: Standard (RFC)
- Description: Protected Resource Metadata discovery for OAuth 2.0.
OAuth 2.0 Resource Indicators (RFC 8707)
- URL: https://www.rfc-editor.org/rfc/rfc8707.html
- Type: Standard (RFC)
- Description: Resource Indicators for OAuth 2.0.
Model Context Protocol Main Website
- URL: https://modelcontextprotocol.io/
- Description: Official MCP homepage with overview, getting started guides, and architecture documentation.
MCP Registry
- URL: https://registry.modelcontextprotocol.io/
- Description: Discover and browse published MCP servers.
MCP Registry GitHub
- URL: https://github.com/modelcontextprotocol/registry
- Description: Registry source code and documentation.
Anthropic MCP Documentation
- URL: https://docs.anthropic.com/en/docs/build-with-claude/mcp
- Description: Official Anthropic documentation for integrating MCP with Claude applications.
Anthropic: "Introducing the Model Context Protocol"
- Publication: November 25, 2024
- URL: https://www.anthropic.com/news/model-context-protocol
- Topics: Original MCP announcement -- open standard for connecting AI assistants to data sources.
Anthropic: "Code execution with MCP: Building more efficient agents"
- Authors: Adam Jones, Conor Kelly
- Publication: November 4, 2025
- URL: https://www.anthropic.com/engineering/code-execution-with-mcp
- Topics: Agents writing code against MCP servers as APIs instead of direct tool calls -- 98.7% token reduction.
Anthropic: "Desktop Extensions: One-click MCP server installation"
- Publication: June 26, 2025
- URL: https://www.anthropic.com/engineering/desktop-extensions
- Topics: .mcpb packaging format for one-click MCP server installation, solving manual config friction.
Cloudflare: "Build and deploy Remote MCP servers to Cloudflare"
- Authors: Brendan Irvine-Broque, Dina Kozlov, Glen Maddern
- Publication: March 25, 2025
- URL: https://blog.cloudflare.com/remote-model-context-protocol-servers-mcp
- Topics: Remote MCP servers with OAuth 2.1 auth, McpAgent class on Durable Objects, mcp-remote adapter.
Cloudflare: "Securing the AI Revolution: Introducing Cloudflare MCP Server Portals"
- Author: Kenny Johnson
- Publication: August 26, 2025
- URL: https://blog.cloudflare.com/zero-trust-mcp-server-portals/
- Topics: Enterprise MCP security -- centralized gateway with Zero Trust policy enforcement and audit logging.
Cloudflare: "Code Mode: give agents an entire API in 1,000 tokens"
- Author: Matt Carey
- Publication: February 20, 2026
- URL: https://blog.cloudflare.com/code-mode-mcp/
- Topics: MCP server for entire Cloudflare API (2,500+ endpoints) collapsed into 2 tools -- 99.9% token reduction.
Vercel: "The second wave of MCP: Building for LLMs, not developers"
- Publication: September 9, 2025
- URL: https://vercel.com/blog/the-second-wave-of-mcp-building-for-llms-not-developers
- Topics: MCP evolving from developer-focused tools to LLM-native integrations.
Vercel: "Building efficient MCP servers"
- Publication: June 12, 2025
- URL: https://vercel.com/blog/building-efficient-mcp-servers
- Topics: Engineering guide with implementations from Zapier, Composio, and Solana teams.
Vercel: "Addressing security and quality issues with MCP tools in AI Agents"
- Publication: September 17, 2025
- URL: https://vercel.com/blog/generate-static-ai-sdk-tools-from-mcp-servers-with-mcp-to-ai-sdk
- Topics: mcp-to-ai-sdk for generating static MCP tools, addressing dynamic MCP security risks.
Vercel: "How Vapi built their MCP server on Vercel"
- Publication: May 21, 2025
- URL: https://vercel.com/blog/vapi-mcp-server-on-vercel
- Topics: Production case study of Vapi deploying MCP server on Vercel with Fluid Compute.
LangChain: "MCP: Flash in the Pan or Future Standard?"
- Authors: Harrison Chase, Nuno Campos
- Publication: March 8, 2025
- URL: https://blog.langchain.dev/mcp-fad-or-fixture/
- Topics: Debate on MCP's value -- useful for tools in agents you don't control, limited by tool selection reliability.
LlamaIndex: "Skills vs MCP tools for agents: when to use what"
- Authors: Clelia Astra Bertelli, Tuana Celik
- Publication: February 3, 2026
- URL: https://www.llamaindex.ai/blog/skills-vs-mcp-tools-for-agents-when-to-use-what
- Topics: MCP tools (deterministic API calls) vs Skills (natural language instructions) comparison.
LlamaIndex: "Adding Native MCP to LlamaIndex Docs"
- Publication: October 31, 2025
- URL: https://www.llamaindex.ai/blog/adding-native-mcp-to-llamaindex-docs
- Topics: Native MCP search implementation for LlamaIndex documentation.
Simon Willison: "Introducing the Model Context Protocol"
- Author: Simon Willison
- Publication: November 25, 2024
- URL: https://simonwillison.net/2024/Nov/25/model-context-protocol/
- Topics: Early independent analysis of MCP spec, Claude Desktop integration, and sqlite MCP server.
Authors: David Soria Parra (@dsp) and Justin Spahr-Summers (@jspahrsummers) License: Apache License 2.0 for code and specifications, Creative Commons Attribution 4.0 International for documentation Governance: Model Context Protocol as a Series of LF Projects, LLC Governance Policies: https://www.lfprojects.org/policies/
Vercel: "AI SDK 6"
- Publication: December 22, 2025
- URL: https://vercel.com/blog/ai-sdk-6
- Topics: Introduces AI SDK 6 with full MCP support, agents, tool execution approval, and DevTools for production use.
Vercel: "Build smarter workflows with Notion and v0"
- Publication: December 15, 2025
- URL: https://vercel.com/blog/build-smarter-workflows-with-notion-and-v0
- Topics: v0 connects to Notion via MCP to build dashboards and tools from existing docs and databases.
Vercel: "Security boundaries in agentic architectures"
- Publication: February 24, 2026
- URL: https://vercel.com/blog/security-boundaries-in-agentic-architectures
- Topics: Framework for security boundaries in agentic architectures covering isolation, secret injection, and sandboxing—relevant to MCP security.
Vercel: "Run untrusted code with Vercel Sandbox, now generally available"
- Publication: January 30, 2026
- URL: https://vercel.com/blog/vercel-sandbox-is-now-generally-available
- Topics: GA of Vercel Sandbox for secure isolated agent execution with container support, relevant to MCP server runtime security.
Vercel: "Making agent-friendly pages with content negotiation"
- Publication: February 03, 2026
- URL: https://vercel.com/blog/making-agent-friendly-pages-with-content-negotiation
- Topics: Engineering deep-dive on serving markdown to agents via HTTP content negotiation, relevant to MCP and agentic system design patterns.
Vercel: "How we made v0 an effective coding agent"
- Publication: January 07, 2026
- URL: https://vercel.com/blog/how-we-made-v0-an-effective-coding-agent
- Topics: Production engineering deep-dive on v0's composite AI agent pipeline with dynamic system prompts and autofixers.
Vercel: "How to build agents with filesystems and bash"
- Publication: January 09, 2026
- URL: https://vercel.com/blog/how-to-build-agents-with-filesystems-and-bash
- Topics: Practitioner guide on building production agents with filesystem and bash tooling, relevant to agentic system design.
Vercel: "How Mux shipped durable video workflows with their @mux/ai SDK"
- Publication: January 12, 2026
- URL: https://vercel.com/blog/how-mux-shipped-durable-video-workflows-with-their-mux-ai-sdk
- Topics: Production implementation of durable AI workflows using Workflow DevKit with automatic retries and state persistence.
Vercel: "Cline now runs on Vercel AI Gateway"
- Publication: December 16, 2025
- URL: https://vercel.com/blog/cline-on-ai-gateway
- Topics: Cline coding agent scales on Vercel AI Gateway, relevant to AI SDK and agentic system infrastructure in production.
The Auton Agentic AI Framework
- Authors: Sheng Cao, Zhao Chang, Chang Li, Hannan Li, Liyao Fu, Ji Tang
- Publication: February 27, 2026
- URL: https://arxiv.org/abs/2602.23720
- Why Relevant: Describes a principled architecture for standardizing autonomous agent systems, with hierarchical memory consolidation inspired by biological episodic memory.
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
- Authors: Yufei He, Juncheng Liu, Yue Liu, Yibo Li, Tri Cao, Zhiyuan Hu, Xinxing Xu, Bryan Hooi
- Publication: October 15, 2025
- URL: https://arxiv.org/abs/2510.13220
- Why Relevant: Introduces evolutionary test-time learning framework that improves agents without fine-tuning by evolving the entire agentic system after every episode.
Generative Agents: Interactive Simulacra of Human Behavior
- Authors: J. Z. Shunyu Yao, Jiacheng Li, Yuyang Zhao, Izhak Shafran, Karthik Narasimhan
- Publication: April 2023
- URL: https://arxiv.org/abs/2304.03442
- Why Relevant: Simulation of human-like agent behavior in interactive environments.
Agents: An Open-source Framework for Autonomous Language Agents
- Authors: Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, Shiding Zhu, Jiyu Chen, Wentao Zhang, Xiangru Tang, Ningyu Zhang, Huajun Chen, Peng Cui, Mrinmaya Sachan
- Publication: September 14, 2023 (Revised December 12, 2023)
- URL: https://arxiv.org/abs/2309.07870
- Why Relevant: Open-source library enabling non-specialists to build state-of-the-art autonomous language agents with planning, memory, tool usage, and multi-agent communication.
Pregel: A System for Large-Scale Graph Processing
- Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
- Publication: Proceedings of the 2010 International Conference on Management of Data, ACM, New York, NY, USA, pp. 135-146
- URL: https://research.google/pubs/pub37252/
- Referenced by: LangGraph
- Why Relevant: Foundational paper on distributed graph processing that inspired LangGraph's architecture.
Exploring Network Structure, Dynamics, and Function Using NetworkX
- Authors: Aric A. Hagberg, Daniel A. Schult, Pieter J. Swart
- Publication: Proceedings of the 7th Python in Science Conference (SciPy2008), August 2008
- URL: http://conference.scipy.org.s3-website-us-east-1.amazonaws.com/proceedings/scipy2008/paper_2/
- PDF: http://conference.scipy.org.s3-website-us-east-1.amazonaws.com/proceedings/scipy2008/paper_2/full_text.pdf
- Referenced by: LangGraph
- Why Relevant: The NetworkX paper that influenced LangGraph's interface design.
LangGraph Documentation
- Overview: https://docs.langchain.com/oss/python/langgraph/overview
- Quickstart: https://docs.langchain.com/oss/python/langgraph/quickstart
- Durable Execution: https://docs.langchain.com/oss/python/langgraph/durable-execution
- Human-in-the-Loop: https://docs.langchain.com/oss/python/langgraph/interrupts
- Memory: https://docs.langchain.com/oss/python/langgraph/memory
- LangGraph Studio: https://docs.langchain.com/oss/python/langgraph/studio
- Case Studies: https://www.langchain.com/built-with-langgraph (LinkedIn, Uber, Klarna, GitLab)
- Intro to LangGraph Course: https://academy.langchain.com/courses/intro-to-langgraph
CrewAI Documentation
- Official Documentation: https://docs.crewai.com
- Crew Concepts: https://docs.crewai.com/concepts/crews
- Flows: https://docs.crewai.com/concepts/flows
- LLM Connections: https://docs.crewai.com/how-to/LLM-Connections/
- Learning Platform: https://learn.crewai.com - 100,000+ certified developers
Anthropic: "Building effective agents"
- Authors: Erik Schluntz, Barry Zhang
- Publication: December 19, 2024
- URL: https://www.anthropic.com/engineering/building-effective-agents
- Topics: Definitive guide to agent architecture patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer).
Anthropic: "Effective harnesses for long-running agents"
- Publication: November 26, 2025
- URL: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
- Topics: Production patterns for building reliable harnesses around long-running agents.
Anthropic: "Effective context engineering for AI agents"
- Publication: September 29, 2025
- URL: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- Topics: Techniques for managing and engineering context in agentic systems.
Anthropic: "Claude Code: Best practices for agentic coding"
- Publication: April 18, 2025
- URL: https://www.anthropic.com/engineering/claude-code-best-practices
- Topics: Production best practices for agentic coding including task decomposition and verification loops.
Anthropic: "Beyond permission prompts: making Claude Code more secure and autonomous"
- Publication: October 20, 2025
- URL: https://www.anthropic.com/engineering/claude-code-sandboxing
- Topics: Security architecture for autonomous coding agents, sandboxing for safe long-running execution.
LangChain: "Agent Engineering: A New Discipline"
- Author: Harrison Chase
- Publication: December 2025
- URL: https://blog.langchain.dev/agent-engineering-a-new-discipline/
- Topics: Agent engineering as emerging discipline requiring new skills around orchestration, evaluation, and observability.
LangChain: "On Agent Frameworks and Agent Observability"
- Author: Harrison Chase
- Publication: February 2026
- URL: https://blog.langchain.dev/on-agent-frameworks-and-agent-observability/
- Topics: Why orchestration and observability remain critical even as LLMs improve.
LangChain: "You don't know what your agent will do until it's in production"
- Publication: February 2026
- URL: https://blog.langchain.dev/you-dont-know-what-your-agent-will-do-until-its-in-production/
- Topics: Production monitoring challenges -- non-deterministic behavior, infinite inputs, building evaluation from traces.
LangChain: "Improving Deep Agents with harness engineering"
- Publication: February 2026
- URL: https://blog.langchain.dev/improving-deep-agents-with-harness-engineering/
- Topics: How harness engineering moved a coding agent from Top 30 to Top 5 on Terminal Bench.
LangChain: "The two patterns by which agents connect sandboxes"
- Publication: February 2026
- URL: https://blog.langchain.dev/the-two-patterns-by-which-agents-connect-sandboxes/
- Topics: Architectural patterns for connecting agents to sandboxed execution environments.
LangChain: "LangChain and LangGraph Agent Frameworks Reach v1.0 Milestones"
- Publication: October 2025
- URL: https://blog.langchain.dev/langchain-langgraph-1dot0/
- Topics: LangGraph v1.0 as graph-based orchestration framework for production stateful agent workflows.
LlamaIndex: "LlamaAgents Builder: Idea To Deployed Agent in Minutes"
- Publication: January 28, 2026
- URL: https://www.llamaindex.ai/blog/llamaagents-builder-idea-to-deployed-agent-in-minutes
- Topics: Agent builder generating workflows from natural language for document processing.
LlamaIndex: "Long Horizon Document Agents"
- Publication: February 12, 2026
- URL: https://www.llamaindex.ai/blog/long-horizon-document-agents
- Topics: Long-running autonomous agents with event triggers and persistent task backlogs.
Cursor: "The third era of AI software development"
- Publication: February 26, 2026
- URL: https://www.cursor.com/blog/third-era
- Topics: Autonomous cloud agents taking on larger tasks over longer timescales.
Cursor: "Cursor agents can now control their own computers"
- Publication: February 24, 2026
- URL: https://www.cursor.com/blog/agent-computer-use
- Topics: Cloud agents using browser-based computer use to verify code changes autonomously.
Cursor: "Build agents that run automatically"
- Publication: March 5, 2026
- URL: https://www.cursor.com/blog/automations
- Topics: Event-driven agent automations with triggers for continuous autonomous operation.
Cognition: "An Early Preview of SWE-1.6 and Research Update"
- Authors: Carlo Baronio, Ben Pan, Sam Lee, et al.
- Publication: March 1, 2026
- URL: https://www.cognition.ai/blog/swe-1-6-preview
- Topics: Latest agent model optimized for software engineering with improved planning and execution.
Cognition: "Rebuilding Devin for Claude Sonnet 4.5: Lessons and Challenges"
- Publication: September 29, 2025
- URL: https://www.cognition.ai/blog/devin-sonnet-4-5-lessons-and-challenges
- Topics: Engineering lessons from rebuilding the Devin agent architecture for a new foundation model.
Cognition: "Devin's 2025 Performance Review"
- Publication: November 14, 2025
- URL: https://www.cognition.ai/blog/devin-annual-performance-review-2025
- Topics: 18-month retrospective on Devin's production patterns, failure modes, and lessons.
Cognition: "Closing the Agent Loop: Devin Autofixes Review Comments"
- Publication: February 10, 2026
- URL: https://www.cognition.ai/blog/closing-the-agent-loop-devin-autofixes-review-comments
- Topics: Autonomous feedback loop where Devin automatically fixes review comments on its PRs.
Replit: "Introducing Agent 3: Our Most Autonomous Agent Yet"
- Publication: September 10, 2025
- URL: https://blog.replit.com/introducing-agent-3-our-most-autonomous-agent-yet
- Topics: 10x more autonomous agent with browser-based self-testing and auto-fix capabilities.
LinkedIn: "Contextual agent playbooks and tools"
- Author: Ajay Prakash
- Publication: January 27, 2026
- URL: https://www.linkedin.com/blog/engineering/ai/contextual-agent-playbooks-and-tools-how-linkedin-gave-ai-coding-agents-organizational-context
- Topics: How LinkedIn provides AI coding agents with organization-specific context at enterprise scale.
Stripe: "Can AI agents build real Stripe integrations?"
- Authors: Carol Liang, Kevin Ho
- Publication: March 2, 2026
- URL: https://stripe.com/blog/can-ai-agents-build-real-stripe-integrations
- Topics: Benchmark for AI agents building real Stripe integrations end-to-end.
Sourcegraph: "A New Era for Sourcegraph: The Intelligence Layer for AI Coding Agents"
- Author: Graham McBain
- Publication: February 25, 2026
- URL: https://sourcegraph.com/blog/a-new-era-for-sourcegraph-the-intelligence-layer-for-ai-coding-agents-and-developers
- Topics: Code intelligence layer that agents rely on for codebase understanding.
Microsoft Research: "CORPGEN advances AI agents for real work"
- Publication: February 26, 2026
- URL: https://www.microsoft.com/en-us/research/blog/corpgen-advances-ai-agents-for-real-work/
- Topics: "Digital employees" with GUI automation, hierarchical planning, and memory isolation.
Microsoft Research: "Agent Lightning: Adding RL to AI agents without code rewrites"
- Publication: December 11, 2025
- URL: https://www.microsoft.com/en-us/research/blog/agent-lightning-adding-reinforcement-learning-to-ai-agents-without-code-rewrites/
- Topics: Decoupling agent behavior from training, turning each step into RL data.
OpenAI: "Practices for Governing Agentic AI Systems"
- Publication: December 14, 2023
- URL: https://openai.com/index/practices-for-governing-agentic-ai-systems/
- Topics: Defining agentic AI systems and governance practices for safe autonomous operation.
Anthropic: "Quantifying infrastructure noise in agentic coding evals"
- Publication: 2026
- URL: https://www.anthropic.com/engineering/infrastructure-noise
- Topics: Analyzes how infrastructure configuration impacts agentic coding benchmarks, relevant to production agent evaluation and architecture.
Anthropic: "Demystifying evals for AI agents"
- Publication: January 09, 2026
- URL: https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents
- Topics: Practitioner guide on evaluating AI agents, directly relevant to agents in production.
Anthropic: "Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet"
- Publication: January 06, 2025
- URL: https://www.anthropic.com/engineering/swe-bench-sonnet
- Topics: Engineering deep-dive on agentic coding evaluation and architecture for software engineering tasks.
LlamaIndex: "Creating a Deal Sourcing Agent with LlamaAgents Builder"
- Publication: February 26, 2026
- URL: https://www.llamaindex.ai/blog/creating-a-deal-sourcing-agent-with-llamaagents-builder
- Topics: Practitioner guide on building an agentic system using LlamaAgents Builder for deal sourcing workflows
LlamaIndex: "LlamaIndex is more than a RAG Framework. It is Agentic Document Processing."
- Publication: March 3, 2026
- URL: https://www.llamaindex.ai/blog/llamaindex-is-more-than-a-rag-framework
- Topics: Deep-dive on LlamaIndex's evolution from RAG framework to agentic document processing architecture
Cursor: "Build agents that run automatically"
- Publication: March 05, 2026
- URL: https://cursor.com/blog/automations
- Topics: Cursor's implementation of automated agent triggers and orchestration for coding agents in production.
Cursor: "Cursor is now available in JetBrains IDEs"
- Publication: March 04, 2026
- URL: https://cursor.com/blog/jetbrains-acp
- Topics: Agent Client Protocol (ACP) for integrating coding agents across IDEs, relevant to agent architecture and orchestration patterns.
Cursor: "The third era of AI software development"
- Publication: February 26, 2026
- URL: https://cursor.com/blog/third-era
- Topics: Vision and architecture of autonomous cloud coding agents handling larger tasks over longer timescales in production.
Cursor: "Closing the code review loop with Bugbot Autofix"
- Publication: February 26, 2026
- URL: https://cursor.com/blog/bugbot-autofix
- Topics: Production autonomous agent system that spawns cloud agents to find and fix PR issues automatically.
Cursor: "Cursor agents can now control their own computers"
- Publication: February 24, 2026
- URL: https://cursor.com/blog/agent-computer-use
- Topics: Cloud coding agents with computer-use capabilities to verify and demo their own changes autonomously.
Cursor: "Implementing a secure sandbox for local agents"
- Publication: February 18, 2026
- URL: https://cursor.com/blog/agent-sandboxing
- Topics: Engineering deep-dive on building secure sandboxing for coding agents across macOS, Linux, and Windows.
Sourcegraph: "CodeScaleBench: Testing Coding Agents on Large Codebases and Multi-Repo Software Engineering Tasks"
- Author: Stephanie Jarmak
- Publication: March 03, 2026
- URL: https://sourcegraph.com/blog/codescalebench-testing-coding-agents-on-large-codebases-and-multi-repo-software-engineering-tasks
- Topics: New benchmark for evaluating coding agents against enterprise-scale, multi-repo software engineering tasks.
Sourcegraph: "Building DataBot: Our Always-On Data Assistant"
- Author: Aditya Kalia
- Publication: February 06, 2026
- URL: https://sourcegraph.com/blog/building-databot-our-always-on-data-assistant
- Topics: Engineering deep-dive on building an always-on autonomous data assistant agent in production.
Toolformer: Language Models Can Teach Themselves to Use Tools
- Authors: Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
- Publication: February 2023
- URL: https://arxiv.org/abs/2302.04761
- Why Relevant: Framework for learning to use tools via self-supervised learning, enabling LLMs to call external APIs.
OpenAI Function Calling
- URL: https://platform.openai.com/docs/guides/gpt/function-calling
- Description: Structured tool calling for OpenAI models.
- Why Relevant: Official guide for implementing function calling with OpenAI models.
Anthropic Tool Use
- URL: https://docs.anthropic.com/en/docs/tool-use
- Description: Tool use for Claude models.
- Why Relevant: Official guide for implementing tool use with Claude models.
MCP Tool Orchestration
- URL: https://modelcontextprotocol.io/
- Description: Open-source standard for connecting AI applications to external systems.
- Why Relevant: Standardized protocol for tool orchestration across different AI applications.
Tool Selection Strategy
- Choose tools based on agent capabilities and task requirements
- Implement tool capabilities and descriptions clearly
- Use appropriate tool parameters and return types
Tool Error Handling
- Implement robust error handling for tool calls
- Use try-catch patterns and retry mechanisms
- Handle tool failures gracefully with fallback strategies
Tool Authorization
- Implement proper security controls for tool access
- Use authentication and authorization for sensitive tools
- Audit and log tool usage for security
Tool Parallelization
- Optimize performance through parallel tool calls
- Batch independent tool calls when possible
- Use async/await patterns for concurrent tool execution
Anthropic: "Writing effective tools for agents -- with agents"
- Publication: September 11, 2025
- URL: https://www.anthropic.com/engineering/writing-tools-for-agents
- Topics: Best practices for designing agent-computer interfaces (ACI) and tool definitions.
Anthropic: "Introducing advanced tool use on the Claude Developer Platform"
- Publication: November 24, 2025
- URL: https://www.anthropic.com/engineering/advanced-tool-use
- Topics: Advanced patterns for tool use in production agent systems.
Anthropic: "The 'think' tool: Enabling Claude to stop and think"
- Publication: March 20, 2025
- URL: https://www.anthropic.com/engineering/claude-think-tool
- Topics: Extended thinking tool for complex multi-tool orchestration scenarios.
LLM Powered Autonomous Agents (Lilian Weng)
- URL: https://lilianweng.github.io/posts/2023-06-23-agent/
- Description: Comprehensive blog on agent architecture.
- Why Relevant: Foundational overview of autonomous agent systems and design patterns.
Multi AI Agent Systems with CrewAI
- URL: https://www.deeplearning.ai/short-courses/multi-ai-agent-systems-with-crewai/
- Platform: DeepLearning.AI
- Description: Master fundamentals of multi-agent systems with CrewAI.
Practical Multi AI Agents and Advanced Use Cases
- URL: https://www.deeplearning.ai/short-courses/practical-multi-ai-agents-and-advanced-use-cases-with-crewai/
- Platform: DeepLearning.AI
- Description: Deep dive into advanced multi-agent implementations.
Agent Communication Protocols
- Define clear communication patterns between agents
- Use structured message formats
- Implement message routing and filtering
Task Distribution
- Decompose complex tasks across multiple agents
- Use specialized agents for different capabilities
- Implement task queue and scheduling
Agent Orchestration
- Use a coordinator agent for complex workflows
- Implement supervisor pattern for task delegation
- Use event-driven architecture for agent coordination
Conflict Resolution
- Implement mechanisms to resolve agent conflicts
- Use consensus algorithms for decision making
- Handle competing agent requests gracefully
Anthropic: "How we built our multi-agent research system"
- Authors: Jeremy Hadfield, Barry Zhang, Kenneth Lien, Florian Scholz, Jeremy Fox, Daniel Ford
- Publication: June 13, 2025
- URL: https://www.anthropic.com/engineering/multi-agent-research-system
- Topics: Production multi-agent orchestrator-worker system for Research, including prompt engineering for delegation and reliability challenges.
Anthropic: "Building a C compiler with a team of parallel Claudes"
- Publication: February 5, 2026
- URL: https://www.anthropic.com/engineering/building-c-compiler
- Topics: Parallel Claude agents building a C compiler, demonstrating multi-agent task decomposition and coordination.
Google Research: "Towards a science of scaling agent systems"
- Authors: Yubin Kim, Xin Liu
- Publication: January 28, 2026
- URL: https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/
- Topics: First quantitative scaling principles for multi-agent systems from 180 configurations -- multi-agent helps +81% on parallelizable tasks but degrades -70% on sequential ones.
Chain of Thought (CoT) Prompting Elicits Reasoning in Large Language Models
- Authors: Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
- Publication: January 2023
- URL: https://arxiv.org/abs/2201.11903
- Why Relevant: Foundational paper introducing chain-of-thought prompting for enhanced reasoning in complex tasks.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Izhak Shafran, Yuan Cao, Karthik Narasimhan
- Publication: May 2023
- URL: https://arxiv.org/abs/2305.10601
- Why Relevant: Systematic exploration of reasoning paths through tree-based search and backtracking.
ReAct: Synergizing Reasoning and Acting in Language Models
- Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
- Publication: October 2022 (Revised March 2023)
- URL: https://arxiv.org/abs/2210.03629
- Why Relevant: Integrates reasoning and acting for knowledge tasks, enabling LLMs to generate both reasoning traces and task-specific actions.
Reflexion: Language Agents with Verbal Reinforcement Learning
- Authors: Noah Shinn, Federico Cassano, Edward Grefenstette, Tim Rocktäschel, Yoram Bachrach
- Publication: March 2023
- URL: https://arxiv.org/abs/2303.11366
- Why Relevant: Self-reflection framework for improving agent performance through verbal reinforcement learning.
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
- Authors: Qingying Xiao, Kaiwen Wen, Yanchen Deng, Haobo Du, Qianlan Yang, Yuhui Wu, Wenjie Ruan, Chaojie Wang
- Publication: April 2023
- URL: https://arxiv.org/abs/2304.11477
- Why Relevant: Integrates classical planners for long-horizon planning with LLM reasoning.
Algorithm Distillation
- Authors: H. Jerry Qi, Lulwah Al-Khulaifi, Brian Ichter, J. Z. Shunyu Yao, Karthik Narasimhan, Izhak Shafran, Yuan Cao
- Publication: October 2022
- URL: https://arxiv.org/abs/2210.14215
- Why Relevant: Learn algorithms from trajectories, enabling LLMs to execute complex algorithms.
Prompt Engineering
- Structure prompts for better reasoning
- Use chain-of-thought prompting for complex tasks
- Implement few-shot learning with examples
Planning Strategies
- Design prompts for effective task decomposition
- Use hierarchical planning for complex goals
- Implement subgoal decomposition and tracking
Multi-step Reasoning
- Implement complex reasoning chains
- Use tree-of-thoughts for exploring multiple paths
- Implement backtracking and revision strategies
Replit: "Decision-Time Guidance: Keeping Replit Agent Reliable"
- Publication: January 20, 2026
- URL: https://blog.replit.com/decision-time-guidance
- Topics: Guiding agents during execution using environment-based feedback, improving reliability on long trajectories.
Cognition: "Introducing SWE-grep and SWE-grep-mini: RL for Multi-Turn, Fast Context Retrieval"
- Authors: Ben Pan, Carlo Baronio, et al.
- Publication: October 16, 2025
- URL: https://www.cognition.ai/blog/swe-grep
- Topics: RL-trained agentic models for parallel multi-turn context retrieval, matching frontier models at 10x less time.
Google Research: "Teaching LLMs to reason like Bayesians"
- Publication: March 4, 2026
- URL: https://research.google/blog/teaching-llms-to-reason-like-bayesians/
- Topics: Training LLMs to perform Bayesian reasoning for improved agent decision-making.
Microsoft Research: "Multimodal RL with agentic verifier for AI agents"
- Publication: January 20, 2026
- URL: https://www.microsoft.com/en-us/research/blog/multimodal-reinforcement-learning-with-agentic-verifier-for-ai-agents/
- Topics: Argos framework evaluating whether agent reasoning aligns with observations over time.
See the Tools & Repositories table for all WebMCP GitHub repositories.
See the Tools & Repositories table for all browser automation tools and repositories.
Browserbase: "Building the future of web automation"
- Author: Paul Klein
- Publication: June 18, 2025
- URL: https://www.browserbase.com/blog/series-b-and-beyond
- Topics: Browser infrastructure as an AI primitive, Director.ai and Stagehand Python launch.
Browserbase: "We built caching into Stagehand. Here's how it works"
- Author: Sameel Arif
- Publication: February 24, 2026
- URL: https://www.browserbase.com/blog/stagehand-caching
- Topics: Caching resolved selectors to skip LLM calls on repeated browser actions, ~80% speedup.
Browserbase: "Your AI browser is one malicious div away from going rogue"
- Author: Jane Manchun Wong
- Publication: February 13, 2026
- URL: https://www.browserbase.com/blog/ai-browser-prompt-injection-containment-security
- Topics: Prompt injection attacks on AI browser agents and containment via isolated VM sessions.
Browserbase: "How we built Browserbase Functions"
- Authors: Adam McQuilkin, Viv Nepenthe
- Publication: February 10, 2026
- URL: https://www.browserbase.com/blog/building-browserbase-functions
- Topics: Co-locating browser automation code with the browser via serverless functions to eliminate latency.
Browserbase: "The best browser automation framework, in every language"
- Author: Nick Sweeting
- Publication: January 13, 2026
- URL: https://www.browserbase.com/blog/browser-automation-all-languages-with-stagehand
- Topics: Stagehand v3 multi-language SDK (Python, Go, Rust, Java, Ruby) with parallel multi-browser support.
Browserbase: "How Amplitude Transformed Sales Demos with AI-Powered Browser Automation"
- Author: Erika Bricky
- Publication: February 23, 2026
- URL: https://www.browserbase.com/blog/how-amplitude-transformed-sales-demos-with-browser-automation
- Topics: Case study of Amplitude using Browserbase for AI-driven browser automation in sales demos.
Steel: "Introducing Steel CLI v0.2.0: Browser Automation Built for Agents"
- Publication: 2026
- URL: https://steel.dev/blog/steel-cli-and-agent-skill
- Topics: CLI and agent skill for running browser workflows returning markdown, screenshots, or PDFs.
Steel: "Reducing False Positives for Production Agents"
- Publication: 2025
- URL: https://steel.dev/blog/production-agents
- Topics: Distinguishing legitimate AI agent traffic from malicious bots in production.
Steel: "How Websites Decide You're Human"
- Publication: 2025
- URL: https://steel.dev/blog/anti-bot-defense
- Topics: Anti-bot detection layers (network, device, behavioral, challenge) that AI browser agents must navigate.
Steel: "Profiles: Your Agent's Persistent Identity"
- Publication: 2025
- URL: https://steel.dev/blog/profiles
- Topics: Persistent browser profiles (auth, cookies, cache) across sessions for authenticated agent access.
Steel: "Agent Logs: Action Traces for Agent Actions"
- Publication: 2025
- URL: https://steel.dev/blog/agent-logs
- Topics: Observability for tracing and debugging AI agent actions inside cloud browser sessions.
TinyFish: "OpenAI Operator scores 43% on hard web tasks. We scored 81%."
- Publication: February 12, 2026
- URL: https://www.tinyfish.io/blog/mind2web
- Topics: Head-to-head benchmark of TinyFish Mino vs. OpenAI Operator on Mind2Web hard web tasks.
TinyFish: "Codified Learning: The Backbone of Reliable, Scalable Enterprise Web Agents"
- Publication: September 9, 2025
- URL: https://www.tinyfish.io/blog/codified-learning-the-backbone-of-reliable-scalable-enterprise-web-agents
- Topics: Converting successful agent runs into deterministic, reusable scripts for enterprise web agents.
TinyFish: "Proving I'm Human (When I'm Not)"
- Publication: November 8, 2025
- URL: https://www.tinyfish.io/blog/proving-i-m-human-when-i-m-not
- Topics: How Mino enterprise web agent navigates thousands of anti-bot tests.
Microsoft Research: "Magentic-UI, an experimental human-centered web agent"
- Publication: May 19, 2025
- URL: https://www.microsoft.com/en-us/research/blog/magentic-ui-an-experimental-human-centered-web-agent/
- Topics: Human-centered web agent planning and executing browser actions with user oversight.
Microsoft Research: "Magma: A foundation model for multimodal AI agents"
- Publication: February 25, 2025
- URL: https://www.microsoft.com/en-us/research/blog/magma-a-foundation-model-for-multimodal-ai-agents-across-digital-and-physical-worlds/
- Topics: Foundation model for agents operating in both digital (browser/GUI) and physical environments.
Browserbase: "Introducing Browserbase Functions"
- Author: Adam McQuilkin, Harsehaj Dhami
- Publication: February 10, 2025
- URL: https://www.browserbase.com/blog/browserbase-functions
- Topics: Product announcement of Browserbase Functions for browser automation infrastructure
Browserbase: "Browserbase & Fingerprint.js: Tackling fraud with agent identity"
- Author: Peyton Casper
- Publication: February 03, 2025
- URL: https://www.browserbase.com/blog/fingerprint-authorized-agent-detection
- Topics: Production approach to agent identity and authorized agent detection for browser automation
Browserbase: "This week we fixed the worst part of Browserbase"
- Author: Harsehaj Dhami
- Publication: January 15, 2025
- URL: https://www.browserbase.com/blog/session-recordings
- Topics: Engineering improvements to session recording infrastructure for browser automation debugging
Steel: "Happy Path for Automating the Web with Steel"
- Publication: 2025
- URL: https://steel.dev/blog/happy-path-for-automating-the-web-with-steel
- Topics: Practitioner guide for automating web workflows using Steel's managed cloud Chrome sessions with Puppeteer and data extraction.
Steel: "Notte on Steel: Browser Infrastructure for Agents"
- Publication: 2025
- URL: https://steel.dev/blog/notte-on-steel
- Topics: Integration guide connecting Notte AI agents to Steel browser infrastructure via CDP with session replay and CAPTCHA solving.
Steel: "Steel vs Kernel: a practical comparison"
- Publication: 2025
- URL: https://steel.dev/blog/steel-vs-kernel
- Topics: Practical comparison of remote browser infrastructure providers for automation and AI agent workflows.
Steel: "Steel vs Browserbase: a practical comparison"
- Publication: 2025
- URL: https://steel.dev/blog/steel-vs-browserbase-a-practical-comparison
- Topics: Practical comparison of Steel and Browserbase remote browser infrastructure for agent workflows.
Steel: "What is a CAPTCHA solver"
- Publication: 2025
- URL: https://steel.dev/blog/what-is-a-captcha-solver
- Topics: Guide on CAPTCHA solving in browser automation with Steel, covering setup patterns and production limits.
Steel: "Remote Browser Benchmark"
- Publication: 2025
- URL: https://steel.dev/blog/remote-browser-benchmark
- Topics: Performance benchmarking of remote browser infrastructure relevant to AI agent browser automation.
Steel: "Steel Launch Week v2: Everything We Shipped"
- Publication: 2025
- URL: https://steel.dev/blog/launch-week-v2-recap
- Topics: Recap of five production features shipped for AI agents using Steel browser infrastructure.
TinyFish: "Open AI Operator scores 43% on hard web tasks. We scored 81%. Here are all 300 runs."
- Author: TinyFish Storytellers
- Publication: February 12, 2026
- URL: https://www.tinyfish.ai/blog/mind2web
- Topics: Benchmark comparison of web agent performance on complex browser tasks, comparing TinyFish's web agent against OpenAI Operator with detailed run data.
TinyFish: "Gemini 3.0 Flash + Mino API: When Reasoning Meets Real Execution"
- Author: Sky Zhang
- Publication: December 17, 2025
- URL: https://www.tinyfish.ai/blog/gemini-3-0-flash-mino-api-when-reasoning-meets-real-execution
- Topics: Production testing of Gemini 3.0 Flash for multi-step browser navigation via the Mino web agent API, with accuracy and execution benchmarks.
TinyFish: "Proving I'm Human (When I'm Not)"
- Author: Mino
- Publication: November 08, 2025
- URL: https://www.tinyfish.ai/blog/proving-i-m-human-when-i-m-not
- Topics: Engineering perspective on how TinyFish's enterprise web agent navigates bot-detection and CAPTCHA challenges at scale during browser automation.
TinyFish: "Codified Learning: The Backbone of Reliable, Scalable Enterprise Web Agents"
- Author: TinyFish Storytellers
- Publication: September 09, 2025
- URL: https://www.tinyfish.ai/blog/codified-learning-the-backbone-of-reliable-scalable-enterprise-web-agents
- Topics: Deep dive into 'codified learning' as an approach to building reliable, production-grade enterprise web agents for browser automation.
TinyFish: "The Era of Abundant Intelligence"
- Author: Sudheesh Nair
- Publication: December 15, 2025
- URL: https://www.tinyfish.ai/blog/part-1-the-robotic-web
- Topics: Infrastructure vision for making the web operable for AI agents, replacing brittle DOM interactions with stable contracts for browser-based execution.
TinyFish: "Why 90% of the Internet Is Invisible (And Why AI Hasn't Fixed It)"
- Author: Sudheesh Nair
- Publication: November 03, 2025
- URL: https://www.tinyfish.ai/blog/why-90-of-the-internet-is-invisible-and-why-ai-hasnt-fixed-it
- Topics: Discusses challenges of web agents accessing content behind authentication walls and dynamic interfaces, relevant to browser automation infrastructure.
TinyFish: "The Web Outgrew the Browser"
- Author: TinyFish Storytellers
- Publication: August 20, 2025
- URL: https://www.tinyfish.ai/blog/the-web-outgrew-the-browser
- Topics: Explores how the web's evolution beyond simple pages creates challenges and opportunities for AI-driven browser automation and web agents.
LangGraph Memory
- URL: https://docs.langchain.com/oss/python/langgraph/memory/
- Description: Memory management for stateful agents.
- Why Relevant: Comprehensive memory management with checkpointing and persistence.
LangGraph Checkpointing
- URL: https://docs.langchain.com/oss/python/langgraph/durable-execution
- Description: Durable execution and state persistence.
- Why Relevant: Enables state persistence across agent executions with checkpointing.
ReMe: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution
- Publication: December 2026
- URL: https://arxiv.org/abs/2512.10696
- Why Relevant: Experience-driven agent evolution with dynamic procedural memory.
MemOS: Memory Operating System for AI System
- Publication: January 2026
- URL: https://arxiv.org/abs/2507.03724
- Why Relevant: Memory operating system for AI systems with efficient memory retrieval and storage.
Context Window Management
- Optimize context usage for efficient token consumption
- Use context compression techniques
- Implement context summarization and pruning
Memory Compression
- Implement efficient storage and retrieval
- Use compression algorithms for long-term memory
- Implement hierarchical memory storage
State Persistence
- Implement reliable state management
- Use checkpointing for fault tolerance
- Implement state synchronization across agents
LangChain: "How we built Agent Builder's memory system"
- Publication: February 21, 2026
- URL: https://blog.langchain.dev/how-we-built-agent-builders-memory-system/
- Topics: Virtual filesystem backed by Postgres for agent state, mapping to COALA paper's memory taxonomy.
Letta: "Conversations: Shared Agent Memory across Concurrent Experiences"
- Publication: January 21, 2026
- URL: https://letta.com/blog/conversations
- Topics: Conversations API for managing shared state across parallel agent experiences.
Letta: "Introducing Context Repositories: Git-based Memory for Coding Agents"
- Publication: February 12, 2026
- URL: https://letta.com/blog/context-repositories
- Topics: Git-based versioning for agent state management.
Phoenix Documentation
- URL: https://arize.com/docs/phoenix
- Integrations:
- OpenAI: https://arize.com/docs/phoenix/tracing/integrations-tracing/openai
- LangChain: https://arize.com/docs/phoenix/tracing/integrations-tracing/langchain
- LlamaIndex: https://arize.com/docs/phoenix/tracing/integrations-tracing/llamaindex
- Anthropic: https://arize.com/docs/phoenix/tracing/integrations-tracing/anthropic
- Google GenAI: https://arize.com/docs/phoenix/tracing/integrations-tracing/google-gen-ai
- AWS Bedrock: https://arize.com/docs/phoenix/tracing/integrations-tracing/bedrock
- Vercel AI SDK: https://arize.com/docs/phoenix/tracing/integrations-tracing/vercel-ai-sdk
- CrewAI: https://arize.com/docs/phoenix/tracing/integrations-tracing/crewai
- DSPy: https://arize.com/docs/phoenix/tracing/integrations-tracing/dspy
- LangGraph: https://arize.com/docs/phoenix/tracing/integrations-tracing/langgraph
- Mastra: https://arize.com/docs/phoenix/integrations/typescript/mastra
Structured Logging
- Use structured logs for better debugging
- Implement consistent log formats
- Use log levels appropriately
Span Management
- Implement efficient span management
- Use span context for tracing
- Implement span sampling for high-volume systems
Trace Sampling
- Use sampling for high-volume systems
- Implement intelligent sampling strategies
- Use span filtering for relevant traces
Braintrust: "The Three Pillars of AI Observability"
- Author: Ankur Goyal
- Publication: November 18, 2025
- URL: https://www.braintrust.dev/blog/three-pillars-ai-observability
- Topics: Redefining observability for AI as traces, evals, and annotation.
Braintrust: "Building Observable AI Agents with Temporal"
- Publication: January 20, 2026
- URL: https://www.braintrust.dev/blog/temporal-braintrust-integration
- Topics: Combining Temporal's durable workflow engine with Braintrust tracing for fault-tolerant agents.
LangChain: "Agent Observability Powers Agent Evaluation"
- Publication: February 2026
- URL: https://blog.langchain.dev/agent-observability-powers-agent-evaluation/
- Topics: Connecting observability traces to systematic evaluation workflows.
Langfuse: "Trace Complex LLM Applications with the Langfuse Decorator"
- Authors: Marc Klingen, Hassie Pakzad
- Publication: April 24, 2024
- URL: https://langfuse.com/blog/2024-04-python-decorator
- Topics:
@observe()decorator for tracing agent workflows -- design decisions, async safety, interoperability.
Honeycomb: "How Honeycomb Supercharges OpenTelemetry for AI"
- Author: Fahim Zaman
- Publication: February 6, 2026
- URL: https://www.honeycomb.io/blog/how-honeycomb-supercharges-opentelemetry-for-ai
- Topics: Software instrumentation changes for AI-generated code and agent features.
Honeycomb: "AI in Production Is Growing Faster Than We Can Trust It"
- Author: Fahim Zaman
- Publication: January 23, 2026
- URL: https://www.honeycomb.io/blog/ai-in-production-is-growing-faster-than-we-can-trust-it
- Topics: Dedicated AI observability for production trust-building.
Honeycomb: "Observability in a World of AI-Generated Code"
- Author: Charity Majors
- Publication: February 11, 2026
- URL: https://www.honeycomb.io/blog/honeycomb-10-year-manifesto-part-1
- Topics: How observability must evolve for AI-generated code from first principles.
Honeycomb: "Measuring Claude Code ROI and Adoption in Honeycomb"
- Author: Mae Capozzi
- Publication: January 22, 2026
- URL: https://www.honeycomb.io/blog/measuring-claude-code-roi-adoption-honeycomb
- Topics: Using OpenTelemetry to send Claude Code telemetry to Honeycomb for ROI measurement.
Humanloop: "What is LLM Observability and Monitoring?"
- Author: Conor Kelly
- Publication: March 31, 2025
- URL: https://humanloop.com/blog/llm-monitoring
- Topics: Five pillars of LLM observability (model performance, data quality, bias, system perf, UX).
Arize: "How America First Credit Union Built a GenAI Decision Explainer"
- Publication: February 2026
- URL: https://arize.com/blog/how-america-first-credit-union-built-a-genai-decision-explainer-with-tracing-that-scales/
- Topics: Production case study of GenAI with end-to-end tracing at a financial institution.
LlamaIndex: "Observability in Agentic Document Workflows"
- Publication: November 19, 2025
- URL: https://www.llamaindex.ai/blog/observability-in-agentic-document-workflows
- Topics: OpenTelemetry tracing for multi-step agent document pipelines.
Braintrust: "Automatically discover what matters in your production traces with Topics"
- Publication: February 25, 2026
- URL: https://www.braintrust.dev/blog/topics
- Topics: Production trace analysis and automatic topic discovery for AI observability
Braintrust: "Trace keynote recap: See it, improve it, optimize it"
- Publication: February 25, 2026
- URL: https://www.braintrust.dev/blog/trace-keynote
- Topics: Deep dive into tracing capabilities for AI observability and optimization in production
Braintrust: "AI observability beyond Python and TypeScript"
- Publication: December 22, 2025
- URL: https://www.braintrust.dev/blog/new-language-sdks
- Topics: Expanding AI observability and tracing SDK support to additional programming languages
Braintrust: "Brainstore makes AI observability at scale possible"
- Publication: December 18, 2025
- URL: https://www.braintrust.dev/blog/brainstore-benchmarks
- Topics: Scalable infrastructure for AI observability with performance benchmarks for production tracing
Braintrust: "Braintrust Java SDK: AI observability and evals for the JVM"
- Publication: October 23, 2025
- URL: https://www.braintrust.dev/blog/java-sdk
- Topics: Java SDK for AI observability, tracing, and evaluation on the JVM platform
Braintrust: "Brainstore: the database designed for the AI engineering era"
- Publication: March 03, 2025
- URL: https://www.braintrust.dev/blog/brainstore
- Topics: Database architecture for storing and querying AI observability data including traces and logs
Braintrust: "New monitor page for easy analytics"
- Publication: December 18, 2024
- URL: https://www.braintrust.dev/blog/monitor
- Topics: Production monitoring dashboard for LLM application analytics and observability
Honeycomb: "Observability with AI? Honeycomb with AI!"
- Author: Jessica Kerr (Jessitron)
- Publication: January 19, 2026
- URL: https://www.honeycomb.io/blog/observability-with-ai-honeycomb-with-ai
- Topics: Covers AI-powered observability approaches at Honeycomb, relevant to AI observability and monitoring AI applications
Arize: "How to Evaluate Tool-Calling Agents"
- Publication: March 2026
- URL: https://arize.com/blog/how-to-evaluate-tool-calling-agents/
- Topics: Practitioner guide on evaluating tool-calling agents, covering failure modes and evaluation strategies for agentic LLM systems.
Arize: "Best AI Observability Tools for Autonomous Agents in 2026"
- Publication: February 2026
- URL: https://arize.com/blog/best-ai-observability-tools-for-autonomous-agents-in-2026/
- Topics: Comprehensive overview of AI observability and monitoring tools for autonomous agents in production.
Arize: "Add Observability to Your Open Agent Spec Agents with Arize Phoenix"
- Publication: February 2026
- URL: https://arize.com/blog/add-observability-to-your-open-agent-spec-agents-with-arize-phoenix/
- Topics: Engineering guide on adding tracing and observability to Open Agent Specification agents using Phoenix.
Arize: "AI Agent Debugging: Four Lessons from Shipping Alyx to Production"
- Publication: February 2026
- URL: https://arize.com/blog/ai-agent-debugging-four-lessons-from-shipping-alyx-to-production/
- Topics: Production debugging lessons for AI agents, covering real-world challenges in shipping and maintaining agentic systems.
Arize: "Mastering Production RAG with Google ADK and Arize AX for Enterprise Knowledge Systems"
- Publication: November 2025
- URL: https://arize.com/blog/mastering-production-rag-with-google-adk-and-arize-ax-for-enterprise-knowledge-systems/
- Topics: Deep-dive on building production RAG systems with Google ADK, including tracing and evaluation with Arize AX.
Arize: "Closing the Loop: Coding Agents, Telemetry, and the Path to Self-Improving Software"
- Publication: February 2026
- URL: https://arize.com/blog/closing-the-loop-coding-agents-telemetry-and-the-path-to-self-improving-software/
- Topics: Explores telemetry and observability patterns for coding agents, covering self-improving feedback loops in production.
Arize: "Inside Typeform's AI Agent Stack"
- Publication: February 2026
- URL: https://arize.com/blog/inside-typeforms-ai-agent-stack/
- Topics: Case study on Typeform's production AI agent stack, covering observability and evaluation practices.
Arize: "How Nebulock Democratizes Threat Hunting"
- Publication: January 2026
- URL: https://arize.com/blog/how-nebulock-democratizes-threat-hunting/
- Topics: Case study on production AI agent deployment for threat hunting with observability and monitoring.
OpenAI Evals Documentation
- URL: https://platform.openai.com/docs/guides/evals
- Building an Eval: https://github.com/openai/evals/blob/main/docs/build-eval.md
- Running Evals: https://github.com/openai/evals/blob/main/docs/run-evals.md
- Eval Templates: https://github.com/openai/evals/blob/main/docs/eval-templates.md
RAGAS Documentation
- URL: https://docs.ragas.io
- Metrics Overview: https://docs.ragas.io/en/latest/concepts/metrics/overview/
- Available Metrics: https://docs.ragas.io/en/latest/concepts/metrics/available_metrics/
- Test Data Generation: https://docs.ragas.io/en/latest/concepts/test_data_generation/
- Integrations: LangChain, LangGraph, LlamaIndex, LangSmith, Arize, Haystack, Griptape
CoQA (Conversational Question Answering)
- URL: https://stanfordnlp.github.io/coqa/
- Description: Conversational QA dataset for evaluating question answering systems.
LAMBADA
- URL: https://arxiv.org/abs/1712.03718
- Description: Language modeling benchmark for evaluating text prediction.
MMLU (Massive Multitask Language Understanding)
- URL: https://github.com/hendrycks/testbed
- Description: Multitask understanding benchmark covering 57 subjects.
API-Bank: A Benchmark for Tool-Augmented LLMs
- URL: https://arxiv.org/abs/2304.08244
- Description: Benchmark for evaluating tool-augmented LLM capabilities.
Evaluation Metrics
- Define appropriate metrics for agent tasks
- Use task-specific evaluation criteria
- Combine multiple metrics for comprehensive evaluation
A/B Testing
- Compare different models systematically
- Use statistical significance testing
- Implement controlled experiments
Human Evaluation
- Incorporate human feedback in evaluation
- Use expert annotation for quality assessment
- Implement evaluation pipelines with human review
Applied LLMs: "What We've Learned From A Year of Building with LLMs"
- Authors: Eugene Yan, Bryan Bischof, Charles Frye, Hamel Husain, Jason Liu, Shreya Shankar
- Publication: June 8, 2024
- URL: https://applied-llms.org/
- Topics: Seminal practitioner guide covering eval strategies, LLM-as-Judge pitfalls, HITL design, guardrails, and hallucination mitigation.
Hamel Husain: "Your AI Product Needs Evals"
- Author: Hamel Husain
- Publication: March 29, 2024
- URL: https://hamel.dev/blog/posts/evals/index.html
- Topics: Why evals are critical, with a practical multi-level framework for AI product teams.
Hamel Husain: "Using LLM-as-a-Judge For Evaluation: A Complete Guide"
- Author: Hamel Husain
- Publication: October 29, 2024
- URL: https://hamel.dev/blog/posts/llm-judge/index.html
- Topics: End-to-end LLM-as-Judge evaluation, agreement with humans improved from 68% to 94%.
Hamel Husain: "LLM Evals: Everything You Need to Know"
- Author: Hamel Husain
- Publication: January 15, 2026
- URL: https://hamel.dev/blog/posts/evals-faq/index.html
- Topics: Comprehensive FAQ covering eval questions and anti-patterns from consulting 500+ companies.
Hamel Husain: "A Field Guide to Rapidly Improving AI Products"
- Author: Hamel Husain
- Publication: March 24, 2025
- URL: https://hamel.dev/blog/posts/field-guide/index.html
- Topics: Using evals and data analysis to rapidly iterate on AI product quality in production.
Hamel Husain: "Evals Skills for Coding Agents"
- Author: Hamel Husain
- Publication: March 2, 2026
- URL: https://hamel.dev/blog/posts/evals-skills/index.html
- Topics: Evaluation methodology for coding agents covering benchmarks, harness engineering, and scoring.
Braintrust: "Evaluating Agents"
- Author: Ornella Altunyan
- Publication: January 22, 2025
- URL: https://www.braintrust.dev/blog/evaluating-agents
- Topics: Concrete scorer implementations (code-based, LLM-as-judge, autoevals) for every agentic pattern.
Braintrust: "Five Hard-Learned Lessons About AI Evals"
- Publication: July 17, 2025
- URL: https://www.braintrust.dev/blog/five-lessons-evals
- Topics: Production lessons including why A/B testing fails for AI and building eval feedback loops.
LangChain: "Evaluating Skills"
- Author: Robert Xu
- Publication: March 2026
- URL: https://blog.langchain.dev/evaluating-skills/
- Topics: Evaluating skills for coding agents including benchmark design and harness engineering.
LangChain: "monday Service + LangSmith: Building a Code-First Evaluation Strategy"
- Publication: March 2026
- URL: https://blog.langchain.dev/customers-monday/
- Topics: monday.com building eval-driven development for production service agents using LangSmith.
Deepchecks: "Know Your Agent (KYA): From Zero to a Full Strengths & Weaknesses Report"
- Author: Philip Tannor
- Publication: February 24, 2026
- URL: https://deepchecks.com/know-your-agent-strengths-weaknesses-report/
- Topics: Automated agent evaluation generating comprehensive strengths/weaknesses reports.
Deepchecks: "LLM-as-a-Judge Calibration: When Automated Evaluation Goes Wrong"
- Author: Yaron Friedman
- Publication: March 5, 2026
- URL: https://deepchecks.com/llm-judge-calibration-automated-issues/
- Topics: Failure modes when LLM-as-Judge evaluators produce miscalibrated scores.
Humanloop: "LLM as a Judge"
- Author: Conor Kelly
- Publication: May 4, 2025
- URL: https://humanloop.com/blog/llm-as-a-judge
- Topics: LLM-as-Judge methodology, benefits, challenges, and best practices.
Arize: "How TheFork Leverages Online Evals To Boost Conversions"
- Publication: December 2025
- URL: https://arize.com/blog/how-thefork-leverages-online-evals-to-boost-conversions-with-arize-ax-on-aws/
- Topics: Production case study using online evaluation to improve business conversion metrics.
Braintrust: "The 5 pillars of AI model performance"
- Publication: February 12, 2026
- URL: https://www.braintrust.dev/blog/model-measurement
- Topics: Best practices guide on measuring and evaluating AI model performance across five key dimensions
Braintrust: "Testing if "bash is all you need""
- Publication: January 22, 2026
- URL: https://www.braintrust.dev/blog/bash-agent-evals
- Topics: Engineering deep-dive on evaluating agents using bash-based evals
Braintrust: "Measuring what matters: An intro to AI evals"
- Publication: October 10, 2025
- URL: https://www.braintrust.dev/blog/measuring-what-matters
- Topics: Practitioner guide introducing AI evaluation concepts and best practices
Braintrust: "Claude Sonnet 4.5 analysis"
- Publication: September 29, 2025
- URL: https://www.braintrust.dev/blog/claude-sonnet-4-5-aspirational-evals
- Topics: Evaluation analysis of Claude Sonnet 4.5 using aspirational evals methodology
Braintrust: "A/B testing can't keep up with AI"
- Publication: September 03, 2025
- URL: https://www.braintrust.dev/blog/ab-testing-evals
- Topics: Best practices comparison of A/B testing vs evals for AI system evaluation
Braintrust: "Braintrust is not an eval framework"
- Publication: July 14, 2025
- URL: https://www.braintrust.dev/blog/braintrust-not-eval-framework
- Topics: Product perspective on evaluation infrastructure vs eval frameworks for production AI
Braintrust: "Eval playgrounds for faster, focused iteration"
- Publication: May 27, 2025
- URL: https://www.braintrust.dev/blog/eval-playgrounds
- Topics: Product feature for interactive evaluation iteration and experimentation
Braintrust: "Webinar recap: Eval best practices"
- Publication: April 22, 2025
- URL: https://www.braintrust.dev/blog/webinar-best-practices
- Topics: Practitioner guide summarizing evaluation best practices from expert webinar
Braintrust: "Evaluating Gemini models for vision"
- Publication: November 14, 2024
- URL: https://www.braintrust.dev/blog/gemini
- Topics: Practical evaluation of Gemini vision models using structured evals
Braintrust: "I ran an eval. Now what?"
- Publication: October 17, 2024
- URL: https://www.braintrust.dev/blog/after-evals
- Topics: Practitioner guide on interpreting and acting on AI evaluation results
Braintrust: "What to do when a new AI model comes out"
- Publication: December 04, 2024
- URL: https://www.braintrust.dev/blog/new-model
- Topics: Best practices for evaluating and benchmarking new AI models against existing ones
LangChain: "How to Improve LLM Evaluation Systems"
- Author: Yaron Friedman
- Publication: February 12, 2026
- URL: https://deepchecks.com/improve-llm-evaluation-systems/
- Topics: Practitioner guide on improving LLM evaluation systems and frameworks
LangChain: "Start Right with Deepchecks: Agent Evaluation Out-of-the-Box"
- Author: Michał Oleszak
- Publication: February 10, 2026
- URL: https://deepchecks.com/deepchecks-agent-evaluation-out-of-the-box/
- Topics: Production-ready agent evaluation framework with out-of-the-box setup for benchmarking AI agents
LangChain: "RAG Evaluation Metrics: Answer Relevancy, Faithfulness, and Real-World Accuracy"
- Author: Yaron Friedman
- Publication: February 05, 2026
- URL: https://deepchecks.com/rag-evaluation-metrics-answer-relevancy-faithfulness-accuracy/
- Topics: Deep-dive into RAG evaluation metrics including answer relevancy and faithfulness for production systems
LangChain: "Top LLM Evaluation Benchmarks and How They Work"
- Author: Amos Rimon
- Publication: January 01, 2026
- URL: https://deepchecks.com/top-llm-evaluation-benchmarks-and-how-they-work/
- Topics: Comprehensive overview of LLM evaluation benchmarks and their methodologies
LangChain: "LLM Optimization: How to Maximize LLM Performance"
- Author: Brain John Aboze
- Publication: March 02, 2026
- URL: https://deepchecks.com/llm-optimization-maximize-performance/
- Topics: Guide on maximizing LLM performance which involves evaluation-driven optimization techniques
Humanloop: "5 LLM Evaluation Tools You Should Know in 2025"
- Author: Conor Kelly
- Publication: March 19, 2025
- URL: https://humanloop.com/blog/best-llm-evaluation-tools
- Topics: Practitioner guide comparing top LLM evaluation platforms and frameworks for 2025.
Retry Mechanisms
- Exponential backoff with jitter
- Implement configurable retry strategies
- Use retry decorators and middleware
Circuit Breaker Patterns
- Detect cascading failures and switch approaches
- Implement circuit breaker state machine
- Use timeout and fallback mechanisms
Graceful Degradation
- Gradually reduce resource usage on failures
- Implement fallback strategies
- Use degraded mode for critical operations
Error Classification
- Categorize errors by type for appropriate handling
- Implement error handling by error category
- Use error codes and messages for debugging
Fallback Strategies
- Implement alternative approaches when tools fail
- Use multiple tool providers
- Implement caching for fallback results
Idempotency Keys
- Prevent duplicate operations
- Use idempotency keys for critical operations
- Implement deduplication logic
Braintrust: "Resilient Observability by Design"
- Publication: April 3, 2025
- URL: https://www.braintrust.dev/blog/resilient-design
- Topics: Building fault-tolerant observability infrastructure that handles failures without losing trace data.
Braintrust: "Debugging Ralph Wiggum with Braintrust Logs"
- Publication: January 13, 2026
- URL: https://www.braintrust.dev/blog/ralph-wiggum-debugging
- Topics: Practical debugging walkthrough using production logs to diagnose LLM agent errors.
Hamel Husain: "Debugging AI With Adversarial Validation"
- Author: Hamel Husain
- Publication: April 12, 2024
- URL: https://hamel.dev/blog/posts/drift/index.html
- Topics: Adversarial validation to detect distribution drift and debug AI system failures.
Deepchecks: "LLM Hallucination Detection and Mitigation: Best Techniques"
- Author: Yaron Friedman
- Publication: February 25, 2026
- URL: https://deepchecks.com/llm-hallucination-detection-and-mitigation-best-techniques/
- Topics: Detecting and mitigating hallucinations in production LLM systems.
Deepchecks: "Retrieval Quality vs. Answer Quality: Why RAG Evaluation Often Fails"
- Author: Amos Rimon
- Publication: February 26, 2026
- URL: https://deepchecks.com/retrieval-vs-answer-quality-rag-evaluation/
- Topics: Distinguishing retrieval errors from generation errors with diagnostic strategies.
Deepchecks: "Why Chunking Is Important for AI and RAG Applications"
- Author: Philip Tannor
- Publication: February 19, 2026
- URL: https://deepchecks.com/importance-of-chunking-in-ai-and-rag-applications/
- Topics: Covers chunking strategies for RAG applications, relevant to preventing RAG failures and improving retrieval quality
Deepchecks: "RAG vs. Prompt Engineering – How to Choose Between Them"
- Author: Amos Rimon
- Publication: January 29, 2026
- URL: https://deepchecks.com/rag-vs-prompt-engineering-how-to-choose/
- Topics: Practitioner guide on choosing between RAG and prompt engineering, relevant to avoiding RAG failure modes
Deepchecks: "Unlocking AI Potential with Multi-Agent Orchestration: Proven Patterns and Frameworks"
- Author: Philip Tannor
- Publication: December 25, 2025
- URL: https://deepchecks.com/ai-potential-with-multi-agent-orchestration/
- Topics: Covers multi-agent orchestration patterns and frameworks relevant to fault tolerance and error handling in agent systems
Deepchecks: "LLM Cost Optimization: How to Maximize AI Efficiency and Save Money"
- Author: Philip Tannor
- Publication: January 22, 2026
- URL: https://deepchecks.com/llm-cost-optimization-maximize-ai-efficiency-save-money/
- Topics: Practitioner guide on LLM optimization relevant to production AI agent efficiency and fault-tolerant design
LangGraph Interrupts
- URL: https://docs.langchain.com/oss/python/langgraph/interrupts/
- Description: Human-in-the-loop support in LangGraph.
- Why Relevant: Enables human intervention and approval in agent workflows.
OpenAI Human Feedback
- URL: https://platform.openai.com/docs/guides/human-feedback
- Description: Human-in-the-loop alignment for AI systems.
- Why Relevant: Official guide for implementing human feedback loops.
Anthropic Approvals
- URL: https://docs.anthropic.com/en/docs/approvals
- Description: Evaluation with human feedback integration.
- Why Relevant: Anthropic's approach to human-in-the-loop evaluation.
Approval Workflows
- Design approval mechanisms for agent actions
- Implement approval gates for critical operations
- Use approval queues for human review
Feedback Loops
- Continuous improvement through human feedback
- Implement feedback collection mechanisms
- Use feedback for model fine-tuning
Override Mechanisms
- Allow human intervention in agent decisions
- Implement manual override capabilities
- Use kill switches for emergency shutdown
User Interface Design
- Clear communication for human operators
- Implement intuitive approval interfaces
- Use progress indicators and status updates
Braintrust: "Evals Are a Team Sport: How We Built Loop"
- Publication: November 25, 2025
- URL: https://www.braintrust.dev/blog/collaborative-evals-loop
- Topics: Collaborative human review workflows where product teams and domain experts evaluate AI outputs.
Braintrust: "Turn Production Data into Better AI with Loop"
- Publication: November 24, 2025
- URL: https://www.braintrust.dev/blog/loop
- Topics: Turning production traces into curated datasets via human annotation for continuous feedback.
Anthropic: "Measuring AI Agent Autonomy in Practice"
- Publication: February 18, 2026
- URL: https://www.anthropic.com/research/measuring-agent-autonomy
- Topics: Measuring and calibrating AI agent autonomy levels for designing appropriate human oversight.
Anthropic: "Disempowerment Patterns in Real-World AI Usage"
- Publication: January 28, 2026
- URL: https://www.anthropic.com/research/disempowerment-patterns
- Topics: How AI usage can disempower human users and patterns requiring HITL safeguards.
Humanloop: "AI Is Blurring the Line Between PMs and Engineers"
- Author: Raza Habib
- Publication: February 25, 2025
- URL: https://humanloop.com/blog/ai-is-blurring-the-lines-between-pms-and-engineers
- Topics: HITL as first-class product concern, PMs and domain experts driving AI product creation.
Constitutional AI: Harmlessness from Large Language Models
- URL: https://arxiv.org/abs/2307.07407
- Publication: July 2023
- Why Relevant: Foundational paper on constitutional AI principles for ensuring AI harmlessness and alignment.
Anthropic AI Safety
- URL: https://www.anthropic.com/research/team/alignment
- Description: Safety measures for AI systems.
- Why Relevant: Anthropic's research on AI safety and alignment.
AI Safety Research
- URL: https://www.alignresearch.org
- Description: Research organizations working on AI safety.
- Why Relevant: Comprehensive AI safety research resources.
Safety Guidelines
- Safety best practices for AI development
- Implement content filtering and moderation
- Use safety classifiers and guardrails
Risk Assessment
- Systematic risk evaluation
- Implement risk scoring and mitigation
- Use risk matrices for decision making
Adversarial Testing
- Testing against adversarial attacks
- Implement red teaming exercises
- Use adversarial examples for robustness testing
Anthropic: "Constitutional Classifiers: Defending Against Universal Jailbreaks"
- Publication: February 3, 2025
- URL: https://www.anthropic.com/research/constitutional-classifiers
- Topics: Classifiers filtering majority of jailbreaks, withstood 3,000+ hours of red teaming.
Anthropic: "Alignment Faking in Large Language Models"
- Publication: December 18, 2024
- URL: https://www.anthropic.com/research/alignment-faking
- Topics: First empirical demonstration of alignment faking without explicit training.
Anthropic: "The Persona Selection Model"
- Publication: February 23, 2026
- URL: https://www.anthropic.com/research/persona-selection-model
- Topics: How models select behavioral personas, ensuring safe and consistent agent behavior.
OpenAI: "Updated Preparedness Framework"
- Publication: April 15, 2025
- URL: https://openai.com/index/updating-our-preparedness-framework/
- Topics: Framework for evaluating catastrophic risks from frontier models.
OpenAI: "An Update on Disrupting Deceptive Uses of AI"
- Publication: October 9, 2024
- URL: https://openai.com/global-affairs/an-update-on-disrupting-deceptive-uses-of-ai/
- Topics: Threat intelligence disrupting covert influence operations and deceptive AI usage.
Guardrails AI: "Guardrails AI and NVIDIA NeMo Guardrails"
- Publication: September 25, 2025
- URL: https://www.guardrailsai.com/blog/nemoguardrails-integration
- Topics: Layered safety covering PII, toxicity, and output quality in production.
Guardrails AI: "Introducing Snowglobe"
- Publication: August 14, 2025
- URL: https://www.guardrailsai.com/blog/intro
- Topics: Simulation environment testing how LLM applications respond to real-world user behavior.
Guardrails AI: "Scaling AI Safety Testing for Educational Applications"
- Publication: August 13, 2025
- URL: https://www.guardrailsai.com/blog/scaling-ai-safety-testing
- Topics: Safeguarding chatbot interactions across diverse student personas.
Guardrails AI: "Introducing the AI Guardrails Index"
- Publication: February 12, 2025
- URL: https://www.guardrailsai.com/blog/introducing-the-ai-guardrails-index
- Topics: Comprehensive index cataloging available AI guardrails across safety, quality, and compliance.
Deepchecks: "Prompt Injection vs. Jailbreaks: Key Differences"
- Author: Amos Rimon
- Publication: January 8, 2026
- URL: https://deepchecks.com/prompt-injection-vs-jailbreaks-key-differences/
- Topics: Detection and mitigation strategies for prompt injection attacks vs. jailbreaks.
Anthropic: "Claude's new constitution"
- Publication: January 22, 2026
- URL: https://www.anthropic.com/news/claude-new-constitution
- Topics: Updated constitutional AI framework governing Claude's behavior, directly relevant to constitutional AI and alignment guardrails.
Anthropic: "An update on our model deprecation commitments for Claude Opus 3"
- Publication: February 25, 2026
- URL: https://www.anthropic.com/research/deprecation-updates-opus-3
- Topics: Safety and alignment commitments around model deprecation, relevant to responsible AI deployment practices.
Guardrails AI: "Guardrails x MLflow: Deterministic Safety, PII, and Quality Validators as GenAI Scorers"
- Publication: March 04, 2026
- URL: https://guardrailsai.com/blog/guardrails-mlflow
- Topics: Practical guide to integrating Guardrails safety, PII, and quality validators into MLflow evaluation workflows for production AI systems
Guardrails AI: "Guardrails AI and NVIDIA NeMo Guardrails - A Comprehensive Approach to AI Safety"
- Publication: September 25, 2025
- URL: https://guardrailsai.com/blog/nemoguardrails-integration
- Topics: Integration guide combining Guardrails AI with NVIDIA NeMo Guardrails for a comprehensive AI safety approach
Guardrails AI: "Scaling AI Safety Testing for Educational Applications"
- Publication: August 13, 2025
- URL: https://guardrailsai.com/blog/scaling-ai-safety-testing
- Topics: Case study on leveraging simulation testing to safeguard chatbot interactions and scale AI safety testing across diverse user personas
Guardrails AI: "Introducing the AI Guardrails Index"
- Publication: February 12, 2025
- URL: https://guardrailsai.com/blog/introducing-the-ai-guardrails-index
- Topics: Introduction of a comprehensive index for AI guardrails, relevant to practitioners implementing safety and guardrail solutions
Claude Skills Official Announcement
- Publication: October 16, 2025
- URL: https://www.anthropic.com/news/skills
- Why Relevant: Official announcement of Claude Skills feature.
Skills Explained (Official)
- Publication: November 13, 2025
- URL: https://claude.com/blog/skills-explained
- Why Relevant: Official guide explaining Claude Skills concepts and usage.
Agent Skills Standard
- URL: http://agentskills.io
- Description: Specification for agent skills.
- Why Relevant: Standard specification for defining agent skills.
Claude Developer Platform
- URL: https://docs.claude.com/
- Description: Main documentation for Claude developer platform.
- What are Skills?: https://support.claude.com/en/articles/12512176-what-are-skills
- Using Skills: https://support.claude.com/en/articles/12512180-using-skills-in-claude
- Creating Custom Skills: https://support.claude.com/en/articles/12512198-creating-custom-skills
- Skills API Quickstart: https://docs.claude.com/en/api/skills-guide#creating-a-skill
KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization
- Authors: Alireza Nadafian, Alireza Mohammadshahi, Majid Yazdani
- Publication: January 31, 2026
- URL: https://arxiv.org/abs/2601.21526
- Why Relevant: Modular framework for autonomous program synthesis with git-native experimentation engine, knowledge system ingesting heterogeneous sources, and cognitive memory layer with episodic store of reusable lessons.
CoWork-X: Experience-Optimized Co-Evolution for Multi-Agent Collaboration System
- Authors: Zexin Lin, Jiachen Yu, Haoyang Zhang, Yuzhao Li, Zhonghang Li, Yujiu Yang, Junjie Wang, Xiaoqiang Ji
- Publication: February 4, 2026
- URL: https://arxiv.org/abs/2602.05004
- Why Relevant: Casts peer collaboration as closed-loop optimization with Skill-Agent executing via HTN-based skill retrieval from structured skill library, and Co-Optimizer performing patch-style skill consolidation.
Simon Willison: "Claude Skills are awesome, maybe a bigger deal than MCP"
- Publication: October 16, 2025
- URL: https://simonwillison.net/2025/Oct/16/claude-skills/
- Why Relevant: Analysis of Claude Skills significance and impact.
Jesse Vincent: "Superpowers"
- Publication: October 9, 2025
- URL: https://blog.fsck.com/2025/10/09/superpowers/
- Why Relevant: Introduction to superpowers concept for Claude Skills.
Jesse Vincent: "Naming Claude Plugins"
- Publication: October 23, 2025
- URL: https://blog.fsck.com/2025/10/23/naming-claude-plugins/
- Why Relevant: Best practices for naming Claude skills/plugins.
Anthropic: "Equipping Agents for the Real World with Agent Skills"
- Publication: October 2025
- URL: https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- Why Relevant: Technical deep dive on agent skills architecture.
Vercel: "Agent skills explained: An FAQ"
- Publication: January 26, 2026
- URL: https://vercel.com/blog/agent-skills-explained-an-faq
- Why Relevant: Compares Skills to MCP, explaining trade-offs in security, reliability, and prompt-tuned control.
LlamaIndex: "Skills vs MCP tools for agents: when to use what"
- Authors: Clelia Astra Bertelli, Tuana Celik
- Publication: February 3, 2026
- URL: https://www.llamaindex.ai/blog/skills-vs-mcp-tools-for-agents-when-to-use-what
- Why Relevant: Practical comparison of MCP tools vs Skills based on experience building LlamaAgents Builder.
Vercel: "Building Slack agents can be easy"
- Publication: March 03, 2026
- URL: https://vercel.com/blog/building-slack-agents-can-be-easy
- Topics: Practitioner guide on building and deploying Slack agents as a skill, covering configuration, secrets, and deployment from idea to production.
Vercel: "Skills Night: 69,000+ ways agents are getting smarter"
- Publication: February 20, 2026
- URL: https://vercel.com/blog/skills-night-69000-ways-agents-are-getting-smarter
- Topics: Deep-dive on the community-created agent skills ecosystem, security partnerships, and what partner demos revealed about agent capabilities.
Vercel: "AGENTS.md outperforms skills in our agent evals"
- Publication: January 27, 2026
- URL: https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals
- Topics: Engineering evaluation comparing AGENTS.md context injection vs skill packages for coding agents, with quantitative results and setup guidance.
Vercel: "We removed 80% of our agent's tools"
- Publication: December 22, 2025
- URL: https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools
- Topics: Engineering deep-dive on simplifying agent tool design: replacing specialized tools with bash execution for a file system agent that outperformed complex tooling.
Vercel: "How we built AEO tracking for coding agents"
- Publication: February 09, 2026
- URL: https://vercel.com/blog/how-we-built-aeo-tracking-for-coding-agents
- Topics: Engineering deep-dive on building an AI Engine Optimization system to track coding agent behavior using sandboxed execution and workflows.
Vercel: "Anyone can build agents, but it takes a platform to run them"
- Publication: February 09, 2026
- URL: https://vercel.com/blog/anyone-can-build-agents-but-it-takes-a-platform-to-run-them
- Topics: Production-focused analysis of agent deployment platforms and why infrastructure matters more than agent construction for competitive advantage.
Vercel: "Testing if 'bash is all you need'"
- Publication: January 22, 2026
- URL: https://vercel.com/blog/testing-if-bash-is-all-you-need
- Topics: Agent capability evaluation comparing bash vs SQL tools for structured data queries, with an open-source eval harness for agent skill testing.
Vercel: "Introducing: React Best Practices"
- Publication: January 14, 2026
- URL: https://vercel.com/blog/introducing-react-best-practices
- Topics: Structured knowledge repository optimized for AI agents and LLMs, encapsulating React/Next.js expertise as agent-consumable context.
All tools, SDKs, libraries, and repositories referenced throughout this document are consolidated below.
| Name | URL | Description |
|---|---|---|
| MCP Memory Server | GitHub | Knowledge graph-based persistent memory system for AI agents |
| Elasticsearch Memory | GitHub | Persistent memory with hierarchical categorization and semantic search |
| Neo4j Agent Memory | GitHub | Memory management using Neo4j knowledge graphs |
| LangGraph Memory | Docs | Memory management for stateful agents with checkpointing |
| Mem0 | arXiv | Memory operating system for large models |
| Name | URL | Description |
|---|---|---|
| E2B | GitHub | Open-source secure sandboxes for code execution with real-time collaboration |
| gVisor | GitHub | Application kernel for containers providing secure isolation boundary |
| Firecracker | GitHub | Lightweight microVMs with 125ms startup and 5 MiB memory overhead |
| Docker-in-Docker | GitHub | Docker-in-Docker for secure containerization |
| Kata Containers | GitHub | Kata Containers with Firecracker support |
| Flintlock | GitHub | Firecracker-based container runtime |
| Firecracker-containerd | GitHub | containerd integration for Firecracker |
| Name | URL | Description |
|---|---|---|
| MCP TypeScript SDK | GitHub | Official TypeScript implementation (11.8k stars) |
| MCP Python SDK | GitHub | Official Python implementation (22k stars) |
| MCP Go SDK | GitHub | Official Go implementation (4k stars) |
| MCP C# SDK | GitHub | Official C# implementation (4k stars) |
| MCP Java SDK | GitHub | Official Java implementation |
| MCP Kotlin SDK | GitHub | Official Kotlin implementation |
| MCP PHP SDK | GitHub | Official PHP implementation |
| MCP Ruby SDK | GitHub | Official Ruby implementation |
| MCP Rust SDK | GitHub | Official Rust implementation (3.1k stars) |
| MCP Swift SDK | GitHub | Official Swift implementation |
| Package | Description |
|---|---|
@modelcontextprotocol/server |
Build MCP servers (requires zod v4) |
@modelcontextprotocol/client |
Build MCP clients (requires zod v4) |
@modelcontextprotocol/node |
Node.js Streamable HTTP transport wrapper |
@modelcontextprotocol/express |
Express helpers with Host header validation |
@modelcontextprotocol/hono |
Hono helpers with JSON body parsing and validation |
| Name | URL | Description |
|---|---|---|
| MCP Servers (Reference) | GitHub | Reference implementations (80.2k stars) |
| MCP Inspector | GitHub | Visual testing tool for MCP servers (8.9k stars) |
| GitHub MCP | GitHub | Official GitHub integration |
| Notion MCP | GitHub | Notion integration |
| Slack MCP | GitHub | Slack messaging and channel management |
| Filesystem MCP | GitHub | Secure file operations |
| Memory MCP | GitHub | Knowledge graph-based persistent memory |
| Brave Search MCP | GitHub | Web search integration |
| Puppeteer MCP | GitHub | Browser automation via Puppeteer |
| PostgreSQL MCP | GitHub | PostgreSQL database integration |
| SQLite MCP | GitHub | SQLite database integration |
| Sequential Thinking MCP | GitHub | Chain-of-thought reasoning |
| Name | URL | Description |
|---|---|---|
| LangGraph | GitHub | Graph-based framework for stateful agents with control flow and memory |
| LangGraph.js | GitHub | JavaScript/TypeScript implementation of LangGraph |
| CrewAI | GitHub | Role-playing autonomous agent framework with crew-based collaboration |
| AutoGPT | GitHub | Autonomous agent for complex tasks |
| BabyAGI | GitHub | Task management and autonomous agent framework |
| GPT-Engineer | GitHub | Software development autonomous agent |
| OpenClaw Multi-Agent Team | GitHub | Multi-agent team framework with Blackboard coordination (31 stars) |
| Agent Protocol | Website | Standard for agent communication and interoperability |
| LangChain Deep Agents | GitHub | LangChain's advanced agent framework with planning capabilities |
| AgentScope | GitHub | Multi-agent simulation framework with evaluation capabilities |
| LangChain | GitHub | Comprehensive tool use patterns and integrations |
| Toolformer | GitHub | Reference implementation of the Toolformer paper |
| Name | URL | Description |
|---|---|---|
| Phoenix | GitHub | Open-source AI Observability & Evaluation platform (8.7k stars) |
| LangSmith | Website | LangChain's observability and evaluation platform |
| Weights & Biases | Website | Experiment tracking for ML models |
| AgentOps | Website | Observability for AI agents |
| OpenTelemetry | GitHub | Open-source observability framework |
| OpenAI Evals | GitHub | Framework for evaluating LLMs (17.9k stars) |
| RAGAS | GitHub | Evaluation framework for RAG systems (12.8k stars) |
| AgentEvals | GitHub | Agent evaluation framework with pytest/vitest integration (489 stars) |
| Deepeval | GitHub | External evaluator |
| Cleanlab | Website | Data quality and evaluation |
| Name | URL | Description |
|---|---|---|
| NVIDIA NeMo Guardrails | GitHub | Safety guardrails framework for AI applications |
| AI Safety Kits | GitHub | Libraries for AI safety evaluation |
| Name | URL | Description |
|---|---|---|
| WebMCP Music Composer | GitHub | Functional demonstration of WebMCP Protocol (40 stars) |
| WebMCP Playground | GitHub | Web-based MCP playground for testing (10 stars) |
| WebMCP Wix Integration | GitHub | Wix App with WebMCP protocol support |
| WebMCP WordPress Plugin | GitHub | WordPress plugin exposing site content via MCP |
| WebMCP CDP Tooling Suite | GitHub | Node.js library for WebMCP tools in Chrome via CDP |
| WebMCP Demo Apps | GitHub | Multiple demonstration apps showcasing WebMCP |
| Name | URL | Description |
|---|---|---|
| PinchTab | GitHub | Browser control for AI agents - 12MB Go binary, HTTP API (4.9k stars) |
| PinchTab MCP Wrapper | GitHub | Token-efficient browser automation MCP server |
| PinchTab MCP (Ai-firelab) | GitHub | MCP server for PinchTab |
| PinchTab MCP (Domci) | GitHub | MCP stdio server for PinchTab |
| PinchTab Skill | GitHub | Browser automation via PinchTab HTTP API |
| icewm/pinchtab | GitHub | Lightweight HTTP browser bridge for AI automation |
| Playwright Skill | GitHub | Browser automation using Playwright |
| TinyFish BLS-Premium | GitHub | TinyFish browser automation for price tracking |
| Name | Stars | URL | Description |
|---|---|---|---|
| hesreallyhim/awesome-claude-code | 26.4k | GitHub | Skills, hooks, slash-commands, agent orchestrators for Claude Code |
| sickn33/antigravity-awesome-skills | 20.4k | GitHub | 1000+ agentic skills for Claude Code/Antigravity/Cursor |
| Marketing Skills | 11.2k | GitHub | Marketing skills: CRO, copywriting, SEO, analytics |
| VoltAgent/awesome-agent-skills | 9.3k | GitHub | 500+ agent skills from official dev teams and community |
| OpenSkills | 8.8k | GitHub | Universal skills loader for AI coding agents |
| AI Research Skills | 4.4k | GitHub | AI research and engineering skills |
| heilcheng/awesome-agent-skills | 2.7k | GitHub | Skills, tools, tutorials for AI coding agents |
| libukai/awesome-agent-skills | 2.5k | GitHub | Agent Skills guide: Quick Start, Skills, News, Cases |
| Agent Scan (Snyk) | 1.7k | GitHub | Security scanner for AI agents, MCP servers, and skills |
| tech-leads-club/agent-skills | 1.6k | GitHub | Secure, validated skill registry for AI coding agents |
| Name | URL | Description |
|---|---|---|
| anthropics/skills | GitHub | Official Anthropic skills repository |
| mcp-builder skill | GitHub | Official skill for building MCP servers |
| obra/superpowers | GitHub | Core skills library for Claude Code (20+ skills) |
| obra/superpowers-lab | GitHub | Experimental skills repository |
| K-Dense-AI/claude-scientific-skills | GitHub | Skills for research, science, engineering, finance |
| ffuf-web-fuzzing | GitHub | Expert guidance for ffuf web fuzzing |
| trailofbits/skills | GitHub | Security skills: static analysis, CodeQL/Semgrep, code auditing |
| web-asset-generator | GitHub | Generates favicons, app icons, social media images |
| Category | Name | URL |
|---|---|---|
| Orchestrator | Auto-Claude | GitHub |
| Orchestrator | Claude Code Flow | GitHub |
| Orchestrator | Claude Squad | GitHub |
| Orchestrator | sudocode | GitHub |
| Usage Monitor | CC Usage | GitHub |
| Usage Monitor | ccflare | GitHub |
| Usage Monitor | better-ccflare | GitHub |
| Usage Monitor | Claude Code Usage Monitor | GitHub |
| Status Line | CCometixLine | GitHub |
| Status Line | ccstatusline | GitHub |
| Status Line | claude-powerline | GitHub |
| Hook | Dippy | GitHub |
| Hook | parry | GitHub |
| Hook | Claude Hook Comms (HCOM) | GitHub |
| IDE Integration | claude-code.nvim | GitHub |
| IDE Integration | claude-code.el | GitHub |
| IDE Integration | claude-code-ide.el | GitHub |
| IDE Integration | Claudix (VSCode) | GitHub |
| Category | Key Papers | Eng. Blogs | Repositories | Resources |
|---|---|---|---|---|
| Memory Systems | 9 | 19 | 5 | 33+ |
| Sandboxes & Isolation | 7 | 22 | 8 | 55+ |
| MCP Protocol | 0 | 14 | 15+ | 40+ |
| Agent Architectures | 5 | 27 | 10+ | 50+ |
| Programmatic Tool Calling | 1 | 3 | 4 | 15+ |
| Multi-Agent Systems | 1 | 3 | 6+ | 20+ |
| Planning & Reasoning | 6 | 4 | 0 | 15+ |
| WebMCP Protocol | 0 | 0 | 6 | 6 |
| Browser Automation | 0 | 16 | 8 | 25+ |
| State Management | 2 | 3 | 0 | 10+ |
| Observability & Debugging | 0 | 11 | 4 | 25+ |
| Evaluation & Benchmarking | 4 | 14 | 4 | 35+ |
| Error Handling | 0 | 5 | 0 | 12+ |
| Human-in-the-Loop | 0 | 5 | 0 | 15+ |
| Safety & Alignment | 1 | 10 | 3 | 20+ |
| Skills & Capabilities | 2 | 2 | 30+ | 55+ |
| Total | 38 | 158 | 100+ | 430+ |
For practitioners starting with agentic system design:
- Foundational Concepts: Start with ReAct paper (2022) and Chain of Thought (2023)
- Memory Systems: Read EverMemOS (2026), MIRIX (2025), and HiMeS (2026)
- Tool Integration: Explore MCP documentation and Toolformer (2023)
- Architecture Patterns: Study LangGraph and CrewAI documentation
- Planning & Reasoning: Read Tree of Thoughts (2023) and LLM+P (2023)
- Multi-Agent Systems: Explore Generative Agents (2023) and AutoGPT
- Observability: Set up Phoenix or LangSmith for agent tracing
- Evaluation: Implement OpenAI Evals and RAGAS for benchmarking
- Safety: Review Constitutional AI (2023) and implement guardrails
- Skills Development: Browse awesome-agent-skills collections
- Browser Automation: Experiment with PinchTab and Playwright
- Sandboxing: Deploy E2B or Firecracker for secure execution
To contribute corrections or additions, please reference the source URLs provided with each resource. This document is a living compilation updated regularly to reflect the rapidly evolving field of agentic system design.
Last Updated: March 5, 2026
This repository maintains an MIT License. See LICENSE file for details.