Conversation
PedramNavid
left a comment
Approved, just a small suggestion on one line.
PedramNavid:
@cmosguy there are a few conflicts that need to be resolved. I think you can just pull from the main branch for both pyproject and uv.lock, but make sure your IPython notebook has been run through ruff format and ruff check.

cmosguy:
@PedramNavid - trying again, thanks for the heads up.
PedramNavid
left a comment
Hi @cmosguy
I went through the full PR. There are a few issues:
- Rather than importing a tokenizer, let's rely on the token-counting API already available in the Anthropic library.
- Please ensure you are using the ruff formatter; there are a lot of changed lines here that are pure formatting changes and should not be part of the diff.
- Please do not make changes to the uv.lock file; we should not be making any dependency changes for this PR.
```diff
 " def process_raw_search_results(\n",
-" self,\n",
-" results: list[SearchResult],\n",
+" self, results: list[SearchResult],\n",
```
Have you run ruff format on this notebook? This change you made undoes what our formatter is doing.
```diff
 " result = \"\\n\".join(\n",
 " [\n",
-" f'<item index=\"{i + 1}\">\\n<page_content>\\n{r}\\n</page_content>\\n</item>'\n",
+" f'<item index=\"{i+1}\">\\n<page_content>\\n{r}\\n</page_content>\\n</item>'\n",
```
Same here; it looks like your linter/formatter is not using the ruff format we use.
```diff
 "class WikipediaSearchResult(SearchResult):\n",
 " title: str\n",
 "\n",
+"from transformers import AutoTokenizer\n",
```
Rather than adding a new dependency, I think we should use the `messages.count_tokens` API.
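The reviewer's suggestion could be sketched like this. The helper name and call shape are mine; `messages.count_tokens` is the Anthropic SDK endpoint, and the client is passed in as a parameter so the sketch stays self-contained:

```python
def count_prompt_tokens(client, model: str, prompt: str) -> int:
    """Count prompt tokens via the Messages API instead of a local tokenizer.

    `client` is assumed to be an `anthropic.Anthropic` instance; the call
    follows the SDK's `messages.count_tokens` endpoint, which returns an
    object with an `input_tokens` field.
    """
    response = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.input_tokens
```

This avoids pulling in `transformers` (and `huggingface-hub`) just for token counting.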
```diff
 " page = wikipedia.page(result)\n",
 " print(page.url)\n",
-" except Exception:\n",
+" except:\n",
```
We should not have bare `except` clauses.
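A minimal sketch of the narrower handler the reviewer is asking for; the `fetch` callable and helper name are illustrative, not part of the notebook:

```python
def safe_page_url(fetch, title):
    """Return the page URL, or None if the lookup fails.

    Catching `Exception` (rather than using a bare `except:`) still handles
    library errors but lets KeyboardInterrupt and SystemExit propagate.
    """
    try:
        return fetch(title).url
    except Exception as exc:
        print(f"lookup failed for {title!r}: {exc}")
        return None
```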
```diff
 "# load the antrophic key from .env\n",
 "from dotenv import load_dotenv\n",
 "load_dotenv(verbose=True)\n",
+"ANTHROPIC_SEARCH_MODEL = os.environ.get('ANTHROPIC_MODEL', 'claude-2')\n",
```
Let's default to Haiku 4.5.
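The suggested change might look like this; `"claude-haiku-4-5"` is my reading of "haiku 4-5" and may not be the exact model id the reviewer intends:

```python
import os

# Default to Haiku 4.5 instead of the legacy claude-2 model;
# the model id string here is an assumption, not confirmed by the reviewer.
ANTHROPIC_SEARCH_MODEL = os.environ.get("ANTHROPIC_MODEL", "claude-haiku-4-5")
```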
uv.lock (Outdated)
```diff
 [[package]]
 name = "huggingface-hub"
-version = "1.0.0"
+version = "0.36.0"
```
We should not be changing our existing dependencies to a lower version.
@PedramNavid gentle ping here on the updates. Do they meet your requirements now?
PedramNavid
left a comment
Hi @cmosguy. I've given it another look; there are still quite a few issues with this PR. I think part of the challenge is that this was an old notebook from Claude 2, so it mixes a lot of old and new concepts. I wonder if a rewrite might be better than trying to fix things piecemeal. Either way, I've noted a few logic issues that would need to be resolved before we can merge.
```diff
 "voyageai>=0.3.5",
 "python-dotenv>=1.1.1",
 "wikipedia>=1.4.0",
+"huggingface-hub>=1.0.0",
```
Can delete this, I imagine.

I think we should leave in wikipedia, since that is required in this notebook.
```diff
 "\n",
-" def __init__():\n",
+" def __init__(self, anthropic_client: Anthropic):\n",
+" self.anthropic_client = anthropic_client\n",
```
Why did you add the client here?
`__init__` now takes `anthropic_client: Anthropic` as a required parameter, but then has a `pass` statement that does nothing with it.
Either remove the `pass` or properly initialize `self.anthropic_client = anthropic_client`.
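A sketch of the initializer the reviewer is describing; the class name follows the notebook, but the body is illustrative:

```python
class WikipediaSearchTool:
    def __init__(self, anthropic_client):
        # Store the injected client; no leftover `pass` statement that
        # silently ignores the required parameter.
        self.anthropic_client = anthropic_client
```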
```diff
 "# Create a searcher\n",
 "wikipedia_search_tool = WikipediaSearchTool()\n",
 "ANTHROPIC_SEARCH_MODEL = \"claude-2\"\n",
+"# load the antrophic key from .env\n",
```
You can delete this comment; the loading already happens in cell 3 with `load_dotenv()`.
```diff
 " )\n",
 " print(partial_completion)\n",
-" token_budget -= self.count_tokens(partial_completion)\n",
+" token_count = self.messages.count_tokens(\n",
```
I'm not sure this is correct.
You call `self.messages.count_tokens()` after every partial completion, which counts the prompt tokens, not the completion tokens.
This is backwards: you should be subtracting `partial_completion_.usage.input_tokens + partial_completion_.usage.output_tokens` from the budget, which are already returned by the Messages API.
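The budget update the reviewer describes can be sketched as a pure helper. The `usage` argument mirrors the `usage` object the Messages API returns (`input_tokens`/`output_tokens` fields per the Anthropic SDK); the helper itself is hypothetical:

```python
def spend_budget(token_budget, usage):
    # Subtract both the prompt and completion tokens that the Messages API
    # already reports, instead of re-counting with a second API call.
    return token_budget - (usage.input_tokens + usage.output_tokens)
```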
```diff
 " )\n",
-" information = extract_between_tags(\"information\", retrieval_response)[-1]\n",
-"\n",
+" # Try to extract information tags, handle case where none exist\n",
```
When `<information>` tags are missing, you use the entire retrieval response as the information. This could include all the scratchpad content and search-quality reflections. Is there any reason why this is necessary? If no tags are found, I would think there's an error in the response.
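The fail-fast behaviour the reviewer suggests could look like this sketch; `extract_between_tags` from the notebook is reimplemented here with a regex so the example is self-contained:

```python
import re

def extract_information(retrieval_response: str) -> str:
    # Take the last <information> block, and treat missing tags as an
    # error instead of falling back to the whole (scratchpad-laden) response.
    matches = re.findall(
        r"<information>(.*?)</information>", retrieval_response, re.DOTALL
    )
    if not matches:
        raise ValueError("no <information> tags found in retrieval response")
    return matches[-1]
```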
```diff
 " self.search_tool = search_tool\n",
 " self.verbose = verbose\n",
 "\n",
+" # Pass the anthropic client to the search tool if it supports it\n",
```
Why wouldn't the search tool support it? You've updated the definition.
```diff
 " if search_query is None:\n",
 " raise Exception(\n",
-" \"Completion with retrieval failed as partial completion returned mismatched <search_query> tags.\"\n",
+" f\"Completion with retrieval failed as partial completion returned mismatched <search_query> tags.\"\n",
```
Why is this an f-string? It has no placeholders.
There were some changes in the API that needed to be updated in the Wikipedia notebook example. We need to start using the transformers library for the tokenizer.