
feat: PYOK LLM integration, LinkedIn import, job post tailoring (v0.3.0) #96

Merged
athola merged 11 commits into main from pyok-0.3.0 on Mar 10, 2026

Conversation

athola commented Feb 25, 2026

Summary

Fixes #8, Fixes #9, Fixes #10

Architecture

Follows Functional Core / Imperative Shell pattern:

| Layer | New Module | Purpose |
|-------|------------|---------|
| Core | core/llm/protocol.py | LLMProvider ABC + LLMConfig dataclass |
| Core | core/llm/prompts.py | Prompt templates (pure data) |
| Core | core/ats/tailor.py | Gap analysis + tailoring prompt builder |
| Core | core/importers/linkedin.py | LinkedIn data → resume schema transform |
| Shell | shell/llm/client.py | LiteLLMProvider (litellm wrapper) |
| Shell | shell/llm/config.py | API key resolution (CLI, env vars) |
| Shell | shell/llm/gate.py | requires_llm decorator, availability check |
| Shell | shell/importers/linkedin_fetcher.py | HTML/text LinkedIn parser |
| Shell | shell/cli/_import.py | import CLI subcommand |
| Shell | shell/cli/_tailor.py | tailor CLI subcommand |

New Optional Dependencies

[project.optional-dependencies]
llm = ["litellm>=1.0,<2.0"]
linkedin = ["beautifulsoup4>=4.12,<5.0"]
all = ["simple-resume[llm,linkedin]"]

Test plan

  • 1970 tests passing (115 new), 94.81% coverage
  • All pre-commit hooks pass (ruff, mypy, bandit, ty, architecture checks)
  • Wheel install test passes
  • simple-resume --version reports 0.3.0

Design document covering LLM infrastructure (issue #8),
LinkedIn import (issue #9), and job post tailoring (issue #10).
Specifies FCIS architecture, feature gating via optional deps,
and implementation phases.

Add pass-your-own-key (PYOK) infrastructure for LLM-powered features:

- LLMProvider ABC and LLMConfig dataclass in core/llm/protocol.py
- Prompt templates for tailoring, job extraction, LinkedIn, colors
- LiteLLM wrapper client implementing LLMProvider protocol
- API key resolution from CLI args and environment variables
- requires_llm decorator for feature gating behind [llm] optional dep
- LLMNotAvailableError and LLMError exception types
- 58 unit tests covering all new modules

Phase 2a - LinkedIn Import:
- core/importers/linkedin.py: pure transform from LinkedIn profile
  dict to simple-resume YAML schema
- shell/importers/linkedin_fetcher.py: HTML and text file parsing
  with BeautifulSoup, supporting .html and .txt formats
- 27 unit tests for importer and fetcher

Phase 2b - Job Post Tailoring:
- core/ats/tailor.py: gap analysis comparing resume skills/keywords
  against job requirements, weighted match scoring, and LLM prompt
  construction for resume tailoring
- JobRequirements and GapAnalysis dataclasses
- 14 unit tests for tailor module

- Add LLMProvider, LLMConfig, LLMError, LLMNotAvailableError exports
- Add linkedin_to_simple_resume export
- Add JobRequirements, GapAnalysis, analyze_gaps, build_tailoring_prompt
- Update API surface contract test with new symbols
- Bump version from 0.2.5 to 0.3.0 in pyproject.toml and __init__.py
athola commented Mar 6, 2026

PR Review: feat: PYOK LLM integration, LinkedIn import, job post tailoring (v0.3.0)

Branch: pyok-0.3.0 → main | Files: 33 changed (+4,256) | Tests: 1970 passing (115 new)

Verdict: APPROVE with non-blocking suggestions


Scope Validation

PR implements 3 issues (#8, #9, #10) matching the design plan in docs/plans/2026-02-24-pyok-linkedin-jobpost-design.md. All features are in scope. Version bump to 0.3.0 is consistent across pyproject.toml, __init__.py, and runtime.

Architecture ✅

Clean FCIS (Functional Core / Imperative Shell) separation:

  • Core (pure): protocol.py, prompts.py, tailor.py, linkedin.py — zero I/O
  • Shell (side effects): client.py, config.py, gate.py, linkedin_fetcher.py, CLI handlers
  • Feature gating via requires_llm decorator is well-designed
  • Optional deps properly configured: [llm], [linkedin], [all]
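The feature-gating pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual gate.py: the real decorator, exception message, and availability check may differ in detail.

```python
import functools
import importlib.util


class LLMNotAvailableError(RuntimeError):
    """Raised when an LLM feature is used without the [llm] extra installed."""


def is_llm_available() -> bool:
    # True when the optional litellm dependency can be imported.
    return importlib.util.find_spec("litellm") is not None


def requires_llm(func):
    # Guard decorator: fail fast with an install hint instead of a raw ImportError.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not is_llm_available():
            raise LLMNotAvailableError(
                "This feature requires the [llm] extra: pip install simple-resume[llm]"
            )
        return func(*args, **kwargs)

    return wrapper
```

The decorator keeps the import check at call time, so importing the module (and the public API symbols) stays safe even when litellm is absent.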

Blocking Issues

None. The PR is clean and well-structured.

Non-Blocking Improvements

1. Missing LLM response None check (shell/llm/client.py:70)

return response.choices[0].message.content  # Could be None

If the model returns no content, this silently returns None instead of str. Consider:

content = response.choices[0].message.content
if content is None:
    raise LLMError("LLM returned empty response", provider="litellm", model=self.config.model)
return content

2. Tailored output not validated (shell/cli/_tailor.py:159-160)
The design doc specifies "validate tailored YAML" but the LLM output is written raw to disk. If the LLM returns malformed YAML or markdown-wrapped output, the user gets a broken file. Consider parsing with yaml.safe_load() before writing, or at minimum adding a comment that validation is deferred.

3. Dead parameter in _run_llm_tailoring (shell/cli/_tailor.py:102)
The gap parameter (typed as object) is accepted but never used in the function body. Either remove it or pass gap context to the tailoring prompt for better results.

4. complete_structured JSON extraction fragility (shell/llm/client.py:112)
LLMs frequently wrap JSON in markdown code fences (```json ... ```). A simple strip of code fences before json.loads() would improve reliability:

raw = raw.strip()
if raw.startswith("```"):
    raw = "\n".join(raw.split("\n")[1:-1])

5. [all] optional dep doesn't include [bert] (pyproject.toml)
The all extra is [llm,linkedin] but bert is also an optional feature. Either rename to v03 or include bert in all.

Slop Detection ✅

  • Commit messages: Clean, no AI attribution or slop markers
  • Changelog: Professional, factual, no filler language
  • PR description: Well-structured, accurate

Code Quality ✅

  • No code duplication detected across modules
  • Type annotations consistent throughout
  • Error handling follows project's exception hierarchy pattern
  • Lazy imports used correctly for optional dependencies
  • __all__ exports defined in every module

Security ✅

  • API keys stripped of whitespace, not logged
  • No prompt injection risk (user data in clearly delimited sections)
  • HTML parsing uses html.parser (not lxml) — safe default
  • yaml.safe_load() used for untrusted input
  • File paths validated before reading

Version Consistency ✅

| Source | Version |
|--------|---------|
| pyproject.toml | 0.3.0 |
| __init__.py | 0.3.0 |
| Runtime | 0.3.0 |
| Changelog | 0.3.0 |

athola commented Mar 6, 2026

Test Plan

Automated Verification

  • Full test suite: 1970 tests passing (115 new for v0.3.0 features)
  • Pre-commit hooks: ruff, mypy, bandit pass
  • Version consistency: 0.3.0 across all sources

Manual Verification Checklist

PYOK Infrastructure (#8)

  • from simple_resume import LLMProvider, LLMConfig works without [llm] installed
  • LLMNotAvailableError raised with clear install hint when [llm] missing
  • resolve_api_key() resolves from env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY)
  • resolve_api_key(cli_key="sk-...") takes CLI arg over env var

LinkedIn Import (#9)

  • simple-resume import --linkedin profile.html produces valid YAML
  • simple-resume import --linkedin profile.txt handles plain text format
  • simple-resume import --linkedin missing.html shows clear error
  • --output custom.yaml writes to specified path

Job Post Tailoring (#10)

  • simple-resume tailor resume.yaml job.txt --report shows gap analysis without LLM
  • simple-resume tailor resume.yaml job.txt --api-key KEY produces tailored resume
  • Missing API key shows clear instruction message
  • Missing [llm] extra shows install hint

Regression

  • simple-resume generate still works (existing functionality)
  • simple-resume screen still works (ATS scoring)
  • simple-resume --version reports 0.3.0

athola commented Mar 6, 2026

PR Review Addendum - Deep Analysis Findings

Additional findings from parallel code review and test coverage analysis.

Elevated to Non-Blocking (from agent reviews)

6. Hard bs4 import in linkedin_fetcher.py:14
from bs4 import BeautifulSoup, Tag is a top-level import of an optional dependency. Unlike client.py which correctly defers litellm inside method bodies, linkedin_fetcher imports bs4 eagerly. This is mitigated because _import.py lazily imports linkedin_fetcher inside handle_import_command(), so CLI startup is safe. But when simple-resume import --linkedin is invoked without [linkedin], the error is a raw ModuleNotFoundError instead of a clean message. Should use the same lazy-import pattern as client.py or add an is_linkedin_available() guard.

7. is_current stored as string "true" instead of True (linkedin_fetcher.py:113)
_parse_date_range sets entry["is_current"] = "true" (string). The downstream _convert_experience in core/importers/linkedin.py uses _get_str(entry, "is_current") which happens to be truthy for any non-empty string, so it works by accident. Should be entry["is_current"] = True.

8. Duplicated default model string (shell/llm/config.py:78 and core/llm/protocol.py:27)
Both resolve_llm_config() and LLMConfig define "claude-sonnet-4-20250514" as the default model. This creates drift risk. resolve_llm_config could delegate to LLMConfig's field default instead.
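One way to remove the drift risk, sketched with a pared-down LLMConfig (the temperature field and its default are illustrative): read the default from the dataclass field metadata instead of re-hardcoding the string.

```python
from dataclasses import dataclass, fields


@dataclass
class LLMConfig:
    # Illustrative subset of the real config; only model matters here.
    model: str = "claude-sonnet-4-20250514"
    temperature: float = 0.7


def default_model() -> str:
    # Single source of truth: resolve_llm_config can read the field default
    # rather than duplicating the model string.
    return next(f.default for f in fields(LLMConfig) if f.name == "model")
```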

9. Missing UnicodeDecodeError handling (linkedin_fetcher.py:297)
file_path.read_text(encoding="utf-8") for non-UTF-8 files raises UnicodeDecodeError, which is not caught by handle_import_command. It falls through to the bare except Exception safety net instead of producing a clean error message.

Test Coverage Gaps (from test analysis agent)

| Priority | Gap | Severity |
|----------|-----|----------|
| 1 | _run_llm_tailoring success path completely untested | High |
| 2 | _run_llm_tailoring LLMError exception path untested | High |
| 3 | HTML experience extraction (dates, is_current) not asserted | Medium |
| 4 | resolve_llm_config with temperature=0.0 edge case | Low |
| 5 | Short keyword filter (len > 3) excludes "Go", "R" etc. | Low |

The _run_llm_tailoring function is the primary value proposition of the tailor feature and has zero test coverage for its success path. This is the most significant test gap.

Fragile Test Pattern

test_tailor_no_api_key_without_report manually pops/restores env vars instead of using monkeypatch.delenv. Could leak state on failure.

athola commented Mar 6, 2026

PR Review Addendum 2 - Core Domain Findings (Verified)

Three additional findings from deep core domain analysis, all verified with reproduction evidence.

Elevated Findings

10. SCORING BUG: _compute_score doesn't normalize by active weights (core/ats/tailor.py:134)

When a job posting has only required skills (no preferred, no keywords), a perfect match scores 0.5 instead of 1.0. The formula computes sum(weight * ratio) without normalizing by the sum of weights for active categories.

# Reproduction:
reqs = JobRequirements(title='Eng', required_skills=['Python', 'Docker'])
resume = {'keyskills': ['Python', 'Docker']}
gap = analyze_gaps(resume, reqs)
# gap.match_score = 0.5 ← should be 1.0

Fix: normalize by active weight sum:

active = [c for c in counts if c.total > 0]
if not active:
    return 0.0
weighted = sum(c.weight * (c.matched / c.total) for c in active)
return round(weighted / sum(c.weight for c in active), 3)
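Filled out as a self-contained sketch, with a hypothetical Category container standing in for whatever per-category counts _compute_score actually tracks:

```python
from dataclasses import dataclass


@dataclass
class Category:
    # Stand-in for one requirement category: how many items matched,
    # how many existed, and the category's weight in the blended score.
    matched: int
    total: int
    weight: float


def compute_score(counts: list[Category]) -> float:
    # Only categories that actually have items participate; normalizing by
    # their weight sum means a required-only perfect match scores 1.0.
    active = [c for c in counts if c.total > 0]
    if not active:
        return 0.0
    weighted = sum(c.weight * (c.matched / c.total) for c in active)
    return round(weighted / sum(c.weight for c in active), 3)
```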

11. SECURITY: API key leaked in LLMConfig.__repr__ (core/llm/protocol.py:14)

Default dataclass __repr__ prints all fields including api_key. Any print(config), logging, or exception traceback exposes the raw key:

LLMConfig(api_key='sk-real-secret-key-1234567890', model='claude-sonnet-4-20250514', ...)

Fix: api_key: str = field(repr=False)
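A minimal sketch showing the fix in effect (field names beyond api_key and model are omitted):

```python
from dataclasses import dataclass, field


@dataclass
class LLMConfig:
    # repr=False keeps the key out of print(config), log lines, and tracebacks.
    api_key: str = field(repr=False, default="")
    model: str = "claude-sonnet-4-20250514"
```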

12. Substring matching false positives in skill checking (core/ats/tailor.py:118)

skill.lower() in resume_text matches substrings, not word boundaries. "Go" matches in "Django", "R" matches in nearly any resume, "SQL" matches in "PostgreSQL":

reqs = JobRequirements(title='Eng', required_skills=['Go'])
resume = {'keyskills': ['Django']}
gap = analyze_gaps(resume, reqs)
# gap.matched_required_skills = ['Go'] ← false positive

Fix: use word-boundary regex (\b anchors) instead of in.
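A sketch of the anchored check. Lookarounds are used instead of plain \b because \b misbehaves for skills whose edges are non-word characters ("C++", "C#"); this is a suggested variant, not the project's actual implementation.

```python
import re


def skill_in_text(skill: str, text: str) -> bool:
    # (?<!\w) and (?!\w) act like word boundaries but also work when the
    # skill itself starts or ends with a non-word character.
    pattern = rf"(?<!\w){re.escape(skill)}(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```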


Recommendation Update

Items 10-11 are worth fixing before merge. Item 12 is a known limitation of simple string matching and can be deferred if documented.

athola commented Mar 6, 2026

Comprehensive PR Review (pr-review-toolkit) — Consolidated Findings

Four specialized agents analyzed PR #96 in parallel: silent-failure-hunter, type-design-analyzer, comment-analyzer, and code-simplifier. Findings are deduplicated and cross-referenced with the earlier sanctum review.


Critical Issues (3)

C1. response.choices[0].message.content can be None (client.py:70)
[silent-failure-hunter, type-design]
LLM API can return None for content (tool calls, content filtering, empty responses). The # type: ignore[no-any-return] suppresses the type checker's warning. Downstream write_text(None) would raise TypeError.
→ Add explicit null check, raise LLMError with actionable message.

C2. Scoring formula doesn't normalize by active weights (tailor.py:134)
[sanctum review #10, confirmed]
Required-only perfect match scores 0.5 instead of 1.0. The _compute_score docstring promises "0.0 to 1.0" but this invariant is violated when categories have zero items.
→ Normalize by sum of active weights.

C3. API key leaked in LLMConfig.__repr__ (protocol.py:14)
[type-design, sanctum review #11]
Default dataclass repr prints all fields including api_key. Any print(config), logging, or traceback exposes the raw key. LiteLLMProvider.config is also public, compounding exposure.
→ api_key: str = field(repr=False) + make provider config private.


Important Issues (8)

I1. _load_resume_yaml doesn't handle YAML parse errors (_tailor.py:22-26)
[silent-failure-hunter]
yaml.safe_load() can raise yaml.YAMLError for malformed files, and returns None for empty files. Neither case is handled. Falls through to bare except Exception safety net.
→ Catch yaml.YAMLError, validate return is dict.

I2. Raw LLM output written without validation (_tailor.py:159-160)
[silent-failure-hunter, code-simplifier, sanctum review #2]
LLM could return markdown-fenced YAML, refusals, partial output, or invalid YAML. User sees "Tailored resume saved" for unusable output.
→ Validate with yaml.safe_load() before writing, strip code fences.
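The suggested fence-strip-then-validate step could look like this. A sketch only: the helper name and the "must be a mapping" rule are assumptions about what a valid tailored resume looks like.

```python
import yaml  # PyYAML


def clean_and_validate(raw: str) -> str:
    # Strip a markdown code fence if the model wrapped its YAML output.
    text = raw.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        # Drop the opening fence line, and the closing fence if present.
        lines = lines[1:-1] if lines[-1].strip() == "```" else lines[1:]
        text = "\n".join(lines)
    # Fail loudly on malformed output instead of writing a broken file.
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ValueError("LLM output is not a YAML mapping")
    return text
```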

I3. extract_profile_from_html silently returns empty dict for unrecognized HTML (linkedin_fetcher.py:85-101)
[silent-failure-hunter]
If HTML has no recognizable LinkedIn elements, returns {}. Error message "No profile data extracted" gives zero diagnostic information about why extraction failed.
→ Add diagnostic context (has <body>? has <section>? likely non-LinkedIn page?).

I4. Substring matching produces false positives (tailor.py:118)
[sanctum review #12]
skill.lower() in resume_text matches "Go" inside "Django", "R" inside any word. Inflates match scores.
→ Use word-boundary regex.

I5. Frozen dataclasses with mutable list[str] fields (tailor.py:17,42)
[type-design]
JobRequirements and GapAnalysis are frozen=True but list fields are mutable. req.required_skills.append("x") succeeds silently, violating the frozen contract.
→ Convert to tuple[str, ...].
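With tuple fields the frozen contract becomes real, as this pared-down sketch of JobRequirements shows (field names beyond title and required_skills omitted):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class JobRequirements:
    # tuple fields have no .append, and any reassignment attempt raises
    # FrozenInstanceError, so the instance is genuinely immutable.
    title: str
    required_skills: tuple[str, ...] = ()
```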

I6. Broad except Exception loses error type context (client.py:78-83)
[silent-failure-hunter, code-simplifier]
All litellm errors (auth, rate-limit, timeout, network) become generic LLMError. str(exc) loses the exception type. User can't distinguish "bad API key" from "network timeout".
→ Preserve exception type name: f"{type(exc).__name__}: {exc}".

I7. No logging in CLI error handlers (_import.py:64-69, _tailor.py:89-94,164)
[silent-failure-hunter]
All except blocks print to stdout but never call logger.error(). LLM failures, import errors, and file errors leave zero diagnostic trail in log files.
→ Add logger.error() before print().

I8. Missing UnicodeDecodeError handling (linkedin_fetcher.py:297)
[silent-failure-hunter, sanctum review #9]
read_text(encoding="utf-8") raises UnicodeDecodeError for non-UTF-8 files. LinkedIn exports from some browsers use latin-1 or windows-1252. Falls through to safety net.
→ Catch and re-raise as ValueError with actionable message.
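A sketch of the catch-and-re-raise pattern; the helper name and error wording are illustrative:

```python
from pathlib import Path


def read_linkedin_text(path: Path) -> str:
    # Surface non-UTF-8 exports as a clean, actionable error instead of
    # letting UnicodeDecodeError fall through to the generic safety net.
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"{path} is not UTF-8 encoded; re-save the LinkedIn export as UTF-8"
        ) from exc
```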


Suggestions (12)

| # | Finding | Source | File:Line |
|---|---------|--------|-----------|
| S1 | Dead gap parameter (typed object, never used) | simplifier, comments | _tailor.py:103 |
| S2 | Duplicate exception handlers (identical bodies) | simplifier | _import.py:64-69, _tailor.py:89-94 |
| S3 | Near-duplicate experience/education parsers | simplifier | linkedin_fetcher.py:183-244 |
| S4 | Near-duplicate _convert_experience/_convert_education | simplifier | linkedin.py:60-113 |
| S5 | _build_summary takes 8 params (PLR0913 suppressed) | simplifier | tailor.py:144 |
| S6 | Cross-module private import _handle_unexpected_error | simplifier | _import.py:31, _tailor.py:60 |
| S7 | complete_structured should use response_format parameter | simplifier | client.py:104 |
| S8 | Docstring inaccurate: "full text as keywords" | comments | _tailor.py:39-40 |
| S9 | read_linkedin_file docstring missing .htm support | comments | linkedin_fetcher.py:278 |
| S10 | protocol.py module docstring "no API keys" is ambiguous | comments | protocol.py:1-5 |
| S11 | ABC vs typing.Protocol naming mismatch | type-design | protocol.py:32 |
| S12 | Hardcoded default model string duplicated | simplifier | config.py:78, protocol.py:27 |

Strengths (Positive Observations)

  • FCIS architecture is sound: Core modules are genuinely pure. Shell layer handles all I/O correctly.
  • Feature gating pattern (requires_llm decorator + is_llm_available) provides actionable install hints.
  • Exception hierarchy is well-designed: LLMError vs LLMNotAvailableError enables different error handling paths.
  • GapAnalysis and JobRequirements field decomposition maps directly to real job posting structure.
  • Prompt templates as pure data in prompts.py is a clean separation.
  • 115 new tests with BDD-style naming, proper mocking of external dependencies.
  • API surface test guards against accidental symbol removal.
  • Zero AI slop in documentation and commit messages.

Recommended Action Plan

Before merge (high confidence):

  1. Fix C3: api_key: str = field(repr=False) (1 line)
  2. Fix C1: Add null check in client.py:70 (5 lines)
  3. Fix C2: Normalize scoring weights (5 lines)

Should fix (medium priority):
4. Fix I1: Catch yaml.YAMLError in _load_resume_yaml
5. Fix I2: Validate LLM YAML output before writing
6. Fix I6: Preserve exception type in LLMError message
7. Fix I7: Add logging to CLI error handlers

Can defer to follow-up:
8. I4 (substring matching), I5 (tuple fields), S1-S12


Review generated by: silent-failure-hunter, type-design-analyzer, comment-analyzer, code-simplifier

athola added 2 commits March 6, 2026 17:32
- Hide API key from LLMConfig repr to prevent accidental exposure
- Add null check for LLM response content, raise LLMError if empty
- Normalize scoring weights by active categories so required-only
  perfect matches score 1.0 instead of 0.5
- Preserve exception type name in LLMError messages for debuggability
- Catch yaml.YAMLError and validate dict type in _load_resume_yaml
- Strip markdown code fences and validate YAML before writing
  LLM-tailored output to disk
- Add logger.error() calls to all CLI error handlers in _tailor.py
  and _import.py for diagnostic traceability

Address PR review feedback on import placement:

- Extract _handle_unexpected_error to _errors.py to break circular
  imports between main.py and CLI subcommand modules
- Move lazy imports to top level in _tailor.py, _screen.py
- Use top-level try/except for optional litellm in client.py, gate.py
- Move lazy imports to top level in resume_extensions.py, loader.py
- Keep linkedin imports lazy in _import.py (optional [linkedin] dep)
- Remove docs/plans/ design doc (content captured in PR description)
- Fix missing import re in tailor.py _check_skills
- Update test mock paths to match new import locations
athola commented Mar 10, 2026

PR Fix Summary — Import Cleanup (ab0fdf9)

All 9 review threads addressed and resolved.

What changed

Circular import fix: Extracted _handle_unexpected_error into shell/cli/_errors.py to break the circular dependency between main.py and the _import.py/_tailor.py/_screen.py subcommand modules.

Top-level imports: Moved all lazy imports to module top level in:

  • _tailor.py: _handle_unexpected_error, resolve_api_key, resolve_llm_config, is_llm_available, LiteLLMProvider
  • _screen.py: _handle_unexpected_error
  • client.py, gate.py: litellm via top-level try/except ImportError
  • resume_extensions.py: GenerationMetadata, render_resume_latex_from_data, compile_tex_to_pdf, shutil, markdown
  • themes/loader.py: ASSETS_ROOT

Intentionally kept lazy: linkedin_to_simple_resume and read_linkedin_file in _import.py — these depend on beautifulsoup4 which is an optional [linkedin] extra.

Docs: Removed docs/plans/2026-02-24-pyok-linkedin-jobpost-design.md (content in PR description).

Also fixed: Missing import re in tailor.py _check_skills function.

Verification

  • 1970 tests passing
  • All pre-commit hooks pass (ruff, mypy, bandit, ty, architecture, wheel install)
  • simple-resume --version works in clean wheel install

@athola athola merged commit 378ab8c into main Mar 10, 2026
11 checks passed