
feat: PYOK LLM integration, LinkedIn import, job post tailoring (v0.3.0) #96

Merged
athola merged 11 commits into main from pyok-0.3.0 on Mar 10, 2026

Conversation

athola commented Feb 25, 2026

Summary

Fixes #8, Fixes #9, Fixes #10

Architecture

Follows Functional Core / Imperative Shell pattern:

| Layer | New Module | Purpose |
|-------|------------|---------|
| Core | core/llm/protocol.py | LLMProvider ABC + LLMConfig dataclass |
| Core | core/llm/prompts.py | Prompt templates (pure data) |
| Core | core/ats/tailor.py | Gap analysis + tailoring prompt builder |
| Core | core/importers/linkedin.py | LinkedIn data → resume schema transform |
| Shell | shell/llm/client.py | LiteLLMProvider (litellm wrapper) |
| Shell | shell/llm/config.py | API key resolution (CLI, env vars) |
| Shell | shell/llm/gate.py | requires_llm decorator, availability check |
| Shell | shell/importers/linkedin_fetcher.py | HTML/text LinkedIn parser |
| Shell | shell/cli/_import.py | import CLI subcommand |
| Shell | shell/cli/_tailor.py | tailor CLI subcommand |

New Optional Dependencies

[project.optional-dependencies]
llm = ["litellm>=1.0,<2.0"]
linkedin = ["beautifulsoup4>=4.12,<5.0"]
all = ["simple-resume[llm,linkedin]"]

Test plan

  • 1970 tests passing (115 new), 94.81% coverage
  • All pre-commit hooks pass (ruff, mypy, bandit, ty, architecture checks)
  • Wheel install test passes
  • simple-resume --version reports 0.3.0

Design document covering LLM infrastructure (issue #8),
LinkedIn import (issue #9), and job post tailoring (issue #10).
Specifies FCIS architecture, feature gating via optional deps,
and implementation phases.

Add pass-your-own-key (PYOK) infrastructure for LLM-powered features:

- LLMProvider ABC and LLMConfig dataclass in core/llm/protocol.py
- Prompt templates for tailoring, job extraction, LinkedIn, colors
- LiteLLM wrapper client implementing LLMProvider protocol
- API key resolution from CLI args and environment variables
- requires_llm decorator for feature gating behind [llm] optional dep
- LLMNotAvailableError and LLMError exception types
- 58 unit tests covering all new modules

Phase 2a - LinkedIn Import:
- core/importers/linkedin.py: pure transform from LinkedIn profile
  dict to simple-resume YAML schema
- shell/importers/linkedin_fetcher.py: HTML and text file parsing
  with BeautifulSoup, supporting .html and .txt formats
- 27 unit tests for importer and fetcher

Phase 2b - Job Post Tailoring:
- core/ats/tailor.py: gap analysis comparing resume skills/keywords
  against job requirements, weighted match scoring, and LLM prompt
  construction for resume tailoring
- JobRequirements and GapAnalysis dataclasses
- 14 unit tests for tailor module

- Add LLMProvider, LLMConfig, LLMError, LLMNotAvailableError exports
- Add linkedin_to_simple_resume export
- Add JobRequirements, GapAnalysis, analyze_gaps, build_tailoring_prompt
- Update API surface contract test with new symbols
- Bump version from 0.2.5 to 0.3.0 in pyproject.toml and __init__.py
athola commented Mar 6, 2026

PR Review: feat: PYOK LLM integration, LinkedIn import, job post tailoring (v0.3.0)

Branch: pyok-0.3.0 → main | Files: 33 changed (+4,256) | Tests: 1970 passing (115 new)

Verdict: APPROVE with non-blocking suggestions


Scope Validation

PR implements 3 issues (#8, #9, #10) matching the design plan in docs/plans/2026-02-24-pyok-linkedin-jobpost-design.md. All features are in scope. Version bump to 0.3.0 is consistent across pyproject.toml, __init__.py, and runtime.

Architecture ✅

Clean FCIS (Functional Core / Imperative Shell) separation:

  • Core (pure): protocol.py, prompts.py, tailor.py, linkedin.py — zero I/O
  • Shell (side effects): client.py, config.py, gate.py, linkedin_fetcher.py, CLI handlers
  • Feature gating via requires_llm decorator is well-designed
  • Optional deps properly configured: [llm], [linkedin], [all]
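The feature-gating pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual gate.py: the real decorator, exception message, and availability check may differ in detail.

```python
import functools
import importlib.util


class LLMNotAvailableError(RuntimeError):
    """Raised when an LLM feature is used without the [llm] extra installed."""


def is_llm_available() -> bool:
    # True when the optional litellm dependency can be imported.
    return importlib.util.find_spec("litellm") is not None


def requires_llm(func):
    # Guard decorator: fail fast with an install hint instead of a raw ImportError.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not is_llm_available():
            raise LLMNotAvailableError(
                "This feature requires the [llm] extra: pip install simple-resume[llm]"
            )
        return func(*args, **kwargs)

    return wrapper
```

The decorator keeps the import check at call time, so importing the module (and the public API symbols) stays safe even when litellm is absent.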

Blocking Issues

None. The PR is clean and well-structured.

Non-Blocking Improvements

1. Missing LLM response None check (shell/llm/client.py:70)

return response.choices[0].message.content  # Could be None

If the model returns no content, this silently returns None instead of str. Consider:

content = response.choices[0].message.content
if content is None:
    raise LLMError("LLM returned empty response", provider="litellm", model=self.config.model)
return content

2. Tailored output not validated (shell/cli/_tailor.py:159-160)
The design doc specifies "validate tailored YAML" but the LLM output is written raw to disk. If the LLM returns malformed YAML or markdown-wrapped output, the user gets a broken file. Consider parsing with yaml.safe_load() before writing, or at minimum adding a comment that validation is deferred.

3. Dead parameter in _run_llm_tailoring (shell/cli/_tailor.py:102)
The gap parameter (typed as object) is accepted but never used in the function body. Either remove it or pass gap context to the tailoring prompt for better results.

4. complete_structured JSON extraction fragility (shell/llm/client.py:112)
LLMs frequently wrap JSON in markdown code fences (```json ... ```). A simple strip of code fences before json.loads() would improve reliability:

raw = raw.strip()
if raw.startswith("```"):
    raw = "\n".join(raw.split("\n")[1:-1])

5. [all] optional dep doesn't include [bert] (pyproject.toml)
The all extra is [llm,linkedin] but bert is also an optional feature. Either rename to v03 or include bert in all.

Slop Detection ✅

  • Commit messages: Clean, no AI attribution or slop markers
  • Changelog: Professional, factual, no filler language
  • PR description: Well-structured, accurate

Code Quality ✅

  • No code duplication detected across modules
  • Type annotations consistent throughout
  • Error handling follows project's exception hierarchy pattern
  • Lazy imports used correctly for optional dependencies
  • __all__ exports defined in every module

Security ✅

  • API keys stripped of whitespace, not logged
  • No prompt injection risk (user data in clearly delimited sections)
  • HTML parsing uses html.parser (not lxml) — safe default
  • yaml.safe_load() used for untrusted input
  • File paths validated before reading

Version Consistency ✅

| Source | Version |
|--------|---------|
| pyproject.toml | 0.3.0 |
| __init__.py | 0.3.0 |
| Runtime | 0.3.0 |
| Changelog | 0.3.0 |

athola commented Mar 6, 2026

Test Plan

Automated Verification

  • Full test suite: 1970 tests passing (115 new for v0.3.0 features)
  • Pre-commit hooks: ruff, mypy, bandit pass
  • Version consistency: 0.3.0 across all sources

Manual Verification Checklist

PYOK Infrastructure (#8)

  • from simple_resume import LLMProvider, LLMConfig works without [llm] installed
  • LLMNotAvailableError raised with clear install hint when [llm] missing
  • resolve_api_key() resolves from env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY)
  • resolve_api_key(cli_key="sk-...") takes CLI arg over env var

LinkedIn Import (#9)

  • simple-resume import --linkedin profile.html produces valid YAML
  • simple-resume import --linkedin profile.txt handles plain text format
  • simple-resume import --linkedin missing.html shows clear error
  • --output custom.yaml writes to specified path

Job Post Tailoring (#10)

  • simple-resume tailor resume.yaml job.txt --report shows gap analysis without LLM
  • simple-resume tailor resume.yaml job.txt --api-key KEY produces tailored resume
  • Missing API key shows clear instruction message
  • Missing [llm] extra shows install hint

Regression

  • simple-resume generate still works (existing functionality)
  • simple-resume screen still works (ATS scoring)
  • simple-resume --version reports 0.3.0

athola commented Mar 6, 2026

PR Review Addendum - Deep Analysis Findings

Additional findings from parallel code review and test coverage analysis.

Elevated to Non-Blocking (from agent reviews)

6. Hard bs4 import in linkedin_fetcher.py:14
from bs4 import BeautifulSoup, Tag is a top-level import of an optional dependency. Unlike client.py which correctly defers litellm inside method bodies, linkedin_fetcher imports bs4 eagerly. This is mitigated because _import.py lazily imports linkedin_fetcher inside handle_import_command(), so CLI startup is safe. But when simple-resume import --linkedin is invoked without [linkedin], the error is a raw ModuleNotFoundError instead of a clean message. Should use the same lazy-import pattern as client.py or add an is_linkedin_available() guard.

7. is_current stored as string "true" instead of True (linkedin_fetcher.py:113)
_parse_date_range sets entry["is_current"] = "true" (string). The downstream _convert_experience in core/importers/linkedin.py uses _get_str(entry, "is_current") which happens to be truthy for any non-empty string, so it works by accident. Should be entry["is_current"] = True.

8. Duplicated default model string (shell/llm/config.py:78 and core/llm/protocol.py:27)
Both resolve_llm_config() and LLMConfig define "claude-sonnet-4-20250514" as the default model. This creates drift risk. resolve_llm_config could delegate to LLMConfig's field default instead.
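One way to remove the drift risk, sketched with a pared-down LLMConfig (the temperature field and its default are illustrative): read the default from the dataclass field metadata instead of re-hardcoding the string.

```python
from dataclasses import dataclass, fields


@dataclass
class LLMConfig:
    # Illustrative subset of the real config; only model matters here.
    model: str = "claude-sonnet-4-20250514"
    temperature: float = 0.7


def default_model() -> str:
    # Single source of truth: resolve_llm_config can read the field default
    # rather than duplicating the model string.
    return next(f.default for f in fields(LLMConfig) if f.name == "model")
```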

9. Missing UnicodeDecodeError handling (linkedin_fetcher.py:297)
file_path.read_text(encoding="utf-8") for non-UTF-8 files raises UnicodeDecodeError, which is not caught by handle_import_command. It falls through to the bare except Exception safety net instead of producing a clean error message.

Test Coverage Gaps (from test analysis agent)

| Priority | Gap | Severity |
|----------|-----|----------|
| 1 | _run_llm_tailoring success path completely untested | High |
| 2 | _run_llm_tailoring LLMError exception path untested | High |
| 3 | HTML experience extraction (dates, is_current) not asserted | Medium |
| 4 | resolve_llm_config with temperature=0.0 edge case | Low |
| 5 | Short keyword filter (len > 3) excludes "Go", "R" etc. | Low |

The _run_llm_tailoring function is the primary value proposition of the tailor feature and has zero test coverage for its success path. This is the most significant test gap.

Fragile Test Pattern

test_tailor_no_api_key_without_report manually pops/restores env vars instead of using monkeypatch.delenv. Could leak state on failure.

athola commented Mar 6, 2026

PR Review Addendum 2 - Core Domain Findings (Verified)

Three additional findings from deep core domain analysis, all verified with reproduction evidence.

Elevated Findings

10. SCORING BUG: _compute_score doesn't normalize by active weights (core/ats/tailor.py:134)

When a job posting has only required skills (no preferred, no keywords), a perfect match scores 0.5 instead of 1.0. The formula computes sum(weight * ratio) without normalizing by the sum of weights for active categories.

# Reproduction:
reqs = JobRequirements(title='Eng', required_skills=['Python', 'Docker'])
resume = {'keyskills': ['Python', 'Docker']}
gap = analyze_gaps(resume, reqs)
# gap.match_score = 0.5 ← should be 1.0

Fix: normalize by active weight sum:

active = [c for c in counts if c.total > 0]
if not active:
    return 0.0
weighted = sum(c.weight * (c.matched / c.total) for c in active)
return round(weighted / sum(c.weight for c in active), 3)
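Filled out as a self-contained sketch, with a hypothetical Category container standing in for whatever per-category counts _compute_score actually tracks:

```python
from dataclasses import dataclass


@dataclass
class Category:
    # Stand-in for one requirement category: how many items matched,
    # how many existed, and the category's weight in the blended score.
    matched: int
    total: int
    weight: float


def compute_score(counts: list[Category]) -> float:
    # Only categories that actually have items participate; normalizing by
    # their weight sum means a required-only perfect match scores 1.0.
    active = [c for c in counts if c.total > 0]
    if not active:
        return 0.0
    weighted = sum(c.weight * (c.matched / c.total) for c in active)
    return round(weighted / sum(c.weight for c in active), 3)
```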

11. SECURITY: API key leaked in LLMConfig.__repr__ (core/llm/protocol.py:14)

Default dataclass __repr__ prints all fields including api_key. Any print(config), logging, or exception traceback exposes the raw key:

LLMConfig(api_key='sk-real-secret-key-1234567890', model='claude-sonnet-4-20250514', ...)

Fix: api_key: str = field(repr=False)
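A minimal sketch showing the fix in effect (field names beyond api_key and model are omitted):

```python
from dataclasses import dataclass, field


@dataclass
class LLMConfig:
    # repr=False keeps the key out of print(config), log lines, and tracebacks.
    api_key: str = field(repr=False, default="")
    model: str = "claude-sonnet-4-20250514"
```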

12. Substring matching false positives in skill checking (core/ats/tailor.py:118)

skill.lower() in resume_text matches substrings, not word boundaries. "Go" matches in "Django", "R" matches in nearly any resume, "SQL" matches in "PostgreSQL":

reqs = JobRequirements(title='Eng', required_skills=['Go'])
resume = {'keyskills': ['Django']}
gap = analyze_gaps(resume, reqs)
# gap.matched_required_skills = ['Go'] ← false positive

Fix: use word-boundary regex (\b anchors) instead of in.
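A sketch of the anchored check. Lookarounds are used instead of plain \b because \b misbehaves for skills whose edges are non-word characters ("C++", "C#"); this is a suggested variant, not the project's actual implementation.

```python
import re


def skill_in_text(skill: str, text: str) -> bool:
    # (?<!\w) and (?!\w) act like word boundaries but also work when the
    # skill itself starts or ends with a non-word character.
    pattern = rf"(?<!\w){re.escape(skill)}(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```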


Recommendation Update

Items 10-11 are worth fixing before merge. Item 12 is a known limitation of simple string matching and can be deferred if documented.

athola commented Mar 6, 2026

Comprehensive PR Review (pr-review-toolkit) — Consolidated Findings

Four specialized agents analyzed PR #96 in parallel: silent-failure-hunter, type-design-analyzer, comment-analyzer, and code-simplifier. Findings are deduplicated and cross-referenced with the earlier sanctum review.


Critical Issues (3)

C1. response.choices[0].message.content can be None (client.py:70)
[silent-failure-hunter, type-design]
LLM API can return None for content (tool calls, content filtering, empty responses). The # type: ignore[no-any-return] suppresses the type checker's warning. Downstream write_text(None) would raise TypeError.
→ Add explicit null check, raise LLMError with actionable message.

C2. Scoring formula doesn't normalize by active weights (tailor.py:134)
[sanctum review #10, confirmed]
Required-only perfect match scores 0.5 instead of 1.0. The _compute_score docstring promises "0.0 to 1.0" but this invariant is violated when categories have zero items.
→ Normalize by sum of active weights.

C3. API key leaked in LLMConfig.__repr__ (protocol.py:14)
[type-design, sanctum review #11]
Default dataclass repr prints all fields including api_key. Any print(config), logging, or traceback exposes the raw key. LiteLLMProvider.config is also public, compounding exposure.
→ api_key: str = field(repr=False) + make provider config private.


Important Issues (8)

I1. _load_resume_yaml doesn't handle YAML parse errors (_tailor.py:22-26)
[silent-failure-hunter]
yaml.safe_load() can raise yaml.YAMLError for malformed files, and returns None for empty files. Neither case is handled. Falls through to bare except Exception safety net.
→ Catch yaml.YAMLError, validate return is dict.

I2. Raw LLM output written without validation (_tailor.py:159-160)
[silent-failure-hunter, code-simplifier, sanctum review #2]
LLM could return markdown-fenced YAML, refusals, partial output, or invalid YAML. User sees "Tailored resume saved" for unusable output.
→ Validate with yaml.safe_load() before writing, strip code fences.
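The suggested fence-strip-then-validate step could look like this. A sketch only: the helper name and the "must be a mapping" rule are assumptions about what a valid tailored resume looks like.

```python
import yaml  # PyYAML


def clean_and_validate(raw: str) -> str:
    # Strip a markdown code fence if the model wrapped its YAML output.
    text = raw.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        # Drop the opening fence line, and the closing fence if present.
        lines = lines[1:-1] if lines[-1].strip() == "```" else lines[1:]
        text = "\n".join(lines)
    # Fail loudly on malformed output instead of writing a broken file.
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ValueError("LLM output is not a YAML mapping")
    return text
```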

I3. extract_profile_from_html silently returns empty dict for unrecognized HTML (linkedin_fetcher.py:85-101)
[silent-failure-hunter]
If HTML has no recognizable LinkedIn elements, returns {}. Error message "No profile data extracted" gives zero diagnostic information about why extraction failed.
→ Add diagnostic context (has <body>? has <section>? likely non-LinkedIn page?).

I4. Substring matching produces false positives (tailor.py:118)
[sanctum review #12]
skill.lower() in resume_text matches "Go" inside "Django", "R" inside any word. Inflates match scores.
→ Use word-boundary regex.

I5. Frozen dataclasses with mutable list[str] fields (tailor.py:17,42)
[type-design]
JobRequirements and GapAnalysis are frozen=True but list fields are mutable. req.required_skills.append("x") succeeds silently, violating the frozen contract.
→ Convert to tuple[str, ...].
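With tuple fields the frozen contract becomes real, as this pared-down sketch of JobRequirements shows (field names beyond title and required_skills omitted):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class JobRequirements:
    # tuple fields have no .append, and any reassignment attempt raises
    # FrozenInstanceError, so the instance is genuinely immutable.
    title: str
    required_skills: tuple[str, ...] = ()
```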

I6. Broad except Exception loses error type context (client.py:78-83)
[silent-failure-hunter, code-simplifier]
All litellm errors (auth, rate-limit, timeout, network) become generic LLMError. str(exc) loses the exception type. User can't distinguish "bad API key" from "network timeout".
→ Preserve exception type name: f"{type(exc).__name__}: {exc}".

I7. No logging in CLI error handlers (_import.py:64-69, _tailor.py:89-94,164)
[silent-failure-hunter]
All except blocks print to stdout but never call logger.error(). LLM failures, import errors, and file errors leave zero diagnostic trail in log files.
→ Add logger.error() before print().

I8. Missing UnicodeDecodeError handling (linkedin_fetcher.py:297)
[silent-failure-hunter, sanctum review #9]
read_text(encoding="utf-8") raises UnicodeDecodeError for non-UTF-8 files. LinkedIn exports from some browsers use latin-1 or windows-1252. Falls through to safety net.
→ Catch and re-raise as ValueError with actionable message.
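A sketch of the catch-and-re-raise pattern; the helper name and error wording are illustrative:

```python
from pathlib import Path


def read_linkedin_text(path: Path) -> str:
    # Surface non-UTF-8 exports as a clean, actionable error instead of
    # letting UnicodeDecodeError fall through to the generic safety net.
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"{path} is not UTF-8 encoded; re-save the LinkedIn export as UTF-8"
        ) from exc
```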


Suggestions (12)

| # | Finding | Source | File:Line |
|---|---------|--------|-----------|
| S1 | Dead gap parameter (typed object, never used) | simplifier, comments | _tailor.py:103 |
| S2 | Duplicate exception handlers (identical bodies) | simplifier | _import.py:64-69, _tailor.py:89-94 |
| S3 | Near-duplicate experience/education parsers | simplifier | linkedin_fetcher.py:183-244 |
| S4 | Near-duplicate _convert_experience/_convert_education | simplifier | linkedin.py:60-113 |
| S5 | _build_summary takes 8 params (PLR0913 suppressed) | simplifier | tailor.py:144 |
| S6 | Cross-module private import _handle_unexpected_error | simplifier | _import.py:31, _tailor.py:60 |
| S7 | complete_structured should use response_format parameter | simplifier | client.py:104 |
| S8 | Docstring inaccurate: "full text as keywords" | comments | _tailor.py:39-40 |
| S9 | read_linkedin_file docstring missing .htm support | comments | linkedin_fetcher.py:278 |
| S10 | protocol.py module docstring "no API keys" is ambiguous | comments | protocol.py:1-5 |
| S11 | ABC vs typing.Protocol naming mismatch | type-design | protocol.py:32 |
| S12 | Hardcoded default model string duplicated | simplifier | config.py:78, protocol.py:27 |

Strengths (Positive Observations)

  • FCIS architecture is sound: Core modules are genuinely pure. Shell layer handles all I/O correctly.
  • Feature gating pattern (requires_llm decorator + is_llm_available) provides actionable install hints.
  • Exception hierarchy is well-designed: LLMError vs LLMNotAvailableError enables different error handling paths.
  • GapAnalysis and JobRequirements field decomposition maps directly to real job posting structure.
  • Prompt templates as pure data in prompts.py is a clean separation.
  • 115 new tests with BDD-style naming, proper mocking of external dependencies.
  • API surface test guards against accidental symbol removal.
  • Zero AI slop in documentation and commit messages.

Recommended Action Plan

Before merge (high confidence):

  1. Fix C3: api_key: str = field(repr=False) (1 line)
  2. Fix C1: Add null check in client.py:70 (5 lines)
  3. Fix C2: Normalize scoring weights (5 lines)

Should fix (medium priority):
4. Fix I1: Catch yaml.YAMLError in _load_resume_yaml
5. Fix I2: Validate LLM YAML output before writing
6. Fix I6: Preserve exception type in LLMError message
7. Fix I7: Add logging to CLI error handlers

Can defer to follow-up:
8. I4 (substring matching), I5 (tuple fields), S1-S12


Review generated by: silent-failure-hunter, type-design-analyzer, comment-analyzer, code-simplifier

athola added 2 commits March 6, 2026 17:32
- Hide API key from LLMConfig repr to prevent accidental exposure
- Add null check for LLM response content, raise LLMError if empty
- Normalize scoring weights by active categories so required-only
  perfect matches score 1.0 instead of 0.5
- Preserve exception type name in LLMError messages for debuggability
- Catch yaml.YAMLError and validate dict type in _load_resume_yaml
- Strip markdown code fences and validate YAML before writing
  LLM-tailored output to disk
- Add logger.error() calls to all CLI error handlers in _tailor.py
  and _import.py for diagnostic traceability

Address PR review feedback on import placement:

- Extract _handle_unexpected_error to _errors.py to break circular
  imports between main.py and CLI subcommand modules
- Move lazy imports to top level in _tailor.py, _screen.py
- Use top-level try/except for optional litellm in client.py, gate.py
- Move lazy imports to top level in resume_extensions.py, loader.py
- Keep linkedin imports lazy in _import.py (optional [linkedin] dep)
- Remove docs/plans/ design doc (content captured in PR description)
- Fix missing import re in tailor.py _check_skills
- Update test mock paths to match new import locations
athola commented Mar 10, 2026

PR Fix Summary — Import Cleanup (ab0fdf9)

All 9 review threads addressed and resolved.

What changed

Circular import fix: Extracted _handle_unexpected_error into shell/cli/_errors.py to break the circular dependency between main.py and the _import.py/_tailor.py/_screen.py subcommand modules.

Top-level imports: Moved all lazy imports to module top level in:

  • _tailor.py: _handle_unexpected_error, resolve_api_key, resolve_llm_config, is_llm_available, LiteLLMProvider
  • _screen.py: _handle_unexpected_error
  • client.py, gate.py: litellm via top-level try/except ImportError
  • resume_extensions.py: GenerationMetadata, render_resume_latex_from_data, compile_tex_to_pdf, shutil, markdown
  • themes/loader.py: ASSETS_ROOT

Intentionally kept lazy: linkedin_to_simple_resume and read_linkedin_file in _import.py — these depend on beautifulsoup4 which is an optional [linkedin] extra.

Docs: Removed docs/plans/2026-02-24-pyok-linkedin-jobpost-design.md (content in PR description).

Also fixed: Missing import re in tailor.py _check_skills function.

Verification

  • 1970 tests passing
  • All pre-commit hooks pass (ruff, mypy, bandit, ty, architecture, wheel install)
  • simple-resume --version works in clean wheel install

@athola athola merged commit 378ab8c into main Mar 10, 2026
11 checks passed