InsightGraph

InsightGraph is a local-first AI career and market intelligence workspace for technical job seekers, researchers, and builders. It ingests web pages, job posts, company pages, PDFs, GitHub-style sources, and pasted documents, then turns them into cited RAG answers, a knowledge graph, source trust controls, company dossiers, skill velocity charts, watchlists, and an application shortlist.

The current product wedge is practical and narrow: help a user answer which companies to track, which skills are rising, which roles are showing up, what changed recently, and whether the sources behind an answer are trustworthy.

What It Can Do Today

Crawl and ingest real sources from topic search, URLs, and pasted text.
Extract clean chunks, entities, relationships, skills, roles, companies, tools, locations, and evidence snippets.
Store chunks in Qdrant for semantic retrieval.
Store structured data in Postgres and graph relationships for exploration.
Provide cited chat answers with retrieved evidence and confidence metadata.
Show an interactive graph with edge evidence.
Score source reliability and allow trust, neutral, or untrusted source overrides.
Build AI career market maps from indexed evidence.
Show company dossiers with hiring signals, stack, skills, sources, and evidence.
Compare pasted resume text against market demand.
Track skill velocity from recent ingestions.
Save watchlists and generate change briefs.
Recommend and shortlist jobs from indexed evidence.
Support local OSS LLM defaults and optional Anthropic Claude generation with a user-provided key.

Sample Graph Outputs

These examples were captured from a local run after ingesting AI/RAG hiring and market sources. They are sample outputs, not static mock data.

Graph Explorer UI

Extracted Entity Relationship Snapshot

Honest Limits

InsightGraph is not a magic job search engine and does not guarantee employment outcomes. Its quality depends on source quality. Targeted company career pages, real job posts, GitHub repos, and trusted articles produce much better results than broad noisy job-board pages.

Scheduled watchlists are not fully automated yet; manual watchlist refresh works now. Claude is optional and only used for chat/generation, not embeddings.

Tech Stack

Layer	Technology
Web UI	Next.js App Router, TypeScript, Tailwind CSS, Cytoscape.js
API	FastAPI, Pydantic, SQLAlchemy, Alembic
Worker	Celery, Redis
Crawling	Crawl4AI, Playwright-compatible crawler stack
Search discovery	SearxNG by default, optional Brave, Tavily, SerpAPI
Database	Postgres
Vector search	Qdrant
Graph store	Memgraph
Object storage	MinIO
Observability	Phoenix / OpenTelemetry
Local AI	Ollama default, vLLM optional
Paid generation	Anthropic Claude Messages API, optional

Repository Layout

.
+-- apps/web                  # Single local Next.js TypeScript web app
+-- services/api              # FastAPI backend, schemas, providers, RAG, migrations
+-- services/worker           # Celery ingestion worker
+-- infra                     # Docker Compose, Dockerfiles, service config
+-- pipelines                 # Future Airflow pipeline scaffold
+-- docs/assets               # README and project visual assets
+-- package.json              # Root scripts
+-- .env.example              # Local environment template

Prerequisites

macOS, Linux, or Windows with WSL2.
Node.js 20 or newer.
npm.
Docker Desktop or Docker Engine.
Optional: Ollama if you want local model generation outside Docker.
Optional: Anthropic API key if you want Claude responses.
Optional: Brave, Tavily, or SerpAPI key if you want hosted search in addition to SearxNG.

Check basics:

node --version
npm --version
docker --version
docker compose version

Quick Start

From the project root:

cd "/Users/karanchandradey/Downloads/AI Portfolio InsightGraph"
npm install
cp .env.example .env

Start the backend stack:

npm run dev:stack

This starts Postgres, Redis, Qdrant, Memgraph, MinIO, Phoenix, SearxNG, FastAPI, and the Celery worker.

In a second terminal, start the single local Next.js web app:

npm run dev

Open:

http://localhost:3001

Default Local URLs

Service	URL
Web UI	`http://localhost:3001`
API	`http://localhost:8000`
API docs	`http://localhost:8000/docs`
Qdrant	`http://localhost:6333`
Memgraph Bolt	`localhost:7687`
MinIO console	`http://localhost:9001`
Phoenix	`http://localhost:6006`
SearxNG	`http://localhost:8080`

MinIO default local credentials:

user: insightgraph
password: insightgraph-secret

Environment Setup

Copy the template:

cp .env.example .env

Important defaults:

WEB_ORIGIN=http://localhost:3001
SEARCH_PROVIDER=searxng,brave,tavily,serpapi
SEARXNG_URL=http://searxng:8080
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5:7b-instruct
OLLAMA_EMBEDDING_MODEL=embeddinggemma

For local development, the API and worker run inside Docker, so service URLs point to Docker service names such as postgres, qdrant, and searxng.

Running The App

Recommended Development Mode

Terminal 1:

npm run dev:stack

Terminal 2:

npm run dev

The web app uses the same-origin /v1/* gateway and proxies to FastAPI when the backend is running.

Containerized Web App

The Docker web image exists, but it is not used by default because the project should have one local Next.js app during development.

Run the containerized web profile only when intentionally testing the web image:

npm run dev:stack:web

Stop The Stack

docker compose -f infra/docker-compose.yml down

To stop and remove local Docker volumes, which deletes indexed data:

docker compose -f infra/docker-compose.yml down -v

Using InsightGraph

1. Ingest Sources

Use the left sidebar.

You can ingest:

A topic, such as AI infra startups hiring RAG engineers.
Specific URLs.
Pasted source text.

Click Queue ingestion, then open the Jobs tab to watch progress.

What happens:

Search discovery finds sources when a topic is provided.
The worker crawls and extracts text.
Content is converted to markdown-like clean text.
Documents are chunked.
Chunks are embedded and stored in Qdrant.
Entities and relationships are extracted.
Graph evidence and trust metadata are stored.

2. Use Career Intelligence

Open the Career tab.

You can:

Save a target profile.
Set target role, keywords, preferred stack, locations, and resume text.
Inspect skill velocity.
Open company dossiers.
See hiring roles, stack, skills, locations, sources, and evidence.
Run resume-to-market gap scoring.
Get evidence-backed project recommendations.

Good target profile examples:

Target role: AI/RAG engineer
Keywords: RAG, agent, retrieval, evaluation, LLM infra
Preferred stack: Qdrant, LangGraph, FastAPI, Docker, OpenTelemetry
Locations: Remote, Bangalore, San Francisco

3. Use Applications Workspace

Open the Applications tab.

You can:

Review job recommendations scored from indexed evidence.
See matched and missing skills.
Add jobs to a persistent shortlist.
Track status: tracking, applied, interviewing, rejected, archived.

This is intentionally not an auto-apply tool. It is a decision workspace.

4. Use Chat With Citations

Open the Chat tab and ask questions like:

Show me companies hiring for agentic RAG roles, compare their stack, and map skills I need.

The answer includes:

Provider and model used.
Retrieved evidence.
Citation links.
Source reliability.
Trust status.
Retrieval strategy and confidence metadata.

5. Inspect The Graph

Open the Graph tab.

Use it to inspect:

Companies.
Roles.
Tools.
Skills.
Locations.
Relationships.
Evidence snippets behind graph edges.

6. Manage Source Trust

Open the Sources tab.

You can mark each source as:

trusted
neutral
untrusted

Trust status affects source reliability scoring and retrieval ranking.

7. Use Watchlists

Open the Watchlists tab.

You can:

Save a market topic.
Run a manual refresh.
Generate a change brief.
See new companies, roles, skills, documents, and weekly skill velocity.

Example watchlist:

Name: Agentic RAG hiring
Topic: AI infra startups hiring RAG agent engineers
Pages: 20

AI Provider Setup

Local / Free Defaults

The app is designed to run without paid APIs. If no generation provider works, chat falls back to extractive answers from indexed evidence.

Ollama settings:

DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=qwen2.5:7b-instruct
OLLAMA_EMBEDDING_MODEL=embeddinggemma

Start the optional Ollama container profile:

docker compose -f infra/docker-compose.yml --profile local-models up ollama

Then pull models inside the Ollama environment if needed:

ollama pull qwen2.5:7b-instruct
ollama pull embeddinggemma

Anthropic Claude

Claude support is optional and paid. It is used for generation/chat, not embeddings.

Option 1: environment variables in .env:

ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
ANTHROPIC_BASE_URL=https://api.anthropic.com
ANTHROPIC_VERSION=2023-06-01

Option 2: store a workspace key from the UI:

Open the left sidebar.
Paste the Claude API key.
Confirm the model.
Click Store key.

Keys are encrypted before being stored in Postgres and are never exposed to the browser bundle.

vLLM

Run the optional vLLM profile:

VLLM_MODEL=Qwen/Qwen2.5-7B-Instruct docker compose -f infra/docker-compose.yml --profile vllm up vllm

Search Provider Setup

Default search discovery uses local SearxNG:

SEARCH_PROVIDER=searxng,brave,tavily,serpapi
SEARXNG_URL=http://searxng:8080

Optional providers:

BRAVE_SEARCH_API_KEY=...
TAVILY_API_KEY=...
SERPAPI_API_KEY=...

Provider order is controlled by SEARCH_PROVIDER:

SEARCH_PROVIDER=searxng
SEARCH_PROVIDER=brave,searxng
SEARCH_PROVIDER=tavily,serpapi,searxng

API Overview

Core:

GET  /v1/health
GET  /v1/models
POST /v1/llm/test-key
POST /v1/provider-credentials

Ingestion and research:

POST /v1/ingestions
GET  /v1/jobs
GET  /v1/documents
POST /v1/search
POST /v1/chat
GET  /v1/graph
GET  /v1/trends
GET  /v1/entities
GET  /v1/evidence

Trust and cleanup:

PATCH /v1/documents/{id}/trust
PATCH /v1/entities/{id}
POST  /v1/entities/{id}/merge
POST  /v1/pins
GET   /v1/pins

Career intelligence:

GET  /v1/career/profile
POST /v1/career/profile
GET  /v1/career/market-map
GET  /v1/career/company-dossiers
GET  /v1/career/company-dossiers/{company_id}
GET  /v1/career/skill-velocity
POST /v1/career/skill-gap
GET  /v1/career/job-recommendations
GET  /v1/career/shortlist
POST /v1/career/shortlist
PATCH /v1/career/shortlist/{item_id}

Watchlists:

POST /v1/watchlists
GET  /v1/watchlists
POST /v1/watchlists/{id}/run
GET  /v1/watchlists/{id}/brief
GET  /v1/watchlists/{id}/weekly-brief

API Examples

Health

curl http://localhost:8000/v1/health

Queue An Ingestion

curl -X POST http://localhost:8000/v1/ingestions \
  -H "content-type: application/json" \
  -d '{
    "workspace_id": "default",
    "topic": "AI infra startups hiring RAG agent engineers",
    "max_pages": 20,
    "crawl_depth": 1,
    "urls": [],
    "pasted_sources": []
  }'

Ask A Cited Question

curl -X POST http://localhost:8000/v1/chat \
  -H "content-type: application/json" \
  -d '{
    "workspace_id": "default",
    "question": "Which companies are hiring for agentic RAG roles and what stack do they use?",
    "provider": "extractive",
    "model": "extractive-local",
    "retrieval_limit": 12
  }'

Save A Career Profile

curl -X POST http://localhost:8000/v1/career/profile \
  -H "content-type: application/json" \
  -d '{
    "workspace_id": "default",
    "name": "Primary target",
    "target_role": "AI/RAG engineer",
    "target_keywords": ["RAG", "agent", "retrieval", "evaluation"],
    "preferred_stack": ["Qdrant", "LangGraph", "FastAPI", "Docker"],
    "preferred_locations": ["Remote"],
    "seniority": "mid",
    "resume_text": "Python, FastAPI, LangChain, Docker..."
  }'

Get Company Dossiers

curl "http://localhost:8000/v1/career/company-dossiers?workspace_id=default&limit=10"

Get Job Recommendations

curl "http://localhost:8000/v1/career/job-recommendations?workspace_id=default&limit=20"

Testing And Verification

Run backend unit tests plus web typecheck:

npm test

Run only web typecheck:

npm run typecheck

Run production web build:

npm run build

If Turbopack fails with an internal local port permission error in a sandboxed environment, rerun the build in a normal terminal.

Troubleshooting

Docker API Not Running

Error:

failed to connect to the docker API

Fix:

Start Docker Desktop.
Wait until Docker says it is running.
Run:

npm run dev:stack

Port 3001 Already In Use

Error:

EADDRINUSE: address already in use :::3001

Find the process:

lsof -i :3001

Stop it, or run the web app on another port manually:

npm --workspace apps/web run dev -- -p 3002

API Is Unreachable From The UI

Check:

curl http://localhost:8000/v1/health

If it fails, start the backend:

npm run dev:stack

Queue Ingestion Does Nothing

Check the Jobs tab first. Then inspect worker logs:

docker compose -f infra/docker-compose.yml logs --tail=160 worker
docker compose -f infra/docker-compose.yml logs --tail=160 api

Common causes:

Docker stack is not running.
Worker is not connected to Redis.
Search provider has no results.
Source pages block crawling.
Ingestion is still queued or running.

Reset Local Data

This deletes Postgres, Qdrant, Memgraph, MinIO, Redis, and Phoenix volumes:

docker compose -f infra/docker-compose.yml down -v

Then restart:

npm run dev:stack

Development Notes

The single local web app is apps/web.
The backend applies Alembic migrations on container startup.
The API creates the default workspace automatically.
The Next.js app calls /v1/* through the same-origin gateway.
The Docker web profile is optional and should not be run alongside the local dev app unless you intentionally want a containerized web check.

Product Positioning

InsightGraph is most useful as a personal research terminal for AI career strategy:

Track a niche market.
Identify companies and roles.
Understand repeated skill demand.
Compare your resume against real evidence.
Build a focused application shortlist.
Inspect source trust before acting.

It is not a replacement for LinkedIn, an auto-apply bot, or a polished commercial job board. It is an evidence-backed decision layer you control locally.

Developer Credit

Built by Karan Chandra Dey [K28], Founder and CEO @ K28.

Website: k28art.space

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.playwright-cli		.playwright-cli
apps/web		apps/web
data		data
docs/assets		docs/assets
infra		infra
logs		logs
pipelines		pipelines
services		services
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml
package-lock.json		package-lock.json
package.json		package.json

Folders and files

Latest commit

History

Repository files navigation

InsightGraph

What It Can Do Today

Sample Graph Outputs

Graph Explorer UI

Extracted Entity Relationship Snapshot

Honest Limits

Tech Stack

Repository Layout

Prerequisites

Quick Start

Default Local URLs

Environment Setup

Running The App

Recommended Development Mode

Containerized Web App

Stop The Stack

Using InsightGraph

1. Ingest Sources

2. Use Career Intelligence

3. Use Applications Workspace

4. Use Chat With Citations

5. Inspect The Graph

6. Manage Source Trust

7. Use Watchlists

AI Provider Setup

Local / Free Defaults

Anthropic Claude

vLLM

Search Provider Setup

API Overview

API Examples

Health

Queue An Ingestion

Ask A Cited Question

Save A Career Profile

Get Company Dossiers

Get Job Recommendations

Testing And Verification

Troubleshooting

Docker API Not Running

Port 3001 Already In Use

API Is Unreachable From The UI

Queue Ingestion Does Nothing

Reset Local Data

Development Notes

Product Positioning

Developer Credit

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages