IntentForge Roadmap
Privacy-first, intent-driven search engine with zero-trust architecture.
Vision
IntentForge will be the only search engine that truly respects users — no tracking, no manipulation, no corporate control. Results are determined solely by relevance and intent, not by who pays the most. Users will be able to talk directly to the ranking algorithm to personalize results, and the system will be transparent about how rankings work.
Current State
✅ Operational
| Feature | Status | Notes |
|---|---|---|
| Core hybrid search | ✅ Live | BM25 + ONNX semantic search |
| Intent classification | ✅ Live | 4 query types with adaptive semantic ratio |
| Multi-tier caching | ✅ Live | L1 (LRU) + Redis |
| Meta-search | ✅ Live | 9 providers simultaneously |
| Image search | ✅ Live | Zero-bandwidth HTML metadata indexing |
| Video search | ✅ Live | 6 sources (YouTube, Piped, Invidious, etc.) |
| Anti-detection | ✅ Live | TLS fingerprinting, Cloudflare bypass |
| Content extraction | ✅ Live | Trafilatura integration |
| Cross-encoder reranking | ✅ Live | ms-marco-MiniLM-L6-v2 |
| Self-improvement | ✅ Live | Background gap analysis + crawling |
🔄 In Progress
| Feature | Status | Notes |
|---|---|---|
| Courses tab | 🔄 Planning | Docs + video roadmap system |
| Personalization layer | 🔄 Designing | Direct user → algorithm communication |
| Zero-trust request validation | 🔄 Designing | Cryptographic request signing |
Roadmap
Phase 1: Intent Enhancement
1.1 Smarter Intent Detection for Vague Queries
Problem: Users type vague things like "rust" or "python" and get arbitrary results.
Solution: When a query is too vague to determine intent, present a choice menu:
I see "rust" could mean several things. What do you want?
🦀 [Rust Programming Language] — The systems programming language
🎮 [Rust (video game)] — Multiplayer survival game
🧪 [Rust (chemistry)] — Iron oxide, corrosion
📊 [Rust (framework)] — Rust-based web frameworks (Actix, Axum)
Implementation:
- Add `is_vague_query()` detection in `intent_classifier.rs`
- Create `/api/intent/disambiguate` endpoint
- Return disambiguation choices with icons, descriptions, intent hints
- Frontend renders as interactive cards
Files affected: src/intent_classifier.rs, src/api/mod.rs
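A minimal sketch of the detection step and the response shape, written in Python for brevity even though the classifier itself lives in Rust. The sense table, the single-token heuristic, and the field names (`label`, `intent_hint`, and so on) are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class Choice:
    label: str
    description: str
    icon: str
    intent_hint: str


# Hypothetical sense table; a real system would derive this from the index.
AMBIGUOUS_TERMS = {
    "rust": [
        Choice("Rust Programming Language", "The systems programming language", "🦀", "informational"),
        Choice("Rust (chemistry)", "Iron oxide, corrosion", "🧪", "informational"),
    ],
}


def is_vague_query(query: str) -> bool:
    """Treat a query as vague when it is a single token with several known senses."""
    tokens = query.lower().split()
    return len(tokens) == 1 and len(AMBIGUOUS_TERMS.get(tokens[0], [])) > 1


def disambiguate(query: str) -> dict:
    """Build the JSON-serialisable payload a disambiguation endpoint could return."""
    if not is_vague_query(query):
        return {"query": query, "vague": False, "choices": []}
    choices = AMBIGUOUS_TERMS[query.lower().split()[0]]
    return {"query": query, "vague": True, "choices": [asdict(c) for c in choices]}
```

Multi-word queries such as "rust error handling" fall through as not vague, so the menu only appears when there is little else to go on.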
1.2 Context-Aware Intent Memory
Problem: Each search is treated independently. No learning from user patterns.
Solution: Lightweight session memory that remembers intent patterns:
User: "how to fix"
→ Intent: Informational / "how-to-fix" subtype
→ Stores: {query: "how to fix", intent: "tutorial", skill_level: "intermediate"}
User: "rust"
→ Remembers previous "how to fix" pattern
→ Suggests: "How to fix Rust memory leaks" (the programming language, not corrosion)
Implementation:
- SQLite session store (privacy-local, not server-side)
- Intent pattern extraction over session lifetime
- Suggest refinements based on history
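The session store could look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table name, its columns, and the "prepend the last pattern" refinement rule are assumptions made for illustration; the roadmap does not fix a schema.

```python
import sqlite3


def open_session_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local store; ':memory:' keeps it for the process lifetime only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS intent_history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "query TEXT NOT NULL, intent TEXT NOT NULL, skill_level TEXT)"
    )
    return conn


def remember(conn, query, intent, skill_level=None):
    """Record one classified query for the current session."""
    conn.execute(
        "INSERT INTO intent_history (query, intent, skill_level) VALUES (?, ?, ?)",
        (query, intent, skill_level),
    )
    conn.commit()


def suggest_refinement(conn, query):
    """Combine the most recent stored pattern with a new, vague query."""
    row = conn.execute(
        "SELECT query FROM intent_history ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return None if row is None else f"{row[0]} {query}"
```

With an empty history the function returns `None`, so the caller can fall back to the normal disambiguation flow.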
Phase 2: Personalization
2.1 Direct Algorithm Communication
Problem: Users can't tell the search engine what they actually want.
Solution: A personalization DSL — natural language instructions to the ranker:
> "I prefer recent papers over classic references"
> "Show me tutorials, not product pages"
> "Ignore anything from example.com"
> "I want beginner-friendly content"
> "More video results, fewer articles"
Implementation:
- New `/api/search` parameter: `preferences: string`
- Preference parser in ranking layer:
  - "recent" → recency boost
  - "tutorials" → content_type=tutorial filter
  - "ignore domain" → domain exclusion filter
  - "beginner" → skill_level=beginner boost
  - "video" → media_type=video boost
- Preferences stored in local session cookie (no server-side profile)
- Preference syntax validated and sanitized
Example API call:
GET /api/search?q=rust+error+handling&preferences=no+blogs,+prefer+docs+and+videos
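As a rough illustration of the keyword-to-signal mapping listed above, the sketch below parses a comma-separated preference string into boosts and domain exclusions. The weight of 1.5, the 500-character cap, and the `RankingPrefs` structure are assumptions; negation ("not product pages", "no blogs") is deliberately left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RankingPrefs:
    boosts: dict = field(default_factory=dict)
    excluded_domains: set = field(default_factory=set)


# Hypothetical keyword -> (ranking signal, weight) rules mirroring the list above.
KEYWORD_RULES = {
    "recent": ("recency", 1.5),
    "tutorial": ("content_type=tutorial", 1.5),
    "beginner": ("skill_level=beginner", 1.5),
    "video": ("media_type=video", 1.5),
}

# "ignore ... <domain>" where the domain contains at least one dot.
DOMAIN_RE = re.compile(r"\bignore\b.*?\b([a-z0-9-]+(?:\.[a-z0-9-]+)+)\b")


def parse_preferences(text: str, max_len: int = 500) -> RankingPrefs:
    prefs = RankingPrefs()
    # Sanitize: cap length, lowercase, keep only a conservative character set.
    clean = re.sub(r"[^a-z0-9 .,\-]", " ", text[:max_len].lower())
    for clause in clean.split(","):
        match = DOMAIN_RE.search(clause)
        if match:
            prefs.excluded_domains.add(match.group(1))
            continue
        for keyword, (signal, weight) in KEYWORD_RULES.items():
            if keyword in clause:
                prefs.boosts[signal] = weight
    return prefs
```

Unknown clauses are silently ignored here; a stricter parser might instead report them back to the user so the ranker stays transparent about what it understood.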
2.2 Zero-Trust Request Layer
Problem: Attackers could theoretically manipulate search results via request injection.
Solution: Cryptographic request signing for internal service communication:
User Request → API Gateway → [Sign with shared secret] → Internal service
↓
[Verify signature]
↓
Handle request
Implementation:
- HMAC-SHA256 signing of request parameters
- Shared secret per service pair
- Replay attack prevention with nonces + timestamps
- All internal service calls signed
- Public-facing API remains unauthenticated (but rate-limited)
Files affected: services/search-api/src/proxy.rs, new src/zero_trust/ module
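The signing scheme can be sketched with Python's standard `hmac` and `hashlib` modules. The 30-second freshness window, the JSON canonicalisation, and the in-memory nonce set are assumptions for this sketch; a deployment would need a shared, expiring nonce store.

```python
import hashlib
import hmac
import json
import secrets
import time

MAX_SKEW_SECONDS = 30  # assumed freshness window


def _mac(secret: bytes, fields: dict) -> str:
    # Canonical form: every field except the signature, keys sorted.
    body = {k: v for k, v in fields.items() if k != "sig"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_request(secret: bytes, params: dict, now=None) -> dict:
    """Return a copy of params with timestamp, nonce and HMAC-SHA256 signature added."""
    signed = dict(params)
    signed["ts"] = str(int(time.time() if now is None else now))
    signed["nonce"] = secrets.token_hex(16)
    signed["sig"] = _mac(secret, signed)
    return signed


def verify_request(secret: bytes, signed: dict, seen_nonces: set, now=None) -> bool:
    """Reject stale, replayed, or tampered requests."""
    now = time.time() if now is None else now
    try:
        fresh = abs(now - int(signed["ts"])) <= MAX_SKEW_SECONDS
        nonce, sig = signed["nonce"], str(signed["sig"])
    except (KeyError, TypeError, ValueError):
        return False
    if not fresh or nonce in seen_nonces:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(_mac(secret, signed).encode(), sig.encode()):
        return False
    seen_nonces.add(nonce)
    return True
```

The nonce is only recorded after the signature checks out, so an attacker cannot fill the nonce store with unsigned garbage.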
Phase 3: Latency Improvements
3.1 Streaming Response + Progressive Results
Problem: Users wait 800ms+ before seeing any results.
Solution: Stream results as they're discovered:
0ms → Query analysis (intent, disambiguation check)
50ms → Stream: "Here are results for 'rust'..."
100ms → Stream: [Result 1, 2, 3...] (from cache/index)
200ms → Stream: [Result 4, 5, 6...] (from meta-search)
500ms → Stream: "Meta-search found 3 more..."
800ms → Final: [All results sorted + re-ranked]
Implementation:
- SSE (Server-Sent Events) endpoint: `/search/stream`
- Parallel discovery: index + meta-search fire simultaneously
- Progressive result emission (yield as found, don't wait for all)
- Frontend renders incrementally
Files affected: src/api/mod.rs, new streaming endpoints
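The parallel-discovery and progressive-emission idea can be illustrated with `asyncio`. The two search coroutines below are stand-ins with artificial delays, and the event names `partial` and `final` are assumptions; only the SSE wire format (`event:` and `data:` lines followed by a blank line) is standard.

```python
import asyncio
import json


async def search_index(query):
    await asyncio.sleep(0.01)  # stand-in for the fast local index lookup
    return [f"{query}: index result {i}" for i in (1, 2, 3)]


async def search_meta(query):
    await asyncio.sleep(0.05)  # stand-in for the slower meta-search fan-out
    return [f"{query}: meta result {i}" for i in (4, 5, 6)]


def sse_event(event: str, data) -> str:
    """Format one Server-Sent Events message."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_results(query):
    """Yield each source's batch as soon as it finishes, then a combined final list."""
    tasks = [
        asyncio.create_task(search_index(query)),
        asyncio.create_task(search_meta(query)),
    ]
    collected = []
    for finished in asyncio.as_completed(tasks):
        batch = await finished
        collected.extend(batch)
        yield sse_event("partial", batch)
    # Placeholder for the real re-ranking step.
    yield sse_event("final", sorted(collected))
```

Both sources start at the same moment, so the time to first result is bounded by the fastest source instead of the slowest.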
3.2 Persistent Vector Cache
Problem: Embedding computation is expensive (~50ms per query).
Solution: Pre-compute embeddings for common terms + cache all query embeddings:
Query: "rust error handling"
→ Check embedding cache (Redis)
→ HIT: Use cached vector (0.1ms)
→ MISS: Compute ONNX embedding (50ms), store in cache (TTL: 24h)
Implementation:
- Redis vector store (128-dim float32 = 512 bytes per embedding)
- Cache key: SHA256(normalized_query)
- Background job: pre-compute embeddings for top-1000 common terms
- Embedding cache in `services/query_layer/app/core/embedding_service.py`
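A sketch of the cache path described above. The `emb:` key prefix, the whitespace-and-case normalisation, and the `DictCache` stand-in are assumptions; the `get(key)` and `set(key, value, ex=ttl)` calls match the subset of the redis-py client interface a real deployment would use.

```python
import hashlib
import struct

EMBEDDING_DIM = 128
TTL_SECONDS = 24 * 60 * 60


def cache_key(query: str) -> str:
    """SHA-256 of the normalised query (lowercased, whitespace collapsed)."""
    normalized = " ".join(query.lower().split())
    return "emb:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def encode_vector(vec) -> bytes:
    """Pack 128 little-endian float32 values: 128 * 4 = 512 bytes."""
    return struct.pack(f"<{EMBEDDING_DIM}f", *vec)


def decode_vector(blob: bytes) -> list:
    return list(struct.unpack(f"<{EMBEDDING_DIM}f", blob))


class DictCache:
    """In-memory stand-in exposing the get/set subset of a Redis client."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, ex=None):
        self.store[key] = value  # TTL is ignored in this stand-in


def get_embedding(query, cache, compute):
    key = cache_key(query)
    blob = cache.get(key)
    if blob is not None:
        return decode_vector(blob)  # HIT: skip the model entirely
    vector = compute(query)  # MISS: run the embedding model
    cache.set(key, encode_vector(vector), ex=TTL_SECONDS)
    return vector
```

Normalising before hashing means "Rust  Error handling" and "rust error handling" share one cache entry.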
3.3 Edge Cache for Popular Queries
Problem: Trending topics cause a thundering herd on internal services.
Solution: CDN-style edge caching for queries matching trending patterns:
Query: "Apple Event 2025"
→ Edge cache hit (Cloudflare Workers)
→ Response: <5ms
→ Background refresh cache
Implementation:
- Cloudflare Worker as edge cache (free tier: 100k requests/day)
- Cache popular queries (top 1000 by frequency)
- Stale-while-revalidate: serve cached, refresh in background
- Vary: by intent type + region
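The stale-while-revalidate behaviour can be modelled in a few lines. This is a language-neutral sketch in Python, not Worker code; the 60-second freshness window and the `(intent, region, query)` key shape are assumptions based on the "Vary" bullet above.

```python
import time


class SwrCache:
    """Serve cached values immediately; queue stale keys for a background refresh."""

    def __init__(self, fetch, fresh_for=60.0, clock=time.monotonic):
        self.fetch = fetch
        self.fresh_for = fresh_for
        self.clock = clock
        self.entries = {}  # key -> (value, stored_at)
        self.pending = []  # stale keys awaiting refresh

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return self._store(key)  # cold miss: fetch synchronously
        value, stored_at = entry
        stale = self.clock() - stored_at > self.fresh_for
        if stale and key not in self.pending:
            self.pending.append(key)  # one refresh per key, however many hits
        return value

    def run_background_refresh(self):
        while self.pending:
            self._store(self.pending.pop())

    def _store(self, key):
        value = self.fetch(key)
        self.entries[key] = (value, self.clock())
        return value
```

Because a stale key is queued only once, a burst of identical trending queries triggers a single origin fetch, which is the property that blunts the thundering herd.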
Phase 4: E-Commerce
4.1 Direct Product Links
Problem: Search for "laptop" returns blog posts about "best laptops 2024", not products.
Solution: Intent-aware product extraction:
Query: "macbook pro 14 inch price"
→ Intent: Transactional (user wants to buy)
→ Source: Direct product links (Amazon, Best Buy, Apple Store)
→ No blog posts unless explicitly requested
Query: "macbook pro review"
→ Intent: Informational / ProductReview
→ Source: Blog reviews, YouTube reviews
→ No direct purchase links
Implementation:
- New `/api/products` endpoint
- Product-specific providers: Amazon PA-API, Google Shopping
- Intent detection for transactional queries → route to product index
- Price comparison: extract + display price from product pages
- Affiliate link sanitization (remove tracking params)
Files affected: src/api/mod.rs, new services/search-api/src/providers/shopping.rs
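Affiliate link sanitization could be done along these lines with `urllib.parse`. The deny-list is an assumption: the roadmap does not enumerate which parameters count as tracking, so `tag`, `ref`, `fbclid`, `gclid`, and the `utm_` prefix are only common examples.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed deny-list of tracking parameters.
TRACKING_PARAMS = {"tag", "ref", "fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def sanitize_product_url(url: str) -> str:
    """Drop known tracking parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
        and not k.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Functional parameters such as a colour or size selector survive, so the sanitized link still lands on the same product variant.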
4.2 Price Comparison Engine
Problem: Users can't compare prices across stores without manual checking.
Solution: Unified price index:
{
"product": "MacBook Pro 14-inch M3",
"stores": [
{"store": "Apple", "url": "https://store.apple.com/...", "price": 1999},
{"store": "Amazon", "url": "https://amazon.com/...", "price": 1899},
{"store": "Best Buy", "url": "https://bestbuy.com/...", "price": 1949}
],
"lowest_price": 1899,
"highest_price": 1999,
"currency": "USD"
}
Implementation:
- Product pages crawled → price extracted (Trafilatura + custom extraction)
- Price normalization: handle "From $X", "$X.99", "€X" formats
- Currency conversion via API (exchangerate-api.com free tier)
- New schema field: `price_info` in search results
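A minimal sketch of price normalization and of building the summary object shown earlier. It assumes symbol-prefixed prices with a dot as the decimal separator ("From $1,899.99", "€1999"); formats such as "1.999,00 €" and the currency-conversion step are out of scope here.

```python
import re
from decimal import Decimal

CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}
PRICE_RE = re.compile(r"([$€£])\s*(\d+(?:,\d{3})*)(\.\d{1,2})?")


def parse_price(text: str):
    """Return (amount, currency) or None when no price is recognised."""
    m = PRICE_RE.search(text)
    if m is None:
        return None
    symbol, whole, frac = m.groups()
    return Decimal(whole.replace(",", "") + (frac or "")), CURRENCY_SYMBOLS[symbol]


def build_price_info(product: str, offers) -> dict:
    """offers: iterable of (store, url, raw_price_text) tuples."""
    stores = []
    for store, url, text in offers:
        parsed = parse_price(text)
        if parsed is None:
            continue  # skip pages where extraction failed
        price, currency = parsed
        stores.append({"store": store, "url": url, "price": price, "currency": currency})
    currencies = {s["currency"] for s in stores}
    if len(currencies) != 1:
        raise ValueError("need at least one offer, all in a single currency")
    prices = [s["price"] for s in stores]
    return {
        "product": product,
        "stores": stores,
        "lowest_price": min(prices),
        "highest_price": max(prices),
        "currency": currencies.pop(),
    }
```

`Decimal` is used instead of `float` so that amounts like 1899.99 compare and serialise without binary rounding noise.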
Phase 5: Courses Tab
5.1 Roadmap-Based Learning Paths
Problem: Searching "react" returns random tutorials. No structured learning path.
Solution: Courses tab with goal-based roadmaps:
User selects: "I want to learn React development"
Roadmap: React Developer Path (Beginner → Expert)
Week 1-2: Foundations
📹 [JavaScript Fundamentals] — 4hr video course
📖 [MDN JavaScript Guide] — Documentation
🧪 [JavaScript Exercises] — Interactive practice
Week 3-4: React Basics
📹 [React Official Tutorial] — 3hr interactive
📹 [React Fundamentals by Kent C. Dodds] — 6hr course
📖 [React Beta Docs] — Official documentation
Week 5-8: Advanced Patterns
📹 [Advanced React Patterns] — 4hr course
📹 [React Performance] — 2hr deep dive
🧪 [React Coding Challenges] — Practice problems
Week 9+: Real-world Projects
📹 [Build a SaaS with React] — 8hr project course
📹 [React + TypeScript] — 3hr course
Implementation:
- New `/courses` tab in frontend
- Goal selector: "Learn [topic]", "Master [skill]", "Prepare for [interview]"
- Curated course database: manually maintained list of quality courses
- Video + docs + interactive practice mix
- Skill level progression: Beginner → Intermediate → Advanced → Expert
- Provider sources: YouTube, Coursera, Udemy, freeCodeCamp, official docs
Data model:
{
"course_id": "react-fundamentals",
"title": "React Fundamentals",
"provider": "YouTube",
"url": "https://youtube.com/...",
"duration_hours": 4,
"skill_level": "beginner",
"topics": ["react", "javascript", "frontend"],
"type": "video",
"is_free": true,
"rating": 4.8,
"review_count": 12500
}
5.2 Expert-Level Guidance
Problem: Advanced users searching "react internals" get beginner tutorials.
Solution: Expert mode with deep-dive content filtering:
User selects: "Expert" skill level
Results filtered to:
📖 [React Fiber Architecture] — Deep dive into reconciliation
📖 [React Compiler Architecture] — How the new compiler works
📖 [React Source Code Walkthrough] — Reading React source
📹 [Advanced React Patterns] — Compound components, render props
📹 [React Performance Masterclass] — Profiling, optimization
Implementation:
- Skill level attribute on all courses/content
- Filter by `skill_level` in search
- Inference: "internals", "source code", "architecture" → expert level
- Include: conference talks, academic papers, source code analysis
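The inference rule and the filter can be sketched together. The hint list comes straight from the bullet above; returning `None` (no filtering) when nothing matches is an assumption about the desired default.

```python
EXPERT_HINTS = ("internals", "source code", "architecture")


def infer_skill_level(query: str):
    """Return 'expert' when the query contains an expert hint, else None."""
    q = query.lower()
    return "expert" if any(hint in q for hint in EXPERT_HINTS) else None


def filter_by_skill(courses, query, explicit_level=None):
    """An explicit user selection wins; otherwise fall back to inference."""
    level = explicit_level or infer_skill_level(query)
    if level is None:
        return list(courses)
    return [c for c in courses if c["skill_level"] == level]
```

Each course dict is expected to carry the `skill_level` field from the data model in section 5.1.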
Phase 6: Image Search Improvements
6.1 Visual Similarity Search
Problem: Current image search is text-based only (alt text, surrounding context).
Solution: CLIP-based visual similarity:
User searches: "cyberpunk city"
→ Text match: "cyberpunk city" in alt text
→ Visual match: CLIP similarity between the query's text embedding and indexed image embeddings
→ Results: Both keyword-matched AND visually similar images
Implementation:
- Add `transformers` + `open_clip` to Rust via PyO3 bindings
- Pre-compute CLIP embeddings for indexed images
- Store 512-dim float vectors in Meilisearch
- Query: encode text → search vectors by cosine similarity
Note: More compute-intensive, but feasible with ONNX runtime already in use.
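The query step reduces to a nearest-neighbour search by cosine similarity. The sketch below does a brute-force scan in plain Python over toy vectors; producing the actual CLIP embeddings and querying a vector store are outside its scope, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, image_vecs: dict, k: int = 3) -> list:
    """Return the ids of the k images whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), image_id)
              for image_id, vec in image_vecs.items()]
    return [image_id for _, image_id in sorted(scored, reverse=True)[:k]]
```

If embeddings are L2-normalised at indexing time, the cosine reduces to a plain dot product, which is what most vector stores optimise for.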
6.2 Reverse Image Search
Problem: User has an image, wants to find similar ones or where it came from.
Solution: Upload-based reverse search:
User uploads: screenshot.jpg
→ Compute perceptual hash (dHash)
→ Search index for similar hashes (hamming distance < 10)
→ Return: matching + visually similar images
Implementation:
- New `/api/images/reverse` endpoint (multipart upload)
- Compute dHash + ThumbHash of uploaded image
- Query Meilisearch for `dhash` field with hamming distance search
- Source attribution (find original of meme/product image)
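A sketch of the dHash comparison. It assumes the image has already been decoded, converted to grayscale, and resized to 8 rows by 9 columns (that step needs an image library and is omitted); the brute-force scan over an in-memory index is also an assumption, standing in for whatever the search backend provides.

```python
def dhash(gray) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair.

    gray: 8 rows of 9 grayscale values, giving 8 * 8 = 64 bits.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_similar(query_hash: int, index: dict, max_distance: int = 10) -> list:
    """Return (distance, image_id) pairs with distance < max_distance, closest first."""
    matches = []
    for image_id, stored_hash in index.items():
        distance = hamming(query_hash, stored_hash)
        if distance < max_distance:
            matches.append((distance, image_id))
    return sorted(matches)
```

Because dHash encodes brightness gradients and not absolute values, mild recompression or brightness shifts usually flip only a few bits.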
Phase 7: Privacy Hardening
7.1 Complete No-Log Policy
Problem: Current system logs query metadata for debugging.
Solution: Formal no-logging policy, verified by automated and third-party audits:
No logging at any layer:
✅ No query logs
✅ No IP address logging
✅ No User-Agent logging
✅ No referer logging
✅ No timing data retention
Verification:
- Automated audit: grep all services for log statements with PII
- Third-party privacy audit (annual)
Implementation:
- Remove all `tracing::info!` with query content
- Add `--no-log` compile flag that strips all debug logging
- Regular automated grep for potential PII in logs
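The automated grep could start as a small script like the one below. Both regular expressions are assumed heuristics: the first matches Rust logging macros such as `tracing::info!(`, the second flags identifiers that commonly carry PII. Multi-line macro calls are not detected by this line-based scan.

```python
import re
from pathlib import Path

# Assumed heuristics; tune both patterns to the codebase.
LOG_CALL = re.compile(r"\b(?:tracing::)?(?:trace|debug|info|warn|error)!\s*\(")
PII_HINT = re.compile(r"\b(?:query|ip|user_agent|referer|remote_addr)\b", re.IGNORECASE)


def audit_source(text: str) -> list:
    """Return (line_number, line) pairs where a log call mentions a PII-like name."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), 1)
        if LOG_CALL.search(line) and PII_HINT.search(line)
    ]


def audit_tree(root: str, suffix: str = ".rs") -> dict:
    """Scan every matching file under root; map file path to its findings."""
    findings = {}
    for path in Path(root).rglob(f"*{suffix}"):
        hits = audit_source(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            findings[str(path)] = hits
    return findings
```

Run in CI, a non-empty result from `audit_tree` could fail the build, which turns the no-log policy into something that is checked on every change.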
7.2 Tor-Only Mode
Problem: Even with anti-detection, ISP can see you're using IntentForge.
Solution: Optional Tor-only mode:
User enables: "Tor Required" mode
All requests route through Tor:
→ ISP sees: Tor traffic to guard node
→ Guard node sees: Encrypted traffic to middle relay
→ IntentForge sees: Tor exit node IP (anonymous)
Trade-off: 3-5x latency increase
Implementation:
- Arti (Rust Tor client) integration (already in `Cargo.toml` as optional)
- UI toggle: "Privacy Mode: Tor"
- Auto-fallback: if Tor is blocked, show warning
- New config: `tor_required = true`
Priority Order
P0 (Must have):
1.1 Vague query disambiguation
2.1 Direct algorithm communication (preferences)
3.1 Streaming + progressive results
3.2 Persistent vector cache
P1 (Should have):
4.1 Direct product links
5.1 Courses tab + roadmaps
3.3 Edge cache for popular queries
P2 (Nice to have):
5.2 Expert-level guidance
6.1 Visual similarity search
2.2 Zero-trust request layer
6.2 Reverse image search
P3 (Future):
4.2 Price comparison engine
7.1 Complete no-log policy
7.2 Tor-only mode
1.2 Context-aware intent memory
Contributing
IntentForge is open source under MIT License. See AGENTS.md for development guidelines.
Key areas for contribution:
- Frontend (React/TypeScript) — Disambiguation UI, Courses tab, Preference editor
- Rust Backend — Intent classifier improvements, Zero-trust layer
- Python Query Layer — Embedding cache, cross-encoder optimization
- Testing — Latency benchmarks, quality evaluation
Glossary
| Term | Definition |
|---|---|
| Intent | The user's goal: find info, navigate to site, buy product, explore topic |
| BM25 | Classic keyword search algorithm (relevance scoring) |
| Semantic Search | Vector-based search (finds conceptually similar content) |
| Cross-Encoder | Model that scores each query-document pair jointly; used as the precise second-pass re-ranker after the coarse search |
| Perceptual Hash | Image fingerprint for similarity comparison |
| ThumbHash | Compact visual representation (~20 bytes) |
| Disambiguation | Presenting choices when query has multiple meanings |
| DSL | Domain-specific language (preferences = mini DSL) |