IntentForge v2 Architecture
IntentForge v2 is a high-performance discovery engine built to prioritize user intent over simple keyword matching. This document outlines the technical architecture, component interactions, and the hybrid search strategy.
High-Level Overview
IntentForge operates as a distributed system with a Rust core orchestrating specialized microservices.
Core Components
1. Axum API (src/api/)
The entry point for all requests. It handles:
- Search Logic: Orchestrates local Meilisearch queries and remote meta-search fan-out.
- Caching: Two-tier caching (in-memory L1 + Redis L2).
- Streaming: Supports SSE (Server-Sent Events) for real-time result updates.
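The two-tier caching described above can be sketched as a read-through cache. This is a minimal illustration in Python, not the project's Rust implementation: the class name `TwoTierCache`, the TTL value, and the plain dict standing in for Redis are all assumptions made for the example.

```python
import time
from typing import Callable


class TwoTierCache:
    """Read-through cache: an in-process dict (L1) in front of a shared store (L2)."""

    def __init__(self, l2: dict, l1_ttl_seconds: float = 30.0):
        self.l1: dict = {}           # key -> (expiry timestamp, value)
        self.l2 = l2                 # stand-in for Redis (assumption for this sketch)
        self.l1_ttl = l1_ttl_seconds

    def get(self, key: str, compute: Callable[[], str]) -> str:
        now = time.monotonic()
        hit = self.l1.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                          # L1 hit: no network round trip
        value = self.l2.get(key)
        if value is None:
            value = compute()                      # miss in both tiers: do the work once
            self.l2[key] = value
        self.l1[key] = (now + self.l1_ttl, value)  # promote to L1 for later requests
        return value
```

A second process sharing the same L2 store would find the value there and skip `compute`, which is the main reason to keep a shared tier behind the per-process one.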
2. Intent Classifier (src/intent_classifier/)
Analyzes queries to determine the user's goal (Informational, Navigational, Transactional, etc.).
- Vectorized Mapping: Uses ONNX embeddings to map queries to intent groups.
- Adaptive Search: Adjusts the semantic/keyword ratio based on how confident the classifier is in the detected intent.
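As a rough sketch of how the two bullets above could fit together, the following Python maps a query embedding to the nearest intent group by cosine similarity and then derives a semantic ratio from the resulting confidence. The centroid vectors, the function names, the threshold, and the policy of leaning toward keyword search when confidence is low are all assumptions for illustration; the source does not specify the actual rule.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(query_vec, centroids):
    """Return (intent, confidence); confidence is the best cosine similarity."""
    scored = {intent: cosine(query_vec, c) for intent, c in centroids.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


def semantic_ratio(confidence, base=0.7, low=0.3, threshold=0.5):
    """Hypothetical policy: rely less on semantic search when the classifier is unsure."""
    if confidence >= threshold:
        return base
    # Below the threshold, interpolate linearly between `low` and `base`.
    return low + (base - low) * max(confidence, 0.0) / threshold


# Toy 2-dimensional centroids; real embeddings would come from the ONNX model.
centroids = {"navigational": [1.0, 0.0], "informational": [0.0, 1.0]}
intent, confidence = classify([0.9, 0.1], centroids)
ratio = semantic_ratio(confidence)
```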
3. Meta-Search Aggregator (src/meta_search/)
Discovers fresh content in real time by querying external providers.
- Fan-out Strategy: Parallelizes requests to multiple providers (DuckDuckGo, Bing, Brave, etc.).
- Anonymization: All outgoing meta-search traffic is routed through Tor (Snowflake) or Cloudflare Worker proxies to prevent tracking and IP blocking by upstream providers. User-to-server traffic remains direct for performance.
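The fan-out strategy can be illustrated with a small concurrency sketch. It is written in Python with `asyncio` for brevity, whereas the real aggregator lives in the Rust core. The provider functions here are hypothetical stubs, the timeout is an assumed value, and proxy routing is left out entirely.

```python
import asyncio


async def fan_out(query, providers, timeout=2.0):
    """Query every provider concurrently; drop those that fail or time out."""

    async def guarded(name, search):
        try:
            return name, await asyncio.wait_for(search(query), timeout)
        except Exception:
            # A slow or failing provider must not sink the whole request.
            return name, None

    pairs = await asyncio.gather(*(guarded(n, s) for n, s in providers.items()))
    return {name: hits for name, hits in pairs if hits is not None}


# Hypothetical stub providers standing in for real HTTP calls.
async def fast_provider(query):
    return [f"https://example.com/?q={query}"]


async def slow_provider(query):
    await asyncio.sleep(10)
    return []


async def broken_provider(query):
    raise RuntimeError("blocked by upstream")
```

Because every call is wrapped individually, total latency is bounded by the timeout rather than by the slowest provider, and partial results are still returned.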
4. Hybrid Search Engine
Combines the precision of keyword search with the context-awareness of semantic search.
- Semantic Ratio: Defaults to 0.7 (70% semantic, 30% keyword).
- Binary Quantization: Compresses stored vectors 8x, which helps keep query latency under 50 ms.
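For illustration, a hybrid query against Meilisearch carries the ratio in the `hybrid` object of the search request body. The helper below only builds that body; the function name, the embedder name `default`, and the result limit are assumptions for this sketch, not values taken from the project.

```python
import json


def hybrid_search_body(query, semantic_ratio=0.7, embedder="default", limit=20):
    """Build a JSON-serializable body for POST /indexes/{index_uid}/search."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    return {
        "q": query,
        "limit": limit,
        # 1.0 means purely semantic results, 0.0 purely keyword results.
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
    }


body = json.dumps(hybrid_search_body("best rust web framework"))
```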
5. Self-Improvement Service (src/self_improvement/)
An autonomous loop that triggers background indexing when search quality is low.
- Gap Analysis: Identifies queries with zero or low-relevance results.
- Proactive Discovery: Automatically searches, crawls, and indexes new content to fill gaps.
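A minimal sketch of the gap-analysis step, assuming search outcomes are logged with a hit count and a best relevance score. The `SearchLog` record, the score threshold, and the minimum occurrence count are hypothetical; the source does not say how "low-relevance" is measured.

```python
from dataclasses import dataclass


@dataclass
class SearchLog:
    query: str
    top_score: float  # best relevance score returned; 0.0 when there were no hits
    hit_count: int


def find_gaps(logs, min_score=0.4, min_occurrences=2):
    """Return queries that repeatedly produced no hits or only weak hits."""
    weak = {}
    for log in logs:
        if log.hit_count == 0 or log.top_score < min_score:
            weak[log.query] = weak.get(log.query, 0) + 1
    # Most frequent gaps first, so discovery starts with the biggest holes.
    candidates = [q for q, n in weak.items() if n >= min_occurrences]
    return sorted(candidates, key=lambda q: -weak[q])
```

The returned queries would then be handed to the discovery step, which searches, crawls, and indexes content for them.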
Data Flow: The Search Lifecycle
- Request: User submits a query.
- Classification: Intent classifier detects the query type and extracts salient attributes.
- Local Search: Hybrid search queries the Meilisearch index.
- Meta Fallback: If local results are insufficient, the aggregator fans out to meta-search providers.
- Ranking: Results from all sources are merged, deduplicated with SimHash, and re-ranked with a cross-encoder model.
- Enrichment: Discovered URLs are queued for the background crawler to ingest full content for future queries.
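The deduplication in the ranking step relies on SimHash, which gives near-identical texts fingerprints that differ in only a few bits. The sketch below is a simplified, unweighted variant over whitespace tokens; the choice of `title` and `snippet` as fingerprinted fields and the distance threshold of 3 are assumptions, not the project's settings.

```python
import hashlib

BITS = 64


def simhash(text):
    """64-bit SimHash over lowercase whitespace tokens (unweighted)."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Each output bit is the majority vote of that bit across all tokens.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def dedupe(results, max_distance=3):
    """Keep the first result of every near-duplicate cluster."""
    kept, fingerprints = [], []
    for result in results:
        fp = simhash(result["title"] + " " + result["snippet"])
        if all(hamming(fp, seen) > max_distance for seen in fingerprints):
            kept.append(result)
            fingerprints.append(fp)
    return kept
```

Comparing each fingerprint against all kept ones is quadratic in the worst case, which is acceptable for a single merged result page but would need an index for larger sets.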
Microservices
- Query Layer: Python/FastAPI service for semantic query expansion and ranking.
- Trafilatura: Specialized text extraction service for clean content retrieval.
- YouTube-Unified: Optimized Node.js service for video discovery.
Infrastructure
- Tor Transport: Native Tor daemon + Snowflake 2.12.1 with SQS rendezvous for censorship circumvention.
- Meilisearch: Optimized for hybrid search with vector storage.
- Redis: Handles deduplication (Bloom filters), caching, and discovery queues.
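To show what Bloom-filter deduplication provides, here is a small in-memory sketch. It is not how the project talks to Redis: the class, the bit-array size, and the number of hash functions are illustrative assumptions. The key property is that a Bloom filter never reports a stored URL as missing, while it may occasionally report an unseen URL as present.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter for 'have we already seen this URL?' checks."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b with the index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter()
seen.add("https://example.com/page")
```

A crawler would check `url in seen` before enqueueing a URL, accepting a small false-positive rate in exchange for constant memory.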