IntentForge v2 Architecture

IntentForge v2 is a high-performance discovery engine built to prioritize user intent over simple keyword matching. This document outlines the technical architecture, component interactions, and the hybrid search strategy.

High-Level Overview

IntentForge operates as a distributed system with a Rust core orchestrating specialized microservices.

Core Components

1. Axum API (src/api/)

The entry point for all requests. It handles:

  • Search Logic: Orchestrates local Meilisearch queries and remote meta-search fan-out.
  • Caching: Multi-tiered caching (In-memory L1 + Redis L2).
  • Streaming: Supports SSE (Server-Sent Events) for real-time result updates.
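
The two cache tiers can be illustrated with a minimal read-through sketch. It is not the project's Rust implementation: the `TieredCache` class, its TTL, and the plain dictionary standing in for Redis are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TieredCache:
    """Read-through cache: a per-process L1 in front of a shared L2."""

    def __init__(self, l2: Dict[str, str], l1_ttl_seconds: float = 30.0) -> None:
        self._l1: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self._l2 = l2  # stand-in for Redis (assumption); any dict-like store works
        self._l1_ttl = l1_ttl_seconds

    def get(self, key: str, compute: Callable[[], str]) -> Tuple[str, str]:
        """Return (value, source), where source is 'l1', 'l2', or 'origin'."""
        now = time.monotonic()
        hit = self._l1.get(key)
        if hit is not None and hit[0] > now:
            return hit[1], "l1"
        value = self._l2.get(key)
        source = "l2"
        if value is None:
            # Miss in both tiers: compute the result and populate L2.
            value = compute()
            self._l2[key] = value
            source = "origin"
        # Promote the value into L1 so the next lookup stays in-process.
        self._l1[key] = (now + self._l1_ttl, value)
        return value, source
```

A second `TieredCache` built over the same `l2` store models another API instance: it misses its own L1 but finds the value in the shared tier.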

2. Intent Classifier (src/intent_classifier/)

Analyzes queries to determine the user's goal (Informational, Navigational, Transactional, etc.).

  • Vectorized Mapping: Uses ONNX embeddings to map queries to intent groups.
  • Adaptive Search: Adjusts the semantic/keyword ratio based on query confidence.
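
As a rough sketch of the two ideas above, a query embedding can be matched to the nearest intent centroid by cosine similarity, and the similarity can then scale the semantic ratio. The function names, the `floor` value, and the direction of the adjustment (lower confidence shifts weight toward keywords) are assumptions; the text does not specify them. The 0.7 ceiling is the default semantic ratio given in the Hybrid Search Engine section.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query_vec: List[float],
             centroids: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the closest intent label and its similarity as a confidence."""
    best = max(centroids, key=lambda name: cosine(query_vec, centroids[name]))
    return best, cosine(query_vec, centroids[best])


def adaptive_ratio(confidence: float, base: float = 0.7,
                   floor: float = 0.3) -> float:
    """Interpolate between `floor` (no confidence) and `base` (full confidence)."""
    clamped = min(max(confidence, 0.0), 1.0)
    return floor + (base - floor) * clamped
```

In a real deployment the vectors would come from the ONNX embedding model; here they are plain lists so the sketch stays self-contained.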

3. Meta-Search Aggregator (src/meta_search/)

Discovers fresh content in real-time by querying external providers.

  • Fan-out Strategy: Parallelizes requests to multiple providers (DuckDuckGo, Bing, Brave, etc.).
  • Anonymization: All outgoing meta-search traffic is routed through Tor (Snowflake) or Cloudflare Worker proxies to prevent tracking and IP blocking by upstream providers. User-to-server traffic remains direct for performance.
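
The fan-out strategy can be sketched with `asyncio`: every provider is queried concurrently, and a slow or failing provider yields an empty list instead of failing the whole search. The `fan_out` function, the timeout value, and the three stand-in providers are hypothetical. The sketch does not model the Tor or Cloudflare Worker routing.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Provider = Callable[[str], Awaitable[List[str]]]


async def fan_out(query: str, providers: Dict[str, Provider],
                  timeout: float = 2.0) -> Dict[str, List[str]]:
    """Query all providers concurrently; failures and timeouts yield []."""

    async def guarded(name: str, fetch: Provider) -> Tuple[str, List[str]]:
        try:
            return name, await asyncio.wait_for(fetch(query), timeout)
        except Exception:  # provider error or timeout: degrade gracefully
            return name, []

    pairs = await asyncio.gather(
        *(guarded(name, fetch) for name, fetch in providers.items())
    )
    return dict(pairs)


# Stand-in providers for illustration only; real ones would issue HTTP requests.
async def fake_fast(query: str) -> List[str]:
    return [f"https://example.com/?q={query}"]


async def fake_broken(query: str) -> List[str]:
    raise RuntimeError("provider unavailable")


async def fake_slow(query: str) -> List[str]:
    await asyncio.sleep(1.0)
    return ["https://example.org/late"]
```

Calling `asyncio.run(fan_out("rust", {"fast": fake_fast, "broken": fake_broken, "slow": fake_slow}, timeout=0.05))` returns results only for the fast provider, with empty lists for the other two.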

4. Hybrid Search Engine

Combines the precision of keyword search with the context-awareness of semantic search.

  • Semantic Ratio: Defaults to 0.7 (70% semantic, 30% keyword).
  • Binary Quantization: Compresses stored vectors 8x, which helps keep query latency under 50 ms.
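
Binary quantization in general keeps one bit per vector dimension and compares vectors by Hamming distance. The sketch below assumes the common sign-based variant, which may differ from what the engine does internally. The compression ratio actually achieved depends on the precision of the source vectors, which the text does not state.

```python
from typing import List


def binarize(vector: List[float]) -> int:
    """Pack the sign of each dimension into one bit (bit i set if vector[i] > 0)."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed vectors."""
    return bin(a ^ b).count("1")
```

For example, `binarize([0.5, -0.2, 0.1, -0.9])` sets bits 0 and 2 and returns 5. Vectors that point in similar directions share most sign bits and therefore have a small Hamming distance.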

5. Self-Improvement Service (src/self_improvement/)

An autonomous loop that triggers background indexing when search quality is low.

  • Gap Analysis: Identifies queries with zero or low-relevance results.
  • Proactive Discovery: Automatically searches, crawls, and indexes new content to fill gaps.
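
Gap analysis can be pictured as a filter over logged searches. In this sketch the `QueryLog` record, the `find_gaps` function, and the 0.5 relevance threshold are all hypothetical; the text does not define what counts as "low-relevance".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryLog:
    query: str
    scores: List[float]  # relevance scores of the results that were returned


def find_gaps(logs: List[QueryLog], min_score: float = 0.5) -> List[str]:
    """Return distinct queries with no results or no result above min_score."""
    gaps: List[str] = []
    for entry in logs:
        is_gap = not entry.scores or max(entry.scores) < min_score
        if is_gap and entry.query not in gaps:
            gaps.append(entry.query)
    return gaps
```

The returned queries are the candidates that a discovery loop would then search, crawl, and index.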

Data Flow: The Search Lifecycle

  1. Request: User submits a query.
  2. Classification: Intent classifier detects the query type and extracts salient attributes.
  3. Local Search: Hybrid search queries the Meilisearch index.
  4. Meta Fallback: If local results are insufficient, the aggregator fans out to meta-search providers.
  5. Ranking: Results from all sources are merged, deduplicated (using SimHash), and re-ranked using a cross-encoder model.
  6. Enrichment: Discovered URLs are queued for the background crawler to ingest full content for future queries.
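
The SimHash deduplication in step 5 can be sketched as follows. This is a textbook 64-bit SimHash over whitespace tokens, not the project's implementation: the MD5-based token hash, the unweighted tokens, and the `max_distance` threshold of 3 are assumptions.

```python
import hashlib
from typing import List

BITS = 64


def simhash(text: str) -> int:
    """64-bit SimHash: each bit is the majority vote of the token hashes."""
    votes = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(docs: List[str], max_distance: int = 3) -> List[str]:
    """Keep the first document of every group of near-identical fingerprints."""
    kept: List[str] = []
    fingerprints: List[int] = []
    for doc in docs:
        fp = simhash(doc)
        if all(hamming(fp, seen) > max_distance for seen in fingerprints):
            kept.append(doc)
            fingerprints.append(fp)
    return kept
```

Because fingerprints of similar texts differ in only a few bits, results for the same page returned by several providers collapse to one entry before re-ranking.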

Microservices

  • Query Layer: Python/FastAPI service for semantic query expansion and ranking.
  • Trafilatura: Specialized text extraction service for clean content retrieval.
  • YouTube-Unified: Optimized Node.js service for video discovery.

Infrastructure

  • Tor Transport: Native Tor daemon + Snowflake 2.12.1 with SQS rendezvous for censorship circumvention.
  • Meilisearch: Optimized for hybrid search with vector storage.
  • Redis: Handles deduplication (Bloom filters), caching, and discovery queues.
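
To show what Bloom-filter deduplication means in practice, here is a small in-process sketch. The text does not say whether the deployment uses a Redis module or a custom structure, so the `BloomFilter` class and its sizing parameters are purely illustrative.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, a small rate of false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits |= 1 << position

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))
```

A crawler would check `url in seen` before enqueueing a URL and call `seen.add(url)` afterwards. An occasional false positive means a URL is skipped unnecessarily, which is usually an acceptable trade for the memory saved.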