IntentForge Architecture — How We Built a Privacy-First Search Engine with Tor - Oxiverse

IntentForge: How We Built a Privacy-First Search Engine on Tor

Google knows what you searched for. Your ISP knows every site you visited. Your government can demand logs. The current web is a surveillance infrastructure with a search interface.

IntentForge was built to change this.

The Core Problem: Metadata

Even when you use HTTPS, your DNS queries and connection metadata leak your intent. Your ISP sees which domains you resolve. CDNs such as Cloudflare see your IP address whenever you connect to a site they front. Running a search engine without Tor is like mailing a sealed letter: the contents are hidden, but the address on the envelope is visible to everyone who handles it.

Tor Integration: First-Class from Day One

IntentForge routes all queries through the Tor network by default. Not as an option. Not as a "privacy mode." As the primary interface.

We use Snowflake bridges to connect to Tor, making it harder for network observers to detect that Tor is in use. Every search request is routed through a different exit node, which makes it much harder to correlate one query with the next.
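One way to get per-request isolation is sketched below. It relies on Tor's stream isolation by SOCKS credentials (the `IsolateSOCKSAuth` flag, which is on by default for a `SocksPort`): connections that present different usernames and passwords are placed on separate circuits, which usually means different exits. The host, port, and the helper name `isolated_tor_proxies` are assumptions for illustration, not IntentForge's actual code.

```python
import secrets

# Assumptions: a Tor client runs locally on its default SOCKS port.
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9050


def isolated_tor_proxies(host=TOR_SOCKS_HOST, port=TOR_SOCKS_PORT):
    """Build a proxies mapping with fresh, random SOCKS credentials.

    Tor does not check the credentials; it only uses them to decide
    which streams may share a circuit.
    """
    user = secrets.token_hex(8)
    password = secrets.token_hex(8)
    # "socks5h" asks the proxy to resolve hostnames, so DNS lookups
    # also travel through Tor instead of leaking to the local resolver.
    url = f"socks5h://{user}:{password}@{host}:{port}"
    return {"http": url, "https": url}
```

With the `requests` library and its SOCKS extra installed, the mapping could be passed as `requests.get(url, proxies=isolated_tor_proxies())`, calling the helper once per search request.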

Intent-First Architecture

Traditional search engines match keywords. IntentForge matches intent.

When you search "best laptop for coding," a keyword engine returns pages with those exact words. IntentForge understands that you're evaluating purchasing decisions, so it surfaces reviews, comparisons, and developer forum discussions — even if none contain the phrase "best laptop for coding."

How it works:

Query parsing — The intent extraction layer breaks queries into structured intent objects: { action, target, constraints, context }
Tor-routed meta-search — The structured intent is sent through Tor to multiple search backends simultaneously
Vector scoring — Results are embedded as 384-dimensional vectors and scored using binary quantized codes (384 bits, or 48 bytes, per embedding)
Self-improving index — Implicit feedback from clicks, dwell time, and reformulations updates the index in real time
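To make the first step concrete, here is a minimal sketch of what a structured intent object with the four fields above might look like. The `Intent` class, the `ACTION_CUES` table, and the rule-based `parse_query` function are all hypothetical stand-ins; the real intent extraction layer is presumably more sophisticated than keyword cues.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    action: str
    target: str
    constraints: list = field(default_factory=list)
    context: dict = field(default_factory=dict)


# Hypothetical cue words mapped to actions; illustration only.
ACTION_CUES = {"best": "evaluate", "vs": "compare", "how": "learn", "buy": "purchase"}


def parse_query(query: str) -> Intent:
    """Toy parser: first cue word picks the action, 'for' splits off a constraint."""
    tokens = query.lower().split()
    action = next((ACTION_CUES[t] for t in tokens if t in ACTION_CUES), "lookup")
    rest = [t for t in tokens if t not in ACTION_CUES]
    constraints = []
    if "for" in rest:
        i = rest.index("for")
        constraint = " ".join(rest[i + 1:])
        if constraint:
            constraints.append(constraint)
        rest = rest[:i]
    return Intent(action, " ".join(rest), constraints, {"raw": query})
```

Under these toy rules, "best laptop for coding" becomes an `evaluate` action on the target `laptop` with the constraint `coding`, which is the kind of object the later stages would fan out to search backends.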

Binary Quantized Vectors: 8× Compression

Storing full 384-dimensional float vectors for every indexed document is expensive. IntentForge uses binary quantization, mapping each float vector to a 48-byte binary code (one bit per dimension) while retaining ~92% of retrieval accuracy. That is 8× smaller than storing one byte per dimension and 32× smaller than float32 storage (1,536 bytes per vector). This lets us run the full index on modest hardware while maintaining sub-50ms P95 latency.
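The sketch below shows one common binary quantization scheme: keep only the sign of each dimension and compare codes by Hamming distance. The article does not say which thresholding or distance IntentForge uses, so sign thresholding and the function names `binarize` and `hamming` are assumptions.

```python
import random

DIM = 384  # embedding dimensionality from the article


def binarize(vec):
    """Pack one bit per dimension (1 if the value is positive) into bytes."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits.to_bytes((len(vec) + 7) // 8, "big")


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits; a smaller distance means more similar codes."""
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(diff).count("1")


# Usage: a random stand-in for a real embedding.
rng = random.Random(0)
doc = [rng.gauss(0, 1) for _ in range(DIM)]
code = binarize(doc)
print(len(code))  # 48 bytes for 384 dimensions
```

Hamming distance reduces to XOR and bit counting, which is a large part of why binary codes are cheap to score on modest hardware.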

Self-Improving Index: Learning from Searches

Most search engines update their index on a fixed schedule — hourly, daily, weekly. IntentForge updates its index based on query intent signals. When users consistently reformulate a query in a certain way, the intent extractor learns. When users click results lower in the ranking, the vector scorer adjusts. When new content matches a query pattern, the crawler prioritizes it. This creates a feedback loop where the search engine gets better at matching intent without manual curation.
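As a rough illustration of the click signal described above, the sketch below keeps a small boost per (intent, document) pair and nudges it upward when users click a result that was ranked low. The `ClickFeedback` class, the "surprise" formula, and the learning rate are all hypothetical; the article does not describe the actual update rule.

```python
class ClickFeedback:
    """Toy feedback store: one additive boost per (intent, document) pair."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.boost = {}

    def record_click(self, intent_key, doc_id, rank):
        # rank is 0-based. A click on the top result carries no new
        # information (surprise 0.0); deeper clicks approach 1.0.
        surprise = 1.0 - 1.0 / (rank + 1)
        key = (intent_key, doc_id)
        old = self.boost.get(key, 0.0)
        # Exponential moving average toward the observed surprise.
        self.boost[key] = old + self.learning_rate * (surprise - old)

    def rerank(self, intent_key, scored):
        """scored is a list of (doc_id, base_score); returns it re-sorted."""
        def adjusted(item):
            doc_id, base = item
            return base + self.boost.get((intent_key, doc_id), 0.0)
        return sorted(scored, key=adjusted, reverse=True)
```

Repeated clicks on a second-place result gradually lift it above the first, while clicks on the top result leave the ordering unchanged.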

Anti-Signals Filtering: No Manipulation

We actively filter anti-signals — SEO manipulation, paid placements, clickbait patterns, and known misinformation sources. This is expensive to compute but essential for maintaining result quality.
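A heavily simplified sketch of such a filter follows. The `Result` fields, the blocklist, the clickbait pattern, and the choice to demote clickbait rather than drop it are all assumptions made for illustration; a production filter would need far richer signals than a regular expression.

```python
import re
from dataclasses import dataclass

# Hypothetical signals; real detection would be much more involved.
CLICKBAIT = re.compile(r"you won't believe|shocking|\d+ (?:tricks|secrets)", re.IGNORECASE)
BLOCKED_DOMAINS = {"spam.example"}


@dataclass
class Result:
    title: str
    domain: str
    sponsored: bool
    score: float


def filter_anti_signals(results):
    """Drop paid or blocklisted results, halve the score of clickbait titles."""
    kept = []
    for r in results:
        if r.sponsored or r.domain in BLOCKED_DOMAINS:
            continue
        penalty = 0.5 if CLICKBAIT.search(r.title) else 1.0
        kept.append(Result(r.title, r.domain, r.sponsored, r.score * penalty))
    return sorted(kept, key=lambda r: r.score, reverse=True)
```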

What's Under the Hood

Backend: FastAPI + Redis caching
Crawler: Go-based with BadgerDB for persistence
Search: Meilisearch with custom intent scoring
Embedding: ONNX Runtime (local inference, no data leaves the stack)
Privacy layer: Tor + Snowflake bridges
Vector storage: Binary quantized embeddings

Open Source

IntentForge is fully open source under the IECL license. The code, architecture docs, and research notes are available at github.com/oxiverse-labs/intentforge.

The Bigger Picture

IntentForge is one piece of Oxiverse, a complete privacy-first ecosystem that includes a browser, productivity tools, and more. We believe privacy isn't a feature. It's the default.
