Oxiverse is a privacy-first ecosystem of products including a search engine (IntentForge), browser, and productivity tools, all built on source-available principles and Privacy-by-Design.

Is Oxiverse open source?

Oxiverse is source-available under the Oxiverse Community License (OCL) v1.0, so the code is open to inspection by all users.

Where is the official Oxiverse source code?

The primary development repository is hosted on Codeberg (codeberg.org/oxiverse). A mirror exists on GitHub (github.com/oxiverse-ecosystem), but all contributions should be directed to Codeberg to ensure development remains within privacy-focused infrastructure.

IntentForge is an autonomous discovery engine utilizing intent extraction and self-healing search technology to provide private, relevant results without user profiling or data retention.

IntentForge Roadmap

Privacy-first, intent-driven search engine with zero-trust architecture.

Vision

IntentForge will be the only search engine that truly respects users — no tracking, no manipulation, no corporate control. Results are determined solely by relevance and intent, not by who pays the most. Users will be able to talk directly to the ranking algorithm to personalize results, and the system will be transparent about how rankings work.

Current State

✅ Operational

Feature	Status	Notes
Core hybrid search	✅ Live	BM25 + ONNX semantic search
Intent classification	✅ Live	4 query types with adaptive semantic ratio
Multi-tier caching	✅ Live	L1 (LRU) + Redis
Meta-search	✅ Live	9 providers simultaneously
Image search	✅ Live	Zero-bandwidth HTML metadata indexing
Video search	✅ Live	6 sources (YouTube, Piped, Invidious, etc.)
Anti-detection	✅ Live	TLS fingerprinting, Cloudflare bypass
Content extraction	✅ Live	Trafilatura integration
Cross-encoder reranking	✅ Live	ms-marco-MiniLM-L6-v2
Self-improvement	✅ Live	Background gap analysis + crawling

🔄 In Progress

Feature	Status	Notes
Courses tab	🔄 Planning	Docs + video roadmap system
Personalization layer	🔄 Designing	Direct user → algorithm communication
Zero-trust request validation	🔄 Designing	Cryptographic request signing

Roadmap

Phase 1: Intent Enhancement

1.1 Smarter Intent Detection for Vague Queries

Problem: Users type vague things like "rust" or "python" and get arbitrary results.

Solution: When a query is too vague to determine intent, present a choice menu:

I see "rust" could mean several things. What do you want?

🦀  [Rust Programming Language] — The systems programming language
🎮  [Rust (video game)] — The multiplayer survival game
🧪  [Rust (chemistry)] — Iron oxide, corrosion
📊  [Rust (framework)] — Rust-based web frameworks (Actix, Axum)

Implementation:

Add is_vague_query() detection in intent_classifier.rs
Create /api/intent/disambiguate endpoint
Return disambiguation choices with icons, descriptions, intent hints
Frontend renders as interactive cards

Files affected: src/intent_classifier.rs, src/api/mod.rs
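The detection step can be sketched as follows. This is a minimal Python illustration, not the Rust code planned for intent_classifier.rs. The KNOWN_SENSES table, the Choice type, and the "single term with several known senses" rule are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical sense table; a real system would derive senses from the index.
KNOWN_SENSES = {
    "rust": [
        ("Rust Programming Language", "programming"),
        ("Rust (chemistry)", "science"),
    ],
    "python": [
        ("Python Programming Language", "programming"),
        ("Python (snake)", "animals"),
    ],
}


@dataclass
class Choice:
    label: str
    intent_hint: str


def is_vague_query(query: str) -> bool:
    """Assumed rule: a query is vague when it is one term with several senses."""
    terms = query.lower().split()
    return len(terms) == 1 and len(KNOWN_SENSES.get(terms[0], [])) > 1


def disambiguate(query: str) -> list[Choice]:
    """Return the choices to render as cards, or [] when the query is specific."""
    if not is_vague_query(query):
        return []
    term = query.lower().split()[0]
    return [Choice(label, hint) for label, hint in KNOWN_SENSES[term]]
```

A multi-word query such as "rust error handling" skips the menu entirely, so disambiguation only adds a step when the intent really is unclear.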

1.2 Context-Aware Intent Memory

Problem: Each search is treated independently. No learning from user patterns.

Solution: Lightweight session memory that remembers intent patterns:

User: "how to fix"
  → Intent: Informational / "how-to-fix" subtype
  → Stores: {query: "how to fix", intent: "tutorial", skill_level: "intermediate"}

User: "rust"
  → Remembers previous "how to fix" pattern
  → Suggests: "How to fix Rust memory leaks" (not iron oxide)

Implementation:

SQLite session store (kept locally on the user's device, not server-side)
Intent pattern extraction over session lifetime
Suggest refinements based on history
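A minimal sketch of such a session store, using Python's built-in sqlite3 module. The table name intent_history, its columns, and the "prefix the most recent query of the dominant intent" suggestion rule are assumptions for illustration only.

```python
import sqlite3


def open_session_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local session store; ':memory:' keeps nothing on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS intent_history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "query TEXT NOT NULL, intent TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, query: str, intent: str) -> None:
    """Record one classified query for the current session."""
    conn.execute(
        "INSERT INTO intent_history (query, intent) VALUES (?, ?)", (query, intent)
    )
    conn.commit()


def dominant_intent(conn: sqlite3.Connection, last_n: int = 10):
    """Most frequent intent among the last_n queries, or None if history is empty."""
    row = conn.execute(
        "SELECT intent, COUNT(*) AS n FROM ("
        "  SELECT intent FROM intent_history ORDER BY id DESC LIMIT ?"
        ") GROUP BY intent ORDER BY n DESC, intent ASC LIMIT 1",
        (last_n,),
    ).fetchone()
    return row[0] if row else None


def suggest_refinement(conn: sqlite3.Connection, query: str):
    """Combine the latest query of the dominant intent with the new query."""
    intent = dominant_intent(conn)
    if intent is None:
        return None
    row = conn.execute(
        "SELECT query FROM intent_history WHERE intent = ? ORDER BY id DESC LIMIT 1",
        (intent,),
    ).fetchone()
    return f"{row[0]} {query}"
```

Dropping the database file (or using the in-memory default) ends the session and discards the history, which matches the goal of not keeping a long-lived profile.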

Phase 2: Personalization

2.1 Direct Algorithm Communication

Problem: Users can't tell the search engine what they actually want.

Solution: A personalization DSL — natural language instructions to the ranker:

> "I prefer recent papers over classic references"
> "Show me tutorials, not product pages"
> "Ignore anything from example.com"
> "I want beginner-friendly content"
> "More video results, fewer articles"

Implementation:

New /api/search parameter: preferences: string
Preference parser in ranking layer:
- "recent" → recency boost
- "tutorials" → content_type=tutorial filter
- "ignore domain" → domain exclusion filter
- "beginner" → skill_level=beginner boost
- "video" → media_type=video boost
Preferences stored in local session cookie (no server-side profile)
Preference syntax validated and sanitized

Example API call:

GET /api/search?q=rust+error+handling&preferences=no+blogs,+prefer+docs+and+videos
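The keyword mapping above could look roughly like this. It is a sketch under stated assumptions: the signal names, the fixed 1.5 weight, the 500-character limit, and the RankingPrefs type are all hypothetical. It also does not handle negation ("no videos" would still boost video), which a real parser would need to address.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RankingPrefs:
    boosts: dict = field(default_factory=dict)          # signal name -> weight
    excluded_domains: set = field(default_factory=set)  # domains to filter out


# Assumed keyword -> (ranking signal, weight) table.
KEYWORD_BOOSTS = {
    "recent": ("recency", 1.5),
    "tutorial": ("content_type:tutorial", 1.5),
    "beginner": ("skill_level:beginner", 1.5),
    "video": ("media_type:video", 1.5),
}
DOMAIN_RE = re.compile(
    r"\b(?:ignore|exclude)\b.*?\b([a-z0-9-]+(?:\.[a-z0-9-]+)+)\b"
)
MAX_LEN = 500  # assumed upper bound on the preferences string


def parse_preferences(raw: str) -> RankingPrefs:
    """Sanitize the raw string, then map each comma-separated clause to a signal."""
    # Sanitization: lowercase, truncate, and drop everything outside a small allowlist.
    text = re.sub(r"[^a-z0-9.,\- ]", " ", raw.lower()[:MAX_LEN])
    prefs = RankingPrefs()
    for clause in text.split(","):
        domain = DOMAIN_RE.search(clause)
        if domain:
            prefs.excluded_domains.add(domain.group(1))
            continue
        for keyword, (signal, weight) in KEYWORD_BOOSTS.items():
            if keyword in clause:
                prefs.boosts[signal] = weight
    return prefs
```

Because the parser only ever emits signals from a fixed table, unrecognized or hostile input degrades to "no preference" rather than reaching the ranker.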

2.2 Zero-Trust Request Layer

Problem: Attackers could theoretically manipulate search results via request injection.

Solution: Cryptographic request signing for internal service communication:

Internal request → [Sign with shared secret] → API Gateway
                                           ↓
                                    [Verify signature]
                                           ↓
                                    Route to services

Implementation:

HMAC-SHA256 signing of request parameters
Shared secret per service pair
Replay attack prevention with nonces + timestamps
All internal service calls signed
Public-facing API remains unauthenticated (but rate-limited)

Files affected: services/search-api/src/proxy.rs, new src/zero_trust/ module
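A minimal sketch of the signing and verification steps, written with Python's standard hmac module rather than the planned Rust module. The field names ts, nonce, and sig, the 30-second freshness window, and the in-memory nonce table are assumptions for the example; a production verifier would also expire old nonces.

```python
import hashlib
import hmac
import secrets
import time
from urllib.parse import urlencode

MAX_SKEW_SECONDS = 30  # assumed freshness window


def _mac(secret: bytes, fields: dict) -> str:
    # Canonical form: URL-encoded, key-sorted pairs, so both sides hash the same bytes.
    canonical = urlencode(sorted((k, v) for k, v in fields.items() if k != "sig"))
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_request(secret: bytes, params: dict, now=None) -> dict:
    """Return a copy of params with timestamp, nonce, and HMAC-SHA256 signature."""
    signed = dict(params)
    signed["ts"] = str(int(time.time() if now is None else now))
    signed["nonce"] = secrets.token_hex(16)
    signed["sig"] = _mac(secret, signed)
    return signed


class Verifier:
    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen_nonces = {}  # nonce -> timestamp

    def verify(self, fields: dict, now=None) -> bool:
        now = time.time() if now is None else now
        try:
            ts = int(fields["ts"])
            nonce = fields["nonce"]
            sig = fields["sig"]
        except (KeyError, ValueError):
            return False
        if abs(now - ts) > MAX_SKEW_SECONDS:
            return False  # too old or too far in the future
        expected = _mac(self.secret, fields)
        if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
            return False  # tampered parameters or wrong secret
        if nonce in self.seen_nonces:
            return False  # replay of an already accepted request
        self.seen_nonces[nonce] = ts
        return True
```

The nonce is recorded only after the signature checks out, so unauthenticated traffic cannot fill the nonce table.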

Phase 3: Latency Improvements

3.1 Streaming Response + Progressive Results

Problem: User waits 800ms+ before seeing any results.

Solution: Stream results as they're discovered:

0ms   → Query analysis (intent, disambiguation check)
50ms  → Stream: "Here are results for 'rust'..."
100ms → Stream: [Result 1, 2, 3...] (from cache/index)
200ms → Stream: [Result 4, 5, 6...] (from meta-search)
500ms → Stream: "Meta-search found 3 more..."
800ms → Final: [All results sorted + re-ranked]

Implementation:

SSE (Server-Sent Events) endpoint: /search/stream
Parallel discovery: index + meta-search fire simultaneously
Progressive result emission (yield as found, don't wait for all)
Frontend renders incrementally

Files affected: src/api/mod.rs, new streaming endpoints
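Progressive emission can be sketched with asyncio. The two search functions below are stand-ins with artificial delays, and the event names "results" and "done" are assumptions. Only the frame layout (event and data lines, blank-line terminator) follows the Server-Sent Events format.

```python
import asyncio
import json


async def search_index(query: str) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a fast local index lookup
    return {"source": "index", "results": [f"{query} result 1", f"{query} result 2"]}


async def search_meta(query: str) -> dict:
    await asyncio.sleep(0.05)  # stand-in for slower upstream providers
    return {"source": "meta-search", "results": [f"{query} result 3"]}


def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_search(query: str):
    # Fire both sources at once, then yield each batch as soon as it is ready.
    tasks = [
        asyncio.create_task(search_index(query)),
        asyncio.create_task(search_meta(query)),
    ]
    for finished in asyncio.as_completed(tasks):
        yield sse_frame("results", await finished)
    yield sse_frame("done", {"query": query})


async def collect(query: str) -> list:
    """Helper for demos and tests: gather all frames into a list."""
    return [frame async for frame in stream_search(query)]
```

Running asyncio.run(collect("rust")) yields the index batch first, then the meta-search batch, then the closing frame, because each source is emitted in completion order rather than request order.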

3.2 Persistent Vector Cache

Problem: Embedding computation is expensive (~50ms per query).

Solution: Pre-compute embeddings for common terms + cache all query embeddings:

Query: "rust error handling"
  → Check embedding cache (Redis)
  → HIT: Use cached vector (0.1ms)
  → MISS: Compute ONNX embedding (50ms), store in cache (TTL: 24h)

Implementation:

Redis vector store (128-dim float32 = 512 bytes per embedding)
Cache key: SHA256(normalized_query)
Background job: pre-compute embeddings for top-1000 common terms
Embedding cache in services/query_layer/app/core/embedding_service.py
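The lookup flow and the size arithmetic (128 float32 values × 4 bytes = 512 bytes) can be sketched as below. A plain dict stands in for Redis, and the "emb:" key prefix, the whitespace/case normalization, and the EmbeddingCache class are assumptions for illustration.

```python
import hashlib
import struct
import time

EMBEDDING_DIM = 128
TTL_SECONDS = 24 * 3600


def cache_key(query: str) -> str:
    """SHA-256 of the normalized query (lowercased, whitespace collapsed)."""
    normalized = " ".join(query.lower().split())
    return "emb:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pack(vec) -> bytes:
    """Serialize a vector as little-endian float32: 128 * 4 = 512 bytes."""
    if len(vec) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vec)}")
    return struct.pack(f"<{EMBEDDING_DIM}f", *vec)


def unpack(blob: bytes) -> list:
    return list(struct.unpack(f"<{EMBEDDING_DIM}f", blob))


class EmbeddingCache:
    def __init__(self, compute, clock=time.time):
        self.compute = compute  # the expensive embedding function
        self.clock = clock
        self.store = {}         # key -> (expires_at, blob); dict stands in for Redis
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> list:
        key = cache_key(query)
        entry = self.store.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return unpack(entry[1])
        self.misses += 1
        blob = pack(self.compute(query))
        self.store[key] = (self.clock() + TTL_SECONDS, blob)
        return unpack(blob)
```

Normalizing before hashing means "Rust  error handling" and "rust error handling" share one cache entry, which raises the hit rate without storing the query text itself.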

3.3 Edge Cache for Popular Queries

Problem: Trending topics cause thundering herd on internal services.

Solution: CDN-style edge caching for queries matching trending patterns:

Query: "Apple Event 2025"
  → Edge cache hit (Cloudflare Workers)
  → Response: <5ms
  → Cache refreshed in the background

Implementation:

Cloudflare Worker as edge cache (free tier: 100k requests/day)
Cache popular queries (top 1000 by frequency)
Stale-while-revalidate: serve cached, refresh in background
Vary cache entries by intent type and region
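The stale-while-revalidate policy itself is independent of where it runs. The sketch below models it in Python rather than as a Cloudflare Worker. The SwrCache class, the 60-second fresh window, the 300-second stale window, and the explicit refresh queue are assumptions made to keep the example self-contained.

```python
import time


class SwrCache:
    def __init__(self, fetch, fresh_for=60.0, stale_for=300.0, clock=time.monotonic):
        self.fetch = fetch          # origin lookup: fetch(query, intent, region)
        self.fresh_for = fresh_for  # seconds an entry is served without refresh
        self.stale_for = stale_for  # extra seconds a stale entry may still be served
        self.clock = clock
        self.entries = {}           # key -> (stored_at, value)
        self.refresh_queue = []     # keys waiting for a background refresh

    def get(self, query: str, intent: str, region: str):
        # Entries vary by intent type and region, so each combination has its own key.
        key = (query.lower(), intent, region)
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None:
            age = now - entry[0]
            if age <= self.fresh_for:
                return entry[1]
            if age <= self.fresh_for + self.stale_for:
                if key not in self.refresh_queue:
                    self.refresh_queue.append(key)
                return entry[1]  # serve stale now, refresh later
        value = self.fetch(*key)  # missing or too old: fetch synchronously
        self.entries[key] = (now, value)
        return value

    def run_refresh(self) -> None:
        """Drain the queue; in production this would run off the request path."""
        while self.refresh_queue:
            key = self.refresh_queue.pop(0)
            self.entries[key] = (self.clock(), self.fetch(*key))
```

Queuing each key at most once is what protects the origin from a thundering herd: many concurrent requests for a stale trending query trigger a single refresh.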

Phase 4: E-Commerce

4.1 Direct Product Links

Problem: Search for "laptop" returns blog posts about "best laptops 2024", not products.

Solution: Intent-aware product extraction:

Query: "macbook pro 14 inch price"
  → Intent: Transactional (user wants to buy)
  → Source: Direct product links (Amazon, Best Buy, Apple Store)
  → No blog posts unless explicitly requested

Query: "macbook pro review"
  → Intent: Informational / ProductReview
  → Source: Blog reviews, YouTube reviews
  → No direct purchase links

Implementation:

New /api/products endpoint
Product-specific providers: Amazon PA-API, Google Shopping
Intent detection for transactional queries → route to product index
Price comparison: extract + display price from product pages
Affiliate link sanitization (remove tracking params)

Files affected: src/api/mod.rs, new services/search-api/src/providers/shopping.rs
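Affiliate link sanitization can be sketched with urllib.parse. The list of tracking parameter names below is an assumption, not an exhaustive or official list, and dropping the URL fragment is a choice made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; a real deployment would maintain a curated list.
TRACKING_PARAMS = {"tag", "ref", "ref_", "fbclid", "gclid", "msclkid"}
TRACKING_PREFIXES = ("utm_",)


def sanitize_product_url(url: str) -> str:
    """Remove tracking query parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Parameters that select the product variant (such as a color or size) survive, so the cleaned link still lands on the same item.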

4.2 Price Comparison Engine

Problem: Users can't compare prices across stores without manual checking.

Solution: Unified price index:

{
  "product": "MacBook Pro 14-inch M3",
  "stores": [
    {"store": "Apple", "url": "https://store.apple.com/...", "price": 1999},
    {"store": "Amazon", "url": "https://amazon.com/...", "price": 1899},
    {"store": "Best Buy", "url": "https://bestbuy.com/...", "price": 1949}
  ],
  "lowest_price": 1899,
  "highest_price": 1999,
  "currency": "USD"
}

Implementation:

Product pages crawled → price extracted (Trafilatura + custom extraction)
Price normalization: handle "From $X", "$X.99", "€X" formats
Currency conversion via API (exchangerate-api.com free tier)
New schema field: price_info in search results
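Price normalization and the lowest/highest summary can be sketched as below. The regular expression covers only symbol-prefixed prices with "," as the thousands separator; European formats such as "1.899,00 €" and currency conversion are deliberately left out. The function names and the rule of skipping offers in other currencies are assumptions.

```python
import re
from decimal import Decimal

CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}
# Symbol, whole part (with or without thousands separators), optional cents.
PRICE_RE = re.compile(r"([$€£])\s*(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")


def parse_price(text: str):
    """Extract (amount, currency) from strings like 'From $1,999' or '$1899.99'."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    symbol, whole, cents = match.groups()
    amount = Decimal(whole.replace(",", "") + "." + (cents or "00"))
    return amount, CURRENCY_SYMBOLS[symbol]


def build_price_info(product: str, offers, currency: str = "USD"):
    """offers is a list of (store, url, price_text); other currencies are skipped."""
    stores = []
    for store, url, price_text in offers:
        parsed = parse_price(price_text)
        if parsed is not None and parsed[1] == currency:
            stores.append({"store": store, "url": url, "price": parsed[0]})
    if not stores:
        return None
    prices = [entry["price"] for entry in stores]
    return {
        "product": product,
        "stores": stores,
        "lowest_price": min(prices),
        "highest_price": max(prices),
        "currency": currency,
    }
```

Decimal is used instead of float so that prices like 1899.99 compare and serialize exactly.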

Phase 5: Courses Tab

5.1 Roadmap-Based Learning Paths

Problem: Searching "react" returns random tutorials. No structured learning path.

Solution: Courses tab with goal-based roadmaps:

User selects: "I want to learn React development"

Roadmap: React Developer Path (Beginner → Expert)

Week 1-2: Foundations
  📹 [JavaScript Fundamentals] — 4hr video course
  📖 [MDN JavaScript Guide] — Documentation
  🧪 [JavaScript Exercises] — Interactive practice

Week 3-4: React Basics
  📹 [React Official Tutorial] — 3hr interactive
  📹 [React Fundamentals by Kent C. Dodds] — 6hr course
  📖 [React Beta Docs] — Official documentation

Week 5-8: Advanced Patterns
  📹 [Advanced React Patterns] — 4hr course
  📹 [React Performance] — 2hr deep dive
  🧪 [React Coding Challenges] — Practice problems

Week 9+: Real-world Projects
  📹 [Build a SaaS with React] — 8hr project course
  📹 [React + TypeScript] — 3hr course

Implementation:

New /courses tab in frontend
Goal selector: "Learn [topic]", "Master [skill]", "Prepare for [interview]"
Curated course database: manually maintained list of quality courses
Video + docs + interactive practice mix
Skill level progression: Beginner → Intermediate → Advanced → Expert
Provider sources: YouTube, Coursera, Udemy, freeCodeCamp, official docs

Data model:

{
  "course_id": "react-fundamentals",
  "title": "React Fundamentals",
  "provider": "YouTube",
  "url": "https://youtube.com/...",
  "duration_hours": 4,
  "skill_level": "beginner",
  "topics": ["react", "javascript", "frontend"],
  "type": "video",
  "is_free": true,
  "rating": 4.8,
  "review_count": 12500
}

5.2 Expert-Level Guidance

Problem: Advanced users searching "react internals" get beginner tutorials.

Solution: Expert mode with deep-dive content filtering:

User selects: "Expert" skill level

Results filtered to:
  📖 [React Fiber Architecture] — Deep dive into reconciliation
  📖 [React Compiler Architecture] — How the new compiler works
  📖 [React Source Code Walkthrough] — Reading React source
  📹 [Advanced React Patterns] — Compound components, render props
  📹 [React Performance Masterclass] — Profiling, optimization

Implementation:

Skill level attribute on all courses/content
Filter by skill_level in search
Inference: "internals", "source code", "architecture" → expert level
Include: conference talks, academic papers, source code analysis

Phase 6: Image Search Improvements

6.1 Visual Similarity Search

Problem: Current image search is text-based only (alt text, surrounding context).

Solution: CLIP-based visual similarity:

User searches: "cyberpunk city"
  → Text match: "cyberpunk city" in alt text
  → Visual match: CLIP similarity between the query text embedding and image embeddings
  → Results: Both keyword-matched AND visually similar images

Implementation:

Add transformers + open_clip to Rust via PyO3 bindings
Pre-compute CLIP embeddings for indexed images
Store 512-dim float vectors in Meilisearch
Query: encode text → search vectors by cosine similarity

Note: More compute-intensive, but feasible with ONNX runtime already in use.
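The ranking step reduces to cosine similarity between one query vector and many image vectors. The sketch below uses pure Python and toy low-dimensional vectors in place of real 512-dimensional CLIP embeddings. The function names are hypothetical, and a production system would use the vector store's own search rather than a linear scan.

```python
import math


def cosine_similarity(a, b) -> float:
    """cos(a, b) = (a · b) / (|a| * |b|); returns 0.0 for a zero vector."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, image_vecs: dict, k: int = 3):
    """Return the k image ids most similar to the query, best first."""
    scored = [
        (image_id, cosine_similarity(query_vec, vec))
        for image_id, vec in image_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because cosine similarity ignores vector length, embeddings can be L2-normalized once at indexing time, after which the score is just a dot product.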

6.2 Reverse Image Search

Problem: User has an image, wants to find similar ones or where it came from.

Solution: Upload-based reverse search:

User uploads: screenshot.jpg
  → Compute perceptual hash (dHash)
  → Search index for similar hashes (Hamming distance < 10)
  → Return: matching + visually similar images

Implementation:

New /api/images/reverse endpoint (multipart upload)
Compute dHash + ThumbHash of uploaded image
Query the dhash field in Meilisearch for hashes within the Hamming distance threshold
Source attribution (find original of meme/product image)
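The hash-and-compare step can be sketched without an imaging library by assuming the upload has already been decoded, converted to grayscale, and resized to 9×8 pixels. The function names and the in-memory index are assumptions; only the dHash rule (compare each pixel with its right-hand neighbor) and the distance threshold of 10 come from the description above.

```python
def dhash(gray_rows) -> int:
    """64-bit difference hash of an 8-row by 9-column grayscale grid.

    Each bit is 1 when a pixel is brighter than its right-hand neighbor.
    """
    if len(gray_rows) != 8 or any(len(row) != 9 for row in gray_rows):
        raise ValueError("expected 8 rows of 9 grayscale values")
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_similar(query_hash: int, index: dict, max_distance: int = 10):
    """Return image ids whose hash differs by fewer than max_distance bits."""
    matches = sorted(
        (hamming(query_hash, stored), image_id) for image_id, stored in index.items()
    )
    return [image_id for distance, image_id in matches if distance < max_distance]
```

A linear scan like find_similar is fine for a sketch; at index scale, the search would need a structure that supports nearest-neighbor lookup on bit strings.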

Phase 7: Privacy Hardening

7.1 Complete No-Log Policy

Problem: Current system logs query metadata for debugging.

Solution: A formal no-logging policy, verified by automated and third-party audits:

No logging at any layer:
  ✅ No query logs
  ✅ No IP address logging
  ✅ No User-Agent logging
  ✅ No referer logging
  ✅ No timing data retention

Verification:
  - Automated audit: grep all services for log statements with PII
  - Third-party privacy audit (annual)

Implementation:

Remove all tracing::info! with query content
Add --no-log compile flag that strips all debug logging
Regular automated grep for potential PII in logs
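The automated check can be sketched as a small scanner over Rust source text. The list of sensitive identifiers is an assumption, and the scanner only sees logging macros written on a single line; multi-line macro calls would need a real parser.

```python
import re

# Matches tracing-style logging macros such as info!(...) or tracing::warn!(...).
LOG_CALL = re.compile(r"\b(?:tracing::)?(?:trace|debug|info|warn|error)!\s*\(")
# Assumed identifiers that suggest query content or client metadata is logged.
SENSITIVE = re.compile(
    r"\b(query|ip|remote_addr|user_agent|referer)\b", re.IGNORECASE
)


def audit_source(path: str, text: str) -> list:
    """Return (path, line number, line) for each log call that mentions a sensitive name."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if LOG_CALL.search(line) and SENSITIVE.search(line):
            findings.append((path, lineno, line.strip()))
    return findings
```

Run in CI over every service, a non-empty result can fail the build, which turns the no-log policy into a checked invariant rather than a convention.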

7.2 Tor-Only Mode

Problem: Even with anti-detection, the user's ISP can see that they are connecting to IntentForge.

Solution: Optional Tor-only mode:

User enables: "Tor Required" mode

All requests route through Tor:
  → ISP sees: Tor traffic to guard node
  → Guard node sees: Encrypted traffic to the middle relay
  → IntentForge sees: Tor exit node IP (anonymous)

Trade-off: 3-5x latency increase

Implementation:

Arti (Rust Tor client) integration (already in Cargo.toml as optional)
UI toggle: "Privacy Mode: Tor"
Auto-fallback: if Tor is blocked, show warning
New config: tor_required = true

Priority Order

P0 (Must have):
  1.1 Vague query disambiguation
  2.1 Direct algorithm communication (preferences)
  3.1 Streaming + progressive results
  3.2 Persistent vector cache

P1 (Should have):
  4.1 Direct product links
  5.1 Courses tab + roadmaps
  3.3 Edge cache for popular queries

P2 (Nice to have):
  5.2 Expert-level guidance
  6.1 Visual similarity search
  2.2 Zero-trust request layer
  6.2 Reverse image search

P3 (Future):
  4.2 Price comparison engine
  7.1 Complete no-log policy
  7.2 Tor-only mode
  1.2 Context-aware intent memory

Contributing

IntentForge is open source under MIT License. See AGENTS.md for development guidelines.

Key areas for contribution:

Frontend (React/TypeScript) — Disambiguation UI, Courses tab, Preference editor
Rust Backend — Intent classifier improvements, Zero-trust layer
Python Query Layer — Embedding cache, cross-encoder optimization
Testing — Latency benchmarks, quality evaluation

Glossary

Term	Definition
Intent	The user's goal: find info, navigate to site, buy product, explore topic
BM25	Classic keyword search algorithm (relevance scoring)
Semantic Search	Vector-based search (finds conceptually similar content)
Cross-Encoder	Model that scores each query-document pair jointly; used as the precise second-stage re-ranker after a coarse first-pass search
Perceptual Hash	Image fingerprint for similarity comparison
ThumbHash	Compact visual representation (~20 bytes)
Disambiguation	Presenting choices when query has multiple meanings
DSL	Domain-specific language (preferences = mini DSL)