Investigation Report & Long-Term Solution Plan - May 14, 2026
1. Executive Summary
Following the successful restoration of core search quality by fixing the ONNX model stubs, several functional gaps remain in the IntentForge v2 stack. These gaps stem primarily from blocking by external providers, architectural limitations, and incomplete API specifications. This document outlines the root causes and a phased plan for long-term resolution.
2. Issues & Root Causes
2.1 /news → 0 Results
- Symptoms: Queries to /news return empty lists.
- Investigation: GoogleNewsProvider uses direct connections, which Google likely blocks for data-center IP ranges. DuckDuckGoNewsProvider and BingNewsProvider route through Tor, whose exit nodes are aggressively blocked by the respective news endpoints.
- Root Cause: Inconsistent and easily detectable proxying strategies for real-time news aggregation.
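The Phase 1 remedy replaces per-provider choices (Tor for some, direct for others) with one consistent route policy. A minimal sketch, assuming hypothetical route identifiers for the VPN, CF Worker, and public-proxy options and a caller-supplied fetch function (neither is an existing IntentForge API): routes are tried in order, and every failure is recorded, so a block shows up in logs instead of as a silent empty list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical route identifiers mirroring the middle-route options in Phase 1.
MIDDLE_ROUTES = ("vpn", "cf_worker", "public_proxy")


@dataclass
class FetchOutcome:
    route: Optional[str]  # route that produced results, or None if all failed
    items: List[dict]
    failures: List[str] = field(default_factory=list)


def fetch_with_fallback(
    query: str,
    fetch: Callable[[str, str], List[dict]],
    routes: Sequence[str] = MIDDLE_ROUTES,
) -> FetchOutcome:
    """Try each route in order and return the first non-empty result.

    An empty list is recorded as a soft failure (a likely block) rather than
    accepted as "no news", so the next route is still attempted.
    """
    failures: List[str] = []
    for route in routes:
        try:
            items = fetch(route, query)
        except Exception as exc:  # connection errors, HTTP 403/429 wrappers, etc.
            failures.append(f"{route}: {exc}")
            continue
        if items:
            return FetchOutcome(route=route, items=items, failures=failures)
        failures.append(f"{route}: empty result")
    return FetchOutcome(route=None, items=[], failures=failures)
```

Treating an empty response as a failure costs a few extra requests for genuinely quiet queries, but it keeps a blocked route from masquerading as "no news".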
2.2 /videos → 0 Results / CORS Issue
- Symptoms: Video searches return 0 results; users report CORS errors.
- Investigation:
  - youtube-unified (Node.js) lacks CORS middleware.
  - The main Rust API (Axum) also lacks a CORS configuration.
  - yt-dlp updates are failing because the auto-update.sh script is reported as "not found". This is likely a line-ending issue in the Docker container: with CRLF line endings, the interpreter path in the shebang line (or each command name) ends in a carriage return, which the shell cannot find.
- Root Cause: Missing cross-origin resource sharing (CORS) configuration and a broken background update mechanism.
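A minimal sketch of a check-and-fix step for the line-ending problem; the helper names are hypothetical and this is not the current auto-update.sh tooling. shebang_interpreter shows the interpreter path as the kernel would read it (a trailing carriage return included), and normalize_line_endings rewrites CRLF as LF in place.

```python
import re
from pathlib import Path

# Matches "#!", optional spaces/tabs, then the interpreter path. A trailing
# carriage return is deliberately NOT excluded, mirroring how the kernel reads it.
SHEBANG = re.compile(rb"#![ \t]*([^ \t\n]+)")


def shebang_interpreter(data: bytes) -> bytes:
    """Return the interpreter path as the kernel would see it (empty if no shebang)."""
    match = SHEBANG.match(data)
    return match.group(1) if match else b""


def normalize_line_endings(path: Path) -> bool:
    """Rewrite CRLF line endings as LF in place; return True if the file changed.

    Works on raw bytes so the script's encoding is left untouched.
    """
    data = path.read_bytes()
    if b"\r\n" not in data:
        return False
    path.write_bytes(data.replace(b"\r\n", b"\n"))
    return True
```

Adding a .gitattributes rule such as "*.sh text eol=lf" would keep Windows checkouts from reintroducing CRLF after the fix.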
2.3 Non-English Support → 0 Results
- Symptoms: Queries in languages other than English return no results.
- Root Cause: The system uses all-MiniLM-L6-v2, an English-only embedding model. The discovery pipeline likely filters out non-English content or fails to index it effectively.
2.4 Missing Pagination Metadata
- Symptoms: SearchResponse does not include page or total_pages.
- Root Cause: The API response structure was never updated to support full pagination metadata.
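The missing fields can be derived from values the aggregator already has: the requested page and limit plus the total hit count. A minimal sketch in Python for brevity (the real SearchResponse is a Rust struct); the field names follow Phase 3, while the 1-based page convention and an in-memory, already-ranked hit list are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResponse:
    results: List[str]
    page: int         # 1-based page number that was served (assumed convention)
    limit: int        # maximum results per page
    total_pages: int  # 0 when there are no hits at all


def paginate(hits: List[str], page: int, limit: int) -> SearchResponse:
    """Slice an already-ranked hit list and attach pagination metadata."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be >= 1")
    total_pages = math.ceil(len(hits) / limit)
    offset = (page - 1) * limit
    return SearchResponse(
        results=hits[offset:offset + limit],  # empty list past the last page
        page=page,
        limit=limit,
        total_pages=total_pages,
    )
```

On the Meilisearch side, querying with its page and hitsPerPage search parameters returns totalHits and totalPages directly; merged meta-search results would still need this arithmetic after deduplication.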
2.5 Spam Query Handling
- Symptoms: Spam-like queries return 0 results.
- Root Cause: A hardcoded threshold (spam_score < 0.5) in the query rewriter.
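Making the threshold configurable amounts to replacing the constant with a validated lookup that falls back to today's 0.5. A minimal sketch, assuming the parsed config.yaml arrives as a plain dict and that the key is query_rewriter.spam_threshold; the key and function names are hypothetical, and the comparison keeps the existing spam_score < threshold convention.

```python
from typing import Any, Mapping

DEFAULT_SPAM_THRESHOLD = 0.5  # the value currently hardcoded in the query rewriter


def spam_threshold(config: Mapping[str, Any]) -> float:
    """Read query_rewriter.spam_threshold (hypothetical key), defaulting to 0.5."""
    section = config.get("query_rewriter") or {}
    value = section.get("spam_threshold", DEFAULT_SPAM_THRESHOLD)
    try:
        threshold = float(value)
    except (TypeError, ValueError):
        raise ValueError(f"spam_threshold must be a number, got {value!r}") from None
    # Assumes spam scores are normalized to [0, 1].
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"spam_threshold must be in [0, 1], got {threshold}")
    return threshold


def passes_spam_filter(spam_score: float, threshold: float) -> bool:
    """Same comparison as the hardcoded check: accept when score < threshold."""
    return spam_score < threshold
```

Failing loudly on an out-of-range value is deliberate: a typo such as 5 instead of 0.5 would otherwise let every query through.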
3. Long-Term Solution Plan
Phase 1: Proxy & Infrastructure Standardization
- Middle-Route Implementation: Update all news providers to use build_middle_route_client (VPN/CF Worker/Public Proxy) instead of Tor or direct connections. This maintains privacy while bypassing Tor exit-node blocks.
- Docker Fixes: Convert services/youtube-unified/auto-update.sh to LF line endings and ensure it runs correctly in the Alpine container.
- CORS Support:
  - Add the cors package to youtube-unified and enable it.
  - Add a tower-http CORS layer to src/api/mod.rs to allow frontend integration.
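The cors package and the tower-http CORS layer both emit the same kind of response headers; the framework-neutral sketch below shows that logic so the allowed origins, methods, and headers can be agreed once for both services. The origin list and header values are placeholders, and this is not a substitute for either library's own configuration.

```python
from typing import Dict, FrozenSet, Optional

# Placeholder allow-list and values; real deployments would load these from config.
ALLOWED_ORIGINS: FrozenSet[str] = frozenset({"http://localhost:3000"})
ALLOWED_METHODS = "GET, POST, OPTIONS"
ALLOWED_HEADERS = "Content-Type, Authorization"


def cors_headers(origin: Optional[str], method: str) -> Dict[str, str]:
    """Return the CORS response headers for a request.

    Unknown or missing origins get no Access-Control-Allow-Origin header,
    so the browser withholds the response from the page.
    """
    headers = {"Vary": "Origin"}  # response varies by Origin; keeps caches correct
    if origin is None or origin not in ALLOWED_ORIGINS:
        return headers
    headers["Access-Control-Allow-Origin"] = origin
    if method.upper() == "OPTIONS":  # preflight request
        headers["Access-Control-Allow-Methods"] = ALLOWED_METHODS
        headers["Access-Control-Allow-Headers"] = ALLOWED_HEADERS
        headers["Access-Control-Max-Age"] = "600"  # seconds a preflight may be cached
    return headers
```

Echoing the matched origin instead of sending "*" leaves room for credentialed requests later, since browsers reject the wildcard when credentials are included.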
Phase 2: Multilingual Capabilities
- Model Upgrade: Switch the default embedding model to paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions) or BGE-M3 (1024 dimensions).
- Indexing Updates: Modify scripts/ensure_models.py to handle multilingual models and ensure they are correctly loaded by the ONNX inference engine.
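Each model defines its own embedding space, so any switch means re-embedding the existing corpus, and BGE-M3 additionally changes the vector width from the current 384. A small guard could let the indexing path refuse to mix vectors from different models. In the sketch below the registry and function names are hypothetical; the dimensions are the published output sizes of these models (dense vectors for BGE-M3).

```python
from dataclasses import dataclass

# Output dimensions of the models named in this report (BGE-M3: dense vectors).
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "paraphrase-multilingual-MiniLM-L12-v2": 384,
    "BGE-M3": 1024,
}


@dataclass(frozen=True)
class IndexInfo:
    model: str      # model that produced the stored vectors
    dimension: int  # vector width the index was created with


def check_model_switch(index: IndexInfo, new_model: str) -> str:
    """Classify what switching the embedding model requires.

    "ok":      same model, nothing to do.
    "reembed": same width, but vectors must be regenerated (different space).
    "rebuild": width changes; assumes the vector store fixes the dimension at
               index creation, so the index itself must be recreated.
    """
    if new_model not in MODEL_DIMENSIONS:
        raise KeyError(f"unknown model: {new_model}")
    if new_model == index.model:
        return "ok"
    if MODEL_DIMENSIONS[new_model] == index.dimension:
        return "reembed"
    return "rebuild"
```

Keeping paraphrase-multilingual-MiniLM-L12-v2 as the first step would avoid an index rebuild, while BGE-M3 trades a rebuild for larger vectors.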
Phase 3: API & Quality Improvements
- Pagination:
  - Update the SearchResponse struct to include page, limit, and total_pages.
  - Implement pagination logic in the Meilisearch aggregator and meta-search providers.
- Configurable Spam Filtering: Move the spam_score threshold to config.yaml so that site administrators can tune how aggressively the filter rejects queries.
- RSS Robustness: Enhance GoogleNewsProvider with multiple fallback RSS feeds and a more resilient parsing strategy.
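One reading of "multiple fallback RSS feeds and a more resilient parsing strategy": try each feed in turn, skip individual items that lack the fields the API needs instead of failing the whole feed, and move on when a feed cannot be fetched or parsed. The sketch uses only the standard library's xml.etree; the function names are hypothetical, and fetching is left to the caller (ideally through the middle-route client).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List


def parse_rss_items(xml_text: str) -> List[Dict[str, str]]:
    """Extract title/link/pubDate from RSS 2.0 <item> elements.

    Items missing a title or link are skipped rather than aborting the feed.
    Raises ET.ParseError if the document is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not title or not link:
            continue
        items.append({
            "title": title,
            "link": link,
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


def first_nonempty_feed(
    feed_urls: Iterable[str], fetch_text: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Return items from the first feed that downloads, parses, and is non-empty."""
    for url in feed_urls:
        try:
            items = parse_rss_items(fetch_text(url))
        except (OSError, ET.ParseError):
            continue  # network failure or malformed XML: try the next feed
        if items:
            return items
    return []
```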
4. Immediate Next Steps (Proposed)
- Fix line endings in auto-update.sh and rebuild youtube-unified.
- Add CORS headers to both the Rust and Node.js APIs.
- Update scripts/ensure_models.py to check model file sizes so that ONNX stubs are detected and replaced instead of being treated as valid models.
- Refactor GoogleNewsProvider to use the middle route.
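The file-size check can be as simple as comparing each model file against a per-file floor far below the real weights' size, so an undersized file is reported as a problem rather than accepted as a downloaded model. A minimal sketch, assuming a hypothetical file list and thresholds (illustrative, not measured); it also flags Git LFS pointer files, one common form of stub.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical per-file minimum sizes in bytes. Set them per model: exports that
# use ONNX external data keep a small model.onnx next to a large data file.
MIN_MODEL_BYTES: Dict[str, int] = {
    "model.onnx": 1_000_000,
}

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_stub_files(model_dir: Path) -> List[str]:
    """Return human-readable problems for missing or stub-sized model files."""
    problems = []
    for name, min_bytes in MIN_MODEL_BYTES.items():
        path = model_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        size = path.stat().st_size
        if size < min_bytes:
            with path.open("rb") as fh:
                head = fh.read(len(LFS_POINTER_PREFIX))
            kind = "Git LFS pointer" if head == LFS_POINTER_PREFIX else "stub"
            problems.append(f"{name}: {size} bytes, looks like a {kind}")
    return problems
```

Running a check like this before the ONNX session is created would turn a silent search-quality regression into an explicit startup error.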