Epstein Files Hub - Bot Usage Guide
This guide explains how the AI agents work and how to interact with them effectively.
Overview
The Epstein Files Hub uses 26 specialized AI agents that work 24/7 to maintain and organize over 30,000 documents and 20,000 images.
Agent Categories
1. Document Management Agents (7 agents)
Document Indexing Agent
- Purpose: Indexes all incoming documents
- Capacity: 6,000 documents/day
- Features:
- Full-text extraction
- Metadata generation
- Tag assignment
- Cross-referencing
OCR Processing Agent
- Purpose: Converts scanned documents to searchable text
- Capacity: 5,000 pages/day
- Technology: Azure Document Intelligence
- Output: Searchable PDFs with extracted text
Document Analysis Agent
- Purpose: Deep content analysis
- Capacity: 3,000 documents/day
- Analysis:
- Entity extraction (people, places, organizations)
- Date/timeline extraction
- Relationship mapping
- Redaction detection
Document Verification Agent
- Purpose: Verifies document authenticity
- Capacity: 2,000 documents/day
- Checks:
- Source verification
- Duplicate detection
- Format validation
- Metadata consistency
Document Summarization Agent
- Purpose: Creates document summaries
- Capacity: 10,000 documents/day
- Output:
- Brief summary (50-100 words)
- Key points extraction
- Important dates and names
- Relevance scoring
Cross-Reference Agent
- Purpose: Links related documents
- Capacity: 15,000 documents/day
- Connections:
- Same people mentioned
- Same locations
- Same time periods
- Related cases
Document Classification Agent
- Purpose: Categorizes documents
- Capacity: 10,000 documents/day
- Categories:
- Flight logs
- Court filings
- Depositions
- Financial records
- Property documents
- Communications
Total Document Processing Capacity: 42,000+ operations/day
2. Image Management Agents (5 agents)
Image Indexing Agent
- Purpose: Catalogs all images
- Capacity: 10,000 images/day
- Features:
- Metadata extraction
- Tag assignment
- Location data
- Timestamp extraction
Image Analysis Agent
- Purpose: Visual content analysis
- Capacity: 5,000 images/day
- Technology: Azure Computer Vision
- Detection:
- Objects and scenes
- Structures and buildings
- Documents in photos
- Text in images (OCR)
Image Verification Agent
- Purpose: Verifies image authenticity
- Capacity: 3,000 images/day
- Checks:
- Duplicate detection
- Manipulation detection
- Source verification
- Quality assessment
Image Organization Agent
- Purpose: Organizes image library
- Capacity: 15,000 images/day
- Actions:
- Folder structure maintenance
- Collection creation
- Related image grouping
- Timeline arrangement
Image Maintenance Agent
- Purpose: Library upkeep
- Capacity: 5,000 images/day
- Tasks:
- Broken link checking
- Format optimization
- Backup verification
- Storage management
Total Image Processing Capacity: 26,000+ operations/day
3. Search & Retrieval Agents (3 agents - Backend Only)
Web Search Agent
- Purpose: Searches external sources for verification
- Engines: Google, Bing, DuckDuckGo, Azure Search
- Usage: Backend only (AI agents only)
- Functions:
- Fact verification
- Source discovery
- News monitoring
- Related content finding
Image Search Agent
- Purpose: Finds related images
- Engines: Google Images, Bing Images, TinEye
- Usage: Backend only (AI agents only)
- Functions:
- Reverse image search
- Similar image finding
- Source tracing
- Duplicate detection
Internal Search Agent
- Purpose: Searches internal database
- Technology: Azure Cognitive Search
- Features:
- Full-text search
- Semantic search
- Faceted search
- Relevance ranking
Note: External search engines are NOT exposed to users. Users only access the Internal Search Agent through the web UI.
4. Quality Control Agents (3 agents)
Fact-Checking Agent
- Purpose: Verifies facts and claims
- Process:
- Extract factual claims
- Search authoritative sources
- Compare and verify
- Flag inconsistencies
- Sources:
- Court records
- Government databases
- Verified news sources
- Academic publications
Source Verification Agent
- Purpose: Validates document sources
- Checks:
- Source authenticity
- Publication verification
- Chain of custody
- Legal compliance
- Actions:
- Assign verification level (1-5)
- Flag suspicious sources
- Recommend additional verification
Content Moderation Agent
- Purpose: Ensures content quality
- Monitors:
- Inappropriate content
- Unverified claims
- Duplicate submissions
- Privacy violations
- Actions:
- Flag for review
- Auto-reject if clearly inappropriate
- Notify moderators
- Log incidents
5. Organization Agents (4 agents)
Collection Management Agent
- Purpose: Maintains document collections
- Functions:
- Create thematic collections
- Update collection metadata
- Remove outdated items
- Generate collection summaries
Timeline Generation Agent
- Purpose: Creates chronological timelines
- Features:
- Extract dates from documents
- Build event sequences
- Link related events
- Generate visual timelines
Relationship Mapping Agent
- Purpose: Maps connections between entities
- Outputs:
- Person-to-person connections
- Person-to-location links
- Organization relationships
- Timeline of interactions
Auto-Tagging Agent
- Purpose: Automatically tags content
- Tags:
- People mentioned
- Locations
- Organizations
- Topics
- Date ranges
- Document types
6. Monitoring Agents (2 agents)
System Health Agent
- Purpose: Monitors system health
- Checks:
- Bot performance
- Error rates
- Processing queues
- Storage capacity
- API rate limits
- Alerts:
- Critical errors
- Performance degradation
- Capacity warnings
- Service outages
Performance Optimization Agent
- Purpose: Optimizes system performance
- Actions:
- Identify bottlenecks
- Adjust processing priorities
- Optimize database queries
- Manage caching
- Balance load
7. User Support Agents (2 agents)
Search Assistant Agent
- Purpose: Helps users find information
- Features:
- Query interpretation
- Search suggestions
- Related content recommendations
- Result explanation
Help & Documentation Agent
- Purpose: Provides user assistance
- Functions:
- Answer common questions
- Generate help content
- Provide usage tips
- Troubleshoot issues
How to Use the Bots
For End Users
Using the Search Feature
- Go to Search Page
- Enter your search criteria
- Internal Search Agent processes your query
- Results display with relevance scores
- Click to view full documents
Search Tips:
- Use specific keywords for better results
- Filter by date range for focused searches
- Use location filters to narrow results
- Check redaction status filters
- Try advanced filters for precision
Uploading Documents
- Visit Upload Page
- Select PDF file(s)
- The PDF Analysis Agent automatically analyzes each file
- The Document Routing Agent decides based on the relevance score:
  - ≥70% relevance → Accepted & indexed
  - <70% relevance → Rejected & trashed
- Receive notification of results
Upload Guidelines:
- PDFs only
- Maximum 50MB per file
- Must be Epstein-related
- No copyrighted material
- Verifiable sources preferred
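The mechanical parts of these guidelines (PDF only, 50MB limit) can be checked locally before uploading. The following is a minimal sketch, not the hub's actual validation code: the function name `check_upload`, the reading of "50MB" as 50 × 1024 × 1024 bytes, and the `%PDF-` header check are all assumptions made for illustration.

```python
from pathlib import Path

# "Maximum 50MB per file"; treating MB as binary megabytes is an assumption.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def check_upload(path: Path) -> list[str]:
    """Return a list of guideline violations; an empty list means the file passes."""
    problems = []
    if path.suffix.lower() != ".pdf":
        problems.append("not a .pdf file")
    if not path.is_file():
        problems.append("file does not exist")
        return problems
    if path.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append("larger than 50MB")
    with path.open("rb") as handle:
        # Well-formed PDF files start with the "%PDF-" marker.
        if handle.read(5) != b"%PDF-":
            problems.append("missing %PDF- header")
    return problems
```

Relevance, copyright, and source quality cannot be checked this way; those remain for the agents and human reviewers.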
For Contributors
Submitting via GitHub
# 1. Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/Hub_of_Epstein_Files_Directory.git
cd Hub_of_Epstein_Files_Directory
# 2. Add document (replace the placeholder path with your file's location)
cp /path/to/document.pdf data/uploads/
# 3. Create metadata (quoted 'EOF' keeps the shell from expanding the JSON)
cat > data/uploads/document.json << 'EOF'
{
  "title": "Document Title",
  "date": "2024-01-01",
  "source": "Court Records",
  "relevance": "High"
}
EOF
# 4. Commit and push
git add data/uploads/
git commit -m "Add new court filing"
git push origin main
# 5. Create pull request
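Before committing, it can help to confirm that the metadata file is valid JSON with the expected fields. This is a hedged sketch: the guide only shows the four fields in the sample above, so treating all of them as required, and the name `validate_metadata`, are assumptions rather than documented rules.

```python
import json
from datetime import date

# Fields taken from the sample metadata; treating all as required is an assumption.
REQUIRED_FIELDS = ("title", "date", "source", "relevance")


def validate_metadata(raw: str) -> list[str]:
    """Return a list of problems found in a metadata JSON string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "date" in data:
        try:
            # The sample uses an ISO 8601 calendar date such as 2024-01-01.
            date.fromisoformat(str(data["date"]))
        except ValueError:
            errors.append("date is not an ISO 8601 calendar date")
    return errors
```

An empty list means the file matches the sample's shape; it says nothing about whether the workflow will accept the document.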
Automated Processing:
- GitHub Actions workflow triggers
- PDF Analysis Bot processes document
- If relevant (≥70%), automatically merged
- If uncertain (40-69%), flagged for manual review
- If irrelevant (<40%), rejected with explanation
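The three outcomes above amount to a threshold check on the relevance score. A minimal sketch, assuming the score is a percentage between 0 and 100; the function name `triage` and the returned labels are illustrative, not taken from the workflow itself.

```python
def triage(relevance_percent: float) -> str:
    """Map a relevance score (0-100) to the pull request outcome described above."""
    if not 0 <= relevance_percent <= 100:
        raise ValueError("relevance must be between 0 and 100")
    if relevance_percent >= 70:
        return "merge"          # relevant: automatically merged
    if relevance_percent >= 40:
        return "manual_review"  # uncertain: flagged for a maintainer
    return "reject"             # irrelevant: rejected with explanation
```

The guide gives the middle band as 40-69%; this sketch treats fractional scores such as 69.5 as part of that band, which is an assumption.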
For Developers
Adding New Bots
# bots/your-new-bot/bot.py
from typing import Any, Dict

# Only needed if the bot calls Azure Document Intelligence
# (provided by the azure-ai-formrecognizer package).
from azure.ai.formrecognizer import DocumentAnalysisClient


class YourNewBot:
    """
    Description of what this bot does.

    Capacity: X operations/day
    Dependencies: Azure service, etc.
    """

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        # Initialize services

    def process(self, input_data: Any) -> Dict[str, Any]:
        """
        Main processing function.

        Args:
            input_data: Input to process

        Returns:
            Processing results
        """
        # Implementation
        raise NotImplementedError
Steps:
- Create bot directory in bots/
- Implement bot class
- Create README documenting bot
- Add to bots/AGENT_INFRASTRUCTURE.md
- Create GitHub Actions workflow
- Write tests
- Create GitHub Actions workflow
- Write tests
- Submit PR
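To make the "Implement bot class" and "Write tests" steps concrete, here is a small sketch of a bot that follows the same `__init__`/`process` shape, together with a unit test. `WordCountBot` and its `min_words` setting are hypothetical and exist only for illustration; they are not part of the project.

```python
import unittest
from typing import Any, Dict


class WordCountBot:
    """Hypothetical example bot: counts the words in a text input."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        # Hypothetical setting, not part of the project's real configuration.
        self.min_words = int(config.get("min_words", 1))

    def process(self, input_data: Any) -> Dict[str, Any]:
        if not isinstance(input_data, str):
            raise TypeError("input_data must be a string")
        count = len(input_data.split())
        return {"word_count": count, "accepted": count >= self.min_words}


class WordCountBotTest(unittest.TestCase):
    def test_counts_words(self):
        result = WordCountBot({"min_words": 2}).process("flight log 1997")
        self.assertEqual(result, {"word_count": 3, "accepted": True})

    def test_rejects_non_text(self):
        with self.assertRaises(TypeError):
            WordCountBot({}).process(b"raw bytes")
```

Tests written this way can be run with `python -m unittest` from the directory that contains them.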
Check Application Insights:
# Azure CLI
az monitor metrics list \
  --resource /subscriptions/SUB_ID/resourceGroups/RG/providers/Microsoft.Insights/components/APP_INSIGHTS \
  --metrics requests/count \
  --aggregation Average
Dashboard available at: Monitor Dashboard
Bot Coordination
Agents work together in workflows:
Document Upload Workflow
- Upload → PDF Analysis Bot
- Analysis → Document Routing Bot
- If accepted:
  - OCR → OCR Processing Agent
  - Index → Document Indexing Agent
  - Analyze → Document Analysis Agent
  - Verify → Document Verification Agent
  - Summarize → Document Summarization Agent
  - Link → Cross-Reference Agent
  - Classify → Document Classification Agent
- Complete → Notification
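The accepted-document path above is a fixed sequence of stages, each handing its result to the next. The sketch below shows one way such a chain could be wired; `run_pipeline`, the stage names, and the placeholder stages are assumptions for illustration and do not reflect how the agents are actually orchestrated.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]

# Stage order taken from the workflow list above.
STAGE_ORDER = ["ocr", "index", "analyze", "verify", "summarize", "link", "classify"]


def run_pipeline(document: Dict[str, Any], stages: List[Tuple[str, Stage]]) -> Dict[str, Any]:
    """Run each stage in order, passing the evolving document state along."""
    state = dict(document)
    completed = []
    for name, stage in stages:
        state = stage(state)
        completed.append(name)
    return {**state, "completed": completed}


def placeholder(name: str) -> Stage:
    """Build a stand-in stage that only records that it ran."""
    def stage(state: Dict[str, Any]) -> Dict[str, Any]:
        return {**state, name: "done"}
    return stage


result = run_pipeline({"id": "doc-1"}, [(n, placeholder(n)) for n in STAGE_ORDER])
print(result["completed"])
```

If any stage raises an exception, the loop stops at that point, so later stages never see a document that failed an earlier one.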
Image Upload Workflow
- Upload → Image Analysis Bot
- Analysis → Image Routing Bot
- If accepted:
  - Index → Image Indexing Agent
  - Analyze → Image Analysis Agent
  - Verify → Image Verification Agent
  - Organize → Image Organization Agent
- Complete → Notification
Search Query Workflow
- Query → Search Assistant Agent
- Process → Internal Search Agent
- Rank → Relevance Scoring
- Results → User Interface
Troubleshooting
Bot Not Processing
Check:
- GitHub Actions status
- Azure service health
- Processing queue length
- Error logs in App Insights
Solution:
- Restart workflow
- Check API limits
- Review error messages
- Contact maintainers
Slow Processing
Causes:
- High queue length
- API rate limits
- Service degradation
- Large file size
Solution:
- Wait for queue to clear
- Upload smaller batches
- Try during off-peak hours
Document Rejected
Reasons:
- Relevance score < 70%
- Duplicate detected
- Invalid format
- Copyright issue
Solution:
- Check relevance criteria
- Verify it’s unique content
- Ensure proper PDF format
- Confirm source rights
API Documentation
For developers integrating with the system:
Search API
GET /api/search?q=query&type=document&from=0&size=20
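Query strings like the one above are safer to build with a URL encoder than by hand, since search terms often contain spaces or punctuation. A minimal sketch using only the standard library; the base URL `https://hub.example` is a placeholder, and reading `from` as a zero-based result offset and `size` as the page size is an assumption based on the parameter names.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, doc_type: str = "document",
                     offset: int = 0, size: int = 20) -> str:
    """Build a Search API URL with properly encoded parameters."""
    params = {"q": query, "type": doc_type, "from": offset, "size": size}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Placeholder host; substitute the real deployment URL.
print(build_search_url("https://hub.example", "flight logs 1997"))
# https://hub.example/api/search?q=flight+logs+1997&type=document&from=0&size=20
```

Under the same assumption, the second page of 20 results would use `offset=20`.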
Upload API
POST /api/upload
Content-Type: multipart/form-data
file: <binary data>
metadata: <json>
Status API
GET /api/status/{upload_id}
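A client would typically poll this endpoint until processing finishes. The sketch below shows the polling loop only. The response schema is not documented in this guide, so the `state` field and the terminal values are hypothetical, and the HTTP call is left to a caller-supplied `fetch_status` function.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal states; the real Status API schema is not documented here.
TERMINAL_STATES = {"accepted", "rejected", "failed"}


def wait_for_upload(
    upload_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll GET /api/status/{upload_id} until a terminal state or the attempt limit."""
    path = f"/api/status/{upload_id}"
    for _ in range(max_attempts):
        payload = fetch_status(path)
        # "state" is an assumed field name in the JSON response.
        if payload.get("state") in TERMINAL_STATES:
            return payload
        sleep(interval_seconds)
    raise TimeoutError(f"upload {upload_id} still pending after {max_attempts} checks")
```

Passing `fetch_status` and `sleep` as parameters keeps the loop independent of any particular HTTP library and lets it be tested without network access or real delays.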
Full API docs: API Documentation
Best Practices
- Upload Guidelines:
  - Verify documents before uploading
  - Include source information
  - Use descriptive filenames
  - Check for duplicates first
- Search Tips:
  - Start broad, then refine
  - Use filters effectively
  - Check related documents
  - Review source verification levels
- Contributing:
  - Follow contribution guidelines
  - Test before submitting
  - Document your changes
  - Be patient with review process
Support
- Documentation: Check this guide and other docs
- Issues: Open GitHub issue
- Questions: Use GitHub Discussions
- Urgent: Email maintainers (see README)
Last Updated: December 2024