This guide covers the implementation of safe, legal, and ethical source expansion for the Epstein Files Hub, including Wikipedia integration for comprehensive data on dates, times, locations, and characters.
Script: scripts/fetch-wikipedia-data.py
What it fetches:
Wikipedia articles monitored:
Outputs generated:
- data/wikipedia/character_profiles.json - Comprehensive profiles
- data/wikipedia/timeline.json - Chronological events
- data/wikipedia/locations_guide.json - Location information

Schedule: Weekly (Sundays at 3 AM UTC)

Cost: $0 (uses the free Wikipedia API)
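The guide does not show how the script talks to Wikipedia. As a minimal sketch, assuming the MediaWiki Action API with the TextExtracts `prop=extracts` module (available on English Wikipedia), one article's plain text can be fetched with the standard library alone. The function names and the `USER_AGENT` contact string are hypothetical; Wikimedia's User-Agent policy does ask automated clients to identify themselves.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical contact string; replace with a real project URL or email.
USER_AGENT = "EpsteinFilesHub/0.1 (https://example.org/contact)"


def build_extract_url(title: str) -> str:
    """Build an Action API URL requesting a plain-text extract of one article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",   # provided by the TextExtracts extension
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects such as alternate spellings
        "titles": title,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def parse_extract(payload: dict) -> str:
    """Return the extract text from an API response, or '' if the page is missing."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        if "missing" not in page:
            return page.get("extract", "")
    return ""


def fetch_extract(title: str, timeout: float = 30.0) -> str:
    """Fetch one article's plain-text extract (performs a network request)."""
    request = urllib.request.Request(
        build_extract_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_extract(json.load(response))
```

For example, `len(fetch_extract("Jeffrey_Epstein").split())` gives a rough word count of the kind the script reports when it runs. Keeping URL building and response parsing separate from the network call makes both testable offline.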
Script: scripts/safe-source-expander.py
Sources monitored:
How it works:
Schedule: Daily at 2 AM UTC

Cost: $0 (all free public APIs)
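The steps of the discovery run are not spelled out here. One plausible shape, sketched under assumptions: each enabled entry in a SAFE_SOURCES-style dict is queried by a per-source fetcher, and only items with a previously unseen URL are kept as candidates for human review. The fetcher functions, the `seen_urls` store, and the item shape (`url`, `title`) are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

Item = Dict[str, str]
Fetcher = Callable[[dict], Iterable[Item]]


def discover_new_items(
    sources: Dict[str, dict],
    fetchers: Dict[str, Fetcher],
    seen_urls: Set[str],
) -> List[Item]:
    """Query every enabled source and return items whose URL has not been seen before.

    `sources` mirrors a SAFE_SOURCES-style config; `fetchers` maps a source key
    to a (hypothetical) function that queries that source's public API.
    `seen_urls` is updated in place with every newly accepted URL.
    """
    new_items: List[Item] = []
    for key, config in sources.items():
        if not config.get("enabled", True):
            continue  # disabled sources are skipped entirely
        fetch = fetchers.get(key)
        if fetch is None:
            continue  # no fetcher registered for this source
        for item in fetch(config):
            url = item.get("url")
            if not url or url in seen_urls:
                continue  # skip items without a URL and anything already discovered
            seen_urls.add(url)
            new_items.append({**item, "source": config.get("name", key)})
    return new_items
```

The returned items are only candidates; nothing is downloaded at this stage, which leaves room for the approval step before any download.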
```bash
# Install dependencies
pip install -r requirements.txt

# Run Wikipedia integration
python scripts/fetch-wikipedia-data.py
```
Output:
```text
Fetching: Jeffrey_Epstein
Saved: Jeffrey_Epstein.json
15234 words, 87 dates, 12 locations, 45 persons
Generating aggregated data...
Generated 15 character profiles
Generated timeline with 234 events
Generated location guide with 18 locations
Wikipedia integration complete!
```
```bash
# Run discovery across all sources
python scripts/safe-source-expander.py
```
Output:
```text
Checking Internet Archive...
Found 12 items
Checking DocumentCloud...
Found 8 documents
Checking Wikimedia Commons...
Found 5 media files
Discovery complete!
Found 35 new items across all sources
Saved discoveries to: data/discovered_sources/discoveries_20240120_140530.json
Generated report: data/discovered_sources/discovery_report_20240120_140530.md
```
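The timestamped filenames in this output suggest a naming scheme of the form `discoveries_YYYYMMDD_HHMMSS.json`. A minimal sketch of writing such a file, assuming a hypothetical payload with `generated_at`, `count`, and `items` keys (the real file layout is not documented here):

```python
import json
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def save_discoveries(
    items: List[dict], out_dir: Path, now: Optional[datetime] = None
) -> Path:
    """Write discovered items to a timestamped JSON file and return its path.

    Hypothetical helper: the filename pattern matches the example output,
    but the payload keys are assumptions.
    """
    now = now or datetime.now()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"discoveries_{now:%Y%m%d_%H%M%S}.json"
    payload = {"generated_at": now.isoformat(), "count": len(items), "items": items}
    with path.open("w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2, ensure_ascii=False)
    return path
```

Passing `now` explicitly keeps the filename deterministic in tests; in a scheduled run it can be omitted.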
Both scripts run automatically via GitHub Actions:
- .github/workflows/wikipedia-integration.yml
- .github/workflows/source-discovery.yml

When new sources are discovered:

- `source-discovery`, `needs-review`
- `approve: [item title or URL]`
- `reject: [item title] - [reason]`

Dates extracted:
Locations extracted:
Persons extracted:
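For dates, a simple regular-expression pass is one plausible approach; the sketch below is an assumption, not the script's actual extractor. It recognizes "July 6, 2019"-style dates, ISO dates, and bare years from 1900 to 2099, normalizing each to `YYYY-MM-DD` or `YYYY`. Locations and persons generally need link parsing or named-entity recognition and are not attempted here.

```python
import re
from typing import List

MONTHS = {
    m: i
    for i, m in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# "July 6, 2019" style dates, as commonly written in English Wikipedia prose.
MDY = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")
# ISO dates such as 2019-07-06.
ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# Bare years from 1900 to 2099.
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")


def extract_dates(text: str) -> List[str]:
    """Return unique dates found in text, normalized to YYYY-MM-DD or YYYY, sorted.

    No calendar validation is done, so an impossible date like "February 30"
    passes through unchanged.
    """
    found = set()
    for month, day, year in MDY.findall(text):
        found.add(f"{year}-{MONTHS[month]:02d}-{int(day):02d}")
    for year, month, day in ISO.findall(text):
        found.add(f"{year}-{month}-{day}")
    # Blank out full dates first so their years are not also counted as bare years.
    remainder = ISO.sub(" ", MDY.sub(" ", text))
    found.update(YEAR.findall(remainder))
    return sorted(found)
```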
Each profile includes:
```json
{
  "name": "Person Name",
  "source": "Wikipedia",
  "url": "https://en.wikipedia.org/wiki/...",
  "summary": "Brief description...",
  "associated_dates": ["1990", "2005", "2019"],
  "associated_locations": ["Palm Beach", "Manhattan"],
  "associated_persons": ["Related Person 1", "Related Person 2"],
  "last_updated": "2024-01-20T14:05:30"
}
```
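Code that reads or writes these profiles can use a typed mirror of the example entry. A minimal sketch as a Python dataclass; the class name is hypothetical, and the field names are copied from the example JSON:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List


@dataclass
class CharacterProfile:
    """Typed mirror of one entry in character_profiles.json (hypothetical class)."""
    name: str
    url: str
    summary: str
    source: str = "Wikipedia"
    associated_dates: List[str] = field(default_factory=list)
    associated_locations: List[str] = field(default_factory=list)
    associated_persons: List[str] = field(default_factory=list)
    # ISO timestamp without fractional seconds, matching the example format.
    last_updated: str = field(
        default_factory=lambda: datetime.now().isoformat(timespec="seconds")
    )

    def to_json(self) -> str:
        """Serialize to a JSON object with the same keys as the example."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)
```

Key order in the output follows the field order above rather than the example, which does not matter to JSON consumers.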
```json
{
  "date": "2019-07-06",
  "source": "Jeffrey_Epstein",
  "url": "https://en.wikipedia.org/wiki/Jeffrey_Epstein",
  "context": "Arrest at Teterboro Airport"
}
```
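Timeline entries mix precisions: a bare year like "1990" sits next to full dates like "2019-07-06". Assuming all dates are ISO-style strings, plain string sorting already orders them chronologically, with a bare year placed before the more precise dates in that year. A hypothetical helper that also drops duplicate (date, context) pairs:

```python
from typing import Dict, List


def build_timeline(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Deduplicate events by (date, context) and sort them chronologically.

    Assumes ISO-style dates of varying precision ("2019", "2019-07", "2019-07-06"),
    which sort correctly as plain strings.
    """
    unique: Dict[tuple, Dict[str, str]] = {}
    for event in events:
        key = (event["date"], event.get("context", ""))
        unique.setdefault(key, event)  # keep the first occurrence of each event
    return sorted(unique.values(), key=lambda e: (e["date"], e.get("source", "")))
```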
```json
{
  "name": "Little Saint James",
  "mentions": 45,
  "sources": [
    {
      "title": "Little_Saint_James,_U.S._Virgin_Islands",
      "url": "https://en.wikipedia.org/wiki/..."
    }
  ],
  "associated_persons": ["Jeffrey Epstein", "Ghislaine Maxwell"],
  "dates": ["1998", "2001", "2019"]
}
```
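Each location entry aggregates mentions across articles. A sketch of that aggregation, assuming a hypothetical per-article record shape; `mentions` counts every occurrence, and `associated_persons` is a crude co-occurrence list (everyone named in any article that mentions the location):

```python
from collections import Counter
from typing import Dict, List


def build_location_guide(articles: List[dict]) -> List[dict]:
    """Aggregate per-article extractions into location-guide entries.

    Each article dict is assumed (hypothetically) to look like
    {"title": ..., "url": ..., "locations": [...every mention...],
     "persons": [...], "dates": [...]}.
    """
    mentions: Counter = Counter()
    guide: Dict[str, dict] = {}
    for article in articles:
        locations = article.get("locations", [])
        mentions.update(locations)  # count every occurrence
        for name in set(locations):  # but list each article once per location
            entry = guide.setdefault(
                name, {"sources": [], "persons": set(), "dates": set()}
            )
            entry["sources"].append({"title": article["title"], "url": article["url"]})
            entry["persons"].update(article.get("persons", []))
            entry["dates"].update(article.get("dates", []))
    result = [
        {
            "name": name,
            "mentions": mentions[name],
            "sources": entry["sources"],
            "associated_persons": sorted(entry["persons"]),
            "dates": sorted(entry["dates"]),
        }
        for name, entry in guide.items()
    ]
    # Most-mentioned locations first, ties broken alphabetically.
    return sorted(result, key=lambda e: (-e["mentions"], e["name"]))
```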
All Wikipedia and discovered data is included in the search index once the index is regenerated:
```bash
# After fetching data, update search
python scripts/generate-search-index.py
```
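How generate-search-index.py builds its index is not described. As a generic illustration only, a minimal inverted index maps each lowercase token to the ids of the records containing it, and a query returns the records that contain every token. The record shape (`id`, `title`, `text`) and the id format are assumptions:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def build_inverted_index(docs: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each token to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for token in tokenize(f"{doc.get('title', '')} {doc.get('text', '')}"):
            index[token].add(doc["id"])
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))  # copy so the index is not mutated
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```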
Search will now include:
Wikipedia data:
Discovered sources:
Git LFS:
```bash
# Check network connection (-c limits the number of pings on Linux/macOS)
ping -c 4 en.wikipedia.org

# Verify API access
curl "https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Jeffrey_Epstein"

# Check rate limiting
# Wait 60 seconds and try again
```
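Rather than waiting by hand after a rate-limit error, a fetch can be wrapped in a retry loop. A minimal sketch with the standard library, assuming HTTP 429 and 503 are the retryable responses; the base delay mirrors the 60-second wait suggested above, and the helper name is hypothetical:

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on HTTP 429 (rate limited) or 503 with exponential backoff.

    A numeric Retry-After header, when present, overrides the computed delay.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if err.code not in (429, 503) or attempt == max_attempts:
                raise  # non-retryable error, or out of attempts
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.strip().isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** (attempt - 1)  # 60 s, 120 s, 240 s, ...
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

In use, `fetch` would wrap a single request, for instance a function that opens the URL with `urllib.request.urlopen` inside a `with` block and returns the body. Passing a fake `sleep` makes the retry behaviour testable without real waits.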
```bash
# Check specific source
python scripts/safe-source-expander.py

# Review error messages
# Most common: API rate limits or network issues
```
```bash
# Manually regenerate
python scripts/generate-search-index.py

# Check data directory
ls -la data/wikipedia/
ls -la data/discovered_sources/
```
Wikipedia integration:
Source discovery:
Edit scripts/fetch-wikipedia-data.py:
```python
WIKIPEDIA_ARTICLES = {
    'main': [
        'Jeffrey_Epstein',
        'Your_New_Article',  # Add here
    ],
    # ...
}
```
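As the article lists grow, the same title may end up in more than one category. A small hypothetical helper that flattens `WIKIPEDIA_ARTICLES` into one ordered, de-duplicated list, treating spaces and underscores as equivalent the way Wikipedia titles do:

```python
from typing import Dict, List


def all_articles(groups: Dict[str, List[str]]) -> List[str]:
    """Flatten a category -> titles mapping, dropping duplicates but keeping order."""
    seen = set()
    ordered = []
    for titles in groups.values():
        for title in titles:
            # Normalize to the underscore form used in the article list and in URLs.
            normalized = title.strip().replace(" ", "_")
            if normalized and normalized not in seen:
                seen.add(normalized)
                ordered.append(normalized)
    return ordered
```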
Edit scripts/safe-source-expander.py:
```python
SAFE_SOURCES = {
    'your_source': {
        'name': 'Source Name',
        'api_url': 'https://api.example.com',
        'params': {...},
        'enabled': True
    }
}
```
```python
SAFE_SOURCES = {
    'archive_org': {
        # ...
        'enabled': False  # Disable source
    },
}
```
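Because a typo in `SAFE_SOURCES` would otherwise only surface at run time, a quick check can help. A minimal sketch, assuming every entry needs `name`, `api_url`, and a boolean `enabled` flag (the keys shown in the example) and, as an added assumption, that endpoints should use HTTPS; both function names are hypothetical:

```python
from typing import Dict, List
from urllib.parse import urlparse

REQUIRED_KEYS = ("name", "api_url", "enabled")


def validate_sources(sources: Dict[str, dict]) -> List[str]:
    """Return a list of problems found in a SAFE_SOURCES-style mapping (empty if none)."""
    problems = []
    for key, config in sources.items():
        missing = [k for k in REQUIRED_KEYS if k not in config]
        if missing:
            problems.append(f"{key}: missing {', '.join(missing)}")
            continue  # further checks need the missing keys
        if urlparse(config["api_url"]).scheme != "https":
            problems.append(f"{key}: api_url should use https")
        if not isinstance(config["enabled"], bool):
            problems.append(f"{key}: 'enabled' must be True or False")
    return problems


def enabled_sources(sources: Dict[str, dict]) -> Dict[str, dict]:
    """Keep only sources whose 'enabled' flag is exactly True."""
    return {key: cfg for key, cfg in sources.items() if cfg.get("enabled") is True}
```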
| Component | Monthly Cost | Notes |
|---|---|---|
| Wikipedia API | $0 | Free; subject to rate limits |
| Archive.org API | $0 | Free tier |
| DocumentCloud | $0 | Public API |
| Wikimedia Commons | $0 | Free |
| RSS feeds | $0 | Public feeds |
| GitHub Actions | $0 | Free for public repos; 2,000 min/month included for private repos on the Free plan |
| Storage | $0 | < 1 GB |
| TOTAL | $0 | Fully free |
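The GitHub Actions line stays at $0 only while usage fits the included minutes (relevant for private repositories). A back-of-the-envelope estimate, assuming hypothetical per-run durations for single-job workflows; GitHub rounds each job's usage up to the next whole minute:

```python
import math


def monthly_actions_minutes(runs_per_month: int, minutes_per_run: float) -> int:
    """Billable minutes for a single-job workflow; each job is rounded up to a whole minute."""
    return runs_per_month * math.ceil(minutes_per_run)


# Hypothetical per-run durations; measure real ones from the workflow run history.
discovery = monthly_actions_minutes(31, 4.2)  # daily at 2 AM UTC, 31-day month
wikipedia = monthly_actions_minutes(5, 8.5)   # weekly on Sundays, at most 5 per month
total = discovery + wikipedia
print(total)
```

With these assumed durations the two workflows use about 200 minutes in a 31-day month, well under the 2,000 included minutes.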
- Wikipedia integration - Comprehensive data on dates, times, locations, characters
- Safe source expansion - 5 official sources monitored daily
- Fully automated - GitHub Actions workflows
- Human oversight - Approval required for downloads
- 100% free - $0/month cost
- Legal & ethical - Respects all ToS and privacy
- Production ready - Tested and documented
Total setup time: 10-15 minutes

Monthly cost: $0

Data quality: High (official sources only)