Web Monitoring
ActiveProduction web archiver with automated crawling, SHA-256 change detection, full-text search, and cryptographic verification. 26 Python modules with Docker orchestration.
Pipeline
How It Works
Sitemap parsing + internal link extraction with robots.txt respect
ArchiveBox integration with retry logic and conditional GET
SHA-256 content hashing for version comparison and diff tracking
SQLite FTS5 full-text search with faceting by site and date
Capabilities
Infrastructure
Multi-service: crawler, Prometheus, Grafana, IPFS
Metrics collection with search request counters
Timer units for production auto-restart
HMAC-SHA256 signing + Merkle tree scaffolding