📦 Installation Guide
This guide covers installing and setting up the Advanced Python Web Scraper on your system.
🔧 System Requirements
- Python 3.8 or higher
- pip package manager
- Git (optional, for cloning)
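The requirements above can be checked from Python before installing anything. This is a minimal sketch using only the standard library; the helper name `check_requirements` is illustrative and is not part of the scraper.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version listed in the requirements above


def check_requirements():
    """Return a dict mapping each requirement to True if it is satisfied."""
    return {
        # Compare only (major, minor) against the stated minimum
        "python": sys.version_info[:2] >= MIN_PYTHON,
        # pip is importable as a module when it is installed for this interpreter
        "pip": importlib.util.find_spec("pip") is not None,
        # Git is optional; it is only needed for the clone method
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `git` entry is not a blocker, since the direct download method does not need it.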
📥 Installation Methods
Method 1: Clone from GitHub
git clone https://github.com/I-invincib1e/advanced-python-scraper.git
cd advanced-python-scraper
Method 2: Direct Download
Download the ZIP file from GitHub, extract it, and open a terminal in the extracted folder.
🐍 Python Dependencies
Core Dependencies (Required)
pip install aiohttp beautifulsoup4 tqdm pyyaml
Backend-Specific Dependencies (Optional)
For JavaScript Rendering (Recommended):
pip install playwright
playwright install # Install browser binaries
💡 Recommendation: Use Playwright rather than requests-html for JavaScript-heavy sites. Playwright is actively maintained and is generally faster and more reliable; requests-html is no longer maintained.
For JavaScript Rendering (Legacy):
pip install requests-html # Not recommended - unmaintained
Install All Recommended Backends:
pip install aiohttp beautifulsoup4 tqdm pyyaml playwright
playwright install
✅ Verification
Test your installation:
python -c "import aiohttp, bs4, tqdm, yaml; print('✅ Core dependencies installed')"
python advanced_scraper.py --help # Should show help information
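The one-line import check above stops at the first missing module. As a rough alternative, the sketch below reports every missing package at once, including the optional backends. The import-to-package mapping is taken from the install commands in this guide; the function name `missing_packages` is hypothetical.

```python
import importlib.util

# import name -> pip package name, taken from the install commands in this guide
CORE = {
    "aiohttp": "aiohttp",
    "bs4": "beautifulsoup4",
    "tqdm": "tqdm",
    "yaml": "pyyaml",
}
OPTIONAL = {
    "playwright": "playwright",
    "requests_html": "requests-html",  # legacy backend
}


def missing_packages(modules):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    core_missing = missing_packages(CORE)
    optional_missing = missing_packages(OPTIONAL)
    if core_missing:
        print("Missing core packages: pip install " + " ".join(core_missing))
    else:
        print("Core dependencies installed")
    if optional_missing:
        print("Optional backends not installed: " + ", ".join(optional_missing))
```

This only confirms that the packages are importable; it does not check that `playwright install` has downloaded the browser binaries.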
⚙️ Configuration
Create a configuration file in JSON format:
{
  "rate_limiting": {
    "delay": 0.2,
    "enabled": true
  },
  "retry": {
    "max_attempts": 5,
    "backoff_factor": 1.0
  },
  "user_agent_rotation": {
    "enabled": true
  },
  "concurrency": {
    "max_concurrent": 8
  }
}
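Before handing a configuration file to the scraper, it can help to confirm that it parses and holds sensible values. The sketch below is an assumption-laden illustration: the `load_config` helper and the validation rules are not part of the scraper, and how the scraper itself locates and reads its configuration is not covered in this guide.

```python
import json
from pathlib import Path

# Defaults mirror the example configuration above
DEFAULTS = {
    "rate_limiting": {"delay": 0.2, "enabled": True},
    "retry": {"max_attempts": 5, "backoff_factor": 1.0},
    "user_agent_rotation": {"enabled": True},
    "concurrency": {"max_concurrent": 8},
}


def load_config(path):
    """Load a JSON config file and fill in missing keys from DEFAULTS."""
    user = json.loads(Path(path).read_text(encoding="utf-8"))

    # Merge section by section so a partial file still yields a full config
    merged = {
        section: {**defaults, **user.get(section, {})}
        for section, defaults in DEFAULTS.items()
    }

    # Basic sanity checks (assumed constraints, not documented by the project)
    if merged["rate_limiting"]["delay"] < 0:
        raise ValueError("rate_limiting.delay must not be negative")
    if merged["retry"]["max_attempts"] < 1:
        raise ValueError("retry.max_attempts must be at least 1")
    if merged["concurrency"]["max_concurrent"] < 1:
        raise ValueError("concurrency.max_concurrent must be at least 1")
    return merged
```

For example, `load_config("config.json")` (the file name is a placeholder) returns a dictionary with all four sections present, even if the file only overrides one of them.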
🚀 Quick Test
Test the scraper with a simple example:
import asyncio

from advanced_scraper import AdvancedBookScraper


async def test():
    # Open the scraper as an async context manager so it is closed on exit
    async with AdvancedBookScraper("https://httpbin.org") as scraper:
        content = await scraper.scrape_single_page("https://httpbin.org/html")
        print(f"✅ Scraped {len(content)} characters")


asyncio.run(test())
🎉 Installation Complete!
Your Advanced Python Web Scraper is now ready to use.
📚 Next Steps
- View Examples
- API Reference
- GitHub Repository