📦 Installation Guide

This guide will help you install and set up the Advanced Python Web Scraper on your system.

🔧 System Requirements

Python 3.8 or higher
pip package manager
Git (optional, for cloning)
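
The requirements above can be checked from a short script before installing anything. This is a minimal sketch using only the standard library; the helper name `check_requirements` is illustrative and is not part of the scraper.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this guide


def check_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a dict describing whether each requirement appears to be met."""
    return {
        "python": tuple(version_info[:2]) >= MIN_PYTHON,
        # pip may be exposed as "pip" or "pip3" depending on the platform;
        # it can also be available only as "python -m pip", which this misses.
        "pip": any(which(name) for name in ("pip", "pip3")),
        # Git is optional; it is only needed for Method 1 (cloning).
        "git": which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A missing `git` entry is not a blocker, since the ZIP download in Method 2 does not need it.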

📥 Installation Methods

Method 1: Clone from GitHub

git clone https://github.com/I-invincib1e/advanced-python-scraper.git
cd advanced-python-scraper

Method 2: Direct Download

Download the repository as a ZIP file from its GitHub page (Code, then Download ZIP), extract it, and open a terminal in the extracted folder.

🐍 Python Dependencies

Core Dependencies (Required)

pip install aiohttp beautifulsoup4 tqdm pyyaml

Backend-Specific Dependencies (Optional)

For JavaScript Rendering (Recommended):

pip install playwright
playwright install  # Install browser binaries

💡 Recommendation: For sites that rely on JavaScript, prefer Playwright over requests-html. Playwright is actively maintained and is generally faster and more reliable, whereas requests-html is no longer maintained.
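
To confirm that Playwright and its browser binaries work on their own, independent of the scraper, something like the following can be run. It is a sketch that uses Playwright's public async API directly; the function names `render_page` and `render_page_sync` are hypothetical, and how the scraper itself drives Playwright is not covered in this guide.

```python
import asyncio


async def render_page(url: str) -> str:
    """Return the HTML of `url` after a headless browser has loaded it."""
    # Imported lazily so this module still imports when Playwright is absent.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()  # headless by default
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.content()
        finally:
            await browser.close()


def render_page_sync(url: str) -> str:
    """Blocking convenience wrapper around render_page."""
    return asyncio.run(render_page(url))
```

Calling `render_page_sync("https://httpbin.org/html")` should return a non-empty HTML string. If the browser binaries were not installed with `playwright install`, the launch step will typically raise an error.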

For JavaScript Rendering (Legacy):

pip install requests-html  # Not recommended: unmaintained

Install Everything (Core Dependencies plus Playwright):

pip install aiohttp beautifulsoup4 tqdm pyyaml playwright
playwright install

✅ Verification

Test your installation:

python -c "import aiohttp, bs4, tqdm, yaml; print('✅ Core dependencies installed')"

python advanced_scraper.py --help  # Should show help information
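
The one-line import check stops at the first failure, so it does not show which packages are missing. A slightly longer sketch that reports each one is shown here; the helper name `find_missing` is illustrative. Note that the import names differ from the pip package names (`bs4` for beautifulsoup4, `yaml` for pyyaml, `requests_html` for requests-html).

```python
import importlib.util

CORE_MODULES = ["aiohttp", "bs4", "tqdm", "yaml"]    # required
OPTIONAL_MODULES = ["playwright", "requests_html"]   # JavaScript-rendering backends


def find_missing(module_names):
    """Return the names in `module_names` that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing_core = find_missing(CORE_MODULES)
    missing_optional = find_missing(OPTIONAL_MODULES)
    print("Core dependencies:", "all found" if not missing_core else f"missing {missing_core}")
    print("Optional backends:", "all found" if not missing_optional else f"missing {missing_optional}")
```

`find_spec` only locates a module without importing it, so this check is quick but will not catch a package that is installed and broken.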

⚙️ Configuration

Create a configuration file. The example below uses JSON:

{
  "rate_limiting": {
    "delay": 0.2,
    "enabled": true
  },
  "retry": {
    "max_attempts": 5,
    "backoff_factor": 1.0
  },
  "user_agent_rotation": {
    "enabled": true
  },
  "concurrency": {
    "max_concurrent": 8
  }
}
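
If you want to inspect or pre-process a configuration file before handing it to the scraper, it can be read with the standard `json` module and merged over defaults. The sketch below assumes the keys shown in the example above; the helper names and the file name `config.json` are hypothetical, and the way the scraper itself loads its configuration is not described in this guide.

```python
import copy
import json

# Defaults mirror the example configuration in this guide.
DEFAULTS = {
    "rate_limiting": {"delay": 0.2, "enabled": True},
    "retry": {"max_attempts": 5, "backoff_factor": 1.0},
    "user_agent_rotation": {"enabled": True},
    "concurrency": {"max_concurrent": 8},
}


def merge_config(defaults, overrides):
    """Return a new dict with `overrides` applied on top of `defaults`, section by section."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path):
    """Read a JSON file and fill in any setting it leaves out."""
    with open(path, encoding="utf-8") as f:
        return merge_config(DEFAULTS, json.load(f))
```

With this approach a file that contains only `{"retry": {"max_attempts": 3}}` still yields a complete configuration, for example via `load_config("config.json")`.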
        

🚀 Quick Test

Test the scraper with a simple example:

import asyncio
from advanced_scraper import AdvancedBookScraper

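async def test():
    # The async context manager opens the scraper and closes it again on exit.
    async with AdvancedBookScraper("https://httpbin.org") as scraper:
        # Fetch a single page and report how much content came back.
        content = await scraper.scrape_single_page("https://httpbin.org/html")
        print(f"✅ Scraped {len(content)} characters")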

asyncio.run(test())
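
Because the quick test depends on a live request to httpbin.org, it can hang when the network is slow or unavailable. One option is to wrap it in a timeout; this is a sketch using `asyncio.wait_for`, and the helper name `run_with_timeout` is illustrative rather than part of the scraper.

```python
import asyncio


async def run_with_timeout(coro, seconds):
    """Await `coro`; return (True, result), or (False, None) if it times out."""
    try:
        return True, await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the wrapped coroutine before raising.
        return False, None
```

For the example above, `asyncio.run(run_with_timeout(test(), 30))` could replace `asyncio.run(test())`; a first element of `False` in the returned tuple means the request did not finish within 30 seconds.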
        

🎉 Installation Complete!
Your Advanced Python Web Scraper is now ready to use.

📚 Next Steps

View Examples
API Reference
GitHub Repository