2023-10-06 18:20:54 +01:00
# Eurotunnel price checker
## Overview
This is a personal tool designed to scrape and display Eurotunnel ticket prices. The tool consists of two primary Python scripts:
1. `check.py`: A script that uses a headless browser to scrape ticket data and saves it as HTML files.
2. `web_view.py`: A Flask web application that parses the scraped HTML data and displays the available tickets and prices.
## Requirements
- Python 3.x
- Flask
- lxml
- Playwright Python package
- pytz

Install them via pip, then download the browser binaries that Playwright drives:
```bash
pip install flask lxml playwright pytz
playwright install
```
## How to use
### Running the scraper (`check.py`)
1. Update the `outbound_date` and `return_date` variables in the script to match your desired travel dates.
2. Run the script manually or add it to your crontab for scheduled checks. The HTML files are saved in the `data` directory.

For example, to run the script every day at 6:34 AM (this assumes `check.py` is executable and starts with a Python shebang line):
```bash
34 6 * * * ~/src/2023/eurotunnel-scrape/check.py
```
### Running the web viewer (`web_view.py`)
1. Run `web_view.py` to start the Flask web server.
2. Access the web interface to view available tickets and prices.

To start the web server:
```bash
python web_view.py
```
You can then navigate to `http://localhost:5000/` to see the ticket options.
## Code structure
- `check.py` uses the Playwright package to scrape the Eurotunnel website for ticket prices and saves the resulting HTML files.
- `web_view.py` reads these HTML files, extracts the relevant data using lxml, and displays it using a Flask web interface.
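The scraping half of `check.py` comes down to loading a page in a headless browser and keeping the rendered HTML. The sketch below shows that pattern with Playwright's synchronous API. The helper name `fetch_html`, the choice of Chromium and the idea of passing a bare URL are assumptions; the real script also has to drive Eurotunnel's booking form for the chosen dates, which is not reproduced here.

```python
def fetch_html(url: str) -> str:
    """Load url in headless Chromium and return the page HTML (hypothetical helper)."""
    # Imported here so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # headless is also the default
        try:
            page = browser.new_page()
            page.goto(url)
            # page.content() serialises the current DOM, including changes made
            # by JavaScript, rather than the raw server response.
            return page.content()
        finally:
            browser.close()
```

Usage would look like `html = fetch_html("https://example.com/")`, with the result handed to whatever writes the file into `data`.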
### Data Classes and Functions
- `Train`: Data class representing a Eurotunnel train with departure time, arrival time, and price.
- `get_filename(direction: str) -> tuple[datetime, str]`: Function to find the most recent file corresponding to a given direction ('outbound' or 'return').
- `get_tickets(filename: str) -> tuple[date, list[Train]]`: Function to parse the HTML and get a list of available trains and prices.
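A minimal sketch of `Train` and `get_filename`, assuming the file naming scheme described under Data Storage. The field names `departure`, `arrival` and `price`, the `float` price and the extra `data_dir` parameter are illustrative assumptions, not the code in `web_view.py`.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Train:
    """One train: departure time, arrival time and standard-ticket price."""

    departure: time
    arrival: time
    price: float  # assumption: price in pounds


def get_filename(direction: str, data_dir: str = "data") -> tuple[datetime, str]:
    """Return (scrape time, path) of the newest file for 'outbound' or 'return'."""
    suffix = f"_{direction}.html"
    candidates = []
    for name in os.listdir(data_dir):
        if not name.endswith(suffix):
            continue
        # Strip the suffix; what remains is the timestamp, e.g. 2023-09-29_123456.
        scraped = datetime.strptime(name[: -len(suffix)], "%Y-%m-%d_%H%M%S")
        candidates.append((scraped, os.path.join(data_dir, name)))
    if not candidates:
        raise FileNotFoundError(f"no {direction} files in {data_dir}")
    # Tuples compare by their first element, so max() picks the newest scrape.
    return max(candidates)
```

Parsing the timestamp rather than relying on file modification times keeps the result correct even if files are copied between machines.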
## Notes
- Only prices for 'standard' tickets are displayed.
- Times and prices are shown only for trains that fall within a specific time range (defined in `web_view.py`).
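To illustrate the time-range filter, the sketch below keeps only trains whose departure falls inside a window. Filtering on departure time, the 07:00 to 12:00 bounds and the helper name `in_window` are all assumptions; the real range is whatever `web_view.py` defines.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time


@dataclass
class Train:
    departure: time
    arrival: time
    price: float


# Hypothetical window; the real bounds live in web_view.py.
EARLIEST = time(7, 0)
LATEST = time(12, 0)


def in_window(trains: list[Train], earliest: time = EARLIEST, latest: time = LATEST) -> list[Train]:
    """Keep trains departing between earliest and latest, inclusive."""
    return [t for t in trains if earliest <= t.departure <= latest]
```

Comparing `datetime.time` objects directly works because they order chronologically within a single day.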
## Data Storage
Scraped data is saved as HTML files in the `data` directory. The filenames include timestamps to indicate when the data was scraped. For example:
- `2023-09-29_123456_outbound.html`: Outbound data scraped on September 29, 2023, at 12:34:56.
- `2023-10-06_234556_return.html`: Return data scraped on October 6, 2023, at 23:45:56.
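A sketch of how a scrape could be written out under this naming scheme. `save_html` and its parameters are hypothetical; it formats the scrape time as `YYYY-MM-DD_HHMMSS`, appends the direction and writes the HTML into `data`.

```python
from __future__ import annotations

import os
from datetime import datetime


def save_html(html: str, direction: str, data_dir: str = "data", now: datetime | None = None) -> str:
    """Write HTML to data_dir/<YYYY-MM-DD_HHMMSS>_<direction>.html and return the path."""
    if direction not in ("outbound", "return"):
        raise ValueError(f"unexpected direction: {direction!r}")
    if now is None:
        now = datetime.now()
    os.makedirs(data_dir, exist_ok=True)
    filename = f"{now:%Y-%m-%d_%H%M%S}_{direction}.html"
    path = os.path.join(data_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```

Because the timestamp is zero-padded and ordered from year to second, the filenames also sort chronologically as plain strings.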
## Author
This tool was created by Edward Betts.
## Support and Contributions
This tool is provided as-is and may require maintenance or updates as Eurotunnel's website changes. If you encounter issues or have suggestions for improvements, feel free to open an issue or submit a pull request on the project's repository.
## License
This tool is released under the [MIT License](LICENSE).
## Disclaimer
This tool is not affiliated with Eurotunnel and is meant for personal use only. Always refer to the official Eurotunnel website for the most accurate and up-to-date information.