# Eurotunnel price checker

## Overview

This is a personal tool that scrapes and displays Eurotunnel ticket prices. It consists of two Python scripts:

1. `check.py`: A script that uses a headless browser to scrape ticket data and save it as HTML files.
2. `web_view.py`: A Flask web application that parses the scraped HTML data and displays the available tickets and prices.

## Requirements

- Python 3.x
- Flask
- lxml
- Playwright Python package
- pytz

Install them via pip:

```bash
pip install flask lxml playwright pytz
```

Playwright also needs its browser binaries. If they are not installed yet, running `playwright install` downloads them.

## How to use

### Running the scraper (`check.py`)

1. Update the `outbound_date` and `return_date` variables in the script to match your desired travel dates.
2. Run the script manually or add it to your crontab for scheduled checks. The HTML files are saved in the `data` directory.

For example, to run the script every day at 6:34 AM:

```bash
34 6 * * * ~/src/2023/eurotunnel-scrape/check.py
```
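
The scraping step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual contents of `check.py`: the function names `fetch_html` and `save_html` are hypothetical, the URL to load is left to the caller, and the real script presumably also has to enter the travel dates before prices appear. It uses Playwright's synchronous API to load a page in headless Chromium and writes the rendered HTML to disk.

```python
from pathlib import Path


def fetch_html(url: str) -> str:
    """Load a page in headless Chromium and return the rendered HTML."""
    # Imported lazily so save_html() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()


def save_html(html: str, out_path: Path) -> Path:
    """Write scraped HTML to out_path, creating the parent directory if needed."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(html, encoding="utf-8")
    return out_path
```

Because the URL and the output path are both parameters, a scheduled run would amount to something like `save_html(fetch_html(url), Path("data") / name)`, with `url` and `name` supplied by the script.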

### Running the web viewer (`web_view.py`)

1. Run `web_view.py` to start the Flask web server.
2. Access the web interface to view available tickets and prices.

To start the web server:

```bash
python web_view.py
```

You can then navigate to `http://localhost:5000/` to see the ticket options.

## Code structure

- `check.py` uses the Playwright package to scrape the Eurotunnel website for ticket prices and saves the resulting HTML files.
- `web_view.py` reads these HTML files, extracts the relevant data using lxml, and displays it using a Flask web interface.

### Data Classes and Functions

- `Train`: Data class representing a Eurotunnel train with departure time, arrival time, and price.
- `get_filename(direction: str) -> tuple[datetime, str]`: Finds the most recent file for a given direction (`'outbound'` or `'return'`).
- `get_tickets(filename: str) -> tuple[date, list[Train]]`: Parses the HTML and returns the travel date with a list of available trains and prices.
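
To make the shapes above concrete, here is a hedged sketch of `Train` and `get_filename`. The field names and types on `Train`, the extra `data_dir` parameter, and the choice to return the scrape time together with the file path are all assumptions for illustration; the real `web_view.py` may differ.

```python
from dataclasses import dataclass
from datetime import datetime, time
from decimal import Decimal
from pathlib import Path

DATA_DIR = Path("data")  # assumption: the directory the scraper writes to


@dataclass
class Train:
    """One Eurotunnel departure; field names and types are assumed."""

    dep: time
    arr: time
    price: Decimal


def get_filename(direction: str, data_dir: Path = DATA_DIR) -> tuple[datetime, str]:
    """Return (scrape time, path) of the newest file for a direction."""
    # Names start with YYYY-MM-DD_HHMMSS, so sorting them as strings
    # also sorts them chronologically.
    paths = sorted(data_dir.glob(f"*_{direction}.html"))
    if not paths:
        raise FileNotFoundError(f"no {direction} files in {data_dir}")
    newest = paths[-1]
    stamp = newest.name[: len("2023-09-29_123456")]
    return datetime.strptime(stamp, "%Y-%m-%d_%H%M%S"), str(newest)
```

Sorting by name avoids relying on file modification times, which can change when files are copied or restored.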

## Notes

- All prices are displayed for 'standard' tickets only.
- Times and prices are displayed only for trains that depart within a specific time range (as defined in `web_view.py`).
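
The time-range filter can be illustrated with a small sketch. The bounds below, the names `EARLIEST`, `LATEST` and `in_range`, and the choice of an inclusive range are hypothetical; the actual range is whatever `web_view.py` defines.

```python
from datetime import time

# Hypothetical bounds; the real range lives in web_view.py.
EARLIEST = time(8, 0)
LATEST = time(20, 0)


def in_range(dep: time, earliest: time = EARLIEST, latest: time = LATEST) -> bool:
    """True when a departure time falls inside the inclusive window."""
    return earliest <= dep <= latest


departures = [time(6, 20), time(8, 0), time(13, 50), time(21, 10)]
# Keep only the departures that would be shown in the viewer.
shown = [dep for dep in departures if in_range(dep)]
print(shown)
```

With these example bounds, only the 08:00 and 13:50 departures are kept.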

## Data Storage

Scraped data is saved as HTML files in the `data` directory. The filenames include timestamps to indicate when the data was scraped. For example:

- `2023-09-29_123456_outbound.html`: Outbound data scraped on September 29, 2023, at 12:34:56.
- `2023-10-06_234556_return.html`: Return data scraped on October 6, 2023, at 23:45:56.
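
The naming scheme can be expressed as a pair of helpers that build and parse such filenames. This is only a sketch: `build_filename`, `parse_filename`, and the exact `strftime` pattern are inferred from the examples above rather than taken from the scripts.

```python
import re
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d_%H%M%S"  # inferred from the example filenames
FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}_\d{6})_(outbound|return)\.html$")


def build_filename(when: datetime, direction: str) -> str:
    """Build a name such as 2023-09-29_123456_outbound.html."""
    return f"{when.strftime(STAMP_FORMAT)}_{direction}.html"


def parse_filename(name: str) -> tuple[datetime, str]:
    """Split a scraped filename into (scrape time, direction)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    # strptime also rejects impossible timestamps, e.g. seconds above 59.
    return datetime.strptime(m.group(1), STAMP_FORMAT), m.group(2)
```

For instance, `parse_filename("2023-09-29_123456_outbound.html")` yields the scrape time 2023-09-29 12:34:56 and the direction `"outbound"`.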

## Author

This tool was created by Edward Betts.

## Support and Contributions

This tool is provided as-is and may require maintenance or updates as Eurotunnel's website changes. If you encounter issues or have suggestions for improvements, feel free to open an issue or submit a pull request on the GitHub repository.

## License

This tool is released under the [MIT License](LICENSE).

## Disclaimer

This tool is not affiliated with Eurotunnel and is meant for personal use only. Always refer to the official Eurotunnel website for the most accurate and up-to-date information.