From f5be17e979a8d897879ca8fb4f44246c82fd4fc8 Mon Sep 17 00:00:00 2001
From: Edward Betts
Date: Fri, 6 Oct 2023 18:20:54 +0100
Subject: [PATCH] Add README.md and LICENSE

---
 LICENSE   | 21 ++++++++++++++
 README.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)
 create mode 100644 LICENSE
 create mode 100644 README.md

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..379270c
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 Edward Betts
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..cdb7835
--- /dev/null
+++ b/README.md
@@ -0,0 +1,85 @@
+# Eurotunnel price checker
+
+## Overview
+
+This is a personal tool designed to scrape and display Eurotunnel ticket prices. The tool consists of two primary Python scripts:
+
+1. `check.py`: A script that uses a headless browser to scrape ticket data and save it as HTML files.
+2. `web_view.py`: A Flask web application that parses the scraped HTML data and displays the available tickets and prices.
+
+## Requirements
+
+- Python 3.x
+- Flask
+- lxml
+- Playwright Python package
+- pytz
+
+Install them via pip:
+
+```bash
+pip install flask lxml playwright pytz
+```
+
+## How to use
+
+### Running the scraper (`check.py`)
+
+1. Update the `outbound_date` and `return_date` variables in the script to match your desired travel dates.
+2. Run the script manually or add it to your crontab for scheduled checks. The HTML files will be saved in a specified directory.
+
+For example, to run the script every day at 6:34 AM:
+
+```bash
+34 6 * * * ~/src/2023/eurotunnel-scrape/check.py
+```
+
+### Running the web viewer (`web_view.py`)
+
+1. Run `web_view.py` to start the Flask web server.
+2. Access the web interface to view available tickets and prices.
+
+To start the web server:
+
+```bash
+python web_view.py
+```
+
+You can then navigate to `http://localhost:5000/` to see the ticket options.
+
+## Code structure
+
+- `check.py` uses the Playwright package to scrape the Eurotunnel website for ticket prices and saves the resulting HTML files.
+- `web_view.py` reads these HTML files, extracts the relevant data using lxml, and displays it using a Flask web interface.
+
+### Data Classes and Functions
+
+- `Train`: Data class representing a Eurotunnel train with departure time, arrival time, and price.
+- `get_filename(direction: str) -> tuple[datetime, str]`: Function to find the most recent file corresponding to a given direction ('outbound' or 'return').
+- `get_tickets(filename: str) -> tuple[date, list[Train]]`: Function to parse the HTML and get a list of available trains and prices.
+
+## Notes
+
+- All prices are displayed for 'standard' tickets only.
+- Time and price information is displayed only for trains that are within a specific time range (as defined in `web_view.py`).
+
+## Data Storage
+Scraped data is saved as HTML files in the `data` directory. The filenames include timestamps to indicate when the data was scraped. For example:
+
+- `2023-09-29_123456_outbound.html`: Outbound data scraped on September 29, 2023, at 12:34:56.
+- `2023-10-06_234556_return.html`: Return data scraped on October 6, 2023, at 23:45:56.
+
+## Author
+
+This tool was created by Edward Betts.
+
+## Support and Contributions
+This tool is provided as-is and may require maintenance or updates as Eurotunnel's website changes. If you encounter issues or have suggestions for improvements, feel free to open an issue or submit a pull request on the GitHub repository.
+
+## License
+
+This tool is released under the [MIT License](LICENSE).
+
+## Disclaimer
+
+This tool is not affiliated with Eurotunnel and is meant for personal use only. Always refer to the official Eurotunnel website for the most accurate and up-to-date information.
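The README's Data Storage section describes filenames of the form `YYYY-MM-DD_HHMMSS_direction.html`, and its function list mentions finding the most recent file for a direction. The sketch below shows one way that convention could be parsed and used; it is not taken from `web_view.py`. The names `parse_scrape_filename` and `latest_scrape` are hypothetical, and the code assumes the direction is everything after the second underscore.

```python
from datetime import datetime
from pathlib import Path


def parse_scrape_filename(name: str) -> tuple[datetime, str]:
    """Split 'YYYY-MM-DD_HHMMSS_direction.html' into (timestamp, direction).

    Hypothetical helper: the naming scheme is taken from the README,
    the parsing approach is an assumption.
    """
    stem = Path(name).stem  # drop the '.html' suffix
    date_part, time_part, direction = stem.split("_", 2)
    when = datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H%M%S")
    return when, direction


def latest_scrape(names: list[str], direction: str) -> tuple[datetime, str]:
    """Return (timestamp, filename) of the newest scrape for one direction."""
    matches = []
    for name in names:
        when, found = parse_scrape_filename(name)
        if found == direction:
            matches.append((when, name))
    if not matches:
        raise FileNotFoundError(f"no scraped file for direction {direction!r}")
    # Tuples compare by timestamp first, so max() picks the newest scrape.
    return max(matches)


if __name__ == "__main__":
    files = [
        "2023-09-29_123456_outbound.html",
        "2023-09-30_080000_outbound.html",
        "2023-09-29_123500_return.html",
    ]
    print(latest_scrape(files, "outbound"))
```

In practice the list of names might come from something like `os.listdir("data")`; parsing the timestamp rather than sorting raw strings keeps the selection correct even if other files end up in the same directory listing order.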