Switch to Playwright to bypass Newegg bot detection, closes #2

Newegg now blocks requests-based scraping; replace with Playwright
using headless Chromium with mouse simulation to pass bot detection.
Also fix hardcoded build output path, use os.makedirs for nested dirs,
update category labels (HDD/SATA SSD/NVMe SSD), drop near-empty 2.5"
internal and laptop HDD categories, and fix invalid HTML in index
template (h2 inside table cells).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Edward Betts 2026-04-03 15:06:49 +01:00
parent 55bb3697b6
commit 2dc799ecaa
3 changed files with 62 additions and 42 deletions

@@ -13,9 +13,9 @@ Comments welcome: edward@4angle.com
 <p>Last updated: {{ today.strftime('%d %B %Y') }}.<p>
-<table>
 {% for cat in best %}
-<tr><td colspan="4"><h2>{{ cat.label }}</h2></td></tr>
+<h2>{{ cat.label }}</h2>
+<table>
 <tr>
 <th align="right">Price<br>per TB</th>
 <th align="right">Price</th>
@@ -30,7 +30,7 @@ Comments welcome: edward@4angle.com
 <td><a href="https://www.newegg.com/Product/Product.aspx?Item={{ hdd.number }}">{{ hdd.title }}</a></td>
 </tr>
 {% endfor %}
-<tr><td colspan="4"><a href="{{ cat.name }}/index.html">more</a></td></tr>
-{% endfor %}
 </table>
+<p><a href="{{ cat.name }}/index.html">more</a></p>
+{% endfor %}
 {% endblock %}