Switch to Playwright to bypass Newegg bot detection, closes #2

Newegg now blocks requests-based scraping; replace it with Playwright driving headless Chromium with mouse simulation to pass bot detection. Also fix the hardcoded build output path, use os.makedirs for nested dirs, update category labels (HDD/SATA SSD/NVMe SSD), drop the near-empty 2.5" internal and laptop HDD categories, and fix invalid HTML in the index template (h2 inside table cells).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 55bb3697b6
commit 2dc799ecaa
3 changed files with 62 additions and 42 deletions
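
The scraper changes themselves are not shown on this page, so the following is only a rough sketch of the two techniques the commit message names: loading a page in headless Chromium through Playwright with some simulated mouse movement, and writing output into nested directories with os.makedirs. The function names (`mouse_waypoints`, `fetch_html`, `save_page`), the viewport size, and the directory layout are hypothetical and not taken from the repository, and nothing here is claimed to pass Newegg's bot detection.

```python
import os
import random


def mouse_waypoints(width, height, count=5, seed=None):
    """Return `count` pseudo-random (x, y) points inside the viewport."""
    rng = random.Random(seed)
    return [
        (rng.randint(0, width - 1), rng.randint(0, height - 1))
        for _ in range(count)
    ]


def fetch_html(url, width=1280, height=800):
    """Load `url` in headless Chromium, move the mouse a little, return the HTML."""
    # Imported lazily so the other helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page(viewport={"width": width, "height": height})
            page.goto(url)
            for x, y in mouse_waypoints(width, height):
                # steps > 1 makes Playwright emit intermediate mousemove events.
                page.mouse.move(x, y, steps=10)
            return page.content()
        finally:
            browser.close()


def save_page(build_dir, category, html):
    """Write `html` to <build_dir>/<category>/index.html (hypothetical layout)."""
    out_dir = os.path.join(build_dir, category)
    # os.makedirs creates missing parent directories too; exist_ok=True
    # keeps repeated builds from failing when the directory already exists.
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "index.html")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return path
```

Passing the build directory as an argument, rather than fixing it in the code, is one way to avoid a hardcoded output path; the real fix in this commit may differ.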
@@ -13,9 +13,9 @@ Comments welcome: edward@4angle.com

 <p>Last updated: {{ today.strftime('%d %B %Y') }}.<p>

-<table>
 {% for cat in best %}
-<tr><td colspan="4"><h2>{{ cat.label }}</h2></td></tr>
+<h2>{{ cat.label }}</h2>
+<table>
 <tr>
 <th align="right">Price<br>per TB</th>
 <th align="right">Price</th>
@@ -30,7 +30,7 @@ Comments welcome: edward@4angle.com
 <td><a href="https://www.newegg.com/Product/Product.aspx?Item={{ hdd.number }}">{{ hdd.title }}</a></td>
 </tr>
 {% endfor %}
-<tr><td colspan="4"><a href="{{ cat.name }}/index.html">more</a></td></tr>
-{% endfor %}
 </table>
+<p><a href="{{ cat.name }}/index.html">more</a></p>
+{% endfor %}
 {% endblock %}