Move Flickr sent-mail cookies into local config file

Edward Betts 2026-02-07 14:41:41 +00:00
parent 2819652afd
commit 252a854e76
5 changed files with 127 additions and 79 deletions


@@ -85,16 +85,18 @@ for the Flickr mail URL. Scrapes the user's profile page for embedded params.
Shows recent Wikimedia Commons uploads on the home page, filtered to only
those obtained via Flickr mail requests.
**Data files** (in `commons_contributions/`):
- `flickr_uploads.json`: List of Commons uploads from Flickr with metadata
- `thumbnail_cache.json`: Cached Commons API thumbnail URLs (7-day TTL)
- `sent_mail_index.json`: Index of sent mail messages (flickr_url → wikipedia_url)
**Database tables used by the app**:
- `sent_messages`: downloaded from Flickr sent mail, includes extracted Flickr
URL and Wikipedia URL from message body
- `contributions`: downloaded from Commons `usercontribs`
- `flickr_uploads`: derived table built by `update_flickr_uploads.py` by
matching Commons uploads to Flickr URLs
- `thumbnail_cache`: cached Commons API thumbnail URLs (7-day TTL)
**Key functions**:
- `build_sent_mail_index()`: Parses sent mail JSON files, extracts Flickr and
Wikipedia URLs from message bodies, caches the index
- `get_recent_commons_uploads()`: Loads uploads, filters by sent mail match,
fetches thumbnails from Commons API
joins `flickr_uploads` with `sent_messages`, and fetches thumbnails from
Commons API
- `normalize_flickr_url()`: Normalizes URLs for matching (removes protocol, www, trailing slash)
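The normalization described above (removing the protocol, `www`, and trailing slash) could look roughly like the sketch below. This is an illustration only, not the project's actual `normalize_flickr_url()`: the name `normalize_url_sketch` is hypothetical, and the real function may handle more cases (for example, letter case or query strings).

```python
import re


def normalize_url_sketch(url: str) -> str:
    """Hypothetical sketch: strip protocol, leading 'www.', and trailing slashes."""
    url = url.strip()
    # Drop the scheme so http:// and https:// variants compare equal.
    url = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    # Drop a leading "www." so both host spellings compare equal.
    url = re.sub(r"^www\.", "", url, flags=re.IGNORECASE)
    # Drop any trailing slashes.
    return url.rstrip("/")
```

With this sketch, `https://www.flickr.com/photos/example/123/` and `http://flickr.com/photos/example/123` normalize to the same key, which is what makes matching uploads against sent mail possible.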
**CommonsUpload dataclass**:
@@ -104,9 +106,14 @@ those obtained via Flickr mail requests.
- `wiki_link_url`, `wiki_link_label`: Handles Wikidata vs Wikipedia links
**Maintenance script** (`update_flickr_uploads.py`):
Run to find Flickr uploads from UploadWizard contributions that don't have
the Flickr URL in the edit comment. Queries Commons API for image metadata
and checks the Credit field for Flickr URLs.
Builds/updates `flickr_uploads` from `contributions` and links to
`sent_messages`.
- Scans file contributions containing `UploadWizard` in the comment
- Supports both comment styles:
- `User created page with UploadWizard` (older)
- `Uploaded a work by ... with UploadWizard` (newer; often includes URL)
- Extracts Flickr URL from contribution comment when present
- Falls back to Commons `extmetadata.Credit` lookup when comment has no URL
### Category Search (`/category` route)
@@ -125,7 +132,7 @@ to allow back-navigation to the category.
### Previous Message Detection (`get_previous_messages`)
Checks `sent_mail/messages_index.json` for previous messages to a Flickr user.
Checks the `sent_messages` database table for previous messages to a Flickr user.
Matches by both display name and username (case-insensitive). Results shown as
an info alert on the message page.
@@ -159,6 +166,24 @@ print(f"{len(result.photos)} photos, {result.total_pages} pages")
print(result.photos[0].title, result.photos[0].license_name)
```
## Data Sync Workflow
To refresh "recent Commons uploads obtained via Flickr mail", run scripts in
this order:
1. `./download_sent_mail.py`
2. `./download_commons_contributions.py`
3. `./update_flickr_uploads.py`
Notes:
- `download_sent_mail.py` reads Flickr auth cookies from
`download_sent_mail.local.json` (`cookies_str` key). Copy
`download_sent_mail.example.json` to create local config.
- `main.py` does not populate `flickr_uploads`; it only reads from it.
- `download_commons_contributions.py` intentionally stops after several
consecutive fully-known API batches (overlap window) to avoid full-history
scans while still catching shallow gaps.
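As an illustration of the local config note above, reading the `cookies_str` key might look like the sketch below. The file name and key come from the notes; the function name `load_cookies_str` and the error handling are assumptions, not necessarily what `download_sent_mail.py` does.

```python
import json
from pathlib import Path


def load_cookies_str(config_path: Path) -> str:
    """Hypothetical sketch: read the Flickr cookie string from local config."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        raise SystemExit(
            f"Missing {config_path}; copy download_sent_mail.example.json "
            "to create it."
        )
    # Guard against a config file that is valid JSON but not an object.
    cookies_str = config.get("cookies_str") if isinstance(config, dict) else None
    if not isinstance(cookies_str, str) or not cookies_str.strip():
        raise SystemExit(f"{config_path} has no usable 'cookies_str' value.")
    return cookies_str.strip()
```

A caller would use it as `load_cookies_str(Path("download_sent_mail.local.json"))`. Keeping the cookies in a local file rather than in the script means they can stay out of version control.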
## Potential Improvements
- Cache search results to reduce Flickr requests