Move Flickr sent-mail cookies into local config file
parent 2819652afd
commit 252a854e76
5 changed files with 127 additions and 79 deletions
AGENTS.md (47 lines changed)
@@ -85,16 +85,18 @@ for the Flickr mail URL. Scrapes the user's profile page for embedded params.
 Shows recent Wikimedia Commons uploads on the home page, filtered to only
 those obtained via Flickr mail requests.
 
-**Data files** (in `commons_contributions/`):
-- `flickr_uploads.json`: List of Commons uploads from Flickr with metadata
-- `thumbnail_cache.json`: Cached Commons API thumbnail URLs (7-day TTL)
-- `sent_mail_index.json`: Index of sent mail messages (flickr_url → wikipedia_url)
+**Database tables used by the app**:
+- `sent_messages`: downloaded from Flickr sent mail, includes extracted Flickr
+URL and Wikipedia URL from message body
+- `contributions`: downloaded from Commons `usercontribs`
+- `flickr_uploads`: derived table built by `update_flickr_uploads.py` by
+matching Commons uploads to Flickr URLs
+- `thumbnail_cache`: cached Commons API thumbnail URLs (7-day TTL)
 
 **Key functions**:
-- `build_sent_mail_index()`: Parses sent mail JSON files, extracts Flickr and
-Wikipedia URLs from message bodies, caches the index
 - `get_recent_commons_uploads()`: Loads uploads, filters by sent mail match,
-fetches thumbnails from Commons API
+joins `flickr_uploads` with `sent_messages`, and fetches thumbnails from
+Commons API
 - `normalize_flickr_url()`: Normalizes URLs for matching (removes protocol, www, trailing slash)
 
 **CommonsUpload dataclass**:
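As an illustration of the `normalize_flickr_url()` description in the hunk above (removes protocol, www, trailing slash), here is a minimal sketch. The body is an assumption based only on that one-line description; the repository's actual function may differ, for example in how it treats case or query strings.

```python
def normalize_flickr_url(url: str) -> str:
    """Sketch only: strip the protocol, a leading "www.", and trailing slashes.

    This mirrors the one-line description in AGENTS.md; it is not the
    repository's implementation.
    """
    normalized = url.strip()
    # Remove the protocol, if present.
    for prefix in ("https://", "http://"):
        if normalized.lower().startswith(prefix):
            normalized = normalized[len(prefix):]
            break
    # Remove a leading "www." so both host spellings compare equal.
    if normalized.lower().startswith("www."):
        normalized = normalized[len("www."):]
    # Remove trailing slashes so ".../123/" matches ".../123".
    return normalized.rstrip("/")
```

With this sketch, `https://www.flickr.com/photos/example/123/` and `http://flickr.com/photos/example/123` reduce to the same key, which is what a join between `flickr_uploads` and `sent_messages` would need.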
@@ -104,9 +106,14 @@ those obtained via Flickr mail requests.
 - `wiki_link_url`, `wiki_link_label`: Handles Wikidata vs Wikipedia links
 
 **Maintenance script** (`update_flickr_uploads.py`):
-Run to find Flickr uploads from UploadWizard contributions that don't have
-the Flickr URL in the edit comment. Queries Commons API for image metadata
-and checks the Credit field for Flickr URLs.
+Builds/updates `flickr_uploads` from `contributions` and links to
+`sent_messages`.
+- Scans file contributions containing `UploadWizard` in the comment
+- Supports both comment styles:
+  - `User created page with UploadWizard` (older)
+  - `Uploaded a work by ... with UploadWizard` (newer; often includes URL)
+- Extracts Flickr URL from contribution comment when present
+- Falls back to Commons `extmetadata.Credit` lookup when comment has no URL
 
 ### Category Search (`/category` route)
 
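The comment handling described in the hunk above could be sketched roughly as follows. The function name `classify_contribution`, the regular expression, and the sample comments are hypothetical; only the `UploadWizard` marker and the "extract from comment, else fall back to `extmetadata.Credit`" order come from the text. The Commons API lookup itself is left out.

```python
import re
from typing import Optional, Tuple

# Assumed URL shape; the real script may accept other Flickr URL forms.
FLICKR_URL_RE = re.compile(r"https?://(?:www\.)?flickr\.com/photos/\S+")


def classify_contribution(comment: str) -> Tuple[str, Optional[str]]:
    """Decide how a contribution would be handled, based on its comment.

    Returns one of:
      ("skip", None)           - not an UploadWizard contribution
      ("comment", url)         - Flickr URL found in the comment itself
      ("credit_lookup", None)  - UploadWizard upload with no URL in the
                                 comment, so the Credit field must be checked
    """
    if "UploadWizard" not in comment:
        return ("skip", None)
    match = FLICKR_URL_RE.search(comment)
    if match:
        return ("comment", match.group(0))
    return ("credit_lookup", None)
```

Returning an explicit `"credit_lookup"` state keeps the slower API fallback separate from the cheap string match, so a caller can batch those lookups.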
@@ -125,7 +132,7 @@ to allow back-navigation to the category.
 
 ### Previous Message Detection (`get_previous_messages`)
 
-Checks `sent_mail/messages_index.json` for previous messages to a Flickr user.
+Checks the `sent_messages` database table for previous messages to a Flickr user.
 Matches by both display name and username (case-insensitive). Results shown as
 an info alert on the message page.
 
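A case-insensitive lookup against `sent_messages`, as described in the hunk above, might look like the sketch below. It assumes SQLite and hypothetical `recipient` and `subject` columns; the diff names neither the database engine nor the table schema, so treat every identifier except `sent_messages` as illustrative.

```python
import sqlite3
from typing import List, Tuple


def find_previous_messages(
    conn: sqlite3.Connection, display_name: str, username: str
) -> List[Tuple[str, str]]:
    """Return (recipient, subject) rows matching either name, ignoring case.

    Sketch only. Note that SQLite's lower() folds ASCII letters only, so
    non-ASCII names would need extra handling.
    """
    names = [name.lower() for name in (display_name, username) if name]
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    query = (
        "SELECT recipient, subject FROM sent_messages "
        f"WHERE lower(recipient) IN ({placeholders}) ORDER BY rowid"
    )
    return conn.execute(query, names).fetchall()


def make_demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with made-up rows for trying the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sent_messages (recipient TEXT, subject TEXT)")
    conn.executemany(
        "INSERT INTO sent_messages VALUES (?, ?)",
        [
            ("Jane Doe", "Photo request"),
            ("janedoe42", "Follow-up"),
            ("Someone Else", "Unrelated"),
        ],
    )
    return conn
```

Only the placeholder list is interpolated into the SQL string; the names themselves are passed as bound parameters.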
@@ -159,6 +166,24 @@ print(f"{len(result.photos)} photos, {result.total_pages} pages")
 print(result.photos[0].title, result.photos[0].license_name)
 ```
 
+## Data Sync Workflow
+
+To refresh "recent Commons uploads obtained via Flickr mail", run scripts in
+this order:
+
+1. `./download_sent_mail.py`
+2. `./download_commons_contributions.py`
+3. `./update_flickr_uploads.py`
+
+Notes:
+- `download_sent_mail.py` reads Flickr auth cookies from
+`download_sent_mail.local.json` (`cookies_str` key). Copy
+`download_sent_mail.example.json` to create local config.
+- `main.py` does not populate `flickr_uploads`; it only reads from it.
+- `download_commons_contributions.py` intentionally stops after several
+consecutive fully-known API batches (overlap window) to avoid full-history
+scans while still catching shallow gaps.
+
 ## Potential Improvements
 
 - Cache search results to reduce Flickr requests
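The first note in the hunk above describes reading cookies from a local config file. A minimal sketch of such a loader follows; the file names and the `cookies_str` key come from the diff, while the function name `load_cookies_str` and the error handling are assumptions and may not match what `download_sent_mail.py` actually does.

```python
import json
from pathlib import Path

# File names taken from the AGENTS.md notes; paths relative to the repo root.
LOCAL_CONFIG = Path("download_sent_mail.local.json")
EXAMPLE_CONFIG = Path("download_sent_mail.example.json")


def load_cookies_str(config_path: Path = LOCAL_CONFIG) -> str:
    """Read the Flickr cookie string from the local, untracked config file.

    Sketch only: raises a descriptive error instead of falling back to a
    hard-coded value, so credentials never need to live in the source tree.
    """
    if not config_path.exists():
        raise FileNotFoundError(
            f"{config_path} not found; copy {EXAMPLE_CONFIG} to create it"
        )
    config = json.loads(config_path.read_text(encoding="utf-8"))
    cookies_str = config.get("cookies_str")
    if not isinstance(cookies_str, str) or not cookies_str.strip():
        raise ValueError(f"{config_path} has no usable 'cookies_str' value")
    return cookies_str.strip()
```

Keeping the cookies in a `*.local.json` file that is copied from a tracked example is a common way to keep secrets out of version control, provided the local file is listed in `.gitignore`.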