Update README and AGENTS with category search and license features

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Edward Betts 2026-02-07 10:26:56 +00:00
parent c5efd429ce
commit ac1b01ea68
2 changed files with 72 additions and 21 deletions

@ -14,7 +14,9 @@ licensing.
- **templates/**: Jinja2 templates using Bootstrap 5 for styling
- `base.html`: Base template with Bootstrap CSS/JS
- `combined.html`: Main UI template for search, results, and message composition
- `message.jinja`: Template for the permission request message body
- `message.jinja`: Template for the permission request message body (with
alternate text for non-free CC licenses)
- `category.html`: Category search page with visited link styling
- `show_error.html`: Error display template
## Key Components
@ -48,10 +50,22 @@ Represents a photo with:
### License Codes
Wikipedia-compatible licenses (can be used): 4 (CC BY), 5 (CC BY-SA), 7 (No
known copyright), 8 (US Government), 9 (CC0), 10 (Public Domain).
Flickr uses numeric codes for licenses. Codes 1-6 are CC 2.0, codes 11-16 are
CC 4.0 equivalents.
Not compatible: 0 (All Rights Reserved), 1-3 (NC variants), 6 (ND).
Wikipedia-compatible (`FREE_LICENSES`): 4 (CC BY 2.0), 5 (CC BY-SA 2.0),
7 (No known copyright), 8 (US Government), 9 (CC0), 10 (Public Domain),
14 (CC BY 4.0), 15 (CC BY-SA 4.0).
Non-free CC (`NONFREE_CC_LICENSES`): 1 (CC BY-NC-SA 2.0), 2 (CC BY-NC 2.0),
3 (CC BY-NC-ND 2.0), 6 (CC BY-ND 2.0), 11-13 (4.0 NC variants),
16 (CC BY-ND 4.0).
Not compatible: 0 (All Rights Reserved).
For free licenses, the message page shows an UploadWizard link instead of a
message. For non-free CC licenses, a tailored message explains which
restrictions (NC/ND) prevent Wikipedia use.
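
As a rough illustration of the split described above, the sketch below sorts a license code into one of three outcomes. The two set names come from the text; their contents are copied from the lists above, while `message_mode` and its return strings are hypothetical and may not match the real code.

```python
# Code sets copied from the lists above. The constant names appear in the
# text, but how the project actually defines them is an assumption.
FREE_LICENSES = {4, 5, 7, 8, 9, 10, 14, 15}
NONFREE_CC_LICENSES = {1, 2, 3, 6, 11, 12, 13, 16}


def message_mode(license_code: int) -> str:
    """Hypothetical helper: decide what the message page should show."""
    if license_code in FREE_LICENSES:
        # Already usable on Wikipedia: offer the UploadWizard link.
        return "upload_wizard"
    if license_code in NONFREE_CC_LICENSES:
        # CC licensed, but NC/ND restrictions block Wikipedia use.
        return "nonfree_cc_message"
    # Anything else (for example 0, All Rights Reserved).
    return "permission_request"
```

Keeping the two sets disjoint means the order of the checks does not matter, and any unknown code falls through to the standard permission request.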
### URL Validation (`is_valid_flickr_image_url`)
@ -94,15 +108,40 @@ Run to find Flickr uploads from UploadWizard contributions that don't have
the Flickr URL in the edit comment. Queries Commons API for image metadata
and checks the Credit field for Flickr URLs.
### Category Search (`/category` route)
Finds Wikipedia articles in a category that don't have images.
**Key functions**:
- `parse_category_input()`: Accepts category name, `Category:` prefix, or full
Wikipedia URL
- `get_articles_without_images()`: Uses MediaWiki API with
`generator=categorymembers` and `prop=images` for efficient batch queries
- `has_content_image()`: Filters out non-content images (UI icons, logos) using
`NON_CONTENT_IMAGE_PATTERNS`
The `cat` URL parameter is preserved through search results and message pages
to allow back-navigation to the category.
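
The category lookup could be sketched roughly as follows. This is not the project's implementation: the body of `parse_category_input` is a guess at the behaviour described above (including the choice to return the name without the `Category:` prefix), and `category_query_params` and `titles_without_images` are hypothetical helpers. The query parameters themselves are standard MediaWiki API parameters. The sketch treats any listed image as content, so it skips the `NON_CONTENT_IMAGE_PATTERNS` filtering.

```python
from urllib.parse import unquote, urlparse


def parse_category_input(text: str) -> str:
    """Return a bare category name from a name, a Category: title, or a URL."""
    text = text.strip()
    if text.startswith(("http://", "https://")):
        # Assumes the /wiki/<title> URL form; other URL shapes are not handled.
        path = urlparse(text).path
        text = unquote(path.rsplit("/wiki/", 1)[-1])
    text = text.replace("_", " ")
    if text.lower().startswith("category:"):
        text = text[len("category:"):]
    return text.strip()


def category_query_params(category: str, limit: int = 50) -> dict[str, str]:
    """Build one batch query: category members plus the images on each page."""
    return {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": f"Category:{category}",
        "gcmtype": "page",
        "gcmlimit": str(limit),
        "prop": "images",
        "imlimit": "max",
    }


def titles_without_images(response: dict) -> list[str]:
    """Pick out pages whose API entry carries no `images` list."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(p["title"] for p in pages.values() if not p.get("images"))
```

Using a generator with `prop=images` returns members and their image lists in a single request per batch, which is what makes the lookup cheaper than one request per article.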
### Previous Message Detection (`get_previous_messages`)
Checks `sent_mail/messages_index.json` for previous messages to a Flickr user.
Matches by both display name and username (case-insensitive). Results shown as
an info alert on the message page.
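
A minimal sketch of the matching step, assuming the index has already been loaded from `sent_mail/messages_index.json` as a list of dicts. The `recipient` key and the function name `find_previous_messages` are hypothetical; the real index layout is not shown here.

```python
def find_previous_messages(
    index: list[dict], display_name: str, username: str
) -> list[dict]:
    """Return index entries whose recipient matches either name, ignoring case."""
    # casefold() gives a case-insensitive comparison; empty names are skipped
    # so that a missing username cannot match entries with no recipient.
    wanted = {name.casefold() for name in (display_name, username) if name}
    return [
        entry for entry in index
        if entry.get("recipient", "").casefold() in wanted
    ]
```

Checking both names covers the case where an earlier message was recorded under the display name and the current lookup only has the username, or the reverse.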
## Request Flow
1. User enters Wikipedia article title/URL → `start()` extracts article name
2. `search_flickr()` fetches and parses Flickr search results
3. Results displayed as clickable photo grid with license badges
4. User clicks photo → page reloads with `flickr` and `img` params
5. `flickr_usrename_to_nsid()` looks up the photographer's NSID
6. Message template rendered with photo details
7. User copies message and clicks link to Flickr's mail compose page
1. User enters Wikipedia article title/URL → `start()` extracts article name.
Alternatively, user searches by category via `/category` route.
2. `search_flickr()` fetches and parses Flickr search results.
Disambiguation suffixes like "(academic)" are removed for the search.
3. Results displayed as clickable photo grid with license badges.
4. User clicks photo → page reloads with `flickr`, `img`, `license`, and
`flickr_user` params.
5. If license is Wikipedia-compatible: show UploadWizard link.
6. Otherwise: `flickr_usrename_to_nsid()` looks up the user's NSID, previous
messages are checked, and the appropriate message template is rendered.
7. User copies message and clicks link to Flickr's mail compose page.
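
The disambiguation step in item 2 could look something like the sketch below. `strip_disambiguation` is a hypothetical name, and the regular expression is one plausible way to drop a trailing parenthetical, not necessarily what `search_flickr()` does.

```python
import re


def strip_disambiguation(title: str) -> str:
    """Remove one trailing parenthetical such as ' (academic)' from a title."""
    # Only a parenthetical at the very end is removed; brackets elsewhere
    # in the title are left alone.
    return re.sub(r"\s*\([^()]*\)\s*$", "", title)
```

For example, `strip_disambiguation("John Smith (academic)")` gives `"John Smith"`, while a title without a suffix is returned unchanged.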
## Testing Changes
@ -123,6 +162,8 @@ print(result.photos[0].title, result.photos[0].license_name)
## Potential Improvements
- Cache search results to reduce Flickr requests
- Add filtering by license type
- Add filtering by license type in search results
- Handle Flickr rate limiting/blocks more gracefully
- Add tests for the parsing logic
- Add pagination for category search (continue token is already returned)
- Confirm CC 4.0 license codes 11-15 (only 16 confirmed so far)