flickr-mail/AGENTS.md
Edward Betts a2d29d7937 Show recent Commons uploads obtained via Flickr mail
Display recent Wikimedia Commons uploads on the home page, filtered to
only show images that were obtained by contacting creators via Flickr
mail. Each upload shows:
- Thumbnail linking to Commons
- Creator name linking to their Flickr profile
- Link to the illustrated Wikipedia article (or Wikidata item)

Features:
- Parse sent mail messages to extract Flickr and Wikipedia URLs
- Match Commons uploads with sent mail by normalized Flickr URL
- Cache Commons API thumbnail responses and sent mail index
- Handle Wikidata item URLs (Q-numbers) with correct links
- Add update_flickr_uploads.py script to find uploads from UploadWizard
  contributions by checking Commons API metadata

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 10:43:45 +00:00


# Agent Guidelines for Flickr Mail
This document provides context for AI agents working on this codebase.
## Project Overview
Flickr Mail is a Flask web application that helps users find photos on Flickr
for Wikipedia articles and contact photographers to request Creative Commons
licensing.
## Architecture
- **main.py**: Single-file Flask application containing all routes and logic
- **templates/**: Jinja2 templates using Bootstrap 5 for styling
  - `base.html`: Base template with Bootstrap CSS/JS
  - `combined.html`: Main UI template for search, results, and message composition
  - `message.jinja`: Template for the permission request message body
  - `show_error.html`: Error display template
## Key Components
### Flickr Search (`search_flickr`, `parse_flickr_search_results`)
Searches Flickr by scraping the search results page. The page embeds JSON data
in a `modelExport` JavaScript variable which contains photo metadata.
- Uses browser-like headers (`BROWSER_HEADERS`) to reduce the chance of being blocked by Flickr
- Parses embedded JSON by counting braces (not regex) to handle nested structures
- Accepts optional `page` parameter for pagination (25 photos per page)
- Returns `SearchResult` dataclass containing photos and pagination metadata
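The brace-counting approach can be sketched as follows. This is a minimal illustration, not the code in `main.py`: the helper name `extract_embedded_json` is hypothetical, and it assumes the object literal starts at the first `{` after the `modelExport` marker and is valid JSON. It also skips braces inside string literals, which naive counting would miscount.

```python
import json


def extract_embedded_json(html: str, marker: str = "modelExport") -> dict:
    """Return the JSON object that follows `marker` in `html` (hypothetical helper)."""
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"{marker!r} not found")
    start = html.find("{", start)
    if start == -1:
        raise ValueError("no object literal after marker")

    depth = 0
    in_string = False
    escaped = False
    for pos in range(start, len(html)):
        ch = html[pos]
        if in_string:
            # Inside a string literal: braces don't count, but track escapes
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace for the outer object: parse the whole slice
                return json.loads(html[start : pos + 1])
    raise ValueError("unbalanced braces")
```

Unlike a non-greedy regex such as `\{.*?\}`, the counter stops at the brace that actually closes the outer object, however deeply the data is nested.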
### SearchResult Dataclass
Contains search results with pagination info:
- `photos`: List of `FlickrPhoto` instances
- `total_photos`: Total number of matching photos
- `current_page`: Current page number (1-indexed)
- `total_pages`: Total number of pages (capped at 160 due to Flickr's 4000 result limit)
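A sketch of how `total_pages` could be derived, assuming 25 photos per page and the 4000-result cap mentioned above. Only the field names come from this document; the helper `compute_total_pages`, the field types, and returning 0 pages for an empty search are assumptions.

```python
import math
from dataclasses import dataclass

PHOTOS_PER_PAGE = 25  # page size used by search_flickr (per this document)
MAX_RESULTS = 4000    # Flickr stops returning results beyond this many


def compute_total_pages(total_photos: int, per_page: int = PHOTOS_PER_PAGE) -> int:
    """Pages the user can actually reach (hypothetical helper)."""
    reachable = min(total_photos, MAX_RESULTS)
    return math.ceil(reachable / per_page)


@dataclass
class SearchResult:
    photos: list        # FlickrPhoto instances
    total_photos: int   # total matches reported by Flickr
    current_page: int   # 1-indexed
    total_pages: int    # capped via compute_total_pages
```

With these constants, any search with more than 4000 matches reports 160 pages (4000 / 25).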
### FlickrPhoto Dataclass
Represents a photo with:
- `id`, `title`, `path_alias`, `owner_nsid`, `username`, `realname`
- `license` (int): Flickr license code (0=ARR, 4=CC BY, 5=CC BY-SA, etc.)
- `thumb_url`, `medium_url`: Static image URLs
- `flickr_url` property: URL to photo page
- `license_name` property: Human-readable license name
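The two properties could look roughly like this. It is a sketch only: `https://www.flickr.com/photos/<owner>/<id>` is Flickr's public photo-page URL pattern, but the fallback from `path_alias` to `owner_nsid`, the field types, and the exact license labels are assumptions rather than a copy of `main.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Flickr's numeric license IDs mapped to short labels (labels are illustrative)
LICENSE_NAMES = {
    0: "All Rights Reserved",
    1: "CC BY-NC-SA",
    2: "CC BY-NC",
    3: "CC BY-NC-ND",
    4: "CC BY",
    5: "CC BY-SA",
    6: "CC BY-ND",
    7: "No known copyright restrictions",
    8: "United States Government Work",
    9: "CC0",
    10: "Public Domain Mark",
}


@dataclass
class FlickrPhoto:
    id: str
    title: str
    path_alias: Optional[str]
    owner_nsid: str
    username: str
    realname: str
    license: int
    thumb_url: str
    medium_url: str

    @property
    def flickr_url(self) -> str:
        # Prefer the readable path alias; Flickr also accepts the NSID in this slot
        owner = self.path_alias or self.owner_nsid
        return f"https://www.flickr.com/photos/{owner}/{self.id}"

    @property
    def license_name(self) -> str:
        # Unknown codes get a visible fallback instead of raising KeyError
        return LICENSE_NAMES.get(self.license, f"Unknown license ({self.license})")
```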
### License Codes
Wikipedia-compatible licenses (can be used): 4 (CC BY), 5 (CC BY-SA), 7 (No
known copyright restrictions), 8 (US Government), 9 (CC0), 10 (Public Domain).
Not compatible: 0 (All Rights Reserved), 1-3 (NC variants), 6 (ND).
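Checking compatibility then reduces to set membership. The names `WIKIPEDIA_COMPATIBLE_LICENSES`, `is_wikipedia_compatible`, and `filter_compatible` are hypothetical; something along these lines could back the "filtering by license type" idea listed under Potential Improvements.

```python
from typing import Iterable, List

# Codes from the compatible list above
WIKIPEDIA_COMPATIBLE_LICENSES = frozenset({4, 5, 7, 8, 9, 10})


def is_wikipedia_compatible(license_code: int) -> bool:
    return license_code in WIKIPEDIA_COMPATIBLE_LICENSES


def filter_compatible(photos: Iterable) -> List:
    """Keep photos (any objects with an int `.license`) whose license Wikipedia can use."""
    return [photo for photo in photos if is_wikipedia_compatible(photo.license)]
```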
### URL Validation (`is_valid_flickr_image_url`)
Validates that image URLs passed via query params are from legitimate Flickr
static image servers:
- `live.staticflickr.com`
- `farm*.staticflickr.com`
- `c1.staticflickr.com`, `c2.staticflickr.com`
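A sketch of this kind of host allowlist, under two assumptions: that `farm*` means `farm` followed by digits, and that only `http`/`https` URLs are acceptable. The function name `is_flickr_static_url` is hypothetical. Comparing the parsed hostname, rather than searching for a substring, keeps hosts such as `live.staticflickr.com.evil.example` from slipping through.

```python
import re
from urllib.parse import urlparse

# Fixed hosts listed above; farm hosts are matched by pattern
_FIXED_HOSTS = {"live.staticflickr.com", "c1.staticflickr.com", "c2.staticflickr.com"}
_FARM_HOST_RE = re.compile(r"farm\d+\.staticflickr\.com")


def is_flickr_static_url(url: str) -> bool:
    """True if url is http(s) and its host is one of the allowed static servers."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname or ""  # urlparse lower-cases the hostname
    return host in _FIXED_HOSTS or _FARM_HOST_RE.fullmatch(host) is not None
```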
### NSID Lookup (`flickr_usrename_to_nsid`)
Converts a Flickr username/path alias to the NSID (internal user ID) needed
for the Flickr mail URL. Scrapes the user's profile page for embedded params.
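A heavily hedged sketch of the parsing half of this step. It assumes the profile page (Flickr serves profiles at `https://www.flickr.com/people/<path_alias>/`) embeds the NSID as a JSON `"nsid":"..."` pair; the real page structure, and the parameter `main.py` actually reads, may differ. NSIDs have the form `12345678@N00`. The HTTP fetch is left out so the parsing can be tested offline, and `profile_url` and `find_nsid` are hypothetical names.

```python
import re
from typing import Optional

# Assumes the page embeds the NSID as "nsid":"12345678@N00"
NSID_RE = re.compile(r'"nsid"\s*:\s*"(\d+@N\d+)"')


def profile_url(path_alias: str) -> str:
    """Public profile page for a username/path alias."""
    return f"https://www.flickr.com/people/{path_alias}/"


def find_nsid(profile_html: str) -> Optional[str]:
    """Return the first NSID embedded in the page, or None if none is found."""
    match = NSID_RE.search(profile_html)
    return match.group(1) if match else None
```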
### Commons Uploads Display
Shows recent Wikimedia Commons uploads on the home page, filtered to only
those obtained via Flickr mail requests.
**Data files** (in `commons_contributions/`):
- `flickr_uploads.json`: List of Commons uploads from Flickr with metadata
- `thumbnail_cache.json`: Cached Commons API thumbnail URLs (7-day TTL)
- `sent_mail_index.json`: Index of sent mail messages (flickr_url → wikipedia_url)
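The 7-day TTL could be enforced with a check like the one below. The entry layout (a `thumb_url` plus a `fetched_at` Unix timestamp per file title) and the helper names are assumptions; the real `thumbnail_cache.json` may store something different.

```python
import json
import time
from pathlib import Path
from typing import Dict, Optional

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days


def load_cache(path: Path) -> Dict[str, dict]:
    """Read the cache file, treating a missing file as an empty cache."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def cached_thumb_url(
    cache: Dict[str, dict], title: str, now: Optional[float] = None
) -> Optional[str]:
    """Return the cached thumbnail URL, or None if absent or older than the TTL."""
    entry = cache.get(title)
    if entry is None:
        return None
    now = time.time() if now is None else now
    if now - entry["fetched_at"] > CACHE_TTL_SECONDS:
        return None  # stale: caller should re-query the Commons API and rewrite the entry
    return entry["thumb_url"]
```

Passing `now` explicitly keeps the expiry logic testable without waiting a week.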
**Key functions**:
- `build_sent_mail_index()`: Parses sent mail JSON files, extracts Flickr and
Wikipedia URLs from message bodies, caches the index
- `get_recent_commons_uploads()`: Loads uploads, filters by sent mail match,
fetches thumbnails from Commons API
- `normalize_flickr_url()`: Normalizes URLs for matching (removes protocol, www, trailing slash)
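Matching only works if both sides normalize URLs the same way. Below is one way to write the normalization as described, plus a simplified index builder. Treating each message as a plain-text body, the two regexes, and keeping only the first URL of each kind are assumptions; the real `build_sent_mail_index()` reads the sent mail JSON files and caches its output, and `build_index` is a hypothetical name.

```python
import re
from typing import Dict, Iterable

# Assumed URL shapes; punctuation directly after a URL would be captured too
FLICKR_URL_RE = re.compile(r"https?://(?:www\.)?flickr\.com/photos/[^\s\"'<>]+")
WIKI_URL_RE = re.compile(
    r"https?://(?:[a-z-]+\.wikipedia\.org|www\.wikidata\.org)/wiki/[^\s\"'<>]+"
)


def normalize_flickr_url(url: str) -> str:
    """Strip protocol, leading www. and trailing slash so equivalent URLs compare equal."""
    url = re.sub(r"^https?://", "", url.strip())
    url = re.sub(r"^www\.", "", url)
    return url.rstrip("/")


def build_index(message_bodies: Iterable[str]) -> Dict[str, str]:
    """Map normalized Flickr URL -> Wikipedia/Wikidata URL, one pair per message."""
    index: Dict[str, str] = {}
    for body in message_bodies:
        flickr = FLICKR_URL_RE.search(body)
        wiki = WIKI_URL_RE.search(body)
        if flickr and wiki:
            index[normalize_flickr_url(flickr.group(0))] = wiki.group(0)
    return index
```

An upload would then count as obtained via Flickr mail when `normalize_flickr_url(upload.flickr_url) in index`.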
**CommonsUpload dataclass**:
- `title`, `thumb_url`, `commons_url`, `flickr_url`, `creator`, `timestamp`
- `wikipedia_url`, `creator_profile_url`: Extracted from sent mail
- `is_wikidata_item` property: Detects Q-number URLs
- `wiki_link_url`, `wiki_link_label`: Handles Wikidata vs Wikipedia links
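The Wikidata handling might be sketched like this. `WikiLinkSketch` is a hypothetical stand-in that isolates the link-related properties; the URL pattern for items, the rewrite to the canonical `https://www.wikidata.org/wiki/Q…` form, and the label wording are all assumptions, not the behaviour of the real `CommonsUpload`.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlparse

# Matches item URLs such as https://www.wikidata.org/wiki/Q42
WIKIDATA_ITEM_RE = re.compile(r"^https?://(?:www\.)?wikidata\.org/wiki/(Q\d+)/?$")


@dataclass
class WikiLinkSketch:
    wikipedia_url: str  # as extracted from the sent mail

    @property
    def wikidata_qid(self) -> Optional[str]:
        match = WIKIDATA_ITEM_RE.match(self.wikipedia_url)
        return match.group(1) if match else None

    @property
    def is_wikidata_item(self) -> bool:
        return self.wikidata_qid is not None

    @property
    def wiki_link_url(self) -> str:
        # Rewrite item links to the canonical form; keep Wikipedia links unchanged
        qid = self.wikidata_qid
        return f"https://www.wikidata.org/wiki/{qid}" if qid else self.wikipedia_url

    @property
    def wiki_link_label(self) -> str:
        qid = self.wikidata_qid
        if qid:
            return f"Wikidata item {qid}"
        # Article title from the URL path, percent-decoded, underscores as spaces
        title = urlparse(self.wikipedia_url).path.split("/wiki/", 1)[-1]
        return unquote(title).replace("_", " ")
```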
**Maintenance script** (`update_flickr_uploads.py`):
Run this script to find Flickr uploads among UploadWizard contributions whose
edit comment does not include the Flickr URL. It queries the Commons API for
image metadata and checks the Credit field for Flickr URLs.
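The metadata check can be sketched against the MediaWiki Action API's `prop=imageinfo` with `iiprop=extmetadata`, which returns a `Credit` field (HTML) per file. The helper names are hypothetical, the HTTP request is omitted, and the regex is an assumption about how Flickr links appear inside Credit. The API accepts up to 50 titles per query for ordinary clients.

```python
import re
from typing import Dict, Iterable

COMMONS_API_URL = "https://commons.wikimedia.org/w/api.php"
# Assumed shape of a Flickr photo link inside the Credit HTML
FLICKR_LINK_RE = re.compile(r"https?://(?:www\.)?flickr\.com/photos/[^\s\"'<>]+")


def imageinfo_params(titles: Iterable[str]) -> Dict[str, str]:
    """Query parameters asking Commons for each file's Credit metadata."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",  # pages come back as a list
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "iiextmetadatafilter": "Credit",
        "titles": "|".join(titles),
    }


def flickr_urls_from_response(data: dict) -> Dict[str, str]:
    """Map file title -> first Flickr URL found in its Credit field."""
    found: Dict[str, str] = {}
    for page in data.get("query", {}).get("pages", []):
        for info in page.get("imageinfo", []):
            credit = info.get("extmetadata", {}).get("Credit", {}).get("value", "")
            match = FLICKR_LINK_RE.search(credit)
            if match:
                found[page["title"]] = match.group(0)
                break
    return found
```

With an HTTP client such as `requests`, the JSON from `requests.get(COMMONS_API_URL, params=imageinfo_params(batch), timeout=30).json()` could be passed straight to `flickr_urls_from_response`; Wikimedia's API etiquette asks for a descriptive `User-Agent` header on such requests.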
## Request Flow
1. User enters Wikipedia article title/URL → `start()` extracts article name
2. `search_flickr()` fetches and parses Flickr search results
3. Results displayed as clickable photo grid with license badges
4. User clicks photo → page reloads with `flickr` and `img` params
5. `flickr_usrename_to_nsid()` looks up the photographer's NSID
6. Message template rendered with photo details
7. User copies message and clicks link to Flickr's mail compose page
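Step 1 might look roughly like this. `extract_article_name` is a hypothetical stand-in for the logic in `start()`: it assumes input is either a bare title or a URL of the form `https://en.wikipedia.org/wiki/<Title>`, percent-decodes the title, and turns underscores into spaces.

```python
from urllib.parse import unquote, urlparse


def extract_article_name(user_input: str) -> str:
    """Return a Wikipedia article title from a bare title or a /wiki/ URL."""
    text = user_input.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and "/wiki/" in parsed.path:
        # Take everything after /wiki/ and undo percent-encoding (e.g. %C3%A9)
        text = unquote(parsed.path.split("/wiki/", 1)[1])
    return text.replace("_", " ")
```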
## Testing Changes
Run the Flask app locally:
```bash
python3 main.py
```
Then visit http://localhost:5000/
Test search functionality:
```python
from main import search_flickr

result = search_flickr("Big Ben", page=1)
print(f"{len(result.photos)} photos, {result.total_pages} pages")
if result.photos:  # guard against an empty result before indexing
    first = result.photos[0]
    print(first.title, first.license_name)
```
## Potential Improvements
- Cache search results to reduce Flickr requests
- Add filtering by license type
- Handle Flickr rate limiting/blocks more gracefully
- Add tests for the parsing logic