# Agent Guidelines for Flickr Mail

This document provides context for AI agents working on this codebase.

## Project Overview

Flickr Mail is a Flask web application that helps users find photos on Flickr
for Wikipedia articles and contact photographers to request Creative Commons
licensing.

## Architecture

- **main.py**: Single-file Flask application containing all routes and logic
- **templates/**: Jinja2 templates using Bootstrap 5 for styling
  - `base.html`: Base template with Bootstrap CSS/JS
  - `combined.html`: Main UI template for search, results, and message composition
  - `message.jinja`: Template for the permission request message body
  - `show_error.html`: Error display template

## Key Components

### Flickr Search (`search_flickr`, `parse_flickr_search_results`)

Searches Flickr by scraping the search results page. The page embeds JSON data
in a `modelExport` JavaScript variable, which contains the photo metadata.

- Uses browser-like headers (`BROWSER_HEADERS`) to avoid being blocked
- Parses embedded JSON by counting braces (not regex) to handle nested structures
- Accepts optional `page` parameter for pagination (25 photos per page)
- Returns `SearchResult` dataclass containing photos and pagination metadata

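The brace-counting approach can be sketched as follows. This is a minimal illustration, not the code in `main.py`: the helper name `extract_json_object` and the sample markup are hypothetical, and the real layout of the `modelExport` data is not described here. The sketch assumes the JSON object starts at the first `{` after the marker, and it skips braces that appear inside string literals.

```python
import json


def extract_json_object(text: str, marker: str = "modelExport") -> dict:
    """Return the first JSON object that follows `marker` in `text`."""
    marker_pos = text.find(marker)
    if marker_pos == -1:
        raise ValueError(f"{marker!r} not found")
    start = text.find("{", marker_pos)
    if start == -1:
        raise ValueError("no opening brace after marker")

    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Braces inside a string literal must not change the depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: parse the balanced slice.
                return json.loads(text[start : i + 1])
    raise ValueError("unbalanced braces")


# Made-up sample; the real page structure may differ.
sample_html = (
    '<script>modelExport: {"main": {"photos": '
    '[{"id": "1", "title": "a } brace"}]}},</script>'
)
data = extract_json_object(sample_html)
print(data["main"]["photos"][0]["title"])
```

A regex struggles here because the object nests to arbitrary depth; tracking depth finds the matching close brace without guessing where the object ends.
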
### SearchResult Dataclass

Contains search results with pagination info:
- `photos`: List of `FlickrPhoto` instances
- `total_photos`: Total number of matching photos
- `current_page`: Current page number (1-indexed)
- `total_pages`: Total number of pages (capped at 160 due to Flickr's 4000 result limit)

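The 160-page cap follows from 4000 results at 25 photos per page. A minimal sketch of that arithmetic, using the field names listed above; the constants and the helper `compute_total_pages` are hypothetical names, not necessarily what `main.py` uses:

```python
import math
from dataclasses import dataclass, field

PHOTOS_PER_PAGE = 25
MAX_RESULTS = 4000  # Flickr's result limit, as described above
MAX_PAGES = MAX_RESULTS // PHOTOS_PER_PAGE  # 4000 / 25 = 160


def compute_total_pages(total_photos: int) -> int:
    """Round up to whole pages, then apply the cap."""
    pages = math.ceil(total_photos / PHOTOS_PER_PAGE)
    return min(pages, MAX_PAGES)


@dataclass
class SearchResult:
    photos: list = field(default_factory=list)
    total_photos: int = 0
    current_page: int = 1  # 1-indexed
    total_pages: int = 0


result = SearchResult(total_photos=10_000, total_pages=compute_total_pages(10_000))
print(result.total_pages)
```
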
### FlickrPhoto Dataclass

Represents a photo with:
- `id`, `title`, `path_alias`, `owner_nsid`, `username`, `realname`
- `license` (int): Flickr license code (0=ARR, 4=CC BY, 5=CC BY-SA, etc.)
- `thumb_url`, `medium_url`: Static image URLs
- `flickr_url` property: URL to photo page
- `license_name` property: Human-readable license name

### License Codes

Wikipedia-compatible licenses (can be used): 4 (CC BY), 5 (CC BY-SA), 7 (No
known copyright), 8 (US Government), 9 (CC0), 10 (Public Domain).

Not compatible: 0 (All Rights Reserved), 1-3 (NC variants), 6 (ND).

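The compatibility rule above reduces to a set membership test. A minimal sketch, assuming the codes listed in this section; the names `COMPATIBLE_LICENSES`, `LICENSE_NAMES`, and both helpers are hypothetical and may not match `main.py`:

```python
# Codes listed above as usable on Wikipedia.
COMPATIBLE_LICENSES = {4, 5, 7, 8, 9, 10}

# Only the names spelled out in this section; other codes fall back to a
# generic label.
LICENSE_NAMES = {
    0: "All Rights Reserved",
    4: "CC BY",
    5: "CC BY-SA",
    7: "No known copyright",
    8: "US Government",
    9: "CC0",
    10: "Public Domain",
}


def is_wikipedia_compatible(license_code: int) -> bool:
    """True if the photo can be used without asking for a licence change."""
    return license_code in COMPATIBLE_LICENSES


def license_label(license_code: int) -> str:
    """Human-readable name, or a generic label for unnamed codes."""
    return LICENSE_NAMES.get(license_code, f"License {license_code}")


for code in (0, 2, 4, 6, 9):
    print(code, license_label(code), is_wikipedia_compatible(code))
```

Photos whose code is not in the compatible set are the ones worth a permission request.
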
### URL Validation (`is_valid_flickr_image_url`)

Validates that image URLs passed via query params are from legitimate Flickr
static image servers:
- `live.staticflickr.com`
- `farm*.staticflickr.com`
- `c1.staticflickr.com`, `c2.staticflickr.com`

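A host allowlist like the one above might be checked as follows. This is a sketch rather than the actual `is_valid_flickr_image_url`: the function name `is_allowed_image_url` is hypothetical, `farm*` is assumed to mean `farm` followed by digits, and accepting both `http` and `https` is also an assumption.

```python
import re
from urllib.parse import urlparse

# Anchored with fullmatch so that look-alike hosts such as
# live.staticflickr.com.evil.example are rejected.
ALLOWED_HOST_RE = re.compile(r"(live|farm\d+|c[12])\.staticflickr\.com")


def is_allowed_image_url(url: str) -> bool:
    """Check scheme and hostname against the Flickr static-server allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname  # lowercased by urlparse; None if missing
    return bool(host and ALLOWED_HOST_RE.fullmatch(host))


print(is_allowed_image_url("https://live.staticflickr.com/65535/1_abc.jpg"))
print(is_allowed_image_url("https://live.staticflickr.com.evil.example/x.jpg"))
```

Matching on the parsed hostname, not on the raw string, avoids being fooled by an allowed hostname that appears in the path or query of another site's URL.
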
### NSID Lookup (`flickr_usrename_to_nsid`)

Converts a Flickr username/path alias to the NSID (internal user ID) needed
for the Flickr mail URL. Scrapes the user's profile page for embedded params.

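As a rough illustration of the lookup, the sketch below pulls an NSID out of page markup and builds a mail link from it. Everything specific here is an assumption: the `"nsid": "..."` key, the helper names `find_nsid` and `mail_compose_url`, and the compose URL pattern are not taken from `main.py`, and Flickr's markup can change.

```python
import re
from typing import Optional
from urllib.parse import quote

# NSIDs look like "12345678@N00". The surrounding "nsid" key is an assumed
# markup detail, not a documented format.
NSID_RE = re.compile(r'"nsid"\s*:\s*"(\d+@N\d+)"')


def find_nsid(html: str) -> Optional[str]:
    """Return the first NSID embedded in the page, or None."""
    match = NSID_RE.search(html)
    return match.group(1) if match else None


def mail_compose_url(nsid: str) -> str:
    """Build a Flickr mail compose link (URL pattern is an assumption)."""
    return "https://www.flickr.com/mail/write/?to=" + quote(nsid, safe="@")


sample_page = '<script>var params = {"nsid": "12345678@N00"};</script>'
nsid = find_nsid(sample_page)
print(nsid, mail_compose_url(nsid) if nsid else "not found")
```
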
### Commons Uploads Display

Shows recent Wikimedia Commons uploads on the home page, filtered to only
those obtained via Flickr mail requests.

**Data files** (in `commons_contributions/`):
- `flickr_uploads.json`: List of Commons uploads from Flickr with metadata
- `thumbnail_cache.json`: Cached Commons API thumbnail URLs (7-day TTL)
- `sent_mail_index.json`: Index of sent mail messages (flickr_url → wikipedia_url)

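A 7-day TTL on a JSON cache file can be sketched like this. The file layout (a `timestamp` plus a `thumbnails` mapping) and the helper names are assumptions for illustration; the real `thumbnail_cache.json` may be structured differently. The thumbnail URL is a placeholder.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day TTL, as stated above


def save_cache(path: Path, thumbnails: dict, now: Optional[float] = None) -> None:
    """Write thumbnails together with the time they were fetched."""
    payload = {
        "timestamp": time.time() if now is None else now,
        "thumbnails": thumbnails,
    }
    path.write_text(json.dumps(payload))


def load_cache(path: Path, now: Optional[float] = None) -> Optional[dict]:
    """Return cached thumbnails, or None if the file is missing or expired."""
    if not path.exists():
        return None
    payload = json.loads(path.read_text())
    current = time.time() if now is None else now
    if current - payload["timestamp"] > CACHE_TTL_SECONDS:
        return None
    return payload["thumbnails"]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "thumbnail_cache.json"
    entry = {"File:Example.jpg": "https://upload.wikimedia.org/example-thumb.jpg"}
    save_cache(cache_file, entry, now=0)
    fresh = load_cache(cache_file, now=CACHE_TTL_SECONDS - 1)
    stale = load_cache(cache_file, now=CACHE_TTL_SECONDS + 1)

print(fresh is not None, stale is None)
```

Passing `now` explicitly keeps the expiry logic testable without waiting or patching the clock.
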
**Key functions**:
- `build_sent_mail_index()`: Parses sent mail JSON files, extracts Flickr and
  Wikipedia URLs from message bodies, caches the index
- `get_recent_commons_uploads()`: Loads uploads, filters by sent mail match,
  fetches thumbnails from Commons API
- `normalize_flickr_url()`: Normalizes URLs for matching (removes protocol, www, trailing slash)

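The matching step depends on both sides normalizing URLs the same way. A minimal sketch of the three normalizations named above and of the filter they enable; `normalize_url` is a stand-in name, and the sample URLs and dictionary shapes are made up:

```python
def normalize_url(url: str) -> str:
    """Strip protocol, leading www., and trailing slash."""
    url = url.strip()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


# Index built from sent mail: normalized Flickr URL -> Wikipedia URL.
sent_mail_index = {
    normalize_url("https://www.flickr.com/photos/example/123/"):
        "https://en.wikipedia.org/wiki/Example",
}

uploads = [
    {"title": "File:A.jpg", "flickr_url": "http://flickr.com/photos/example/123"},
    {"title": "File:B.jpg", "flickr_url": "https://www.flickr.com/photos/other/456/"},
]

# Keep only uploads whose Flickr URL appears in the sent mail index.
matched = [u for u in uploads if normalize_url(u["flickr_url"]) in sent_mail_index]
print([u["title"] for u in matched])
```
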
**CommonsUpload dataclass**:
- `title`, `thumb_url`, `commons_url`, `flickr_url`, `creator`, `timestamp`
- `wikipedia_url`, `creator_profile_url`: Extracted from sent mail
- `is_wikidata_item` property: Detects Q-number URLs
- `wiki_link_url`, `wiki_link_label`: Handles Wikidata vs Wikipedia links

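One way the Q-number handling could work is sketched below. It assumes that a sent-mail URL ending in `/wiki/Q<digits>` refers to a Wikidata item and should link to `wikidata.org`; that rule, the class name `WikiTarget`, the `qid` helper, and the label text are assumptions, not the actual `CommonsUpload` implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

QID_RE = re.compile(r"/wiki/(Q\d+)$")


@dataclass
class WikiTarget:
    wikipedia_url: str

    @property
    def qid(self) -> Optional[str]:
        """The Q-number at the end of the URL, if there is one."""
        match = QID_RE.search(self.wikipedia_url)
        return match.group(1) if match else None

    @property
    def is_wikidata_item(self) -> bool:
        return self.qid is not None

    @property
    def wiki_link_url(self) -> str:
        """Point Q-numbers at Wikidata; leave article URLs untouched."""
        if self.qid:
            return f"https://www.wikidata.org/wiki/{self.qid}"
        return self.wikipedia_url

    @property
    def wiki_link_label(self) -> str:
        return self.qid if self.qid else "Wikipedia article"


item = WikiTarget("https://en.wikipedia.org/wiki/Q42")
article = WikiTarget("https://en.wikipedia.org/wiki/Big_Ben")
print(item.wiki_link_url, item.wiki_link_label)
print(article.wiki_link_url, article.wiki_link_label)
```
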
**Maintenance script** (`update_flickr_uploads.py`):
Run this script to find Flickr uploads among UploadWizard contributions that
don't have the Flickr URL in the edit comment. It queries the Commons API for
image metadata and checks the Credit field for Flickr URLs.

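The Credit-field check can be sketched without any network access. The example assumes metadata in the shape returned by the MediaWiki `imageinfo` query with `iiprop=extmetadata`, where each field is a dict with a `value` key; the helper name, the regex, and the sample record are illustrative and not taken from `update_flickr_uploads.py`.

```python
import re
from typing import Optional

# Matches photo page URLs such as https://www.flickr.com/photos/<user>/<id>
FLICKR_PHOTO_RE = re.compile(
    r'https?://(?:www\.)?flickr\.com/photos/[^/\s"<>]+/\d+'
)


def flickr_url_from_extmetadata(extmetadata: dict) -> Optional[str]:
    """Return the first Flickr photo URL found in the Credit field, if any."""
    credit = extmetadata.get("Credit", {}).get("value", "")
    match = FLICKR_PHOTO_RE.search(credit)
    return match.group(0) if match else None


# Made-up metadata record; Credit values are usually HTML fragments.
sample = {
    "Credit": {
        "value": '<a href="https://www.flickr.com/photos/example/123456/">Example photo</a>',
        "source": "commons-desc-page",
    }
}
print(flickr_url_from_extmetadata(sample))
```
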
## Request Flow

1. User enters Wikipedia article title/URL → `start()` extracts article name
2. `search_flickr()` fetches and parses Flickr search results
3. Results displayed as clickable photo grid with license badges
4. User clicks photo → page reloads with `flickr` and `img` params
5. `flickr_usrename_to_nsid()` looks up the photographer's NSID
6. Message template rendered with photo details
7. User copies message and clicks link to Flickr's mail compose page

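Step 1 accepts either a bare title or a full article URL. A minimal sketch of that extraction, assuming standard `/wiki/<title>` URLs; the helper name `article_name_from_input` is hypothetical and `start()` may handle this differently:

```python
from urllib.parse import unquote, urlparse


def article_name_from_input(user_input: str) -> str:
    """Turn a Wikipedia URL or a plain title into an article name."""
    text = user_input.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.path.startswith("/wiki/"):
        # Decode percent-escapes and turn underscores back into spaces.
        title = unquote(parsed.path[len("/wiki/"):])
        return title.replace("_", " ")
    return text


print(article_name_from_input("https://en.wikipedia.org/wiki/Big_Ben"))
print(article_name_from_input("Big Ben"))
```
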
## Testing Changes

Run the Flask app locally:
```bash
python3 main.py
```
Then visit http://localhost:5000/

Test search functionality:
```python
from main import search_flickr
result = search_flickr("Big Ben", page=1)
print(f"{len(result.photos)} photos, {result.total_pages} pages")
print(result.photos[0].title, result.photos[0].license_name)
```

## Potential Improvements

- Cache search results to reduce Flickr requests
- Add filtering by license type
- Handle Flickr rate limiting/blocks more gracefully
- Add tests for the parsing logic