Move Flickr sent-mail cookies into local config file
Parent: 2819652afd
Commit: 252a854e76
5 changed files with 127 additions and 79 deletions
1
.gitignore
vendored
|
|
@ -3,3 +3,4 @@ __pycache__
|
||||||
commons_contributions/thumbnail_cache.json
|
commons_contributions/thumbnail_cache.json
|
||||||
commons_contributions/sent_mail_index.json
|
commons_contributions/sent_mail_index.json
|
||||||
flickr_mail.db
|
flickr_mail.db
|
||||||
|
download_sent_mail.local.json
|
||||||
|
|
|
||||||
47
AGENTS.md
|
|
@ -85,16 +85,18 @@ for the Flickr mail URL. Scrapes the user's profile page for embedded params.
|
||||||
Shows recent Wikimedia Commons uploads on the home page, filtered to only
|
Shows recent Wikimedia Commons uploads on the home page, filtered to only
|
||||||
those obtained via Flickr mail requests.
|
those obtained via Flickr mail requests.
|
||||||
|
|
||||||
**Data files** (in `commons_contributions/`):
|
**Database tables used by the app**:
|
||||||
- `flickr_uploads.json`: List of Commons uploads from Flickr with metadata
|
- `sent_messages`: downloaded from Flickr sent mail, includes extracted Flickr
|
||||||
- `thumbnail_cache.json`: Cached Commons API thumbnail URLs (7-day TTL)
|
URL and Wikipedia URL from message body
|
||||||
- `sent_mail_index.json`: Index of sent mail messages (flickr_url → wikipedia_url)
|
- `contributions`: downloaded from Commons `usercontribs`
|
||||||
|
- `flickr_uploads`: derived table built by `update_flickr_uploads.py` by
|
||||||
|
matching Commons uploads to Flickr URLs
|
||||||
|
- `thumbnail_cache`: cached Commons API thumbnail URLs (7-day TTL)
|
||||||
|
|
||||||
**Key functions**:
|
**Key functions**:
|
||||||
- `build_sent_mail_index()`: Parses sent mail JSON files, extracts Flickr and
|
|
||||||
Wikipedia URLs from message bodies, caches the index
|
|
||||||
- `get_recent_commons_uploads()`: Loads uploads, filters by sent mail match,
|
- `get_recent_commons_uploads()`: Loads uploads, filters by sent mail match,
|
||||||
fetches thumbnails from Commons API
|
joins `flickr_uploads` with `sent_messages`, and fetches thumbnails from
|
||||||
|
Commons API
|
||||||
- `normalize_flickr_url()`: Normalizes URLs for matching (removes protocol, www, trailing slash)
|
- `normalize_flickr_url()`: Normalizes URLs for matching (removes protocol, www, trailing slash)
|
||||||
|
|
||||||
**CommonsUpload dataclass**:
|
**CommonsUpload dataclass**:
|
||||||
|
|
@ -104,9 +106,14 @@ those obtained via Flickr mail requests.
|
||||||
- `wiki_link_url`, `wiki_link_label`: Handles Wikidata vs Wikipedia links
|
- `wiki_link_url`, `wiki_link_label`: Handles Wikidata vs Wikipedia links
|
||||||
|
|
||||||
**Maintenance script** (`update_flickr_uploads.py`):
|
**Maintenance script** (`update_flickr_uploads.py`):
|
||||||
Run to find Flickr uploads from UploadWizard contributions that don't have
|
Builds/updates `flickr_uploads` from `contributions` and links to
|
||||||
the Flickr URL in the edit comment. Queries Commons API for image metadata
|
`sent_messages`.
|
||||||
and checks the Credit field for Flickr URLs.
|
- Scans file contributions containing `UploadWizard` in the comment
|
||||||
|
- Supports both comment styles:
|
||||||
|
- `User created page with UploadWizard` (older)
|
||||||
|
- `Uploaded a work by ... with UploadWizard` (newer; often includes URL)
|
||||||
|
- Extracts Flickr URL from contribution comment when present
|
||||||
|
- Falls back to Commons `extmetadata.Credit` lookup when comment has no URL
|
||||||
|
|
||||||
### Category Search (`/category` route)
|
### Category Search (`/category` route)
|
||||||
|
|
||||||
|
|
@ -125,7 +132,7 @@ to allow back-navigation to the category.
|
||||||
|
|
||||||
### Previous Message Detection (`get_previous_messages`)
|
### Previous Message Detection (`get_previous_messages`)
|
||||||
|
|
||||||
Checks `sent_mail/messages_index.json` for previous messages to a Flickr user.
|
Checks the `sent_messages` database table for previous messages to a Flickr user.
|
||||||
Matches by both display name and username (case-insensitive). Results shown as
|
Matches by both display name and username (case-insensitive). Results shown as
|
||||||
an info alert on the message page.
|
an info alert on the message page.
|
||||||
|
|
||||||
|
|
@ -159,6 +166,24 @@ print(f"{len(result.photos)} photos, {result.total_pages} pages")
|
||||||
print(result.photos[0].title, result.photos[0].license_name)
|
print(result.photos[0].title, result.photos[0].license_name)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Data Sync Workflow
|
||||||
|
|
||||||
|
To refresh "recent Commons uploads obtained via Flickr mail", run scripts in
|
||||||
|
this order:
|
||||||
|
|
||||||
|
1. `./download_sent_mail.py`
|
||||||
|
2. `./download_commons_contributions.py`
|
||||||
|
3. `./update_flickr_uploads.py`
|
||||||
|
|
||||||
|
Notes:
|
||||||
|
- `download_sent_mail.py` reads Flickr auth cookies from
|
||||||
|
`download_sent_mail.local.json` (`cookies_str` key). Copy
|
||||||
|
`download_sent_mail.example.json` to create local config.
|
||||||
|
- `main.py` does not populate `flickr_uploads`; it only reads from it.
|
||||||
|
- `download_commons_contributions.py` intentionally stops after several
|
||||||
|
consecutive fully-known API batches (overlap window) to avoid full-history
|
||||||
|
scans while still catching shallow gaps.
|
||||||
|
|
||||||
## Potential Improvements
|
## Potential Improvements
|
||||||
|
|
||||||
- Cache search results to reduce Flickr requests
|
- Cache search results to reduce Flickr requests
|
||||||
|
|
|
||||||
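The AGENTS.md changes above describe two steps that are easy to show in isolation: pulling a Flickr URL out of an UploadWizard edit comment, and normalizing it (dropping protocol, `www.`, and trailing slash) so it can be matched against `sent_messages`. The following is only a sketch under assumptions, not the code from `update_flickr_uploads.py` or `main.py`: the regex and the function names `normalize_url_sketch` and `flickr_url_from_comment` are hypothetical, and the `extmetadata.Credit` fallback is left out.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not taken from the repository):
# matches Flickr photo page URLs such as
# https://www.flickr.com/photos/<user>/<photo_id>/
FLICKR_PHOTO_RE = re.compile(
    r"https?://(?:www\.)?flickr\.com/photos/[^/\s]+/\d+/?", re.IGNORECASE
)


def normalize_url_sketch(url: str) -> str:
    """Strip protocol, leading 'www.' and trailing slash so URLs compare equal."""
    url = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)
    url = re.sub(r"^www\.", "", url, flags=re.IGNORECASE)
    return url.rstrip("/")


def flickr_url_from_comment(comment: str) -> Optional[str]:
    """Return the first Flickr photo URL in an UploadWizard comment, if any."""
    if "UploadWizard" not in comment:
        return None  # not an UploadWizard contribution
    match = FLICKR_PHOTO_RE.search(comment)
    # Older-style comments carry no URL; a caller would then fall back to
    # the Commons extmetadata Credit lookup (not shown here).
    return match.group(0) if match else None
```

Normalizing both sides before comparison means that `https://www.flickr.com/photos/someone/12345/` in a sent message and `http://flickr.com/photos/someone/12345` in an edit comment are treated as the same photo.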
131
README.md
|
|
@ -1,89 +1,88 @@
|
||||||
# Flickr Photo Finder for Wikipedia Articles
|
# Flickr Mail
|
||||||
|
|
||||||
Tool lives here: <https://edwardbetts.com/flickr_mail/>
|
Tool lives here: <https://edwardbetts.com/flickr_mail/>
|
||||||
|
|
||||||
This tool is designed to help you find photos on Flickr for Wikipedia articles
|
Flickr Mail is a Flask app that helps find Flickr photos for Wikipedia articles
|
||||||
and contact the photographer. It's a Python application that leverages the Flask
|
and contact photographers to request Wikipedia-compatible licensing.
|
||||||
framework for web development.
|
|
||||||
|
|
||||||
## Table of Contents
|
## What It Does
|
||||||
- [Introduction](#introduction)
|
|
||||||
- [Usage](#usage)
|
|
||||||
- [Error Handling](#error-handling)
|
|
||||||
- [Running the Application](#running-the-application)
|
|
||||||
|
|
||||||
## Introduction
|
- Searches Flickr from a Wikipedia article title/URL
|
||||||
|
- Shows license status for each result (free vs non-free CC variants)
|
||||||
|
- Builds a ready-to-send Flickr message for non-free licenses
|
||||||
|
- Finds image-less articles in a Wikipedia category
|
||||||
|
- Shows recent Commons uploads that came from Flickr mail outreach
|
||||||
|
|
||||||
This tool is developed and maintained by Edward Betts (edward@4angle.com). Its
|
## Project Layout
|
||||||
primary purpose is to simplify the process of discovering and contacting
|
|
||||||
photographers on Flickr whose photos can be used to enhance Wikipedia articles.
|
|
||||||
|
|
||||||
### Key Features
|
- `main.py`: Flask app routes and core logic
|
||||||
- **Integrated Flickr search**: Enter a Wikipedia article title and see Flickr
|
- `templates/`: UI templates
|
||||||
photos directly in the interface - no need to visit Flickr's search page.
|
- `download_sent_mail.py`: sync Flickr sent messages into DB
|
||||||
- **Photo grid with metadata**: Search results display as a grid of thumbnails
|
- `download_commons_contributions.py`: sync Commons contributions into DB
|
||||||
showing the user's name and license for each photo.
|
- `update_flickr_uploads.py`: derive `flickr_uploads` from contributions/sent mail
|
||||||
- **License handling**: Photos with Wikipedia-compatible licenses (CC BY,
|
- `flickr_mail.db`: SQLite database
|
||||||
CC BY-SA, CC0, Public Domain) are highlighted with a green badge and link
|
|
||||||
directly to the Commons UploadWizard. Non-free CC licenses (NC/ND) show a
|
|
||||||
tailored message explaining Wikipedia's requirements. Supports both CC 2.0
|
|
||||||
and CC 4.0 license codes.
|
|
||||||
- **One-click message composition**: Click any photo to compose a permission
|
|
||||||
request message with the photo displayed alongside, showing the user's Flickr
|
|
||||||
profile and current license.
|
|
||||||
- **Previous message detection**: The message page checks sent mail history and
|
|
||||||
warns if you have previously contacted the user.
|
|
||||||
- **Category search**: Find Wikipedia articles without images in a given
|
|
||||||
category, with links to search Flickr for each article.
|
|
||||||
- **Pagination**: Browse through thousands of search results with page navigation.
|
|
||||||
- **Recent uploads showcase**: The home page displays recent Wikimedia Commons
|
|
||||||
uploads that were obtained via Flickr mail requests, with links to the
|
|
||||||
Wikipedia article and user's Flickr profile.
|
|
||||||
- Handle exceptions gracefully and provide detailed error information.
|
|
||||||
|
|
||||||
## Usage
|
## Database Pipeline
|
||||||
|
|
||||||
To use the tool, follow these steps:
|
The recent uploads section depends on a 3-step pipeline:
|
||||||
|
|
||||||
1. Start the tool by running the script.
|
1. `./download_sent_mail.py` updates `sent_messages`
|
||||||
2. Access the tool through a web browser.
|
2. `./download_commons_contributions.py` updates `contributions`
|
||||||
3. Enter a Wikipedia article title or URL, or use "Find articles by category"
|
3. `./update_flickr_uploads.py` builds/updates `flickr_uploads`
|
||||||
to discover articles that need images.
|
|
||||||
4. Browse the Flickr search results displayed in the interface.
|
|
||||||
5. Click on a photo to select it. If the license is Wikipedia-compatible, you'll
|
|
||||||
be linked to the Commons UploadWizard. Otherwise, a message is composed to
|
|
||||||
request a license change.
|
|
||||||
6. Copy the subject and message, then click "Send message on Flickr" to contact
|
|
||||||
the user.
|
|
||||||
|
|
||||||
## Error Handling
|
`main.py` only reads `flickr_uploads`; it does not populate it.
|
||||||
|
|
||||||
The application includes error handling to ensure a smooth user experience. If
|
## UploadWizard Detection
|
||||||
an error occurs, it will display a detailed error message with traceback
|
|
||||||
information. The error handling is designed to provide valuable insights into
|
|
||||||
any issues that may arise during use.
|
|
||||||
|
|
||||||
## Running the Application
|
`update_flickr_uploads.py` supports both Commons UploadWizard comment styles:
|
||||||
|
|
||||||
To run the application, ensure you have Python 3 installed on your system. You
|
- `User created page with UploadWizard` (older)
|
||||||
will also need to install the required Python modules mentioned in the script,
|
- `Uploaded a work by ... with UploadWizard` (newer)
|
||||||
including Flask, requests, and others.
|
|
||||||
|
|
||||||
1. Clone this repository to your local machine.
|
It first tries to extract a Flickr URL directly from the contribution comment.
|
||||||
2. Navigate to the project directory.
|
If absent, it falls back to Commons `extmetadata.Credit`.
|
||||||
3. Run the following command to start the application:
|
|
||||||
|
## Local Run
|
||||||
|
|
||||||
|
Install dependencies (example):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install flask requests beautifulsoup4 sqlalchemy
|
||||||
|
```
|
||||||
|
|
||||||
|
Start the app:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python3 main.py
|
python3 main.py
|
||||||
```
|
```
|
||||||
|
|
||||||
4. Access the application by opening a web browser and visiting the provided URL
|
Then open:
|
||||||
(usually `http://localhost:5000/`).
|
|
||||||
|
|
||||||
That's it! You can now use the Flickr Photo Finder tool to streamline the
|
- `http://localhost:5000/`
|
||||||
process of finding and contacting photographers for Wikipedia articles.
|
|
||||||
|
|
||||||
If you encounter any issues or have questions, feel free to contact Edward Betts
|
## Refresh Data
|
||||||
(edward@4angle.com).
|
|
||||||
|
|
||||||
Happy photo hunting!
|
Run in this order:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./download_sent_mail.py
|
||||||
|
./download_commons_contributions.py
|
||||||
|
./update_flickr_uploads.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Before running `./download_sent_mail.py`, create local auth config:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cp download_sent_mail.example.json download_sent_mail.local.json
|
||||||
|
```
|
||||||
|
|
||||||
|
Then edit `download_sent_mail.local.json` and set `cookies_str` to your full
|
||||||
|
Flickr `Cookie` header value.
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- `download_commons_contributions.py` uses an overlap window of known-only
|
||||||
|
batches before stopping to avoid full-history scans while still catching
|
||||||
|
shallow gaps.
|
||||||
|
- If a known Commons upload is missing from `flickr_uploads`, re-run the full
|
||||||
|
3-step pipeline above.
|
||||||
|
|
|
||||||
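The README note about the overlap window in `download_commons_contributions.py` can be illustrated with a small stand-alone sketch. It assumes the API returns batches newest-first and that each item has an integer ID; the function name `new_items_with_overlap`, the `overlap_batches` parameter, and its default of 3 are hypothetical and not taken from the script.

```python
from typing import Iterable, Iterator


def new_items_with_overlap(
    batches: Iterable[list[int]],
    known_ids: set[int],
    overlap_batches: int = 3,  # assumed value; the script's setting is not shown
) -> Iterator[int]:
    """Yield unseen IDs, stopping after N consecutive fully-known batches."""
    consecutive_known = 0
    for batch in batches:
        new_ids = [item for item in batch if item not in known_ids]
        if new_ids:
            # A gap was found, so reset the counter and keep scanning.
            consecutive_known = 0
            yield from new_ids
        else:
            consecutive_known += 1
            if consecutive_known >= overlap_batches:
                break  # enough overlap with stored history; stop early
```

Requiring several fully-known batches in a row, rather than stopping at the first one, is what lets a scan pick up an item that was missed a little way back in the history without walking the entire contribution list.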
3
download_sent_mail.example.json
Normal file
|
|
@ -0,0 +1,3 @@
|
||||||
|
{
|
||||||
|
"cookies_str": "paste your full Flickr Cookie header value here"
|
||||||
|
}
|
||||||
|
|
@ -1,7 +1,9 @@
|
||||||
#!/usr/bin/env python3
|
#!/usr/bin/env python3
|
||||||
"""Download sent FlickrMail messages for backup."""
|
"""Download sent FlickrMail messages for backup."""
|
||||||
|
|
||||||
|
import json
|
||||||
import time
|
import time
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
import requests
|
import requests
|
||||||
from bs4 import BeautifulSoup
|
from bs4 import BeautifulSoup
|
||||||
|
|
@ -18,6 +20,8 @@ BASE_URL = "https://www.flickr.com"
|
||||||
SENT_MAIL_URL = f"{BASE_URL}/mail/sent/page{{page}}"
|
SENT_MAIL_URL = f"{BASE_URL}/mail/sent/page{{page}}"
|
||||||
MESSAGE_URL = f"{BASE_URL}/mail/sent/{{message_id}}"
|
MESSAGE_URL = f"{BASE_URL}/mail/sent/{{message_id}}"
|
||||||
MAX_SENT_MAIL_PAGES = 29 # Fallback upper bound if we need to backfill everything
|
MAX_SENT_MAIL_PAGES = 29 # Fallback upper bound if we need to backfill everything
|
||||||
|
CONFIG_FILE = Path(__file__).with_name("download_sent_mail.local.json")
|
||||||
|
EXAMPLE_CONFIG_FILE = Path(__file__).with_name("download_sent_mail.example.json")
|
||||||
|
|
||||||
HEADERS = {
|
HEADERS = {
|
||||||
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0",
|
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0",
|
||||||
|
|
@ -34,7 +38,23 @@ HEADERS = {
|
||||||
"Priority": "u=0, i",
|
"Priority": "u=0, i",
|
||||||
}
|
}
|
||||||
|
|
||||||
COOKIES_STR = """ccc=%7B%22needsConsent%22%3Atrue%2C%22managed%22%3A0%2C%22changed%22%3A0%2C%22info%22%3A%7B%22cookieBlock%22%3A%7B%22level%22%3A2%2C%22blockRan%22%3A1%7D%7D%7D; _sp_ses.df80=*; _sp_id.df80=968931de-089d-4576-b729-6662c2c13a65.1770187027.1.1770187129..adf2374b-b85c-4899-afb7-63c2203d0c44..9422de57-9cdf-49c9-ac54-183eaa1ec457.1770187027101.24; TAsessionID=7f373c97-e9f8-46cb-bc1a-cb4f164ce46b|NEW; notice_behavior=expressed,eu; usprivacy=1---; acstring=3~550.1942.3126.3005.3077.1329.196.1725.1092; euconsent-v2=CQfGXgAQfGXgAAvACDENCQFsAP_gAEPgAAAALktB9G5cSSFBYCJVYbtEYAQDwFhg4oAhAgABEwAATBoAoIwGBGAoIAiAICACAAAAIARAIAEECAAAQAAAIIABAAAMAEAAIAACIAAACAABAgAACEAIAAggWAAAAEBEAFQAgAAAQBIACFAAAgABAUABAAAAAACAAQAAACAgQAAAAAAAAAAAkAhAAAAAAAAAABAMAAABIAAAAAAAAAAAAAAAAAAABAAAAICBAAAAQAAAAAAAAAAAAAAAAAAAAgqY0H0blxJIUFgIFVhu0QgBBPAWADigCEAAAEDAABMGgCgjAIUYCAgSIAgIAAAAAAgBEAgAQAIAABAAAAAgAEAAAwAQAAgAAAAAAAAAAECAAAAQAgACCBYAAAAQEQAVACBAABAEgAIUAAAAAEBQAEAAAAAAIABAAAAICBAAAAAAAAAAACQCEAAAAAAAAAAEAwBAAEgAAAAAAAAAAAAAAAAAAAEABAAgIEAAABAA.YAAAAAAAAAAA.ILktB9G5cSSFBYCJVYbtEYAQTwFhg4oAhAgABEwAATBoAoIwGFGAoIEiAICACAAAAIARAIAEECAAAQAAAIIABAAAMAEAAIAACIAAACAABAgAACEAIAAggWAAAAEBEAFQAgQAAQBIACFAAAgABAUABAAAAAACAAQAAACAgQAAAAAAAAAAAkAhAAAAAAAAAABAMAQABIAAAAAAAAAAAAAAAAAAABAAQAICBAAAAQAAAAAAAAAAAAAAAAAAAAgA; notice_preferences=2:; notice_gdpr_prefs=0,1,2:; cmapi_gtm_bl=; cmapi_cookie_privacy=permit 1,2,3; AMCV_48E815355BFE96970A495CD0%40AdobeOrg=281789898%7CMCMID%7C44859851125632937290373504988866174366%7CMCOPTOUT-1770194232s%7CNONE%7CvVersion%7C4.1.0; AMCVS_48E815355BFE96970A495CD0%40AdobeOrg=1; xb=646693; localization=en-us%3Buk%3Bgb; flrbp=1770187037-cfbf3914859af9ef68992c8389162e65e81c86c4; flrbgrp=1770187037-8e700fa7d73b4f2d43550f40513e7c6f507fd20f; flrbgdrp=1770187037-9af21cc74000b5f3f0943243608b4284d5f60ffd; flrbgmrp=1770187037-53f7bfff110731954be6bdfb2f587d59a8305670; flrbrst=1770187037-440e42fcee9b4e8e81ba8bc3eb3d0fc8b62e7083; 
flrtags=1770187037-7b50035cb956b9216a2f3372f498f7008d8e26a8; flrbrp=1770187037-c0195dc99caa020d4e32b39556131add862f26a0; flrb=34; session_id=2693fb01-87a0-42b1-a426-74642807b534; cookie_session=834645%3A29f2a9722d8bac88553ea1baf7ea11b4; cookie_accid=834645; cookie_epass=29f2a9722d8bac88553ea1baf7ea11b4; sa=1775371036%3A79962317%40N00%3A8fb60f4760b4840f37af3ebc90a8cb57; vp=2075%2C1177%2C1%2C0; flrbfd=1770187037-88a4e436729c9c5551794483fbd9c80e9dac2354; flrbpap=1770187037-18adaacf3a389df4a7bdc05cd471e492c54ef841; liqpw=2075; liqph=672"""
|
def load_cookie_string() -> str:
|
||||||
|
"""Load Flickr cookies string from local JSON config."""
|
||||||
|
if not CONFIG_FILE.exists():
|
||||||
|
raise RuntimeError(
|
||||||
|
f"Missing config file: {CONFIG_FILE}. "
|
||||||
|
f"Copy {EXAMPLE_CONFIG_FILE.name} to {CONFIG_FILE.name} and set cookies_str."
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(CONFIG_FILE.read_text())
|
||||||
|
except json.JSONDecodeError as exc:
|
||||||
|
raise RuntimeError(f"Invalid JSON in {CONFIG_FILE}: {exc}") from exc
|
||||||
|
|
||||||
|
cookie_str = data.get("cookies_str", "").strip()
|
||||||
|
if not cookie_str:
|
||||||
|
raise RuntimeError(f"{CONFIG_FILE} must contain a non-empty 'cookies_str' value")
|
||||||
|
return cookie_str
|
||||||
|
|
||||||
|
|
||||||
def parse_cookies(cookie_str: str) -> dict[str, str]:
|
def parse_cookies(cookie_str: str) -> dict[str, str]:
|
||||||
|
|
@ -51,7 +71,7 @@ def create_session() -> requests.Session:
|
||||||
"""Create a requests session with authentication."""
|
"""Create a requests session with authentication."""
|
||||||
session = requests.Session()
|
session = requests.Session()
|
||||||
session.headers.update(HEADERS)
|
session.headers.update(HEADERS)
|
||||||
session.cookies.update(parse_cookies(COOKIES_STR))
|
session.cookies.update(parse_cookies(load_cookie_string()))
|
||||||
return session
|
return session
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
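The diff shows `load_cookie_string()` but only the signature of `parse_cookies()`. As a rough illustration of how a raw `Cookie` header value from the local JSON config could become the dictionary that `session.cookies.update()` receives, here is a minimal sketch. The names `parse_cookie_header` and `load_cookies` are hypothetical, and the real `parse_cookies()` body may differ.

```python
import json
from pathlib import Path


def parse_cookie_header(cookie_str: str) -> dict[str, str]:
    """Split a 'name=value; name2=value2' Cookie header into a dict."""
    cookies: dict[str, str] = {}
    for part in cookie_str.split(";"):
        # Split on the first '=' only, since cookie values may contain '='.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def load_cookies(config_path: Path) -> dict[str, str]:
    """Read the 'cookies_str' key from a JSON config file and parse it."""
    data = json.loads(config_path.read_text())
    return parse_cookie_header(data.get("cookies_str", ""))
```

Keeping the cookie string in a git-ignored file, as this commit does, means the session credentials no longer travel with the source code.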