Move Flickr sent-mail cookies into local config file

2026-02-07 14:41:41 +00:00
commit 252a854e76
parent 2819652afd
5 changed files with 127 additions and 79 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,4 @@ __pycache__
 commons_contributions/thumbnail_cache.json
 commons_contributions/sent_mail_index.json
 flickr_mail.db
 download_sent_mail.local.json
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -85,16 +85,18 @@ for the Flickr mail URL. Scrapes the user's profile page for embedded params.
 Shows recent Wikimedia Commons uploads on the home page, filtered to only
 those obtained via Flickr mail requests.
-**Data files** (in `commons_contributions/`):
+**Database tables used by the app**:
- `flickr_uploads.json`: List of Commons uploads from Flickr with metadata
+- `sent_messages`: downloaded from Flickr sent mail, includes extracted Flickr
- `thumbnail_cache.json`: Cached Commons API thumbnail URLs (7-day TTL)
+  URL and Wikipedia URL from message body
- `sent_mail_index.json`: Index of sent mail messages (flickr_url → wikipedia_url)
+- `contributions`: downloaded from Commons `usercontribs`
 - `flickr_uploads`: derived table built by `update_flickr_uploads.py` by
  matching Commons uploads to Flickr URLs
 - `thumbnail_cache`: cached Commons API thumbnail URLs (7-day TTL)
 **Key functions**:
 - `build_sent_mail_index()`: Parses sent mail JSON files, extracts Flickr and
  Wikipedia URLs from message bodies, caches the index
 - `get_recent_commons_uploads()`: Loads uploads, filters by sent mail match,
-  fetches thumbnails from Commons API
+  joins `flickr_uploads` with `sent_messages`, and fetches thumbnails from
  Commons API
 - `normalize_flickr_url()`: Normalizes URLs for matching (removes protocol, www, trailing slash)
 **CommonsUpload dataclass**:
@@ -104,9 +106,14 @@ those obtained via Flickr mail requests.
 - `wiki_link_url`, `wiki_link_label`: Handles Wikidata vs Wikipedia links
 **Maintenance script** (`update_flickr_uploads.py`):
-Run to find Flickr uploads from UploadWizard contributions that don't have
+Builds/updates `flickr_uploads` from `contributions` and links to
-the Flickr URL in the edit comment. Queries Commons API for image metadata
+`sent_messages`.
-and checks the Credit field for Flickr URLs.
+- Scans file contributions containing `UploadWizard` in the comment
 - Supports both comment styles:
  - `User created page with UploadWizard` (older)
  - `Uploaded a work by ... with UploadWizard` (newer; often includes URL)
 - Extracts Flickr URL from contribution comment when present
 - Falls back to Commons `extmetadata.Credit` lookup when comment has no URL
 ### Category Search (`/category` route)
@@ -125,7 +132,7 @@ to allow back-navigation to the category.
 ### Previous Message Detection (`get_previous_messages`)
-Checks `sent_mail/messages_index.json` for previous messages to a Flickr user.
+Checks the `sent_messages` database table for previous messages to a Flickr user.
 Matches by both display name and username (case-insensitive). Results shown as
 an info alert on the message page.
@@ -159,6 +166,24 @@ print(f"{len(result.photos)} photos, {result.total_pages} pages")
 print(result.photos[0].title, result.photos[0].license_name)
 ```
 ## Data Sync Workflow
 To refresh "recent Commons uploads obtained via Flickr mail", run scripts in
 this order:
 1. `./download_sent_mail.py`
 2. `./download_commons_contributions.py`
 3. `./update_flickr_uploads.py`
 Notes:
 - `download_sent_mail.py` reads Flickr auth cookies from
  `download_sent_mail.local.json` (`cookies_str` key). Copy
  `download_sent_mail.example.json` to create local config.
 - `main.py` does not populate `flickr_uploads`; it only reads from it.
 - `download_commons_contributions.py` intentionally stops after several
  consecutive fully-known API batches (overlap window) to avoid full-history
  scans while still catching shallow gaps.
 ## Potential Improvements
 - Cache search results to reduce Flickr requests
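The AGENTS.md notes in this file describe matching Commons uploads to sent messages by Flickr URL: the URL is taken from the UploadWizard comment when present, then normalized (protocol, `www`, and trailing slash removed) before comparison. A minimal sketch of that idea follows. The helper names `extract_flickr_url` and `normalize_for_matching` and the regex are assumptions made for illustration; the repository's own `normalize_flickr_url()` is not shown in this diff and may behave differently.

```python
import re
from typing import Optional

# Assumption: photo URLs look like https://www.flickr.com/photos/<user>/<id>/
FLICKR_URL_RE = re.compile(r"https?://(?:www\.)?flickr\.com/photos/[^\s\[\]|<>]+")


def extract_flickr_url(comment: str) -> Optional[str]:
    """Return the first Flickr photo URL in a contribution comment, if any."""
    match = FLICKR_URL_RE.search(comment)
    return match.group(0) if match else None


def normalize_for_matching(url: str) -> str:
    """Strip protocol, leading 'www.' and trailing slash so URLs compare equal."""
    url = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)
    url = re.sub(r"^www\.", "", url, flags=re.IGNORECASE)
    return url.rstrip("/")


# Newer-style comment that embeds the URL (hypothetical example text).
comment = (
    "Uploaded a work by Jane from "
    "https://www.flickr.com/photos/jane/12345/ with UploadWizard"
)
found = extract_flickr_url(comment)
normalized = normalize_for_matching(found) if found else None
```

When `extract_flickr_url` returns `None`, as it would for the older `User created page with UploadWizard` comment, the documented fallback is a Commons `extmetadata.Credit` lookup, which is not sketched here.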
--- a/README.md
+++ b/README.md
@@ -1,89 +1,88 @@
-# Flickr Photo Finder for Wikipedia Articles
+# Flickr Mail
 Tool lives here: <https://edwardbetts.com/flickr_mail/>
-This tool is designed to help you find photos on Flickr for Wikipedia articles
+Flickr Mail is a Flask app that helps find Flickr photos for Wikipedia articles
-and contact the photographer. It's a Python application that leverages the Flask
+and contact photographers to request Wikipedia-compatible licensing.
 framework for web development.
-## Table of Contents
+## What It Does
 - [Introduction](#introduction)
 - [Usage](#usage)
 - [Error Handling](#error-handling)
 - [Running the Application](#running-the-application)
-## Introduction
+- Searches Flickr from a Wikipedia article title/URL
 - Shows license status for each result (free vs non-free CC variants)
 - Builds a ready-to-send Flickr message for non-free licenses
 - Finds image-less articles in a Wikipedia category
 - Shows recent Commons uploads that came from Flickr mail outreach
-This tool is developed and maintained by Edward Betts (edward@4angle.com). Its
+## Project Layout
 primary purpose is to simplify the process of discovering and contacting
 photographers on Flickr whose photos can be used to enhance Wikipedia articles.
-### Key Features
+- `main.py`: Flask app routes and core logic
- **Integrated Flickr search**: Enter a Wikipedia article title and see Flickr
+- `templates/`: UI templates
-  photos directly in the interface - no need to visit Flickr's search page.
+- `download_sent_mail.py`: sync Flickr sent messages into DB
- **Photo grid with metadata**: Search results display as a grid of thumbnails
+- `download_commons_contributions.py`: sync Commons contributions into DB
-  showing the user's name and license for each photo.
+- `update_flickr_uploads.py`: derive `flickr_uploads` from contributions/sent mail
- **License handling**: Photos with Wikipedia-compatible licenses (CC BY,
+- `flickr_mail.db`: SQLite database
  CC BY-SA, CC0, Public Domain) are highlighted with a green badge and link
  directly to the Commons UploadWizard. Non-free CC licenses (NC/ND) show a
  tailored message explaining Wikipedia's requirements. Supports both CC 2.0
  and CC 4.0 license codes.
 - **One-click message composition**: Click any photo to compose a permission
  request message with the photo displayed alongside, showing the user's Flickr
  profile and current license.
 - **Previous message detection**: The message page checks sent mail history and
  warns if you have previously contacted the user.
 - **Category search**: Find Wikipedia articles without images in a given
  category, with links to search Flickr for each article.
 - **Pagination**: Browse through thousands of search results with page navigation.
 - **Recent uploads showcase**: The home page displays recent Wikimedia Commons
  uploads that were obtained via Flickr mail requests, with links to the
  Wikipedia article and user's Flickr profile.
 - Handle exceptions gracefully and provide detailed error information.
-## Usage
+## Database Pipeline
-To use the tool, follow these steps:
+The recent uploads section depends on a 3-step pipeline:
-1. Start the tool by running the script.
+1. `./download_sent_mail.py` updates `sent_messages`
-2. Access the tool through a web browser.
+2. `./download_commons_contributions.py` updates `contributions`
-3. Enter a Wikipedia article title or URL, or use "Find articles by category"
+3. `./update_flickr_uploads.py` builds/updates `flickr_uploads`
   to discover articles that need images.
 4. Browse the Flickr search results displayed in the interface.
 5. Click on a photo to select it. If the license is Wikipedia-compatible, you'll
   be linked to the Commons UploadWizard. Otherwise, a message is composed to
   request a license change.
 6. Copy the subject and message, then click "Send message on Flickr" to contact
   the user.
-## Error Handling
+`main.py` only reads `flickr_uploads`; it does not populate it.
-The application includes error handling to ensure a smooth user experience. If
+## UploadWizard Detection
 an error occurs, it will display a detailed error message with traceback
 information. The error handling is designed to provide valuable insights into
 any issues that may arise during use.
-## Running the Application
+`update_flickr_uploads.py` supports both Commons UploadWizard comment styles:
-To run the application, ensure you have Python 3 installed on your system. You
+- `User created page with UploadWizard` (older)
-will also need to install the required Python modules mentioned in the script,
+- `Uploaded a work by ... with UploadWizard` (newer)
 including Flask, requests, and others.
-1. Clone this repository to your local machine.
+It first tries to extract a Flickr URL directly from the contribution comment.
-2. Navigate to the project directory.
+If absent, it falls back to Commons `extmetadata.Credit`.
-3. Run the following command to start the application:
+
 ## Local Run
 Install dependencies (example):
 ```bash
 pip install flask requests beautifulsoup4 sqlalchemy
 ```
 Start the app:
 ```bash
 python3 main.py
 ```
-4. Access the application by opening a web browser and visiting the provided URL
+Then open:
   (usually `http://localhost:5000/`).
-That's it! You can now use the Flickr Photo Finder tool to streamline the
+- `http://localhost:5000/`
 process of finding and contacting photographers for Wikipedia articles.
-If you encounter any issues or have questions, feel free to contact Edward Betts
+## Refresh Data
 (edward@4angle.com).
-Happy photo hunting!
+Run in this order:
 ```bash
 ./download_sent_mail.py
 ./download_commons_contributions.py
 ./update_flickr_uploads.py
 ```
 Before running `./download_sent_mail.py`, create local auth config:
 ```bash
 cp download_sent_mail.example.json download_sent_mail.local.json
 ```
 Then edit `download_sent_mail.local.json` and set `cookies_str` to your full
 Flickr `Cookie` header value.
 ## Notes
 - `download_commons_contributions.py` uses an overlap window of known-only
  batches before stopping to avoid full-history scans while still catching
  shallow gaps.
 - If a known Commons upload is missing from `flickr_uploads`, re-run the full
  3-step pipeline above.
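The README notes describe `download_commons_contributions.py` stopping after an overlap window of known-only batches. A rough sketch of that stopping rule is below. The function name `new_items_until_overlap`, the integer IDs, and the default threshold are all assumptions for illustration; the real script's batch shape and threshold are not shown in this diff.

```python
from typing import Iterable, Iterator


def new_items_until_overlap(
    batches: Iterable[list[int]],
    known_ids: set[int],
    overlap_batches: int = 3,  # assumed threshold; the real value is not shown
) -> Iterator[int]:
    """Yield unseen IDs, stopping after N consecutive fully-known batches."""
    consecutive_known = 0
    for batch in batches:
        unseen = [item_id for item_id in batch if item_id not in known_ids]
        if unseen:
            consecutive_known = 0  # a gap was found, so reset the window
            yield from unseen
        else:
            consecutive_known += 1
            if consecutive_known >= overlap_batches:
                break  # deep enough into known history; stop scanning


# Batches arrive newest first; ID 6 is a shallow gap in otherwise known history.
batches = [[10, 9], [8, 7], [6, 5], [4, 3], [2, 1], [100]]
known = {9, 8, 7, 5, 4, 3, 2, 1}
fetched = list(new_items_until_overlap(batches, known, overlap_batches=2))
```

Resetting the counter whenever a batch contains something new is what lets the scan pick up a shallow gap without walking the full history. A gap deeper than the window would still be missed.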
--- a/download_sent_mail.example.json
+++ b/download_sent_mail.example.json
@@ -0,0 +1,3 @@
 {
  "cookies_str": "paste your full Flickr Cookie header value here"
 }
--- a/download_sent_mail.py
+++ b/download_sent_mail.py
@@ -1,7 +1,9 @@
 #!/usr/bin/env python3
 """Download sent FlickrMail messages for backup."""
 import json
 import time
 from pathlib import Path
 import requests
 from bs4 import BeautifulSoup
@@ -18,6 +20,8 @@ BASE_URL = "https://www.flickr.com"
 SENT_MAIL_URL = f"{BASE_URL}/mail/sent/page{{page}}"
 MESSAGE_URL = f"{BASE_URL}/mail/sent/{{message_id}}"
 MAX_SENT_MAIL_PAGES = 29  # Fallback upper bound if we need to backfill everything
 CONFIG_FILE = Path(__file__).with_name("download_sent_mail.local.json")
 EXAMPLE_CONFIG_FILE = Path(__file__).with_name("download_sent_mail.example.json")
 HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0",
@@ -34,7 +38,23 @@ HEADERS = {
    "Priority": "u=0, i",
 }
-COOKIES_STR = """ccc=%7B%22needsConsent%22%3Atrue%2C%22managed%22%3A0%2C%22changed%22%3A0%2C%22info%22%3A%7B%22cookieBlock%22%3A%7B%22level%22%3A2%2C%22blockRan%22%3A1%7D%7D%7D; _sp_ses.df80=*; _sp_id.df80=968931de-089d-4576-b729-6662c2c13a65.1770187027.1.1770187129..adf2374b-b85c-4899-afb7-63c2203d0c44..9422de57-9cdf-49c9-ac54-183eaa1ec457.1770187027101.24; TAsessionID=7f373c97-e9f8-46cb-bc1a-cb4f164ce46b|NEW; notice_behavior=expressed,eu; usprivacy=1---; acstring=3~550.1942.3126.3005.3077.1329.196.1725.1092; euconsent-v2=CQfGXgAQfGXgAAvACDENCQFsAP_gAEPgAAAALktB9G5cSSFBYCJVYbtEYAQDwFhg4oAhAgABEwAATBoAoIwGBGAoIAiAICACAAAAIARAIAEECAAAQAAAIIABAAAMAEAAIAACIAAACAABAgAACEAIAAggWAAAAEBEAFQAgAAAQBIACFAAAgABAUABAAAAAACAAQAAACAgQAAAAAAAAAAAkAhAAAAAAAAAABAMAAABIAAAAAAAAAAAAAAAAAAABAAAAICBAAAAQAAAAAAAAAAAAAAAAAAAAgqY0H0blxJIUFgIFVhu0QgBBPAWADigCEAAAEDAABMGgCgjAIUYCAgSIAgIAAAAAAgBEAgAQAIAABAAAAAgAEAAAwAQAAgAAAAAAAAAAECAAAAQAgACCBYAAAAQEQAVACBAABAEgAIUAAAAAEBQAEAAAAAAIABAAAAICBAAAAAAAAAAACQCEAAAAAAAAAAEAwBAAEgAAAAAAAAAAAAAAAAAAAEABAAgIEAAABAA.YAAAAAAAAAAA.ILktB9G5cSSFBYCJVYbtEYAQTwFhg4oAhAgABEwAATBoAoIwGFGAoIEiAICACAAAAIARAIAEECAAAQAAAIIABAAAMAEAAIAACIAAACAABAgAACEAIAAggWAAAAEBEAFQAgQAAQBIACFAAAgABAUABAAAAAACAAQAAACAgQAAAAAAAAAAAkAhAAAAAAAAAABAMAQABIAAAAAAAAAAAAAAAAAAABAAQAICBAAAAQAAAAAAAAAAAAAAAAAAAAgA; notice_preferences=2:; notice_gdpr_prefs=0,1,2:; cmapi_gtm_bl=; cmapi_cookie_privacy=permit 1,2,3; AMCV_48E815355BFE96970A495CD0%40AdobeOrg=281789898%7CMCMID%7C44859851125632937290373504988866174366%7CMCOPTOUT-1770194232s%7CNONE%7CvVersion%7C4.1.0; AMCVS_48E815355BFE96970A495CD0%40AdobeOrg=1; xb=646693; localization=en-us%3Buk%3Bgb; flrbp=1770187037-cfbf3914859af9ef68992c8389162e65e81c86c4; flrbgrp=1770187037-8e700fa7d73b4f2d43550f40513e7c6f507fd20f; flrbgdrp=1770187037-9af21cc74000b5f3f0943243608b4284d5f60ffd; flrbgmrp=1770187037-53f7bfff110731954be6bdfb2f587d59a8305670; flrbrst=1770187037-440e42fcee9b4e8e81ba8bc3eb3d0fc8b62e7083; 
flrtags=1770187037-7b50035cb956b9216a2f3372f498f7008d8e26a8; flrbrp=1770187037-c0195dc99caa020d4e32b39556131add862f26a0; flrb=34; session_id=2693fb01-87a0-42b1-a426-74642807b534; cookie_session=834645%3A29f2a9722d8bac88553ea1baf7ea11b4; cookie_accid=834645; cookie_epass=29f2a9722d8bac88553ea1baf7ea11b4; sa=1775371036%3A79962317%40N00%3A8fb60f4760b4840f37af3ebc90a8cb57; vp=2075%2C1177%2C1%2C0; flrbfd=1770187037-88a4e436729c9c5551794483fbd9c80e9dac2354; flrbpap=1770187037-18adaacf3a389df4a7bdc05cd471e492c54ef841; liqpw=2075; liqph=672"""
+def load_cookie_string() -> str:
    """Load Flickr cookies string from local JSON config."""
    if not CONFIG_FILE.exists():
        raise RuntimeError(
            f"Missing config file: {CONFIG_FILE}. "
            f"Copy {EXAMPLE_CONFIG_FILE.name} to {CONFIG_FILE.name} and set cookies_str."
        )
    try:
        data = json.loads(CONFIG_FILE.read_text())
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Invalid JSON in {CONFIG_FILE}: {exc}") from exc
    cookie_str = data.get("cookies_str", "").strip()
    if not cookie_str:
        raise RuntimeError(f"{CONFIG_FILE} must contain a non-empty 'cookies_str' value")
    return cookie_str
 def parse_cookies(cookie_str: str) -> dict[str, str]:
@@ -51,7 +71,7 @@ def create_session() -> requests.Session:
    """Create a requests session with authentication."""
    session = requests.Session()
    session.headers.update(HEADERS)
-    session.cookies.update(parse_cookies(COOKIES_STR))
+    session.cookies.update(parse_cookies(load_cookie_string()))
    return session
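The diff shows the signature of `parse_cookies()` but not its body. The sketch below shows how a `cookies_str` value might be turned into the dict that `session.cookies.update()` accepts, assuming the config holds a standard `name=value; name=value` Cookie header. The name `parse_cookie_header` is hypothetical and this is not the repository's implementation.

```python
def parse_cookie_header(cookie_str: str) -> dict[str, str]:
    """Split a 'name=value; name=value' Cookie header into a dict."""
    cookies: dict[str, str] = {}
    for part in cookie_str.split(";"):
        # partition() splits on the first '=' only, so values may contain '='.
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty fragments and parts without '='
            cookies[name] = value
    return cookies


# Placeholder values only; real cookies belong in download_sent_mail.local.json.
example = parse_cookie_header("xb=123; localization=en-us%3Buk; ; token=a=b")
```

Keeping the cookie values in the git-ignored local file, rather than in a module-level constant, means they are read at run time and stay out of version control.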