Commit graph

6 commits

SHA1 Message Date
fe89db11bd Improve link matching to avoid more classes of bad edits
- Skip parameterless templates (navboxes), and add {{annotated link}},
  {{excerpt}}, {{main}} and {{see}} to the list of skipped parameterised
  templates
- Preserve sentence-initial capitalisation when replacement is lowercase
- Skip matches that sit entirely inside an existing [[link]] destination
- Treat link destinations that start with q as more specific links to
  preserve, in both find_link_in_chunk and find_link_and_section
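Two of the checks listed above can be sketched as follows. This is a hypothetical illustration, not the project's code: `preserve_initial_capital`, `inside_link_destination` and `LINK_RE` are invented names. It assumes a match is "sentence-initial" when it starts the text or follows `.`, `!`, `?` or a newline, and that the matched text itself begins with a capital.

```python
import re

# Hypothetical pattern: group 1 captures the destination of [[dest]] or [[dest|label]].
LINK_RE = re.compile(r"\[\[([^\[\]|]*)(?:\|[^\[\]]*)?\]\]")


def preserve_initial_capital(text: str, start: int, replacement: str) -> str:
    """Uppercase the replacement's first letter if the capitalised match opens a sentence."""
    before = text[:start].rstrip(" ")
    sentence_initial = before == "" or before[-1] in ".!?\n"
    if sentence_initial and text[start:start + 1].isupper() and replacement[:1].islower():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def inside_link_destination(text: str, start: int, end: int) -> bool:
    """True if text[start:end] lies wholly within the destination part of some link."""
    for m in LINK_RE.finditer(text):
        dest_start, dest_end = m.span(1)
        if dest_start <= start and end <= dest_end:
            return True
    return False
```

A match on "surface runoff" inside `[[surface runoff (hydrology)|runoff]]` is rejected because it sits in the destination, while the visible label "runoff" is not.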

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 14:44:28 +01:00
4fe0acc167 Improve link matching to avoid many classes of bad edits
parse_cite: extend to skip {{cite}}/{{citation}}, {{short description}},
{{gli}}, {{defn}}, external links [https://...], italic text ''...'',
and bullet-point lines containing bare URLs (unformatted bibliography
entries). Uses brace-counting to handle nested templates correctly.
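Brace-counting for nested templates can be sketched like this. It is a minimal illustration under assumptions: the helper names are hypothetical, `SKIP_NAMES` covers only part of the list above, and `{{cite}}` is assumed to mean the whole `{{cite ...}}` family (`{{cite web}}`, `{{cite book}}` and so on).

```python
from typing import Iterator, Tuple

# Assumed skip list; names starting with "cite " are handled separately below.
SKIP_NAMES = {"citation", "short description", "gli", "defn"}


def template_spans(text: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of each top-level {{...}} template."""
    depth = 0
    start = 0
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # outermost template begins here
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                yield start, i  # braces balanced again: template closed
        else:
            i += 1


def template_name(template: str) -> str:
    """Template name: text before the first '|', trimmed and lowercased."""
    return template[2:-2].split("|", 1)[0].strip().lower()


def skipped_spans(text: str) -> Iterator[Tuple[int, int]]:
    """Yield spans of templates whose contents link matching should ignore."""
    for start, end in template_spans(text):
        name = template_name(text[start:end])
        if name in SKIP_NAMES or name == "cite" or name.startswith("cite "):
            yield start, end
```

Counting depth rather than searching for the first `}}` keeps `{{cite web|title={{lang|fr|Titre}}}}` as one span instead of ending it at the inner template.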

parse_links: yield [[Category:...]] links as 'category' tokens so they
are never modified.
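A tokenizer that tags category links separately might look like the sketch below. The name `tokenize_links` and the `'text'`/`'link'` token kinds are assumptions made for illustration; only the `'category'` kind comes from the message above.

```python
import re
from typing import Iterator, Tuple

# Hypothetical pattern for a single [[...]] link with no nested brackets.
LINK_RE = re.compile(r"\[\[[^\[\]]*\]\]")


def tokenize_links(text: str) -> Iterator[Tuple[str, str]]:
    """Split wikitext into ('text' | 'link' | 'category', chunk) tokens."""
    pos = 0
    for m in LINK_RE.finditer(text):
        if m.start() > pos:
            yield "text", text[pos:m.start()]
        target = m.group()[2:-2].split("|", 1)[0].strip()
        # Category links get their own kind so callers can leave them untouched.
        kind = "category" if target.lower().startswith("category:") else "link"
        yield kind, m.group()
        pos = m.end()
    if pos < len(text):
        yield "text", text[pos:]
```

Because the tokens concatenate back to the original text, a caller can rewrite only `'text'` chunks and pass `'category'` chunks through unchanged.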

add_link: handle four new boundary cases where the match overlaps an
existing [[link]]:
- match ends exactly at the link boundary: replace the whole thing with
  a single clean link (e.g. surface [[runoff (hydrology)|runoff]] →
  [[surface runoff]])
- match starts right after [[: absorb the stray [[ (e.g.
  [[anti-globalization]] movement → [[anti-globalization movement]])
- match starts partway inside a link: skip (would produce broken wikitext)
- match spans into but not through a link: use a piped prefix link
  (e.g. cross-platform [[interchange station]] →
  [[cross-platform interchange|cross-platform]] [[interchange station]])
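The four cases can be pictured as a classifier over character offsets. This is a hypothetical sketch, not add_link itself. It assumes `ms`/`me` is the match span and `ls`/`le` the existing link span in the wikitext (with `le` just past `]]`), and that the matcher reports a match reaching the end of a link's label as ending at `le`.

```python
from enum import Enum


class Boundary(Enum):
    NO_OVERLAP = "no_overlap"
    REPLACE_WHOLE = "replace_whole"                # match ends exactly at the link boundary
    ABSORB_OPEN_BRACKETS = "absorb_open_brackets"  # match starts right after [[
    SKIP = "skip"                                  # match starts partway inside the link
    PIPED_PREFIX = "piped_prefix"                  # match spans into but not through the link


def classify(ms: int, me: int, ls: int, le: int) -> Boundary:
    """Classify a match span [ms, me) against an existing link span [ls, le)."""
    if me <= ls or ms >= le:
        return Boundary.NO_OVERLAP
    if me == le and ms < ls:
        return Boundary.REPLACE_WHOLE
    if ms == ls + 2 and me > le:
        return Boundary.ABSORB_OPEN_BRACKETS
    if ls < ms < le:
        return Boundary.SKIP
    if ms < ls < me < le:
        return Boundary.PIPED_PREFIX
    # Anything else (e.g. a match that swallows the link and keeps going): be conservative.
    return Boundary.SKIP
```

The order of the checks matters: the "right after `[[`" case must be tested before the general "starts inside the link" case, or it would always be skipped.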

Fallback search: mask [[Category:...]] spans with spaces so the pattern
cannot match inside them. Guard against matches that are part of a
longer named entity (title-case phrase followed by extra words then an
abbreviation in parentheses, e.g. "Anti-Globalization Movement of
Russia (AGMR)").
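Both fallback-search guards can be sketched as below. The names and regexes are hypothetical. Masking replaces each category link with the same number of spaces so that match offsets in the masked text still line up with the original. The entity check assumes "extra words" means one to six words before a parenthesised all-caps abbreviation.

```python
import re

CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:[^\[\]]*\]\]", re.IGNORECASE)
# Assumed shape of the tail: up to six words, then "(ABBR)" with two or more capitals.
ENTITY_TAIL_RE = re.compile(r"(?:\s+[\w-]+){1,6}\s+\([A-Z]{2,}\)")


def mask_categories(text: str) -> str:
    """Blank out [[Category:...]] links while preserving string length."""
    return CATEGORY_RE.sub(lambda m: " " * len(m.group()), text)


def part_of_longer_entity(text: str, start: int, end: int) -> bool:
    """True if a title-case match is followed by more words and an abbreviation."""
    words = text[start:end].split()
    is_title_case = all(w[:1].isupper() for w in words)
    return is_title_case and ENTITY_TAIL_RE.match(text, end) is not None
```

For "The Anti-Globalization Movement of Russia (AGMR) was formed." a match on "Anti-Globalization Movement" is rejected, while the same phrase in "The Anti-Globalization Movement grew." is not.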

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 18:11:23 +01:00
479dc864fd Remove debugging output
2023-12-09 18:43:05 +00:00
14d8539298 Link matching improvements
2023-12-09 18:42:53 +00:00
1da620875a Add type hints and docstrings
2023-12-09 18:42:03 +00:00
f07b407e7a Initial commit
2023-10-04 12:56:21 +01:00