Wikidata:Requests for permissions/Bot/KlaraBot
KlaraBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Iamcarbon (talk • contribs • logs)
Task/s:
Append a human's lifespan to descriptions when they do not already exist.
for example: Lithuanian historian -> Lithuanian historian (1923–2017) [diff]
Code:
Human descriptions will be updated under the following conditions:
- There is no existing lifespan
- A birth and/or death date is available that:
  - has at least one reference
  - is precise to at least the nearest year
  - uses the Gregorian calendar (or the default calendar)
- Adding the lifespan does not introduce ambiguity in relation to reigns, terms in office, or other significant roles
Additional Rules:
- Deprecated statements are excluded
- Preferred values are prioritized over non-preferred ones
- In cases of conflicting claims:
  - if the item has a Wikipedia article that includes the lifespan in the first sentence, it will be used to corroborate an existing claim
  - the item will be logged for publishing in a future public report, so we can manually improve the claims
- Data from unreliable sources or individuals born before 1700 may be ignored
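As a rough illustration of the conditions above, here is a minimal Python sketch. It is not the bot's actual code: `DateClaim`, `pick_year`, and `append_lifespan` are hypothetical names, and the "(born …)" / "(died …)" suffixes for a single known date are an assumption. The Wikipedia corroboration step, the period-of-significance check, and the pre-1700 rule are left out; conflicting claims are simply skipped.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

GREGORIAN = "Q1985727"   # Wikidata item for the proleptic Gregorian calendar
YEAR_PRECISION = 9       # Wikidata time precision: 9 = year, 10 = month, 11 = day
EN_DASH = "\u2013"
# Any parenthesised 3-4 digit number is treated as an existing lifespan (assumption).
EXISTING_DATE = re.compile(r"\([^)]*\d{3,4}[^)]*\)")


@dataclass
class DateClaim:
    """Hypothetical, simplified view of a birth or death date statement."""
    year: int
    precision: int
    reference_count: int = 0
    rank: str = "normal"          # "preferred", "normal", or "deprecated"
    calendar: str = GREGORIAN


def pick_year(claims: List[DateClaim]) -> Optional[int]:
    """Return the single agreed year, or None if nothing usable or claims conflict."""
    usable = [
        c for c in claims
        if c.rank != "deprecated"
        and c.reference_count >= 1
        and c.precision >= YEAR_PRECISION
        and c.calendar == GREGORIAN
    ]
    preferred = [c for c in usable if c.rank == "preferred"]
    years = {c.year for c in (preferred or usable)}
    return years.pop() if len(years) == 1 else None


def append_lifespan(description: str,
                    births: List[DateClaim],
                    deaths: List[DateClaim]) -> Optional[str]:
    """Return the updated description, or None when the item should be skipped."""
    if EXISTING_DATE.search(description):
        return None
    birth, death = pick_year(births), pick_year(deaths)
    # Claims exist but could not be resolved: skip rather than guess.
    if (births and birth is None) or (deaths and death is None):
        return None
    if birth is not None and death is not None:
        suffix = f"({birth}{EN_DASH}{death})"
    elif birth is not None:
        suffix = f"(born {birth})"
    elif death is not None:
        suffix = f"(died {death})"
    else:
        return None
    return f"{description} {suffix}"
```

Called with one referenced birth claim for 1923 and one referenced death claim for 2017, `append_lifespan("Lithuanian historian", ...)` yields the description from the example at the top of this request.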
Function details: --Iamcarbon (talk) 00:17, 8 December 2024 (UTC)
- Comment: On enwiki, such lifespan edits are also made to short descriptions. I would consider them "not bot-like". Side note @Iamcarbon: you got shock-blocked because you manually removed labels where a mul label existed. Estopedist1 (talk) 08:19, 8 December 2024 (UTC)
- Bot description looks fine for the most part. I think AI assisted unattended edits are a bad idea in general, I would much prefer that in cases of multiple or conflicting claims it logs the claim and writes out a publicly available report on them at some developer-determined interval. These cases ought to be handled manually by a human. Edit: Or alternatively, depending on the amount, just ignore the ambiguous cases. Infrastruktur (talk) 17:53, 8 December 2024 (UTC)
- Agreed. I have replaced the LLM with a Wikipedia parser to disambiguate conflicting claims, and added an SQLite database to log these items so they can be reported in the future. If we can't corroborate the year with Wikipedia, the item will be skipped. Iamcarbon (talk) 20:47, 8 December 2024 (UTC)
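For readers curious what such a log could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the names `open_log` and `resolve_year` are assumptions made for illustration, not the bot's actual schema; `wikipedia_year` stands in for whatever the Wikipedia parser extracts from the first sentence.

```python
import sqlite3
from typing import Iterable, Optional


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the conflict log. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS conflicts (
               item_id        TEXT NOT NULL,
               property       TEXT NOT NULL,   -- e.g. P569 (birth) or P570 (death)
               claimed_years  TEXT NOT NULL,   -- comma-separated distinct years
               wikipedia_year INTEGER,         -- NULL when nothing was parsed
               logged_at      TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def resolve_year(conn: sqlite3.Connection, item_id: str, prop: str,
                 claimed_years: Iterable[int],
                 wikipedia_year: Optional[int]) -> Optional[int]:
    """Return a year to use, or None to skip. Every conflict is logged."""
    distinct = sorted(set(claimed_years))
    if not distinct:
        return None
    if len(distinct) == 1:
        return distinct[0]
    conn.execute(
        "INSERT INTO conflicts (item_id, property, claimed_years, wikipedia_year) "
        "VALUES (?, ?, ?, ?)",
        (item_id, prop, ",".join(str(y) for y in distinct), wikipedia_year),
    )
    conn.commit()
    # Wikipedia only corroborates an existing claim; it never supplies a new value.
    return wikipedia_year if wikipedia_year in distinct else None
```

In this sketch the Wikipedia year is used only when it matches one of the existing claims; otherwise the item is skipped but still recorded for the report.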
- Comment
- I’ve also updated the rules so that when a death date is added to an item that currently has a birth date only, the description is updated to include the full lifespan.
- Examples:
- Uruguay professional football player (born 7 February 1911) → Uruguay professional football player (1911–1986) (diff)
- Bishop of Guildford (born 1929) → Bishop of Guildford (1929–2024) (diff)
- Iamcarbon (talk) 07:30, 17 December 2024 (UTC)
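The two examples above could be handled along these lines. This is a hedged sketch only: the `BORN_ONLY` pattern and the function name `upgrade_born_to_lifespan` are hypothetical, and the pattern only covers the "born YEAR" and "born DAY MONTH YEAR" forms shown in the examples.

```python
import re
from typing import Optional

# Matches a trailing "(born 1929)" or "(born 7 February 1911)" (assumed forms only).
BORN_ONLY = re.compile(r"\s*\(born\s+(?:\d{1,2}\s+\w+\s+)?(\d{3,4})\)\s*$")


def upgrade_born_to_lifespan(description: str, death_year: int) -> Optional[str]:
    """Replace a trailing '(born ...)' with '(birth–death)'; None if not applicable."""
    match = BORN_ONLY.search(description)
    if match is None:
        return None
    birth_year = int(match.group(1))
    if death_year < birth_year:
        return None  # inconsistent data: leave the description alone
    base = description[:match.start()]
    return f"{base} ({birth_year}\u2013{death_year})"
```

Descriptions without a trailing "(born …)" return `None`, so they fall through to the normal append rules.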
- Support A well-defined task with a limited scope, and a typical bot task because doing it manually would require too much effort. I assume this task is only being applied to English descriptions? How many edits do you expect for this task? Normally, providing 50 example edits or so is suggested, but given the straightforward nature of this task, it seems quite clear. Difool (talk) 11:37, 18 December 2024 (UTC)
- I've gone ahead and made 50 example edits: Special:Contributions/KlaraBot. I would expect around 100K initial edits for this task, and around 10K additional edits per month as dates of death are added. The initial plan is to limit the task to en-only descriptions, but this may be extended to other languages in the future. Iamcarbon (talk) 20:27, 18 December 2024 (UTC)
- Oppose As far as I remember there is a guideline that no birth or death should be added in the description if there is no risk of confusion with another person. -- Plexci (talk) 11:55, 18 December 2024 (UTC)
- Hi @Plexci. Thanks for the feedback! Do we know if these are still our current guidelines?
- These lifespans are being added fairly consistently across all popular items, across all languages - which requires a lot of editor time. Check the history of Q11043959 for example.
- For some personal context: I typically dedicate at least an hour each month to verifying human suggestions when associating people with various items (e.g., written works, artists, paintings). Lifespan information is one of the most critical attributes for ensuring confidence in a match. When this data is missing from a description, it requires additional time to open the item, locate birth and death dates, and cross-reference them with the source. This is particularly tedious when trying to find an author for a written work that does not yet exist, as it often requires that you check ALL suggestions that cannot be ruled out by their lifespan.
- Interested in hearing your thoughts. Iamcarbon (talk) 20:00, 18 December 2024 (UTC)
- Hi @Plexci. I would also consider taking a look at these proposed sample edits: Special:Contributions/KlaraBot, and looking up a few random names through search to see how the lifespan helps differentiate them from one another. I agree that the lifespan is unnecessary on some well known individuals, but believe these are the exception rather than the common case. Iamcarbon (talk) 20:37, 18 December 2024 (UTC)
- One final note: this bot also includes rules to avoid modifying humans whose descriptions include a period of significance. I currently match this using a regex to find terms in office, reigns, and other notable periods. This excludes many well known individuals.
- For instance: a lifespan would not be added to Q23 George Washington or Q9682 Elizabeth II.
- Most other well known individuals, like Q42775 Johnny Cash already have lifespans, which would be skipped. Iamcarbon (talk) 20:57, 18 December 2024 (UTC)
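The exact regex is not shown here, so the following is only an illustrative guess at what a period-of-significance check might look like. The pattern `PERIOD` and the sample descriptions in the usage note are assumptions, not the bot's real rules or the actual descriptions of any item.

```python
import re

# Hypothetical pattern: "from 1789", "since 1952", "reigned 1558", "r. 1558",
# or any year range such as "1952–2022", "1952-2022", "1789 to 1797".
PERIOD = re.compile(
    r"\b(?:from|since|reigned|r\.)\s+\d{3,4}\b"
    r"|\b\d{3,4}\s*(?:[\u2013-]|to)\s*\d{3,4}\b",
    re.IGNORECASE,
)


def has_period_of_significance(description: str) -> bool:
    """True when the description already mentions a dated period (skip the item)."""
    return PERIOD.search(description) is not None
```

For example, a description such as "president of the United States from 1789 to 1797" would be skipped, while "Lithuanian historian" would not. A year range in parentheses also matches, which is harmless because descriptions with an existing lifespan are skipped anyway.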
- You don't have a bot flag and let it run nonetheless. As seen on your talk page, you ignore everything when asked to stop actions that aren't confirmed. That does not make a trustworthy impression and it seems that this proposal is a pretext to use Wikidata as a sandbox. --Plexci (talk) 22:01, 18 December 2024 (UTC)
- Hi @Plexci. Some of the bulk edits from Iamcarbon were made in a grey area, where semi-automated editing can be permissible - but is not a good practice. This is precisely why I am proposing a bot flag - to ensure this work is done transparently, with clear boundaries, and with the agreement of the community.
- My prior actions that got me blocked were edits related to the default labels and aliases (a WMDE project aimed at reducing redundancy across the project and addressing scalability concerns) - which remains a controversial project with various issues. My edits relating to this work (done in batches) were made across three item types (given names, family names, and humans) to help identify any issues before a bot task was proposed and any mass removals took place. The first edits to human and given names turned out to be controversial, as the alias removals can change the search result order in certain languages. The second set of edits to humans was intended to uncover any issues with removing duplicated aliases on humans. I have no plans to propose or operate any of these tasks under this bot until any unresolved issues with the project are addressed.
- While I may have been too bold in making some of my prior edits, I have been responsive and engaged on any issues brought to my attention. Iamcarbon (talk) 00:08, 19 December 2024 (UTC)
- Question 1) In which language or languages will the bot edit descriptions? 2) You write "when they can be authoritatively sourced", and yet give an example where the date of birth is not appropriately sourced, but only has a value imported from a Wikipedia. It doesn't make sense. How will you ensure that the data has reliable sources? --Dipsacus fullonum (talk) 14:39, 18 December 2024 (UTC)
- 1) The current rule is to ensure that there is at least one reference for the birth and death dates and that the source has not been found to be unreliable. The bot also cross-checks the dates against the Wikipedia article (when one exists) when there is ambiguity, and logs this case. Fewer than 10% of humans match these rules.
- 2) The logic currently applies to en-only descriptions, but may be extended to other languages (matching their formatting rules). Iamcarbon (talk) 19:46, 18 December 2024 (UTC)
- One more comment here: we also have thousands of existing lifespans included in descriptions that do not match our statements or the Wikipedia descriptions. As part of this work, I am also logging these discrepancies, and plan to publish a public report that we can use to help improve these items - by sourcing better statements, or prioritizing the best ones. Iamcarbon (talk) 20:09, 18 December 2024 (UTC)
- Oppose - due to unsatisfactory answers to questions. Wikipedia is an unreliable source, but a Wikipedia is used as a source in the only example given despite assurances not to use unreliable sources. --Dipsacus fullonum (talk) 20:50, 18 December 2024 (UTC)
- Hi @Dipsacus fullonum. I have modified the bot's description to remove "Authoritatively sourced", and agree that Wikipedia is not reliable. However, I believe that Wikipedia can still be considered generally reliable, and is often our best source of information as the project continues to improve and gain additional references to more authoritative sources.
- That said, I am confident that these edits will be no less accurate than our manual contributions (often from new users), and will help free up our contributors' time from tedious work (including my own).
- Given the updated bot description, I am interested to hear your updated feedback and whether you still oppose this task. Iamcarbon (talk) 21:37, 18 December 2024 (UTC)
- I cannot support the use of Wikipedia as a source or the use of data from Wikidata statements without a reliable source (including statements imported from Wikipedia). Data without reliable sources should not be repeated elsewhere. Therefore, I remain opposed to this request. Dipsacus fullonum (talk) 02:37, 19 December 2024 (UTC)
- Thank you for the feedback, everyone. Since we didn’t reach a consensus on this bot request, I am withdrawing this proposal.
- As I invested a good deal of time in this work and uncovered many data discrepancies and data quality issues during the process, I plan to share these findings in a public report.
- Since I’m unable to submit a new task under KlaraBot, I’ll also postpone proposing any additional data maintenance tasks for now. Iamcarbon (talk) 04:06, 19 December 2024 (UTC)