Wikidata:Property proposal/Australian Medical Pioneers Index

Australian Medical Pioneers Index

Originally proposed at Wikidata:Property proposal/Person

Description: A database of colonial doctors to 1875
Represents: Australian Medical Pioneers Index (Q108397844)
Data type: External identifier
Domain: people
Allowed values: [1-9]\d*
Example 1: William Bland (Q7636853) → 2254
Example 2: Isaac Scott Nind (Q6077089) → 1447
Example 3: George Hogarth Pringle (Q16065529) → 1000
Source: http://www.medicalpioneers.com/
External links: Use in sister projects: [ar][de][en][es][fr][he][it][ja][ko][nl][pl][pt][ru][sv][vi][zh][commons][species][wd][en.wikt][fr.wikt].
Number of IDs in source: 4500+
Expected completeness: eventually complete (Q21873974)
Formatter URL: http://www.medicalpioneers.com/cgi-bin/index.cgi?detail=1&id=$1
Applicable "stated in"-value: Australian Medical Pioneers Index (Q108397844)
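
As a rough illustration of how the allowed-values pattern and the formatter URL above fit together, here is a minimal Python sketch. The pattern and URL template are copied from the proposal. The function names are hypothetical, and the sketch assumes that the pattern must match the whole identifier and that only ASCII digits are intended.

```python
import re

# Pattern and URL template copied from the proposal. The names below are
# illustrative only and are not part of Wikidata or the source website.
ALLOWED_VALUES = re.compile(r"[1-9]\d*", flags=re.ASCII)
FORMATTER_URL = "http://www.medicalpioneers.com/cgi-bin/index.cgi?detail=1&id=$1"


def is_valid_id(value: str) -> bool:
    """Return True if the whole string matches the allowed-values pattern."""
    # Assumption: the pattern is applied to the entire value, not a substring.
    return ALLOWED_VALUES.fullmatch(value) is not None


def build_url(value: str) -> str:
    """Substitute a validated identifier for $1 in the formatter URL."""
    if not is_valid_id(value):
        raise ValueError(f"not a valid identifier: {value!r}")
    return FORMATTER_URL.replace("$1", value)


if __name__ == "__main__":
    # Identifiers taken from the three examples in the proposal.
    for example_id in ("2254", "1447", "1000"):
        print(build_url(example_id))
```

The pattern rejects leading zeros and empty strings, so "0123" would not be accepted even though the website might resolve it. Note that a successful check says nothing about whether the identifier is stable over time, which is the concern raised in the discussion.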

--GZWDer (talk) 17:44, 19 August 2021 (UTC)

Discussion

  Notified participants of WikiProject Australia. Thierry Caro (talk) 17:45, 21 August 2021 (UTC)

  •   Support --Dhx1 (talk) 23:22, 21 August 2021 (UTC)
  •   Support --SHB2000 (talk) 00:03, 22 August 2021 (UTC)
  •   Support --Canley (talk) 02:00, 22 August 2021 (UTC)
  • Serious questions. How persistent is this dataset likely to be? It appears to be operated and maintained by one person [1]. How stable are these identifiers? I note that they form part of the URL but do not form part of the record itself; see [2] as an example, where there is no mention of 1350 in the record itself. This means that 1350 may simply be an internal artefact of the database, e.g. a row number in a spreadsheet. The addition or deletion of records may then alter these numbers over time. We have these problems constantly with major government databases, where you might hope there would be competent people involved who understood the need for stable identifiers (e.g. the complete renumbering of the Queensland Heritage Register in 2014). Now you might say it doesn't matter, as the old identifier can still be used to retrieve the web page from the Internet Archive, but not so. Again, look at [3]: it's not accessible via the Internet Archive, and as the website is based on a Deep Web search (that is, the web pages for each entry are produced by executing a database query and don't appear to be findable via manual browsing), they are not amenable to being archived either (I just tried to get the Internet Archive to take a snapshot of [4] and it failed). Just to show what happens in real life with online databases, look at the previously adopted property Australian National Shipwreck ID. The website has moved, and the URLs it generates are broken and not available through the Internet Archive. However, this is an authoritative database provided by the Government under legislation ("The AUCHD also serves as the register of protected underwater cultural heritage for the Underwater Cultural Heritage Act 2018 (the UCH Act) and provides a portal for the public to submit notifications and permit applications required under the UCH Act."), so it should continue to exist somewhere.
And with a little googling we can locate the new shipwreck database URL and, as it happens, the entries appear to have the same IDs as before (relief!). But look at this example [5]: here the identifier 6368 forms part of the actual record as well as being part of the URL. The identifier being proposed here is for a personal project (longevity unclear) that is not archivable, and the proposed ID may just be an artefact of the current database rather than a persistent key/ID. If this property is identified to be an authority control, we need to establish whether this database is an "authority" and what commitment there is to the persistence of the database and its identifier. Kerry Raymond (talk) 02:08, 22 August 2021 (UTC)
  • Very good points, thanks Kerry. I've enquired and looked into the history of this dataset, and while it was originally an individual's project (compiled on index cards!), the State Library of Victoria (SLV) arranged and paid for the development of the schema and website, digitisation and archiving of the cards, and for the ongoing hosting of the database and site. So at least it has the backing of a major GLAM institution which considers the dataset of considerable value (the site is referred to by numerous archives such as NSW Archives and the UK NHS registry), so it's not just one person's passion project which will vanish when they die, give up or stop paying for web hosting (the original compiler, Noel Richards, died in 1998, and there is a new editor-in-chief, Stephen Due). The index numbers/primary keys do seem to be persistent in the schema, not just spreadsheet rows – it's possible that could change, but it would be easy to migrate to new IDs in Wikidata if it does. The site could still disappear if SLV withdraws support, but as with the QHR and shipwreck examples, this can and does happen to major government websites and data all the time, so is any online data really permanent (the lack of archivability is a shame and a concern though)? The worst example I have encountered is the NSW Geographical Name Register, which seems actively hostile towards linking to their records and has several times hashed their identifiers! With the Shipwrecks database, DoE just changed the web address slightly, so I have changed the formatter URL on the Wikidata property and they should all work now. --Canley (talk) 04:41, 3 September 2021 (UTC)