21:01:31 <brion> #startmeeting ArchCom RFC meeting - Markdown support | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:01:31 <wm-labs-meetbot`> Meeting started Wed Jun 22 21:01:31 2016 UTC and is due to finish in 60 minutes. The chair is brion. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:31 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:31 <wm-labs-meetbot`> The meeting name has been set to 'archcom_rfc_meeting___markdown_support___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
21:01:46 <brion> i hope that wasn't too many bits for poor meetbot
21:02:19 <robla> #link https://phabricator.wikimedia.org/E218 Phab event for this week's meeting
21:02:40 <brion> #info discussing https://phabricator.wikimedia.org/T137946 develop Markdown support strategy for MediaWiki
21:03:20 <robla> #link https://www.mediawiki.org/wiki/Requests_for_comment/Markdown this week's RFC
21:03:30 * robla wipes brow
21:03:41 <brion> robla, care to chat a bit on the background?
21:04:33 <robla> sure, this is asking "what should our Markdown strategy be?", where pretty much any answer is valid
21:04:56 <brion> :D
21:05:12 <robla> why I'm asking that: there are many, many flavors of "wiki syntax" out there, of which MediaWiki wikitext is only one
21:06:00 <YairRand> (but ours is the _real_ wikisyntax... :P )
21:06:04 <robla> many implementations claim "Markdown support", which the interpretation varies quite a bit based on implementation
21:06:43 <robla> YairRand: :-D I think that actually gets to the heart of it
21:07:52 <robla> YairRand: do you (or anyone out there) believe that all other implementations will "see the light" and start using our format? should they?
21:08:59 <subbu> a different question is: will all the disparate markdown efforts to go beyond "simple" markdown eventually arrive at the wikitext level of complexity?
21:09:25 <subbu> even if the syntax will probably not be wikitext syntax itself.
21:09:59 <brion> (taking off my chair hat momentarily) what's a reason a given wiki might have for choosing to use markdown? preference, or compatibility with existing data or other tools, or?
21:10:21 <brion> (that might affect how one would go about such support)
21:10:56 <robla> I think both questions are very good, and now I'm having trouble choosing :-)
21:11:00 <brion> :D
21:11:08 <brion> let's do em in turn
21:11:11 <bd808> migrating from a github wiki to mediawiki might be one reason to want markdown page source
21:11:54 <brion> *nod*
21:11:54 <robla> bd808: yup
21:11:54 <YairRand> are there any serious limitations regarding wikitext that are solved in other syntaxes? are they pretty freely convertable?
21:12:09 <Scott_WUaS> (Is there a question here about how Wikimedia markdown talked about now will interface with SQID and Wikidata?)
21:12:35 <robla> YairRand: the Pandoc folks aspire to provide complete interchangability
21:12:58 <brion> #info open question: reasons for choosing markdown? example: moving hosting of a github wiki
21:12:59 <YairRand> robla: ... <clap clap clap>
21:13:06 * subbu is looking at http://pandoc.org/README.html#pandocs-markdown and sees that it is a pretty long spec
21:13:48 <brion> #info open question: complexity and extensions to the markup? example: would we need a syntax extension for templates/parserfunctions/lua/wikidata/etc?
21:14:39 <brion> easy things are easy to convert, hard things are ....... well that's the question isn't it :D
21:15:04 <subbu> one good reason to entertain this markdown question for mediawiki is that it might let us abstract the markup / parsing parts of the codebase behind an interface.
21:15:26 <brion> #info for convertability of markdownish things, see pandoc http://pandoc.org/README.html#pandocs-markdown
21:15:41 <TimStarling> what does cut and paste support mean for users in practice?
21:15:52 <bd808> agreed. getting serious about multiple markup formats would led to cleaning up a lot of entagled cruft in core
21:15:58 <brion> subbu: good point. also, how much do we rely on wikitext eg in the user interface?
21:16:00 <subbu> i toyed with that interface idea in https://www.mediawiki.org/wiki/User:SSastry_(WMF)/Notes/Wikitext#Core_ideas
21:16:35 <subbu> brion, yes, wikitext in the UI is tricky ...
21:16:49 <subbu> site messages are another i guess.
21:17:25 <robla> TimStarling: I know what it means for me, but that's probably a better question for the folks who work with VE regularly, since my understanding is that cut-n-paste bugs happen a lot
21:17:28 <brion> #info question: heavy use of wikitext in UI may require core parser. implications for alternate formats?
21:17:48 * robla goes to find the Phab component for cut-n-paste issues
21:17:58 <subbu> brion, is this (wikitext in UI) used a lot in non-wmf installs of mediawiki?
21:18:27 <robla> https://phabricator.wikimedia.org/project/view/898/ VisualEditor copypaste component in Phab
21:18:29 <TimStarling> would markdown be a third editing mode, after "source" and VE?
21:18:33 <robla> #link https://phabricator.wikimedia.org/project/view/898/ VisualEditor copypaste component in Phab
21:19:00 <subbu> TimStarling, I would think not.
21:19:12 <TimStarling> would you have an "insert markdown" toolbar button which gives you a box for pasting markdown?
21:19:18 <brion> subbu: at least some yes, sentences and paragraphs allowing bold, links, etc on various special pages. don't know how scary they are
21:19:27 <subbu> as in .. i see robla's proposal as that of using it as an interchange format for copy-paste
21:20:12 <brion> #info question: would cut-and-paste and interchange for markdown add a third editing mode beyond source/visual?
21:20:38 <TimStarling> <bd808> agreed. getting serious about multiple markup formats would led to cleaning up a lot of entagled cruft in core
21:20:44 <TimStarling> or it could be done as a ContentHandler
21:21:17 <bd808> yeah. then you could have a mixed wiki if you wanted
21:21:29 <TimStarling> then you wouldn't even touch $wgParser or create a Parser base class
21:21:31 <subbu> i don't see a used case for mixed-markup-format wikis.
21:21:31 <brion> #info tim sez "getting serious about multiple markup formats would led to cleaning up a lot of entagled cruft in core"
21:21:35 <subbu> that would be pretty confusing.
21:21:44 <TimStarling> no, I was quoting bd808
21:21:51 <brion> #info whoops bd808 sez that
21:22:04 * brion quote parsing error ;)
21:22:07 * bd808 denies it all
21:22:41 <TimStarling> it can be the default content handler if you like, the point of doing it as a content handler is that it gives you a convenient pre-existing hook point
21:22:53 <brion> i can see particular uses, such as when a wiki is used as a source repository of documents to be reused.... but they get scary ;)
21:22:59 <brion> (for mixed modes)
21:23:00 <TimStarling> pretty much everything about wikitext has already been abstracted there, for wikidata's benefit
21:23:36 <TimStarling> things like links table updates, redirect syntax, PST and parsing itself
21:23:42 <brion> #info tim is pretty sure ContentHandler can implement a markdown mode well. should already be well-factored. can be used as default contenthandler in theory
21:23:48 <subbu> i see ...
21:24:51 <bd808> that wouldn't effect site messages because the message system grabs onto $wgParser
21:25:12 <bd808> but maybe that's not a bad thing
21:25:17 <brion> but they'd still have to be written in wikitext if they are stored in a wiki page, right?
21:25:19 <TimStarling> yeah, that's the point
21:25:47 <TimStarling> site messages could have the wikitext content type, so you could even preview them using wikitext
21:26:27 <TimStarling> we already support default content types that vary depending on namespace
21:26:27 <brion> #info example of needing core parser: messages in MediaWiki: namespace, such as site notices. force them to use wikitext CH
21:26:34 <TimStarling> again for wikidata's benefit
21:26:34 <robla> is some sort of wikitext always going to be at the heart of MediaWiki or is T112999 forseeable?
21:26:35 <stashbot> T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
21:27:09 <brion> robla: it's conceivable but we'd have to eliminate or make optional the remaining wikitext users ;)
21:27:36 <subbu> brion, i don't think robla is saying get rid of wikitext .. but whether mediawiki might support an option without wikitext.
21:27:48 <bd808> allowing the parser for site messages to change would be like adding a language variant to every i18n language which seems unlikely to turn out well
21:28:22 <brion> right you'd basically have to change them to plaintext or plaintext with a very limited markup that is not full wikitext
21:28:31 * subbu is trying to grok what bd808 just said
21:28:34 <TimStarling> I don't think it would really be helpful to attempt to translate i18n into some other markup language
21:28:46 <brion> but we've got all sorts of fun things like grammatical plural and gender markers done via a subset of wiki markup
21:28:47 <TimStarling> you know, i18n really drove the development of a lot of parser features
21:28:48 <bd808> subbu: en-wikitext && en-markdown
21:29:30 <brion> #info i18n is heavily dependent on a subset of the core parser for plurals, genders, and other message variants... but that doesn't have to be used for content if you don't want
21:29:36 <robla> let's say that the version of wikitext we have now is "wikitext 1.0". is "wikitext 1.1" something we could do? (and still support i18n)
21:30:10 * brion ponders
21:30:36 <brion> could we, or would we want to, split a wikitext spec into 'the bits used for i18n' and 'extra fancy-ass markup used in wikipedia-like content'
21:30:37 <brion> ?
21:30:45 <subbu> robla, wikitext has evolved over the years .. so, i guess the qn. you are asking is if explicit versioning is needed?
21:30:48 <brion> or is that even worse :D
21:30:49 <TimStarling> i18n of course is a mix of formats
21:31:05 <TimStarling> preprocessed plain text, preprocessed HTML and true wikitext
21:31:07 <brion> plaintext, plaintext plus, wikitext, html, .... oh helllllls
21:31:14 <robla> subbu: yeah, I think so
21:31:49 <TimStarling> well, except the qqq language which is pretty consistently wikitext
21:32:52 <brion> #info question: is explicit versioning needed? can/should we make a 'wikitext 1.1' that is always implemented for i18n and ui messages?
21:33:18 <brion> #info note i18n messages are a mix of plaintext+preprocess, HTML+preprocess, and pure wikitext
21:34:42 <TimStarling> robla, are you proposing any role for markdown on WMF wikis?
21:35:10 <Scott_WUaS> (What are the implications of these MediaWiki markdown choices/decisions re ContentTranslation and Wikipedia's 358 languages, and security questions especially?)
21:35:47 <robla> TimStarling: I think it potentially has a role in normalizing CopyPaste issues, but the path toward that is complicated
21:35:59 <brion> #info question: implications of markdown choices on other tools like CT, need for i18n, and security?
21:36:19 <subbu> that requires browsers, doc-creating systems (word, etc.) to support conversion to "standard" markdown.
21:36:45 <TimStarling> it seems very limited as an interchange format
21:37:02 <TimStarling> compared to RTF, HTML, PDF, etc.
21:37:22 <brion> if I were going to copy-paste from a markdown wiki page, bug report, or readme file on github for instance, my choices are to copy-paste the source, or copy-paste the rendered HTML
21:37:39 <robla> subbu: I think at a base level, we have a number of applications that claim "text/html" during copy/paste operations, but text/html copy pasting pretty much anything
21:37:52 <brion> we know that pasting text/html is way harder than it should be ;) but we already support it in VE
21:38:07 <subbu> brion, from some sources, yes.
21:38:13 <TimStarling> pasting HTML into VE is already good enough to be useful
21:38:14 <robla> brion: we support it today, but it's an arms race, isn't it?
21:38:17 <brion> benefits of source copy?
21:38:18 <TimStarling> I have used it a few times
21:38:19 <brion> hehe yep
21:39:00 <robla> no one (that I'm aware of) has defined a useful subset of HTML that is safe for copy/paste operations
21:39:08 <brion> but so is markdown isn't it?
21:39:22 <brion> if we support github's extensions, next we get asked about someone else's extensions
21:40:18 <brion> #info question is the HTML copy-paste "arms race" good enough vs markup-specific paste converter tools for markdown etc?
21:40:36 <TimStarling> HTML paste is likely to work if the HTML is very simple
21:41:00 <TimStarling> for example if you're copying from a github README.md you'd expect it to work
21:41:29 <robla> TimStarling: is there a "very simple" subset of HTML we can get browser makers to support?
21:41:43 <robla> (for copy/paste purposes)?
21:41:47 <subbu> robla, you linked to https://tools.ietf.org/html/draft-ietf-appsawg-text-markdown-12 ... what are your thoughts on how likely it is to be adopted?
21:42:24 <TimStarling> robla: no... but then browsers can't export to markdown either
21:42:31 <brion> #link https://tools.ietf.org/html/draft-ietf-appsawg-text-markdown-12
21:42:55 <robla> subbu: I think like that could happen
21:43:18 <subbu> our original goal for parsoid html2wt (which is still there as a comment in the serialization code) is to be able to accept arbitrary html and convert it to "acceptable" wikitext. but we haven't quite worked on that goal for a while now since we are mostly behind clients whose output is more controlled.
21:44:05 <robla> subbu: what do you mean by "output is more controlled"?
21:44:27 <subbu> as in .. VE/CX/Flow etc. don't generate arbitrary html.
21:44:39 <robla> ah, got it
21:45:25 <subbu> but, if you say, took the html from a bbc article and gave it to parsoid to convert to wikitext, the output isn't pretty.
21:45:35 <robla> so...basically, the copy/paste code works when we can control the generation of the HTML, but most implementations don't conform to our spec
21:45:51 <subbu> no, VE does its own handling of copy-pasted HTML .. it doesn't go through parsoid.
21:46:20 <brion> fun :D
21:46:29 <TimStarling> you mean it cleans up the HTML before it hands it to parsoid for serialization?
21:46:41 <subbu> but, we've talked about creating a library for normalization and cleanup.
21:47:01 <brion> #info for comparison, the HTML paste handling in VE is done by normalizing HTML on the VE end, before it eventually lands in parsoid during save/serialization
21:47:10 <subbu> TimStarling, as far as i know ... they strip unrecognized / unsupported attributes.
21:47:30 <brion> #info ideally the parsoid html2wt would take any html and produce 'acceptable' wikitext but is not fully exercised at that right now
21:49:28 <robla> things like html2wt are going to be necessary for a long time, I imagine, but it seems to me we should at least start pulling people toward a world where html2wt isn't necessary
21:50:32 <brion> well, there's the html-only world possibility :)
21:50:47 <brion> where you'd still have some validation stage
21:50:56 <brion> but not a major reparse i guess
21:51:14 <brion> (and presumably a stage to handle composition of templates, media etc)
21:51:56 <subbu> for parsoid to accept arbitrary html, we would need to run a sanitization pass on the html and strip unrecognized attributes, normalize html, etc.
21:52:09 <robla> I think we live in a world where wikitext is sanitized and tries to be safe, and HTML is known unsafe
21:52:28 <brion> indeed we'd have "inside html" and "outside html" at the least
21:52:32 <subbu> which is also something that needs to happen with a html-only wiki .. sanitization at the very least.
21:52:32 <brion> never, EVER mix em :D
21:52:44 <robla> there's no "sanitized HTML" spec
21:53:09 <subbu> :)
21:53:14 <brion> #info an HTML-only storage world needs to carefully sanitize between "outside HTML" and "safe inside HTML".... but there's no spec! we'd need one
21:53:37 <robla> there's the old HTML email spec
21:53:58 <robla> (but yeah, that's not really a good alternative)
21:54:38 <robla> https://en.wikipedia.org/wiki/HTML_email
21:55:34 <brion> probably we need to spec out our extensions as well, such as how you extract the file name from a usage, a wiki page from a link, a template reference and parameter set from a big ol' blob of divs or whatever
21:55:46 <TimStarling> I think if VE's HTML paste can produce reasonable wikitext markup for any HTML generated from original markdown, then that more or less replaces the need for direct markdown paste
21:56:15 <brion> i tend to agree
21:56:36 <TimStarling> "original markdown" as in http://daringfireball.net/projects/markdown/syntax
21:56:45 <TimStarling> which is much simpler than pandoc markdown
21:57:28 <robla> commonmark would be the modern simple version, I think
21:57:47 <robla> http://commonmark.org/
21:57:50 <brion> ok we're getting low on time
21:58:09 <brion> any action items to pursue? decisions made?
21:58:21 <subbu> T127329 is the placeholder for the parsoid side work to consolidate html-import/cleanup code into a library for use by whoever.
21:58:21 <stashbot> T127329: Using Parsoid as a wikitext bridge for importing content into wikitext format - https://phabricator.wikimedia.org/T127329
21:58:50 <brion> #link https://phabricator.wikimedia.org/T127329 related parsoid bridge for html-import-to-wikitext
21:59:19 <Scott_WUaS> Thanks All!
21:59:26 <TimStarling> so I'm fairly skeptical about the idea of direct markdown paste as being superior to markdown->html->wikitext
21:59:28 <robla> subbu: my understanding is that you're working on RFCs as a goal soon, right?
21:59:28 <subbu> i was interested in the markdown strategy as a potential benefit for refactoring some code in mediawiki .. but looks like that is mostly already in place?
21:59:50 <brion> yay wikidata -> contenthandler \o/
22:00:17 <subbu> robla, rfcs for .. that task i pasted above?
22:00:21 <brion> #info tim is skeptical of direct paste; html import seems to serve well
22:00:33 <robla> subbu: something related to T112999?
22:00:34 <stashbot> T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
22:00:43 <brion> #action someone should revise the RfC, probably drop the cut-paste
22:00:44 <subbu> ah, cscott territory.
22:00:49 <subbu> yes.
22:00:58 <brion> #action update T112999 for the ContentHandler era
22:00:58 <stashbot> T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
22:01:15 <subbu> i'll chat with him about it.
22:01:36 <brion> #action subbu will chat with cscott
22:01:38 <brion> thanks all!
22:01:41 <brion> #endmeeting