Agenda
- Location: #wikimedia-office IRC channel
- Meeting type: Problem definition
- Time: 2016-10-26, Wednesday 21:00 UTC (2pm PDT, 23:00 CEST)
- Topic:
Meeting summary
- T138783 - SVG in HTML (robla, 21:01:52)
- T138783 - HTML in SVG (robla, 21:02:06)
- question discussed: should we use the Sanitizer class as HTML validation library for SVG? (robla, 21:15:44)
- question discussed: should MediaWiki be able to alter SVG uploads on save? (robla, 21:19:47)
- bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult (robla, 21:24:51)
- TimStarling would like there to be an image link option which does client-side SVG rendering (TimStarling, 21:26:46)
- 14:27:41Â <bawolff>Â If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify (robla, 21:28:09)
- LINK: https://github.com/cure53/DOMPurify (robla, 21:28:57)
- question discussed: is DOMPurify an improvement over the status quo (robla, 21:35:15)
- 14:42:58Â <gwicke>Â I definitely think that we should leverage other people's work in this space as much as possible (robla, 21:43:36)
- LINK: https://github.com/OWASP/java-html-sanitizer looks potentially interesting (gwicke, 21:51:04)
- LINK: https://github.com/darylldoyle/svg-sanitizer (subbu, 21:53:57)
Meeting ended at 22:00:07 UTC.
Log
1 | 21:01:32 <robla> #startmeeting ArchCom IRC meeting |
---|---|
2 | 21:01:32 <wm-labs-meetbot> Meeting started Wed Oct 26 21:01:32 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot. |
3 | 21:01:32 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. |
4 | 21:01:32 <wm-labs-meetbot> The meeting name has been set to 'archcom_irc_meeting' |
5 | 21:01:52 <robla> #topic T138783 - SVG in HTML |
6 | 21:01:52 <stashbot> T138783: SVG Upload should (optionally) allow the xhtml namespace - https://phabricator.wikimedia.org/T138783 |
7 | 21:02:06 <robla> #topic T138783 - HTML in SVG |
8 | 21:02:09 <robla> :-) |
9 | 21:02:14 <robla> hi folks! |
10 | 21:03:06 <robla> so....today's topic is about whether we should allow HTML in our SVG |
11 | 21:03:52 * robla goes to find more links to describe this better.... |
12 | 21:04:45 <robla> <https://lists.wikimedia.org/pipermail/wikitech-l/2016-October/086861.html> |
13 | 21:05:44 <robla> matmarex suggested "We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML" |
14 | 21:05:52 <robla> does that seem like a good idea? |
15 | 21:06:26 <TimStarling> MatmaRex is not in this channel and is marked /away |
16 | 21:06:35 <TimStarling> but maybe we have bawolff? |
17 | 21:06:42 <bawolff> Yep |
18 | 21:07:16 <DanielK_WMDE_> i think it's important to differentiate between a) should mediawiki support this at all and b) should this be allowed/used on wikimedia sites |
19 | 21:07:45 <DanielK_WMDE_> also, what html is typically encountered in that context, and how well is that supported by different renderers? |
20 | 21:08:13 <bawolff> as i wrote on the bug, seems in theory workable but im concerned the sanitizer wants to rewrite during sanitization which means youd have to be very pedantic in writing your svgs |
21 | 21:08:41 <bawolff> unless we change the upload pipeline to allow files to be modified during upload |
22 | 21:08:47 <TimStarling> as for (a), if there was a patch in gerrit to allow HTML behind a feature flag, it would presumably be accepted, we don't have a massively high bar for that sort of thing, right? |
23 | 21:09:54 <robla> TimStarling: I think the bar is "is this secure?" |
24 | 21:09:58 <TimStarling> yeah, you would have trouble using sanitizer directly |
25 | 21:10:25 <DanielK_WMDE_> TimStarling: or more nicely, some dort of pluggable validation thingy, yea. |
26 | 21:10:29 <TimStarling> I'm not really a big fan of Sanitizer in terms of the way the code is currently structured |
27 | 21:10:33 <SMalyshev> I'd guess the security depends on the sanitizer? |
28 | 21:10:38 <DanielK_WMDE_> then this could be an extension rather than a feature flag |
29 | 21:11:12 <DanielK_WMDE_> SMalyshev: also on the consumer of the html. currently, it's us rendering these, not the browser. |
30 | 21:11:32 <SMalyshev> you mean SVGs? |
31 | 21:11:45 <robla> what originally started this conversation was a discussion about a draw.io plugin where the feature flag wanted was "turn off sanitization" |
32 | 21:11:51 <DanielK_WMDE_> yes. well, the html inside the svgs |
33 | 21:11:56 <SMalyshev> ah yes if we have SVG renderer we also need to see what's going on there |
34 | 21:12:26 <TimStarling> according to bawolff, our SVG renderer does not support HTML, you'll just get an empty box |
35 | 21:12:26 <bawolff> They were using svg foreign objecty things |
36 | 21:12:56 <bawolff> so there may be fallback for renders not supporting it |
37 | 21:12:56 <TimStarling> which is not surprising, it's a very complicated spec to pull in |
38 | 21:13:56 * DanielK_WMDE_ wants interactive/scripted svg |
39 | 21:14:03 <TimStarling> an idea I had last hour: maybe we could have an option in the image link which enables client-side SVG rendering |
40 | 21:14:24 <bawolff> whether or not rsvg properly handles the fallback i have no idea as i havent tested that |
41 | 21:14:43 <TimStarling> you know there are performance reasons to do SVG in the server at the moment |
42 | 21:14:47 <DanielK_WMDE_> TimStarling: sounds ok for 3rd parties. for wikimedia, we'd need a decent fallback renderer |
43 | 21:14:55 <bawolff> DanielK_WMDE: well thats a whole other bag of worms :) |
44 | 21:15:08 <SMalyshev> if sanitizer follows https://meta.wikimedia.org/wiki/Help:HTML_in_wikitext is seems ok |
45 | 21:15:44 <robla> #info question discussed: should we use the Sanitizer class as HTML validation library for SVG? |
46 | 21:15:49 <bawolff> Sanitizer is what the parser itself uses so it should |
47 | 21:15:59 <DanielK_WMDE_> SMalyshev: i think that page is basically documenting what the sanitizer does |
48 | 21:16:57 <TimStarling> Sanitizer::removeHTMLtags() is currently only used by the parser, so it probably does some things which are wikitext-specific |
49 | 21:17:23 <TimStarling> and like bawolff previously said, it works by rewriting dubious HTML to be less dubious |
50 | 21:17:32 <bawolff> The translated entity names for example are wikitext specific |
51 | 21:17:41 <TimStarling> so you either have to allow it to alter uploads or have a strict rejection mode |
52 | 21:18:15 <robla> (side note: please feel free to add "#info" tag notes to get captured in the summary of this discussion) |
53 | 21:19:01 <bawolff> Allowing mediawiki to alter uploads would make a bunch of things unrelated to this much easier |
54 | 21:19:03 <TimStarling> for example, unquoted attribute values are rewritten to be double-quoted, which avoids relying on client parsing details |
55 | 21:19:46 <TimStarling> there's nothing really stopping us from doing it, right? |
56 | 21:19:47 <robla> #info question discussed: should MediaWiki be able to alter SVG uploads on save? |
57 | 21:20:29 <bawolff> I think some code has to be rearchitected, but not a major amount |
58 | 21:20:53 <TimStarling> AaronSchulz: does it sound difficult to you? |
59 | 21:21:03 <bawolff> The upload code is something a lot of people find scary and dont like touching |
60 | 21:21:31 <bawolff> but i dont thimk it would be very hard |
61 | 21:22:20 <AaronSchulz> bawolff: yeah, probably |
62 | 21:23:00 <TimStarling> UploadBase::performUpload() just uploads $this->mTempPath |
63 | 21:23:19 <TimStarling> it's not even trying to move the file from the stash, it just reuploads |
64 | 21:23:39 <TimStarling> so if you modify that file, you're all done, right? |
65 | 21:24:17 <bawolff> I imagine you would want to also give the user a warning |
66 | 21:24:31 <TimStarling> sure |
67 | 21:24:51 <robla> #info bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult |
68 | 21:26:46 <TimStarling> #info TimStarling would like there to be an image link option which does client-side SVG rendering |
69 | 21:27:41 <bawolff> If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify |
70 | 21:28:09 <robla> #info 14:27:41 <bawolff> If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify |
71 | 21:28:22 <robla> thoughts on that? |
72 | 21:28:36 * robla goes to find the DomPurify link |
73 | 21:28:38 <bawolff> slightly complicated by that project being in js, but thats probably not an issue for wmf at least |
74 | 21:28:57 <robla> #link https://github.com/cure53/DOMPurify |
75 | 21:29:14 <bawolff> and i think we would be better using someone elses sanitizer than our own |
76 | 21:31:09 <bawolff> Since more eyes on their sanitizer than ours |
77 | 21:31:11 <robla> bawolff: you're suggesting that the Wikimedia instances would rely on server-side Javascript (i.e. Node.js). do you think it makes sense to require Node.js for secure SVG? |
78 | 21:31:31 <robla> (for third parties? |
79 | 21:31:37 <yurik_> hi, sorry i'm late to SVG party. DanielK_WMDE_ I second your desire for interactive SVGs :) Graphs at this point can produce SVG output, but graphs does not (yet) support an ability to create URL links |
80 | 21:31:54 <bawolff> i wouldnt say im happy about it... |
81 | 21:32:04 <TimStarling> that library does not look like an improvement |
82 | 21:32:38 <Scott_WUaS> (hi All) |
83 | 21:33:04 <yurik> so if graphoid was allowed to produce SVG output, it would already be a quality improvement, but not necessarily a functional improvement right away |
84 | 21:33:39 <yurik> but at least it gives a path forward, whereas without SVG, we have no easy way of producing clickable graphs |
85 | 21:33:59 <robla> legoktm pointed out T86874 on wikitech-l |
86 | 21:33:59 <stashbot> T86874: Make SVG sanitization into a library - https://phabricator.wikimedia.org/T86874 |
87 | 21:34:35 <bawolff> TimStarling: can you be more specific? I havent looked at that library in detail im more coming from a i dont like our current approach |
88 | 21:35:15 <robla> #info question discussed: is DOMPurify an improvement over the status quo |
89 | 21:35:53 <TimStarling> for a start, it's only 860 physical lines of code, so if you wanted it you could just port it to PHP |
90 | 21:36:51 <TimStarling> it's apparently just an XSS filter, not a cross-site request filter |
91 | 21:38:12 <bawolff> Yes, there would have to be a layer on top for parts outside their security model but inside ours |
92 | 21:38:23 <TimStarling> I don't know how well it has been reviewed |
93 | 21:39:18 <gwicke> IIRC csteipp viewed DOMPurify favorably, but we didn't manage to finish the review before he left |
94 | 21:39:25 <TimStarling> it's a lot more permissive than Sanitizer |
95 | 21:40:48 <gwicke> agreed on the need to layer our specific restrictions on top |
96 | 21:41:23 <TimStarling> var IS_ALLOWED_URI = /^(?:(?:(?:f|ht)tps?|mailto|tel):|[^a-z]|[a-z+.\-]+(?:[^a-z+.\-:]|$))/i; |
97 | 21:41:36 <gwicke> parsoid's sanitizer works at the token / sax level, so it shouldn't be all that hard to wrap it into a SAX visitor |
98 | 21:41:45 <TimStarling> I've got to go |
99 | 21:42:41 <robla> TimStarling: d'oh! ok....thanks for the discussion! |
100 | 21:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible |
101 | 21:43:36 <robla> #info 14:42:58 <gwicke> I definitely think that we should leverage other people's work in this space as much as possible |
102 | 21:45:06 <robla> gwicke: do you think we should stop working on our PHP version of SVG validation? |
103 | 21:45:09 <yurik> robla, do you keep a list of potential usecases? Graphoid SVG output would be the first in line |
104 | 21:45:36 <yurik> (i meant i will be the first person to get into that line) |
105 | 21:46:14 <robla> yurik: I don't. This topic isn't *yet* an RFC, but if you want to turn this into an RFC, that'd be a great place to put your case on the first line |
106 | 21:46:14 <bawolff> Isnt graphoid svgs entirely machine generated (ie not attacker controlled)? Does it need general sanitation? |
107 | 21:47:00 <gwicke> robla: I think we should look at features / maturity of different options, and then evaluate a) which requires the least initial investment, and b) which will need the bigger long term maintenance effort on our end |
108 | 21:47:03 <yurik> bawolff, that was my understanding too, but for some reason csteip (i think) wanted to sanitize everything regardless of the source |
109 | 21:47:18 <yurik> "just in case" (tm) |
110 | 21:47:42 <robla> gwicke: I can't commit to writing an RFC for this, but that all seems reasonable. would you be up for writing it? |
111 | 21:47:46 <bawolff> Gwicke: do you know other options? |
112 | 21:48:05 <gwicke> I haven't checked the PHP code in a while, but my recollection from the last time I talked about this with csteipp was that it wasn't very complete for things like html-in-SVG |
113 | 21:48:24 <bawolff> DomPurify was the only one i could find that was even remotely good |
114 | 21:48:36 <yurik> the problem with that logic is that Vega is already enabled on the client, so if there is a security vulnerability, people are already exposed to it (i hope we caught all of them). So just adding SVG output does not seem to increase the attack surface |
115 | 21:48:41 <gwicke> bawolff: that's the only one we seriously considered back then |
116 | 21:49:23 <gwicke> but who knows, the web library world is changing every couple of months these days.. |
117 | 21:49:56 <bawolff> Im intrigued by Tim's comment about porting to php was interesting. That honestly never even occured to me |
118 | 21:50:23 <bawolff> but im not sure about the library support in php for DOM |
119 | 21:50:42 <bawolff> which dompurify seems to use extensively |
120 | 21:51:04 <gwicke> https://github.com/OWASP/java-html-sanitizer looks potentially interesting |
121 | 21:51:25 <bawolff> perhaps its there. Im honestly not very familar with that area of the php ecosystem |
122 | 21:51:38 * robla creates an RFC page https://www.mediawiki.org/wiki/Requests_for_comment/SVG_Upload_should_(optionally)_allow_the_xhtml_namespace |
123 | 21:51:41 <gwicke> I don't think it supports SVG docs, however |
124 | 21:53:06 <robla> ok...we're coming to the end of the hour. This has been a really informative discussion |
125 | 21:53:32 <robla> ...and we have a place we can collaborate on prose ^^^ |
126 | 21:53:57 <subbu> https://github.com/darylldoyle/svg-sanitizer |
127 | 21:54:14 <bawolff> Did we come to a conclusion about the original issue? |
128 | 21:54:42 * subbu only read the last few lines of this rfc since he got here late |
129 | 21:55:01 <robla> bawolff: I think the original issue was about should we allow XHTML at all, and I haven't heard any objections to it |
130 | 21:55:26 <gwicke> "just" need to solve the sanitization challenge.. |
131 | 21:55:27 <robla> the conversation seems moved to "which validation library should we use?" |
132 | 21:55:43 <bawolff> Subbu: thats exactly what im looking for |
133 | 21:55:49 <robla> that question seems it will be solved after vi vs emacs ;-) |
134 | 21:56:24 <robla> seriously though, we have a bunch of validation options we should probably consider in the RFC writeup |
135 | 21:56:27 <gwicke> the second issue is that we'll need to upgrade our rasterization tools to support HTML |
136 | 21:56:51 <gwicke> we do have at least one option for that, but it's not rsvg |
137 | 21:56:52 <robla> gwicke: yeah, I don't think we'll have time to touch on that topic |
138 | 21:57:11 <robla> let's take the discussion back to phab. I'll take the action to summarize there |
139 | 21:57:12 <Scott_WUaS> Thanks for the discussion, All! |
140 | 21:57:33 <robla> any last thoughts before I hit #endmeeting? |
141 | 21:57:56 <robla> (I'll end this arbitrarily very close to the top of the hour) |
142 | 21:58:31 <robla> we should continue this conversation over on #wikimedia-multimedia maybe :-) |
143 | 21:58:44 <bawolff> Gwicke: i dont know if thats actually neccesary for the original use case |
144 | 21:58:47 <Scott_WUaS> gwicke: further thoughts about the sanitization issue? |
145 | 21:58:51 * robla regrets not more explicitly pinging them before this |
146 | 21:59:02 <robla> 60 second warning |
147 | 21:59:14 <bawolff> obviously it would make more sense |
148 | 21:59:52 <robla> alrighty....thank you everyone! No meeting planned next week, and the following week has an option or two |
149 | 22:00:01 <Scott_WUaS> Thank you! |
150 | 22:00:07 <robla> #endmeeting |
People present (lines said)
- robla (45)
- bawolff (32)
- TimStarling (27)
- gwicke (13)
- DanielK_WMDE_ (9)
- yurik (7)
- Scott_WUaS (4)
- SMalyshev (4)
- wm-labs-meetbot (3)
- stashbot (2)
- subbu (2)
- AaronSchulz (1)
- yurik_ (1)
Other meetings
Architecture meetings | ||
---|---|---|
13:00 PT ArchCom Planning Meetings | upcoming | all since 2016-03-30 |
14:00 PT ArchCom-RFC Meetings | upcoming | all since 2015-09-09 |