Hi all,
some months ago, some mails on this mailing list dealt with HTML clean-up using the TagSoup library (see [1]). Currently, TagSoup not only creates well-formed XML, it is also told to strip unknown elements ("bogons"): XFormsUtils.java sets ignoreBogonsFeature to true, which effectively suppresses them [2].

This is fine from a security standpoint, but the same step (purging unknown tags from the HTML) is done twice, once by TagSoup and again by clean-html.xsl. In scenarios where certain custom, non-HTML elements should be allowed, the developer has to change both TagSoup and clean-html.xsl. Changing the TagSoup behaviour requires modifying either the XFormsUtils.java source or the TagSoup source.

As an enhancement, I would propose:

- either change the TagSoup call so that it only creates well-formed XML and let clean-html.xsl alone decide which elements are valid (this means that ignoreBogonsFeature should be set to false);

- or implement a way to configure TagSoup using properties in properties-local.xml, so the user can decide whether TagSoup should strip unknown tags.

Personally, I would prefer the first option, but the second one would IMHO also be a step forward. It would be nice to hear some more opinions on this matter.

florian

[1] http://wiki.orbeon.com/forms/doc/developer-guide/xforms-controls#TOC-HTML-cleanup
[2] http://github.com/orbeon/orbeon-forms/blob/master/src/java/org/orbeon/oxf/xforms/XFormsUtils.java#L256

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws
Hi all,
to make the TagSoup and JTidy configuration accessible without the need to recompile XFormsUtils.java, I'd like to propose the attached patch. It handles two issues as follows:

- TagSoup: there's a new boolean config property named "oxf.xforms.tagsoup.ignoreBogonsFeature" with default value "true". Changing this to false should make TagSoup accept unknown (non-HTML) elements.

- JTidy: a new set of Tidy config options, taking priority over the hard-coded ones in XFormsUtils.java, is defined using the new config property "oxf.xforms.tidy.propertiesFile" (anyURI). By default, that URI points to oxf:/config/tidy.properties, making the complete Tidy config accessible using the Java properties syntax. The proposed tidy.properties defines the canvas tag as an additional valid element that would otherwise get stripped from the content parsed by JTidy.

Some additional hints on which Tidy properties are available might be useful (they could be placed in tidy.properties or in the wiki).

I would be glad to hear your opinions!
florian

Index: src/resources-packaged/config/properties-xforms.xml
===================================================================
--- src/resources-packaged/config/properties-xforms.xml (revision 205f9ed19328b47eddc290497b0252c5d20627b7)
+++ src/resources-packaged/config/properties-xforms.xml (revision )
@@ -109,6 +109,8 @@
     <property as="xs:boolean" name="oxf.xforms.datepicker.two-months" value="false"/>
     <property as="xs:string" name="oxf.xforms.htmleditor" value="yui"/> <!-- fck | yui -->
     <property as="xs:boolean" name="oxf.xforms.show-error-dialog" value="true"/>
+    <property as="xs:boolean" name="oxf.xforms.tagsoup.ignoreBogonsFeature" value="true"/>
+    <property as="xs:anyURI" name="oxf.xforms.tidy.propertiesFile" value="oxf:/config/tidy.properties"/>
     <property as="xs:integer" name="oxf.xforms.internal-short-delay" value="100"/>
     <property as="xs:integer" name="oxf.xforms.delay-before-incremental-request" value="500"/>
Index: src/resources/config/tidy.properties
===================================================================
--- src/resources/config/tidy.properties (revision )
+++ src/resources/config/tidy.properties (revision )
@@ -0,0 +1,21 @@
+##
+# Copyright (C) 2010 Orbeon, Inc.
+#
+# This program is free software; you can redistribute it and/or modify it under the terms of the
+# GNU Lesser General Public License as published by the Free Software Foundation; either version
+# 2.1 of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
+# without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+# See the GNU Lesser General Public License for more details.
+#
+# The full text of the license is available at http://www.gnu.org/copyleft/lesser.html
+#
+show-warnings = false
+quiet = true
+new-empty-tags = canvas
+new-inline-tags = canvas
+input-encoding = utf-8
+numeric-entities = false
+output-xml = false
+input-xml = false
\ No newline at end of file
Index: src/java/org/orbeon/oxf/xforms/XFormsUtils.java
===================================================================
--- src/java/org/orbeon/oxf/xforms/XFormsUtils.java (revision 205f9ed19328b47eddc290497b0252c5d20627b7)
+++ src/java/org/orbeon/oxf/xforms/XFormsUtils.java (revision )
@@ -34,6 +34,7 @@
 import org.orbeon.oxf.xml.*;
 import org.orbeon.oxf.xml.XMLUtils;
 import org.orbeon.oxf.xml.dom4j.*;
+import org.orbeon.oxf.properties.Properties;
 import org.orbeon.saxon.Configuration;
 import org.orbeon.saxon.dom4j.NodeWrapper;
 import org.orbeon.saxon.functions.FunctionLibrary;
@@ -60,6 +61,9 @@
 
     private static final int SRC_CONTENT_BUFFER_SIZE = 1024;
 
+    private static final String XFORMS_TAGSOUP_IGNOREBOGONS = "oxf.xforms.tagsoup.ignoreBogonsFeature";
+    private static final String XFORMS_TIDY_CONFIG_URI = "oxf.xforms.tidy.propertiesFile";
+
     // Binary types supported for upload, images, etc.
     private static final Map<String, String> SUPPORTED_BINARY_TYPES = new HashMap<String, String>();
@@ -228,10 +232,28 @@
     public static org.w3c.dom.Document htmlStringToDocument(String value, LocationData locationData) {
         // Create and configure Tidy instance
         final Tidy tidy = new Tidy();
+        final java.util.Properties tidyProps = new java.util.Properties();
+        final String tidyPropsURI = Properties.instance().getPropertySet().getStringOrURIAsString(XFORMS_TIDY_CONFIG_URI);
+        boolean tidyConfAvailable = false;
+
+        // try to grab external tidy config
+        try {
+            final URL tidyPropsURL = URLFactory.createURL(tidyPropsURI);
+            tidyProps.load(tidyPropsURL.openStream());
+            tidy.setConfigurationFromProps(tidyProps);
+            tidyConfAvailable = true;
+        } catch (MalformedURLException e) {
+            throw new OXFException("Cannot create URL bases on tidy config property: '" + tidyPropsURI + "'", e);
+        } catch (IOException e) {
+            throw new OXFException("Cannot load external tidy config file at: '" + tidyPropsURI + "'", e);
+        }
+        // Fallback if external config isn't available
+        if (!tidyConfAvailable) {
-        tidy.setShowWarnings(false);
-        tidy.setQuiet(true);
-        tidy.setInputEncoding("utf-8");
-        //tidy.setNumEntities(true); // CHECK: what does this do exactly?
+            tidy.setShowWarnings(false);
+            tidy.setQuiet(true);
+            tidy.setInputEncoding("utf-8");
+            //tidy.setNumEntities(true); // CHECK: what does this do exactly?
+        }
 
         // Parse and output to SAXResult
         final byte[] valueBytes;
@@ -252,8 +274,17 @@
         try {
             final XMLReader xmlReader = new org.ccil.cowan.tagsoup.Parser();
             final HTMLSchema theSchema = new HTMLSchema();
+
             xmlReader.setProperty(org.ccil.cowan.tagsoup.Parser.schemaProperty, theSchema);
+
+            // try to get the ignoreBogonsProperty from Properties, set true if not available
+            final Boolean ignoreBogonsProperty = Properties.instance().getPropertySet().getBoolean(XFORMS_TAGSOUP_IGNOREBOGONS);
+            if (ignoreBogonsProperty != null) {
+                xmlReader.setFeature(org.ccil.cowan.tagsoup.Parser.ignoreBogonsFeature, ignoreBogonsProperty.booleanValue());
+            } else {
-            xmlReader.setFeature(org.ccil.cowan.tagsoup.Parser.ignoreBogonsFeature, true);
+                xmlReader.setFeature(org.ccil.cowan.tagsoup.Parser.ignoreBogonsFeature, true);
+            }
+
             final TransformerHandler identity = TransformerUtils.getIdentityTransformerHandler();
             identity.setResult(result);
             xmlReader.setContentHandler(identity);
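A note on the precedence described in this message: as far as one can tell from the diff, both catch blocks rethrow, so the hard-coded fallback branch would not run when loading fails. One possible alternative is to layer the external settings over built-in defaults. The following is only a minimal sketch of that idea using java.util.Properties from the JDK. The class and method names (TidyConfigSketch, hardCodedDefaults, withOverrides, ignoreBogons) are hypothetical and not part of Orbeon, and the configuration text is passed in as a string instead of being read from an oxf: URI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration only; none of these names exist in Orbeon Forms.
public class TidyConfigSketch {

    // Hard-coded fallback values, mirroring the setters used in XFormsUtils.java.
    static Properties hardCodedDefaults() {
        final Properties defaults = new Properties();
        defaults.setProperty("show-warnings", "false");
        defaults.setProperty("quiet", "true");
        defaults.setProperty("input-encoding", "utf-8");
        return defaults;
    }

    // Entries from the external config take priority; missing keys fall back
    // to the hard-coded defaults. A null config means "no external file".
    static Properties withOverrides(String externalConfig) {
        final Properties merged = new Properties(hardCodedDefaults());
        if (externalConfig != null) {
            try {
                merged.load(new StringReader(externalConfig));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return merged;
    }

    // An unset flag (null) keeps the current behaviour: bogons are ignored.
    static boolean ignoreBogons(Boolean configured) {
        return configured == null || configured;
    }

    public static void main(String[] args) {
        final Properties p = withOverrides("new-inline-tags = canvas\nquiet = false\n");
        System.out.println("new-inline-tags: " + p.getProperty("new-inline-tags"));
        System.out.println("quiet:           " + p.getProperty("quiet"));
        System.out.println("input-encoding:  " + p.getProperty("input-encoding"));
        System.out.println("ignoreBogons:    " + ignoreBogons(null));
    }
}
```

With this layering, a partial tidy.properties only needs to list the settings that differ from the defaults, and a missing file could simply be treated as "no overrides" if that is the desired behaviour.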
I agree, we need to find a way to make these items configurable:
- first at the installation level
- at some point at the individual component level

Also, I've noticed we're missing a number of tags in clean-html.xsl:

<xsl:template match="*:sub | *:sup | ...

Subscript and superscript are supported by YUI but get gobbled up by clean-html.xsl.

/r
The same goes for blockquote: supported by YUI RTE toolbar but stripped by clean-html.xsl
Add to fix:

<xsl:template match="*:blockquote | *:sub | *:sup ...
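To illustrate why a tag missing from the match pattern disappears, here is a rough, self-contained sketch of an allow-list stylesheet run through the JDK's built-in XSLT 1.0 processor. It is not the actual clean-html.xsl: the element list is made up, plain name tests stand in for the `*:name` wildcards (which need XPath 2.0), and unlisted elements are unwrapped so that only their text survives, which may differ from what the real stylesheet does. The class name AllowListSketch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration only; this is not Orbeon's clean-html.xsl.
public class AllowListSketch {

    // Listed elements are copied; every other element is replaced by its content.
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='div | p | b | i | blockquote | sub | sup'>"
      + "<xsl:copy><xsl:apply-templates/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='*'><xsl:apply-templates/></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the allow-list to a well-formed XML fragment with a single root.
    static String clean(String xml) {
        try {
            final Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            final StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(clean(
            "<div><p>H<sub>2</sub>O</p>"
          + "<blockquote>quoted</blockquote>"
          + "<marquee>scrolling</marquee></div>"));
    }
}
```

Removing `blockquote | sub | sup` from the first template in this sketch makes those elements fall through to the catch-all template, which is the kind of stripping reported above for the YUI editor output.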
Thanks, I added these:

http://github.com/orbeon/orbeon-forms/commit/be545915170a27c104998075b68d630a073c97a2

-Erik

On Thu, Sep 30, 2010 at 7:58 AM, rdanner <[hidden email]> wrote:
>
> The same goes for blockquote: supported by YUI RTE toolbar but stripped by
> clean-html.xsl
>
> add to fix:
> <xsl:template match="*:blockquote | *:sub | *:sup ...
In reply to this post by fl.schmitt(ops-users)
Florian,
Thanks for this patch.

Just one concern: this means that the Tidy properties file will be loaded every time htmlStringToDocument() is called, right? And this can be pretty often as this is used by all controls that output HTML.

Somehow, this should be cached. What do you think?

-Erik

On Wed, Sep 29, 2010 at 1:24 AM, Florian Schmitt <[hidden email]> wrote:
> Hi all,
>
> to make the TagSoup and JTidy configuration accessible without the need
> to recompile XFormsUtils.java, i'd like to propose the attached patch.
> It handles two issues as follows:
>
> - TagSoup: there's a new boolean config property named
> "oxf.xforms.tagsoup.ignoreBogonsFeature" with default value "true".
> Changing this to false should make TagSoup accept unknown (non-html)
> elements.
>
> - JTidy: a new set of tidy config options with priority over the
> hard-coded ones in XFormsUtils.java is defined using the new config
> property "oxf.xforms.tidy.propertiesFile" (anyURI). That URI by default
> points to oxf:/config/tidy.properties, making the complete Tidy config
> accessible using the Java properties syntax. The proposed
> tidy.properties defines the canvas tag as additional, valid element that
> would otherwise get stripped from the content parsed by JTidy.
>
> Maybe some additional hints would be useful which tidy properties are
> available (could be placed in tidy.properties or in the wiki).
>
> I would be glad to hear your opinions!
>
>
> florian
Erik,
> Just one concern: this means that the Tidy properties file will be
> loaded every time htmlStringToDocument() is called, right? And this
> can be pretty often as this is used by all controls that output HTML.
>
> Somehow, this should be cached. What do you think?

You're right, I didn't consider this. I'll try to find a solution. For development purposes it might be nice to keep that behaviour, so that modifications to the JTidy config take effect immediately. But in a production environment, loading the JTidy config every time surely isn't desirable.

florian
>> Just one concern: this means that the Tidy properties file will be
>> loaded every time htmlStringToDocument() is called, right? And this
>> can be pretty often as this is used by all controls that output HTML.
>>
>> Somehow, this should be cached. What do you think?
>
> you're right, i didn't consider this. I'll try to find a solution. Maybe
> it would be nice for development purposes to keep that behaviour, so
> modifications to the JTidy config could take effect immediately. But in
> a production environment, loading the JTidy config every time surely
> isn't desirable.

As an implementation note: ideally we would have a trivial API for this kind of caching against a single file. We do something similar to cache e.g. XML Schemas in the XForms engine. But looking at how we do that, it's not abstracted enough yet. Still, it shouldn't be too hard to do right ;)

-Erik
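For what it's worth, the "caching against a single file" idea could look roughly like the sketch below. It is not how Orbeon caches XML Schemas; it is a minimal illustration that reloads a value only when the file's last-modified timestamp changes, with a flag to skip the check entirely (for example in production). The class SingleFileCache, its constructor parameters and the forPropertiesFile helper are all hypothetical names, and the helper reads from a plain filesystem path, not from an oxf: URI.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical illustration only; no such class exists in Orbeon Forms.
public class SingleFileCache<T> {

    private final LongSupplier lastModified; // current timestamp of the underlying file
    private final Supplier<T> loader;        // parses the file into a value
    private final boolean checkForChanges;   // true: re-check on each get(); false: load once

    private T cachedValue;
    private long cachedStamp;
    private boolean loaded;

    public SingleFileCache(LongSupplier lastModified, Supplier<T> loader, boolean checkForChanges) {
        this.lastModified = lastModified;
        this.loader = loader;
        this.checkForChanges = checkForChanges;
    }

    // Returns the cached value, reloading only if the timestamp has changed.
    public synchronized T get() {
        if (!loaded) {
            cachedStamp = lastModified.getAsLong();
            cachedValue = loader.get();
            loaded = true;
        } else if (checkForChanges) {
            final long stamp = lastModified.getAsLong();
            if (stamp != cachedStamp) {
                cachedValue = loader.get();
                cachedStamp = stamp;
            }
        }
        return cachedValue;
    }

    // Convenience factory for a Java properties file on the local filesystem.
    public static SingleFileCache<Properties> forPropertiesFile(Path file, boolean checkForChanges) {
        return new SingleFileCache<Properties>(
            () -> {
                try {
                    return Files.getLastModifiedTime(file).toMillis();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            },
            () -> {
                final Properties props = new Properties();
                try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    props.load(in);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return props;
            },
            checkForChanges);
    }
}
```

A caller such as htmlStringToDocument() would then hold one static instance and call get() on each invocation. With checkForChanges set to true, edits to the file take effect on the next call at the cost of one timestamp lookup; with false, the file is read exactly once.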