RfE: Tagsoup configuration in properties-local.xml?

RfE: Tagsoup configuration in properties-local.xml?

fl.schmitt(ops-users)
Hi all,

Some months ago, several mails on this list dealt with HTML clean-up
using the TagSoup library (see [1]). Currently, TagSoup not only creates
valid XML, but is also told to strip unknown elements ("bogons"):
XFormsUtils.java sets ignoreBogonsFeature to true, which effectively
suppresses them [2].

This is fine from a security standpoint, but the same step (purging
unknown tags from the HTML) is done twice: once by TagSoup and again by
clean-html.xsl. In scenarios where certain custom, non-HTML elements
should be allowed, the developer has to change both the TagSoup setup
and clean-html.xsl. Changing the TagSoup behaviour requires modifying
either the XFormsUtils.java source or the TagSoup source.

As an enhancement, I would propose:
- either change the TagSoup call so that it only creates valid XML and
let clean-html.xsl alone decide which elements are valid (this means
that ignoreBogonsFeature should be set to false);
- or implement a way to configure TagSoup using properties in
properties-local.xml, so the user can decide whether TagSoup should
strip unknown tags.

Personally, I would prefer the first option, but the second one would
IMHO also be a step forward.
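
A minimal sketch of how the second option could resolve the flag,
assuming the property name used in the follow-up patch in this thread.
java.util.Properties stands in for the Orbeon property set here, and the
BogonSetting helper is hypothetical. An unset property keeps the current
behaviour (strip bogons).

```java
import java.util.Properties;

/** Hypothetical helper: resolves the ignore-bogons flag from configuration. */
final class BogonSetting {
    // Property name as used in the follow-up patch; the helper itself is illustrative.
    static final String KEY = "oxf.xforms.tagsoup.ignoreBogonsFeature";

    /** Returns the configured value, or true (today's hard-coded behaviour) if unset. */
    static boolean ignoreBogons(Properties config) {
        final String raw = config.getProperty(KEY);
        return raw == null || Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        final Properties config = new Properties();
        System.out.println(ignoreBogons(config)); // prints "true" (default)
        config.setProperty(KEY, "false");
        System.out.println(ignoreBogons(config)); // prints "false"
    }
}
```

The resolved boolean would then be passed to the reader's setFeature()
call in place of the literal true.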

It would be nice to hear some more opinions regarding this matter.

florian

[1]
http://wiki.orbeon.com/forms/doc/developer-guide/xforms-controls#TOC-HTML-cleanup
[2]
http://github.com/orbeon/orbeon-forms/blob/master/src/java/org/orbeon/oxf/xforms/XFormsUtils.java#L256


--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Patch regarding TagSoup and JTidy

fl.schmitt(ops-users)
Hi all,

To make the TagSoup and JTidy configuration accessible without the need
to recompile XFormsUtils.java, I'd like to propose the attached patch.
It handles two issues as follows:

- TagSoup: there's a new boolean config property named
"oxf.xforms.tagsoup.ignoreBogonsFeature" with default value "true".
Changing this to false should make TagSoup accept unknown (non-HTML)
elements.

- JTidy: a new set of Tidy config options, which takes priority over the
hard-coded ones in XFormsUtils.java, is defined using the new config
property "oxf.xforms.tidy.propertiesFile" (anyURI). That URI by default
points to oxf:/config/tidy.properties, making the complete Tidy config
accessible using the Java properties syntax. The proposed
tidy.properties defines the canvas tag as an additional, valid element
that would otherwise get stripped from the content parsed by JTidy.
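
One way to get "external file wins over hard-coded values" is to layer a
java.util.Properties object over a defaults object. The sketch below
only illustrates that layering; the class and method names are
hypothetical, the default keys mirror the setters currently in
XFormsUtils.java, and handing the result to JTidy is left out. With this
layering no separate fallback branch is needed, since a key missing from
the external text simply resolves to its default.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical illustration of external settings layered over defaults. */
final class TidyConfigLayers {

    /** Defaults corresponding to the hard-coded setters in XFormsUtils.java. */
    static Properties hardCodedDefaults() {
        final Properties defaults = new Properties();
        defaults.setProperty("show-warnings", "false");
        defaults.setProperty("quiet", "true");
        defaults.setProperty("input-encoding", "utf-8");
        return defaults;
    }

    /** Parses external settings (Java properties syntax) on top of the defaults. */
    static Properties withOverrides(String externalText) {
        final Properties merged = new Properties(hardCodedDefaults());
        try {
            merged.load(new StringReader(externalText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return merged;
    }

    public static void main(String[] args) {
        final Properties merged = withOverrides("new-inline-tags = canvas\nquiet = false\n");
        System.out.println(merged.getProperty("quiet"));           // overridden: false
        System.out.println(merged.getProperty("show-warnings"));   // default: false
        System.out.println(merged.getProperty("new-inline-tags")); // external only: canvas
    }
}
```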

It might also be useful to document which Tidy properties are available
(this could be placed in tidy.properties itself or in the wiki).

I would be glad to hear your opinions!


florian



Index: src/resources-packaged/config/properties-xforms.xml
===================================================================
--- src/resources-packaged/config/properties-xforms.xml (revision 205f9ed19328b47eddc290497b0252c5d20627b7)
+++ src/resources-packaged/config/properties-xforms.xml (revision )
@@ -109,6 +109,8 @@
     <property as="xs:boolean" name="oxf.xforms.datepicker.two-months"                       value="false"/>
     <property as="xs:string"  name="oxf.xforms.htmleditor"                                  value="yui"/>       <!-- fck | yui -->
     <property as="xs:boolean" name="oxf.xforms.show-error-dialog"                           value="true"/>
+    <property as="xs:boolean" name="oxf.xforms.tagsoup.ignoreBogonsFeature"                 value="true"/>
+    <property as="xs:anyURI"  name="oxf.xforms.tidy.propertiesFile"                         value="oxf:/config/tidy.properties"/>
 
     <property as="xs:integer" name="oxf.xforms.internal-short-delay"                        value="100"/>
     <property as="xs:integer" name="oxf.xforms.delay-before-incremental-request"            value="500"/>
Index: src/resources/config/tidy.properties
===================================================================
--- src/resources/config/tidy.properties (revision )
+++ src/resources/config/tidy.properties (revision )
@@ -0,0 +1,21 @@
+##
+# Copyright (C) 2010 Orbeon, Inc.
+#
+# This program is free software; you can redistribute it and/or modify it under the terms of the
+# GNU Lesser General Public License as published by the Free Software Foundation; either version
+# 2.1 of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
+# without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+# See the GNU Lesser General Public License for more details.
+#
+# The full text of the license is available at http://www.gnu.org/copyleft/lesser.html
+#
+show-warnings       = false
+quiet               = true
+new-empty-tags      = canvas
+new-inline-tags     = canvas
+input-encoding      = utf-8
+numeric-entities    = false
+output-xml          = false
+input-xml           = false
\ No newline at end of file
Index: src/java/org/orbeon/oxf/xforms/XFormsUtils.java
===================================================================
--- src/java/org/orbeon/oxf/xforms/XFormsUtils.java (revision 205f9ed19328b47eddc290497b0252c5d20627b7)
+++ src/java/org/orbeon/oxf/xforms/XFormsUtils.java (revision )
@@ -34,6 +34,7 @@
 import org.orbeon.oxf.xml.*;
 import org.orbeon.oxf.xml.XMLUtils;
 import org.orbeon.oxf.xml.dom4j.*;
+import org.orbeon.oxf.properties.Properties;
 import org.orbeon.saxon.Configuration;
 import org.orbeon.saxon.dom4j.NodeWrapper;
 import org.orbeon.saxon.functions.FunctionLibrary;
@@ -60,6 +61,9 @@
 
     private static final int SRC_CONTENT_BUFFER_SIZE = 1024;
 
+    private static final String XFORMS_TAGSOUP_IGNOREBOGONS = "oxf.xforms.tagsoup.ignoreBogonsFeature";
+    private static final String XFORMS_TIDY_CONFIG_URI = "oxf.xforms.tidy.propertiesFile";
+
     // Binary types supported for upload, images, etc.
     private static final Map<String, String> SUPPORTED_BINARY_TYPES = new HashMap<String, String>();
 
@@ -228,10 +232,28 @@
     public static org.w3c.dom.Document htmlStringToDocument(String value, LocationData locationData) {
         // Create and configure Tidy instance
         final Tidy tidy = new Tidy();
+        final java.util.Properties tidyProps = new java.util.Properties();
+        final String tidyPropsURI = Properties.instance().getPropertySet().getStringOrURIAsString(XFORMS_TIDY_CONFIG_URI);
+        boolean tidyConfAvailable = false;
+
+        // try to grab external tidy config
+        try {
+            final URL tidyPropsURL = URLFactory.createURL(tidyPropsURI);
+            tidyProps.load(tidyPropsURL.openStream());
+            tidy.setConfigurationFromProps(tidyProps);
+            tidyConfAvailable = true;
+        } catch (MalformedURLException e) {
+            throw new OXFException("Cannot create URL based on tidy config property: '" + tidyPropsURI + "'", e);
+        } catch (IOException e) {
+            throw new OXFException("Cannot load external tidy config file at: '" + tidyPropsURI + "'", e);
+        }
+        // Fallback if external config isn't available
+        if (!tidyConfAvailable) {
-        tidy.setShowWarnings(false);
-        tidy.setQuiet(true);
-        tidy.setInputEncoding("utf-8");
-        //tidy.setNumEntities(true); // CHECK: what does this do exactly?
+            tidy.setShowWarnings(false);
+            tidy.setQuiet(true);
+            tidy.setInputEncoding("utf-8");
+            //tidy.setNumEntities(true); // CHECK: what does this do exactly?
+        }
 
         // Parse and output to SAXResult
         final byte[] valueBytes;
@@ -252,8 +274,17 @@
         try {
             final XMLReader xmlReader = new org.ccil.cowan.tagsoup.Parser();
             final HTMLSchema theSchema = new HTMLSchema();
+
             xmlReader.setProperty(org.ccil.cowan.tagsoup.Parser.schemaProperty, theSchema);
+
+            // try to get the ignoreBogonsProperty from Properties, set true if not available
+            final Boolean ignoreBogonsProperty = Properties.instance().getPropertySet().getBoolean(XFORMS_TAGSOUP_IGNOREBOGONS);
+            if (ignoreBogonsProperty != null) {
+                xmlReader.setFeature(org.ccil.cowan.tagsoup.Parser.ignoreBogonsFeature, ignoreBogonsProperty.booleanValue());
+            } else {
-            xmlReader.setFeature(org.ccil.cowan.tagsoup.Parser.ignoreBogonsFeature, true);
+                xmlReader.setFeature(org.ccil.cowan.tagsoup.Parser.ignoreBogonsFeature, true);
+            }
+
             final TransformerHandler identity = TransformerUtils.getIdentityTransformerHandler();
             identity.setResult(result);
             xmlReader.setContentHandler(identity);




Re: Patch regarding TagSoup and JTidy

rdanner
I agree, we need to find a way to make these items configurable
- first at the installation level
- at some point at the individual component level

Also I've noticed we're missing a number of tags in the clean-html.xsl

 <xsl:template match="*:sub | *:sup | ...

Subscript and Superscript are supported by YUI but get gobbled up by clean-html.xsl

/r

Re: Patch regarding TagSoup and JTidy

rdanner
The same goes for blockquote: supported by YUI RTE toolbar but stripped by clean-html.xsl

add to fix:
    <xsl:template match="*:blockquote | *:sub | *:sup ...

Re: Re: Patch regarding TagSoup and JTidy

Erik Bruchez
Administrator
Thanks, I added these:

http://github.com/orbeon/orbeon-forms/commit/be545915170a27c104998075b68d630a073c97a2

-Erik

On Thu, Sep 30, 2010 at 7:58 AM, rdanner <[hidden email]> wrote:

>
> The same goes for blockquote: supported by YUI RTE toolbar but stripped by
> clean-html.xsl
>
> add to fix:
>    <xsl:template match="*:blockquote | *:sub | *:sup ...

Re: Patch regarding TagSoup and JTidy

Erik Bruchez
Administrator
In reply to this post by fl.schmitt(ops-users)
Florian,

Thanks for this patch.

Just one concern: this means that the Tidy properties file will be
loaded every time htmlStringToDocument() is called, right? And this
can be pretty often as this is used by all controls that output HTML.

Somehow, this should be cached. What do you think?

-Erik

On Wed, Sep 29, 2010 at 1:24 AM, Florian Schmitt
<[hidden email]> wrote:

> Hi all,
>
> to make the TagSoup and JTidy configuration accessible without the need
> to recompile XFormsUtils.java, i'd like to propose the attached patch.
> It handles two issues as follows:
>
> - TagSoup: there's a new boolean config property names
> "oxf.xforms.tagsoup.ignoreBogonsFeature" with default value "true".
> Changing this to false should make TagSoup accept unknown (non-html)
> elements.
>
> - JTidy: a new set of tidy config options with priority over the
> hard-coded ones in XFormsUtils.java is defined using the new config
> property "oxf.xforms.tidy.propertiesFile" (anyURI). That URI by default
> points to oxf:/config/tidy.properties, making the complete Tidy config
> accessible using the Java properties syntax. The proposed
> tidy.properties defines the canvas tag as additional, valid element that
> would otherwise get stripped from the content parsed by JTidy.
>
> Maybe some additional hints would be useful which tidy properties are
> available (could be placed in tidy.properties or in the wiki).
>
> I would be glad to hear your opinions!
>
>
> florian

Re: Re: Patch regarding TagSoup and JTidy

fl.schmitt(ops-users)
Erik,

> Just one concern: this means that the Tidy properties file will be
> loaded every time htmlStringToDocument() is called, right? And this
> can be pretty often as this is used by all controls that output HTML.
>
> Somehow, this should be cached. What do you think?

You're right, I didn't consider this. I'll try to find a solution. Maybe
it would be nice for development purposes to keep that behaviour, so
modifications to the JTidy config could take effect immediately. But in
a production environment, loading the JTidy config every time surely
isn't desirable.

florian



Re: Re: Re: Patch regarding TagSoup and JTidy

Erik Bruchez
Administrator
>> Just one concern: this means that the Tidy properties file will be
>> loaded every time htmlStringToDocument() is called, right? And this
>> can be pretty often as this is used by all controls that output HTML.
>>
>> Somehow, this should be cached. What do you think?
>
> you're right, i didn't consider this. I'll try to find a solution. Maybe
> it would be nice for development purposes to keep that behaviour, so
> modifications to the JTidy config could take effect immediately. But in
> a production environment, loading the JTidy config every time surely
> isn't desirable.

As an implementation note: ideally we would have a trivial API for
this kind of caching against a single file. We do something similar to
cache e.g. XML Schemas in the XForms engine. But looking at how we do
that, it's not abstracted enough yet. Still, it shouldn't be too hard
to do right ;)
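
A rough sketch of what such a single-file cache might look like, not
how the XForms engine caches schemas. It assumes the config is a plain
file on disk (the patch uses an oxf: URI instead), and the class name,
the checkForChanges flag and loadCount() are all hypothetical. With
checkForChanges set to true the file's timestamp is compared on each
call, which suits development; with false the file is read once, which
suits production.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Properties;

/** Hypothetical single-file cache: reloads only when the timestamp changes. */
final class CachedPropertiesFile {
    private final Path file;
    private final boolean checkForChanges; // true: development, false: production
    private Properties cached;             // null until the first load
    private FileTime loadedStamp;          // timestamp seen at the last load
    private int loadCount;                 // number of actual file reads

    CachedPropertiesFile(Path file, boolean checkForChanges) {
        this.file = file;
        this.checkForChanges = checkForChanges;
    }

    /** Returns the cached properties, reading the file only when necessary. */
    synchronized Properties get() {
        try {
            if (cached == null
                    || (checkForChanges && !Files.getLastModifiedTime(file).equals(loadedStamp))) {
                reload();
            }
            return cached;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void reload() throws IOException {
        final FileTime stamp = Files.getLastModifiedTime(file);
        final Properties fresh = new Properties();
        try (InputStream in = Files.newInputStream(file)) { // stream is closed after loading
            fresh.load(in);
        }
        cached = fresh;
        loadedStamp = stamp;
        loadCount++;
    }

    synchronized int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) throws IOException {
        final Path file = Files.createTempFile("tidy", ".properties");
        Files.write(file, "quiet = true\n".getBytes(StandardCharsets.UTF_8));
        final CachedPropertiesFile cache = new CachedPropertiesFile(file, true);
        cache.get();
        cache.get(); // second call is served from memory
        System.out.println("loads: " + cache.loadCount());
        Files.delete(file);
    }
}
```

htmlStringToDocument() would then call get() on a shared instance
instead of opening the stream on every invocation.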

-Erik

