Problems with eXist-searching and embedded html!

Marcus-2

Hi,
OK, I just had to write some queries for my app and asked for help on the eXist mailing list, and here is the problem. I hope some of you have ideas on how to resolve it!
 
As some of you may remember, I use embedded HTML for some of the content saved in my documents. For that I also had to do some work with the FCKeditor, and this is where the big problem lies. I was led to this point by some answers I got about the eXist index structure.
 
OK, let's say I have a document with the following content, added by the FCKeditor and saved to eXist:
 
<KatEintrag>
        <KatID>10</KatID>
        <Inhalt>
&lt;p&gt;&lt;font size="2"&gt;uͤ&amp;nbsp; &amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&lt;/font&gt;ͤͤ&lt;/p&gt; uͤ &amp;amp;nbsp;&lt;br /&gt;&lt;img src="/kkbib/UserFiles/Image/wink_smile.gif" style="width: 25px; height: 25px;" alt="" /&gt;
        </Inhalt>
        <Referenz>Test</Referenz>
        <Apparat>
&lt;font size="5"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp;&lt;/font&gt;&lt;font size="5"&gt;&amp;#x0075;&amp;#x0364;&lt;/font&gt;&amp;#x0364&amp;#x0075
        </Apparat>
 </KatEintrag>
The problem is that eXist indexes every single term, even the markup (element and attribute names) belonging to the embedded HTML!
So in this case "p, font, size, img, src, style, width, height, alt" would be indexed as well. If someone now searches eXist for one of those terms, every document containing such embedded HTML markup is a match and is marked as one. If someone like me then wants to transform the results coming from eXist to highlight the matches for the user, something like the following happens (a real search result!). I just searched for "size", and even though it should return no results, it comes back with the results shown below.
 
I entered the data in a textarea with @mediatype="text/html" via the FCKeditor and display it in an xforms:output with @mediatype="text/html". Before displaying the data I run an XSLT to highlight the <exist:match> elements.
 
This is the XML coming back from eXist:
---------------------------------------------------------
<KatEintrag><KatID>10</KatID><Inhalt>&lt;p&gt;&lt;font <exist:match xmlns:exist="http://exist.sourceforge.net/NS/exist">size</exist:match>="2"&gt;uͤ&amp;nbsp; &amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp;&lt;/font&gt;&lt;font <exist:match xmlns:exist="http://exist.sourceforge.net/NS/exist">size</exist:match>="2"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&lt;/font&gt;ͤͤ&lt;/p&gt; uͤ &amp;amp;nbsp;&lt;br /&gt;&lt;img src="/kkbib/UserFiles/Image/wink_smile.gif" style="width: 25px; height: 25px;" alt="" /&gt;</Inhalt><Referenz>Test</Referenz><Apparat>&lt;font <exist:match xmlns:exist="http://exist.sourceforge.net/NS/exist">size</exist:match>="5"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp;
&lt;/font&gt;&lt;font <exist:match xmlns:exist="http://exist.sourceforge.net/NS/exist">size</exist:match>="5"&gt;&amp;#x0075;&amp;#x0364;&lt;/font&gt;
&amp;#x0364&amp;#x0075</Apparat></KatEintrag>
 
 
After the XSLT, this is what shows up in the browser's source code:
---------------------------------------------------------------------------------------------
<td><span class="paratitle"> <div id="xforms-element-175" class="xforms-control xforms-output xforms-mediatype-text-html xforms-mediatype-text"> <p><font>span class="xml-match" style="background:#dddddd; color:red;"&gt; size</font> ="2"&gt;uͤ&nbsp; &amp;#x0075;&amp;#x0364;&amp;nbsp;<font>span class="xml-match" style="background:#dddddd; color:red;"&gt; size</font> ="2"&gt;&amp;#x0075;&amp;#x0364;ͤͤ </p>uͤ &amp;nbsp;<br><img alt="" style="width: 25px; height: 25px;" src="/kkbib/UserFiles/Image/wink_smile.gif"></div><label class=" xforms-alert-inactive xforms-alert" for="xforms-element-175"></label></span></td>
 
And this is what the user sees:
----------------------------------------------------------------------

span class="xml-match" style="background:#dddddd; color:red;"> size ="2">uͤ  &#x0075;&#x0364;&nbsp;span class="xml-match" style="background:#dddddd; color:red;"> size ="2">&#x0075;&#x0364;ͤͤ

uͤ &nbsp;
 
 
So, I hope that makes the problem clear!?
The main point is that those documents are only returned as results because eXist indexed the embedded HTML markup as single terms. It therefore finds the matches and voilà, there is the problem! Any suggestions???
 
Thanks, Marcus


--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws

Re: Problems with eXist-searching and embedded html!

Alessandro Vernet
Administrator
Hi Marcus,

On 6/18/07, Marcus <[hidden email]> wrote:
> So, I hope that makes the problem clear!?
> The main point is that those documents are only returned as results
> because eXist indexed the embedded HTML markup as single terms. It
> therefore finds the matches and voilà, there is the problem! Any suggestions???

This is a tricky one. Maybe one way to handle this is to:

1) Use JTidy or TagSoup to parse the HTML into XHTML.
2) Extract only the text in that XHTML (string(/*)).
3) Store this in another element of the instance, which you would
index with eXist.

The most challenging part is #1. Most likely you would need to write
some Java code for this and call the appropriate library. Do you have
another approach in mind?
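
Steps 2 and 3 could be sketched roughly as follows in XSLT, assuming step 1 has already produced real XHTML child elements inside Inhalt. The InhaltText element name is made up for this illustration and is not part of the original schema; the idea would be to index only that element in eXist.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity template: copy everything that is not handled below -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Assumption: Inhalt already contains parsed XHTML elements (step 1) -->
    <xsl:template match="Inhalt">
        <!-- Keep the original element unchanged -->
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
        <!-- Hypothetical sibling element that holds only the text content,
             so element and attribute names never reach the index -->
        <InhaltText>
            <xsl:value-of select="string(.)"/>
        </InhaltText>
    </xsl:template>

</xsl:stylesheet>
```

Because string(.) concatenates only the descendant text nodes, names such as "font" or "size" would not appear in InhaltText.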

Alex
--
Orbeon Forms - Web 2.0 Forms, open-source, for the Enterprise
http://www.orbeon.com/




Re: Problems with eXist-searching and embedded html!

fl.schmitt(ops-users)
In reply to this post by Marcus-2
Hi Marcus,

I'm not sure whether I understand the problem correctly, but what about pre- and postprocessing the HTML content, from HTML/XML to "escaped HTML" and back again? That has the advantage that eXist "sees" the HTML only *as HTML* and will be able to handle the tags/elements as such. Of course you will have to configure the indexing inside eXist: see http://www.exist-db.org/indexing.html#N101C4 , especially the "mixed content" example.

To preprocess the HTML to text, you could implement an XSLT containing something like
<xsl:template match="Inhalt">
    <xsl:copy>
        <xsl:value-of select="saxon:serialize(.,'html')" />
    </xsl:copy>
</xsl:template>

For the postprocessing, I use something like

<xsl:template match="Inhalt">
    <xsl:value-of select="saxon:parse('&lt;Inhalt&gt;',.,'&lt;/Inhalt&gt;')" />
</xsl:template>


HTH
florian




Re: Problems with eXist-searching and embedded html!

Marcus-2
In reply to this post by Alessandro Vernet
Hi there,

After reading some replies on this list and the eXist list, and after
talking to Ryan as well, I think I have to find a way to store XHTML content
in eXist and index it there as "mixed content".
I'm afraid none of the other solutions will work for this project, as one of
the important points was and still is the possibility of using embedded HTML
to add special formatting to the content for later viewing! This is one of
those projects where people need and want functionality but have NO IDEA
whether something is possible, or with what effort it could be achieved at
all :-(( Of course these people have no deeper knowledge of database systems
or web applications, but they don't care either; they just want a working
solution, no matter how difficult it is to achieve. That's life!!!

OK, back to the last postings :-)
Ryan mentioned the same thing as Florian. BTW, thanks for the code snippets,
as I often have no clue how to work with some functionality until I have
seen some examples. Not the best behaviour for a computer scientist, but I
have to live with that :-) What I don't understand is how the second
solution works:

For the postprocessing, I use something like
<xsl:template match="Inhalt">
    <xsl:value-of select="saxon:parse('&lt;Inhalt&gt;',.,'&lt;/Inhalt&gt;')"
/>
</xsl:template>

Will this also transform all the inner HTML of Inhalt back to the lexical
version??? Since I don't know which HTML tags will be used inside the
<Inhalt> tag, I can't write a template for every possible tag :-(

But as Alex mentioned in his last post, the important question is whether I
will get XHTML from the FCKeditor or whether I have to tidy it myself. I
think, as always, the answer is the latter, right? Has anyone ever used
JTidy or HTMLTidy with OPS and can give me some examples or offer some
working code here??? My knowledge of Java coding is not the best. OK, I can
understand some code and make my own changes (I just got the eXist
xmldbRealm working and rewrote it), but implementing a completely new piece
of functionality would be too difficult in the short time I have left :-((
So any help with this is very much appreciated!!!

Regards, Marcus

PS: @Alex: #2 and #3 won't work here, since I would have to change all the
schemas, models and views, and this would be the dirtier hack, which my
Prof won't approve of, I'm sure :-(



Re: Problems with eXist-searching and embedded html!

fl.schmitt(ops-users)
Hi Marcus,

>> For the postprocessing, I use something like
>> <xsl:template match="Inhalt">
>>    <xsl:value-of
>> select="saxon:parse('&lt;Inhalt&gt;',.,'&lt;/Inhalt&gt;')" />
>> </xsl:template>

Hmm, I suppose I forgot a concat(). Calling saxon:parse with more than
one argument will probably fail:
http://www.saxonica.com/documentation/extensions/functions/parse.html
So it should be
select="saxon:parse(concat('&lt;Inhalt&gt;',.,'&lt;/Inhalt&gt;'))"

The only reason not to copy the Inhalt element but to construct it "by
hand" is that Inhalt may have more than one child element, so I try to
ensure that there's only one document element, namely Inhalt.
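
Put together, the corrected post-processing step might look like the sketch below. Two things here are assumptions and not taken from the snippet above: the surrounding identity template, and the use of xsl:copy-of in place of xsl:value-of. The latter is chosen because xsl:value-of would output only the string value of the parsed document, while xsl:copy-of keeps the parsed elements.

```xml
<xsl:stylesheet version="2.0"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity template for everything outside Inhalt -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Wrap the escaped markup in a single Inhalt root and parse it.
         saxon:parse takes one string and returns a document node;
         copying that document node copies the Inhalt element it contains. -->
    <xsl:template match="Inhalt">
        <xsl:copy-of
            select="saxon:parse(concat('&lt;Inhalt&gt;', ., '&lt;/Inhalt&gt;'))"/>
    </xsl:template>

</xsl:stylesheet>
```

The string handed to saxon:parse has to be well-formed XML, so content containing an undeclared entity such as &nbsp; would probably need to be cleaned up first.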

> Will this also transform all the inner HTML of Inhalt back to the lexical
> version??? Since I don't know which HTML tags will be used inside the
> <Inhalt> tag, I can't write a template for every possible tag :-(

Yes, I think so :-) I use it for FCKeditor with some custom tags, and
everything inside Inhalt gets transformed.

> But as Alex mentioned in his last post, the important question is whether
> I will get XHTML from the FCKeditor or whether I have to tidy it myself.
> I think, as always, the answer is the latter, right?

Again I'm not sure, but the FCKeditor homepage says that the editor
produces XHTML 1.0 output: http://www.fckeditor.net/ , under "features". As
I said, I have a similar scenario, and the parse and serialize
transformations were the only ones I had to apply before or after
talking to the eXist db.


Greetings,
florian





Re: Problems with eXist-searching and embedded html!

Marcus-2
In reply to this post by Alessandro Vernet
Hi Alex,
some research turned up the information that all code produced by
FCKeditor should conform to XHTML 1.0!
And Florian said that he did it the same way without any difficulties.

Regards, Marcus



Re: Problems with eXist-searching and embedded html!

Marcus-2
In reply to this post by fl.schmitt(ops-users)
Hi Florian,
I tried the following XPL, but I still have 1 error left :-(
If I comment out the important template match, no errors occur :-(
Perhaps someone could tell me why?
BTW: What are the differences between the various XSLT processors? xslt,
xslt-2.0, unsafe-xslt???


<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors"
          xmlns:saxon="http://saxon.sf.net/"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <p:param name="instance" type="input"/>
    <p:param name="data" type="output"/>

    <p:processor name="oxf:unsafe-xslt">
        <p:input name="data" href="#instance"/>
        <p:input name="config">
            <xsl:stylesheet version="2.0"
                            xmlns="http://www.w3.org/1999/xhtml"
                            xmlns:saxon="http://saxon.sf.net/"
                            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

                <xsl:template match="@* | node()">
                    <xsl:copy>
                        <xsl:apply-templates select="@* | node()"/>
                    </xsl:copy>
                </xsl:template>

                <xsl:template match="Inhalt|Apparat|AltBez|Provenienz|Besitz|Schreiber|Incipit|Explicit">
                    <xsl:copy>
                        <xsl:value-of select="saxon:serialize(.,'html')"/>
                    </xsl:copy>
                </xsl:template>

            </xsl:stylesheet>
        </p:input>
        <p:output name="data" id="serialized"/>
    </p:processor>

    <p:processor name="oxf:identity">
        <p:input name="data" href="#serialized" debug="SERIALIZE"/>
        <p:output name="data" ref="data" debug="SERIALIZE"/>
    </p:processor>

</p:config>


Hope that someone sees the error :-(
Thanks, Marcus




Re: Problems with eXist-searching and embedded html!

fl.schmitt(ops-users)
Hi Marcus,

> I tried the following XPL, but I still have 1 error left :-(
> If I comment out the important template match, no errors occur :-(
> Perhaps someone could tell me why?

I suppose you're referring to this template?

 >    <xsl:template
 >
match="Inhalt|Apparat|AltBez|Provenienz|Besitz|Schreiber|Incipit|Explicit">
 >     <xsl:copy>
 >      <xsl:value-of select="saxon:serialize(.,'html')"/>
 >     </xsl:copy>
 >    </xsl:template>

I think you have to declare the HTML output format in an xsl:output element
(again a point I missed in my description, sorry!). In my XPL, I use an XML
output as follows:

<xsl:stylesheet version="2.0"
     xmlns:saxon="http://saxon.sf.net/"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

     <xsl:output name="xml" method="xml" omit-xml-declaration="yes"
encoding="UTF-8"/>
     (...)
     <!-- templates -->
</xsl:stylesheet>
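
To make the link between the two pieces explicit, here is a minimal sketch of a complete stylesheet. It assumes the Saxon behaviour that the second argument of saxon:serialize is the name of an xsl:output declaration in the same stylesheet; the template bodies are illustrative and not the elided ones from the snippet above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Named output declaration; saxon:serialize refers to it by this name -->
    <xsl:output name="xml" method="xml" omit-xml-declaration="yes"
                encoding="UTF-8"/>

    <!-- Identity template for everything outside Inhalt -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Inhalt">
        <xsl:copy>
            <!-- 'xml' must match the name attribute of xsl:output above.
                 The resulting string includes the Inhalt start and end tags. -->
            <xsl:value-of select="saxon:serialize(., 'xml')"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Calling saxon:serialize with a format name that has no matching xsl:output declaration is what would be expected to produce a "Requested output format ... has not been defined" error at compile time.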

Documentation for the saxon:serialize function:
http://www.saxonica.com/documentation/extensions/functions/serialize.html

... and for xsl:output:

http://xml.cnec.org/xsl/elements/output.html
http://www.w3schools.com/xsl/el_output.asp

> BTW: What are the differences between the various XSLT processors? xslt,
> xslt-2.0, unsafe-xslt???

http://www.orbeon.com/ops/doc/processors-xslt#d18e194  :-)


HTH
florian






Re: Problems with eXist-searching and embedded html!

Marcus-2
In reply to this post by Marcus-2
Hi,
it's me, once again:

The error seems to be in the Saxon function :-(   saxon:serialize(.,'xhtml')
A more detailed error message is:

 oxf:/apps/common/serialize_SIG.xpl, line -1, column -1, description null:
Failed to compile stylesheet. 1 error detected. Error at , line 15 of
oxf:/apps/common/serialize_SIG.xpl: Requested output format xhtml has not
been defined Failed to compile stylesheet. 1 error detected.

But I need help with that, since I don't know what to do about it :-(
Thanks, Marcus




Re: Problems with eXist-searching and embedded html!

Marcus-2
Hi,
OK, this is one of the trickiest things I have ever tried to find a
solution for :-(
I'm thankful for every hint and I'm always willing to learn, but after
trying for 2 days, I need help with this!

OK, here is what I have found out so far and what works now, after spending
a lot of time on little things that could be so easy but that I just didn't
get. Sometimes I think I'm really stupid, but OK.

My submissions:
-----------------
    <xforms:submission id="serialize-submission"
            ref="xxforms:instance('document-instance')"
            action="/admin/service/serialize/{xxforms:instance('parameters-instance')/form-id}"
            method="post" replace="instance" f:url-type="resource"
            xxforms:instance="document-instance"/>
    <xforms:submission id="parse-submission"
            ref="xxforms:instance('document-instance')"
            action="/admin/service/parse/{xxforms:instance('parameters-instance')/form-id}"
            method="post" replace="instance" f:url-type="resource"
            xxforms:instance="document-instance"/>

The first hurdle was finding out that I need to do this with method="post",
since with get I always had a NULL input in the pipeline :-( But OK, now I
can access the instance in my XPL :-)
I hope the rest of it is OK!?

My Pageflow:
--------------
    <page id="serialize-to-xhtml"
            path-info="/*/service/serialize/([^/]+)"
            matcher="oxf:perl5-matcher"
            view="oxf:/apps/common/serialize_${1}.xpl"/>
    <page id="parse-to-lexical"
            path-info="/*/service/parse/([^/]+)"
            matcher="oxf:perl5-matcher"
            view="oxf:/apps/common/parse_${1}.xpl"/>

OK, here I first tried to use the XPL with @model, until I noticed that my
results were never put into my real document-instance but got lost
somewhere else. So NOW I'm glad to see the result in the Instance
Inspector, as my instance is replaced :-) I hope the rest is still OK
and without any errors!?

My original instance: (after working with the FCKeditor)
--------------------
<KatEintrag>
    <KatID>10</KatID>
    <Inhalt>&lt;p&gt;&lt;font size="2"&gt;uͤ&amp;nbsp;
&amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp;&lt;/font&gt;&lt;font
size="2"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&lt;/font&gt;ͤͤ&lt;/p&gt; uͤ
&amp;amp;nbsp;&lt;br /&gt;&lt;img
src="/kkbib/UserFiles/Image/wink_smile.gif" style="width: 25px; height:
25px;" alt="" /&gt;</Inhalt>
    <Referenz>Test</Referenz>
     <Apparat>&lt;font
size="5"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp;&lt;/font&gt;&lt;font
size="5"&gt;&amp;#x0075;&amp;#x0364;&lt;/font&gt;&amp;#x0364&amp;#x0075</Apparat>
</KatEintrag>

What I need for saving to eXist: (just like the formatted output of the
Instance Inspector)
--------------------------------
<KatEintrag>
    <KatID>10</KatID>
    <Inhalt>
        <p><font size="2">uͤ&nbsp;
&amp;#x0075;&amp;#x0364;&amp;nbsp;</font><font
size="2">&amp;#x0075;&amp;#x0364;</font>ͤͤ</p> uͤ &amp;nbsp;<br /><img
src="/kkbib/UserFiles/Image/wink_smile.gif" style="width: 25px; height:
25px;" alt="" />
    </Inhalt>
    <Referenz>Test</Referenz>
    <Apparat>
        <font size="5">&amp;#x0075;&amp;#x0364;&amp;nbsp;</font><font
size="5">&#x0075;&#x0364;</font> &#x0364&#x0075
    </Apparat>
</KatEintrag>

But every attempt produces results other than the expected ones :-(
Here is my XPL:
-----------------
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors"
          xmlns:saxon="http://saxon.sf.net/"
          xmlns:xxforms="http://orbeon.org/oxf/xml/xforms"
          xmlns="http://www.w3.org/1999/xhtml"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <p:param name="instance" type="input"/>
    <p:param name="data" type="output"/>

    <p:processor name="oxf:xslt">
        <p:input name="data" href="#instance" debug="INSTANCE"/>
        <p:input name="config" debug="CONFIG">

            <xsl:stylesheet version="2.0"
                            xmlns="http://www.w3.org/1999/xhtml"
                            xmlns:saxon="http://saxon.sf.net/"
                            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

                <xsl:output method="html" omit-xml-declaration="yes" name="html"/>

                <xsl:template match="@* | node()">
                    <xsl:copy>
                        <xsl:apply-templates select="@* | node()"/>
                    </xsl:copy>
                </xsl:template>

                <xsl:template match="Inhalt|Apparat|AltBez|Provenienz|Besitz|Schreiber|Incipit|Explicit">
                    <xsl:copy>
                        <xsl:value-of select="saxon:serialize(.,'html')"/>
                    </xsl:copy>
                </xsl:template>

            </xsl:stylesheet>
        </p:input>
        <p:output name="data" ref="data" debug="SERIALIZED"/>
    </p:processor>
</p:config>

With:  <xsl:output method="html" omit-xml-declaration="yes" name="html"/>
and <xsl:value-of select="saxon:serialize(.,'html')"/> I get:
-----------------------
<KatEintrag>
    <KatID>10</KatID>
    <Inhalt><Inhalt>&lt;p&gt;&lt;font size="2"&gt;uͤ&amp;nbsp;
&amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp;&lt;/font&gt;&lt;font
size="2"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&lt;/font&gt;ͤͤ&lt;/p&gt; uͤ
&amp;amp;nbsp;&lt;br /&gt;&lt;img
src="/kkbib/UserFiles/Image/wink_smile.gif" style="width: 25px; height:
25px;" alt="" /&gt; </Inhalt></Inhalt>
    <Referenz>Test</Referenz>
    <Apparat><Apparat>&lt;font
size="5"&gt;&amp;amp;#x0075;&amp;amp;#x0364;&amp;amp;nbsp;&lt;/font&gt;&lt;font
size="5"&gt;&amp;#x0075;&amp;#x0364;&lt;/font&gt; &amp;#x0364&amp;#x0075
</Apparat></Apparat>
</KatEintrag>

With:  <xsl:output method="xml" omit-xml-declaration="yes" name="xml"/> and
<xsl:value-of select="saxon:serialize(.,'xml')"/> I get:
----------------------
--> the same!

With:  <xsl:output method="text" omit-xml-declaration="yes" name="text"/>
and <xsl:value-of select="saxon:serialize(.,'text')"/> I get:
----------------------
It seems that nothing happens, but I still can't save the doc to eXist...
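
A possible reading of these three results: saxon:serialize turns nodes into a string, but here Inhalt and Apparat already contain escaped markup as plain text, so serializing them only adds another copy of the element's own tags. Going from escaped text to real elements appears to be the saxon:parse direction. The pipeline below is a sketch of that direction under several assumptions: oxf:unsafe-xslt is used as in the earlier XPL, only Inhalt and Apparat are matched, and &nbsp; is replaced by a numeric character reference because plain XML does not define that entity.

```xml
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">

    <p:param name="instance" type="input"/>
    <p:param name="data" type="output"/>

    <p:processor name="oxf:unsafe-xslt">
        <p:input name="data" href="#instance"/>
        <p:input name="config">
            <xsl:stylesheet version="2.0"
                            xmlns:saxon="http://saxon.sf.net/"
                            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

                <!-- Identity template for everything else -->
                <xsl:template match="@* | node()">
                    <xsl:copy>
                        <xsl:apply-templates select="@* | node()"/>
                    </xsl:copy>
                </xsl:template>

                <xsl:template match="Inhalt | Apparat">
                    <!-- Replace the undeclared &nbsp; entity with &#160;
                         so that the XML parser accepts the string -->
                    <xsl:variable name="text"
                        select="replace(., '&amp;nbsp;', '&amp;#160;')"/>
                    <!-- Re-wrap the text in an element of the same name and parse it -->
                    <xsl:copy-of select="saxon:parse(concat(
                        '&lt;', name(), '&gt;', $text, '&lt;/', name(), '&gt;'))"/>
                </xsl:template>

            </xsl:stylesheet>
        </p:input>
        <p:output name="data" ref="data"/>
    </p:processor>

</p:config>
```

Input that is not well-formed, such as a character reference without its closing semicolon, would still make the parse fail and would have to be repaired beforehand.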

So I have no further ideas, after trying every solution and combination I
could think of for 2 days :-(
I would be very glad if someone could help me, perhaps by sending me the
code of the 2 XPLs that I need :-( (After resolving this problem I also
have to find the right submission, page flow and XPL for parsing the content
back from eXist!)
As far as I understood Florian and the hints, it should have worked this
way, but it still doesn't. I'm sure, and I hope, that just a little trick
will produce the desired results, but I need help with that!

I'm afraid I won't be able to reply to any mails until Sunday night, but
I hope to find some help when I return.
I wish you a nice weekend, and thanks for all your help :-)
Best regards, Marcus



