Orbeon Forms community mailing list

Orbeon be configured to query lucene?

Richard Braman

Orbeon be configured to query lucene?

What would the architecture look like?

My guess is that a servlet would query the Lucene index and return the results as XML, for OPS to present.

Richard Braman
[hidden email]
561.748.4002 (voice)

http://www.taxcodesoftware.org
Free Open Source Tax Software

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws

Alessandro Vernet

Re: Orbeon be configured to query lucene?

On 4/10/06, Richard Braman <[hidden email]> wrote:
> What would the architecture look like?
> My guess is that a servlet would query the lucene index and display the
> results as xml, for OPS to present.

Hi Richard,

We haven't done much with Lucene here, but I think Eric van der Vlist
did, and maybe even wrote a processor for Lucene. Maybe Eric will
comment directly on this.

Alex
--
Blog (XML, Web apps, Open Source):
http://www.orbeon.com/blog/

--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet

Eric van der Vlist

Re: Orbeon be configured to query lucene?

Hi,

On Thursday, 13 April 2006 at 15:45 -0700, Alessandro Vernet wrote:

> On 4/10/06, Richard Braman <[hidden email]> wrote:
> > What would the architecture look like?
> > My guess is that a servlet would query the lucene index and display the
> > results as xml, for OPS to present.
>
> Hi Richard,
>
> We haven't done much with Lucene here, but I think Eric van der Vlist
> did, and maybe even wrote a processor for Lucene. Maybe Eric will
> comment directly on this.

I have done two different implementations integrating Lucene with OPS.

The first one is for XMLfr and you can play with it at
http://beta.xmlfr.org/orbeon/lucene/cherche .

This one uses an OPS processor to query Lucene indexes. The input of
this query processor contains the parameters of the query and its output
is an RSS 1.0 document (see
http://beta.xmlfr.org/orbeon/lucene/rss?query=orbeon+presentationserver
for an example of such an output).
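
To make the shape of such a processor concrete, here is a minimal sketch
in Python (not the actual XMLfr processor, whose code is not shown in
this thread). It assumes the Lucene search has already been run and has
produced a list of hits; the function name `hits_to_rss` and the hit
fields `title`, `link` and `description` are hypothetical. Only the two
namespace URIs come from the RDF and RSS 1.0 specifications.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"


def hits_to_rss(query, search_url, hits):
    """Serialize search hits as an RSS 1.0 (RDF) document.

    `hits` is a list of dicts with 'title', 'link' and 'description'
    keys (an assumed shape, chosen for this illustration).
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rss", RSS_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")

    # The channel describes the result list as a whole.
    channel = ET.SubElement(
        root, f"{{{RSS_NS}}}channel", {f"{{{RDF_NS}}}about": search_url}
    )
    ET.SubElement(channel, f"{{{RSS_NS}}}title").text = f"Search results for: {query}"
    ET.SubElement(channel, f"{{{RSS_NS}}}link").text = search_url
    ET.SubElement(channel, f"{{{RSS_NS}}}description").text = f"{len(hits)} hit(s)"

    # RSS 1.0 lists the items twice: as references inside the channel,
    # then as full <item> elements that are siblings of the channel.
    items = ET.SubElement(channel, f"{{{RSS_NS}}}items")
    seq = ET.SubElement(items, f"{{{RDF_NS}}}Seq")
    for hit in hits:
        ET.SubElement(seq, f"{{{RDF_NS}}}li", {f"{{{RDF_NS}}}resource": hit["link"]})
        item = ET.SubElement(
            root, f"{{{RSS_NS}}}item", {f"{{{RDF_NS}}}about": hit["link"]}
        )
        ET.SubElement(item, f"{{{RSS_NS}}}title").text = hit["title"]
        ET.SubElement(item, f"{{{RSS_NS}}}link").text = hit["link"]
        ET.SubElement(item, f"{{{RSS_NS}}}description").text = hit["description"]

    return ET.tostring(root, encoding="unicode")
```

The resulting XML string is what a downstream pipeline step would
receive and style for presentation.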

The indexing is done outside of OPS through a crontab process running a
Java program.

This implementation could be used as a basis for other applications, but
some features are currently rather specific to XMLfr. For instance, you
can select between two sets of indexes, one of which computes relevance
taking the date and the document type into account (an article comes
before a wire item, which comes before a mailing list message, and
newer items come before older ones).
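
One way to picture that kind of ranking is the sketch below. It is an
illustration only, not the XMLfr scoring: the type weights, the
one-year half-life and the hit fields `score`, `type` and `published`
are all assumptions made for the example.

```python
from datetime import date

# Assumed weights: articles outrank wire items, which outrank list messages.
TYPE_WEIGHT = {"article": 3.0, "wire": 2.0, "mail": 1.0}


def adjusted_score(base_score, doc_type, published, today, half_life_days=365.0):
    """Scale a raw relevance score by document type and recency."""
    age_days = max((today - published).days, 0)
    # Exponential decay: the recency factor halves every `half_life_days`.
    recency = 0.5 ** (age_days / half_life_days)
    return base_score * TYPE_WEIGHT.get(doc_type, 1.0) * recency


def rank(hits, today):
    """Return hits sorted from most to least pertinent."""
    return sorted(
        hits,
        key=lambda h: adjusted_score(h["score"], h["type"], h["published"], today),
        reverse=True,
    )
```

With equal raw scores and dates, this orders an article before a wire
item before a mailing list message; with equal scores and types, the
newer document comes first.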

The second implementation is still a work in progress, although I have a
proof of concept that works quite well.

The idea is to manage everything within OPS.

The query processor is derived from the one developed for XMLfr and
adapted to be less advanced but more generic.

The indexing is done as a background task within OPS through the
scheduler.

The indexing uses a modified MIME type XML database that associates OPS
pipelines with media types, so the algorithms to index different file
types are defined as OPS pipelines.
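
The lookup this implies can be sketched as follows. This is a rough
Python illustration under stated assumptions: the real database is an
XML file whose format is not given here, and the pipeline file names
(`index-pdf.xpl` and so on) and the function `pipeline_for` are
hypothetical.

```python
import mimetypes

# Assumed registry: media type -> indexing pipeline (file names are made up).
PIPELINES = {
    "application/pdf": "index-pdf.xpl",
    "text/html": "index-html.xpl",
    "application/xml": "index-xml.xpl",
    "text/xml": "index-xml.xpl",
    "application/msword": "index-word.xpl",
}


def pipeline_for(path):
    """Return the indexing pipeline for a file, or None if it has no indexer."""
    # guess_type maps the file extension to a media type (or None if unknown).
    media_type, _encoding = mimetypes.guess_type(path)
    return PIPELINES.get(media_type)
```

Supporting a new file type then means adding one entry to the registry
and writing the matching pipeline, without touching the indexer loop.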

I have started developing processors (called from these pipelines) to
index Word, Excel, OpenOffice, PDF, XML and HTML documents.

The configuration (directories to index, priority of the indexing
process, ...) is done through an XML configuration file that can be
updated while the server is running through a configuration processor.

This system could be used to develop a Java, OPS-based alternative to
Beagle (http://beaglewiki.org/Main_Page), but I am sure many other
applications could take advantage of these processors.

I would be happy to publish all of this under open source licences.
Unfortunately, these developments are still very experimental and not
very well documented, and I don't expect to have enough time to work
on them anytime soon (unless, of course, some funding could be raised
that would allow me to postpone other paid activities).

To summarize, I am sure Lucene + OPS make a very interesting couple but
I don't have as much to share as I'd like to right now!

Eric
--
GPG-PGP: 2A528005
Did you know it? Python has now a Relax NG (partial) implementation.
http://advogato.org/proj/xvif/
------------------------------------------------------------------------
Eric van der Vlist http://xmlfr.org http://dyomedea.com
(ISO) RELAX NG ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------

Daniel E. Renfer

Re: Orbeon be configured to query lucene?

This brings up an interesting question. In the event that we were to
send an XPL file over HTTP, what media type should we be using?
'application/xpl+xml' springs immediately to mind, but I don't think
that is registered yet. (Is it?) Should I just stick with the standard
(text|application)/xml, or is there a need for a new media type for
XPL?

Daniel E. Renfer (http://kronkltd.net/)

Alessandro Vernet

Re: Orbeon be configured to query lucene?

On 4/14/06, Daniel E. Renfer <[hidden email]> wrote:
> This brings up an interesting question. In the event that we were to
> send an XPL file over HTTP, what media type should we be using?
> 'application/xpl+xml' springs immediately to mind, but I don't think
> that is registered yet. (Is it?) Should I just stick with the standard
> (text|application)/xml, or is there a need for a new media type for
> XPL?

Hi Daniel,

I would stick with the standard application/xml, unless you need the
receiving side to take some special action when XPL is received,
i.e. an action different from the one taken for other XML files.
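
As a rough illustration of that distinction, here is a sketch in Python
of how a receiver might dispatch on the Content-Type header. The
function name and the returned labels are hypothetical, and
`application/xpl+xml` is the unregistered type floated in the question,
used here only to show where a special case would go.

```python
def handler_for(content_type):
    """Pick a handler label from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    # Hypothetical, unregistered type: only worth having if XPL needs
    # treatment that other XML documents do not get.
    if media_type == "application/xpl+xml":
        return "run-pipeline"

    # Generic XML: the two standard types plus anything using the
    # "+xml" suffix convention.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "generic-xml"

    return "other"
```

If the receiver has no XPL-specific branch like the first one, a
dedicated media type buys nothing over application/xml.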

Alex
--
Blog (XML, Web apps, Open Source):
http://www.orbeon.com/blog/
