Orbeon Performance & Scalability

20 messages

Orbeon Performance & Scalability

up4
Hi Guys,

I'm running yesterday's CE nightly build out of the box on Tomcat 7 on a Mac (Lion), with 8 GB of RAM assigned to the JVM.

I'm pre-populating a profile form with 20 000 (20K) XML data instances in eXist through its REST interface.

My pre-populating script and I are the only users for now.
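A minimal sketch of such a pre-populating script in Python, assuming eXist's REST interface stores an XML document when it receives an HTTP PUT at the document's URL. The base URL `EXIST_REST_BASE`, the `data/<id>/data.xml` collection layout, and the `make_instance()` payload are hypothetical placeholders, not a documented Form Runner layout; adjust them to your installation and to the attached data.xml. If your eXist instance requires credentials for writes, you would also need to add authentication.

```python
import urllib.request
import uuid

# ASSUMPTION: REST base URL and collection layout are placeholders; check
# where your Form Runner installation actually stores form data in eXist.
EXIST_REST_BASE = "http://localhost:8080/orbeon/exist/rest/db/orbeon/fr"


def document_url(app: str, form: str, doc_id: str) -> str:
    """Build the (assumed) eXist REST URL for one form data document."""
    return f"{EXIST_REST_BASE}/{app}/{form}/data/{doc_id}/data.xml"


def make_instance(index: int) -> bytes:
    """Return a hypothetical minimal instance; substitute real form data."""
    return f"<form><name>Person {index}</name></form>".encode("utf-8")


def put_document(url: str, body: bytes, timeout: float = 30.0) -> int:
    """Store one XML document with an HTTP PUT and return the status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def populate(app: str, form: str, count: int) -> None:
    """PUT `count` generated instances, each under a fresh random document id."""
    for i in range(count):
        put_document(document_url(app, form, uuid.uuid4().hex), make_instance(i))
```

For example, `populate("my-app", "person", 20_000)` would load 20K documents, where `my-app` is a hypothetical application name.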

Loading the detail view of an individual instance in Form Runner is fast, but the summary page (either the default view or search results) takes around 3 minutes to load.

So, what configuration changes should I implement to make this setup faster? I looked at the wiki, but nothing seems to apply to the summary view.

Help would indeed be appreciated.

Please find attached an example instance data XML file.

Regards,

Vincent





--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Attachment: data.xml (1K)

Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Vincent,

Since you are running a recent nightly build, you should already have the
improved eXist search query [1], so things should be faster!

Did you make sure there is a proper Lucene [2] index configured in
eXist, and that you re-indexed your collections with the eXist client?

-Erik

[1] https://github.com/orbeon/orbeon-forms/commit/b33c9641364bb206ed334727baf3e000b2eb5fec
[2] http://wiki.orbeon.com/forms/doc/developer-guide/exist-configuration#TOC-Configuring-full-text-indexing

Re: Orbeon Performance & Scalability

up4
Hi Erik,

Thanks for your reply. I will try the Lucene index and re-indexing with the eXist client tomorrow. But will it affect the default view of the summary page (with no search criteria)?

Thanks!

Vincent

Re: Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Vincent,

The index should only make things better.

-Erik

Re: Orbeon Performance & Scalability

up4
Hi Erik,

The re-indexing made no noticeable difference.

The summary page of the "company" form, with 2K instances, still takes around 20 seconds to load, and that of the "person" form, with 20K instances, still takes around 3 minutes. See the screenshots below.

Any other trick I could try?

Thanks!

Vincent

<Screen Shot 2012-03-10 at 5.28.51 PM.png><Screen Shot 2012-03-10 at 5.32.10 PM.png>

Re: Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Vincent,

For the improvement mentioned earlier in the thread, we used the Postman REST Client for Chrome to run a simplified version of the search query:

https://chrome.google.com/webstore/detail/fdmmgilgnpjigdojojpjoooidkmcomcm/related

Here is the query and the XPL file that runs it:

https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xml
https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xpl

To run the query, simply POST it to:

http://localhost:8080/orbeon/fr/service/exist/search/[APP]/[FORM]

By taking out parts of the query, we were able to figure out which parts were slow and improve them. Is that something you could try?
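As a scripted alternative to Postman, here is a minimal Python sketch that POSTs a request body to the search service URL pattern given above and reports the best of a few wall-clock timings. It assumes you already have the XML request body to send, and that the service accepts it with an `application/xml` content type; both are assumptions, as is the `my-app` application name used below. `[APP]` and `[FORM]` become the `app` and `form` arguments.

```python
import time
import urllib.request
from typing import Any, Callable, Tuple

# URL pattern from the message above; [APP] and [FORM] become arguments.
SEARCH_URL = "http://localhost:8080/orbeon/fr/service/exist/search/{app}/{form}"


def search_url(app: str, form: str) -> str:
    """Fill in the [APP]/[FORM] placeholders of the search service URL."""
    return SEARCH_URL.format(app=app, form=form)


def post_search(url: str, request_xml: bytes, timeout: float = 600.0) -> bytes:
    """POST an XML request body (content type assumed) and return the response body."""
    request = urllib.request.Request(
        url,
        data=request_xml,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def timed(fn: Callable[..., Any], *args: Any, repeat: int = 3) -> Tuple[Any, float]:
    """Call fn(*args) `repeat` times; return the last result and the best time in seconds."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best
```

For example, with the request body in a local file (the name `search-request.xml` is hypothetical): `_, seconds = timed(post_search, search_url("my-app", "person"), open("search-request.xml", "rb").read())`. Keeping the best of several runs reduces noise from first-run cache warming; re-running after removing parts of the query mirrors the bisection approach described above.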

-Erik

Re: Orbeon Performance & Scalability

up4
Yes! I will run the Postman setup tomorrow and get back to you before the end of the week.

Thanks!

Vincent

Re: Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Cool, excellent. Let us know if you need help.

-Erik

Re: Orbeon Performance & Scalability

up4
Yes, I will need help, but not with this (I'm quite good at profiling, much less so at XForms ;)). Please see my upcoming message about repeated sections. If you can help me there, I can put more time into the performance problem. ;-)

Still with the same nightly build, data, and forms as last time, I ran a simple XQuery taken from your code: just the snippet that counts the number of documents (see attached).

The same query takes 18 ms for 2K data.xml instances and 30 s for 20K instances.
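To quantify how these timings grow, a two-point fit of t ≈ c·n^k gives the implied exponent k, where k ≈ 1 means time grows linearly with the number of documents. A small Python check using only the figures reported in this thread:

```python
import math


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Exponent k in t ≈ c * n**k implied by two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# count.xq: 18 ms at 2K documents, 30 s at 20K documents.
count_k = scaling_exponent(2_000, 0.018, 20_000, 30.0)  # k ≈ 3.22

# Summary page: ~20 s at 2K ("company"), ~3 min at 20K ("person").
summary_k = scaling_exponent(2_000, 20.0, 20_000, 180.0)  # k ≈ 0.95

print(f"count.xq:     x{30.0 / 0.018:.0f} time for x10 documents, k = {count_k:.2f}")
print(f"summary page: x{180.0 / 20.0:.0f} time for x10 documents, k = {summary_k:.2f}")
```

With only two data points per series, and the summary figures coming from two different forms, this is a rough indicator rather than a model. Still, it suggests the summary page as a whole grows roughly linearly, while count.xq alone slows by a factor of about 1,667 for ten times the documents. That hints at something beyond per-document work, such as memory or caching effects, kicking in between 2K and 20K; this is a hypothesis, not a diagnosis.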

It seems to me that any call to collection() is awfully inefficient, and your code calls it twice in the query: once for the search itself and once for the count!

Is there a way we could access the Lucene index directly? I'm an old buddy of Lucene's, and it has never given me performance that bad. Ever.

Please let me know if you would be interested in helping me rewrite this query.

Vincent

person.png (82K) Download Attachment
company.png (81K) Download Attachment
count.xq (163 bytes) Download Attachment

Re: Re: Orbeon Performance & Scalability

up4
By help, I meant answering my questions, of course! :D

I will look into Lucene hooks within eXist today.

Thanks,

V

On 2012-03-14, at 2:19 PM, Vincent Olivier <[hidden email]> wrote:

> Yes, I will need help. But not on this (I'm quite good at profiling, much less so in XForms ;)). Please see my other message coming soon about repeated sections. If you can help me there, I can put more time on the performance problem. ;-)
>
> So, still with the same nightly build version and data and forms as last time. I ran a simple XQuery that is part of your code. Actually, just the snippet where you count the number of documents (see attached).
>
> The same query takes 18ms for 2K data.xml instances and 30s for 20K instances.
>
> It seems to me that any call on "collection()" is awfully inefficient. Based on your code, you call it twice in the query! Once for the query, once for the count.
>
> Is there a way we could manipulate the Lucene index directly? I'm an old buddy of Lucene's and it never gave me that kind of bad performance. Ever.
>
> Please let me know if you would be interested in helping me rewrite this query.
>
> Vincent
>
>
>
> <person.png>
>
> <company.png>
>
>
>
> <count.xq>
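The two timings in the quoted message can be turned into a rough scaling estimate. The sketch below is in Python, used here only for illustration. It assumes the running time follows t ≈ c·n^k and solves for k from the two measurements. There are only two data points, and JVM warm-up and caching are not controlled for, so treat the result as an indicator rather than a measurement.

```python
import math

def scaling_exponent(n_small, t_small, n_large, t_large):
    """Exponent k in t ~ n**k implied by two (size, seconds) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)

# Figures from the quoted message: 2K documents in 18 ms, 20K documents in 30 s.
ratio = 30.0 / 0.018
k = scaling_exponent(2_000, 0.018, 20_000, 30.0)
linear_estimate = 0.018 * (20_000 / 2_000)  # what a purely linear scan would predict

print(f"10x more documents -> {ratio:.0f}x slower (k ~ {k:.2f})")
print(f"a linear scan would predict about {linear_estimate * 1000:.0f} ms")
```

With these figures, k comes out at about 3.2. Ten times more documents made the count roughly 1,667 times slower, where a linear scan would predict about 180 ms. That gap fits the suspicion that the count does more than simply visit each document once.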

Re: Re: Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Vincent,

Mmh yes that makes sense ;) So here it is:

First, thanks for trying the query. It's a good catch, and it might be
the main reason for the slowness.

However, the question now is: how do we fix this, assuming we do want to
find out how many documents are in that collection?

On the Lucene question: that's an eXist feature, and the answer is "I
don't know". It would be better to ask this on the exist-open
mailing list:

  http://sourceforge.net/mail/?group_id=17691

And yes if you can keep helping on this it would be great!

-Erik

On Thu, Mar 15, 2012 at 5:33 AM, Vincent Olivier <[hidden email]> wrote:

> By help, I meant answering my questions, of course! :D
>
> I will look into Lucene hooks within eXist today.
>
> Thanks,
>
> V

Re: Orbeon Performance & Scalability

up4
Hi Erik,

I'm on it. But my guess is that as long as there is a Lucene index somewhere, there is room for optimization.

More on this on Tuesday,

Vincent


On 2012-03-16, at 7:10 PM, Erik Bruchez wrote:

> Vincent,
>
> Mmh yes that makes sense ;) So here it is:
>
> First, thanks for trying the query. It's a good catch, and it might be
> the main reason for the slowness.
>
> However, the question now is: how do we fix this, assuming we do want to
> find out how many documents are in that collection?
>
> On the Lucene question: that's an eXist feature, and the answer is "I
> don't know". It would be better to ask this on the exist-open
> mailing list:
>
>  http://sourceforge.net/mail/?group_id=17691
>
> And yes if you can keep helping on this it would be great!
>
> -Erik

Re: Orbeon Performance & Scalability

up4
Hi again,

I'm waiting for the eXist mailing list to enlighten me on how the Lucene index is exposed to XQuery, because if I'm limited to what the eXist documentation shows, it will never be good enough for large collections.

The problem is that eXist requires an in-memory collection to be passed to the ft:query() function, and it also reads the documents for all the Lucene hits and rebuilds an in-memory collection from them. So, for a query that returns a large proportion of the collection's documents, that's twice the collection size for each ft:query() call.

I'm not going to wait for the eXist community to get back to me. Instead, I'll try plan B: have a custom submission send each form instance to an external Solr setup, and rewrite the summary query to use only the Solr index.
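Here is a minimal Python sketch of the summary side of plan B. It assumes a Solr core reachable at a hypothetical URL (`http://localhost:8983/solr/person/select`). 8983 is Solr's default port, but the core name is made up. The sketch requests one page of document IDs and reads the total hit count from Solr's `numFound`, so no separate count over the whole collection is needed. Parameter defaults and the response shape may vary between Solr versions.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and core name to the actual Solr setup.
SOLR_SELECT = "http://localhost:8983/solr/person/select"

def summary_query_url(query="*:*", page=1, page_size=10):
    """Build a Solr select URL for one page of the summary view.

    Only document IDs are requested (fl=id); Solr returns the total number
    of matches as numFound in the same response.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    params = {
        "q": query,                         # "*:*" matches every indexed document
        "start": (page - 1) * page_size,    # offset of the first result on this page
        "rows": page_size,
        "fl": "id",
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)

def parse_summary(body):
    """Return (total hit count, list of IDs) from a Solr JSON response body."""
    response = json.loads(body)["response"]
    return response["numFound"], [doc["id"] for doc in response["docs"]]
```

In this design the summary page would fetch each data.xml by ID only for the rows it actually shows, rather than scanning the whole eXist collection.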

What I need for this: is it possible to pass the document ID (in eXist, this is the name of the folder containing the "data.xml" file) along with the form instance XML in a POST to the Solr service (which is very XML-friendly)?
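The indexing side could take roughly the following shape, again as a Python sketch. The document ID is taken from the name of the folder that contains `data.xml`, and the example path layout is only illustrative. The form instance is then wrapped in Solr's XML update format (`<add><doc><field name="...">`). Two assumptions are made here. Every leaf element is flattened into a `<name>_t` field, which presumes a matching dynamic field in the Solr schema. `SOLR_UPDATE` and its core name are hypothetical. The request is built but not sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Hypothetical update endpoint; the core name is an assumption.
SOLR_UPDATE = "http://localhost:8983/solr/person/update"

def doc_id_from_path(path):
    """Return the name of the folder that contains data.xml.

    Assumes an illustrative layout such as .../data/<document-id>/data.xml.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[-1] != "data.xml":
        raise ValueError("expected a path ending in <id>/data.xml: " + path)
    return parts[-2]

def solr_add_message(doc_id, instance_xml, suffix="_t"):
    """Wrap a form instance in a Solr XML <add> message.

    Each leaf element with non-blank text becomes a field named
    <local-name><suffix>; the suffix is assumed to match a dynamic field.
    """
    root = ET.fromstring(instance_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for elem in root.iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop any {namespace} prefix
            ET.SubElement(doc, "field", name=local_name + suffix).text = elem.text.strip()
    return ET.tostring(add, encoding="unicode")

def solr_update_request(doc_id, instance_xml):
    """Build (but do not send) the POST; commit=true makes the document searchable at once."""
    body = solr_add_message(doc_id, instance_xml).encode("utf-8")
    return Request(SOLR_UPDATE + "?commit=true", data=body,
                   headers={"Content-Type": "text/xml; charset=utf-8"}, method="POST")
```

The sketch does not show how the custom submission would obtain the document ID on the Orbeon side, which is the open question above. Committing on every document is simple but may be slow for a bulk load of 20K instances.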

If this is of interest to any of you, I might post a short video on how to set it up.

Vincent


On 2012-03-17, at 1:10 PM, Vincent Olivier wrote:

> Hi Erik,
>
> I'm on it. But my guess would be that as long as there is a Lucene index somewhere, there is optimization to be made.
>
> More on this on Tuesday,
>
> Vincent
>
>
> On 2012-03-16, at 7:10 PM, Erik Bruchez wrote:
>
>> Vincent,
>>
>> Mmh yes that makes sense ;) So here it is:
>>
>> First, thanks for trying the query. It's a good catch, and it might be
>> the main reason for the slowness.
>>
>> However the question now is: how to fix this, assuming we do want to
>> find out how many documents are in that collection?
>>
>> On the Lucene question: that's an eXist feature, and the answer is "I
>> don't know". It woud be better to ask this on the exist-open
>> mailng-list:
>>
>> http://sourceforge.net/mail/?group_id=17691
>>
>> And yes if you can keep helping on this it would be great!
>>
>> -Erik
>>
>> On Thu, Mar 15, 2012 at 5:33 AM, Vincent Olivier <[hidden email]> wrote:
>>> By help, I meant answering my questions, of course! :D
>>>
>>> I will look into Lucene hooks within eXist today.
>>>
>>> Thanks,
>>>
>>> V
>>>
>>> On 2012-03-14, at 2:19 PM, Vincent Olivier <[hidden email]> wrote:
>>>
>>>> Yes, I will need help. But not on this (I'm quite good at profiling, much less so in XForms ;)). Please see my other message coming soon about repeated sections. If you can help me there, I can put more time on the performance problem. ;-)
>>>>
>>>> So, still with the same nightly build version and data and forms as last time. I run a simple XQuery that is part of your code. Actually, just the snippet where you count the number of documents (see attached).
>>>>
>>>> The same query takes 18ms for 2K data.xml instances and 30s for 20K instances.
>>>>
>>>> It seems to me that any call on "collection()" is awfully inefficient. Based on your code, you call it twice in the query! Once for the query, once for the count.
>>>>
>>>> Is there a way we could manipulate the Lucene index directly. I'm an old buddy of Lucene's and it never gave me that kind of bad performanceship. Ever.
>>>>
>>>> Please let me know if you would be interested in helping me rewrite this query.
>>>>
>>>> Vincent
>>>>
>>>>
>>>>
>>>> <person.png>
>>>>
>>>> <company.png>
>>>>
>>>>
>>>>
>>>> <count.xq>
>>>>
>>>>
>>>>
>>>> On 2012-03-13, at 7:11 PM, Erik Bruchez wrote:
>>>>
>>>>> Cool, excellent. Let us know of you need help.
>>>>>
>>>>> -Erik
>>>>>
>>>>> On Tue, Mar 13, 2012 at 10:48 AM, Vincent Olivier <[hidden email]> wrote:
>>>>>> Yes! I will run the Postman setup tomorrow and get back to you before the
>>>>>> end of the week.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Vincent
>>>>>>
>>>>>>
>>>>>> On 2012-03-13, at 2:19 AM, Erik Bruchez wrote:
>>>>>>
>>>>>> Vincent,
>>>>>>
>>>>>> For the improvement mentioned earlier in the thread, we used the Postman
>>>>>> REST Client for Chrome to run a simplified version of the search query:
>>>>>>
>>>>>> https://chrome.google.com/webstore/detail/fdmmgilgnpjigdojojpjoooidkmcomcm/related
>>>>>>
>>>>>> Here is the query and the XPL file that runs it:
>>>>>>
>>>>>> https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xml
>>>>>> https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xpl
>>>>>>
>>>>>> To run query, simply POST it to:
>>>>>>
>>>>>> http://localhost:8080/orbeon/fr/service/exist/search/[APP]/[FORM]
>>>>>>
>>>>>> By taking out parts of the query we were able to figure out the parts that
>>>>>> were slow and improve on it. Is that something you are able to try?
>>>>>>
>>>>>> -Erik
>>>>>>
>>>>>> On Sat, Mar 10, 2012 at 2:33 PM, Vincent Olivier <[hidden email]> wrote:
>>>>>>>
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> So the reindexing made no noticeable changes.
>>>>>>>
>>>>>>> The "company" form, for 2k instances still loads at around 20 seconds. And
>>>>>>> the "person" form, with 20k instances still loads at around 3 minutes. See
>>>>>>> screens below.
>>>>>>>
>>>>>>> Any other trick I could try?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Vincent
>>>>>>>
>>>>>>>
>>>>>>> <Screen Shot 2012-03-10 at 5.28.51 PM.png><Screen Shot 2012-03-10 at
>>>>>>> 5.32.10 PM.png>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2012-03-07, at 12:30 AM, Erik Bruchez wrote:
>>>>>>>
>>>>>>> Vincent,
>>>>>>>
>>>>>>> The index should only make things better.
>>>>>>>
>>>>>>> -Erik
>>>>>>>
>>>>>>> On Tue, Mar 6, 2012 at 9:29 AM, Vincent Olivier <[hidden email]> wrote:
>>>>>>>
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> Thanks for your reply. I will try the Lucene index and eXist client
>>>>>>> re-indexing tomorrow. But will it affect the default view of the summary
>>>>>>> page (with no search criterion)?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Vincent
>>>>>>>
>>>>>>> On 2012-03-06, at 12:10 AM, Erik Bruchez wrote:
>>>>>>>
>>>>>>> Vincent,
>>>>>>>
>>>>>>> This means that you should have the improved eXist search query [1],
>>>>>>> and so things should be faster!
>>>>>>>
>>>>>>> Did you make sure there is a proper Lucene [2] index configured in
>>>>>>> eXist, and that you re-indexed your collections with the eXist client?
>>>>>>>
>>>>>>> -Erik
>>>>>>>
>>>>>>> [1] https://github.com/orbeon/orbeon-forms/commit/b33c9641364bb206ed334727baf3e000b2eb5fec
>>>>>>> [2] http://wiki.orbeon.com/forms/doc/developer-guide/exist-configuration#TOC-Configuring-full-text-indexing
>>>>>>>
>>>>>>> On Mon, Mar 5, 2012 at 3:48 PM, Vincent Olivier <[hidden email]> wrote:
>>>>>>>
>>>>>>> Hi Guys,
>>>>>>>
>>>>>>> I'm running yesterday's CE nightly build out-of-the-box on a Mac (Lion)
>>>>>>> with 8GB RAM assigned to the JVM with Tomcat 7.
>>>>>>>
>>>>>>> I'm pre-populating a profile form with 20 000 (20K) XML data instances in
>>>>>>> eXist through its REST interface.
>>>>>>>
>>>>>>> My pre-populating script and I are the only users for now.
>>>>>>>
>>>>>>> Loading an individual instance detail view in Form Runner is a breeze, but
>>>>>>> the summary page (either the default view or search results) takes around
>>>>>>> 3 minutes to load.
>>>>>>>
>>>>>>> So, what configuration changes should I implement to make this setup
>>>>>>> faster? I looked at the wiki, but nothing seems to apply to the summary view.
>>>>>>>
>>>>>>> Help would indeed be appreciated.
>>>>>>>
>>>>>>> Please find attached an example of an instance data XML.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Vincent
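For readers who prefer a script to the Postman client Erik mentions, the following Python sketch POSTs a (trimmed-down) copy of the search query to the search service URL given in the thread and times the round trip, which makes it easy to compare variants of the query. The base URL, the `application/xml` content type, and the helper names (`search_url`, `build_search_request`, `timed_search`) are assumptions for illustration, not part of a documented Orbeon API.

```python
import time
import urllib.request

# Assumed base URL of a local Orbeon Forms deployment; adjust as needed.
BASE_URL = "http://localhost:8080/orbeon"


def search_url(app: str, form: str, base_url: str = BASE_URL) -> str:
    """Build <base>/fr/service/exist/search/[APP]/[FORM] (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/fr/service/exist/search/{app}/{form}"


def build_search_request(app: str, form: str, query_xml: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST whose body is the search query document.

    The application/xml content type is an assumption."""
    return urllib.request.Request(
        search_url(app, form, base_url),
        data=query_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def timed_search(app: str, form: str, query_xml: str,
                 base_url: str = BASE_URL) -> tuple[float, bytes]:
    """Send the request and return (elapsed seconds, raw response body)."""
    request = build_search_request(app, form, query_xml, base_url)
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return time.perf_counter() - start, body
```

Removing parts of the query body between runs of `timed_search` mirrors the "take parts out until it gets fast" approach Erik describes.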

Re: Orbeon Performance & Scalability

up4
Hi again,

Sorry, I'm being dyslexic here.

By submission, I mean the minimal-impact code change within the form itself that would run the SOLR submission as part of the same user "submit" event that triggers the eXist persistence submission.

Hope this actually makes sense.

Vincent
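A minimal sketch of the SOLR side of this plan, assuming a stock SOLR instance that accepts its XML update format (an `<add><doc><field name="...">` message POSTed to the update handler). The endpoint URL, the helper names, and the one-field-per-leaf-element mapping are assumptions; a real setup also needs a matching SOLR schema (for example, multi-valued fields for repeated sections).

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint: SOLR 3.x exposes /solr/update; newer releases use
# /solr/<core>/update. commit=true makes the document searchable at once.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def solr_add_message(document_id: str, form_instance_xml: str) -> bytes:
    """Turn one form data instance into a SOLR XML <add> message.

    Hypothetical mapping: every leaf element with non-empty text becomes a
    <field> named after the element; the Orbeon document id becomes 'id'."""
    instance = ET.fromstring(form_instance_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = document_id
    for element in instance.iter():
        text = (element.text or "").strip()
        if len(element) == 0 and text:  # leaf element carrying a value
            ET.SubElement(doc, "field", name=local_name(element.tag)).text = text
    return ET.tostring(add, encoding="utf-8")


def index_in_solr(document_id: str, form_instance_xml: str,
                  url: str = SOLR_UPDATE_URL) -> int:
    """POST the <add> message to SOLR and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=solr_add_message(document_id, form_instance_xml),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `index_in_solr(document_id, xml)` just before the persistence submission would mirror the flow described above; the XForms submission that would trigger it from the form is not shown here.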


On 2012-03-22, at 1:01 PM, Vincent Olivier wrote:

> Hi again,
>
> I'm waiting for the eXist mailing list to enlighten me on how the Lucene index is exposed to XQuery, because if I'm limited to what the eXist documentation shows, it will never be good enough for large collections.
>
> The problem is that eXist requires an in-memory collection to be passed to the ft:query() function, and it also reads the documents for all the Lucene hits and rebuilds an in-memory collection from them. So, for a query that returns a large proportion of the collection's documents, that's twice the collection size for each ft:query() call.
>
> Instead of waiting for the eXist community to get back to me, I'm going to try plan B: have a custom submission send each form instance to an external SOLR setup, and rewrite the summary query to use only the SOLR index.
>
> What I need for this: is it possible to pass the document ID (in eXist, this is the folder containing the "data.xml" file) along with the form instance XML in a POST to the SOLR service (which is very XML-friendly)?
>
> And if this is of interest to any of you, I might post a short video on how to set it up.
>
> Vincent
>
>
> On 2012-03-17, at 1:10 PM, Vincent Olivier wrote:
>
>> Hi Erik,
>>
>> I'm on it. But my guess is that as long as there is a Lucene index somewhere, there are optimizations to be made.
>>
>> More on this on Tuesday,
>>
>> Vincent
>>
>>
>> On 2012-03-16, at 7:10 PM, Erik Bruchez wrote:
>>
>>> Vincent,
>>>
>>> Mmh yes that makes sense ;) So here it is:
>>>
>>> First, thanks for trying the query. It's a good catch, and it might be
>>> the main reason for the slowness.
>>>
>>> However the question now is: how to fix this, assuming we do want to
>>> find out how many documents are in that collection?
>>>
>>> On the Lucene question: that's an eXist feature, and the answer is "I
>>> don't know". It would be better to ask this on the exist-open
>>> mailing list:
>>>
>>> http://sourceforge.net/mail/?group_id=17691
>>>
>>> And yes if you can keep helping on this it would be great!
>>>
>>> -Erik
>>>
>>> On Thu, Mar 15, 2012 at 5:33 AM, Vincent Olivier <[hidden email]> wrote:
>>>> By help, I meant answering my questions, of course! :D
>>>>
>>>> I will look into Lucene hooks within eXist today.
>>>>
>>>> Thanks,
>>>>
>>>> V
>>>>
>>>> On 2012-03-14, at 2:19 PM, Vincent Olivier <[hidden email]> wrote:
>>>>
>>>>> Yes, I will need help. But not on this (I'm quite good at profiling, much less so at XForms ;)). Please see my upcoming message about repeated sections. If you can help me there, I can spend more time on the performance problem. ;-)
>>>>>
>>>>> So, still with the same nightly build, data, and forms as last time, I ran a simple XQuery taken from your code: just the snippet where you count the number of documents (see attached).
>>>>>
>>>>> The same query takes 18 ms for 2K data.xml instances and 30 s for 20K instances: 10 times more documents, but roughly 1,700 times longer.
>>>>>
>>>>> It seems to me that any call to "collection()" is awfully inefficient, and your code calls it twice in the query: once for the query and once for the count.
>>>>>
>>>>> Is there a way we could access the Lucene index directly? I'm an old buddy of Lucene's, and it has never given me that kind of bad performance. Ever.
>>>>>
>>>>> Please let me know if you would be interested in helping me rewrite this query.
>>>>>
>>>>> Vincent
>>>>>
>>>>>
>>>>>
>>>>> <person.png>
>>>>>
>>>>> <company.png>
>>>>>
>>>>>
>>>>>
>>>>> <count.xq>
>>>>>
>>>>>
>>>>>
>>>>> On 2012-03-13, at 7:11 PM, Erik Bruchez wrote:
>>>>>
>>>>>> Cool, excellent. Let us know if you need help.
>>>>>>
>>>>>> -Erik
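To reproduce the count measurement from the quoted March 14 message without the attachment, one option is to send an ad-hoc XQuery to eXist's REST interface through its `_query` parameter and time a few runs. The REST root, the collection path `/db/orbeon/fr/[app]/[form]/data`, and the helper names are assumptions about a default install, and the query is only a stand-in for `count.xq`, whose contents are not shown in the thread.

```python
import time
import urllib.parse
import urllib.request

# Assumed REST root of the eXist instance used by Orbeon Forms.
EXIST_REST_ROOT = "http://localhost:8080/exist/rest/db"


def count_query(collection_path: str) -> str:
    """Stand-in for count.xq: count the root elements of all documents."""
    return f"count(collection('{collection_path}')/*)"


def exist_query_url(xquery: str, rest_root: str = EXIST_REST_ROOT) -> str:
    """GET URL evaluating an ad-hoc XQuery; _wrap=no asks for an unwrapped result."""
    params = urllib.parse.urlencode({"_query": xquery, "_wrap": "no"})
    return f"{rest_root}?{params}"


def time_query(xquery: str, rest_root: str = EXIST_REST_ROOT,
               runs: int = 3) -> list[float]:
    """Run the query several times and return each duration in seconds,
    so a cold first run can be told apart from warm ones."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(exist_query_url(xquery, rest_root)) as response:
            response.read()
        timings.append(time.perf_counter() - start)
    return timings
```

Running `time_query(count_query(...))` against collections of 2K and 20K documents would show whether the growth is linear or worse.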

Re: Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Vincent,

When Orbeon Forms writes an XML document, the document id is part of
the path, so it's already available. Or have I misunderstood?

-Erik
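Erik's point can be illustrated with a small parser, assuming the data path has the shape `.../[APP]/[FORM]/data/[DOCUMENT_ID]/data.xml`. That layout is an assumption consistent with the thread's description of the document id as the folder containing `data.xml`, and `parse_data_path` is a hypothetical helper.

```python
import re

# Assumed layout of a Form Runner data path:
# .../[APP]/[FORM]/data/[DOCUMENT_ID]/data.xml
DATA_PATH = re.compile(
    r"/(?P<app>[^/]+)/(?P<form>[^/]+)/data/(?P<document_id>[^/]+)/data\.xml$"
)


def parse_data_path(path: str) -> dict[str, str]:
    """Extract app, form and document id from a data path.

    Raises ValueError when the path does not end in the assumed layout."""
    match = DATA_PATH.search(path)
    if match is None:
        raise ValueError(f"not a Form Runner data path: {path!r}")
    return match.groupdict()
```

The extracted `document_id` is what would be sent to SOLR as its `id` field.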


Re: Re: Re: Orbeon Performance & Scalability

up4
Hi Erik,

Yes, I was actually focusing on what is an easy detail.

I have been looking at eXist's implementation of ft:query, and it doesn't look like I will find an easy fix that makes it usable with so many documents.

Moreover, the same problem occurs with autocomplete controls in forms when the collections are that large.

I think the easiest fix for now would be to maintain an external index (I like SOLR) for both the summary pages and the autocomplete controls.

So what I want to try now is to add a SOLR submission that runs after the user clicks the submit button and before the form persistence submission, transparently sending the XML to SOLR for external indexing. I would use the id of the Orbeon collection (the one containing "data.xml") as the document ID in SOLR.

I'm wondering what the minimal code to achieve that would look like.

Thanks!

Vincent


On 2012-03-26, at 11:59 AM, Erik Bruchez wrote:

Vincent,

When Orbeon Forms writes an XML document, the document id is part of
the path. So it's already available. Or do I not understand properly?
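
A small sketch of that idea: recovering the app, form, and document ID from the path Orbeon Forms writes to. It assumes the Form Runner persistence layout `.../crud/[APP]/[FORM]/data/[DOCUMENT_ID]/data.xml`; the prefix in front of `crud` depends on the setup, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DataLocation(NamedTuple):
    app: str
    form: str
    document_id: str


# Assumed layout: .../crud/<app>/<form>/data/<document-id>/data.xml
_DATA_PATH = re.compile(r"/crud/([^/]+)/([^/]+)/data/([^/]+)/data\.xml$")


def parse_data_path(path: str) -> Optional[DataLocation]:
    """Return (app, form, document_id) if the path matches, else None."""
    m = _DATA_PATH.search(path)
    return DataLocation(*m.groups()) if m else None
```

The document_id from this tuple is what would go into the SOLR id field, so no extra parameter has to be passed along with the XML.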

-Erik

On Thu, Mar 22, 2012 at 10:09 AM, Vincent Olivier <[hidden email]> wrote:
Hi again,

Sorry, I'm being dyslexic here.

By submission, I mean the minimal-impact code change within the form itself that sends the SOLR submission on the same user "submit" event that triggers the eXist persistence submission.

Hope this makes actual sense.

Vincent


On 2012-03-22, at 1:01 PM, Vincent Olivier wrote:

Hi again,

I'm waiting for the eXist mailing list to enlighten me on how the Lucene index is exposed to XQuery, because if I'm limited to what the eXist documentation shows, it will never be good enough for large collections.

eXist requires an in-memory collection to be passed to the ft:query() function, and it also reads the documents for all the Lucene hits and rebuilds an in-memory collection from them. So, for a query that returns a fair proportion of the collection's documents, that's twice the collection size for each ft:query() call.

I'm not going to wait for the eXist community to get back to me; instead, I'll try plan B: have a custom submission send each form instance to an external SOLR setup, and rewrite the summary query to use only the SOLR index.

What I need for this: is it possible to pass the document ID (in eXist, this is the folder containing the "data.xml" file) along with the form instance XML in a POST to the SOLR service (which is very XML-friendly)?

And if this is of interest to any of you, I might post a short video on how to set it up.

Vincent


On 2012-03-17, at 1:10 PM, Vincent Olivier wrote:

Hi Erik,

I'm on it. But my guess is that as long as there is a Lucene index somewhere, there is room for optimization.

More on this on Tuesday,

Vincent


On 2012-03-16, at 7:10 PM, Erik Bruchez wrote:

Vincent,

Mmh yes that makes sense ;) So here it is:

First, thanks for trying the query. It's a good catch, and it might be
the main reason for the slowness.

However the question now is: how to fix this, assuming we do want to
find out how many documents are in that collection?

On the Lucene question: that's an eXist feature, and the answer is "I
don't know". It would be better to ask this on the exist-open
mailing list:

http://sourceforge.net/mail/?group_id=17691

And yes, if you can keep helping on this, that would be great!

-Erik

On Thu, Mar 15, 2012 at 5:33 AM, Vincent Olivier <[hidden email]> wrote:
By help, I meant answering my questions, of course! :D

I will look into Lucene hooks within eXist today.

Thanks,

V

On 2012-03-14, at 2:19 PM, Vincent Olivier <[hidden email]> wrote:

Yes, I will need help, but not on this (I'm quite good at profiling, much less so at XForms ;)). Please see my other message, coming soon, about repeated sections. If you can help me there, I can put more time into the performance problem. ;-)

So, still with the same nightly build, data, and forms as last time, I ran a simple XQuery that is part of your code: actually, just the snippet where you count the number of documents (see attached).

The same query takes 18 ms for 2K data.xml instances and 30 s for 20K instances.
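
A quick back-of-the-envelope check on those two timings (the only inputs are the 18 ms and 30 s figures, reading 2K and 20K as 2,000 and 20,000 documents): ten times more documents made the query about 1,667 times slower. Fitting t = c * n^k through the two points gives k of about 3.2, far from the linear growth one would expect for a count. With only two data points this is indicative at best; a threshold effect such as running out of cache would produce the same numbers.

```python
import math


def scaling_exponent(n1: int, t1: float, n2: int, t2: float) -> float:
    """Fit t = c * n**k through two (size, time) points and return k."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Timings reported for the document-count query (seconds).
small_n, small_t = 2_000, 0.018
large_n, large_t = 20_000, 30.0

slowdown = large_t / small_t
k = scaling_exponent(small_n, small_t, large_n, large_t)
print(f"{large_n // small_n}x more documents -> {slowdown:.0f}x slower (k ~ {k:.2f})")
```

It prints "10x more documents -> 1667x slower (k ~ 3.22)".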

It seems to me that any call to collection() is awfully inefficient. And based on your code, you call it twice in the query: once for the query, once for the count.

Is there a way we could manipulate the Lucene index directly? I'm an old buddy of Lucene's, and it never gave me that kind of bad performance. Ever.

Please let me know if you would be interested in helping me rewrite this query.

Vincent



<person.png>

<company.png>



<count.xq>



On 2012-03-13, at 7:11 PM, Erik Bruchez wrote:

Cool, excellent. Let us know if you need help.

-Erik

On Tue, Mar 13, 2012 at 10:48 AM, Vincent Olivier <[hidden email]> wrote:
Yes! I will run the Postman setup tomorrow and get back to you before the
end of the week.

Thanks!

Vincent


On 2012-03-13, at 2:19 AM, Erik Bruchez wrote:

Vincent,

For the improvement mentioned earlier in the thread, we used the Postman
REST Client for Chrome to run a simplified version of the search query:

https://chrome.google.com/webstore/detail/fdmmgilgnpjigdojojpjoooidkmcomcm/related

Here is the query and the XPL file that runs it:

https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xml
https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xpl

To run the query, simply POST it to:

http://localhost:8080/orbeon/fr/service/exist/search/[APP]/[FORM]

By taking out parts of the query we were able to figure out the parts that
were slow and improve on it. Is that something you are able to try?
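
The same bisection approach can be scripted instead of run by hand in Postman. A minimal sketch, assuming the search service URL above and that each query variant is saved as its own XML file; the app and form names, file names, and the application/xml content type are assumptions.

```python
import time
import urllib.request
from urllib.parse import quote

BASE = "http://localhost:8080/orbeon/fr/service/exist/search"


def search_url(app: str, form: str, base: str = BASE) -> str:
    """Build the search service URL for a given app and form name."""
    return f"{base}/{quote(app)}/{quote(form)}"


def time_search(url: str, query_xml: bytes) -> float:
    """POST one search query and return the elapsed wall-clock time in seconds."""
    req = urllib.request.Request(
        url,
        data=query_xml,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()  # include the time to receive the full response
    return time.perf_counter() - start
```

For example, calling time_search(search_url("acme", "person"), data) once per variant (with hypothetical files such as search-full.xml and search-no-count.xml) and comparing the numbers as parts of the query are removed points at the slow parts.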

-Erik

On Sat, Mar 10, 2012 at 2:33 PM, Vincent Olivier <[hidden email]> wrote:

Hi Erik,

So the reindexing made no noticeable difference.

The "company" form, with 2K instances, still loads in around 20 seconds, and
the "person" form, with 20K instances, still loads in around 3 minutes. See the
screenshots below.

Any other trick I could try?

Thanks!

Vincent


<Screen Shot 2012-03-10 at 5.28.51 PM.png>
<Screen Shot 2012-03-10 at 5.32.10 PM.png>




On 2012-03-07, at 12:30 AM, Erik Bruchez wrote:

Vincent,

The index should only make things better.

-Erik

On Tue, Mar 6, 2012 at 9:29 AM, Vincent Olivier <[hidden email]> wrote:

Hi Erik,

Thanks for your reply. Will try the Lucene index and eXist client
re-indexing tomorrow. But will it impact the default view of the summary
page (with no search criterion)?

Thanks!

Vincent


On 2012-03-06, at 12:10 AM, Erik Bruchez wrote:


Vincent,

This means that you should have the improved eXist search query [1],
and so things should be faster!

Did you make sure there is a proper Lucene [2] index configured in
eXist, and that you re-indexed your collections with the eXist client?

-Erik

[1] https://github.com/orbeon/orbeon-forms/commit/b33c9641364bb206ed334727baf3e000b2eb5fec
[2] http://wiki.orbeon.com/forms/doc/developer-guide/exist-configuration#TOC-Configuring-full-text-indexing


On Mon, Mar 5, 2012 at 3:48 PM, Vincent Olivier <[hidden email]> wrote:

Hi Guys,

I'm running yesterday's CE nightly build out-of-the-box on a Mac (Lion) with
8GB RAM assigned to the JVM with Tomcat 7.

I'm pre-populating a profile form with 20 000 (20K) XML data instances in
eXist through its REST interface.

My prepopulating script and I are the only users for now.

Loading an individual instance detail view in form runner is a breeze, but
the summary page (either the default view or search results) takes around 3
minutes to load.

So, what configuration changes should I implement to make this setup faster?
I looked at the wiki, but nothing seems to apply to the summary view.

Help would indeed be appreciated.

Please find attached an example of an instance data XML.

Regards,

Vincent

Re: Orbeon Performance & Scalability

up4
Hi guys,


Actually, I think there are 2 options at this point:
  • MySQL persistence
  • eXist persistence with SOLR indexing (at persistence level)
Because I think that eXist's internal indexing doesn't scale, period.

I'm going to try MySQL first (but I would like to know where the query code for the summary page is in that case).

And also, I would like your thoughts on extending the eXist persistence to send the XML data to SOLR every time an instance is persisted/updated.
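
A minimal sketch of what indexing "at the persistence level" could look like, assuming a wrapper around the existing store that forwards each successful write to an indexer. The DocumentStore/Indexer interfaces and the in-memory implementations are hypothetical stand-ins for eXist and SOLR, not Orbeon Forms APIs.

```python
from typing import Dict, List, Protocol


class DocumentStore(Protocol):
    def put(self, doc_id: str, xml: str) -> None: ...


class Indexer(Protocol):
    def index(self, doc_id: str, xml: str) -> None: ...


class IndexingStore:
    """Write to the primary store first, then update the external index.

    The store stays the source of truth: if indexing fails, the data is
    still saved and the document ID is remembered for re-indexing later.
    """

    def __init__(self, store: DocumentStore, indexer: Indexer) -> None:
        self.store = store
        self.indexer = indexer
        self.failed_ids: List[str] = []  # documents to re-index later

    def put(self, doc_id: str, xml: str) -> None:
        self.store.put(doc_id, xml)  # persist (create or update)
        try:
            self.indexer.index(doc_id, xml)  # same ID replaces the old entry
        except Exception:
            self.failed_ids.append(doc_id)


class MemoryStore:
    """In-memory stand-in for the eXist collection."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def put(self, doc_id: str, xml: str) -> None:
        self.docs[doc_id] = xml


class MemoryIndex:
    """In-memory stand-in for SOLR."""

    def __init__(self) -> None:
        self.indexed: Dict[str, str] = {}

    def index(self, doc_id: str, xml: str) -> None:
        self.indexed[doc_id] = xml
```

Writing to eXist first keeps it the system of record; the trade-off is that the index can briefly lag or miss documents, which is why failed IDs are kept for a later re-index.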

Regards,

Vincent



Re: Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Vincent,

Here is the search code for MySQL:

https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/mysql/search.xpl

Separate indexing via SOLR would be good, but we won't have time to
work on this anytime soon. It would be even better if eXist could do
things properly. But are we sure we can't improve our XQuery
query to make it faster?

-Erik

On Mon, Apr 2, 2012 at 1:04 PM, Vincent Olivier <[hidden email]> wrote:

> Hi guys,
>
>
> Actually, I think there are 2 options at this point:
>
> MySQL persistence
> eXist persistence with SOLR indexing (at persistence level)
>
> Because I think that eXist's internal indexing doesn't scale, period.
>
> I'm going to try MySQL first (but I would like to know there is the query
> code for the summary page in that case).
>
> And also, I would like your thoughts on extending the eXist persistence to
> have the XML data sent to SOLR everytime an instance is persisted/updated.
>
> Regards,
>
> Vincent
>
>
> On 2012-03-26, at 12:14 PM, Vincent Olivier wrote:
>
> Hi Erik,
>
> Yes, actually I was focusing on something that is a easy detail.
>
> I have been looking at eXist's implementation of ft:query and it doesn't
> look like I will find an easy fix to use that with so many documents.
>
> More over, one gets the same problem when working with autocomplete controls
> in forms for collections of that size as well.
>
> I think the easiest fix for now would be to maintain an external index (I
> like SOLR) for both the summary pages and the autocomplete controls.
>
> So what I want to try now, is to add a SOLR submission after the user clicks
> on the submit button and before the form persistence submission to
> transparently send the XML to SOLR for external indexing and I would use the
> Orbeon collection id (containing the "data.xml") as the document ID in SOLR.
>
> I'm wondering what the code would look like, minimally, in order to achieve
> that.
>
> Thanks!
>
> Vincent
>
>
> On 2012-03-26, at 11:59 AM, Erik Bruchez wrote:
>
> Vincent,
>
> When Orbeon Forms writes an XML document, the document id is part of
> the path. So it's already available. Or do I not understand properly?
>
> -Erik
>
> On Thu, Mar 22, 2012 at 10:09 AM, Vincent Olivier <[hidden email]> wrote:
>
> Hi again,
>
>
> Sorry, I'm being dyslexic, here.
>
>
> By submission, I mean the minimal impact code change within the form itself
> to process the SOLR submission within the same user "submit" event that will
> trigger the eXist persistence submission.
>
>
> Hope this makes actual sense.
>
>
> Vincent
>
>
>
> On 2012-03-22, at 1:01 PM, Vincent Olivier wrote:
>
>
> Hi again,
>
>
> I'm waiting for the eXist mailing list to enlighten me on how the Lucene
> index is exposed to XQuery. Because if I'm limited to what the eXist doc is
> showing, it will never be good enough for large collection.
>
>
> Because, eXist requires an in-memory collection to be passed to the
> ft:query() method and also reads the documents for all the Lucene hits and
> re-builds a in-memory collection for that. So, for a query that returns a
> fair proportion of the collection's documents, that's twice the collection
> size for each ft:query() call.
>
>
> I'm not going to wait until the eXist community gets back to me and try plan
> B, instead: have a custom submission just send each form instance to an
> external SOLR setup and rewrite the summary query using only the SOLR index.
>
>
> What I need for this: is it possible to pass the document ID (in exist, this
> is the folder containing the "data.xml" file) along with the form instance
> XML as a POST to the SOLR service (very XML friendly).
>
>
> And if it is something that is of interest to any of you, I might post a
> little video on how to set this up.
>
>
> Vincent
>
>
>
> On 2012-03-17, at 1:10 PM, Vincent Olivier wrote:
>
>
> Hi Erik,
>
>
> I'm on it. But my guess would be that as long as there is a Lucene index
> somewhere, there is optimization to be made.
>
>
> More on this on Tuesday,
>
>
> Vincent
>
>
>
> On 2012-03-16, at 7:10 PM, Erik Bruchez wrote:
>
>
> Vincent,
>
>
> Mmh yes that makes sense ;) So here it is:
>
>
> First, thanks for trying the query. It's a good catch, and it might be
>
> the main reason for the slowness.
>
>
> However the question now is: how to fix this, assuming we do want to
>
> find out how many documents are in that collection?
>
>
> On the Lucene question: that's an eXist feature, and the answer is "I
>
> don't know". It woud be better to ask this on the exist-open
>
> mailng-list:
>
>
> http://sourceforge.net/mail/?group_id=17691
>
>
> And yes if you can keep helping on this it would be great!
>
>
> -Erik
>
>
> On Thu, Mar 15, 2012 at 5:33 AM, Vincent Olivier <[hidden email]> wrote:
>
> By help, I meant answering my questions, of course! :D
>
>
> I will look into Lucene hooks within eXist today.
>
>
> Thanks,
>
>
> V
>
>
> On 2012-03-14, at 2:19 PM, Vincent Olivier <[hidden email]> wrote:
>
>
> Yes, I will need help. But not on this (I'm quite good at profiling, much
> less so in XForms ;)). Please see my other message coming soon about
> repeated sections. If you can help me there, I can put more time on the
> performance problem. ;-)
>
>
> So, still with the same nightly build version and data and forms as last
> time. I run a simple XQuery that is part of your code. Actually, just the
> snippet where you count the number of documents (see attached).
>
>
> The same query takes 18ms for 2K data.xml instances and 30s for 20K
> instances.
>
>
> It seems to me that any call on "collection()" is awfully inefficient. Based
> on your code, you call it twice in the query! Once for the query, once for
> the count.
>
>
> Is there a way we could manipulate the Lucene index directly. I'm an old
> buddy of Lucene's and it never gave me that kind of bad performanceship.
> Ever.
>
>
> Please let me know if you would be interested in helping me rewrite this
> query.
>
>
> Vincent
>
>
>
>
> <person.png>
>
>
> <company.png>
>
>
>
>
> <count.xq>
>
>
>
>
> On 2012-03-13, at 7:11 PM, Erik Bruchez wrote:
>
>
> Cool, excellent. Let us know of you need help.
>
>
> -Erik
>
>
> On Tue, Mar 13, 2012 at 10:48 AM, Vincent Olivier <[hidden email]> wrote:
>
> Yes! I will run the Postman setup tomorrow and get back to you before the
>
> end of the week.
>
>
> Thanks!
>
>
> Vincent
>
>
>
> On 2012-03-13, at 2:19 AM, Erik Bruchez wrote:
>
>
> Vincent,
>
>
> For the improvement mentioned earlier in the thread, we used the Postman
>
> REST Client for Chrome to run a simplified version of the search query:
>
>
> https://chrome.google.com/webstore/detail/fdmmgilgnpjigdojojpjoooidkmcomcm/related
>
>
> Here is the query and the XPL file that runs it:
>
>
> https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xml
>
> https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xpl
>
>
> To run query, simply POST it to:
>
>
> http://localhost:8080/orbeon/fr/service/exist/search/[APP]/[FORM]
>
>
> By taking out parts of the query we were able to figure out the parts that
>
> were slow and improve on it. Is that something you are able to try?
>
>
> -Erik
>
>
> On Sat, Mar 10, 2012 at 2:33 PM, Vincent Olivier <[hidden email]> wrote:
>
>
> Hi Erik,
>
>
> So the reindexing made no noticeable changes.
>
>
> The "company" form, for 2k instances still loads at around 20 seconds. And
>
> the "person" form, with 20k instances still loads at around 3 minutes. See
>
> screens below.
>
>
> Any other trick I could try?
>
>
> Thanks!
>
>
> Vincent
>
>
>

Re: Orbeon Performance & Scalability

up4
Hi Erik,

Sorry for the late reply.

My problem is that the eXist XQuery interface to the Lucene index seems intrinsically flawed as far as performance is concerned. I submitted this issue to the eXist mailing list and got a reply from Wolfgang Meier. I have tried his solution to the best of my very limited XQuery knowledge, but even taking his reply into account, and given how simple my test query is, it really seems that eXist is to blame. It's obviously not so much the ft:query call that is the problem, but rather any use of the collection() function. If you can see what Wolfgang means and can provide some help on how to implement it (I have tried to reach him since, without success), I am willing to try it.

If you want, I can send you a compressed repository of the 20k docs I'm testing performance on.

I'm very much looking forward to getting Orbeon in shape for that kind of use case… but I'm investigating MySQL's performance now.

Vincent

On 2012-04-03, at 12:30 AM, Erik Bruchez wrote:

Vincent,

Here is the search code for MySQL:

https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/mysql/search.xpl

Separate indexing via SOLR would be good, but we won't have time to
work on this anytime soon. It would be even better if eXist could do
things properly. But are we sure that we can't improve our XQuery
query to make it faster?

-Erik

On Mon, Apr 2, 2012 at 1:04 PM, Vincent Olivier <[hidden email]> wrote:
Hi guys,


Actually, I think there are 2 options at this point:

1. MySQL persistence
2. eXist persistence with SOLR indexing (at the persistence level)

That's because I think eXist's internal indexing doesn't scale, period.

I'm going to try MySQL first (but I would like to know where the query
code for the summary page is in that case).

I would also like your thoughts on extending the eXist persistence to
have the XML data sent to SOLR every time an instance is persisted/updated.
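A minimal sketch of the Solr side of that idea, in Python. It assumes a Solr schema whose unique key is "id" and which has a field (or dynamic field) for each leaf element name in the form data. Neither Orbeon nor this thread provides such a schema, so both are assumptions. The hypothetical to_solr_add function turns one form instance into a Solr XML <add> message and uses the document ID as "id":

```python
import xml.etree.ElementTree as ET


def to_solr_add(document_id: str, form_data_xml: str) -> str:
    """Build a Solr XML <add> message for one form instance (hypothetical helper).

    Assumes the Solr schema uses "id" as its unique key and has a field
    (or dynamic field) for each leaf element name in the form data.
    """
    root = ET.fromstring(form_data_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = document_id
    for elem in root.iter():
        # Only leaf elements with non-blank text become Solr fields. Repeated
        # element names produce repeated fields, which Solr only accepts for
        # multiValued fields.
        if len(elem) == 0 and elem.text and elem.text.strip():
            local_name = elem.tag.split("}")[-1]  # drop any namespace URI
            ET.SubElement(doc, "field", name=local_name).text = elem.text.strip()
    return ET.tostring(add, encoding="unicode")
```

Depending on the Solr configuration, a newly added document may only become searchable after a commit, for example a separate <commit/> message.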

Regards,

Vincent


On 2012-03-26, at 12:14 PM, Vincent Olivier wrote:

Hi Erik,

Yes, actually I was focusing on something that is an easy detail.

I have been looking at eXist's implementation of ft:query and it doesn't
look like I will find an easy way to make it work with so many documents.

Moreover, the same problem arises with autocomplete controls in forms
when the collections are that large.

I think the easiest fix for now would be to maintain an external index (I
like SOLR) for both the summary pages and the autocomplete controls.

So what I want to try now is to add a SOLR submission that runs after the
user clicks the submit button and before the form persistence submission,
to transparently send the XML to SOLR for external indexing. I would use
the Orbeon collection ID (the collection containing "data.xml") as the
document ID in SOLR.

I'm wondering what the code would look like, minimally, in order to achieve
that.

Thanks!

Vincent


On 2012-03-26, at 11:59 AM, Erik Bruchez wrote:

Vincent,

When Orbeon Forms writes an XML document, the document id is part of
the path. So it's already available. Or do I not understand properly?
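Erik says the document ID is part of the path, and Vincent describes it as the name of the folder that contains "data.xml". If both hold, recovering the ID from the path takes only a few lines. The sketch below assumes that layout. The example path is hypothetical and is not taken from Orbeon's documentation:

```python
from pathlib import PurePosixPath


def document_id_from_path(path: str) -> str:
    """Return the name of the folder that directly contains data.xml."""
    p = PurePosixPath(path)
    if p.name != "data.xml":
        raise ValueError(f"expected a path ending in data.xml, got {path!r}")
    if not p.parent.name:
        raise ValueError(f"no folder above data.xml in {path!r}")
    return p.parent.name


# Hypothetical path layout, for illustration only.
print(document_id_from_path("/db/orbeon/fr/acme/person/data/4f2a9c/data.xml"))
# prints: 4f2a9c
```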

-Erik

On Thu, Mar 22, 2012 at 10:09 AM, Vincent Olivier <[hidden email]> wrote:

Hi again,


Sorry, I'm being dyslexic here.


By submission, I mean the minimal-impact code change within the form itself
to process the SOLR submission within the same user "submit" event that
triggers the eXist persistence submission.


Hope this makes actual sense.


Vincent



On 2012-03-22, at 1:01 PM, Vincent Olivier wrote:


Hi again,


I'm waiting for the eXist mailing list to enlighten me on how the Lucene
index is exposed to XQuery, because if I'm limited to what the eXist docs
show, it will never be good enough for large collections.


That's because eXist requires an in-memory collection to be passed to the
ft:query() function, and it also reads the documents for all the Lucene hits
and rebuilds an in-memory collection from them. So, for a query that returns
a fair proportion of the collection's documents, that's twice the collection
size for each ft:query() call.


Rather than wait for the eXist community to get back to me, I'm going to try
plan B: have a custom submission send each form instance to an external SOLR
setup, and rewrite the summary query to use only the SOLR index.


What I need for this: is it possible to pass the document ID (in eXist, this
is the folder containing the "data.xml" file) along with the form instance
XML in a POST to the SOLR service (which is very XML-friendly)?
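Here is a sketch of such a POST that uses only Python's standard library. It assumes a Solr instance at the classic single-core update URL http://localhost:8983/solr/update, so adjust that to your setup. It also assumes a schema with an "id" unique key. The request is built but not sent, so you can inspect it first. The build_solr_request name and the field names are hypothetical:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a Solr server with the classic single-core update URL.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_solr_request(document_id: str, fields: dict[str, str],
                       update_url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST that adds one document to Solr."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # The eXist document ID (the folder containing data.xml) becomes the Solr key.
    ET.SubElement(doc, "field", name="id").text = document_id
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = value
    body = ET.tostring(add, encoding="unicode").encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
```

To send the request, pass it to urllib.request.urlopen(req). Depending on the Solr configuration, you may need a separate commit before the document shows up in searches.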


And if it is something that is of interest to any of you, I might post a
little video on how to set this up.


Vincent



On 2012-03-17, at 1:10 PM, Vincent Olivier wrote:


Hi Erik,


I'm on it. But my guess would be that as long as there is a Lucene index
somewhere, there is optimization to be made.


More on this on Tuesday,


Vincent



On 2012-03-16, at 7:10 PM, Erik Bruchez wrote:


Vincent,


Mmh yes that makes sense ;) So here it is:


First, thanks for trying the query. It's a good catch, and it might be
the main reason for the slowness.


However, the question now is: how do we fix this, assuming we do want to
find out how many documents are in that collection?


On the Lucene question: that's an eXist feature, and the answer is "I
don't know". It would be better to ask this on the exist-open
mailing list:


http://sourceforge.net/mail/?group_id=17691


And yes if you can keep helping on this it would be great!


-Erik


On Thu, Mar 15, 2012 at 5:33 AM, Vincent Olivier <[hidden email]> wrote:

By help, I meant answering my questions, of course! :D


I will look into Lucene hooks within eXist today.


Thanks,


V


On 2012-03-14, at 2:19 PM, Vincent Olivier <[hidden email]> wrote:


Yes, I will need help. But not on this (I'm quite good at profiling, much
less so at XForms ;)). Please see my other message, coming soon, about
repeated sections. If you can help me there, I can put more time into the
performance problem. ;-)


So, still with the same nightly build version, data, and forms as last
time, I ran a simple XQuery that is part of your code: just the snippet
where you count the number of documents (see attached).


The same query takes 18ms for 2K data.xml instances and 30s for 20K
instances.
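Here is a quick back-of-the-envelope check on those two numbers. Ten times more documents cost about 1,667 times more time. Fitting a power law t ∝ n^k through only two points gives k ≈ 3.2. That is far worse than the roughly linear growth you would expect from simply counting documents. With only two points this is a rough indicator, and cache or memory effects can distort it:

```python
import math

# Timings reported above for the same document-count query.
small_n, small_seconds = 2_000, 0.018   # 2K data.xml instances, 18 ms
large_n, large_seconds = 20_000, 30.0   # 20K data.xml instances, 30 s

size_ratio = large_n / small_n              # 10x more documents
time_ratio = large_seconds / small_seconds  # about 1667x more time
# If time grows like n**k, then k = log(time_ratio) / log(size_ratio).
k = math.log(time_ratio) / math.log(size_ratio)
print(f"{size_ratio:.0f}x documents -> {time_ratio:.0f}x time, k = {k:.2f}")
# prints: 10x documents -> 1667x time, k = 3.22
```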


It seems to me that any call to "collection()" is awfully inefficient.
Judging by your code, you call it twice in the query: once for the query,
and once for the count.


Is there a way we could manipulate the Lucene index directly? I'm an old
buddy of Lucene's, and it has never given me that kind of bad performance.
Ever.


Please let me know if you would be interested in helping me rewrite this
query.


Vincent




<person.png>


<company.png>




<count.xq>




On 2012-03-13, at 7:11 PM, Erik Bruchez wrote:


Cool, excellent. Let us know if you need help.


-Erik


On Tue, Mar 13, 2012 at 10:48 AM, Vincent Olivier <[hidden email]> wrote:

Yes! I will run the Postman setup tomorrow and get back to you before the
end of the week.


Thanks!


Vincent



On 2012-03-13, at 2:19 AM, Erik Bruchez wrote:


Vincent,


For the improvement mentioned earlier in the thread, we used the Postman
REST Client for Chrome to run a simplified version of the search query:


https://chrome.google.com/webstore/detail/fdmmgilgnpjigdojojpjoooidkmcomcm/related


Here is the query and the XPL file that runs it:


https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xml

https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xpl


To run the query, simply POST it to:


http://localhost:8080/orbeon/fr/service/exist/search/[APP]/[FORM]
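If you prefer a script to Postman, here is a sketch of the same POST using Python's standard library. The {app} and {form} placeholders stand for [APP] and [FORM] in the URL above. The Content-Type value and the build_search_request name are assumptions:

```python
import urllib.parse
import urllib.request

# URL pattern from the message above; {app} and {form} replace [APP] and [FORM].
SEARCH_URL = "http://localhost:8080/orbeon/fr/service/exist/search/{app}/{form}"


def build_search_request(app: str, form: str, query_xml: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST of a search query document."""
    url = SEARCH_URL.format(app=urllib.parse.quote(app, safe=""),
                            form=urllib.parse.quote(form, safe=""))
    return urllib.request.Request(
        url,
        data=query_xml.encode("utf-8"),
        method="POST",
        # Assumption: the service accepts the query document as application/xml.
        headers={"Content-Type": "application/xml"},
    )
```

For example, build_search_request("acme", "person", open("search.xml").read()) targets the "person" form of a hypothetical "acme" app. urllib.request.urlopen then sends the request.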


By taking out parts of the query, we were able to figure out which parts
were slow and improve on them. Is that something you are able to try?
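A small timing harness can make that trial-and-error approach more systematic. Each variant is a callable, for example one that POSTs a trimmed copy of search.xml. The harness reports the median of a few runs, so a single cold-cache run does not dominate the result. This is a sketch, and time_variants and the variant names are hypothetical:

```python
import statistics
import time
from typing import Callable, Dict


def time_variants(variants: Dict[str, Callable[[], object]],
                  repeats: int = 3) -> Dict[str, float]:
    """Return the median wall-clock time, in seconds, of each variant."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results: Dict[str, float] = {}
    for name, run in variants.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()  # e.g. send one trimmed version of the query
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results
```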


-Erik


On Sat, Mar 10, 2012 at 2:33 PM, Vincent Olivier <[hidden email]> wrote:


Hi Erik,


So the reindexing made no noticeable changes.


The "company" form, with 2k instances, still loads in around 20 seconds, and
the "person" form, with 20k instances, still loads in around 3 minutes. See
the screenshots below.


Any other trick I could try?


Thanks!


Vincent



<Screen Shot 2012-03-10 at 5.28.51 PM.png><Screen Shot 2012-03-10 at 5.32.10 PM.png>





On 2012-03-07, at 12:30 AM, Erik Bruchez wrote:


Vincent,


The index should only make things better.


-Erik


On Tue, Mar 6, 2012 at 9:29 AM, Vincent Olivier <[hidden email]> wrote:


Hi Erik,



Thanks for your reply. I will try the Lucene index and eXist client
re-indexing tomorrow. But will it impact the default view of the summary
page (with no search criterion)?



Thanks!



Vincent





On 2012-03-06, at 12:10 AM, Erik Bruchez wrote:



Vincent,



This means that you should have the improved eXist search query [1],
and so things should be faster!



Did you make sure there is a proper Lucene [2] index configured in
eXist, and that you re-indexed your collections with the eXist client?



-Erik



[1] https://github.com/orbeon/orbeon-forms/commit/b33c9641364bb206ed334727baf3e000b2eb5fec
[2] http://wiki.orbeon.com/forms/doc/developer-guide/exist-configuration#TOC-Configuring-full-text-indexing



On Mon, Mar 5, 2012 at 3:48 PM, Vincent Olivier <[hidden email]> wrote:


Hi Guys,



I'm running yesterday's CE nightly build out-of-the-box on a Mac (Lion) with
8GB RAM assigned to the JVM with Tomcat 7.

I'm pre-populating a profile form with 20 000 (20K) XML data instances in
eXist through its REST interface.

My prepopulating script and I are the only users for now.

Loading an individual instance detail view in form runner is a breeze, but
the summary page (either the default view or search results) takes around 3
minutes to load.

So, what configuration changes should I implement to make this setup faster?
I looked at the wiki, but nothing seems to apply to the summary view.

Help would indeed be appreciated.

Please find attached an example of an instance data XML.

Regards,

Vincent

Re: Re: Orbeon Performance & Scalability

Erik Bruchez
Administrator
Vincent,

Thanks for the update.

Feel free to send me the repository with the 20k docs. I can't
guarantee I'll look at it immediately.

Please keep us posted on the MySQL side of things!

-Erik

On Wed, Apr 11, 2012 at 3:13 PM, Vincent Olivier <[hidden email]> wrote:

> Hi Erik,
>
> Sorry for the late reply.
>
> My problem is that the eXist XQuery interface to the Lucene index seems
> intrinsically flawed as far as performance is concerned. I submitted this
> issue to the eXist mailing list and got a reply from Wolfgang Meier. I have
> tried his solution to the best of my very limited XQuery knowledge, but even
> taking his reply into account, and given how simple my test query is, it
> really seems that eXist is to blame. It's obviously not so much the ft:query
> call that is the problem, but rather any use of the collection() function.
> If you can see what Wolfgang means and can provide some help on how to
> implement it (I have tried to reach him since, without success), I am
> willing to try it.
>
> If you want, I can send you a compressed repository of the 20k docs I'm
> testing performance on.
>
> I'm very much looking forward to getting Orbeon in shape for that kind of
> use case… but I'm investigating MySQL's performance now.
>
> Vincent