Hi Guys,
I'm running yesterday's CE nightly build out-of-the-box on a Mac (Lion), on Tomcat 7, with 8 GB of RAM assigned to the JVM.

I'm pre-populating a profile form with 20,000 (20K) XML data instances in eXist through its REST interface. My pre-populating script and I are the only users for now.

Loading an individual instance's detail view in Form Runner is a breeze, but the summary page (either the default view or search results) takes around 3 minutes to load.

So, what configuration changes should I make to speed up this setup? I looked at the wiki, but nothing seems to apply to the summary view.

Help would indeed be appreciated. Please find attached an example of an instance data XML.

Regards,
Vincent

-- 
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Attachment: data.xml (1K)
Vincent,
Since you're running yesterday's nightly build, you should already have the improved eXist search query [1], so things should be faster!

Did you make sure there is a proper Lucene [2] index configured in eXist, and that you re-indexed your collections with the eXist client?

-Erik

[1] https://github.com/orbeon/orbeon-forms/commit/b33c9641364bb206ed334727baf3e000b2eb5fec
[2] http://wiki.orbeon.com/forms/doc/developer-guide/exist-configuration#TOC-Configuring-full-text-indexing
Hi Erik,
Thanks for your reply. I will try the Lucene index and re-indexing with the eXist client tomorrow. But will it affect the default view of the summary page (with no search criteria)?

Thanks!
Vincent
Vincent,
The index should only make things better.

-Erik
Hi Erik,
So the re-indexing made no noticeable difference. The "company" form, with 2K instances, still takes around 20 seconds to load, and the "person" form, with 20K instances, still takes around 3 minutes. See the screenshots below.

Any other trick I could try?

Thanks!
Vincent
Vincent,
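For the improvement mentioned earlier in the thread, we used the Postman REST Client for Chrome to run a simplified version of the search query:

https://chrome.google.com/webstore/detail/fdmmgilgnpjigdojojpjoooidkmcomcm/related

Here is the query and the XPL file that runs it:

https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xml
https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/exist/search.xpl

To run the query, simply POST it to:

http://localhost:8080/orbeon/fr/service/exist/search/[APP]/[FORM]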
By removing parts of the query, we were able to find the slow parts and improve them. Is that something you could try?

-Erik
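For anyone who prefers scripting this over Postman, here is a minimal Python sketch of the same workflow. It assumes the default local URL above and assumes the service accepts the query document as an XML request body (the `application/xml` Content-Type is an assumption). Each POST is timed, so a full copy of search.xml and trimmed-down copies can be compared side by side. The helper names and the app name are hypothetical placeholders.

```python
import time
import urllib.request

# Search service URL from the thread; host and port assume a default
# local Tomcat deployment of Orbeon Forms.
SEARCH_BASE = "http://localhost:8080/orbeon/fr/service/exist/search"


def search_url(app: str, form: str) -> str:
    """Fill in the [APP]/[FORM] placeholders of the search service URL."""
    return f"{SEARCH_BASE}/{app}/{form}"


def time_search(url: str, query_body: bytes, timeout: float = 600.0):
    """POST a search query document and return (HTTP status, seconds elapsed).

    The Content-Type header is an assumption; adjust it if the service
    expects something else.
    """
    request = urllib.request.Request(
        url,
        data=query_body,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # drain the body so the timing covers the full response
        status = response.status
    return status, time.perf_counter() - start


def time_variants(app: str, form: str, variants: dict, repeats: int = 3) -> dict:
    """Time several versions of the query body, keeping the best run of each."""
    url = search_url(app, form)
    results = {}
    for name, body in variants.items():
        results[name] = min(time_search(url, body)[1] for _ in range(repeats))
    return results
```

For example, save the full query as full.xml and a copy with one clause removed as trimmed.xml, then call `time_variants("my-app", "person", {"full": full_bytes, "trimmed": trimmed_bytes})`, where `my-app` stands in for your actual app name. Taking the minimum of a few runs reduces noise from cold caches on the first request.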
Yes! I will run the Postman setup tomorrow and get back to you before the end of the week.
Thanks!
Vincent
Cool, excellent. Let us know if you need help.
-Erik
Yes, I will need help, but not on this (I'm quite good at profiling, much less so at XForms ;)). Please see my other message, coming soon, about repeated sections. If you can help me there, I can spend more time on the performance problem. ;-)
So, still with the same nightly build, data, and forms as last time, I ran a simple XQuery taken from your code: just the snippet where you count the number of documents (see the attached count.xq).

The same query takes 18 ms for 2K data.xml instances and 30 s for 20K instances.

It seems to me that any call to collection() is awfully inefficient, and, based on your code, you call it twice in the query: once for the search itself and once for the count.

Is there a way we could manipulate the Lucene index directly? I'm an old buddy of Lucene's, and it has never given me performance that bad. Ever.

Please let me know if you would be interested in helping me rewrite this query.

Vincent
Attachments: person.png (82K), company.png (81K), count.xq (163 bytes)
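To put these timings in perspective, here is a quick back-of-the-envelope check. If run time grows roughly as n^k with collection size n, then two measurements give k = log(t2/t1) / log(n2/n1). The Python sketch below applies that to the count.xq figures above and to the summary-page figures from the earlier message (about 20 s for 2K and about 3 minutes for 20K). With only two points per series, and possible cache warm-up effects, treat the result as a rough indicator rather than a fitted model.

```python
import math


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Estimate k in t ~ c * n**k from two (size, time) measurements.

    Only two points are used, so this is a rough growth indicator,
    not a fitted model.
    """
    return math.log(t2 / t1) / math.log(n2 / n1)


# Figures quoted in the thread (sizes in documents, times in seconds).
count_query = scaling_exponent(2_000, 0.018, 20_000, 30.0)   # count.xq snippet
summary_page = scaling_exponent(2_000, 20.0, 20_000, 180.0)  # whole summary page

print(f"count query:  k = {count_query:.2f}")
print(f"summary page: k = {summary_page:.2f}")
```

It prints k = 3.22 for the isolated count query and k = 0.95 for the whole summary page. In other words, the count grows much faster than linearly with collection size, while the page as a whole grows roughly linearly.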
By help, I meant answering my questions, of course! :D
I will look into Lucene hooks within eXist today.

Thanks,
V
Vincent,
Mmh, yes, that makes sense ;) So here it is:

First, thanks for trying the query. It's a good catch, and it might be the main reason for the slowness. The question now is how to fix this, assuming we do want to find out how many documents are in that collection.

On the Lucene question: that's an eXist feature, and the answer is "I don't know". It would be better to ask on the exist-open mailing list: http://sourceforge.net/mail/?group_id=17691

And yes, if you can keep helping on this, it would be great!

-Erik
>>> To unsubscribe: mailto:[hidden email] >>> For general help: mailto:[hidden email]?subject=help >>> OW2 mailing lists service home page: http://www.ow2.org/wws >> >> >> -- >> You receive this message as a subscriber of the [hidden email] mailing list. >> To unsubscribe: mailto:[hidden email] >> For general help: mailto:[hidden email]?subject=help >> OW2 mailing lists service home page: http://www.ow2.org/wws > > > -- > You receive this message as a subscriber of the [hidden email] mailing list. > To unsubscribe: mailto:[hidden email] > For general help: mailto:[hidden email]?subject=help > OW2 mailing lists service home page: http://www.ow2.org/wws > -- You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: mailto:[hidden email] For general help: mailto:[hidden email]?subject=help OW2 mailing lists service home page: http://www.ow2.org/wws |
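Erik's quoted Postman workflow (POST the search query to `.../fr/service/exist/search/[APP]/[FORM]`, then take parts out to see which are slow) can also be scripted, so that every variant of the query is timed the same way. Below is a minimal Python sketch using only the standard library. The `application/xml` content type, the helper names, and the repeat count are assumptions for illustration, not something taken from Orbeon's documentation.

```python
import time
import urllib.request
from typing import List, Tuple

# Default taken from the thread: Orbeon running on Tomcat at localhost:8080.
SEARCH_BASE = "http://localhost:8080/orbeon/fr/service/exist/search"


def search_url(app: str, form: str, base: str = SEARCH_BASE) -> str:
    """Fill in the [APP]/[FORM] placeholders of the search service URL."""
    return f"{base.rstrip('/')}/{app}/{form}"


def time_search(app: str, form: str, query_xml: bytes,
                runs: int = 3) -> Tuple[int, List[float]]:
    """POST the same search query `runs` times.

    Returns the HTTP status of the last run and the wall-clock time of
    every run in seconds, so cold-cache (first) and warm runs can be
    compared separately.
    """
    timings: List[float] = []
    status = 0
    for _ in range(runs):
        req = urllib.request.Request(
            search_url(app, form),
            data=query_xml,
            headers={"Content-Type": "application/xml"},  # assumed content type
            method="POST",
        )
        start = time.perf_counter()
        with urllib.request.urlopen(req) as resp:
            resp.read()  # include the time needed to transfer the full result
            status = resp.status
        timings.append(time.perf_counter() - start)
    return status, timings
```

For example, with a local copy of `search.xml` (the app name `myapp` is hypothetical): `status, timings = time_search("myapp", "person", open("search.xml", "rb").read())`. Editing the query file between runs and comparing the timings reproduces the "take parts out" approach without a browser.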
Hi Erik,
I'm on it. But my guess is that as long as there is a Lucene index somewhere, there is room for optimization.

More on this on Tuesday,

Vincent

On 2012-03-16, at 7:10 PM, Erik Bruchez wrote:

> Vincent,
>
> Mmh yes that makes sense ;) So here it is:
>
> First, thanks for trying the query. It's a good catch, and it might be the main reason for the slowness.
>
> However, the question now is: how do we fix this, assuming we do want to find out how many documents are in that collection?
>
> On the Lucene question: that's an eXist feature, and the answer is "I don't know". It would be better to ask this on the exist-open mailing list:
>
> http://sourceforge.net/mail/?group_id=17691
>
> And yes, if you can keep helping on this, it would be great!
>
> -Erik
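To put the timings reported earlier in the thread in perspective (count query: 18 ms for 2K instances versus 30 s for 20K; summary page: about 20 s for the 2K-instance "company" form versus about 3 minutes for the 20K-instance "person" form), one can fit a power law t = c * n^k through each pair of points. The quick Python check below does that. Two points per series only give a rough estimate, and the two summary timings come from different forms, so treat the result as indicative.

```python
import math


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Exponent k of the power law t = c * n**k through (n1, t1) and (n2, t2).

    k near 1 means run time grows linearly with the number of documents;
    k well above 1 means it grows much faster than the collection does.
    """
    return math.log(t2 / t1) / math.log(n2 / n1)


# Timings reported in the thread, in seconds.
count_k = scaling_exponent(2_000, 0.018, 20_000, 30.0)    # count-only XQuery
summary_k = scaling_exponent(2_000, 20.0, 20_000, 180.0)  # summary page load

print(f"count query:  k = {count_k:.2f}")    # k = 3.22
print(f"summary page: k = {summary_k:.2f}")  # k = 0.95
```

If the count grew linearly, 20K documents would take roughly 180 ms rather than 30 s. The summary page as a whole, by contrast, grows roughly in proportion to the number of documents. Two data points cannot show where the extra cost of the count comes from.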
Hi again,
I'm waiting for the eXist mailing list to enlighten me on how the Lucene index is exposed to XQuery, because if I'm limited to what the eXist documentation shows, it will never be good enough for large collections.

eXist requires an in-memory collection to be passed to the ft:query() function, and it also reads the documents for all the Lucene hits and rebuilds an in-memory collection from them. So, for a query that returns a fair proportion of the collection's documents, that's twice the collection size for each ft:query() call.

Rather than wait for the eXist community to get back to me, I'm going to try plan B: have a custom submission send each form instance to an external Solr setup, and rewrite the summary query to use only the Solr index.

What I need for this: is it possible to pass the document ID (in eXist, this is the folder containing the "data.xml" file) along with the form instance XML in a POST to the Solr service (which is very XML-friendly)?

And if this is of interest to any of you, I might post a little video on how to set it up.

Vincent

On 2012-03-17, at 1:10 PM, Vincent Olivier wrote:

> Hi Erik,
>
> I'm on it. But my guess is that as long as there is a Lucene index somewhere, there is room for optimization.
>
> More on this on Tuesday,
>
> Vincent
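The plan B question above (send the document ID plus the form instance XML to Solr in one POST) maps naturally onto Solr's XML update format, `<add><doc><field name="...">...</field></doc></add>`. The Python sketch below shows one way to do it. It assumes that the document ID is the name of the folder holding `data.xml`, as described above, and that every leaf element of the form data becomes a Solr field named after the element. The helper names, the example path layout, and the field-naming scheme are hypothetical; the Solr schema would have to accept those field names (for example via dynamic fields) and mark repeated ones as multiValued.

```python
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def document_id_from_path(exist_path: str) -> str:
    """Return the document ID for a path ending in <document-id>/data.xml."""
    path = PurePosixPath(exist_path)
    if path.name != "data.xml":
        raise ValueError(f"not a data.xml path: {exist_path}")
    return path.parent.name


def solr_add_message(document_id: str, form_xml: str) -> bytes:
    """Build a Solr XML <add><doc>...</doc></add> message for one form instance.

    The document ID goes into the "id" field (assumed to be the schema's
    unique key). Every leaf element with non-blank text becomes a field
    named after the element's local name; repeated elements (for example
    in repeated sections) produce repeated fields.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = document_id
    for elem in ET.fromstring(form_xml).iter():
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop any {namespace} prefix
            ET.SubElement(doc, "field", name=local_name).text = text
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(update_url: str, message: bytes) -> int:
    """POST an update message to Solr, commit it, and return the HTTP status."""
    req = urllib.request.Request(
        update_url + "?commit=true",
        data=message,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage might look like `post_to_solr("http://localhost:8983/solr/update", solr_add_message(document_id_from_path(path), xml_text))`. Port 8983 is Solr's default, but the exact update URL depends on the Solr version and core setup. Committing after every document keeps the sketch simple. For a bulk load like the 20K pre-population, putting many `<doc>` elements in one `<add>` and committing once at the end is likely to be much faster.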
Hi again,
Sorry, I'm being dyslexic here. By submission, I mean the minimal-impact code change within the form itself that processes the Solr submission on the same user "submit" event that triggers the eXist persistence submission.

Hope this actually makes sense.

Vincent

On 2012-03-22, at 1:01 PM, Vincent Olivier wrote:

> Hi again,
>
> I'm waiting for the eXist mailing list to enlighten me on how the Lucene index is exposed to XQuery, because if I'm limited to what the eXist documentation shows, it will never be good enough for large collections.
>
> eXist requires an in-memory collection to be passed to the ft:query() function, and it also reads the documents for all the Lucene hits and rebuilds an in-memory collection from them. So, for a query that returns a fair proportion of the collection's documents, that's twice the collection size for each ft:query() call.
>
> Rather than wait for the eXist community to get back to me, I'm going to try plan B: have a custom submission send each form instance to an external Solr setup, and rewrite the summary query to use only the Solr index.
>
> What I need for this: is it possible to pass the document ID (in eXist, this is the folder containing the "data.xml" file) along with the form instance XML in a POST to the Solr service (which is very XML-friendly)?
>
> And if this is of interest to any of you, I might post a little video on how to set it up.
>
> Vincent
>> To unsubscribe: mailto:[hidden email] >> For general help: mailto:[hidden email]?subject=help >> OW2 mailing lists service home page: http://www.ow2.org/wws > > > -- > You receive this message as a subscriber of the [hidden email] mailing list. > To unsubscribe: mailto:[hidden email] > For general help: mailto:[hidden email]?subject=help > OW2 mailing lists service home page: http://www.ow2.org/wws -- You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: mailto:[hidden email] For general help: mailto:[hidden email]?subject=help OW2 mailing lists service home page: http://www.ow2.org/wws |
Administrator
Vincent,
When Orbeon Forms writes an XML document, the document id is part of the path. So it's already available. Or do I not understand properly? -Erik On Thu, Mar 22, 2012 at 10:09 AM, Vincent Olivier <[hidden email]> wrote:
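To illustrate Erik's point that the document id can be recovered from the path, here is a minimal sketch. It assumes a data path shaped like `.../crud/<app>/<form>/data/<document-id>/data.xml`. That layout, the `DocRef` type, and the `parse_data_path` helper are hypothetical and are not taken from the Orbeon sources, so check them against the paths your build actually writes.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical path layout (an assumption, not taken from the Orbeon sources):
#   .../crud/<app>/<form>/data/<document-id>/data.xml
_DATA_PATH = re.compile(
    r"/crud/(?P<app>[^/]+)/(?P<form>[^/]+)/data/(?P<doc_id>[^/]+)/data\.xml$"
)


class DocRef(NamedTuple):
    app: str
    form: str
    doc_id: str


def parse_data_path(path: str) -> Optional[DocRef]:
    """Extract app, form and document id from a data.xml path, or None if it doesn't match."""
    match = _DATA_PATH.search(path)
    if match is None:
        return None
    return DocRef(match["app"], match["form"], match["doc_id"])
```

For example, `parse_data_path("/orbeon/fr/service/exist/crud/acme/person/data/4f2a9c/data.xml")` returns `DocRef(app='acme', form='person', doc_id='4f2a9c')`. The `doc_id` value is what could serve as the unique key in an external index.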
Hi Erik,
Yes, actually I was focusing on what is really an easy detail. I have been looking at eXist's implementation of ft:query, and it doesn't look like I will find an easy fix that makes it usable with so many documents. Moreover, one gets the same problem with autocomplete controls in forms for collections of that size. I think the easiest fix for now would be to maintain an external index (I like SOLR) for both the summary pages and the autocomplete controls. So what I want to try now is to add a SOLR submission that runs after the user clicks the submit button and before the form persistence submission. It would transparently send the XML to SOLR for external indexing, and I would use the Orbeon collection id (the collection containing the "data.xml") as the document ID in SOLR. I'm wondering what the minimal code to achieve that would look like. Thanks! Vincent On 2012-03-26, at 11:59 AM, Erik Bruchez wrote:
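As a rough idea of what the payload for such a SOLR submission could contain, here is a minimal sketch that turns one form instance into a Solr XML `<add><doc>` update message. The document id becomes the `id` field. Each leaf element with non-blank text becomes a field named after the element's local name. That flattening scheme is an assumption, and your Solr schema would have to declare those fields or map them with dynamic fields. The update URL in `post_to_solr` is also a hypothetical default.

```python
import urllib.request
import xml.etree.ElementTree as ET


def to_solr_add(doc_id: str, form_xml: str) -> bytes:
    """Build a Solr XML <add> message for one form instance (flattening scheme is an assumption)."""
    root = ET.fromstring(form_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = doc_id
    for elem in root.iter():
        # Only leaf elements with non-blank text become fields.
        if len(elem) == 0 and elem.text and elem.text.strip():
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
            field = ET.SubElement(doc, "field", name=local_name)
            field.text = elem.text.strip()
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(payload: bytes,
                 update_url: str = "http://localhost:8983/solr/update?commit=true") -> int:
    """POST an update message to Solr and return the HTTP status (URL is a hypothetical default)."""
    request = urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Committing on every request keeps the sketch simple. With many writes, batching or relying on Solr's autocommit is probably preferable.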
Hi guys,
Actually, I think there are 2 options at this point:
- MySQL persistence
- eXist persistence with SOLR indexing (at persistence level)
I say this because I think eXist's internal indexing doesn't scale, period. I'm going to try MySQL first (but I would like to know where the query code for the summary page is in that case). I would also like your thoughts on extending the eXist persistence to send the XML data to SOLR every time an instance is persisted or updated. Regards, Vincent On 2012-03-26, at 12:14 PM, Vincent Olivier wrote:
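The idea of indexing "at persistence level" amounts to a write-through hook: every save goes to the primary store and is then pushed to the external index. Here is a minimal sketch of that pattern with pluggable callables. `IndexingStore` and its callables are hypothetical and do not correspond to any Orbeon Forms or eXist API.

```python
from typing import Callable, List


class IndexingStore:
    """Hypothetical persistence wrapper: persist first, then push the same XML to an external index."""

    def __init__(self,
                 persist: Callable[[str, str], None],
                 index: Callable[[str, str], None]) -> None:
        self._persist = persist
        self._index = index
        self.failed_index_ids: List[str] = []

    def save(self, doc_id: str, xml: str) -> None:
        # The primary store (e.g. eXist) stays the source of truth, so it is written first.
        self._persist(doc_id, xml)
        try:
            self._index(doc_id, xml)
        except Exception:
            # Don't fail the user's save because the index is unavailable;
            # remember the id so it can be re-indexed later.
            self.failed_index_ids.append(doc_id)
```

Writing to the store before the index means the index never refers to a document that failed to save. The cost is that the index can briefly lag behind, and failed ids must be replayed. Indexing before persistence, as suggested earlier in the thread, has the opposite trade-off.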
Administrator
Vincent,
Here is the search code for MySQL: https://github.com/orbeon/orbeon-forms/blob/master/src/resources/apps/fr/persistence/mysql/search.xpl Separate indexing via SOLR would be good, but we won't have time to work on this anytime soon. It would be even better if eXist could do things properly. But can we really rule out improving our XQuery query so that it runs faster? -Erik On Mon, Apr 2, 2012 at 1:04 PM, Vincent Olivier <[hidden email]> wrote:
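If the summary page were served from a separate SOLR index, both one page of results and the total document count could come from a single query. Solr's `numFound` replaces the separate count over the whole collection. Here is a minimal sketch using Solr's standard `select` parameters (`q`, `start`, `rows`, `fl`, `wt`). The base URL and the choice to return only the `id` field are assumptions.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode


def summary_query_url(base: str, query: str, page: int, page_size: int) -> str:
    """Build a Solr select URL for one summary page (page numbers start at 1)."""
    params = {
        "q": query or "*:*",              # no search criterion: match every document
        "start": (page - 1) * page_size,  # offset of the first hit on this page
        "rows": page_size,
        "fl": "id",                       # only the ids; details can be loaded separately
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"


def parse_summary_response(body: str) -> Tuple[int, List[str]]:
    """Return (total hit count, ids on this page) from a Solr JSON response."""
    data: Dict[str, Any] = json.loads(body)
    response = data["response"]
    return response["numFound"], [doc["id"] for doc in response["docs"]]
```

For example, `summary_query_url("http://localhost:8983/solr", "", 3, 10)` asks for hits 21 to 30 of a match-all query.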
Hi Erik,
Sorry for the late reply. My problem is that the eXist XQuery interface to the Lucene index seems intrinsically flawed as far as performance is concerned. I submitted this issue to the eXist mailing list and got a reply from Wolfgang Meier. I tried his solution to the best of my very limited XQuery knowledge, but even taking his reply into account, and given how simple my test query is, it seems that eXist really is to blame. It's obviously not so much the ft:query call that is the problem, but rather any use of the collection() function. If you see what Wolfgang means and can provide some help on how to implement it (I have tried to reach him since, without success), I am willing to try it. If you want, I can send you a compressed repository of the 20k docs I'm testing performance on. I'm very much looking forward to getting Orbeon in shape for that kind of use case… But I'm investigating MySQL's performance now. Vincent On 2012-04-03, at 12:30 AM, Erik Bruchez wrote:
|
Vincent,
Thanks for the update. Feel free to send me the repository with the 20k docs. I can't guarantee I'll look at it immediately. Please keep us posted on the MySQL side of things!

-Erik

On Wed, Apr 11, 2012 at 3:13 PM, Vincent Olivier <[hidden email]> wrote:
|