Hello,
We're using Orbeon with the mysql persistance layer and we're experiencing some serious performance/scalability issues. I wasn't able to find this in the known issues, but I think it may be an issue for Orbeon. Does anyone else have problems using the mysql persistence service? The problem appears when there are a few thousand records in the "orbeon_form_data" table; it probably depends on the amount of RAM in the machine. With our 2GB machine, it starts to be intolerable around 3K records. When submitting a form (put), or getting a specific form submission (get), there is no problem, but when asking for the form summary (search), the response time becomes enormous, to the point where it simply hangs the entire application. I tried looking at the xpl files that implement the mysql service, to reconstruct the query that's being sent to the mysql db when doing a search. Here's what I end up with: (a query used while getting the summary for a form 'test2' in an app 'test'). select ( select count(*) from orbeon_form_data where (app, form, document_id, last_modified) in ( select app, form, document_id, max(last_modified) last_modified from orbeon_form_data where app = 'test' and form = 'test2' group by app, form, document_id) and deleted = 'N' ) total, ( select count(*) from ( select data.created, data.last_modified, data.document_id, data.app, data.form, extractValue(data.xml,'//control-1') from cbpldb.orbeon_form_data data, ( select max(last_modified) last_modified, app, form, document_id from cbpldb.orbeon_form_data where app = 'test' and form = 'test2' group by app, form, document_id ) latest where data.last_modified = latest.last_modified and data.app = latest.app and data.form = latest.form and data.document_id = latest.document_id and data.deleted = 'N' order by created desc ) a ) search_total I don't know if I reconstructed this from the xpl correctly, but my sql-knowledgeable colleagues tell me that it has 4 nested select statements and an exponential complexity. Another colleague did a MySql Explain: EXPLAIN select ( select count(*) from orbeon_form_data where (app, form, document_id, last_modified) in ( select app, form, document_id, max(last_modified) last_modified from orbeon_form_data where app = 'testing' and form = 'simpleform' group by app, form, document_id) and deleted = 'N' ) total, ( select count(*) from ( select data.created, data.last_modified, data.document_id , extractValue(data.xml, '/*/section-1/control-1') detail_1 from orbeon_form_data data, ( select max(last_modified) last_modified, app, form, document_id from orbeon_form_data where app = 'testing' and form = 'simpleform' group by app, form, document_id ) latest where data.last_modified = latest.last_modified and data.app = latest.app and data.form = latest.form and data.document_id = latest.document_id and data.deleted = 'N' order by created desc )a ) search_total: *** row 1 *** table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: No tables used *** row 2 *** table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: Select tables optimized away *** row 3 *** table: <derived6> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3 Extra: Using temporary; Using filesort *** row 4 *** table: data type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3287 Extra: Using where; Using join buffer *** row 5 *** table: orbeon_form_data type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3287 Extra: Using where; Using temporary; Using filesort *** row 6 *** table: orbeon_form_data type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3287 Extra: Using where *** row 7 *** table: orbeon_form_data type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3287 Extra: Using where; Using temporary; Using filesort Besides using another persistence service, are there any workarounds for this? For now, we will probably use another database for each form, to keep the amount of records low, but this is not ideal. Does anyone know if the Oracle persistence uses similar queries? kind regards, Koen Vanderkimpen -- You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: mailto:[hidden email] For general help: mailto:[hidden email]?subject=help OW2 mailing lists service home page: http://www.ow2.org/wws |
Administrator
|
Hi Koen,
I suspect this is caused by extractValue(). My understanding is that MySQL treats XML as "just-a-blob", and that ou can't build indexes on it. So this function will parse the content of all your rows when the query runs, which isn't working very well as the number of rows grows. Could you try editing the fr/persistence/exist/search.xpl to remove the extractValue() (or replace it by something that returns a constant value), to see if this is indeed the problem? Alex
--
Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet |
Hi,
Thanks for the reply, this is definitely part of the answer. Unfortunately, we can't test this at the moment. However: I think it's only part of the problem: during execution of the query, mysql has to do a sort on last_modified to find the latest version. Up to a certain point, this can be done in memory, and it's fast. But when mysql runs out of memory, it uses the disk, and the query becomes really slow. This is especially so because the sort, I think, has to be done many times Koen Connect:
Hi Koen, I suspect this is caused by extractValue(). My understanding is that MySQL treats XML as "just-a-blob", and that ou can't build indexes on it. So this function will parse the content of all your rows when the query runs, which isn't working very well as the number of rows grows. Could you try editing the fr/persistence/exist/search.xpl to remove the extractValue() (or replace it by something that returns a constant value), to see if this is indeed the problem? Alex -- View this message in context: http://orbeon-forms-ops-users.24843.n4.nabble.com/mysql-persistence-seems-not-to-scale-well-tp4634494p4655189.html Sent from the Orbeon Forms (ops-users) mailing list archive at Nabble.com. -- You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: [hidden email] For general help: [hidden email] OW2 mailing lists service home page: http://www.ow2.org/wws -- You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: mailto:[hidden email] For general help: mailto:[hidden email]?subject=help OW2 mailing lists service home page: http://www.ow2.org/wws |
Administrator
|
Hi Koen,
Sounds good, you'll let us know when/if you get a chance to test that. About the sorting on last_modified, I am less worried about it, as if this is indeed a bottleneck, adding an index in MySQL will most likely take care of that. However, for the extractValue(), we'll most likely need to add columns to store those values as-is, so we don't have to call extractValue() at all. The persistence layer can then set the values in those columns when saving new data, and update them when a form is published. Alex
--
Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet |
Hi,
I tried adding the following indexes on orbeon_form_data: document_id last_modified document_id,last_modified app,form app,form,document_id,last_modified This resulted in some performance gain, but not enough (calling the form runner's persistence layer for a single document still takes at least 300ms, and sometimes over 2s, searches now range from 1s to 8s) I haven't explored yet if this degrades for a higher number of records, as it did before (we have some 250 entries in the table right now) Are these the indexes you would suggest? I can also tell you that: select count(*) from orbeon_form_data where (app, form, document_id, last_modified) in (select app, form, document_id, max(last_modified) last_modified from orbeon_form_data where app = 'test' and form = 'test4' group by app, form, document_id) and deleted = 'N' takes half a second when I execute it directly on mysql, and select app, form, document_id, max(last_modified) last_modified from orbeon_form_data where app = 'test' and form = 'test4' group by app, form, document_id; executes immediately Also, my main problem right now is I'm trying to call the orbeon RESTful persistence api from a Java program on the same server to get ALL the data for a certain form (I mean, the latest modified and not deleted, but with all field data). This has to work in a generic way (so I can't hardcode field names), and I don't really know how to build the query. So what I do, is a simple POST search for all the document_id's (ok, but could be faster), parse this in Java (=fast), and then iteratively GET all those documents (=very slow, when there are a lot of documents) and then parse those in Java(=fast). I know the fr summary view does this more efficiently, but I don't know how to reproduce this in my code. What would a good search POST to the api look like to get _all_ data for a form? Thanks, Koen Vanderkimpen Connect:
Hi Koen, Sounds good, you'll let us know when/if you get a chance to test that. About the sorting on last_modified, I am less worried about it, as if this is indeed a bottleneck, adding an index in MySQL will most likely take care of that. However, for the extractValue(), we'll most likely need to add columns to store those values as-is, so we don't have to call extractValue() at all. The persistence layer can then set the values in those columns when saving new data, and update them when a form is published. Alex -- View this message in context: http://orbeon-forms-ops-users.24843.n4.nabble.com/mysql-persistence-seems-not-to-scale-well-tp4634494p4655197.html Sent from the Orbeon Forms (ops-users) mailing list archive at Nabble.com. -- You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: [hidden email] For general help: [hidden email] OW2 mailing lists service home page: http://www.ow2.org/wws -- You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: mailto:[hidden email] For general help: mailto:[hidden email]?subject=help OW2 mailing lists service home page: http://www.ow2.org/wws |
Administrator
|
Hi Koen,
This is an interesting question. I don't have a huge number of rows in my orbeon_form_data (around 900), and as you say, the following is "instant" even without an index: select document_id, max(last_modified) from orbeon_form_data where app = 'orbeon' and form = 'builder' and deleted = 'N' group by document_id It returns about 150 rows for me. So why would the following take 5 seconds? (And I am here just getting the document_id, to check that the time isn't spent retrieving the larger XML data.) select document_id from orbeon_form_data where (document_id, last_modified) in ( select document_id, max(last_modified) from orbeon_form_data where app = 'orbeon' and form = 'builder' and deleted = 'N' group by document_id ); Creating an index on (document_id, last_modified) as follows doesn't seem to improve this. create unique index orbeon_form_data_id_last on orbeon_form_data (document_id, last_modified); However, if you write the join more explicitly, rather than using "in", then the query runs "instantly" again. So you can use: select t.document_id from orbeon_form_data t, ( select document_id, max(last_modified) last_modified from orbeon_form_data where app = 'orbeon' and form = 'builder' and deleted = 'N' group by document_id ) m where t.document_id = m.document_id and t.last_modified = m.last_modified; You'll let us know how this works for you. Alex
--
Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet |
Free forum by Nabble | Edit this page |