mysql persistence seems not to scale well

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

mysql persistence seems not to scale well

koenvdk
Hello,

We're using Orbeon with the mysql persistance layer and we're experiencing some serious performance/scalability issues. I wasn't able to find this in the known issues, but I think it may be an issue for Orbeon. Does anyone else have problems using the mysql persistence service?

The problem appears when there are a few thousand records in the "orbeon_form_data" table; it probably depends on the amount of RAM in the machine. With our 2GB machine, it starts to be intolerable around 3K records.

When submitting a form (put), or getting a specific form submission (get), there is no problem, but when asking for the form summary (search), the response time becomes enormous, to the point where it simply hangs the entire application.

I tried looking at the xpl files that implement the mysql service, to reconstruct the query that's being sent to the mysql db when doing a search. Here's what I end up with:
(a query used while getting the summary for a form 'test2' in an app 'test').

select
(
   select count(*) from orbeon_form_data
   where
   (app, form, document_id, last_modified) in (
        select app, form, document_id, max(last_modified) last_modified
        from orbeon_form_data
        where
           app = 'test'
           and form = 'test2'
        group by app, form, document_id)
   and deleted = 'N'
) total,
(
   select count(*) from
   (
   select data.created, data.last_modified, data.document_id, data.app,
       data.form, extractValue(data.xml,'//control-1')
        from cbpldb.orbeon_form_data data,
        (    select max(last_modified) last_modified, app, form, document_id
                from cbpldb.orbeon_form_data
                where
                    app = 'test'
                    and form = 'test2'
                group by app, form, document_id
        ) latest
        where    
            data.last_modified = latest.last_modified
            and data.app = latest.app
            and data.form = latest.form
            and data.document_id = latest.document_id
            and data.deleted = 'N'
        order by created desc
   ) a
) search_total


I don't know if I reconstructed this from the xpl correctly, but my sql-knowledgeable colleagues tell me that it has 4 nested select statements and an exponential complexity.
Another colleague did a MySql Explain:

EXPLAIN select ( select count(*) from orbeon_form_data where (app, form, document_id, last_modified) in ( select app, form, document_id, max(last_modified) last_modified from orbeon_form_data where app = 'testing' and form = 'simpleform' group by app, form, document_id) and deleted = 'N' ) total, ( select count(*) from ( select data.created, data.last_modified, data.document_id , extractValue(data.xml, '/*/section-1/control-1') detail_1 from orbeon_form_data data, ( select max(last_modified) last_modified, app, form, document_id from orbeon_form_data where app = 'testing' and form = 'simpleform' group by app, form, document_id ) latest where data.last_modified = latest.last_modified and data.app = latest.app and data.form = latest.form and data.document_id = latest.document_id and data.deleted = 'N' order by created desc )a ) search_total:

*** row 1 ***
          table:  NULL
           type:  NULL
  possible_keys:  NULL
            key:  NULL
        key_len:  NULL
            ref:  NULL
           rows:  NULL
          Extra:  No tables used
*** row 2 ***
          table:  NULL
           type:  NULL
  possible_keys:  NULL
            key:  NULL
        key_len:  NULL
            ref:  NULL
           rows:  NULL
          Extra:  Select tables optimized away
*** row 3 ***
          table:  <derived6>
           type:  ALL
  possible_keys:  NULL
            key:  NULL
        key_len:  NULL
            ref:  NULL
           rows:  3
          Extra:  Using temporary; Using filesort
*** row 4 ***
          table:  data
           type:  ALL
  possible_keys:  NULL
            key:  NULL
        key_len:  NULL
            ref:  NULL
           rows:  3287
          Extra:  Using where; Using join buffer
*** row 5 ***
          table:  orbeon_form_data
           type:  ALL
  possible_keys:  NULL
            key:  NULL
        key_len:  NULL
            ref:  NULL
           rows:  3287
          Extra:  Using where; Using temporary; Using filesort
*** row 6 ***
          table:  orbeon_form_data
           type:  ALL
  possible_keys:  NULL
            key:  NULL
        key_len:  NULL
            ref:  NULL
           rows:  3287
          Extra:  Using where
*** row 7 ***
          table:  orbeon_form_data
           type:  ALL
  possible_keys:  NULL
            key:  NULL
        key_len:  NULL
            ref:  NULL
           rows:  3287
          Extra:  Using where; Using temporary; Using filesort



Besides using another persistence service, are there any workarounds for this? For now, we will probably use another database for each form, to keep the amount of records low, but this is not ideal.

Does anyone know if the Oracle persistence uses similar queries?



kind regards,


Koen Vanderkimpen

Connect: Twitter








Overeenkomstig de bepalingen inzake de vertegenwoordiging van de vzw in haar statuten, kan enkel de gedelegeerde bestuurder, de algemeen directeur of zijn uitdrukkelijke lasthebber verbintenissen aangaan namens Smals.
Indien dit bericht niet voor u bestemd is, verzoeken wij u dit onmiddellijk aan ons te melden en het bericht te vernietigen.

Conformément aux dispositions relatives à la représentation de l'asbl dans ses statuts, seul l'administrateur délégué, le directeur général ou son mandataire exprès est habilité à souscrire des engagements au nom de Smals.
Si ce message ne vous est pas destiné, nous vous prions de nous le signaler immédiatement et de détruire le message.

According to the provisions regarding representation of the non profit association in its bylaws, only the chief executive officer, the general manager or his explicit agent can enter into engagements on behalf of Smals.
If you are not the addressee of this message, we kindly ask you to signal this to us immediately and to delete the message.





--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws
Reply | Threaded
Open this post in threaded view
|

Re: mysql persistence seems not to scale well

Alessandro  Vernet
Administrator
Hi Koen,

I suspect this is caused by extractValue(). My understanding is that MySQL treats XML as "just-a-blob", and that ou can't build indexes on it. So this function will parse the content of all your rows when the query runs, which isn't working very well as the number of rows grows.

Could you try editing the fr/persistence/exist/search.xpl to remove the extractValue() (or replace it by something that returns a constant value), to see if this is indeed the problem?

Alex
--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet
Reply | Threaded
Open this post in threaded view
|

Re: Re: mysql persistence seems not to scale well

koenvdk
Hi,

Thanks for the reply, this is definitely part of the answer. Unfortunately, we can't test this at the moment.

However: I think it's only part of the problem: during execution of the query, mysql has to do a sort on last_modified to find the latest version. Up to a certain point, this can be done in memory, and it's fast. But when mysql runs out of memory, it uses the disk, and the query becomes really slow. This is especially so because the sort, I think, has to be done many times




Koen

Connect: Twitter





From:        Alessandro  Vernet <[hidden email]>
To:        [hidden email]
Date:        04/06/2012 18:50
Subject:        [ops-users] Re: mysql persistence seems not to scale well





Hi Koen,

I suspect this is caused by extractValue(). My understanding is that MySQL
treats XML as "just-a-blob", and that ou can't build indexes on it. So this
function will parse the content of all your rows when the query runs, which
isn't working very well as the number of rows grows.

Could you try editing the fr/persistence/exist/search.xpl to remove the
extractValue() (or replace it by something that returns a constant value),
to see if this is indeed the problem?

Alex

--
View this message in context:
http://orbeon-forms-ops-users.24843.n4.nabble.com/mysql-persistence-seems-not-to-scale-well-tp4634494p4655189.html
Sent from the Orbeon Forms (ops-users) mailing list archive at Nabble.com.

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe:
[hidden email]
For general help:
[hidden email]
OW2 mailing lists service home page:
http://www.ow2.org/wws








Overeenkomstig de bepalingen inzake de vertegenwoordiging van de vzw in haar statuten, kan enkel de gedelegeerde bestuurder, de algemeen directeur of zijn uitdrukkelijke lasthebber verbintenissen aangaan namens Smals.
Indien dit bericht niet voor u bestemd is, verzoeken wij u dit onmiddellijk aan ons te melden en het bericht te vernietigen.

Conformément aux dispositions relatives à la représentation de l'asbl dans ses statuts, seul l'administrateur délégué, le directeur général ou son mandataire exprès est habilité à souscrire des engagements au nom de Smals.
Si ce message ne vous est pas destiné, nous vous prions de nous le signaler immédiatement et de détruire le message.

According to the provisions regarding representation of the non profit association in its bylaws, only the chief executive officer, the general manager or his explicit agent can enter into engagements on behalf of Smals.
If you are not the addressee of this message, we kindly ask you to signal this to us immediately and to delete the message.





--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws
Reply | Threaded
Open this post in threaded view
|

Re: Re: mysql persistence seems not to scale well

Alessandro  Vernet
Administrator
Hi Koen,

Sounds good, you'll let us know when/if you get a chance to test that. About the sorting on last_modified, I am less worried about it, as if this is indeed a bottleneck, adding an index in MySQL will most likely take care of that.

However, for the extractValue(), we'll most likely need to add columns to store those values as-is, so we don't have to call extractValue() at all. The persistence layer can then set the values in those columns when saving new data, and update them when a form is published.

Alex
--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet
Reply | Threaded
Open this post in threaded view
|

Re: Re: Re: mysql persistence seems not to scale well

koenvdk
Hi,

I tried adding the following indexes on orbeon_form_data:

document_id
last_modified
document_id,last_modified
app,form
app,form,document_id,last_modified

This resulted in some performance gain, but not enough (calling the form runner's persistence layer for a single document still takes at least 300ms, and sometimes over 2s, searches now range from 1s to 8s)

I haven't explored yet if this degrades for a higher number of records, as it did before (we have some 250 entries in the table right now)

Are these the indexes you would suggest?



I can also tell you that:

    select count(*) from orbeon_form_data
    where
    (app, form, document_id, last_modified) in (select app, form, document_id, max(last_modified) last_modified
        from orbeon_form_data
        where app = 'test'
        and form = 'test4'
    group by app, form, document_id) and deleted = 'N'

takes half a second when I execute it directly on mysql,
and

    select app, form, document_id, max(last_modified) last_modified
    from orbeon_form_data
    where
    app = 'test'
    and form = 'test4'  
    group by app, form, document_id;

executes immediately



Also, my main problem right now is I'm trying to call the orbeon RESTful persistence api from a Java program on the same server to get ALL the data for a certain form (I mean, the latest modified and not deleted, but with all field data). This has to work in a generic way (so I can't hardcode field names), and I don't really know how to build the query. So what I do, is a simple POST search for all the document_id's (ok, but could be faster), parse this in Java (=fast), and then iteratively GET all those documents (=very slow, when there are a lot of documents) and then parse those in Java(=fast). I know the fr summary view does this more efficiently, but I don't know how to reproduce this in my code.

What would a good search POST to the api look like to get _all_ data for a form?



Thanks,

Koen Vanderkimpen

Connect: Twitter





From:        Alessandro  Vernet <[hidden email]>
To:        [hidden email]
Date:        05/06/2012 17:44
Subject:        [ops-users] Re: Re: mysql persistence seems not to scale well





Hi Koen,

Sounds good, you'll let us know when/if you get a chance to test that. About
the sorting on last_modified, I am less worried about it, as if this is
indeed a bottleneck, adding an index in MySQL will most likely take care of
that.

However, for the extractValue(), we'll most likely need to add columns to
store those values as-is, so we don't have to call extractValue() at all.
The persistence layer can then set the values in those columns when saving
new data, and update them when a form is published.

Alex

--
View this message in context:
http://orbeon-forms-ops-users.24843.n4.nabble.com/mysql-persistence-seems-not-to-scale-well-tp4634494p4655197.html
Sent from the Orbeon Forms (ops-users) mailing list archive at Nabble.com.

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe:
[hidden email]
For general help:
[hidden email]
OW2 mailing lists service home page:
http://www.ow2.org/wws








Overeenkomstig de bepalingen inzake de vertegenwoordiging van de vzw in haar statuten, kan enkel de gedelegeerde bestuurder, de algemeen directeur of zijn uitdrukkelijke lasthebber verbintenissen aangaan namens Smals.
Indien dit bericht niet voor u bestemd is, verzoeken wij u dit onmiddellijk aan ons te melden en het bericht te vernietigen.

Conformément aux dispositions relatives à la représentation de l'asbl dans ses statuts, seul l'administrateur délégué, le directeur général ou son mandataire exprès est habilité à souscrire des engagements au nom de Smals.
Si ce message ne vous est pas destiné, nous vous prions de nous le signaler immédiatement et de détruire le message.

According to the provisions regarding representation of the non profit association in its bylaws, only the chief executive officer, the general manager or his explicit agent can enter into engagements on behalf of Smals.
If you are not the addressee of this message, we kindly ask you to signal this to us immediately and to delete the message.





--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws
Reply | Threaded
Open this post in threaded view
|

Re: Re: Re: mysql persistence seems not to scale well

Alessandro  Vernet
Administrator
Hi Koen,

This is an interesting question. I don't have a huge number of rows in my orbeon_form_data (around 900), and as you say, the following is "instant" even without an index:

    select document_id, max(last_modified)
    from orbeon_form_data
    where app = 'orbeon'
        and form = 'builder'
        and deleted = 'N'
    group by document_id

It returns about 150 rows for me. So why would the following take 5 seconds? (And I am here just getting the document_id, to check that the time isn't spent retrieving the larger XML data.)

    select document_id from orbeon_form_data where
        (document_id, last_modified) in (
            select document_id, max(last_modified)
            from orbeon_form_data
            where app = 'orbeon'
                and form = 'builder'
                and deleted = 'N'
            group by document_id
        );

Creating an index on (document_id, last_modified) as follows doesn't seem to improve this.

    create unique index orbeon_form_data_id_last
    on orbeon_form_data
    (document_id, last_modified);

However, if you write the join more explicitly, rather than using "in", then the query runs "instantly" again. So you can use:

    select t.document_id
    from orbeon_form_data t, (
          select document_id, max(last_modified) last_modified
          from orbeon_form_data
          where app = 'orbeon'
              and form = 'builder'
              and deleted = 'N'
          group by document_id
      ) m
    where
        t.document_id = m.document_id
        and t.last_modified = m.last_modified;

You'll let us know how this works for you.

Alex
--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet