We make extensive use of PostgreSQL and Flat Views. A while ago I noticed that the views exposing our Orbeon Forms data in a relational fashion were getting slow to the point of pain. I converted one particularly slow view to a materialized view to try make the experience more tolerable but I kept wondering if the live view of the data could be made faster. Because I had some success with the JSONB datatype in PostgreSQL I wondered if the views would perform better if the data were stored as JSONB instead of XML in the orbeon_form_data table. I posted some questions on twitter at the time. Now I am posting a more complete description of my experience here in hopes that the community can provide some advice.
Our most complex form has the following statistics:
I'm guessing these stats are relatively modest compared to what some of you are doing. For this form the orbeon_form_data table stores 3054 rows with 1248 distinct document_ids. Again, I think this is relatively modest. The view we are using to access this data is a copy of the automatically generated flat view with a few type casts and an extra where condition to avoid draft forms that may have unvalidated data that cannot be properly casted. The view has about 90 output columns and about 100 XPATH() calls. Running the view query in PGAdmin takes approximately 3 minutes to retrieve ~1000 rows. (Lest someone think that my modification are slowing things down, the modified query actually returns slightly faster than the raw automatically generated flat view.) On Twitter @Orbeon suggested that I should be able to add a JSONB column to the orbeon_form_data table and populate the data with a trigger. I found a post on StackOverflow with an example XML to JSON function for PostgreSQL. I felt that the resulting JSON was too verbose so I simplified it a little. (I removed the outer <form/> document element, ignored attributes, and placed all text values in arrays even if there is only a single item to make indexing consistent.) The conversion function is listed below. Using this function it took about 2.75 minutes to populate the JSON column for the 3054 rows in question. I think this works out to about 60 milliseconds which would probably be tolerable for a trigger executing on form submission. I then wrote a query which returns identical results to the original view (i.e. the one that takes 3 minutes to materialize) but references the JSONB column instead of using XPATH() operations on the XML to extract the data. This new query executes in under 3 seconds. After running this test, I believe the JSONB data storage alternative would provide a major performance enhancement to users of PostgreSQL and Orbeon's Flat Views. I think it would be most advantageous if Orbeon Forms provided the possibility to have a parallel JSONB column out of the box. Here are my concerns with the trigger method: 1) Because PostgreSQL's XML capabilities are limited, I'm a little bit nervous that unexpected XML data (new namespaces, etc.) may cause exceptions in the trigger function that could cause the data to fail to be stored in PostgreSQL. Orbeon Forms would know how to properly convert any XML that it is able to generate. 2) The format of the generated JSONB is still a bit ugly. I would prefer to have JSON as described in Orbeon Issue #2109 but I don't think that there is enough information contained in the individual form submission's XML model data properly identify all attachments and to consistently interpret which elements may be duplicated due to repeating grids and section. I believe these things are necessary so that deep indexing into the JSONB data is consistent from record to record. Comments welcome! Aaron Spike Orbeon XML to JSONB Function:
Example Form Model:
<section-1> <field-1>value-1</field-1> <field-2/> <field-3 xmlns:ns1="http://www.example.com/ns1" xmlns:ns2="http://www.example.com/ns2">value-3</field-3> <field-4>1 2 3</field-4> <field-5> value-1 test </field-5> <attachment-1 filename="File.pdf" mediatype="application/pdf" size="1234">/fr/service/persistence/crud/APP/FORM/data/DOC/FILE.bin</attachment-1> </section-1> <section-2> <grid-1> <grid-field-1>1</grid-field-1> <grid-field-2>2</grid-field-2> </grid-1> <grid-1> <grid-field-1>3</grid-field-1> <grid-field-2>4</grid-field-2> </grid-1> <nested-section-1> <field-4>value-4</field-4> <field-5>value-5</field-5> </nested-section-1> <nested-repeating-section-1> <nested-repeating-section-1-iteration> <field-6>value-6</field-6> <field-7>value-7</field-7> </nested-repeating-section-1-iteration> <nested-repeating-section-1-iteration> <field-6>value-8</field-6> <field-7>value-9</field-7> </nested-repeating-section-1-iteration> </nested-repeating-section-1> </section-2> </form>
This electronic communication, including any attached documents, may contain confidential and/or legally privileged information that is intended only for use by the recipient(s) named above. If you have received this communication in error, please notify the sender immediately and delete the communication and any attachments. Views expressed by the author do not necessarily represent those of Martin Luther College. -- You received this message because you are subscribed to the Google Groups "Orbeon Forms" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. To post to this group, send email to [hidden email]. |
Administrator
|
Hi Aaron,
Thank you for sharing this. It will for sure come in handy to someone else stumbling on the same situation. Going from 3 minutes to 3 seconds is quite impressive. Out of curiosity, what does the view query using the JSONB look like? -Alex ----- -- Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet -- Sent from: http://discuss.orbeon.com/ -- You received this message because you are subscribed to the Google Groups "Orbeon Forms" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. To post to this group, send email to [hidden email].
--
Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet |
I've abbreviated the example a bit. This is what a current flat view definition looks like:
With the JSONB column generated by the function in my previous post, it would look more like this:
Notice that accessing data in the JSONB is a little more verbose than one would think should be necessary. I would prefer if the bold red portions could be omitted: "jsonb_data->'section-1'->0->'field-1'->>0". The problem is that given only the XML data, the conversion function in postgresql can't tell which parts of the document might include repeated data. If the JSONB was calculated inside of Orbeon Form proper a much more pleasing JSONB structure could be generated. Would you ever consider adding such a thing? My current plan to work around my concern about calling the conversion function from a trigger is to call pg_notify() from a trigger and have a daemon process somewhere listening for the notification to generate missing JSONB data. I want to reduce the chances that the XML data fails to be saved to an absolute minimum. This electronic communication, including any attached documents, may contain confidential and/or legally privileged information that is intended only for use by the recipient(s) named above. If you have received this communication in error, please notify the sender immediately and delete the communication and any attachments. Views expressed by the author do not necessarily represent those of Martin Luther College. -- You received this message because you are subscribed to the Google Groups "Orbeon Forms" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. To post to this group, send email to [hidden email]. |
Administrator
|
Hi Aaron,
Interesting. About the JSON not being optimal, I'm wondering if, in the future once the issue linked below is implemented, instead of relying on a database trigger and your own logic to generate the JSON from the XML, you could update your process running on save or submit to also `send()` the data in JSON format to an end-point of yours, and have that code store the JSON it receives in the database. Of course, until that issue is taken care of, this is just a through, but it implementing 2109 is likely to be done faster than full built-in JSONB support ;). https://github.com/orbeon/orbeon-forms/issues/2109 -Alex ----- -- Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet -- Sent from: http://discuss.orbeon.com/ -- You received this message because you are subscribed to the Google Groups "Orbeon Forms" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. To post to this group, send email to [hidden email].
--
Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet |
Free forum by Nabble | Edit this page |