Architecture question

Architecture question

Jim Cheesman
Hi,

All this talk of Fedora, eXist and EADitor is directly relevant to one of the more pressing issues I have with Orbeon.

The context: we're building a portal that will allow dynamic form creation which will then be linked to users, allowing the portal administrators to dynamically define what personal information is to be gathered about the users, as well as creating user surveys etc. etc. My main worry is that the number of users could be high, and the user data needs to be searchable - both individual searches and aggregate records. While both MySQL and Oracle offer XML parsing functionality, I'm not entirely sure what the performance is like, particularly when simulating table joins using XPath.

So the question: What's the best way to set up large datasets with Orbeon? Is there any way to dynamically link Orbeon form definitions via Hibernate (or similar) to database tables, creating these tables on the fly? (I'll ignore the obvious security problems this could present for the moment...) If I were able to define the data to be gathered beforehand I could probably set up some kind of BI ELT process, but in this case that's not (apparently) an option.

Cheers,
Jim




--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Re: Architecture question

Erik Bruchez
Administrator
Jim,

Things like full-text search or search on specific fields can be made fast with a full-text index and proper indexes on the stored data. This applies to Oracle, and probably to MySQL as well.

For more complex stuff, with Oracle, we will soon implement support for materialized views, which will make the data quickly available in relational form for reporting and searches. But that will be Oracle-only. However, it might serve as a basis for creating tables on the fly with MySQL.
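
To make the idea of exposing XML form data in relational form more concrete, here is a minimal sketch in Python using SQLite in place of Oracle or MySQL. It is not how Orbeon implements this: the table name, the column naming rule and the sample form documents are all assumptions made up for illustration. Each leaf element becomes an indexed column, and element names are validated before being used as SQL identifiers, since they come from user-defined forms.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical form data documents, keyed by document id.
DOCUMENTS = {
    "doc-1": "<form><personal><first-name>Ada</first-name><city>London</city></personal></form>",
    "doc-2": "<form><personal><first-name>Alan</first-name><city>Manchester</city></personal></form>",
    "doc-3": "<form><personal><first-name>Grace</first-name><city>London</city></personal></form>",
}

# Only plain identifiers are accepted as table or column names.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def leaf_columns(xml_text):
    """Return {column_name: text} for every leaf element of one document."""
    root = ET.fromstring(xml_text)
    columns = {}
    for element in root.iter():
        if len(element) == 0:  # no child elements, so this element holds a value
            name = element.tag.replace("-", "_")
            if not IDENTIFIER.fullmatch(name):
                raise ValueError(f"unsafe column name: {name!r}")
            columns[name] = (element.text or "").strip()
    return columns


def build_flat_table(connection, table, documents):
    """Create one table with one indexed TEXT column per leaf element."""
    if not IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    rows = {doc_id: leaf_columns(xml_text) for doc_id, xml_text in documents.items()}
    names = sorted({name for row in rows.values() for name in row})
    definitions = ["document_id TEXT PRIMARY KEY"] + [f'"{name}" TEXT' for name in names]
    connection.execute(f'CREATE TABLE "{table}" ({", ".join(definitions)})')
    for name in names:
        connection.execute(f'CREATE INDEX "{table}_{name}_idx" ON "{table}" ("{name}")')
    placeholders = ", ".join("?" for _ in range(len(names) + 1))
    for doc_id, row in rows.items():
        values = [doc_id] + [row.get(name) for name in names]
        connection.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)
    return names


connection = sqlite3.connect(":memory:")
names = build_flat_table(connection, "registration", DOCUMENTS)
per_city = connection.execute(
    "SELECT city, COUNT(*) FROM registration GROUP BY city ORDER BY city"
).fetchall()
print(per_city)  # [('London', 2), ('Manchester', 1)]
```

Once the data is flat, aggregate queries become ordinary SQL. The table has to be rebuilt or altered whenever a form definition changes, which is the cost of this approach.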

Also, Form Runner reads, writes and searches data through a simple REST API. If you provide your own persistence layer implementation behind this API, you can do anything you want, including using Hibernate. Whether Hibernate is the best (most performant) solution, I have to say that we don't know.
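
As a rough illustration of what "your own persistence layer behind this API" could look like, here is a small Python sketch of a CRUD handler backed by an in-memory dictionary. The URL shape, the class name `InMemoryPersistence` and the status codes are assumptions for the sake of the example, not the documented Form Runner contract, and the search part of the API is left out entirely. A real implementation would replace the dictionary with Hibernate, JDBC or whatever store you prefer.

```python
import re

# Assumed URL shape: /crud/{app}/{form}/data/{document-id}/data.xml
CRUD_PATH = re.compile(r"^/crud/([^/]+)/([^/]+)/data/([^/]+)/data\.xml$")


class InMemoryPersistence:
    """Hypothetical stand-in for a real backend (Hibernate, JDBC, eXist, ...)."""

    def __init__(self):
        self._documents = {}  # (app, form, document_id) -> XML bytes

    def handle(self, method, path, body=None):
        """Dispatch one request and return (status_code, response_body)."""
        match = CRUD_PATH.match(path)
        if match is None:
            return 404, b""
        key = match.groups()
        if method == "PUT":
            created = key not in self._documents
            self._documents[key] = body or b""
            return (201 if created else 204), b""
        if method == "GET":
            if key in self._documents:
                return 200, self._documents[key]
            return 404, b""
        if method == "DELETE":
            if self._documents.pop(key, None) is None:
                return 404, b""
            return 204, b""
        return 405, b""


store = InMemoryPersistence()
path = "/crud/portal/registration/data/42/data.xml"
print(store.handle("PUT", path, b"<form><city>London</city></form>"))  # (201, b'')
print(store.handle("GET", path))
print(store.handle("DELETE", path))  # (204, b'')
print(store.handle("GET", path))  # (404, b'')
```

Keeping the routing separate from the storage makes it easy to swap the backend later without touching the forms.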

Also, depending on what "large" is, eXist might still be a good option. Here as well, database tuning, i.e. indexes, is the key to performance.

-Erik

In reply to Jim Cheesman's post of Mon, Mar 21, 2011 at 10:24 AM.

Re: Re: Architecture question

Jim Cheesman
Erik,

Thanks for the quick reply - I have to say the speed with which you reply to our questions on this list is most impressive!

I'll check out using REST and Hibernate, as well as the more obvious use of indexes etc.

Cheers,
Jim


In reply to Erik Bruchez's post of Mon, 2011-03-21 at 19:25 -0700.

Re: Re: Architecture question

Tom Grahame
For what it's worth, you might consider an XForms (Orbeon), REST, XML (eXist, Fedora Commons) solution for creating and storing data. Then separate out the searching of that data using Solr (http://lucene.apache.org/solr/); my project has had good success with that tool.

I'd suggest the key is knowing for certain in which part of your XML document life cycle the high usage/performance hits will be.

Tom

Re: Re: Re: Architecture question

Ethan Gruber
I was also going to recommend using Solr.  I prefer to separate data storage from the search index.  MySQL queries are not particularly fast or efficient.  Two of my projects that rely on Orbeon use eXist as the datastore, but simultaneously post the data to Solr.  Solr has a REST API, so it's pretty easy to POST directly to your index from an Orbeon XForm.  It's also pretty easy to get Solr's facet terms delivery API to work with Orbeon's <fr:autocomplete> if you have need for that.
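
For anyone who wants to see roughly what gets sent to Solr, here is a small Python sketch that turns a form data document into a Solr XML update message and builds a facet query of the kind an autocomplete could use. The base URL, the sample form and the field names are assumptions, and the fields would need to exist in your Solr schema (or match a dynamic field). Nothing is actually sent over the network here; in practice the message would be POSTed to the index's update handler and followed by a commit.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed location of the Solr instance


def solr_add_message(document_id, form_xml):
    """Build an <add><doc>...</doc></add> message from the form's leaf elements."""
    form = ET.fromstring(form_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = document_id
    for element in form.iter():
        # Index only non-empty leaf elements, one Solr field per element name.
        if len(element) == 0 and element.text and element.text.strip():
            field = ET.SubElement(doc, "field", name=element.tag)
            field.text = element.text.strip()
    return ET.tostring(add, encoding="unicode")


def facet_prefix_url(field, prefix, limit=10):
    """Build a query returning the terms of `field` that start with `prefix`."""
    params = urlencode({
        "q": "*:*",
        "rows": 0,  # only the facet counts are wanted, not the documents
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.limit": limit,
    })
    return f"{SOLR_BASE}/select?{params}"


message = solr_add_message("42", "<form><city>London</city><notes/></form>")
print(message)
# <add><doc><field name="id">42</field><field name="city">London</field></doc></add>
print(facet_prefix_url("city", "Lon"))
```

The datastore stays the system of record; the index can always be rebuilt from it by replaying the same messages.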

Ethan

In reply to Tom Grahame's post of Tue, Mar 22, 2011 at 6:43 AM.

Re: Re: Re: Re: Architecture question

Jim Cheesman
Thanks Ethan, two votes for Solr: I'll certainly check it out.

Cheers,
Jim

In reply to Ethan Gruber's post of Tue, 2011-03-22 at 08:36 -0400.

Re: Re: Re: Architecture question

Jim Cheesman
In reply to this post by Tom Grahame
Thanks Tom, I'll check out Solr. Basically most of the data should be fairly static - it's user registration data that will be updated every now and then, but not that often. Some kind of indexing / BI style tool could be ideal.

Cheers,
Jim



Re: Re: Re: Architecture question

echofloripa
Also check out Compass. It is a tool that you plug into Hibernate and that gives you full-text search out of the box. It is based on annotations, and behind the scenes it uses Lucene to index and search.

http://www.compass-project.org/