Hi,
All this talk of Fedora, eXist and EADitor is directly relevant to one of the more pressing issues I have with Orbeon.

The context: we're building a portal that will allow dynamic form creation, with the forms then linked to users. This lets the portal administrators dynamically define what personal information is gathered about the users, as well as create user surveys and so on. My main worry is that the number of users could be high, and the user data needs to be searchable, both for individual searches and for aggregate records. While both MySQL and Oracle offer XML parsing functionality, I'm not entirely sure what the performance is like, particularly when simulating table joins using XPath.

So the question: what's the best way to set up large datasets with Orbeon? Is there any way to dynamically link Orbeon form definitions via Hibernate (or similar) to database tables, creating these tables on the fly? (I'll ignore the obvious security problems this could present for the moment...)

If I were able to define the data to be gathered beforehand I could probably set up some kind of BI ELT process, but in this case that's (apparently) not an option.

Cheers,
Jim

--
You receive this message as a subscriber of the [hidden email] mailing list. To unsubscribe: mailto:[hidden email] For general help: mailto:[hidden email]?subject=help OW2 mailing lists service home page: http://www.ow2.org/wws
Jim,
Things like full-text search or search on specific fields can be made fast with a full-text index and proper indexes on the stored data. This applies to Oracle, and probably to MySQL as well.
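As a rough illustration of the point about indexes, here is a minimal sketch that stores each form document as XML and copies one searchable field into an indexed column. It uses Python's built-in sqlite3 purely as a stand-in for Oracle or MySQL; the table, column and element names (`form_data`, `country`, `name`) are hypothetical and not taken from Orbeon.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical form data; the element names are illustrative only.
DOCS = [
    "<form><name>Ana</name><country>ES</country></form>",
    "<form><name>Bob</name><country>US</country></form>",
    "<form><name>Carla</name><country>ES</country></form>",
]

def build_store(docs):
    """Store each XML document plus an indexed copy of its <country> field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE form_data (id INTEGER PRIMARY KEY, xml TEXT, country TEXT)"
    )
    # The index lets searches on country avoid parsing every stored document.
    conn.execute("CREATE INDEX idx_form_data_country ON form_data (country)")
    for doc in docs:
        country = ET.fromstring(doc).findtext("country")
        conn.execute(
            "INSERT INTO form_data (xml, country) VALUES (?, ?)", (doc, country)
        )
    conn.commit()
    return conn

def find_names_by_country(conn, country):
    """Look up documents through the indexed column, then read the XML."""
    rows = conn.execute(
        "SELECT xml FROM form_data WHERE country = ? ORDER BY id", (country,)
    )
    return [ET.fromstring(xml).findtext("name") for (xml,) in rows]

conn = build_store(DOCS)
names = find_names_by_country(conn, "ES")
```

The design choice is to pay the XML parsing cost once, at write time, so that searches only touch the indexed column.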
For more complex queries, on Oracle we are going to implement support for materialized views soon, which will make the data quickly available in relational form for reporting and searches. That will be Oracle-only, but it might serve as a basis for creating tables on the fly with MySQL.
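To make the "relational form" idea concrete, here is a hedged sketch of flattening XML form data into a table created on the fly, then running an aggregate query over it. This is not how Orbeon's planned materialized-view support works (the thread gives no details); it again uses sqlite3 as a stand-in, and the table and field names are hypothetical. Because the names come from user-defined forms, the sketch rejects anything that is not a plain identifier.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Only plain identifiers are accepted, since names come from user-defined forms.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def check_identifiers(*names):
    """Raise ValueError for any name that is not a plain identifier."""
    for name in names:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

def create_flat_table(conn, table, fields):
    """Create one TEXT column per form field (hypothetical flat layout)."""
    check_identifiers(table, *fields)
    columns = ", ".join(f'"{field}" TEXT' for field in fields)
    conn.execute(f'CREATE TABLE "{table}" (doc_id INTEGER PRIMARY KEY, {columns})')

def load_documents(conn, table, fields, docs):
    """Copy the value of each field from every XML document into the table."""
    check_identifiers(table, *fields)
    column_list = ", ".join(f'"{field}"' for field in fields)
    placeholders = ", ".join("?" for _ in fields)
    for doc in docs:
        root = ET.fromstring(doc)
        values = [root.findtext(field) for field in fields]
        conn.execute(
            f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', values
        )

conn = sqlite3.connect(":memory:")
fields = ["name", "country"]
create_flat_table(conn, "survey", fields)
load_documents(conn, "survey", fields, [
    "<form><name>Ana</name><country>ES</country></form>",
    "<form><name>Bob</name><country>US</country></form>",
    "<form><name>Carla</name><country>ES</country></form>",
])
# An aggregate query that would be awkward to express over raw XML.
counts = conn.execute(
    "SELECT country, COUNT(*) FROM survey GROUP BY country ORDER BY country"
).fetchall()
```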
Also, Form Runner reads, writes and searches data through a simple REST API. If you provide your own persistence layer implementation behind this API, you can do anything you want, including using Hibernate. Whether Hibernate is the best (most performant) solution, I have to say that we don't know.
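For a sense of what a custom persistence layer behind a REST API could look like, here is a minimal sketch using only the Python standard library. The URL layout, the status codes and the in-memory `DocumentStore` are assumptions made for illustration; they are not Orbeon's documented contract, and a real implementation would delegate to a database (through Hibernate or otherwise) and would also need to handle search.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class DocumentStore:
    """In-memory stand-in for whatever database sits behind the REST API."""

    def __init__(self):
        self._docs = {}

    def put(self, path, body):
        """Store a document; return True if it is new, False if replaced."""
        created = path not in self._docs
        self._docs[path] = body
        return created

    def get(self, path):
        """Return the stored bytes, or None if nothing is stored at path."""
        return self._docs.get(path)

STORE = DocumentStore()

class PersistenceHandler(BaseHTTPRequestHandler):
    """Maps PUT and GET on a document URL to the store (hypothetical layout)."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        created = STORE.put(self.path, self.rfile.read(length))
        self.send_response(201 if created else 204)
        self.end_headers()

    def do_GET(self):
        body = STORE.get(self.path)
        if body is None:
            self.send_error(404, "document not found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    """Run the sketch server until interrupted."""
    HTTPServer(("127.0.0.1", port), PersistenceHandler).serve_forever()
```

Calling `serve()` starts the server locally; swapping `DocumentStore` for a class backed by a real database leaves the HTTP handling unchanged.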
Also, depending on what "large" is, eXist might still be a good option. Here as well, database tuning, i.e. indexes, is the key to performance.

-Erik
On Mon, Mar 21, 2011 at 10:24 AM, Jim Cheesman <[hidden email]> wrote:
Erik,
Thanks for the quick reply - I have to say the speed with which you reply to our questions on this list is most impressive! I'll check out using REST and Hibernate, as well as the more obvious use of indexes etc.

Cheers,
Jim

On Mon, 2011-03-21 at 19:25 -0700, Erik Bruchez wrote:
For what it's worth, you might consider an XForms (Orbeon), REST, XML (eXist, Fedora Commons) solution for creating and storing data, and then handle searching of that data separately using Solr (http://lucene.apache.org/solr/); my project has had good success with that tool.
I'd suggest the key is being certain which part of your XML document life cycle will take the high usage/performance hits.

Tom
I was also going to recommend using Solr. I prefer to separate data storage from the search index. MySQL queries are not particularly fast or efficient. Two of my projects that rely on Orbeon use eXist as the datastore, but simultaneously post the data to Solr. Solr has a REST API, so it's pretty easy to POST directly to your index from an Orbeon XForm. It's also pretty easy to get Solr's facet terms delivery API to work with Orbeon's <fr:autocomplete> if you have need for that.
Ethan
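The "store in one place, index in Solr" approach described above can be sketched as follows. Solr's XML update format wraps each document in `<add><doc>` with one `<field name="...">` per value; the field names, the document id and the update URL used here are assumptions, and the fields would have to exist in your Solr schema. In the setups described in this thread the POST would come from an XForms submission rather than from Python.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_message(doc_id, fields):
    """Build a Solr XML <add> message for one document (as UTF-8 bytes)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = str(doc_id)
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="utf-8")

def post_to_solr(update_url, message):
    """POST an update message to Solr; update_url is installation-specific."""
    request = urllib.request.Request(
        update_url,
        data=message,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

# Hypothetical user record; the same data would also be saved to the datastore.
message = build_add_message("user-42", {"name": "Ana", "country": "ES"})
```

With a Solr instance running, `post_to_solr("http://localhost:8983/solr/update", message)` would send the document (the URL is an assumption about a local install); a commit is still needed before the document becomes searchable.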
Thanks Ethan, two votes for Solr: I'll certainly check it out.
Cheers,
Jim

On Tue, 2011-03-22 at 08:36 -0400, Ethan Gruber wrote:
In reply to this post by Tom Grahame
Thanks Tom, I'll check out Solr. Basically most of the data should be fairly static: it's user registration data that will be updated every now and then, but not often. Some kind of indexing / BI-style tool could be ideal.
Cheers,
Jim

On Tue, 2011-03-22 at 03:43 -0700, Tom Grahame wrote:
Also check out Compass, a tool that plugs into Hibernate and gives you full-text search out of the box. It is based on annotations and uses Lucene behind the scenes to index and search.
http://www.compass-project.org/