Hi,

Title says it (almost) all...

I need to read external XML documents to index them.

Some of these documents have external DTDs, and these DTDs are often either nonexistent or hosted on other sites.

To improve performance and reduce the number of parsing errors, I'd like to use two different, complementary approaches.

The first one has already been mentioned on this list: implementing XML catalogs. This would deal with "well-known DTDs" (HTML, XHTML, DocBook, OpenOffice, ...).

The second one, for DTDs that the catalog does not know about, would be to use SAX features that are currently not exposed through the URLGenerator, such as the following one:

http://apache.org/xml/features/nonvalidating/load-external-dtd

From a quick glance at the code, it doesn't seem so easy, because the couple of currently exposed features (validation and XInclude) are used in the cache keys, and that would require some refactoring to avoid an exponential growth in the number of combinations (and of instructions to test these combinations)...

Except maybe if we said that using these other features would disable caching.

What do you think?

Are there better ways of exposing these features?

Thanks,

Eric

--
The first 100% XML directory of beekeepers!
http://apiculteurs.info/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws
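For readers unfamiliar with the feature URI quoted in the message above, here is a minimal sketch of how it can be switched off through plain JAXP. It is not taken from the project's code: the class and method names (`SkipExternalDtdSketch`, `parseRootName`) are hypothetical, and the sketch assumes the underlying parser is Xerces or the JDK's built-in Xerces derivative, since the feature is Xerces-specific.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of any existing code base.
final class SkipExternalDtdSketch {

    // Xerces-specific feature: when false, a non-validating parser
    // does not fetch the external DTD subset.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses the document and returns the qualified name of its root element.
    static String parseRootName(String xml, boolean loadExternalDTD) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false);
            factory.setFeature(LOAD_EXTERNAL_DTD, loadExternalDTD);

            SAXParser parser = factory.newSAXParser();
            final String[] root = new String[1];
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    // Remember only the first element seen, i.e. the root.
                    if (root[0] == null) {
                        root[0] = qName;
                    }
                }
            });
            return root[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("Parsing failed", e);
        }
    }
}
```

Calling `parseRootName` with `loadExternalDTD` set to `false` on a document whose DOCTYPE points to an unreachable DTD should let the parse proceed without trying to fetch that DTD. Note that entities declared only in the skipped DTD remain undefined.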
It looks like this email was left unanswered.
I don't think there is really a problem with adding the load-external-dtd element to the configuration from the URL generator's point of view, while still allowing for caching.

In a first phase, we would not detect changes in the DTDs, though: adding detection would require adding hooks to the parser or to the XML catalog so that the DTD URI is added to the list of URIs that impact caching.

We would have to solve the question of the number of parser factories available, but that is certainly doable. The relevant code is:

XMLUtils.newSAXParser(boolean validating, boolean handleXInclude)

That method would require a "boolean loadExternalDTD" flag.

-Erik
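One way the parser-factory question raised in the reply above might be handled is sketched below. This is only an illustration under stated assumptions, not the actual XMLUtils implementation: the class `ParserFactoryCacheSketch`, the bitmask `key` helper, and the three-argument `newSAXParser` are all hypothetical. The idea is to create one factory per flag combination lazily, so only the combinations actually requested ever exist.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch; the real XMLUtils class may be organized differently.
final class ParserFactoryCacheSketch {

    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // One factory per flag combination, created on first use.
    private static final Map<Integer, SAXParserFactory> FACTORIES = new HashMap<>();

    // Packs the three flags into a small integer (0..7).
    static int key(boolean validating, boolean handleXInclude, boolean loadExternalDTD) {
        return (validating ? 1 : 0) | (handleXInclude ? 2 : 0) | (loadExternalDTD ? 4 : 0);
    }

    // Assumed three-argument variant of the method named in the message.
    static synchronized SAXParser newSAXParser(boolean validating,
                                               boolean handleXInclude,
                                               boolean loadExternalDTD) {
        try {
            int k = key(validating, handleXInclude, loadExternalDTD);
            SAXParserFactory factory = FACTORIES.get(k);
            if (factory == null) {
                factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                factory.setValidating(validating);
                factory.setXIncludeAware(handleXInclude);
                factory.setFeature(LOAD_EXTERNAL_DTD, loadExternalDTD);
                FACTORIES.put(k, factory);
            }
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot create SAX parser", e);
        }
    }

    // Number of factories created so far.
    static synchronized int factoryCount() {
        return FACTORIES.size();
    }
}
```

With this shape, adding a further flag means adding one bit to the key rather than one more branch per existing combination. The same integer could presumably also be folded into the URL generator's cache key, although that is an assumption about code not shown in this thread.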