Hi,
I think this has already been mentioned here, but users of OPS 2.8 (and probably older versions) might be interested in this post:

http://copia.ogbuji.net/blog/2006-02-16/Mystery_of

In short, it seems that Google can drop sites when it receives an HTTP 500 error while retrieving robots.txt files, and I have noticed that OPS 2.8 returns such an error when you don't include robots.txt files in your directories.

Note that this can become tricky if you generate a directory-like structure for your URLs that doesn't follow the structure of your filesystem...

OPS 3.0 returns a 404 error, which is the right thing to do and isn't a problem with search engines.

Eric
--
GPG-PGP: 2A528005
The first 100% XML directory of beekeepers!
http://apiculteurs.info/
------------------------------------------------------------------------
Eric van der Vlist http://xmlfr.org http://dyomedea.com
(ISO) RELAX NG ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------
--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws
Eric,
Thanks for the information!

I think there is still a bug in OPS, though:

http://forge.objectweb.org/tracker/index.php?func=detail&aid=303083&group_id=168&atid=350207

The PFC (Page Flow Controller) considers the "not-found" page a regular page, so it is served with a 200 code, not a 404. I don't think this causes problems for Google, but clearly we should return a 404, at least optionally.

Note that you can work around this by generating your not-found page entirely in a page model, as opposed to going through the page view and epilogue.

-Erik
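As a generic illustration of the distinction above (this is not OPS code), here is a minimal sketch using Python's standard http.server, in which unknown paths receive the not-found page together with a 404 status rather than a 200. The PAGES table, the resolve helper, the serve function, and the port are all assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical site content, for illustration only.
PAGES = {"/": b"<h1>Home</h1>"}
NOT_FOUND_BODY = b"<h1>Page not found</h1>"


def resolve(path: str) -> Tuple[int, bytes]:
    """Return (status, body) for a request path.

    Unknown paths get the friendly error page AND a 404 status,
    instead of the same page served with a 200.
    """
    if path in PAGES:
        return 200, PAGES[path]
    return 404, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = resolve(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the example server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and requesting /robots.txt would then yield the not-found page with a 404 status line, which is the outcome asked for in the bug report above.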
Hi Erik,
On Monday, 20 February 2006 at 15:09 +0100, Erik Bruchez wrote:
> Eric,
>
> Thanks for the information!
>
> I think there is still a bug in OPS though:
>
> http://forge.objectweb.org/tracker/index.php?func=detail&aid=303083&group_id=168&atid=350207
>
> The PFC considers the "not-found" page as a regular page, which produces
> a 200 code, not a 404. I don't think this causes problems for Google,
> but clearly we should have a 404, at least optionally.

[...] handled differently from the case where it's handled by a page directive... If I try on the documentation section of orbeon.com:

http://www.orbeon.com/ops/doc/intro-install/robots.txt -> 404
http://www.orbeon.com/ops/doc/intro-install/foo -> 500

> Note that you can work around this by generating your not-found page
> entirely in a page model as opposed to going through the page view and
> epilogue.

That's what I have done on my corporate site (see for instance http://dyomedea.com/english/foo), and I had forgotten that this wasn't the default behavior...

Eric
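The manual check above can be scripted. The following is a minimal sketch, assuming Python's standard urllib; the helper names http_status and describe are hypothetical, and the labels only restate what this thread says about 404, 500, and 200 responses, not any documented search-engine behavior.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for a GET on `url`."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is what we are after.
        return error.code


def describe(status: int) -> str:
    """Label a status code the way this thread discusses it."""
    if status == 404:
        return "not found (what the thread considers correct for a missing file)"
    if 500 <= status <= 599:
        return "server error (the case the thread warns about)"
    if 200 <= status <= 299:
        return "success (suspicious if the resource does not actually exist)"
    return "other"
```

For example, describe(http_status(url)) can be run against both a missing robots.txt and an arbitrary missing path on the same site, to see whether the two cases are handled alike. Results for the URLs quoted above may no longer match what was observed in 2006.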