(Reminder) OPS 2.8, robots.txt and Google

(Reminder) OPS 2.8, robots.txt and Google

Eric van der Vlist
Hi,

I think this has already been mentioned here, but OPS 2.8 (and probably
older versions) users might be interested in this post:

http://copia.ogbuji.net/blog/2006-02-16/Mystery_of

In short, it seems that Google can drop sites when it receives an
HTTP 500 error while retrieving robots.txt, a behavior I have noticed
with OPS 2.8 when you don't include robots.txt files in your
directories.

Note that this can become tricky if you generate a directory-like
structure for your URLs that doesn't follow the structure of your
filesystem...

OPS 3.0 returns a 404 error, which is the right thing to do and isn't a
problem for search engines.
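Eric's point can be illustrated with a small self-contained sketch (plain Python, not OPS; the handler and the local test server are hypothetical stand-ins): a site with no robots.txt should answer a crawler with 404, as OPS 3.0 does, never 500.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every request with 404, which is
    the benign status a crawler should see for a missing robots.txt
    (a 500 here may get the whole site dropped from the index)."""
    def do_GET(self):
        self.send_error(404, "Not Found")
    def log_message(self, *args):
        pass  # silence per-request logging

# Bind an ephemeral port on localhost and serve in the background.
server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def status(path):
    """Return the HTTP status code seen for a path on the test server."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

code = status("/robots.txt")
server.shutdown()
print(code)  # 404
```

The same check can of course be run against a live server with any HTTP client to verify which status a missing robots.txt actually produces.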

Eric
--
GPG-PGP: 2A528005
The first 100% XML directory of beekeepers!
                                                http://apiculteurs.info/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------


--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws


Re: (Reminder) OPS 2.8, robots.txt and Google

Erik Bruchez
Eric,

Thanks for the information!

I think there is still a bug in OPS though:

http://forge.objectweb.org/tracker/index.php?func=detail&aid=303083&group_id=168&atid=350207

The PFC considers the "not-found" page as a regular page, which produces
a 200 code, not a 404. I don't think this causes problems for Google,
but clearly we should have a 404, at least optionally.

Note that you can work around this by generating your not-found page
entirely in a page model as opposed to going through the page view and
epilogue.
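A hedged sketch of what such a page-flow entry might look like (the root element and namespace follow the OPS/Orbeon controller format, but the specific paths, file names, and the not-found handling shown here are assumptions for illustration, not taken from this thread):

```xml
<config xmlns="http://www.orbeon.com/oxf/controller">

    <!-- Regular pages go through model / view / epilogue as usual -->
    <page path-info="/home" view="home/view.xhtml"/>

    <!-- Hypothetical workaround: a not-found page whose model produces
         the entire response, including the 404 status code, so it never
         reaches the page view and epilogue (which would force a 200) -->
    <page path-info="/not-found" model="not-found/model.xpl"/>

</config>
```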

-Erik

Eric van der Vlist wrote:

> Hi,
>
> I think this has already been mentioned here, but OPS 2.8 (and probably
> older versions) users might be interested in this post:
>
> http://copia.ogbuji.net/blog/2006-02-16/Mystery_of
>
> In short, it seems that Google can drop sites when it receives an
> HTTP 500 error while retrieving robots.txt, a behavior I have noticed
> with OPS 2.8 when you don't include robots.txt files in your
> directories.
>
> Note that this can become tricky if you generate a directory-like
> structure for your URLs that doesn't follow the structure of your
> filesystem...
>
> OPS 3.0 returns a 404 error, which is the right thing to do and isn't
> a problem for search engines.
>
> Eric



--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws
Reply | Threaded
Open this post in threaded view
|

Re: (Reminder) OPS 2.8, robots.txt and Google

Eric van der Vlist
Hi Erik,

On Monday, 20 February 2006, at 15:09 +0100, Erik Bruchez wrote:

> Eric,
>
> Thanks for the information!
>
> I think there is still a bug in OPS though:
>
> http://forge.objectweb.org/tracker/index.php?func=detail&aid=303083&group_id=168&atid=350207
>
> The PFC considers the "not-found" page as a regular page, which produces
> a 200 code, not a 404. I don't think this causes problems for Google,
> but clearly we should have a 404, at least optionally.
In fact, the case where the page is handled by a file directive is
handled differently from the case where it's handled by a page
directive...

If I try on the documentation section of orbeon.com:

http://www.orbeon.com/ops/doc/intro-install/robots.txt -> 404
http://www.orbeon.com/ops/doc/intro-install/foo        -> 500

> Note that you can work around this by generating your not-found page
> entirely in a page model as opposed to going through the page view and
> epilogue.

That's what I have done on my corporate site (see, for instance,
http://dyomedea.com/english/foo), and I had forgotten that this wasn't
the default behavior...

Eric

--
GPG-PGP: 2A528005
The first 100% XML directory of beekeepers!
                                                http://apiculteurs.info/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------


--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws

signature.asc (196 bytes) Download Attachment