Geeks With Blogs
The Life and Times of a Dev Yes, we're really that weird

We found an interesting problem the other day with the combination of DNN skins, URL rewriting, DNS, and Google.

We launched our new site back in July and have been fighting with Google to get ourselves listed in search.  MSN and Yahoo both listed us and gave us pretty high page rankings, but Google refused to see us, even with links to our pages.  I know that Google has a “sandbox” that they put new sites into, but this didn't appear to be the problem.

I don't know if you know about it or not, but Google has a program called Sitemaps (www.google.com/webmasters/sitemaps).  For some reason, Google couldn't see our sitemap or our robots.txt file.  We did discover that it could see landvoice.com, but www.landvoice.com was invisible to it.

Turns out we had a CNAME DNS record for www that was pointing to landvoice.com, and that fouled Google up.  As soon as we removed the CNAME and just put in a wildcard record (*.landvoice.com), Google started seeing the site.
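
If you want to check for the same symptom on your own site, a quick way is to request robots.txt from both the bare and the www hostname and compare the results.  Here's a minimal sketch in Python (not what we actually ran); the function names candidate_urls, probe, and report are made up for illustration, and it assumes plain HTTP and that robots.txt sits at the root.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def candidate_urls(domain, paths=("/robots.txt",)):
    """Return the URLs to probe for the bare and www forms of a domain."""
    hosts = (domain, "www." + domain)
    return ["http://" + host + path for host in hosts for path in paths]


def probe(url, timeout=10):
    """Return the HTTP status code for url, or a short error description."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status (e.g. 404 or 500).
        return err.code
    except URLError as err:
        # DNS failure, refused connection, timeout, and so on.
        return "unreachable: " + str(err.reason)


def report(domain):
    """Print the probe result for every candidate URL of a domain."""
    for url in candidate_urls(domain):
        print(url, probe(url))
```

Calling report("landvoice.com") prints one line per hostname.  If the two lines disagree, the hostnames aren't being served the same way, which is roughly what we were seeing.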

The pain wasn't over, however.  Google still couldn't parse or cache the site.  We looked and looked and finally discovered that DotNetNuke expects a partial HTML page as the skin, not the complete page.  We were emitting two sets of html and body tags:  the skin had one, and DotNetNuke emitted another automatically.  Oops.  So, we fixed that.
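
This kind of mistake is easy to miss because browsers quietly cope with it.  Below is a rough sketch of a checker in Python that counts the opening html, head, and body tags in the rendered page; TagCounter and duplicated_document_tags are names invented for this example, and it only looks at the markup you hand it.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each opening tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        self.counts[tag] += 1


def duplicated_document_tags(markup, tags=("html", "head", "body")):
    """Return {tag: count} for document-level tags that appear more than once."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return {tag: counter.counts[tag] for tag in tags if counter.counts[tag] > 1}
```

Feed it the page as the server actually sends it (view source, not the skin file).  An empty result means each of those tags shows up at most once.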

But we still weren't home free.  For some reason, Google would get 500 errors whenever it tried to get robots.txt.  The other search engines apparently ignore the error and treat it as though there is no robots.txt.  Google treats it as though the entire site is off limits.  Our problem had to do with URL rewriting.  Don't know what that is?  Check out: http://www.codeproject.com/aspnet/urlrewriter.asp.  Basically, because our users aren't the most computer savvy in the world, and because of a bug (or rather, a “feature” according to MS) in IE, which guesses the content of a document and ignores the Content-Type header, we rewrite .txt URLs so that we can emit text dynamically while the browser thinks it's getting a text file.

Well, our URL rewriter didn't take robots.txt into account and assumed that ALL requests with a .txt extension should be run through the dynamic engine.  For robots.txt this caused an exception, which Google didn't like.  The fix?  All we had to do was move the .txt extension association in IIS from the main application to a sub-directory of it.  We could also, and probably will with our next release, change the rewriter so that a request for robots.txt is ignored and the file is served as normal.
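
The rewriter change could look something like the sketch below.  It's written in Python just to show the decision logic, not our actual ASP.NET code; STATIC_TEXT_FILES and should_rewrite are hypothetical names, and it assumes the function receives only the path portion of the URL (no query string).

```python
import posixpath

# Paths that must be served as plain static files, never rewritten.
STATIC_TEXT_FILES = frozenset({"/robots.txt"})


def should_rewrite(path):
    """Decide whether a request path goes to the dynamic text engine."""
    # Normalize slashes and case so "/Robots.TXT" matches "/robots.txt".
    normalized = posixpath.normpath("/" + path.lstrip("/")).lower()
    if normalized in STATIC_TEXT_FILES:
        return False
    return normalized.endswith(".txt")
```

With this in place, should_rewrite("/robots.txt") is False, so the file is served as-is, while other .txt requests still go through the dynamic engine.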

Fun, Fun!

Robert

Posted on Wednesday, December 7, 2005 7:42 AM | Work


Comments on this post: DNN skins, URL Rewriting, DNS and Google

No comments posted yet.

Copyright © Robert May | Powered by: GeeksWithBlogs.net