
Free iPage web hosting - are any real and which?



Got a quick question: Free iPage web hosting - are any of them real, and which ones? Looking forward to any answers. Second question below. Hey,

My store is running 'osCommerce 2.2 Milestone 2', and in 'Admin -> My Store' I noticed the option 'Use Search-Engine Safe URLs (still in development)'.

Could anyone comment on this - does it work if I select 'true'?

Also, should I let robots crawl my MS2 iPage site, or should I block them all with a robots.txt file? I ask because I've heard stories in the past about huge bandwidth-guzzling robots.

Cheers in advance.

Paul Brown.

Comments (118)

Good question... I don't know the answer. I'll do some Googling and get back to you if I find a good one. You could also email the people at iPage, as they could probably give you an answer.

Comment #1

So by doing what you say above, will I get indexed properly and safely by the likes of Google, MSN, etc.?

Comment #2

Your iPage site will still get indexed, but your server will not be brought to its knees during the process.

Comment #3

I've been trying out the search-engine-friendly URLs, and the iPage site loads fast.

Stevel,

I'm wondering why you don't recommend using SE-safe URLs. Is there a specific reason?

I have it turned on for an iPage site that I recently put online, and quite a large number of pages are in Google's index.

Comment #4

Colin,

Is that the MS2 release you have, or a more recent CVS release?

Cheers,

Paul Brown

Comment #5

Yes Paul, MS2.

I have opted not to keep up to date with the CVS until the next milestone is released.

Dropped you a PM.

Comment #6

That statement was referring to what happens when Google spiders your iPage site and is able to create sessions, not the SES URLs.

SES URLs in general are safe, but there are a number of contributions that will not work with them.

Comment #7

I consider the SES URLs a needless complication that, as Jim says, interferes with some contributions and modules.

The key point of Prevent Spider Sessions is that it stops the search engines from starting sessions and including session IDs in the indexed URLs. It also prevents the spiders from merrily adding items to their "carts" while trying every link.

Comment #8
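
[Editor's note] For context, the MS2 spider check in includes/application_top.php works roughly like this - a simplified sketch of the idea, not the exact osCommerce code: the visitor's user agent is lowercased and compared against each entry in includes/spiders.txt, and a session is only started when nothing matches.

// Simplified sketch of the 'Prevent Spider Sessions' check
// (illustrative only - the real logic lives in includes/application_top.php)
$user_agent  = strtolower(getenv('HTTP_USER_AGENT'));
$spider_flag = false;
if ($user_agent != '') {
  foreach (file('includes/spiders.txt') as $spider) {
    $spider = trim($spider);
    if ($spider != '' && strpos($user_agent, $spider) !== false) {
      $spider_flag = true; // looks like a known robot
      break;
    }
  }
}
if ($spider_flag == false) {
  tep_session_start(); // osCommerce's session wrapper: only real visitors get a session (and an osCsid)
}

This is also why spiders.txt entries are plain lowercase substrings with no punctuation: the match is a simple substring test against the lowercased user agent.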

Robots.txt has to be in your root directory; it can exist elsewhere, but it will not be read there. So, if your catalog is in mydomain.com/catalog, your robots.txt would be:

Disallow: catalog/shopping_cart.php
Disallow: catalog/advanced_search.php
etc.

HTH.

Comment #9

There should be a slash before catalog if you are doing it that way. Yes, I should have mentioned that the page name is site-relative...

Comment #10

You're right on this. Thanks for the clarification!

George.

Comment #11
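
[Editor's note] Putting the robots.txt example and the slash correction above together, the corrected entries (assuming the shop really does live under /catalog/) would read roughly:

User-agent: *
Disallow: /catalog/shopping_cart.php
Disallow: /catalog/advanced_search.php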


I do not recommend using the 'Search-Engine Safe URLs' feature. All the major search engines can handle the default method, and that feature is known to cause problems.
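
[Editor's note] For illustration, with a hypothetical product ID of 31, the default URL style and the search-engine-safe style look roughly like this:

product_info.php?products_id=31   (default; a session may also append an osCsid parameter)
product_info.php/products_id/31   (search-engine-safe form)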

You should set Prevent Spider Sessions to true, and I suggest replacing includes/spiders.txt with this:

almaden.ibm.com
appie 1.1
architext
asterias
atomz
augurfind
bannana_bot
bdcindexer
crawl
docomo
frooglebot
gaisbot
geobot
gigabot
googlebot
grub
gulliver
henrythemiragorobot
ia_archiver
iconsurf
infoseek
kit_fireball
lachesis
linkwalker
lycos_spider
mantraagent
mercator
msnbot
moget/
muscatferret
naverbot
naverrobot
ncsa beta
netresearchserver
ng/
npbot
nutch
obot
osis-project
polybot
pompos
psbot
scooter
seeker
seventwentyfour
sidewinder
slurp
spider
steeler/
szukacz
t-h-u-n-d-e-r-s-t-o-n-e
teoma
turnitinbot
tutorgig
ultraseek
vagabondo
voilabot
w3c_validator
websitepulse
zao/
zyborg

You should also have a robots.txt that keeps spiders out of pages you don't want indexed. Here is what I use:

User-agent: *
Disallow: /shopping_cart.php
Disallow: /advanced_search.php
Disallow: /login.php
Disallow: /checkout_shipping.php
Disallow: /account.php
Disallow: /logoff.php
Disallow: /create_account.php
Disallow: /password_forgotten.php

Comment #13


This question was taken from a support group/message board and re-posted here so others can learn from it.