
I bought a domain name through iPage but bought hosting through xFluro?



My question is: I bought a domain name through iPage but bought hosting through xFluro? Hoping for an answer or two. My second question... Hello guys,

I have noticed that google.com.au has updated my listing with them; however, it only seemed to update index.php (.html with the SEF addon).

I looked in my logs and found Googlebot v2 had visited my site; however, according to my log it ended up at spiders.txt and then left.

Can anyone help me find out why the googlebot did not index my whole site, or even part of it?

I have the "Prevent Spider Sessions" setting turned on in ADMIN.

Here is a copy of that spiders.txt file, exactly as it is.

Any help would be much appreciated.

$Id: spiders.txt,v 1.2 2003/05/05 17:58:17 dgw_ Exp $
almaden.ibm.com
appie 1.1
architext
ask jeeves
asterias2.0
augurfind
baiduspider
bannana_bot
bdcindexer
crawler

steeler/1.3
szukacz
t-h-u-n-d-e-r-s-t-o-n-e
teoma
turnitinbot
ultraseek
vagabondo
voilabot
w3c_validator
zao/0
zyborg/1.0

Comments (48)

Hmm... I need to find out myself. I don't know what the right answer is. I'll do some research on Google and get back to you if I bump into anything. You should email the people at iPage, as they could probably give you an answer.

Comment #1

How long has it been?

It can easily take about three months to get all of your pages indexed.

Comment #2

WOW! That was really deep, and very honest. But you're on the right track and want weight loss for all the right reasons. We are here if you need help, advice, or just someone to talk to. I wish you the very best, and I am waiting to see the results.

Comment #3

OK, so this visit does not mean googlebot will do the whole site.

I was just concerned that maybe the spiders.txt or something was telling Google not to index my entire site.

Comment #4

I think you said things that many of us have thought in our lives. I could have written that myself... I know exactly what you're feeling! Thank you for your straight talk about what many of us face (or don't face).

This site is great. You'll find people who support you, no matter what. They'll cheer for you with each pound you lose and be there for you when you have a hard day. You also have much to contribute to help support us, too. Stay in touch, respond, ask questions, we're here for you!

Comment #5

Hi,

You should create a 'robots.txt' file that excludes spiders from your includes directory; there is no useful information in there, and you've wasted googlebot's time looking at some buttons with different languages on them.
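For illustration, a minimal sketch of such a file (the /catalog/includes/ path here assumes the default osCommerce layout with the store under /catalog/; adjust it to your install):

User-agent: *
Disallow: /catalog/includes/

Spiders only ever request this file from the web root, i.e. http://www.yourdomain.com/robots.txt.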

Just search for 'robots.txt' on Google.

Plus there are many more things you need to do.

Comment #6

Today is the day. I will pray for your strength. I'm wishing you a good day. Remember, if you MESS up, don't GIVE up. I'll look forward to your posts to see how you're doing.

Comment #7

Can you please help me with the robots.txt?

Where do I source the content, and where do I put it (/includes/)?

Any support would be much appreciated.

Is this why it did not index my iPage site, because I have no robots.txt?

Comment #8

..."The real shame and sadness and anger comes from what that weight has done to me. The opportunities I have denied myself. The way I have let people treat me because I didn't think I was worth more. The situations I've settled for. The fun I've stayed away from. The chances.

All of it." ... I can only say...EXACTLY!! You have hit the nail squarely on the head. These are the things that have haunted me for years and years...the things I beat myself up about all the time, because it could have been so different. I was the one in control, and yet I allowed this to happen to me. No one to blame but little old me.

Thanks for your insight and your inspiration!! ~Lori..

Comment #9

I think it goes in the root directory of your store.

http://www.google.co.uk/search?q=robots.tx...F-8&hl=en&meta=

This post has been edited by Yesudo.com: 18 February 2004, 03:31.

Comment #10

Oh, how I agree with you. Isn't it amazing how we put ourselves in a corner because of our weight? We pass by so many opportunities in life, we punish ourselves, we miss out on so much. Thank you for sharing your thoughts on this; I know it's how we all feel.

Comment #11

Hello.

I now have a robots.txt file.

And once again the googlebot came; see below.

Visits from Googlebot:

18/Feb/2004:18:27:48 64.68.82.169 /robots.txt "Googlebot/2.1
18/Feb/2004:18:27:49 64.68.82.169 / "Googlebot/2.1

You see how it stopped at robots.txt?

What have I done wrong? I really need Google to index me, and it seems to keep stopping.

This is my robots.txt:

User-agent: *
Disallow: /chat/
Disallow: /live_support/
Disallow: /admin/

Please, any help from you SEOs would be much appreciated.

Comment #12

I cried when I read this. I have felt the same way for so long. This is an uphill and downhill battle. Up and down, up and down. One day I'm feeling good about myself and in control, and the next day I'm frustrated because of something so trivial, like I don't like the way my face looks because it's too fat. I have moments of weakness.

I just wish those moments didn't have to do with my weight anymore. There are people in this world who don't think twice about their weight - it's not an issue and never was an issue. Me? My weight has been THE issue my entire life. I'm so sick of it!

Comment #13

First of all, you don't have to have a robots.txt file to be indexed by Google, or any other search engine.

Second, while there are thousands of things you can do to increase your rankings once you are fully indexed, you haven't done anything wrong as far as being fully parsed yet.

No one ever said that Google was supposed to come to your iPage site and parse through all of your pages at once. Typically, this is not the way the googlebots behave.

For more information about this subject, you should research search engine spiders.

Comment #14

Hi,

Can someone please explain why the file spiders.txt is in the /catalog/includes/ path, whilst the file robots.txt is in the /catalog/ path?

Wouldn't spiders look in the /catalog/ path for spiders.txt?

Peter

Comment #15

Because they are two different files that serve two different functions.

robots.txt will keep spiders from indexing directories or files that you specify.

spiders.txt is part of the spider session killer, which keeps defined spider user agents from having a session id in the URL, so that session ids don't end up in the search engine index.
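A rough sketch of what that session killer does (hedged: this is not the exact osCommerce source, and the file path and variable names are just for illustration; the point is the lowercase substring match against each line of spiders.txt):

<?php
// Sketch of a spider session killer: if the visitor's user agent
// contains any entry from spiders.txt, skip starting a session so
// no session id is ever appended to the URLs the spider crawls.
$spider_flag = false;
$user_agent = strtolower((string) getenv('HTTP_USER_AGENT'));
if ($user_agent != '') {
    $spiders = file('includes/spiders.txt'); // path is an assumption
    foreach ($spiders as $spider) {
        $spider = strtolower(trim($spider));
        // substring match: the entry "googlebot" also matches "Googlebot/2.1 (+...)"
        if ($spider != '' && strpos($user_agent, $spider) !== false) {
            $spider_flag = true;
            break;
        }
    }
}
if (!$spider_flag) {
    session_start(); // normal browser visitors still get a session id
}
?>

One consequence of the substring match, at least in this sketch: a plain "googlebot" entry already covers "Googlebot/2.1", so version-specific entries shouldn't be needed.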

Comment #16

Hi Chris,

Thanks for explaining that. Just so that I know I have digested it properly:

1. robots.txt is for spiders/bots and is not used by osC as such (i.e. a browser user who is visiting your osC store/site will not use robots.txt).

2. spiders.txt is only for osC internal controls, and is NOT used by spiders/bots etc. directly; it is used by osC (if you turn spider session ids off in the admin section) to indicate which spider user agents to turn the session id off for.

Hope I have it right?

Peter

Comment #17

Hi again.

Since I first posted, googlebot has visited me every day.

However, it NEVER gets past robots.txt.

Is this normal?

Surely it is supposed to start looking into my iPage site by now.

I have some good titles; for example, if someone does a Google search for "koala computers", I come up number one.

But if they search for "Cheap computers" or "Unlimited ADSL" or "$49 adsl" etc., nothing comes up for me.

I have those in titles, meta tags, content & many pages.

Does any SEO know if I have missed something? Will Google fully index me?

If you are a pro SEO, I am willing to pay you to get me ranked. Right now I am using pay per click on Google, which works very well and brings many sales, but I don't want to keep paying for this.

PLEASE help me get indexed.

Email me on .au with an SEO quote if you think you can get me ranked high.

Comment #18

Plus, you would want to keep them out of your includes dir:

Disallow: /catalog/includes/

Comment #19

If your iPage site comes up for "koala computers", then googlebot already *has* gotten past your robots.txt.

The 'cheap computers' keyphrase you mention below is *extremely* competitive, at over 5 million sites returned. You can optimize until you are blue in the face, and you'll never even sniff the top 50 pages.

The "Unlimited ADSL" phrase should be doable with proper SEO. You'll notice that you also come up first for the search "Unlimited ADSL koala".

So, the problem isn't necessarily that the spiders haven't indexed you; it's that you haven't optimized the site for the key phrases you want, or the key phrases you want are too competitive for you to have a realistic shot at getting anywhere near the top.

Needless to say, in either event, you're in the wrong forum.

Comment #20

Hi,

I'm somewhat mystified by the web server logs, as the spider/bot called 'msnbot' is showing up with all the session IDs???

Here is the setup:

1. Admin | Sessions | Prevent Spider Sessions | True

2. Contents of /catalog/robots.txt:

User-agent: *
Disallow: /images/
Disallow: /includes/

3. Contents of /catalog/includes/spiders.txt:

$Id: spiders.txt,v 1.2 2003/05/05 17:58:17 dgw_ Exp $
almaden.ibm.com
appie 1.1
architext
ask jeeves
asterias2.0
augurfind
baiduspider
bannana_bot
bdcindexer
crawler

/msnbot.htm)"

I noticed the file format for robots.txt and spiders.txt is Unix (LF only); is that correct?

Surely I wouldn't have to add:

msnbot/0.11

to the spiders.txt file, that is, continually monitor the web logs to see if there are new version numbers out?

Would the exclusion of the '/includes/' path in robots.txt have anything to do with it? Hmm, I don't see how; it is only for osC usage, the file spiders.txt isn't used by bots, and in fact a traverse of /includes/ is not possible.

Other things that may help solve the mystery...

/catalog/.htaccess:

# $Id: .htaccess,v 1.3 2003/06/12 10:53:20 hpdl Exp $
#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
#   AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

<IfModule mod_setenvif.c>
  <IfDefine SSL>
    SetEnvIf User-Agent ".*MSIE.*" \
             nokeepalive ssl-unclean-shutdown \
             downgrade-1.0 force-response-1.0
  </IfDefine>
</IfModule>

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)
#<IfModule mod_php4.c>
#  php_value session.use_trans_sid 0
#  php_value register_globals 1
#</IfModule>

/catalog/includes/.htaccess:

# $Id: .htaccess,v 1.4 2001/04/22 20:30:03 dwatkins Exp $
#
# This is used with Apache WebServers
# The following blocks direct HTTP requests in this directory recursively
#
# For this to work, you must include the parameter 'Limit' to the AllowOverride configuration
#
# Example:
#
#<Directory "/usr/local/apache/htdocs">
#  AllowOverride Limit
#
# 'All' will also work. (This configuration is in your apache/conf/httpd.conf file)
#
# This does not affect PHP include/require functions
#
# Example: http://server/catalog/includes/application_top.php will not work

<Files *.php>
Order Deny,Allow
Deny from all
</Files>

Did a phpinfo(), and...

session.use_trans_sid is On

register_globals is On

Thanks,

Peter

Comment #21

Hi,

Maybe I have to add the spiders/bots with their version numbers? I have noticed for nearly a month now that all 'Googlebot' does is:

64.68.82.172 - - [01/Mar/2004:03:59:37 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.172 - - [01/Mar/2004:03:59:39 -0500] "GET / HTTP/1.0" 200 24469 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.58 - - [01/Mar/2004:04:10:23 -0500] "GET /robots.txt HTTP/1.0" 200 54 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.58 - - [01/Mar/2004:04:10:24 -0500] "GET / HTTP/1.0" 200 24487 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

... and the entry for 'Googlebot' in /includes/spiders.txt is:

googlebot

... so wouldn't this seem to suggest that the entry for 'Googlebot' should be:

googlebot/2.1

... and that it would then spider the whole site, not just a page or two?

Thanks,

Peter

Comment #22

Hi,

Should the PHP.INI setting "session.use_trans_sid" be On or Off?

From a phpinfo(), my setting is On.

Yet the osC MS-2 .htaccess in the /catalog/ path indicates that it should be Off, as follows:

#<IfModule mod_php4.c>
#  php_value session.use_trans_sid 0
#  php_value register_globals 1
#</IfModule>

From http://www.php.net/manual/en/ref.session.p...n.use-trans-sid

Pulling my hair out on this one!!

Peter
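For what it's worth, if the answer turns out to be Off, a hedged sketch of one way to get there (it assumes your host's AllowOverride permits php_value, and mod_php4 as in the stock file) is to uncomment just the trans_sid line in /catalog/.htaccess:

<IfModule mod_php4.c>
  # stop PHP itself from auto-appending session ids to URLs
  php_value session.use_trans_sid 0
</IfModule>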

Comment #23

Hi Dave,

When I saw what you said, I thought it was a good idea, so I went and added the above 'disallow' to 'robots.txt'.

However, now that I have had some time to research spiders/bots etc. further, I don't think it is such a good idea.

1. If you try http://yourdomainname/includes you should get a "403" (forbidden) message anyway.

2. By adding the "/includes/" path to "robots.txt", you are supplying any spider/bot with path/folder information that it would not otherwise have been able to know about.

I think I'll take it out.

Peter

Comment #24

Usually I put (one) robots.txt in the root dir. So if your store is in /catalog/, then 'Disallow: /catalog/includes/' would be right.

This to me says your store is in the root directory, like so:

http://mydomain.com/shopping_cart.php

But all the files you mentioned said your store was in the catalog dir.

So the log should look like this:

GET /catalog/shopping_cart.php?osCsid...

And you would get to your store like this:

http://mydomain.com/...opping_cart.php

Do you have two installed?

Comment #25

They know about it because your language images are in there:

http://www.mydomain.com/catalog/includes/l...images/icon.gif

Comment #26

Hi,

The point I'm _trying_ to make is that the following lines in robots.txt:

Disallow: /catalog/includes/

OR

Disallow: /includes/

... are NOT needed, and are in fact redundant, because...

1. No spider/bot or web user should be able to get to the "/includes/" path anyway (i.e. permissions).

2. Supplying the "/includes/" path will compromise your security, because normally (due to path permissions) no one can 'see' that path. There is a significant amount of information in /includes/ that you wouldn't want anyone to see/view.

3. ... so why tell them it is there? It only makes a hacker's job easier.

Peter

Comment #27

That is all great, but the fact is no hacker will ever bother to look at your robots.txt file.

This file is simply a no-trespassing sign for spiders/bots.

I'll quote myself and restate it...

Comment #28

And just to stay on topic here...

I think catalog/includes/spiders.txt should be renamed to spiders.php so that it will follow the .htaccess rules.
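An alternative sketch that keeps the .txt name (assuming Apache, written in the same style as the existing <Files *.php> block): deny direct HTTP requests to the file in /catalog/includes/.htaccess, which does not affect osC reading it from disk with PHP:

<Files spiders.txt>
Order Deny,Allow
Deny from all
</Files>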

Comment #29

Is it possible to prevent browser viewing of your robots.txt file, but still allow bots to use the file?

I moved /admin/ to /supersecretadmin/, then included

Disallow: /supersecretadmin/

in robots.txt...

... and because anyone can view robots.txt, this defeats my original purpose.

Comment #30

Hi,

Yes, it is; you can do it with mod_rewrite in the .htaccess file, or in PHP code. The problem is that some spiders are so well "disguised" that it is often difficult to tell them apart from a browser user, so you may also block a spider/bot that you don't want to block. The Webmaster World forums have some good info on user agents.
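A hedged mod_rewrite sketch of that idea (the bot names are illustrative only, and user agents are trivially faked, so treat this as a curtain rather than a lock):

# in the web root .htaccess: serve robots.txt only to listed user agents
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(googlebot|msnbot|slurp) [NC]
RewriteRule ^robots\.txt$ - [F]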

That is exactly the point I made in my previous reply, but "Dave" couldn't understand.

Yes, I agree, making a 'secret' path and then adding that path to robots.txt is, well, like telling the world where your admin folder is.

There is actually no need to make it hard for yourself with any 'secret' folders. Simply set the path/folder permissions so that no one can access them, and turn indexing off, both in _that_ path (the secret one) and in the path above it. That way, your iPage website path/file permissions, plus the 'noindex' setting in the .htaccess files, will stop anyone from ever knowing the name of your secret (admin) folder.
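The 'turn indexing off' part can be as small as a one-line .htaccess in both the secret folder and its parent (a sketch, assuming Apache honours Options overrides there):

# stop Apache from generating directory listings
Options -Indexes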

Peter

Comment #31


This question was taken from a support group/message board and re-posted here so others can learn from it.