The IWDN

February 27th, 2007, 14:26   #1
the_pm
Vision - Action = Bovine Excrement
 
Join Date: Oct 2004
Posts: 10,252
Do Search Engines Index That?

Approximately four months ago, a small group of Web developers representing the International Web Developers Network (IWDN) devised an interesting search engine experiment. The question had been raised whether the text within a link's title attribute gets picked up by search engines or not. None of us knew the answer, and before long, we discovered we didn't know the answer to a lot of questions regarding how search engines index text in HTML.

The purpose of this experiment was to see which methods of presenting information on a Web site would get picked up by search engines and which methods would be ignored. We focused initially on three major search engines, Google, MSN and Yahoo!, in an effort to keep the experiment as tidy as possible. As we were conducting our experiment, a new IWDN member introduced us to Agent 55, a handy tool that, amongst other things, shows search results for up to 10 predetermined search engines at a time. This allowed us to expand our test group with ease, and we took full advantage of this. We would have been satisfied staying with the "Big Three," but having more search engines available to test at the same time made expanding our results convenient.

Designing the experiment

We created an HTML document with a list of test terms served up in a variety of ways. Some were displayed using different JavaScript techniques. Some were incorporated into title, name and rel attributes. Some used character encoding techniques. Identical test pages were deployed on a handful of sites, all linking to each other, and all with inbound links external to the testing network. Measuring the results of this experiment was simple: if a term showed up in a search engine's results, the method used to deliver that term was one that search engine understood.

View a copy of the actual search engine test page here.

The following chart shows the 14 testing terms we tried and results returned to us from 10 different search engines:


View HTML version of this chart

1. Saw URLs and assumed they were real. URLs are listed as pages in MSN as of publication of this document.
2. Found the term on another search engine as well as on a test page.


Each of the terms within that chart was presented to search engines using a different method or in a different context. Here is the key we followed for the presentation of each term:
  1. Text inserted as innerHTML.
  2. Text inserted using document.write.
  3. Title text for a generic element.
  4. URL text for a link with no linking text.
  5. Title text for a link with no linking text.
  6. Rel text for a link with no linking text.
  7. URL text for a link with linking text present ("and").
  8. Title text for a link with linking text present ("and").
  9. Rel text for a link with linking text present ("and").
  10. A div hidden using visibility:hidden.
  11. A div hidden using display:none.
  12. A word written using HTML decimal entities ("&#123;").
  13. A word written using HTML hexadecimal entities ("&#x61;", which renders as "a").
  14. The text within an anchor tag's name attribute.
  15. Control word. No special formatting or delivery.
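
The key above can be sketched as a small generator for a page that delivers one term per method. This is only an illustration, not the actual IWDN test page: the made-up terms (`qzxterm01` and so on), the helper names `term`, `entity_encode` and `build_test_page`, and the exact markup chosen for each method are all assumptions.

```python
# Hypothetical sketch of a test page covering the 15 delivery methods.
# The nonsense terms and all function names are invented for illustration.

def term(n):
    """Return a made-up test word for method number n."""
    return f"qzxterm{n:02d}"


def entity_encode(word, hexadecimal=False):
    """Write every character of word as a numeric character reference."""
    if hexadecimal:
        return "".join(f"&#x{ord(c):x};" for c in word)
    return "".join(f"&#{ord(c)};" for c in word)


def build_test_page():
    t = term
    body = [
        # 1-2: text that only exists after JavaScript runs
        '<p id="m1"></p>',
        f'<script>document.getElementById("m1").innerHTML = "{t(1)}";</script>',
        f'<script>document.write("{t(2)}");</script>',
        # 3: title attribute on a generic element
        f'<div title="{t(3)}"></div>',
        # 4-6: link with no linking text (term in URL, title, rel)
        f'<a href="{t(4)}.html"></a>',
        f'<a href="#" title="{t(5)}"></a>',
        f'<a href="#" rel="{t(6)}"></a>',
        # 7-9: the same three placements, with linking text present
        f'<a href="{t(7)}.html">and</a>',
        f'<a href="#" title="{t(8)}">and</a>',
        f'<a href="#" rel="{t(9)}">and</a>',
        # 10-11: text hidden with CSS
        f'<div style="visibility:hidden">{t(10)}</div>',
        f'<div style="display:none">{t(11)}</div>',
        # 12-13: decimal and hexadecimal entities
        f"<p>{entity_encode(t(12))}</p>",
        f"<p>{entity_encode(t(13), hexadecimal=True)}</p>",
        # 14: anchor name attribute
        f'<a name="{t(14)}"></a>',
        # 15: control word, no special delivery
        f"<p>{t(15)}</p>",
    ]
    return "<html><body>\n" + "\n".join(body) + "\n</body></html>"
```

Calling `build_test_page()` returns the markup as a string. Because each term is unique and otherwise meaningless, a search hit for it points back to exactly one delivery method.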

What we learned

The first thing we learned is ExaLead wasn't interested in indexing any of our test pages. We may explore why this is the case another time, but for the purposes of this experiment, it is not important to do so. It is duly noted, and it may be assumed future references to the testing group in this article do not include ExaLead.

The second thing we learned is most major search engines seem to follow a similar indexing scheme. There are only two anomalies in our results. MSN indexed made-up HTML document names, almost as if it expected those documents to exist somewhere. Search results actually pointed to the URLs where our fictitious pages would have existed if they were not fictitious. This is an interesting indexing technique, one we initially suspected was a bug, but it may be intentional. It seems developers can create links to future pages and get them indexed in MSN before those pages are actually created! Perhaps this is one way MSN attempts to index pages faster than its rivals, at the expense, to a small degree, of accuracy in portraying the Internet landscape.

The other anomaly involved serving up hexadecimal entities. Decimal entities were accepted by all search engines, but Yahoo! and WiseNut did not index hexadecimal entities. This leads one to wonder whether there are any exploitable scenarios here. It's possible a developer could "show" Yahoo! and WiseNut less content than seen by other search engines, increasing perceived keyword density without affecting how visitors digest a site's content. That is an interesting experiment for another day.
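
A decimal entity and a hexadecimal entity are two spellings of the same character, so an indexer that decodes one form but not the other ends up seeing different text. The sketch below uses Python's standard `html.unescape` to show the equivalence. The sample word and the `decimal_only_view` helper are hypothetical; the helper merely imitates an indexer that skips hexadecimal references and is not a description of how Yahoo! or WiseNut actually worked.

```python
import html
import re

word = "index"  # arbitrary sample word, not one of the real test terms
decimal_form = "".join(f"&#{ord(c)};" for c in word)
hex_form = "".join(f"&#x{ord(c):x};" for c in word)

# A standards-following decoder treats both spellings identically.
assert html.unescape(decimal_form) == word
assert html.unescape(hex_form) == word


def decimal_only_view(markup):
    """Hypothetical indexer that decodes decimal references only."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), markup)
```

With this toy model, `decimal_only_view(decimal_form)` recovers the word, while `decimal_only_view(hex_form)` leaves the raw references untouched, which is one plausible way a term could fail to be indexed.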

Otherwise, considering the number of testing scenarios we ran on the number of search engines listed, we've concluded indexing is a pretty standardized practice.

Third, this experiment appears to support the current understanding that search engines read a Web page at the code level, not the screen level. We used very direct JavaScript methods for feeding content onto the screen, which were ignored (we predicted this). Likewise, elements permanently obscured using two different CSS methods were still indexed by every search engine involved, meaning the page style was ignored in favor of the page structure, something search engines could only do if they looked at the page at the code level.
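
One way to picture "code level" reading is a parser that collects text nodes straight from the markup without running scripts or applying styles. The sketch below does this with Python's standard `html.parser` module. It is a toy model under stated assumptions, not how any particular search engine works; the class name `SourceTextExtractor`, the function `visible_to_parser` and the sample words are all hypothetical.

```python
from html.parser import HTMLParser


class SourceTextExtractor(HTMLParser):
    """Collect text nodes from raw markup; never runs scripts or applies CSS."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Attribute values (title, rel, name, style) are deliberately ignored.
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_to_parser(markup):
    """Return the words a source-only reader would pick up from markup."""
    parser = SourceTextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.words


sample = (
    '<script>document.write("jsword");</script>'
    '<div style="display:none">hiddenword</div>'
    '<div title="titleword"></div>'
    '<p>controlword</p>'
)
found = visible_to_parser(sample)
```

Here `found` contains the CSS-hidden word and the control word but neither the script-written word nor the title text, which mirrors the pattern reported above.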

What our results mean to developers

First, developers who stuff keywords into title attributes (e.g. <div title="blah blah blah">) solely for the purpose of gaining a search engine boost may stop now. Search engines ignore this, and the title attribute is meant for human visitor interaction. It should be used for this purpose only. The same goes for the rel attribute. It should be used to denote a relationship between objects (its original purpose), not as a vehicle for keyword dissemination.

Second, if a developer needs to obscure characters at the code level, it is safer to use decimal rather than hexadecimal entities.

Third, there is no need to use keyword-rich names when filling in the name attribute (<a name="blah">) for the sake of search engines. It is best to use names that make sense from a developer's perspective, since there's no need to seek an indexing/ranking boost here.

Knowing how markup is indexed by search engines gives developers even more incentive than they already have to keep their markup clean and free of keyword spam. Please note: this experiment does not measure the effectiveness of words indexed - how much they affect a site's rank when presented in certain ways. It only demonstrates whether they are recognized or not. Also, we do not condone pursuing the exploitable scenarios presented above. There may be unknown repercussions within search engines for doing so, and human visitors may be adversely affected as well.

(Additional scenarios are being developed for future testing. If you would like to see a particular test take place within a structured environment, please make your request within the IWDN online community.)

Paul Hirsch

**********

Paul Hirsch is one of the site administrators for IWDN and a partner/owner of Equentity, LLC.
February 27th, 2007, 17:46   #2
Mike Dammann
IWDN Core Team
 
Join Date: Feb 2005
Location: Miami and Costa Rica
Posts: 223
This is so true

Quote:
Third, there is no need to use keyword-rich names when filling in the name attribute (<a name="blah">) for the sake of search engines. It is best to use names that make sense from a developer's perspective, since there's no need to seek an indexing/ranking boost here.
Can't be said often enough. This is one of the best search engine experiments ever. The more you check out the results, the more you will realize how much time people are wasting when trying too hard to get their websites indexed and ranked as fast as possible.
__________________
Real Estate - Costa Rica - Marketing
February 27th, 2007, 21:52   #3
the_pm
Vision - Action = Bovine Excrement
 
Join Date: Oct 2004
Posts: 10,252
Thanks to everyone who Dugg this article
__________________
"It's not that a plateau is a bad place to be. It's just that sometimes, it gets a little too comfortable."

- Paul Hirsch, IWDN Administrator


THE RESULTS ARE IN! Read about the Great Search Engine Experiment.
February 27th, 2007, 22:58   #4
chaos
Web Guru
 
Join Date: Jan 2006
Location: Arizona
Posts: 1,230
Yay! I have been waiting for so long to see the final outcome of this experiment. The information was very well presented and documented. Excellent Job Paul, I'll keep this in mind for all my future work.
__________________
W3Wizardry.com
Be there, or be square!
February 28th, 2007, 13:17   #5
crazyfish
fish frozen in water
 
Join Date: Jan 2005
Location: Ottawa, Canada
Posts: 1,691
Very nice, I too have been waiting for these results.

Maybe I missed it, but what do Y1 and Y2 mean?
__________________
WittyFish - Witty Jokes
February 28th, 2007, 13:52   #6
Jamie
IWDN Admin
 
Join Date: Oct 2004
Location: West Yorkshire, UK
Posts: 6,505
Quote:
Quoting crazyfish:
Maybe I missed it, but what do Y1 and Y2 mean?
Quote:
Quoting the_pm:
1. Saw URLs and assumed they were real. URLs are listed as pages in MSN as of publication of this document.
2. Found the term on another search engine as well as on a test page.
__________________
JamieHarrop.com - Young Entrepreneur... the life as

Subscribe to my RSS feed
February 28th, 2007, 14:44   #7
crazyfish
fish frozen in water
 
Join Date: Jan 2005
Location: Ottawa, Canada
Posts: 1,691
Ok so I am blind as a bat.
July 3rd, 2007, 22:18   #8
absolethe
Aspiring Professional
 
Join Date: Jan 2005
Location: Tallahassee, FL
Posts: 168
I keep wondering... what about the alt attribute?
July 4th, 2007, 15:24   #9
the_pm
Vision - Action = Bovine Excrement
 
Join Date: Oct 2004
Posts: 10,252
Quote:
Quoting absolethe:
I keep wondering... what about the alt attribute?
I think it's been documented in the past that the alt attribute is indexed and does impact rankings, at least within image searches, and I believe in regular Web searches as well.
July 20th, 2007, 17:11   #10
absolethe
Aspiring Professional
 
Join Date: Jan 2005
Location: Tallahassee, FL
Posts: 168
That's what I thought, but I've had this habit (if using <a href=""><img src="" alt="" /></a> as a button) of thinking of the alt text as being roughly equivalent to a "plain text link".

Yet some things I've been seeing lately in my own site's indexing lead me to feel that alt text isn't as reliable as I'd previously thought. Not just in the above example, either. Maybe Google just doesn't like me. (I say Google because Yahoo seems to be treating my site a little differently.)
July 20th, 2007, 19:44   #11
Mike Dammann
IWDN Core Team
 
Join Date: Feb 2005
Location: Miami and Costa Rica
Posts: 223
Text above and below the image is very important as well. Google applies more common sense now. The alt tag is really what a webmaster wants Google to believe is on the image and the link connected to it. In reality, the best indicator that an image and its link contain something relevant to a certain key term is what is around it.
When you are searching for a picture of someone, where is the name normally?
Above the pic

"Picture of Mike Dammann:"

or below

"Mike Dammann, John Smith, Chris Hogan at the SES Convention".

I have tested that numerous times, and the alt tag is no longer as effective as what is above and below the image or button.
July 20th, 2007, 23:00   #12
inimino
IWDN Core Team
 
Join Date: Oct 2004
Location: Colorado, US
Posts: 1,174
Of course, the alt attribute should be authored correctly for accessibility, not SEO.
October 16th, 2007, 20:23   #13
visio
Aspiring Enthusiast
 
Join Date: Oct 2007
Posts: 10
Nice experiment. How long did it run? How was the site submitted?

Thanks,
Paul
__________________
Cheap Web Hosting
FFmpeg Hosting - Youtube clones, PHPmotion, Video Sharing sites

Last edited by visio : October 16th, 2007 at 20:34.
October 17th, 2007, 14:52   #14
the_pm
Vision - Action = Bovine Excrement
 
Join Date: Oct 2004
Posts: 10,252
The experiment ran for roughly four months, and the pages were not submitted to search engines; rather, they were allowed to be indexed organically through inbound links from a controlled group of participants.
May 14th, 2009, 06:14   #15
BobAdlam
Newcomer
 
Join Date: May 2009
Posts: 0
Awesome post Lee! Thanx for the list.
