TCF389: Searchin'
Two ways to search the Web/Internet--
though they blur together
-
Indexes
-
To the entire Web or one single site.
-
Subject Guides
-
Indexes
-
Individual sites may catalog the pages on them
-
Or indexes may seek information throughout the Web and suck it into a huge
database.
-
E.g., Alta Vista,
Infoseek,
Lycos,
Webcrawler,
OpenText
-
How do they work?
-
A spider or Web crawler robot (a piece of software)
connects to a site, follows links through it and collects text from the pages
on that site.
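The crawl described above can be sketched in a few lines of Python. This is only an illustration, not how any particular index works: the two-page site, its URLs, and the names `PageParser` and `crawl` are made up for the example, and the pages are held in memory so nothing is fetched over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


# Hypothetical two-page site, kept in memory so the sketch runs offline.
SITE = {
    "http://example.edu/":
        '<html><body>Home <a href="syllabus.html">Syllabus</a></body></html>',
    "http://example.edu/syllabus.html":
        '<html><body>New media syllabus <a href="/">Back</a></body></html>',
}


def crawl(start_url, fetch):
    """Follow links from start_url and return {url: collected text}."""
    seen, queue, collected = set(), [start_url], {}
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:  # page not found: nothing to collect
            continue
        parser = PageParser()
        parser.feed(html)
        collected[url] = " ".join(parser.text)
        # Resolve relative links against the page they appeared on.
        queue.extend(urljoin(url, link) for link in parser.links)
    return collected


pages = crawl("http://example.edu/", SITE.get)
```

The `seen` set keeps the spider from looping forever when pages link back to each other, as the two pages here do.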
-
Before crawling through a site, it checks for a text file called
robots.txt--which can specify parts of the site that should not be
indexed.
-
The
Robots Exclusion Page explains how to control Web crawling robots.
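As a rough sketch of the robots.txt check, Python's standard `urllib.robotparser` module can read the file's rules and answer whether a URL may be crawled. The robots.txt contents, the spider name `ExampleSpider`, and the URLs are assumptions invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real spider would fetch it from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def may_index(url, agent="ExampleSpider"):
    """Return True if the rules allow this agent to crawl the URL."""
    return robots.can_fetch(agent, url)


allowed = may_index("http://example.edu/syllabus.html")
blocked = may_index("http://example.edu/private/notes.html")
```

Note that robots.txt is only a request: a well-behaved spider checks it, but nothing forces a spider to do so.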
-
Then that information is compiled into a large index which the user
may search
-
Though the amount of information retrieved can be overwhelming
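One simple way to picture the compiled index is as a table that maps each word to the pages containing it. The sketch below assumes that structure. Real indexes are far larger and also rank their results, and the page texts and function names here are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, word):
    """Return the URLs listed under a single word, in sorted order."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical text collected by a spider.
PAGES = {
    "http://example.edu/": "TCF389 home page for new media",
    "http://example.edu/syllabus.html":
        "Syllabus and related materials for new media",
}
index = build_index(PAGES)
```

Searching for a common word such as "media" returns every page, which hints at why results can be overwhelming.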
-
Meta search engines
-
E.g., Dogpile,
Metacrawler
-
New way of searching indexes
-
You enter a word to search for in one place and it searches several indexes
simultaneously
-
And then collates the results
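A meta search can be sketched as one function that sends the same query to several indexes at once and then merges what comes back. The two "engines" below are hypothetical stand-ins that return canned URLs. A real meta search engine would send the query to each index over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, engines):
    """Query every engine at once, then collate hits and drop duplicate URLs."""
    names = list(engines)
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as `names`.
        result_lists = list(pool.map(lambda name: engines[name](query), names))
    collated, seen = [], set()
    for name, urls in zip(names, result_lists):
        for url in urls:
            if url not in seen:
                seen.add(url)
                collated.append((url, name))
    return collated


# Hypothetical stand-ins for real indexes.
ENGINES = {
    "index_a": lambda q: ["http://example.edu/a", "http://example.edu/shared"],
    "index_b": lambda q: ["http://example.edu/shared", "http://example.edu/b"],
}
results = meta_search("great danes", ENGINES)
```

The page found by both indexes is listed once, credited to the first index that returned it.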
-
How can you, as Webmaster, control the summary of your page that is
listed in an index?
-
Be sure to provide a <TITLE> tag
-
Use <META> tags to control the site's description and
keywords
-
HTML tags "hidden" inside the <HEAD> section that can contain additional
information about your page.
-
Infoseek and others default to the first 200 characters from your Web page
for the description, but you can change that with <META> tags.
-
<meta name="description" content="Write your description here">
-
Substitute your description for the placeholder text in the content attribute.
-
Similarly, there's a <META> tag for keywords.
-
<meta name="keywords" content="Write your keywords here">
-
Substitute keywords (words associated with your site that users might search
for) for the placeholder text in the content attribute.
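To see how an index might build its listing from these tags, here is a minimal sketch that reads the title and the description from a page and falls back to the first 200 characters of page text when no description tag is present. The fallback length comes from the note above. The class and function names and the sample pages are assumptions for illustration, not any engine's actual code.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Picks out the <TITLE>, named <META> tags, and other page text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.body_text = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.body_text.append(data.strip())


def summarize(html):
    """Return (title, description) as a listing might show them."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    fallback = " ".join(parser.body_text)[:200]
    return parser.title.strip(), parser.meta.get("description") or fallback


with_meta = summarize(
    '<html><head><meta name="description" content="Syllabus and related '
    'materials"><title>TCF 389</title></head><body>Welcome</body></html>'
)
without_meta = summarize(
    "<html><head><title>Plain</title></head><body>"
    + "x" * 300 + "</body></html>"
)
```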
-
Spam Penalty
-
All major search engines penalize sites that attempt to "spam" the engines
in order to improve their position. One common technique is "stacking" or
"stuffing" words on a page.
-
E.g., "sex" repeated 100 times.
-
If the search engines spot a spamming technique, they may downgrade a page's
ranking or exclude it from listings altogether.
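The engines do not publish how they spot stuffing, so the following is only a guess at the general idea: flag a page when a single word makes up too large a share of its text. The 50% threshold and the 20-word minimum are arbitrary values chosen for the example.

```python
import re
from collections import Counter


def looks_stuffed(text, threshold=0.5, min_words=20):
    """Flag text where one word accounts for more than `threshold` of all words.

    Both parameters are illustrative assumptions, not published engine rules.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < min_words:
        return False  # too short to judge
    _, count = Counter(words).most_common(1)[0]
    return count / len(words) > threshold


stuffed = looks_stuffed("sex " * 100)
normal = looks_stuffed(
    "Syllabus and related materials for TCF 389 New Media: "
    "Theory and Practice. " * 3
)
```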
-
Example from the TCF389 homepage:
-
<HTML>
<HEAD>
<META name="keywords" content="new media, Internet, Telecommunication & Film Department, University of Alabama">
<META name="description" content="Syllabus and related materials for TCF 389 New Media: Theory and Practice.">
<TITLE>TCF 389 New Media: Theory and Practice</TITLE>
</HEAD>
-
Meta Medic, an
online service, will check your <META> tags for you.
-
Try checking your personal page with this and then the TCF389 homepage.
-
The same site also provides a Search
Engine Tutorial for Web developers.
-
There's also a <META> tag for excluding robots--though not many obey
this:
-
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
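For a robot that does obey the tag, reading it comes down to splitting the content value on commas. A minimal sketch, with a hypothetical function name, that handles only the two directives shown above:

```python
def parse_robots_meta(content):
    """Turn a ROBOTS content value into index/follow permissions."""
    directives = {token.strip().lower() for token in content.split(",")}
    return {
        "index": "noindex" not in directives,    # may the page be indexed?
        "follow": "nofollow" not in directives,  # may its links be followed?
    }


permissions = parse_robots_meta("NOINDEX, NOFOLLOW")
```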
-
Subject Catalogs or Guides
-
E.g., Yahoo!,
Excite,
Looksmart,
Infoseek (some do both)
-
Information on sites is submitted or solicited and then someone categorizes
it
-
E.g., on Yahoo! you'll find the following category:
-
Regional: U.S. States: Alabama: Cities: Tuscaloosa: Education: Colleges and
Universities: University of Alabama
-
Each word after a colon is a category--becoming more and more specific
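To make the nesting concrete, the category line above can be split at its colons into levels, broadest first. This is only an illustration of the structure. The function names are made up and say nothing about how Yahoo! stores its categories.

```python
CATEGORY = ("Regional: U.S. States: Alabama: Cities: Tuscaloosa: "
            "Education: Colleges and Universities: University of Alabama")


def category_levels(path):
    """Split a colon-separated category path into levels, broadest first."""
    return [level.strip() for level in path.split(":")]


def breadcrumbs(path):
    """Return every prefix of the path, from the broadest category to the full one."""
    levels = category_levels(path)
    return [": ".join(levels[:depth]) for depth in range(1, len(levels) + 1)]


levels = category_levels(CATEGORY)
trail = breadcrumbs(CATEGORY)
```

Each entry in `trail` is a category you could browse to on the way down to the most specific one.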
-
Thus you may either browse through Yahoo!'s categories, or
you may search it like you would search an index.
-
If Yahoo! does not find a category to match what you're looking for, it'll
automatically send your search over to Alta Vista
-
And it'll give you the option of using other indexes.
-
Which is why I usually start a search there.
-
How can you, as Webmaster, get your sites listed on subject guides?
-
Submit them one-by-one to Yahoo! and the rest
-
Use a service that does several at once
-
Broadcaster
-
Free; does over 200 guides and other services; but slow.
-
Register-It
-
Does 15 sites for free, then begins charging you.
-
Use one of these to register your site.
All of these search services are available through Netscape Navigator's
Directory Menu and
Internet Search selection.
Select Internet Search and Netscape's home site will provide an interface
through which you can reach Excite, Infoseek, Lycos, Yahoo and a few others.
-
But you do not have to rely on Netscape's home site (which can be
down or busy).
-
You may go directly to these services.
-
Netscape rotates through these services--which pay Netscape to list them
on its Internet Search page.
Search Logic
-
Regardless of whether you're looking at an Index or a Guide, the way you
compose your search or query will greatly affect how successful you
are.
-
The challenge is to find the specific information you want and not be overloaded
with useless data.
-
Standard sentences do not work on most search engines
-
A search engine is a piece of software that prompts you for words to look
for (a query) and then goes hunting for you.
-
Nope: "I am looking for information about great danes."
-
Instead, you enter words (also known as keywords) related to your search.
-
Yep: "dogs, great danes"
-
Most search engines allow you to combine those words by using certain
Boolean operators
-
E.g., AND, OR, NOT, NEAR
-
Enabling you to narrow your search
Boolean logic is the term used to describe certain logical operations
that are used to combine search terms in many databases. The basic Boolean
operators are represented by the words AND, OR and NOT.
Variations on these operators, sometimes called proximity
operators, that are supported by some search engines include
ADJACENT, NEAR and FOLLOWED BY. Whether or not a search
engine supports Boolean logic, and the way in which it implements it, is
another important consideration when selecting a search tool. The following
diagrams illustrate the basic Boolean operations.
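The basic operators can also be sketched with Python sets, where each search term stands for the set of documents that contain it. The three tiny documents and the `near` distance are assumptions for this example, and real search engines implement these operators in their own ways.

```python
# Hypothetical documents, keyed by an id number.
DOCS = {
    1: "great danes are large dogs",
    2: "small dogs chase cats",
    3: "the danes of denmark",
}


def docs_with(term):
    """Return the ids of documents containing the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


def near(a, b, distance=1):
    """Return ids of documents where a and b occur within `distance` words."""
    hits = set()
    for doc_id, text in DOCS.items():
        words = text.split()
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(abs(i - j) <= distance for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits


dogs_and_danes = docs_with("dogs") & docs_with("danes")   # AND: both terms
dogs_or_danes = docs_with("dogs") | docs_with("danes")    # OR: either term
dogs_not_cats = docs_with("dogs") - docs_with("cats")     # NOT: exclude a term
great_near_danes = near("great", "danes")
```

AND narrows a search, OR widens it, and NOT removes unwanted matches. NEAR is stricter than AND because the words must also appear close together.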
-
Search Engine Watch, a site devoted
to all manner of cool search advice, explains the specifics of how particular
search engines implement Boolean logic.
-
Sample Searches
Search for TCF389's home page, which has been up since early July 1997--using
the search term "TCF389". No hits on Excite or Webcrawler.
Infoseek
TCF389: Concepts in New Media
Syllabus and related materials for TCF 389 Concepts in New Media,Media Arts
Dept., University of Arizona.
76% http://www.arts.arizona.edu/mar389/ (Size 4.5K)
Alta Vista
TCF389: Concepts in New Media - Syllabus and related materials for TCF 389
Concepts in New Media,Media Arts Dept., University of Arizona.
--http://www.arts.arizona.edu/mar389/
HotBot
1. TCF389: Concepts in New Media
99% Syllabus and related materials for TCF 389 Concepts in New Media,Media
Arts Dept., University of Arizona.
http://www.arts.arizona.edu/mar389/, 4602 bytes, 21Jul97
MetaCrawler
Search Exercise
-
Ego Surf!
-
Look for your own name on one of the major indexes. Often entertaining!
-
Find your favorite band's sites.
-
See if your hometown has a city Web site.
-
Find the current enrollment of your old high school. Does it have a Web site?
-
Try out the MCI Great American Net Test
and test your searching skills. Beat the clock!
-
Want to look over a stranger's shoulder while he/she searches?
Using the Web to search for non-Web info
-
Looking for people's phone numbers, addresses, or e-mail accounts
-
Try ego-surfing on:
-
Four11
-
WhoWhere?
-
BigFoot
Last revised: June 3, 1998