Advanced Search Feature Essay, Research Paper
Introduction to Search Engine Design

Search engines are one of the primary ways that Internet users find web sites. That’s why a web site with a good search engine listing may see a dramatic increase in traffic. Everyone wants that good listing. Unfortunately, many web sites appear poorly in search engine rankings or may not be listed at all because they fail to consider how search engines work. Knowledge of “search engine design” can help many of these sites. Search engine design means ensuring that your web pages are accessible to search engines and focused in ways that help improve the chances they will be found.

A Webmaster’s Guide to Search Engines provides tips, techniques and a good grounding in the basics of search engine design. By using this information where appropriate, you may tap into visitors who previously missed your site. The guide is not a primer on ways to spam or trick the search engines. In fact, there aren’t “search engine secrets” that will guarantee a top listing. But there are a number of small changes you can make that can sometimes produce big results.

The Major Search Engines

Which of the many search engines really matter? Usually, it’s the search engines that are well-known and well-used. This is true whether you are a webmaster or a searcher.

For webmasters, a good listing in a search engine that promotes itself well, or has strong strategic alliances, is more likely to bring traffic than a lesser-known search engine. For example, a search engine listed on the Netscape Net Search page is guaranteed to receive much use. That translates into more traffic for sites that are ranked well by these search engines.

For searchers, well-known, commercially-backed search engines generally mean more dependable results. These search engines are more likely to be well-maintained and upgraded when necessary, to keep pace with the growing web.
Search Engines, Directories and Hybrids

Before naming names, it’s important to explain the difference between search engines and directories. They are often confused. A more detailed explanation can also be found on the How Search Engines Work page, and site subscribers have access to even more detailed information about how some of the search engines listed here operate.

Search Engines: Also called “spiders” or “crawlers,” search engines constantly visit web sites on the Internet in order to create catalogs of web pages. Because they run automatically and index so many web pages, search engines may often find information not listed in directories.

Directories: Unlike search engines, directories are created by humans. Sites must be submitted, then they are assigned to an appropriate category or categories. Because of the human role, directories can often provide better results than search engines. Yahoo is an example of a directory.

Hybrid Search Engines: To further confuse matters, some search engines also have an associated directory. The directory contains sites that have been reviewed or rated. For the most part, these reviewed sites do not appear as the “default” when a query is made to a hybrid search engine. Instead, a user must consciously choose to see the reviews.

The Major Players

Below are the current major players in the search engine game. The Strategic Alliances page explains in detail the methods used to determine which search engines are considered major players. Yahoo and some others, though not true search engines, are listed below because so many people use them. Some additional search services are also listed on the Other Search Services page. Search Engine Watch also maintains a list of metacrawlers, which let you search more than one search engine at a time, plus lists of specialty search engines, regional guides and other search services such as those designed for children.
See the Guide To Search Engines section on the Search Engine Facts page for a complete listing.

Does the world come to an end if your site can’t be found easily in any of these “major” search engines? Not necessarily. If you want to reach apple farmers, then getting a link to your site from an obscure apple-farming web site may bring in much more meaningful traffic than being indexed by all the general search engines in the world.

AltaVista
http://www.altavista.com/

AltaVista is consistently one of the largest search engines on the web, in terms of pages indexed. Its comprehensive coverage and wide range of power searching commands make it a particular favorite among researchers. It also offers a number of features designed to appeal to basic users, such as “Ask AltaVista” results, which come from Ask Jeeves (see below), and directory listings from LookSmart. AltaVista opened in December 1995. It was owned by Digital, and now is run by Compaq, which purchased Digital in 1998.

See also: AltaVista Debuts Search Features (The Search Engine Report, November 4, 1998)

Ask Jeeves
http://www.askjeeves.com/

Ask Jeeves is a human-powered search service that aims to direct you to the exact page that answers your question. If it fails to find a match within its own database, then it will provide matching web pages from various search engines. The service went into beta in mid-April 1997 and opened fully on June 1, 1997. Results from Ask Jeeves also appear within AltaVista.

See also: Ask Jeeves: Asking Questions To Give You Answers (The Search Engine Report, November 4, 1998)

AOL NetFind
http://www.aol.com/netfind/

AOL NetFind is a branded version of the Excite search engine in the US and Canada. It has a different name and a different look, but it is basically Excite underneath. In Europe, Lycos provides the results in the same manner as Excite. AOL NetFind launched in March 1997.
Direct Hit
http://www.directhit.com/

Direct Hit is a company that works with other search engines to refine their results. It does this by monitoring what users click on from the results they see. Sites that get clicked on more than others rise higher in Direct Hit’s rankings. Thus, the service dubs itself a “popularity engine.” Direct Hit’s technology is currently best seen at HotBot.

See also: Counting Clicks and Looking at Links (The Search Engine Report, August 4, 1998)

Excite
http://www.excite.com/

Excite is one of the most popular search services on the web. It offers a medium-sized index and integrates non-web material such as company information and sports scores into its results, when appropriate. It also offers one of the best news search services available: Excite NewsTracker. Excite was launched in late 1995. It grew quickly in prominence and consumed two of its competitors, Magellan in July 1996, and WebCrawler in November 1996. These continue to run as separate services. Excite also “powers” the results that appear in AOL NetFind and Netscape Search.

See also: Excite Enhances Search Results (The Search Engine Report, June 3, 1998)

Go
http://beta.go.com/

Go is a portal site produced by Infoseek and Disney. It offers portal features such as personalization and free e-mail, plus the search capabilities of Infoseek. It launched in beta form in December 1998. It is not related to GoTo, below.

See also: Go Arrives from Disney and Infoseek (The Search Engine Report, Jan. 5, 1999)

Google
http://www.google.com/

Google is a search engine that makes heavy use of link popularity as a primary way to rank web sites. This can be especially helpful in finding good sites in response to general searches such as “cars” and “travel,” because users across the web have in essence voted for good sites by linking to them.
See also: Counting Clicks and Looking at Links (The Search Engine Report, August 4, 1998)

GoTo
http://www.goto.com/

GoTo is the only major search engine that sells listings. Companies can pay money to be placed higher in the search results, which GoTo feels improves relevancy. Learn more about this model via the articles below. Non-paid results come from Inktomi. GoTo launched in 1997 and incorporated the former University of Colorado-based World Wide Web Worm. In February 1998, it shifted to its current pay-for-placement model and soon after replaced the WWW Worm with Inktomi for its non-paid listings. GoTo is not related to Go, above.

See also: GoTo Going Strong (The Search Engine Report, July 1, 1998); GoTo Sells Positions (The Search Engine Report, March 3, 1998)

HotBot
http://www.hotbot.com/

Like AltaVista, HotBot is another favorite among researchers due to its large index of the web and many power searching features. HotBot is powered by the Inktomi search engine, which is also used by other services. It has a partnership with LookSmart for its directory listings. HotBot launched in May 1996 as Wired Digital’s entry into the search engine market. Lycos purchased Wired Digital in October 1998 and continues to run HotBot as a separate search service.

See also: HotBot Emphasizes The Human (The Search Engine Report, October 5, 1998)

Inktomi
http://www.inktomi.com/

Originally, there was an Inktomi search engine at UC Berkeley. The creators then formed their own company with the same name and created a new Inktomi index, which was first used to power HotBot. Now the Inktomi index also powers several other services. All of them tap into the same index, though results may be slightly different. This is because Inktomi provides ways for its partners to use a common index yet distinguish themselves. There is no way to query the Inktomi index directly, as it is only made available through Inktomi’s partners with whatever filters and ranking tweaks they may apply.
See also: Microsoft Unveils MSN Search (The Search Engine Report, October 5, 1998)

Infoseek
http://www.infoseek.com/

Infoseek is one of the more popular search services on the web. It has a small-to-medium-sized index, so it may not be the best place for those doing a comprehensive search of the web. However, it consistently provides quality results in response to many general and broad searches, thanks to its ESP search algorithm. It also has an impressive human-compiled directory of web sites. Infoseek is the main power behind the new Go portal site, which it produces in partnership with Disney. Infoseek launched in early 1995.

See also: Infoseek Reorganizes Listings (The Search Engine Report, October 5, 1998)

LookSmart
http://www.looksmart.com/

LookSmart is the closest rival Yahoo has, in terms of being a human-compiled directory of the web. In addition to being a stand-alone service, LookSmart provides directory results to both AltaVista and HotBot. AltaVista provides LookSmart with search results when a search fails to find a match from among LookSmart’s reviews. LookSmart launched independently in October 1996, was backed by Reader’s Digest for about a year, and then company executives bought back control of the service.

See also: LookSmart Launches Local Search, Plans Directory Expansion (The Search Engine Report, Jan. 5, 1999); LookSmart and Snap Challenge Yahoo (The Search Engine Report, June 6, 1998)

Lycos
http://www.lycos.com/

Lycos is one of the more popular search services, despite having a small index that is more out-of-date than those of its competitors. While its search engine listings are weak, Lycos does feature an impressive directory of web sites called Lycos Community Guides. Sites are automatically listed in these guides using technology from WiseWire, a company Lycos acquired in early 1998. Lycos is one of the oldest of the major search engines, around since May 1994. It began as a project at Carnegie Mellon University.
The name Lycos comes from the Latin for “wolf spider.” In October 1998, Lycos acquired the competing HotBot search service, which continues to be run separately.

See also: Lycos Buys Wired, Gets Facelift (The Search Engine Report, November 4, 1998)

MSN (Microsoft)
http://www.msn.com/

Microsoft’s MSN service features both directory listings and search engine results, powered by Inktomi. Other search engines are also featured at the service. The service went live in October 1998 with its Inktomi results, although it had existed in various formats and under different names previously.

See also: Microsoft Unveils MSN Search (The Search Engine Report, October 5, 1998)

Netscape (including Netscape Open Directory / NewHoo)
http://www.netscape.com/

Like AOL NetFind, Netscape Search is a branded version of the Excite search engine. Netscape also runs the Netscape Open Directory, formerly known as NewHoo. This directory depends on volunteer editors to categorize web sites. Netscape also features an option to search with other search engines from its site. Netscape relaunched itself as a portal site with search offerings in Spring 1998. Now owned by AOL, the Netscape site is expected to continue operating as a separate service.

See also: NewHoo Becomes Netscape Open Directory (The Search Engine Report, December 3, 1998)

Northern Light
http://www.northernlight.com/ or http://www.nlsearch.com/

Northern Light is another favorite search engine among researchers. It features a large index, along with the ability to cluster documents by topic. Northern Light also has a set of “special collection” documents that are not readily accessible to search engine spiders. There are documents from thousands of sources, including newswires, magazines and databases. Searching these documents is free, but there is a charge of up to $4 to view them. There is no charge to view documents on the public web, only for those within the special collection. Northern Light opened to general use in August 1997.
See also: Northern Light Adds Search Functions, Freshens Index (The Search Engine Report, August 4, 1998); Northern Light’s Custom Search Folders (The Search Engine Report, September 3, 1997)

Search.com
http://www.search.com/

Search.com is a branded version of the Infoseek search engine, operated by Cnet. It also offers specialty searches, where Infoseek technology is used to spider selected sites within particular categories. Search.com also provides links to a variety of specialty search services. Search.com was launched in March 1996 as a single interface to several search engines. The partnership with Infoseek began in May 1997. Since late 1997, Search.com has been eclipsed by Cnet’s new search offering, Snap (see below).

Snap
http://www.snap.com/

Snap is a human-compiled directory of web sites, supplemented by search results from Inktomi. Like LookSmart, it aims to challenge Yahoo as the champion of categorizing the web. Snap launched in late 1997 and is backed by Cnet and NBC.

See also: LookSmart and Snap Challenge Yahoo (The Search Engine Report, June 6, 1998)

WebCrawler
http://www.webcrawler.com/

WebCrawler has the smallest index of any major search engine on the web; think of it as Excite Lite. The small index means WebCrawler is not the place to go when seeking obscure or unusual material. However, some people may feel that by having indexed fewer pages, WebCrawler provides less overwhelming results in response to general searches. WebCrawler opened to the public on April 20, 1994. It was started as a research project at the University of Washington. America Online purchased it in March 1995, and it was the online service’s preferred search engine until Nov. 1996. That was when Excite, a WebCrawler competitor, acquired the service. Excite continues to run WebCrawler as an independent search engine.

Yahoo
http://www.yahoo.com/

Yahoo is the web’s most popular search service and has a well-deserved reputation for helping people find information easily.
The secret to Yahoo’s success is human beings. It is the largest human-compiled guide to the web, employing 80 or more editors in an effort to categorize the web. Yahoo has at least 1 million sites listed. Yahoo also supplements its results with those from Inktomi. If a search fails to find a match within Yahoo’s own listings, then matches from Inktomi are displayed. Inktomi matches also appear after all Yahoo matches have first been shown. Yahoo is the oldest major web site directory, having launched in late 1994.

How Search Engines Work

The term “search engine” is often used generically to describe both true search engines and directories. They are not the same. The difference is how listings are compiled.

Search Engines Vs. Directories

Search Engines: Search engines, such as HotBot, create their listings automatically. Search engines crawl the web, then people search through what they have found. If you change your web pages, search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.

Directories: A directory such as Yahoo depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted. Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed than a poor site.

Hybrid Search Engines: Some search engines maintain an associated directory. Being included in a search engine’s directory is usually a combination of luck and quality. Sometimes you can “submit” your site for review, but there is no guarantee that it will be included.
Reviewers often keep an eye on sites submitted to announcement places, then choose to add those that look appealing.

The Parts Of A Search Engine

Search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being “spidered” or “crawled.” The spider returns to the site on a regular basis, such as every month or two, to look for changes.

Everything the spider finds goes into the second part of a search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information. Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been “spidered” but not yet “indexed.” Until it is indexed (added to the index), it is not available to those searching with the search engine.

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. You can learn more about how search engine software ranks web pages on the aptly-named How Search Engines Rank Web Pages page.

Major Search Engines: The Same, But Different

All search engines have the basic parts described above, but there are differences in how these parts are tuned. That is why the same search on different search engines often produces different results.
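The three parts described above (spider, index, search software) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PAGES dictionary is a made-up stand-in for the web, the function names are hypothetical, and the index here is a simple word-to-pages lookup rather than the full page copies a real engine would keep. No real search engine is claimed to work exactly this way.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "/home": ("Welcome to our apple farm", ["/apples", "/contact"]),
    "/apples": ("We grow apple varieties for cider", ["/home"]),
    "/contact": ("Contact the farm office", []),
    "/orphan": ("Nothing links here", []),
}

def spider(start):
    """Part 1: visit a page, read it, then follow its links."""
    seen, queue = {}, [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        text, links = PAGES[url]
        seen[url] = text          # keep a copy of what was read
        queue.extend(links)       # follow links to other pages
    return seen

def build_index(crawled):
    """Part 2: record which crawled pages contain which words."""
    index = defaultdict(set)
    for url, text in crawled.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Part 3: return the pages that contain every query word."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

crawled = spider("/home")
index = build_index(crawled)
print(search(index, "apple"))
```

Note that "/orphan" is never found because no crawled page links to it, and that a page only becomes searchable once build_index has run, which mirrors the "spidered but not yet indexed" gap described above.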
Some of the significant differences between the major search engines are described in two main areas of this guide:

Search Engine Features Page: Information on this page has been drawn from the help pages of each search engine, along with knowledge gained from articles, reviews, books, independent research, tips from others and additional information received directly from the various search engines.

Subscribers-Only Area: More in-depth information is available to site subscribers, including:
How Alta Vista Works
How AOL NetFind Works
How Excite Works
How HotBot Works
How Infoseek Works
How Lycos Works
How Search.com Works
How WebCrawler Works
How Yahoo Works
How Other Search Engines Work

How Search Engines Rank Web Pages

Search for anything using your favorite search engine. Nearly instantly, the search engine will sort through the millions of pages it knows about and present you with ones that match your topic. The matches will even be ranked, so that the most relevant ones come first.

Of course, the search engines don’t always get it right. Non-relevant pages make it through, and sometimes it may take a little more digging to find what you are looking for. But by and large, search engines do an amazing job.

As WebCrawler founder Brian Pinkerton puts it, “Imagine walking up to a librarian and saying, ‘travel.’ They’re going to look at you with a blank face.” Unlike a librarian, search engines don’t have the ability to ask a few questions to focus the search. They also can’t rely on judgment and past experience to rank web pages, in the way humans can. Intelligent agents are moving in this direction, but there’s a long way to go.

So how do search engines go about determining relevancy? They follow a set of rules, with the main rules involving the location and frequency of keywords on a web page. Call it the location/frequency method, for short.

Location, Location, Location…and Frequency

Remember the librarian mentioned above?
They need to find books to match your request of “travel,” so it makes sense that they first look at books with travel in the title. Search engines operate the same way. Pages with keywords appearing in the title are assumed to be more relevant than others to the topic.

Search engines will also check to see if the keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words right from the beginning.

Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a web page. Those with a higher frequency are often deemed more relevant than other web pages.

Spice In The Recipe

Now it’s time to qualify the location/frequency method described above. All the major search engines follow it to some degree, in the same way cooks may follow a standard chili recipe. But cooks like to add their own secret ingredients. In the same way, search engines add spice to the location/frequency method. Nobody does it exactly the same, which is one reason why the same search on different search engines produces different results.

To begin with, some search engines index more web pages than others. Some search engines also index web pages more often than others. The result is that no search engine has the exact same collection of web pages to search through.

Search engines may also give web pages a “boost” for certain reasons. For example, Excite uses link popularity as part of its ranking method. It can tell which of the pages in its index have a lot of links pointing at them. These pages are given a slight boost during ranking, since a page with many links to it is probably well-regarded on the Internet.

Some hybrid search engines, those with associated directories, may give a relevancy boost to sites they’ve reviewed.
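The location/frequency method and the two boosts described above can be sketched roughly as follows. Everything numeric here is an assumption made for illustration: the weights, the 20-word cutoff for "near the top," and the page fields (inbound_links, reviewed) are invented, and no search engine publishes its recipe in this form. The sketch handles single-word keywords only.

```python
import re

TOP_WORDS = 20  # assumed cutoff for "near the top of the page"

def tokenize(text):
    """Lowercase a string and split it into plain words."""
    return re.findall(r"[a-z]+", text.lower())

def score(page, keyword, link_weight=0.05, review_bonus=0.5):
    """Score one page for one keyword; all weights are illustrative guesses."""
    kw = keyword.lower()
    title = tokenize(page["title"])
    body = tokenize(page["body"])
    base = 0.0
    if kw in title:                 # location: keyword in the title
        base += 2.0
    if kw in body[:TOP_WORDS]:      # location: keyword near the top
        base += 1.0
    if body:                        # frequency: share of words that match
        base += body.count(kw) / len(body)
    if base == 0.0:                 # page never mentions the keyword
        return 0.0
    boost = link_weight * page.get("inbound_links", 0)  # link popularity
    if page.get("reviewed"):        # hybrid engines: reviewed-site boost
        boost += review_bonus
    return base + boost

def rank(pages, keyword):
    """Return URLs of matching pages, most relevant first."""
    scored = [(score(p, keyword), p["url"]) for p in pages]
    matches = [(s, u) for s, u in scored if s > 0]
    return [u for s, u in sorted(matches, key=lambda x: x[0], reverse=True)]

# Hypothetical sample pages.
PAGES = [
    {"url": "/luggage", "title": "Our company",
     "body": "We sell luggage for travel", "inbound_links": 0},
    {"url": "/guide", "title": "Travel guides",
     "body": "Travel tips and travel deals for every budget", "inbound_links": 2},
    {"url": "/chili", "title": "Recipes",
     "body": "Chili recipes with secret ingredients", "inbound_links": 10},
]

print(rank(PAGES, "travel"))
```

The chili page has the most inbound links but never mentions the keyword, so it is left out entirely; in this sketch the boosts only adjust the order of pages that already match. Setting reviewed to true on a page raises its score by a fixed bonus, which stands in for the reviewed-site boost just mentioned.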
The logic is that if the site was good enough to earn a review, chances are it’s more relevant than an unreviewed site. Meta tags are what many web designers mistakenly assume are the “secret” to propelling their web pages to the top of the