Make a Sitemap
Ever wonder why Google and all of the other search engines are ignoring portions of your website? It could be that the big search engines just don't like you. But the simpler (and more likely) answer is that they don't know where all your pages are.
If search engines can't find your site's pages, then there's no way for those pages to be indexed, which means you miss out on all that awesome, money-earning traffic. That's no good.
So how can you explicitly tell a search engine where your pages are? The answer is to use a sitemap.
This article is part of a wiki. Got extra tips, advice or links to pass on about sitemaps? Log in and contribute to share your knowledge.
What is a Sitemap?
A sitemap is essentially a table of contents for your website. But unlike the list of pages you might offer visitors looking for a quick way to navigate your site, the sitemaps we're talking about here are not designed for human viewing. Instead, a sitemap file serves the same information in a format that search engine spiders -- the automated machines that "crawl" the web and catalog its contents -- can easily understand. Sitemaps go a long way toward improving the searchability of any website. No self-respecting site owner should be without one.
A sitemap is a simple XML file named, fittingly, sitemap.xml. It gives the location, last-modified date and some other metadata for every page in your site.
When a search engine bot comes to your site and finds a sitemap, it will follow all the specified URLs, indexing the content and including whatever metadata and other goodies your sitemap instructs it to pay attention to.
The Sitemap Protocol
The sitemap protocol is pretty simple. The basic format looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://webmonkey.com/</loc>
    <lastmod>2008-10-13T04:20:36Z</lastmod>
    <changefreq>always</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://webmonkey.com/new-post/</loc>
    <lastmod>2008-10-13T20:20:36Z</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
As you can see, we start with a basic XML declaration. Make sure you specify the UTF-8 encoding, as Google requires that sitemaps be UTF-8 encoded; otherwise they will be ignored. The next line opens our urlset tag, which is the container tag that will hold all our URLs.
Note that we're pointing to the schema defined on sitemaps.org. As of this writing, version 0.9 is the latest official schema.
The next tag is the url tag, which is just a container for all the bits of information we can give the search engines about each page on our site. Those options are:
- loc (required) -- the URL of the page. This URL must begin with the protocol (generally http) and end with a trailing slash, if your web server requires it.
- lastmod (optional) -- the date you last modified the page. Should be in W3C Datetime format, but you can omit the time portion.
- changefreq (optional) -- how often the page is likely to change. Ostensibly this helps search engines figure out how often to crawl the page. But just because you put "hourly," don't expect the Google bot to stop by that often. The possible values are: always, hourly, daily, weekly, monthly, yearly and never. Note that you'll probably only want to use "never" for permalink archive pages.
- priority (optional) -- the priority of this URL relative to other URLs on your site. In other words, how important is this particular URL in the grand scheme of your site? Possible values range from 0.0 to 1.0. If you don't specify a priority, the URL will receive a default value of 0.5.
Only the loc node is actually required. As we'll see below, most out-of-the-box sitemap creators (true, only the hardcore build these by hand) make it easy to give out more info than just the URL.
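For a sense of how a generator produces this format, here is a minimal Python sketch that uses only the standard library. The build_sitemap function name and the dictionary keys are assumptions made for this example; only the tag names, the namespace and the allowed values come from the protocol described above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}


def build_sitemap(pages):
    """Return sitemap XML (bytes) for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional, mirroring the protocol. The function name and dict
    layout are hypothetical, chosen for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]  # the only required tag
        if "lastmod" in page:
            ET.SubElement(url, "lastmod").text = page["lastmod"]
        if "changefreq" in page:
            if page["changefreq"] not in VALID_CHANGEFREQ:
                raise ValueError("invalid changefreq: %r" % page["changefreq"])
            ET.SubElement(url, "changefreq").text = page["changefreq"]
        if "priority" in page:
            priority = float(page["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = "%.1f" % priority
    # Emit UTF-8 bytes with an explicit XML declaration.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

Calling build_sitemap([{"loc": "http://webmonkey.com/", "changefreq": "always", "priority": 1.0}]) returns bytes you can write straight to sitemap.xml. Pages that supply only a loc get a url entry with nothing else in it, which is still a valid sitemap.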
Pointing to Multiple Sitemaps
By default, search engine bots will expect your sitemap to live at http://mysite.com/sitemap.xml -- the root level of the site.
Of course, that doesn't mean you can't have a simple pointer file at the root level and then the actual sitemap files somewhere else. In fact, your sitemap.xml file cannot exceed 10 megabytes in size and should have no more than 50,000 URLs per file. If you've got a very large site, you'll need to use a pointer and several separate sitemap files.
To do that, create a root sitemap.xml file with content like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://mysite.com/sitemap1.xml</loc>
    <lastmod>2008-10-13T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://mysite.com/sitemap2.xml</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
Then at the URLs sitemap1.xml and sitemap2.xml you'd define the different parts of your sitemap using the same scheme we saw above.
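If you generate your sitemaps with a script, the splitting can be automated. The following Python sketch shows one way to do it, assuming you already have a flat list of URLs. The function names and the sitemapN.xml naming pattern are illustrative choices, not requirements of the protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # the per-file URL limit cited above


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, chunk_count, lastmod=None):
    """Return sitemap index XML (bytes) pointing at sitemap1.xml, sitemap2.xml, ...

    The file naming pattern is an assumption made for this example.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, chunk_count + 1):
        entry = ET.SubElement(index, "sitemap")
        loc = "%s/sitemap%d.xml" % (base_url.rstrip("/"), n)
        ET.SubElement(entry, "loc").text = loc
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)
```

With chunks = chunk_urls(all_urls), each chunk becomes its own urlset file, and build_index("http://mysite.com", len(chunks)) produces the pointer file to serve at the root level.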
Creating a Sitemap
OK, now that you know what a sitemap file is, how do you go about creating one?
The thing about sitemaps is that they need to be dynamic, that is, whenever you add a new post or URL to your site, you need to update the sitemap. For small sites, hand coding might be an option, but even the simplest of sites gets pretty complex pretty quickly.
Fortunately, there are some tools that can make the task easier. For instance, you can use the Google Sitemap Generator, which is a Python script that creates a sitemap for you. As a bonus, compatibility with Google's requirements is pretty much guaranteed. The sitemap generator even comes with instructions on how to set up a cron job so that your sitemap stays up to date.
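To illustrate the general idea behind a script-plus-cron approach, here is a simplified Python sketch for a site made of static HTML files. It is not the Google Sitemap Generator and does not reflect how that tool works internally; the function name, the .html filter and the use of file modification times for lastmod are all assumptions made for this example.

```python
import os
import time
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_from_directory(doc_root, base_url):
    """Build sitemap XML (bytes) from the .html files under doc_root."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for folder, _subdirs, files in os.walk(doc_root):
        for name in sorted(files):
            if not name.endswith(".html"):
                continue  # skip anything that is not a page
            path = os.path.join(folder, name)
            # Turn the file path into a URL path relative to the document root.
            rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + rel
            # Use the file's modification time (UTC) as lastmod.
            mtime = time.gmtime(os.path.getmtime(path))
            stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", mtime)
            ET.SubElement(url, "lastmod").text = stamp
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

A scheduled job could call sitemap_from_directory with your document root and site address, then write the result to sitemap.xml at the root of the site.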
But even using cron isn't ideal in most cases -- especially if you have a site that adds dozens of new pages every day. Luckily, most of the major publishing systems and web frameworks offer ways to create dynamically updated sitemaps. Here are a few links to get you started:
- Movable Type: Movable Type allows you to create as many templates as you'd like, so just create a new sitemap template and make sure it gets served at the URL http://mysite.com/sitemap.xml. To help you get started, check out Niall Kennedy's somewhat dated, but still helpful, tutorial on Sitemaps in Movable Type. Also check out the Movable Type wiki, which has some more sitemap examples.
- WordPress: To generate sitemaps in WordPress, just install the Google XML Sitemaps plugin. It will handle all the dirty work, automatically updating your sitemap every time you edit or create a post.
- Django: The Django web development framework ships with a built-in sitemap generator. For more details read through the official documentation.
- Drupal: Drupal's sitemap tool is the XML sitemap module, a contributed add-on rather than something built in as in Django. Head over to the project page (http://drupal.org/project/xmlsitemap) for more details.
- Expression Web Extras: This macro integrates with Microsoft Expression Web to create local or server-based sitemaps. For details, visit the Expression Extras Toolbar page (http://www.expressionextras.com/toolbar/sitemap_builder.htm).
Know any other sitemap creation tools we should be pointing to? Log in and add them to the list!
Conclusion
Sitemaps aren't particularly difficult to use, and they can work wonders for search engine visibility and ranking. They're no substitute for quality content and inbound links, but if Google and the rest of the search players currently see your site as a black hole on the web, offering up a sitemap is the best way to make friends with search engine spiders.
- This page was last modified 15:48, 16 October 2008.
