5 Web Files That Will Improve Your Website

The amount of code that developers encounter regularly is staggering. At any one time, a single site can make use of five or more languages (e.g. MySQL, PHP, JavaScript, CSS and HTML).

There are a number of lesser-known and underused ways to enhance your site with a few simple but powerful files. This article aims to highlight five of these unsung heroes that can assist your site. They’re pretty easy to use and understand, and thus, can be great additions to the websites you deploy or currently run.

An Overview

Which files are we going to be examining (and producing)? Deciding which files to cover was certainly not an easy task for me, and there are many other files (such as .htaccess which we won’t cover) you can implement that can provide your website a boost.

The files I’ll talk about here were chosen for their usefulness as well as their ease of implementation. Maximum bang for our buck.

We’re going to cover robots.txt, favicon.ico, sitemap.xml, dublin.rdf and opensearch.xml. Their purposes range from helping search engines index your site accurately, to acting as usability and interoperability aids.

Let’s start with the most familiar one: robots.txt.


Robots.txt

The primary function of a robots.txt file is to declare which parts of your site should be off-limits for crawling.

By definition, the use of this file acts as an opt-out process. If there is no robots.txt rule covering a directory on your website, then by default it’s fair game for web robots such as search engine crawlers to access and index.

While you can state exclusion commands within an HTML document through the use of a meta tag (<meta name="robots" content="noindex" />), the benefits of controlling omitted pages through a single text file is the added ease of maintenance.

Note: It’s worth mentioning that obeying the robots.txt file isn’t mandatory, so it’s not a good privacy mechanism.

This is how the robots.txt file sits between a search engine and your website.

Creating a Robots.txt File

To create a robots.txt file the first and most obvious thing you will need is a text editor. It’s also worth pointing out that the file should be called robots.txt (or it won’t work) and it needs to exist within the root directory of your website because by default, that’s where web robots look for the file.

The next thing we need to do is figure out a list of instructions for the search engine spiders to follow. In many ways, the robots.txt file’s structure is similar to CSS in that it comprises attribute and value pairs that dictate rules.

Another thing to note is that you can include comments inside your robots.txt file by preceding them with the # (hash) character. This is handy for documenting your work.

Here’s a basic example telling web robots not to crawl the /members/ and /private/ directories:

 User-agent: *
 Disallow: /members/
 Disallow: /private/

The robots.txt exclusion standard only has two directives (there are also a few non-standard directives like Crawl-delay, which we’ll cover shortly).

The first standard directive is User-agent. Each robots.txt file should begin by declaring a User-agent value that explains which web robots (i.e. search crawlers) the file applies to.

Using * as the value of User-agent indicates that all web robots should follow the directives within the file; * acts as a wildcard match.

The Disallow directive points to the folders on your server that shouldn’t be accessed. The directive can point to a directory (i.e. /myprivatefolder/) or a particular file (i.e. /myfolder/folder1/myprivatefile.html).

There is a specification for robots.txt, but the rules and syntax are exceptionally simple.

Robots.txt Non-Standard Directives

Of course, whilst having a list of search engines and files you want hidden is useful, there are a few non-standard extensions to the robots.txt specification that will further boost its value to you and your website. Although these are non-standard directives, all major search crawlers acknowledge and support them.

Some of the more popular non-standard directives are:

- Allow: explicitly permits a file or directory to be crawled, overriding a broader Disallow rule.
- Crawl-delay: asks a robot to wait a set number of seconds between successive requests.
- Request-rate: asks a robot to fetch no more than a given number of pages per time period (e.g. 1/10m means one page every 10 minutes).
- Sitemap: points web robots to the location of your Sitemap file.

There are other less supported directives such as Visit-time, which restricts web robots to indexing your site only between certain hours of the day.

Here’s an example of a more complex robots.txt file using non-standard directives:

 User-agent: *
 Allow: /private/public.html
 Comment: I love you Google, come on in!
 Crawl-delay: 10
 Request-rate: 1/10m # one page every 10 minutes
 Robot-version: 2.0
 Sitemap: /sitemap.xml
 Visit-time: 0500-1300 # military time format

Whilst not a standard, there is an extension for robots.txt which has mainstream support.
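As a quick illustration of how a well-behaved robot interprets these rules, here’s a minimal Python sketch using the standard library’s urllib.robotparser. The rules are the examples from this article; Python 3.6+ is assumed for Crawl-delay support:

```python
from urllib.robotparser import RobotFileParser

# Parse the example rules directly. A real crawler would fetch
# http://example.com/robots.txt and feed its lines in the same way.
rules = [
    "User-agent: *",
    "Disallow: /members/",
    "Disallow: /private/",
    "Crawl-delay: 10",
]

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved robot checks each URL before fetching it.
print(parser.can_fetch("MyBot", "http://example.com/private/page.html"))  # False
print(parser.can_fetch("MyBot", "http://example.com/index.html"))         # True
print(parser.crawl_delay("MyBot"))                                        # 10
```

Remember that this check is voluntary: nothing stops a badly behaved robot from ignoring the file entirely.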


Favicon.ico

A favicon (short for "favourites icon") is a small image (like a desktop application’s shortcut icon) that represents a site.

Shown in the browser’s address bar, the favicon gives you a unique opportunity to stylise your site in a way that will add identity to browser favourites/bookmarks (both locally and through social networks).

The great thing about this file is that every major browser has built-in support for it, so it’s a solid extra file to provide.

This is how the favicon.ico file affects your site visually in a browser such as IE.

Creating a Favicon.ico file

To create a favicon, you’ll need an image or icon editor. I am a fan of Axialis IconWorkshop, but there are free editors like IcoFX that do the job well.

You can also find quite a few free online favicon tools by viewing this list of web-based favicon generators.

You need to have a 16x16px icon (or 32x32px, scaled down) that matches what you want to see in the browser.

Once you are done creating your icon’s design, save the file as "favicon.ico" in the root directory of your web server (that’s where browsers look for it by default).

Note: It’s a good idea to use the .ico file type, as some browsers don’t support PNG, GIF or JPEG file types.

To make this file work properly, refer to the location of your favicon in the <head> tags of all your HTML documents, as such:

 <link rel="shortcut icon" type="image/vnd.microsoft.icon" href="favicon.ico" /> 

The rel attribute values of "shortcut icon" or "icon" are both considered acceptable, and the MIME type "image/vnd.microsoft.icon" (registered in 2003) replaced the older type ("image/x-icon") as the official standardized favicon MIME type for .ico files on the web.

Note: While Internet Explorer (and some other browsers) will actively seek out your favicon in the root directory of your site by default (which is why you should have it there), it’s worth adding the above code into the <head> of your HTML just to make it explicitly known by other types of browser agents.

There are multiple online tools which can create a favicon from existing images.

Favicons in Apple Devices

Another standard (of sorts) has appeared in light of Apple’s iPod, iPad, and iPhone. In this situation, you can offer a 57×57 PNG, ICO or GIF file (alpha transparency supported) that can be displayed on the devices’ Home screen using the web clip feature.

Apple also recommends that you use 90-degree corners (not rounded corners, which the devices will do for you automatically) to maintain the "feel" of such icons.

To make this file work properly, place the following code into every page within your <head> tags:

 <link rel="apple-touch-icon" href="images/icon.png" />

For users of Apple devices, a specially made "favicon" can be provided.


Sitemap.xml

One thing website owners worry about is getting their website indexed correctly by the major search engines like Google.

While the robots.txt file explains what files you want excluded from results, the Sitemap.xml file lists the structure of your site and its pages. It gives search engine crawlers an idea of where things are on your site.

This is how the Sitemap.xml file sits between a search engine and your website.

Creating a Sitemap.xml File

As always, the first recommended course of action is to create the XML file that will contain the Sitemap’s code. It’s recommended that you name the file "sitemap.xml" and place it within the root directory of your website (as some search engines automatically look for it there).

It’s also worth noting that while you can submit your Sitemap file location directly to search engines, adding the non-standard Sitemap directive to your robots.txt file can be useful as it’s widely supported and gives spiders a push in the right direction.

Below is a basic example of what a Sitemap looks like.

 <?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>http://www.example.com/</loc>
   </url>
 </urlset>

Each Sitemap file begins with an XML declaration stating that the file is written in XML and UTF-8 encoded, followed by a <urlset> element that references the official Sitemap schema.

Following those formalities, you simply produce a list of your URLs that exist within your website’s structure.

Each URL must be contained within two elements: <url> and <loc>. This is a very simple specification to follow, so even less experienced developers should be able to replicate this basic mechanism with little effort.
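If your site has more than a handful of pages, you may prefer to generate the file rather than write it by hand. Here’s a minimal sketch in Python using the standard library’s xml.etree.ElementTree; the page URLs are placeholders:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(page_urls):
    """Wrap each page URL in a <url><loc>...</loc></url> pair under <urlset>."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return urlset

urlset = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
])

# write() adds the <?xml ... ?> declaration for us.
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

In practice you would feed build_sitemap() the URL list from your CMS or database instead of a hand-written list.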

To reference your Sitemap inside your HTML documents, place this code between the <head> tags:

 <link rel="sitemap" type="application/xml" title="Sitemap" href="sitemap.xml" />

Just like most XML-based schemas, there is a protocol and specification to follow.

Other Sitemap Tags

While you could limit yourself to simply listing every file, there is a range of other meta-information that can be included within the <url> element to further define how spiders deal with or treat each page in the site. This is where the Sitemap’s true power lies.

You can use <lastmod>, for example, to state when the resource was last modified (formatted using YYYY-MM-DD). You can add the <changefreq> element, which uses values of always, hourly, daily, weekly, monthly, yearly, and never to suggest how often a web page changes (for example, the front page of Six Revisions has a value of daily).

There is also the <priority> tag, which uses a scale of 0.0 to 1.0 that you can utilize to indicate how important a web page is to a website.

Here’s an example of using the above tags:

 <url>
   <loc>http://www.sixrevisions.com/</loc>
   <lastmod>2010-07-15</lastmod>
   <changefreq>daily</changefreq>
   <priority>1.0</priority>
 </url>
Google Webmaster Tools allows you to submit your Sitemap to initiate its analysis of your site structure.


Dublin.rdf

Ensuring you provide metadata has become big business among SEO professionals and semantics advocates. The appropriate use of HTML, metadata, microformats and well-written content improves your chances of appearing in the right search results. It also allows a growing number of browsers and social networks to aggregate and filter your data so that they can accurately understand what your content represents.

The Dublin.rdf file acts as a container for officially recognised meta elements (provided by the DCMI specification) which can augment the semantic value of the media you provide.

If you’ve ever visited a library and tried to locate a book, you know that you will often have to flick through the library catalogs to find books based on their subject, their author, or perhaps even their title. The aim of the DCMI is to produce such a reference card for your website that will help search engines, social networks, web browsers, and other web technologies understand what your site is about.

This is how the Dublin.rdf file interacts with supporting social networking mediums.

Creating a Dublin.rdf File

To begin, you need to produce the file itself (which we shall name "Dublin.rdf"). To maintain consistent meta details about the site as a whole (as opposed to individual DCMI meta tags for specific pages and resources), we shall create an RDF file (formatted as XML) and add a reference within the HTML document to indicate that the information is available. While you can embed DCMI meta tags within HTML, a separate RDF file allows the data to be cached.


When a supporting spider or other resource that acknowledges the DCMI core sees the file, it can cache the data and relate directly to the information.

This doesn’t mean you shouldn’t use traditional meta tags, but the file can serve as a useful supplement.

 <?xml version="1.0"?>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <rdf:Description rdf:about="http://www.example.com/">
     <dc:contributor>Your Name</dc:contributor>
     <dc:description>This is my website.</dc:description>
   </rdf:Description>
 </rdf:RDF>

Like most XML files, this RDF document begins with an XML declaration; inside the root element, you have the Description element (which links to the resource being referenced).

Within the description, as you can see from the above, there are several elements (beginning with the prefix of dc:) — these hold the metadata of the page.

There’s a whole range of terms you can add (see this list of DCMI metadata terms); it’s simply a case of adding the term’s name, then giving a value as denoted by the DCMI specification. You’ll end up with a library of useful data that can improve your site’s semantics and interoperability with other sites and applications!
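Since Dublin.rdf is plain XML, it can also be generated programmatically. Here’s a minimal sketch in Python using the standard library’s xml.etree.ElementTree; the rdf:about URL and the term values are placeholders you would replace with your own details:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register the conventional prefixes so the output uses rdf: and dc:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

root = ET.Element(f"{{{RDF_NS}}}RDF")
desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                     {f"{{{RDF_NS}}}about": "http://www.example.com/"})

# Each dc: element holds one DCMI term describing the resource.
for term, value in [("contributor", "Your Name"),
                    ("description", "This is my website.")]:
    ET.SubElement(desc, f"{{{DC_NS}}}{term}").text = value

ET.ElementTree(root).write("dublin.rdf", encoding="utf-8", xml_declaration=True)
```

Adding further terms is just a matter of extending the list with more (term, value) pairs from the DCMI vocabulary.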

To make this file work properly, place the following code into every HTML document within the <head> tags:

 <link rel="meta" type="application/rdf+xml" title="Dublin" href="dublin.rdf" />

The Dublin.rdf file makes use of the DCMI specification to provide meta information.


OpenSearch.xml

The ability to search a website is one of the most important ways people locate content.

The OpenSearch file allows you to add a custom search engine listing (for your own site) to the search feature that appears in all modern browsers. All of the major browsers can take advantage of OpenSearch, so it’s a solid feature to support.

While you will still want to provide a search mechanism on your website, this core enhancement complements the user’s in-browser search capabilities.

This is how the OpenSearch file interacts with your site through the browser.

As with everything we’ve discussed thus far, we first need to produce the file that the code will be placed in.

As this particular type of file doesn’t have a reserved name like robots.txt or sitemap.xml, we could call the file whatever we like. However, the convention for OpenSearch files is to name the file "opensearch.xml".

You’ll want to include the code below as your starting template, then proceed to customising the required tags such as <ShortName>, <Url> and <Description> (they are case-sensitive) to describe your site.

The example used below is for Six Revisions using Google Search.

 <?xml version="1.0" encoding="UTF-8" ?> 
 <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
   <ShortName>Six Revisions</ShortName>
   <Description>Search this website.</Description>
   <Url type="text/html" template="http://www.google.com/search?as_sitesearch=sixrevisions.com&amp;as_q={searchTerms}"/>
 </OpenSearchDescription>

The tags included above are:

- ShortName: a short name for the search engine, shown in the browser’s search box.
- Description: a brief, human-readable description of what the search covers.
- Url: the search URL template; the browser replaces the {searchTerms} placeholder with the user’s query.

To make this file work properly, place the following code into every page within the <head> tag:

 <link rel="search" type="application/opensearchdescription+xml" title="Website" href="opensearch.xml" />

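Behind the scenes, the browser takes the Url element’s template and substitutes the user’s URL-encoded query for the {searchTerms} placeholder. Here’s a rough sketch of that substitution in Python; the description document and its search URL are hypothetical:

```python
from urllib.parse import quote_plus
import xml.etree.ElementTree as ET

OSD_NS = "http://a9.com/-/spec/opensearch/1.1/"

# A stripped-down description document with a placeholder search URL.
doc = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Url type="text/html" template="http://www.example.com/search?q={searchTerms}"/>
</OpenSearchDescription>"""

root = ET.fromstring(doc)
template = root.find(f"{{{OSD_NS}}}Url").attrib["template"]

# This is roughly what the browser does when the user submits a query.
search_url = template.replace("{searchTerms}", quote_plus("web files"))
print(search_url)  # http://www.example.com/search?q=web+files
```

This is why the {searchTerms} placeholder (and its exact spelling) matters: without it, the browser has nowhere to put the query.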

Other OpenSearch Tags

There’s a range of additional tags we can provide. Among these are:

- Attribution: a copyright or attribution notice for the search content.
- Developer: the name of the person or company that produced the description document.
- Query: an example query (with the role of "example") that should return relevant results.
- Tags: a set of space-delimited keywords describing the search engine.

Example usage of these other tags:

 <Attribution>Copyright, Your Name 2010, Some Rights Reserved.</Attribution>
 <Developer>Your Name</Developer>
 <Query role="example" searchTerms="terms" />
 <Tags>Example Tags Element Website</Tags>


Simple, Small and Effective

While this guide is just a crash course in producing these useful files, it’s worth pointing out that taking the time to understand the syntax of each format is important in order to determine the impact these files will have on your website.

These files prove that there’s more to a website than HTML, CSS, and JavaScript. While producing them certainly won’t replace your existing code workflow, their inherent benefits make them worth considering as supplements to your projects. Give them a try for yourself!

About the Author

Alexander Dawson is a freelance web designer, author and recreational software developer specializing in web standards, accessibility and UX design. As well as running a business called HiTechy and writing, he spends time on Twitter, SitePoint’s forums and other places, helping those in need.

This was published on Jul 15, 2010


Pixelbox Design Jul 15 2010

Excellent Article, Added to favs, keep up the good work.

Richard Käll Jul 15 2010

Are there any advantages to using XML instead of a text-file (txt) as a sitemap?

abhishek Jul 15 2010

Is there any direct way to create rdf file like sitemap.xml file.

Baloot Jul 15 2010

Great info. Will apply this web file into my WordPress powered blog soon. Must bookmark. :)

Sun Pietro Jul 15 2010

Interesting article, but I have one question. How to implement RDF into HTML directly and dynamically using PHP?

Eduardo Storini Jul 15 2010

I really liked the article, mainly opensearch I didnt know

Ismail Sobah Jul 15 2010

thank you for this very informative article. keep up the good work :)

CyberFox Jul 15 2010

Great info! Thanks!

Ariel Mariani Jul 15 2010

Excelent article! this is very useful, I did’t know the Dublin.rdf , thank you very much!

Alex Crooks Jul 15 2010

I always avoid listing any private directories in robots.txt as it’s just helping casual “hackers” find private directories that they may not stumble on otherwise. It kind of reminds me of this:

However a great article, I certainly wasn’t aware of OpenSearch and Dublin so thank you.

Jennifer R Jul 15 2010

wow, this is the first time I heared about opensearch and Dublin.rdf file, thanks for sharing that

Nicolas C. Jul 15 2010

Great article, thank you !

Rednights Jul 15 2010

I got the first 3 down, the last 2 quite escapes my comfort level. Interesting stuff though.

Alexander Dawson Jul 15 2010

@Richard Käll: I think the main advantage would be the greater number of elements and detail you can provide, though they’ll both do the job!

@abhishek: There may be (though I have not come across one for the Dublin spec), however it’s fairly simple and unlike a sitemap requires less active maintenance so manually building would be my recommendation.

@Alex Crooks: Agreed! As I said, it’s not a good privacy mechanism.

pouya saadeghi Jul 15 2010

Useful article :D

Greg S. Jul 15 2010

Great tips, thanks. Some of the info were things I already use, but like another commenter said, the dublin.rdf tip was new to me.

Clervius Jul 15 2010

I’ve used the robots.txt file since I started messing with google’s webmaster tools. Besides that, reading the rest of the article was a wonderful learning experience.
Thank you so much for this article..
Also, I think it’s worth noting out that if you are using certain CMS’ you have to play around with the .htaccess file.

Good article, thanks.

I’ve picked up a few tips!

TimHolmes Jul 15 2010

Very interesting, in particular open search and dublin.rdf…

Will look into this.

Jordan Walker Jul 15 2010

Great article, most informative one I have read today!

Wow, what a great article. I actually need to get a favicon on my website asap! Thanks for the reminder.

Jakes Jul 15 2010

Hi Alex,
Great post. “Dublin.rdf” was a new information. My blog have the first three files. Will add the other two files soon.

Gerardo S Jul 15 2010

Dublin.rdf… that’s new for me, seems useful and I didn’t even know that it exists… excelent article !

Nagarjun Palavalli Jul 15 2010

Excellent article. You wrote that icons for Apple devices need to be 90-degree corners. Does this mean they just have to be squares and Apple will automatically round the corners? This is a good tip.

shaymein Jul 15 2010

don’t forget the geo tags.

ICBM, geo.position, geo.region

MBDealer Jul 15 2010

Very informative article, thank you!

Tobbi Jul 15 2010

Great article, thanks for information.

Jacob Gube Jul 15 2010

@Nagarjun Palavalli: Yes. And you can see our tutorial on creating an HTML5 iPhone app which notes that, indeed, your icons will automatically be rounded.

“Note: While Internet Explorer (and some other browsers) will actively seek out your favicon in the root directory of your site by default (which is why you should have it there), it’s worth adding the above code into the <head> of your HTML just to make it explicitly known by other types of browser agents.”

There is absolutely no reason to do that. If no favicon link is detected upon HTML page load completion and no previous site visits are recorded in the browser’s history, a favicon.ico is requested automatically.

Jason Shultz Jul 15 2010

Really good article. I had never heard of dublin.rdf or opensearch. that is awesome. Thank you for the great information!

Richard Käll Jul 15 2010

@Alexander Dawson: Thanks for the article!

Editor, QOT Jul 15 2010

First time I heard about the Dublin.rdf file. Very insightful article. Thanks for that.

liputra Jul 15 2010

If your site have dinamic page, such as music lyrics, how do you create the sitemap?

Coolzrock Jul 15 2010

Thanks for the great article. I was looking for some of these things for a long time, and now they are in front of me!

Great jobs guys.

Sandy Jul 15 2010

Hey..You must be Damn Genius…So organised website….

Great Work…

Excellent article. I’d never heard of the dublin.rdf. Thanks!

John G Jul 15 2010

Thanks, Alexander. You really opened my eyes to some things with this article. A few of those I had no idea even existed. I can see I’ve got much work to do this weekend.

this is just super helpful and took me a long time to figure out. its nice to have the five tips in one place and documented out so well with examples and tips. Like matt above i had not heard of dublin.rdf. The others are things I have been trying to learn now for 4 years ;-)

Thomas J Bradley Jul 15 2010

I have been looking into RDFa recently, the solution you propose is really interesting. Your solution is to provide one set of RDFa meta information for the whole web site. Should we be providing different information on a per page level?

For Apple touch icons the recommended size is 129 x 129 pixels. The OS scales the icon down to the correct dimensions and 129 is accepted as the best looking. Also iPad icons are larger, somewhere in the 75 px area so providing a larger icon will render properly.

Also, as far as I know, you can only use PNG images for the icons.

ghabuntu Jul 15 2010

Thanks a lot man, really helpful bits of info there for up and coming web designers and average Joes like myself :-).

@Alex Crooks also makes a very important point that is worth taking into consideration IMHO.

Michael Tuck Jul 15 2010

As with many of the other commenters, I didn’t know about OpenSource. I thought DublinCore was obsolete. Shows what I know…! I have some updating of my sites to do! Thanks, Alex.

subzero525 Jul 15 2010

this ma first time to hear about Dublin.rdfand will try to apply it im my blog thanks mate

Sanjeev Kulkarni Jul 16 2010

Nice and useful information. Thanks for sharing this…

toufik ismail Jul 16 2010

wow, this is great. bookmarked!

Sandy Allen Jul 16 2010

Two other files you may find useful. One is a .kml file, which can be served up for some Google Earth applications and pointed to using a geo site map. The second (and this may be of dubious value) is a crossdomain.xml file for global access for Flash.

jlapitan Jul 16 2010

thanks for the info, my first time to hear about opensearch..

Felipe Jul 16 2010

Great article, keep going good!
Added to my improvements bookmarks

ddeja Jul 16 2010

Dublin.rdf is a great addon i hope. But what is the difference than betwean this file and traditional meta tags? Are traditional meta tags not officialy recognized anymore? As I remember even google made a turn around in this matter…

As for the rest addons? Robots.txt as Alex Crooks said hackers are to smart to give them any more excuses.

Sitemap is of course great addon.

Favicon it’s just a bling-bling nothing more.

And open search? Never heard of it. So I’ll read it and I’ll test it.


Paul Janaway Jul 16 2010

Great tips, I love the website, I always use that one!

Satya Prakash Jul 17 2010

This article in very new in essence. Content is also very good.

Wayne Hodkinson Jul 19 2010

A well written article which on the base of it is what all websites should adhere to in the development stages. Some stuff there I didn’t know but do now. Thanks for sharing!

Alexander Dawson Jul 19 2010

@Ben: Actually, not all browsers automatically look for a favicon. In fact, doing so violates the HTTP specification in regards to filename reservations.

@Thomas J Bradley: Honestly, it depends. Like with meta tags, you have the option to repeat the information site-wide but it (of course) limits how much contextual value is assigned. My philosophy is that as RDF is still in fairly early stages with the likes of Google (in terms of recognition), having a single file which can be quickly cached and recalled will be the better option. Though that’s not to say that individually assigning DCMI terms per resource won’t have advantages (who knows, in the future each page could have it’s own RDF file!).

In regards to the Apple touch icons, I got my information directly from Apple’s developer center so it should be accurate – though admittedly iPad’s scale with larger icons so the increased size may have benefits. But you don’t need to simply use PNG, other formats are apparently compatible.

@Sandy Allen: There’s lots of good small files you can use to enhance your site, and KML is a perfect example of one which didn’t make the cut but does have real potential!

@ddeja: The main difference is that the tags are standardized therefore compatibility is going to be much higher than several generic meta tag containers. Though of course Meta tags are still valid (it’s just a different approach to the task).

As for calling the Favicon “bling”, I would disagree, there’s evidence to suggest it plays a part in a websites usability and aids user link recognition.

Russell Poulter Jul 20 2010

Nice article. Bookmarked!

Christopher Jul 20 2010

great article, really helped, thanks

Vlad Carp Jul 22 2010

good article…really helped

wow… lightened up my brain! thanks for sharing! :)

Nice article, great link too
Some stuff that I didn’t know
thanks for sharing

Claire Shaw Aug 11 2010

This is a great find, thanks!

Craig Sep 16 2010

Great list of files to help improve website friendliness, and interesting how all the files are located in the root directory :-)

Mark Petherbridge Sep 27 2010

Nice Article, Bookmarked :)


hussain Oct 04 2010

helpful .. :)

Maggew Oct 29 2010

This article is mean!

A quick-scan and I love what I see – thanks for bookmark.

Sheikh Oct 29 2010

This is amazing help for webmasters. Please keep it up.

So the recent article give me way down here. i finally know the essence of Robots.txt. Very nice article indeed.

Sivilce Tedavisi Nov 08 2010

“Dublin.rdf” I heard it for the first time in my life.
That is really interesting that i didnt heard of it before.
I have to keep that in mind after today :)

LSaridina Nov 09 2010

Why don’t you include rss.xml? rss is needed for any website pinger, and most of search engine use that file together with sitemap.xml.

stalker Nov 30 2010

Nice article, thanks for information!

nitin Feb 16 2011

u have simply helped us..I want to consult u is there a way??

wiison Mar 26 2011

Very nice article.
Useually i had ignored it.

Quang Le May 21 2011

Great article. Thanks for sharing.

Kevin John Ventura May 30 2011

Thanks Great Article :)

Terry Jun 14 2011

Nice. Will intend to use them on my blog.

ddeja Jun 14 2011

First of all thanks for the article.

Robots.txt, Favicon.ico, Sitemap – basic stuff.

Dublin.rdf – that’s something I’ve heard a while ago and than forgot so thanks for the reminder.

OpenSearch – Most of the biggest blogs and websites have it. But I think for smaller sites it’s not a good idea.


abhishek Jun 18 2011

I am setting up a new blog. I hope this post is gonna help me a lot.

John Wayne Sep 13 2011

Nice blog but didn’t succeed with OpenSearch… Is there something missing in explanations?

Chris Sep 21 2011

Excellent article – I was using a short version of the Dublin core with geotags but this method is an improvement.
Thanks again!

Stratz Oct 20 2011

Nice list, maybe you could add / talk about rss / atom syndication.

This comment section is closed. Please contact us if you have important new information about this post.