Building Accessible Websites (Part 2a of 5)
Early in any website design or redesign, three things come high on the list of priorities: the accessibility of the site, the organisation of its content, and the structure of its information. These three factors lay the foundations for successful search engine optimisation.
Accessible websites, content organisation and site structure all overlap, so I will introduce them across three related posts (2a, 2b and 2c of this five-part series). They overlap because they all relate to what the search engines crawl on your website, and how they crawl it.
Accessible Websites
In search engine optimisation, an accessible website tends to mean one that a search engine robot can get into, and the ease with which that robot can crawl and index the site's information.
Why is this important?
Certain aspects of a website's build can unnecessarily restrict the flow of bots through a site. This matters because if a robot cannot crawl and index your website's information, the search engine will not be able to rank that content in its results.
On the flip side, there is also content that you might not want to be crawled. So here I will briefly introduce the robots.txt and sitemap.xml files and why they matter.
Controlling the Access of Robots to your Content
It might sound silly, but there are also occasions when you might not want information on your website to be part of the search engines' index. This can be for both business and SEO reasons. Examples include subscription-only content, private data, landing page content, and printer-friendly pages (excluding these reduces the duplicate content issues that the search engines discourage).
Robots.txt File
One way of managing search engine robots is through a file called robots.txt. All major search engines visit this file before crawling the rest of your site, to get their instructions on what they can and cannot crawl. Amongst other things, this file stores basic information that tells search engines which pages or folders you would not like to appear in the index.
Keep the robots.txt file in the 'root' of your website (i.e. not inside any folders, but at the top level of your site structure).
If you are looking to stop a robot from crawling a certain folder called 'subscribers-only' then you would write the following into the file:
User-agent: *
Disallow: /subscribers-only
...that's it! The * means that this rule applies to all user-agents, i.e. all search engine bots should follow this instruction.
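If you would like to check a rule like this before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The example.com addresses and the is_allowed helper are my own illustrative assumptions, and real search engine bots may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown above, held as a string for testing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /subscribers-only
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page addresses, used only for illustration.
print(is_allowed(ROBOTS_TXT, "https://www.example.com/subscribers-only/article.html"))
print(is_allowed(ROBOTS_TXT, "https://www.example.com/blog/post.html"))
```

The first address sits inside the disallowed folder and the second does not, so the two calls should give different answers.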
Please note: this is a very powerful file, so be careful about what content you tell search engines not to crawl, otherwise you could remove all your content from their index in one fell swoop. For instance, the following would block every search engine bot from your entire site:
User-agent: *
Disallow: /
...so please be careful with the robots.txt file!
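One way to guard against this mistake is a small check that you run before publishing the file. The sketch below again assumes Python's standard urllib.robotparser module; the blocks_whole_site function name is hypothetical, and the check only asks whether the home page ("/") is blocked for a given user-agent.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the rules stop user_agent from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # If even "/" cannot be fetched, nothing below it can be crawled either.
    return not parser.can_fetch(user_agent, "/")


DANGEROUS = "User-agent: *\nDisallow: /\n"
SAFE = "User-agent: *\nDisallow: /subscribers-only\n"

for name, rules in (("dangerous", DANGEROUS), ("safe", SAFE)):
    if blocks_whole_site(rules):
        print(f"{name}: WARNING, this robots.txt blocks the entire site")
    else:
        print(f"{name}: the home page is still crawlable")
```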
There are several other ways of controlling robots' access to your website, but as this is a beginner's guide I will leave you with a couple of links for further reading for the time being. These are the robots.txt org site, and a broader discussion on the Jane and Robot website. Feel free to get in touch if you have any doubts about managing search engines' access to your website.
Sitemap.xml Promotes Robot Access Too
Once you have told search engines what content you would not like to have indexed, you can follow this up with a list of page addresses (URLs) so that they are aware of everything you do want indexed. This file should preferably sit in the root of the website too, and be verified with the search engines through Google's, Yahoo's and MSN Live's respective webmaster tools. Google, the first search engine to recognise the value of sitemaps, has compiled further advice on them for further reading.
Tip: You can create your own sitemap.xml file with this free sitemap tool.
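If you would rather build the file yourself, a very basic sitemap.xml is simply a list of URLs wrapped in a little XML. Below is a minimal sketch using Python's standard xml.etree.ElementTree module. The build_sitemap function and the example.com addresses are hypothetical, and the sketch leaves out optional fields such as the last-modified date.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML text listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, used only for illustration.
pages = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/contact/",
]

# Save the result as sitemap.xml in the root of the site.
print(build_sitemap(pages))
```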
Other Coding Factors Affecting Robot Access
The other main consideration for search engine access is whether the bots can make sense of the content they crawl and index, so that it can then be optimised and ranked. Search engines can crawl and index just about any sort of code, but that does not mean they can understand it well enough to rank it in relevant search engine results pages (SERPs). This is why we should help search engines crawl content in a form that means something to their algorithms (the calculations that decide the rankings of websites).
Some of the code used to build websites is not read so well by search engines. In search engine marketing we recommend that sites use HTML to organise their on-page content, so that search engines can read both the code and the content within it.
Website code that search engines may struggle to read includes Flash websites, iframes used to embed video and other content, and JavaScript. Search engines are making progress with their understanding of this information, but it is good practice to use HTML if at all possible.
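To illustrate the difference, here is a rough sketch of what a very simple crawler might "see" on a page. It assumes a crawler that reads the HTML but does not run any JavaScript, which is a simplification of how real search engines behave. The VisibleTextParser class, the visible_text function and the sample pages are all hypothetical, built on Python's standard html.parser module.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text found in the HTML, ignoring script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The same heading, once written in plain HTML and once injected by a script.
PLAIN_HTML = "<h1>Spring Sale</h1><p>All garden furniture is reduced.</p>"
SCRIPTED_HTML = (
    '<div id="offer"></div>'
    '<script>document.getElementById("offer").innerHTML = '
    '"<h1>Spring Sale</h1>";</script>'
)

print(repr(visible_text(PLAIN_HTML)))
print(repr(visible_text(SCRIPTED_HTML)))
```

In this sketch the plain HTML page yields its heading and paragraph text, while the scripted page yields nothing, because the heading only exists once a browser has run the script.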
Building Accessible Websites
Bourn Design takes great care in building websites that promote accessibility in their design and development.
Posted By: Ben
22 January 2009