How many URLs does a sitemap contain

Sitemap guide - the most important questions about the sitemap explained

Do you have a blog and would like to create a sitemap for it? Here you can find out everything about the sitemap - what it is, what it is used for and how you can create your own sitemap.

What is a sitemap?

A sitemap is a file in which you can list the individual web pages on your website. This is how you tell Google and other search engines how the content of your website is structured. Search engine web crawlers like Googlebot read this file to help crawl your website more intelligently. (Definition: Google)

In other words: the sitemap tells the search engine which pages on the website are important and should be indexed.

Why do you need a sitemap?

The sitemap helps search engines crawl the website more effectively. Because all the important pages of the website are listed in one file, it is easier for the crawler to find and process them. This does not mean that the crawler could not find the pages without the sitemap or, conversely, that using a sitemap guarantees that the pages are indexed. As a rule, however, it is an advantage to have a sitemap - in any case, it can't hurt.

A sitemap is especially worthwhile if one of the following factors applies to the website.

The website is very big: In other words, the website consists of a lot of pages. As a rule, content is constantly being added, changed or removed. Of course, "big" is relative here, but to make it clear: if a website consists of thousands of pages, for example, recently changed or newly created pages can easily be "overlooked". The sitemap can be used to ensure that this does not happen.

The pages are not linked to each other: The website does not have to consist of thousands of pages for some of them to be overlooked when crawling. It is enough if pages are not linked or are linked poorly. This is due to the way crawling works. The main task of the crawler is to browse the web for content and evaluate it. It follows the links from page to page, like a road to the next address. If there are no paths to a page, or only poorly accessible ones, the page will be reached later or not at all.

The website is new: If you have set up a website from scratch, the situation is similar to the previous case of poorly linked pages, except that this time it is about external references. A new website usually does not have as many inbound links from other websites as sites that have long produced content worth referring to. Since the crawler follows links from page to page, the website and its sub-pages may not be found and crawled so easily. This process can be accelerated with a sitemap.

Incidentally, this case also applies if you move your website. If, for example, you are planning a protocol change from http to https or a complete domain move, it is essential that the sitemap is updated and re-submitted to Google.
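
For a protocol change, the URLs listed in the sitemap have to be switched to the new scheme before the file is re-submitted. The following is a minimal sketch of that rewrite in Python; the function name to_https and the sample URLs are assumptions made for illustration and not part of any sitemap tool.

```python
from urllib.parse import urlsplit, urlunsplit

def to_https(url):
    """Return the URL with an http scheme replaced by https."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)

# Sample URLs (illustrative only)
urls = ["http://www.example.com/", "http://www.example.com/other-page"]
print([to_https(u) for u in urls])
```

URLs that already use https are returned unchanged, so the function can be run over a mixed list.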

In addition to the regular sitemap for crawling the website, sitemap extensions make it possible to create further sitemaps that reference media content such as videos, images or news. A video entry in the sitemap, for example, can carry metadata such as the video title, description or duration, which can improve indexing.
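
To illustrate what such a video entry can look like, here is a small Python sketch that builds one using Google's video sitemap extension namespace. The page, thumbnail and media URLs as well as the function name video_sitemap are hypothetical. The thumbnail and content location fields are included because Google's video extension expects them, even though they are not discussed above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

def video_sitemap(page_url, video):
    """Return sitemap XML for one page that embeds one video."""
    # Namespace declarations are written as plain attributes on the root.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:video": VIDEO_NS})
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page_url
    entry = ET.SubElement(url, "video:video")
    for tag, value in video.items():
        ET.SubElement(entry, "video:" + tag).text = str(value)
    return ET.tostring(urlset, encoding="unicode")

xml_text = video_sitemap(
    "http://www.example.com/videos/intro",  # hypothetical page URL
    {
        "thumbnail_loc": "http://www.example.com/thumbs/intro.jpg",  # hypothetical
        "title": "Introduction",
        "description": "A short introduction video.",
        "content_loc": "http://www.example.com/media/intro.mp4",  # hypothetical
        "duration": 120,  # length in seconds
    },
)
print(xml_text)
```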

What sitemap formats are there?

The most common sitemap format is XML. RSS feeds and text files are also supported, but they can carry less information than the XML protocol.

Sitemap XML

The XML sitemap is preferred by the major search engines. It serves as a structured table of contents for the website and consists of various XML tags, not all of which are required. It is aimed at the crawler, which can find and process the individual pages more easily when crawling the website. This applies in particular to very large or poorly linked websites and to websites with a high directory depth (sub-categories and pages).

Sitemap.xml tags

  • <urlset>: This tag encloses the entire file and references the current protocol standard.
  • <url>: The parent tag for each page entry in the sitemap. The remaining tags are children of this tag.
  • <loc>: The URL of the page is entered in this tag. The URL must be absolute, i.e. the full path including the protocol (e.g. "http" or "https").

The following tags are optional:

  • <lastmod>: This tag indicates when the URL was last modified. The time of day can be omitted.
  • <changefreq>: Change frequency, i.e. how often the page is likely to be edited or changed.
  • <priority>: Priority of the respective URL compared to other URLs on the website. The default priority is 0.5 and the maximum value is 1.0.

Note: The <priority> tag tells search engines how the URLs of the website rank relative to one another and thus provides information on the structure of the website. However, it makes no sense to give every URL in the sitemap a high value, since the priority is relative and only helps to choose between URLs of the same website.

Here is an example of what a sitemap can look like in XML format:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.example.com/other-page</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
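
A file like this can also be generated by a script instead of being written by hand. The following is a minimal sketch using Python's standard library; the function name build_sitemap and the shape of the page dictionaries are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML as a string for a list of page dicts.

    Each dict needs "loc"; "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

pages = [
    {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": "1.0"},
    {"loc": "http://www.example.com/other-page", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": "0.8"},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Optional tags are simply left out when a page dictionary does not provide them, which mirrors the fact that only <loc> is required.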

HTML sitemap

The HTML sitemap is primarily intended for users of the website and takes usability aspects into account. Just as the sitemap.xml makes it easier for the crawler to crawl the website, a clearly structured HTML sitemap can give the user a good overview of the content on the website. The HTML sitemap itself is made available as a normal subpage of the website and is usually linked in the footer area of the website. The crawler can also reach it there and thus find all content more easily.

If there are more than 200 pages on the website, it can make sense to split the HTML sitemap into several pages so as not to lose clarity. In principle, only the most important documents should be listed on the HTML sitemap page.
A good example of what an HTML sitemap can look like can be found on eBay.

 

For the sake of completeness, the other sitemap formats should also be mentioned here.

RSS - Syndication Feed

If you run a blog that has an RSS feed, the feed URL can be submitted as a sitemap. Google accepts RSS (Really Simple Syndication) 2.0 and Atom 1.0 feeds. Most CMS include blog software that can generate a feed, but the feed only contains information about recent URLs, so not all of the website's URLs may be transmitted to the search engines. The information is passed to the search engines via the <link> tag, which specifies the URL, and the <pubDate> tag (RSS) or <updated> tag (Atom), which shows when the individual URLs were last changed.
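
To see which URLs and dates a feed would actually pass on, the feed can be read with a few lines of Python. The sketch below assumes an RSS 2.0 feed, in which each <item> carries a <link> and a <pubDate> element; the sample feed and the function name feed_entries are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for illustration
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://www.example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>Other page</title>
      <link>http://www.example.com/other-page</link>
      <pubDate>Sat, 01 Jan 2005 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def feed_entries(feed_xml):
    """Return (url, last_changed) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    return [(item.findtext("link"), item.findtext("pubDate"))
            for item in root.iter("item")]

for url, changed in feed_entries(SAMPLE_FEED):
    print(url, changed)
```

Comparing this list with the full list of pages shows quickly whether the feed covers the whole website or only the latest posts.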

Sitemap as a text file

If the sitemap only needs to contain the URLs of the pages, it can also be created as a simple text file with one URL per line, e.g.
http://www.example.com/
http://www.example.com/other-page

It is important to note that the file is encoded in UTF-8 format and only contains the list of URLs. The file name must have the extension TXT, e.g. sitemap.txt.
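
As a rough sketch, such a file can be written with Python as follows. The function name write_text_sitemap is an assumption, and the example writes into a temporary directory so that nothing is left behind on disk.

```python
import tempfile
from pathlib import Path

def write_text_sitemap(urls, path="sitemap.txt"):
    """Write one URL per line, UTF-8 encoded, and return the file path."""
    target = Path(path)
    target.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return target

urls = ["http://www.example.com/", "http://www.example.com/other-page"]
with tempfile.TemporaryDirectory() as tmp:
    out = write_text_sitemap(urls, Path(tmp) / "sitemap.txt")
    print(out.read_text(encoding="utf-8"))
```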

You can find more details and specifications on the various sitemap formats here.

How can I create a sitemap.xml?

In the following you will find short instructions on how to create the Sitemap.xml in just a few steps:

  1. Creation of the sitemap using a sitemap generator
  2. Validation of the sitemap file
  3. Upload the sitemap file to the main directory (root) of the website
  4. Entry of the sitemap URL in the robots.txt
  5. Submit sitemap in Google

Creation of the sitemap.xml using a sitemap generator

There are various online tools that you can use to create your Sitemap.xml. Here is an example of how this works with the XML sitemaps generator:

  1. With "More options" you can specify which tags should be included in the sitemap.xml.
  2. Enter the URL of your main directory, e.g. http://www.example.com/.
  3. Click on "Start"

The website is scanned and the sitemap is created.

Once the process has ended, you can use the "View Sitemap Details" button to:

  1. Look at the finished sitemap.xml file (preview) and download it.
  2. Have the sitemap sent to you in HTML format by email.

You can then open the Sitemap.xml file in an editor such as Notepad++, remove any pages that you do not want in the Sitemap.xml if necessary, and save the file.

IMPORTANT: The sitemap should only contain canonical URLs that return status code 200. It should not contain URLs that return errors (such as 404), URLs that redirect, URLs whose canonical tag points to another page, or URLs with a noindex robots tag. Listing them would make no sense, since these pages either cannot be reached under the specified URL, refer to another page or should not be indexed (noindex). To catch this type of error, you can crawl the URLs contained in the sitemap with Screaming Frog.
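
The rule above can be expressed as a simple filter over crawl results. The sketch below is only an illustration: the record fields (url, status, canonical, noindex) are a hypothetical format and do not correspond to the export of any particular crawling tool.

```python
def sitemap_worthy(record):
    """Return True if a crawled URL belongs in the sitemap."""
    if record["status"] != 200:
        return False  # errors (e.g. 404) and redirects (e.g. 301)
    if record.get("noindex", False):
        return False  # page asks not to be indexed
    canonical = record.get("canonical")
    # Keep the URL only if it has no canonical tag or points to itself.
    return canonical in (None, record["url"])

# Hypothetical crawl results
crawl = [
    {"url": "http://www.example.com/", "status": 200,
     "canonical": "http://www.example.com/"},
    {"url": "http://www.example.com/old", "status": 301},
    {"url": "http://www.example.com/missing", "status": 404},
    {"url": "http://www.example.com/print", "status": 200,
     "canonical": "http://www.example.com/"},
    {"url": "http://www.example.com/private", "status": 200, "noindex": True},
]
keep = [r["url"] for r in crawl if sitemap_worthy(r)]
print(keep)
```

Of the five sample records, only the home page passes all three checks.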

Validation of the sitemap file

An XML validator can be used to check the content of the sitemap.xml for formatting errors.

To do this, paste the complete content of the sitemap into the empty field and click "Check XML". If the test is successful, a message with "No errors found" should appear.

If there are errors in the file, the result shows them as well.

The result shows the exact line where the error occurs in the sitemap. In that case, open the Sitemap.xml file again in the editor and check the corresponding line. A tag may not have been opened or closed properly. Once you think you have solved the problem, run the test again.
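
The same kind of check can be run locally. The following sketch uses Python's built-in XML parser to report the line of the first formatting error. It only tests whether the file is well-formed XML, not whether it follows the sitemap protocol, and the function name check_xml is an assumption.

```python
import xml.etree.ElementTree as ET

def check_xml(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

good = "<urlset>\n<url>\n<loc>http://www.example.com/</loc>\n</url>\n</urlset>"
# The closing </url> tag is missing here, so the parser stumbles on line 4.
broken = "<urlset>\n<url>\n<loc>http://www.example.com/</loc>\n</urlset>"

print(check_xml(good))
print(check_xml(broken))
```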

Upload the sitemap file to the main directory of the website

Once the Sitemap.xml has been created, it must be added to the main directory (root) of the website. You can use an FTP program such as FileZilla to upload the file to the appropriate directory. When this is done, the sitemap should be accessible on your own domain, e.g. under http://www.example.com/sitemap.xml.

Entry of the sitemap URL in the robots.txt

Like the sitemap file, the robots.txt is stored in the root directory of the website, and its name must be written exactly like this: robots.txt (all lowercase). The file can be created or edited on the web server with the FTP program. To reference the sitemap, simply add a line with the sitemap URL to the robots.txt:
Sitemap: http://www.example.com/sitemap.xml

If you call up the robots.txt via the browser (http://www.example.com/robots.txt) the entry should be there. Adding the sitemap URL to the robots.txt is not absolutely necessary if you submit it to Google as follows.
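
Whether the entry is picked up correctly can also be checked with a short script. This sketch uses the robots.txt parser from Python's standard library (site_maps() requires Python 3.8 or newer); the sample robots.txt content and the function name sitemaps_in_robots are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content
ROBOTS_TXT = """User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""

def sitemaps_in_robots(robots_txt):
    """Return the list of sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None if there are none

print(sitemaps_in_robots(ROBOTS_TXT))
```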

Submit sitemap in Google

To add your sitemap to Google, you first have to log into the Google Search Console. Here you can submit the sitemap as follows:

  1. Open the sub-item "Sitemaps" under "Crawling".
  2. Click on "Add / test sitemap" at the top right and complete the empty field with sitemap.xml so that the full path to the file (URL) is entered.
  3. Click on "Test" to check first whether Google processes the sitemap correctly.
  4. If no errors were found, repeat step 2 and click on "Submit" instead of "Test".

Before the sitemap can be submitted to Google, you must first have added your website to Google. Here is a checklist for it.

Further information on Google's guidelines for sitemaps can be found here under “General guidelines for sitemaps”.