It looks like Google has a little-known tool that creates sitemaps with a simple Python script. It's in the repositories, and it could be handy for a lot of webmasters out there.
The Google Sitemap Generator is a Python script that creates a Sitemap for your site using the Sitemap Protocol. This script can create Sitemaps from URL lists, web server directories, or from access logs. In order to use this script:
- You must be able to connect to and run scripts on your web server.
- Your web server must have Python 2.2 or later installed.
- You must know the command that launches Python. (Generally, this is python, but may vary by installation. For instance, if the web server has two versions of Python installed, the earlier version may be invoked by the command python and the later version may be invoked by the command python2.)
- You must know the directory path to your site. If your web server hosts one site, this may be a path such as /var/www/html. If you have a virtual server that hosts multiple sites, this may be a path such as /home/virtual/site1/fst/var/www/html.
- You must be able to upload files to your web server (for instance, using FTP).
- If you will be generating a list of URLs based on access logs, you must know the encoding used for those logs and the complete path to them.
If you aren't sure about any of this, you can check with your web hosting company.
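A few of the requirements above can be checked from a short script before you start. This is only a rough sketch, not part of the Sitemap Generator: the function name `check_prerequisites` and the example paths are made up for illustration, so substitute the site directory and log path that apply to your server.

```python
import os
import sys


def check_prerequisites(site_dir, log_path=None, min_version=(2, 2)):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # The generator needs Python 2.2 or later, per the requirements above.
    if sys.version_info[:2] < min_version:
        problems.append("Python %d.%d or later is required" % min_version)
    # The site directory must exist on this machine.
    if not os.path.isdir(site_dir):
        problems.append("site directory not found: %s" % site_dir)
    # Access logs are optional, so only check them when a path is given.
    if log_path is not None and not os.access(log_path, os.R_OK):
        problems.append("access log not readable: %s" % log_path)
    return problems


if __name__ == "__main__":
    # Hypothetical paths; replace them with the ones for your own server.
    for problem in check_prerequisites("/var/www/html", "/var/log/apache2/access.log"):
        print(problem)
```

If the script prints nothing, the Python version, site directory, and log file all look usable. It does not check upload access or the log encoding, which you still need to confirm yourself.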
Now you’re ready to get started. Here’s an overview of what you’ll need to do.
- Download the Sitemap Generator program files.
- Create a configuration file for your site using the provided example_config.xml file as a template. Modify this file as needed for your site and save it.
- Upload the necessary files to your web server.
- Run google-sitemapgen.
- Add the generated Sitemap to your Google Webmaster Tools account.
- Set up a recurring script (optional).
To install it from the repositories on a Debian-based system, run:
sudo apt-get install google-sitemapgen
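To give a feel for what a directory-based Sitemap looks like, here is a minimal Python 3 sketch that walks a site directory and writes one `<url>` entry per file using the Sitemap Protocol namespace. It is not the Sitemap Generator's own code. The function name `build_sitemap`, the `default_file` handling, and the example URL are assumptions made for illustration, and the real script supports filters and optional fields that this leaves out.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(site_dir, base_url, default_file="index.html"):
    """Walk site_dir and return Sitemap XML (a string) with one URL per file."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for root, dirs, files in os.walk(site_dir):
        dirs.sort()  # keep the traversal order deterministic
        for name in sorted(files):
            rel = os.path.relpath(os.path.join(root, name), site_dir)
            parts = rel.split(os.sep)
            # Treat a directory's default file as the directory URL itself.
            if parts[-1] == default_file:
                parts[-1] = ""
            # Percent-encode each path segment so the URL is valid.
            path = "/".join(quote(p) for p in parts)
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + path
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical directory and site URL; adjust for your own server.
    print(build_sitemap("/var/www/html", "http://www.example.com/"))
```

Every file under the directory ends up in the output, so in practice you would want to skip files that should not be indexed.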
If you are unable to use the Sitemap Generator, you can add a Sitemap to your Google Webmaster Tools account in another format, such as a simple text file. Third-party programs supporting the Sitemap Protocol are also available.
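A text-file Sitemap is simple enough to produce by hand or with a few lines of Python 3. The sketch below assumes the usual text format of one full URL per line in a UTF-8 file. The function name `write_text_sitemap` and the example URLs are illustrative only.

```python
def write_text_sitemap(urls, path):
    """Write one URL per line as UTF-8, skipping blank entries and duplicates.

    Returns the number of URLs written.
    """
    seen = set()
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for url in urls:
            url = url.strip()
            if not url or url in seen:
                continue
            seen.add(url)
            f.write(url + "\n")
    return len(seen)


if __name__ == "__main__":
    # Hypothetical URLs; list the pages of your own site instead.
    pages = [
        "http://www.example.com/",
        "http://www.example.com/about.html",
    ]
    write_text_sitemap(pages, "sitemap.txt")
```

Upload the resulting file to your web server and submit its URL the same way you would submit a generated Sitemap.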