Sitemap repair script for MediaWiki 1.11.0 with submission

After using the MediaWiki sitemap repair script by Sy[1] for a while, I want to share my version of it, which adds a submission step for the generated sitemap. I came across the idea of submitting the sitemap on another page while researching the matter, so the concept is not entirely my own.

To use the script, make the following adjustments:

  1. Correct "/home/httpd/vhosts/example.com/httpdocs/maintenance/" to point to the maintenance folder of your MediaWiki installation.
  2. Globally search and replace all "example.com" with your domain name.
  3. Globally search and replace all "sitemap.xml" with the name of your sitemap index file.
  4. Search and replace "dralspire.com" with the domain name listed in one of the sitemap files (note: you cannot use the sitemap index file for this step).
  5. Bonus: Add the absolute URL of your sitemap index file to your robots.txt file.[2]
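
Step 5 can be scripted too. The following is only a minimal sketch: it assumes robots.txt accepts the standard "Sitemap:" line, and the function name add_sitemap_line as well as the path and domain in the example call are hypothetical placeholders, not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: append a Sitemap line to robots.txt unless it is already there.
# $1 = path to robots.txt, $2 = absolute URL of the sitemap index file
add_sitemap_line() {
    robots_file=$1
    sitemap_url=$2
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "Sitemap: $sitemap_url" "$robots_file" 2>/dev/null; then
        # If the file has content but no trailing newline, add one first
        if [ -s "$robots_file" ] && [ -n "$(tail -c 1 "$robots_file")" ]; then
            echo >> "$robots_file"
        fi
        printf 'Sitemap: %s\n' "$sitemap_url" >> "$robots_file"
    fi
}

# Example call (path and domain are placeholders):
# add_sitemap_line /home/httpd/vhosts/example.com/httpdocs/robots.txt http://example.com/sitemap.xml
```

Running the function twice with the same URL leaves a single Sitemap line in the file, so it should be safe to call from the same cron job as the repair script.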

That is it. Save the script on your server, and do not forget to "chmod 755 filename" before you run it.

#!/bin/sh
echo generating the sitemap...
# stop here if the maintenance folder cannot be entered
cd /home/httpd/vhosts/example.com/httpdocs/maintenance/ || exit 1
php generateSitemap.php

echo prepping the files...
mv -f *.gz ../
mv -f *.xml ../
cd ..
gzip -d *.xml.gz

echo repairing the index file...
sed 's/<loc>/<loc>http:\/\/example.com\//g' sitemap.xml > sitemap.xml.sed
mv sitemap.xml.sed sitemap.xml

echo repairing the files...
# loop over the glob directly (no ls parsing) and quote file names
for i in *.xml; do
sed 's/dralspire.com/example.com/g' "$i" > "$i.sed" && mv -f "$i.sed" "$i"
done

echo gzip it all back up...
gzip *.xml
gzip -d sitemap.xml.gz

echo submission time...
wget -q -O /dev/null http://www.google.com/webmasters/tools/ping?sitemap=http://example.com/s...
wget -q -O /dev/null http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://e...
wget -q -O /dev/null http://submissions.ask.com/ping?sitemap=http://example.com/sitemap.xml
wget -q -O /dev/null http://api.moreover.com/ping?u=http://example.com/sitemap.xml
wget -q -O /dev/null http://webmaster.live.com/ping.aspx?siteMap=http://example.com/sitemap.x...
echo done.
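
To see what the index repair step does, here is a small sketch run against a made-up index entry. The function name prefix_loc and the sample file name are hypothetical; the substitution is the same one the script uses, written with | as the sed delimiter so the slashes in the URL need no escaping.

```shell
#!/bin/sh
# Hypothetical helper: prepend a base URL to every <loc> value read from stdin.
# $1 = base URL including the trailing slash (must not contain | or &)
prefix_loc() {
    sed "s|<loc>|<loc>$1|g"
}

# A made-up index line with a relative path, which is what the repair script assumes it will find
printf '<sitemap><loc>sitemap-wiki-NS_0-0.xml.gz</loc></sitemap>\n' | prefix_loc 'http://example.com/'
# prints: <sitemap><loc>http://example.com/sitemap-wiki-NS_0-0.xml.gz</loc></sitemap>
```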

Comments

It's nice to see that

It's nice to see that someone got some good use out of my script and has gone a step further. =)

Thank you for your help on

Thank you for your help on this. I made some adaptations/simplifications that worked for my specific installation of MediaWiki. You can see http://forums.appropedia.org/blog/making-sitemap-opensource-documentatio... for more about these changes.

#!/bin/sh
echo generating the sitemap...
cd maintenance/
/usr/local/php5/bin/php generateSitemap.php

echo prepping the files...
mv -f sitemap* ../
cd ..

echo repairing the index file...
sed 's/<loc>/<loc>http:\/\/www.appropedia.org\//g' sitemap-index-appropedia-w1.xml > sitemap.xml

echo Yipee! Done.

excuse me good sirs but what

Excuse me, good sirs, but what file extension am I supposed to save this with?

I used the extension .sh to

I used the extension .sh to save the script. Also remember that you have to make the file executable before you can run it; chmod 755 scriptname.sh does the trick.

It looks like things broke

It looks like things broke somewhere along the line. Have you had a chance to re-test things since these various MediaWiki and Google upgrades?

Something somewhere seems to be broken on my system. Are things working for you?

I just updated the first

I just updated the first wiki to MediaWiki 1.13.0 yesterday. I will check whether the new generateSitemap.php causes any problems once the cron job runs it today, but prior to the update (when running MediaWiki 1.12.0) I had no major problems. Google showed a warning in the webmaster tools, stating "All the URLs in your Sitemap are set to the same priority (not the default priority). Priority indicates the importance of a particular URL relative to other URLs on your site, and doesn't impact your site's performance in search results. If all URLs have the same priority, Google can't tell which are more important." However, that applies not only to generateSitemap.php but to other sitemap generators as well, and it has no bearing on how your script works.
Update: Everything is working fine for me with MediaWiki 1.13.0.
