{"id":2227,"date":"2008-08-17T22:08:15","date_gmt":"2008-08-18T06:08:15","guid":{"rendered":"http:\/\/lee.org\/blog\/?p=2227"},"modified":"2008-08-18T21:46:21","modified_gmt":"2008-08-19T05:46:21","slug":"how-to-create-an-html-dump-of-mediawiki","status":"publish","type":"post","link":"https:\/\/www.lee.org\/blog\/2008\/08\/17\/how-to-create-an-html-dump-of-mediawiki\/","title":{"rendered":"How to Create an HTML dump of Mediawiki"},"content":{"rendered":"<p>If you will be traveling and need offline access to your Mediawiki wiki, what should you do?<\/p>\n<p>If you need to grab pages from a wiki that you aren&#8217;t the administrator of, you can try running a web crawler on it or try this <a href=\"http:\/\/andreas.schmidt.name\/blog\/2007\/10\/google-gears-hack-mediawiki-offline-functionality-in-less-than-one-hour.html\">Google Gears hack<\/a>.<\/p>\n<p>But if you are the administrator of the wiki (or you know the admin) you can make a Mediawiki2HTML dump. There is a Mediawiki extension that does it for you. Here&#8217;s how to run it:<\/p>\n<p>Fetch the <a href=\"http:\/\/www.mediawiki.org\/wiki\/Extension_talk:DumpHTML\">DumpHTML extension<\/a> with shell commands like so:<\/p>\n<blockquote><p>cd \/whatever\/mediawiki\/extensions<br \/>\nsvn checkout http:\/\/svn.wikimedia.org\/svnroot\/mediawiki\/trunk\/extensions\/DumpHTML<\/p><\/blockquote>\n<p>Then run a shell script something like this as a cron job (create the appropriate folders first):<\/p>\n<blockquote><p>#!\/bin\/sh<br \/>\n# Generate a new html dump of wiki.orbswarm.com LCS 8-17-08<\/p>\n<p>echo &quot;deleting contents of \/home\/swarm\/wiki.orbswarm.com-html&quot;<br \/>\nrm -rf \/home\/swarm\/wiki.orbswarm.com-html<\/p>\n<p># DumpHTML.php expects to be run from the maintenance directory. 
The skin won&#8217;t get HTMLified if you run it from another directory<br \/>\ncd \/home\/swarm\/wiki.orbswarm.com\/extensions\/DumpHTML<br \/>\n\/home\/swarm\/php5\/bin\/php dumpHTML.php -d \/home\/swarm\/wiki.orbswarm.com-html -k monobook --image-snapshot --force-copy<\/p>\n<p>echo &quot;deleting \/home\/swarm\/wiki.orbswarm.com\/offline\/*&quot;<br \/>\nrm -rf \/home\/swarm\/wiki.orbswarm.com\/offline\/*<\/p>\n<p>\/bin\/tar -czf \/home\/swarm\/wiki.orbswarm.com\/offline\/swarm-wiki-html.tar.gz \/home\/swarm\/wiki.orbswarm.com-html\/\n<\/p><\/blockquote>\n<p>Run daily from cron, the script above leaves the .tar.gz file in a web-accessible folder, so I can download it before I go on my trip.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you will be traveling and need offline access to your Mediawiki wiki, what should you do? If you need to grab pages from a wiki that you aren&#8217;t the administrator of, you can try running a web crawler on it or try this Google Gears hack. 
But if you are the administrator of the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-2227","post","type-post","status-publish","format-standard","hentry","category-geekery"],"_links":{"self":[{"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/posts\/2227","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/comments?post=2227"}],"version-history":[{"count":1,"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/posts\/2227\/revisions"}],"predecessor-version":[{"id":2228,"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/posts\/2227\/revisions\/2228"}],"wp:attachment":[{"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/media?parent=2227"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/categories?post=2227"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lee.org\/blog\/wp-json\/wp\/v2\/tags?post=2227"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}