Occasionally you need to mirror a website (or a directory inside one). If you've only got HTTP access, there are tools like httrack which are pretty good (albeit pretty ugly) at doing this. However, as far as I can tell, you can't use httrack on a password-protected website.
curl supports authentication and can probably do this too, but it wasn't obvious how to make it mirror a whole site.
So I ended up using wget, as it supports both mirroring and credentials. But the issue here is that wget plays nice and respects robots.txt, which can actually prevent you from mirroring a site you own. And nothing in the man page explains how to ignore robots.txt.
Eventually, I came up with this incantation, which works for me (access to password-protected site, full mirror, ignoring robots.txt):
wget -e robots=off --wait 1 -x --user=xxx --password=xxx -m -k http://domain.to.mirror/
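where:
-e robots=off runs "robots=off" as if it were a line in .wgetrc, so wget ignores robots.txt
--wait 1 waits 1 second between retrievals
-x forces creation of a local directory hierarchy
--user and --password supply the credentials
-m turns on mirroring (recursion plus time-stamping)
-k converts links in the downloaded pages so they work locally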
Don't use it carelessly on someone else's website, as they might get angry...
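For the "directory inside a site" case mentioned at the start, here is a minimal sketch of how the same command might be wrapped. It assumes a wget recent enough to have --ask-password, and the names mirror_dir and WGET are made up for this example; they are not part of wget. It adds --no-parent so wget stays inside the starting directory, and prompts for the password rather than leaving it in the shell history.

```shell
#!/bin/sh
# Sketch: mirror a single directory of a password-protected site.
# mirror_dir and WGET are hypothetical names, not part of wget.

mirror_dir() {
    if [ "$#" -ne 2 ]; then
        echo "usage: mirror_dir URL USER" >&2
        return 1
    fi
    url=$1
    user=$2

    # --no-parent     never ascend above the starting directory
    # --ask-password  prompt for the password instead of passing it as an argument
    # The remaining flags are the ones used in the original command.
    # WGET can be overridden to substitute another command, for example
    # a wrapper that only prints its arguments.
    "${WGET:-wget}" -e robots=off --wait 1 -x --no-parent \
        --user="$user" --ask-password -m -k "$url"
}

# Example (placeholder URL and user name):
# mirror_dir http://domain.to.mirror/some/directory/ xxx
```

Keep the trailing slash on the directory URL: HTTP has no notion of a directory, so wget relies on the slash to tell where the starting directory ends when it applies --no-parent.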
Comments
Nifty use of wget. It seems so simple and useful that I not only bookmarked it but also cut and pasted your article into my personal Linux help document. Don't want to risk a "page not found" some time off in the distant future. ;)
Incidentally, I stumbled onto your blog while searching for an easy way to implement spam filtering. The how-to was very helpful, but it left me with one question about training my inbox (see my comment on that blog entry, if you have any insight).
Thanks, John
I put a short (and probably not very helpful) suggestion in reply to your other comment. Always happy to get a genuine comment from someone who found my blog useful!