Introduction
wget is a command-line utility in Linux used for downloading files from the web. It supports multiple protocols, such as HTTP, HTTPS, and FTP, and is designed to work non-interactively, meaning it can run in the background without requiring user input. This makes it an excellent tool for retrieving large files, downloading entire websites, and handling interrupted downloads. In this guide, we’ll focus on the most common options and arguments to help you get the most out of wget.
TL;DR
You can find a shorter cheat sheet version of this article here.
Table of contents
- Basic Syntax of wget
- Downloading Files to a Specific Directory
- Resuming Interrupted Downloads
- Downloading Multiple Files
- Downloading in the Background
- Limiting Download Speed
- Recursive Downloading (Downloading Entire Websites)
- Downloading for Offline Viewing
- Using Custom User Agents
- Handling Authentication (Username and Password)
- Checking Links Without Downloading
- Mirroring Websites
- Conclusion
Basic Syntax of wget
At its simplest, the wget command is used like this:
wget [URL]
This command downloads the file located at the specified URL and saves it in the current working directory.
For example:
wget https://learntheshell.com/sample.zip
This will download sample.zip to your current folder.
Downloading Files to a Specific Directory
By default, wget saves files in the directory from which you run the command. If you want to specify a different location, use the -P option followed by the directory path:
wget -P /path/to/directory [URL]
For example, to download a file and save it to the /home/user/Downloads
directory:
wget -P /home/user/Downloads https://learntheshell.com/sample.zip
Resuming Interrupted Downloads
If your download is interrupted, you don’t have to start over. Using the -c option (short for “continue”), wget will resume the download from where it left off:
wget -c https://learntheshell.com/sample.zip
This feature is particularly useful for downloading large files.
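The -c option also combines with the other flags in this guide. For example, here is a quick sketch that resumes a download previously saved to a specific directory (the path is just an example):
wget -c -P /home/user/Downloads https://learntheshell.com/sample.zip
wget looks for the partially downloaded file in that directory and continues from where it stopped.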
Downloading Multiple Files
To download several files at once, you can create a text file with each URL on a separate line, then pass the file to wget with the -i option:
- Create a text file (urls.txt) with the URLs you want to download:
https://learntheshell.com/file1.txt
https://learntheshell.com/file2.txt
- Use wget to download all the files listed in that file:
wget -i urls.txt
wget will download each file in the list one after the other.
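If you prefer to build the list from the shell instead of an editor, here is a small sketch (the URLs are the same example files as above):
printf '%s\n' https://learntheshell.com/file1.txt https://learntheshell.com/file2.txt > urls.txt
wget -i urls.txt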
Downloading in the Background
To download files in the background, freeing up your terminal for other tasks, use the -b option:
wget -b https://learntheshell.com/sample.zip
When using this option, wget runs in the background and logs output to a file named wget-log. To check the progress of the download, use:
tail -f wget-log
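The -b option works together with the other flags covered here. For example, a sketch that resumes an interrupted download in the background:
wget -b -c https://learntheshell.com/sample.zip
Progress still goes to wget-log, so the tail command above works the same way.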
Limiting Download Speed
If you don’t want wget to use all available bandwidth, you can limit the download speed using the --limit-rate option. This can be helpful when you need to conserve bandwidth or run other network-intensive tasks:
wget --limit-rate=200k https://learntheshell.com/sample.zip
In this example, the download speed is limited to 200 KB/s. You can specify the rate in bytes (a plain number), kilobytes (k suffix), or megabytes (m suffix).
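For instance, a quick sketch that caps the speed at roughly 1 MB/s using the m suffix:
wget --limit-rate=1m https://learntheshell.com/sample.zip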
Recursive Downloading (Downloading Entire Websites)
To download a website or directory recursively (i.e., download all linked files within the target page), use the -r option:
wget -r https://learntheshell.com/
This will download the website, including all linked pages. You can limit the depth of recursion by adding the -l (lowercase L) option:
wget -r -l 2 https://learntheshell.com/
This restricts wget to downloading two levels deep.
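As with the other examples, -r and -l can be combined with -P to keep the downloaded site out of your current directory (the target path below is just an example):
wget -r -l 2 -P /home/user/mirrors https://learntheshell.com/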
Downloading for Offline Viewing
To download an entire website for offline viewing, including all assets like images and CSS files, use the -p option along with recursive downloading:
wget -r -p https://learntheshell.com/
Additionally, to make the links in the downloaded HTML files suitable for local browsing, use the --convert-links option:
wget -r -p --convert-links https://learntheshell.com/
This ensures that all links are converted to point to your local copies of the files.
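If you find the long-form option names easier to read in scripts, the same command can also be written as a sketch like this (--recursive and --page-requisites are the long forms of -r and -p):
wget --recursive --page-requisites --convert-links https://learntheshell.com/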
Using Custom User Agents
Sometimes, websites block downloads from non-browser clients like wget. You can bypass this by specifying a user agent, making wget appear like a typical web browser:
wget --user-agent="Mozilla/5.0" https://learntheshell.com/sample.zip
This command makes wget identify itself with a browser-like user agent, which may help avoid blocking by some websites.
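Some sites inspect the full user agent string, so you may want to send something closer to what a real browser reports. The value below is only an illustrative browser-style string, not anything wget requires:
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0" https://learntheshell.com/sample.zip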
Handling Authentication (Username and Password)
For files behind HTTP authentication (such as password-protected areas), wget supports basic authentication using the --user and --password options:
wget --user=username --password=password https://learntheshell.com/protected-file.zip
This command lets you download files that require a login.
If you don’t want to specify the password in the command, use the --ask-password option. wget will prompt for the password:
wget --user=username --ask-password https://learntheshell.com/protected-file.zip
Note that --password and --ask-password are mutually exclusive.
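These authentication options combine with the rest of the flags in this guide. For example, a sketch that resumes a protected download while prompting for the password:
wget -c --user=username --ask-password https://learntheshell.com/protected-file.zip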
Checking Links Without Downloading
To check if a URL or multiple URLs are valid without downloading the content, you can use the --spider option:
wget --spider https://learntheshell.com/
This is useful for verifying links in scripts or ensuring web pages are accessible without actually downloading them.
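To check many links at once, --spider pairs naturally with the -i option from earlier (assuming a urls.txt like the one created above):
wget --spider -i urls.txt
wget will report any URL it fails to reach.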
Mirroring Websites
To mirror a website, including all its pages and directory structures, use the --mirror option. This option is equivalent to -r -N -l inf --no-remove-listing, which ensures a complete mirror of the website:
wget --mirror https://learntheshell.com/
The --mirror option preserves timestamps and directory structure, creating an exact copy of the website for offline use.
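For a mirror that is also comfortable to browse offline, --mirror is often combined with the options covered earlier (the destination directory below is just an example):
wget --mirror --convert-links --page-requisites -P /home/user/mirrors https://learntheshell.com/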
Conclusion
The wget command is a powerful and flexible tool for downloading files from the web, supporting many protocols and options. Whether downloading individual files, mirroring entire websites, or managing bandwidth usage, wget provides a wealth of functionality to suit almost any need. By mastering the most commonly used options and arguments, you can efficiently handle everything from simple downloads to complex web scraping tasks.