
How to Use wget Command in Linux: Advanced Options

Published at 06:44 AM

Introduction

While wget is often used for basic file downloads, it also provides several advanced features and lesser-known options that can help you handle more complex tasks. In this second part, we’ll cover some of the more advanced functionality that can further enhance your usage of wget in Linux.

TL;DR

You can find a shorter cheat sheet version of this article here.


Downloading Files via FTP

In addition to HTTP and HTTPS, wget supports FTP (File Transfer Protocol), which is useful for accessing files stored on FTP servers. If the server requires authentication, you can specify a username and password directly in the URL:

wget ftp://username:password@ftp.learntheshell.com/path/to/file.zip

For anonymous FTP access, simply omit the username and password:

wget ftp://ftp.learntheshell.com/path/to/file.zip
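Embedding a password in the URL exposes it in your shell history and in the process list. wget also accepts FTP credentials as separate options, which is a little safer for interactive use (the host and path below are illustrative placeholders):

```shell
# Pass FTP credentials via dedicated options instead of embedding
# them in the URL; host and path are placeholders.
wget --ftp-user=username --ftp-password='secret' \
     ftp://ftp.learntheshell.com/path/to/file.zip
```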

Downloading Only Specific File Types

When mirroring websites or directories, you might only want to download specific types of files (e.g., PDFs, images, or documents). You can filter the download using the --accept (-A) or --reject (-R) option with a comma-separated list of file extensions:

wget -r --accept jpg,png https://learntheshell.com/

This example downloads only .jpg and .png files from the website. Conversely, you can reject specific file types:

wget -r --reject mp4,avi https://learntheshell.com/

This command will exclude any .mp4 and .avi files from the download.
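When filtering by type, it often helps to also keep wget from wandering outside the area you care about. A sketch combining --accept with --no-parent (stay below the starting directory) and a recursion depth limit; the URL is illustrative:

```shell
# Download only images, stay below the starting directory (-np),
# and limit recursion to 2 levels (-l 2).
wget -r -l 2 -np --accept jpg,png https://learntheshell.com/gallery/
```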

Adjusting Download Timings and Retries

For more control over how wget handles failed downloads or unstable connections, you can adjust the number of retries and the time between them. The --tries option sets how many times wget will attempt to download a file before giving up:

wget --tries=10 https://learntheshell.com/sample.zip

If the connection is unreliable, you can increase the time between retries using --wait to introduce a delay between attempts:

wget --tries=10 --wait=5 https://learntheshell.com/sample.zip

Here, wget will retry up to 10 times with a 5-second delay between each attempt. This can help prevent server overload or avoid issues with rate-limited servers.
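Two related options are worth knowing here: --timeout caps how long wget waits on a stalled connection, and --retry-connrefused treats "connection refused" as a transient error and retries it (by default wget gives up immediately in that case). A sketch combining them:

```shell
# Retry up to 10 times with a 5-second pause, time out a stalled
# connection after 30 seconds, and retry even on "connection refused".
wget --tries=10 --wait=5 --timeout=30 --retry-connrefused \
     https://learntheshell.com/sample.zip
```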

Logging and Debugging Downloads

If you’re automating downloads or troubleshooting issues, wget provides logging options to help keep track of what’s happening. Use the lowercase -o option to write wget’s output to a log file (not to be confused with uppercase -O, which sets the name of the downloaded file):

wget -o download.log https://learntheshell.com/sample.zip

If you want more detailed information, including headers and debugging output, use the -d option to enable debug mode:

wget -d https://learntheshell.com/sample.zip

Debugging mode provides insight into the HTTP requests and responses, which can be useful when diagnosing connectivity issues or troubleshooting failures.
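The two options combine naturally: you can capture the debug output to a file for later inspection, and use -a (append) instead of -o when you want to keep adding to an existing log rather than overwriting it:

```shell
# Append verbose debug output to an ongoing log file
# instead of overwriting it on each run.
wget -d -a downloads.log https://learntheshell.com/sample.zip
```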

Mirroring a Website with Timestamping

For long-term projects that require you to repeatedly download the same site, you can use timestamping with --timestamping (-N). This ensures that only newer files or files that have changed since the last download are fetched:

wget -r --timestamping https://learntheshell.com/

This is useful when maintaining a local mirror of a website, as it prevents unnecessary downloads of unchanged files.
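For full mirrors, wget also offers the --mirror shorthand, which is equivalent to -r -N -l inf --no-remove-listing, i.e., recursion with timestamping and unlimited depth. Paired with --convert-links, it produces a copy that is browsable offline:

```shell
# --mirror = -r -N -l inf --no-remove-listing;
# --convert-links rewrites links so the local copy works offline.
wget --mirror --convert-links https://learntheshell.com/
```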

Limiting the Request Rate per Server

wget retrieves files over a single connection at a time, so there are no parallel connections to limit; instead, you avoid overwhelming a server by slowing the rate at which requests are issued. This is especially useful when mirroring websites or when the server rate-limits clients. Use the --wait and --random-wait options to introduce a delay between requests:

wget -r --wait=1 --random-wait https://learntheshell.com/

The --random-wait option introduces a random delay between 0.5 to 1.5 times the specified wait time (in seconds), reducing the likelihood of being blocked by the server for excessive requests.
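Besides spacing out requests, you can also cap the bandwidth each transfer uses with --limit-rate, which accepts k and m suffixes. A sketch combining both:

```shell
# Pause between requests (randomized around 1 second) and cap
# transfer speed at 200 KB/s to stay gentle on the server.
wget -r --wait=1 --random-wait --limit-rate=200k https://learntheshell.com/
```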

Using a Proxy with wget

If your network setup requires a proxy to access the internet, you can point wget at an HTTP proxy (note that wget has no native SOCKS support; SOCKS requires an external wrapper). Set the proxy using environment variables:

export http_proxy=http://proxyserver:port/
export https_proxy=https://proxyserver:port/

To bypass the proxy for specific URLs, use the --no-proxy option:

wget --no-proxy https://localhost/sample.zip

This is useful if certain URLs (like local network addresses) should not go through the proxy.
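Instead of exporting environment variables, you can also set the proxy for a single invocation with -e, which passes .wgetrc-style settings on the command line (the proxy address below is a placeholder):

```shell
# Use a proxy for this one command only; proxyserver:3128 is a placeholder.
wget -e use_proxy=yes -e http_proxy=http://proxyserver:3128/ \
     https://learntheshell.com/sample.zip
```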

Post Data to a Web Form

wget can also be used to interact with web forms by sending POST requests. This is useful for automating data submissions. The --post-data option allows you to pass form data to a URL:

wget --post-data="username=user&password=pass" https://learntheshell.com/login

This will simulate a form submission to a login page. The data is sent in the form of key-value pairs (key=value), and multiple fields are separated by an ampersand (&).
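A login POST is usually only useful if you keep the session cookie it returns. wget can save cookies at login time and replay them on later requests; the URLs and form field names below are illustrative:

```shell
# Log in and save the session cookie (--keep-session-cookies keeps
# cookies that would otherwise expire with the browser session)...
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data="username=user&password=pass" \
     https://learntheshell.com/login

# ...then reuse it for an authenticated download.
wget --load-cookies cookies.txt https://learntheshell.com/private/report.pdf
```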

Setting Custom HTTP Headers

Sometimes you may need to send custom headers along with your request. You can do this using the --header option. This is useful when downloading content from APIs or websites that require specific headers (e.g., authentication tokens, content types):

wget --header="Authorization: Bearer your_token_here" https://api.learntheshell.com/data

You can also send multiple headers by repeating the --header option:

wget --header="Authorization: Bearer your_token_here" --header="Content-Type: application/json" https://api.learntheshell.com/data
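One header is common enough to have its own shortcut: --user-agent replaces wget's default User-Agent string, which some sites filter or block. The UA string below is illustrative:

```shell
# Identify as a browser-like client; the User-Agent value is a placeholder.
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64)" https://learntheshell.com/
```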

Recursive Retrieval with Quotas

When recursively downloading a website, you might want to limit the total download size to avoid consuming too much bandwidth or disk space. You can set a download quota using the --quota option:

wget -r --quota=100m https://learntheshell.com/

This will stop wget from starting new downloads once the total size exceeds 100 MB. Note that the quota never truncates a single explicitly requested file; it only takes effect for recursive downloads or URL lists. You can specify the quota in bytes (the default), kilobytes (k), or megabytes (m).
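Because the quota applies across batch downloads, it pairs naturally with -i, which reads a list of URLs from a file (urls.txt is a placeholder):

```shell
# Download the URLs listed in urls.txt, stopping once
# roughly 100 MB in total has been fetched.
wget --quota=100m -i urls.txt
```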

Download Over SSL/TLS

wget supports downloading over SSL/TLS (HTTPS), and it can be configured to handle various SSL settings. If you need to download a file from an HTTPS site with a self-signed certificate, use the --no-check-certificate option to bypass certificate validation:

wget --no-check-certificate https://learntheshell.com/sample.zip

This is useful when working with development servers that use self-signed certificates but should be used cautiously for security reasons.
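A safer alternative to disabling verification entirely is to tell wget to trust the server's certificate explicitly with --ca-certificate (the file path below is a placeholder):

```shell
# Trust a specific self-signed certificate instead of skipping
# validation altogether; the .pem path is a placeholder.
wget --ca-certificate=/path/to/selfsigned.pem \
     https://learntheshell.com/sample.zip
```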

Automatic Retries with Linear Backoff

When encountering transient network issues, it’s often beneficial to retry with an increasing delay between attempts. wget’s --waitretry option provides linear backoff: it waits 1 second after the first failure on a file, 2 seconds after the second failure, and so on, up to the maximum you specify:

wget --tries=5 --waitretry=2 https://learntheshell.com/sample.zip

Here, wget will retry up to five times, waiting 1 second after the first failure and 2 seconds after each subsequent one; --waitretry sets the maximum delay between retries, not the starting delay.


Conclusion

The wget command offers a wide range of advanced options that make it suitable for everything from simple file downloads to complex tasks like mirroring websites, interacting with APIs, and handling authentication. By mastering these lesser-known features, you can optimize your use of wget for a variety of situations, making it a highly versatile tool for any Linux user.