Saturday, November 7, 2015

Know Your Limits: Check File Size Before You Download!

At times you may need to check the size of a downloadable file before actually downloading it. Although many well-designed websites display file sizes alongside their download links, plenty of others give no such indication. A careless mouse click may cost you a significant amount of data before you realize that the download cannot finish within your current data quota, or that you have simply picked the wrong download link.

While browsers like Firefox can be configured to show a confirmation dialog with the file size before the download proceeds, this is largely an illusion: modern browsers generally start fetching the file in the background while the confirmation dialog is still in the foreground.

Fortunately, in most cases you can use an HTTP tool (like wget or curl) to check the size of the file before actually initiating the download. This works not only for files, but for most other kinds of resources as well.

Here's how you can use wget to check the size of a download without initiating it. The command uses the HTTP HEAD method, which restricts the server response to headers and omits the actual payload (the download content).

wget -S --method=HEAD -O - your_url_goes_here

-S asks wget to print the server's response headers, while -O writes the output to a file, with - indicating that the output file is standard output (in our case, the terminal).
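The same check can be sketched with curl, the other tool mentioned earlier. This is only a minimal sketch, assuming curl is installed; the function name head_with_curl is hypothetical, and following redirects with -L is an assumption that the wget command above does not make.

```shell
# Hypothetical helper: print only the response headers for a URL.
#   -s  hide the progress meter
#   -I  send a HEAD request, so no body is transferred
#   -L  follow redirects (an assumption; drop it to see only the first response)
head_with_curl() {
    curl -sIL "$1"
}

# Usage: head_with_curl your_url_goes_here
```

As with wget, the size (if the server reports one) shows up in the Content-Length line of the printed headers.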

On a Linux machine with wget version 1.15, the wget command above produces output similar to the following:

$ wget -S --method=HEAD -O - http://jflex.de/release/jflex-1.6.1.tar.gz
Resolving jflex.de (jflex.de)... 65.19.178.144
Connecting to jflex.de (jflex.de)|65.19.178.144|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Sat, 07 Nov 2015 16:05:43 GMT
  Server: Apache/2.4.7 (Ubuntu)
  Last-Modified: Sat, 11 Apr 2015 02:44:19 GMT
  ETag: "2e334f-51369db8566c0"
  Accept-Ranges: bytes
  Content-Length: 3027791
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: application/x-gzip
Length: 3027791 (2.9M) [application/x-gzip]
Remote file exists.

As seen above, the download size is given by the Content-Length header, in bytes. Barring retransmissions caused by packet errors, this is the amount of data the download would consume (plus a small margin for headers and lower-level synchronization and acknowledgement traffic).
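To pull just that number out of the headers, something like the following sketch could work. It assumes a POSIX shell with tr and awk available; the function names content_length and to_mib are hypothetical. Note that wget prints the -S headers on standard error, hence the 2>&1 in the usage line.

```shell
# Hypothetical helper: read response headers on stdin and print the
# Content-Length value in bytes (prints nothing if the header is absent).
content_length() {
    # Header lines end in CR LF and header names are case-insensitive,
    # so strip the carriage returns and compare in lowercase.
    # If several responses are present (redirects), the last value wins.
    tr -d '\r' | awk '
        tolower($0) ~ /^[ \t]*content-length:/ {
            sub(/^[^:]*:[ \t]*/, "")   # drop the header name
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            len = $0
        }
        END { if (len != "") print len }'
}

# Hypothetical helper: convert a byte count to mebibytes, one decimal place.
to_mib() {
    awk -v n="$1" 'BEGIN { printf "%.1f MiB\n", n / 1048576 }'
}

# Usage:
#   wget -S --method=HEAD -O - your_url_goes_here 2>&1 | content_length
#   to_mib 3027791
```

For the example above, 3027791 bytes divided by 1048576 is roughly 2.9, which matches the (2.9M) figure wget prints on its own Length line.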

Unfortunately, some servers do not provide the Content-Length header, in which case the length would either appear as unspecified or not appear at all. In such cases an attempt via a browser would produce the same result. As of now I haven't been able to find a workaround for this issue.
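Although a missing Content-Length cannot be worked around this way, a script can at least detect it and refuse to proceed blindly. The following is a rough sketch, assuming a POSIX shell with tr and awk; the function name within_limit, its exit codes, and the 5000000-byte limit in the usage line are all hypothetical choices made for illustration.

```shell
# Hypothetical helper: read response headers on stdin and decide whether a
# download of at most $1 bytes is safe to start.
#   exit 0: Content-Length present and within the limit
#   exit 1: Content-Length present but over the limit
#   exit 2: Content-Length missing, so the size is unknown
within_limit() {
    limit=$1
    # Strip carriage returns, match the header name case-insensitively,
    # and keep the last value seen.
    size=$(tr -d '\r' | awk 'tolower($1) == "content-length:" { len = $2 }
                             END { print len }')
    if [ -z "$size" ]; then
        echo "size unknown: no Content-Length header" >&2
        return 2
    fi
    if [ "$size" -le "$limit" ]; then
        echo "ok: $size bytes"
    else
        echo "too large: $size bytes (limit $limit)"
        return 1
    fi
}

# Usage (download only if the size is known and at most 5000000 bytes):
#   wget -S --method=HEAD -O - your_url_goes_here 2>&1 | within_limit 5000000 \
#       && wget your_url_goes_here
```

Treating an unknown size as a failure is a conservative design choice; on a generous data plan you might prefer to warn and continue instead.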
