Ubuntu: I used wget to download html files, where are the images in the file stored?



Question:

Firefox was loading very slowly, so I decided to use wget to save HTML files. I used the following command:

wget http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter  

The files have been saved in my home folder, but I don't know where the images are stored. I need them to use in Anki.

So where are the images stored?


Solution:1

I prefer to use --page-requisites (-p for short) instead of -r here, as it downloads everything the page needs to display but no other pages, and I don't have to think about what kinds of files I want.

Actually I'm usually using something like

wget -E -H -k -p http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter  

This means:

  • -E: Append .html to the file name if it is an HTML file but doesn't end in .html or similar
  • -H: Download files from other hosts, too
  • -k: After downloading, convert the links in the page so they point to the downloaded files
  • -p: Download anything the page needs for proper offline viewing
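With -p (and without --no-directories), wget mirrors the remote layout, so the page's assets end up under a directory named after each host, e.g. textbook.s-anand.net/. A minimal sketch of how to locate them afterwards; the mock directory and file names below are placeholders standing in for a finished download:

```shell
# Mock layout standing in for what wget -p leaves behind
# (directory named after the host, assets nested below it):
mkdir -p textbook.s-anand.net/ncert/img
touch textbook.s-anand.net/ncert/img/fig1.png
touch textbook.s-anand.net/ncert/img/fig2.gif

# List every image in the download, case-insensitively:
find textbook.s-anand.net -type f \
    \( -iname '*.png' -o -iname '*.gif' -o -iname '*.jpg' \)
```

This prints the paths of the downloaded images, which you can then copy into Anki's media folder.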


Solution:2

Using the -r parameter should enable wget to download the whole folder, including your images.

wget -r http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter  


Solution:3

Wget simply downloads the HTML file of the page, not the images on the page, since the images in the HTML are only referenced as URLs. To do what you want, use the -r option (recursive), the -A option with the image file suffixes, the --no-parent option so it does not ascend to the parent directory, and the --level option set to 1.

Specifically: wget -r -A .jpg,.png,.gif --no-parent --level=1 <url>

Even better, most browsers have methods for saving pages for offline viewing.


Solution:4

Downloading the image files separately as well

I think this command could get you started.

 wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter  

It allows you to specify the location to save the images and which types of files you want. Maybe downloading the images like this is easier.

Source:

-r enables recursive retrieval. See Recursive Download for more information.

-P sets the directory prefix where all files and directories are saved to.

-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list (as seen above). See Types of Files for more information.

Copying the image files from your folder

I have noticed that the website uses PNG image files. You can just copy those from your folder. This should be run in the folder where you stored the webpage.

find . -name "*.png" -exec cp '{}' ./some_dir/somewhere/ \;  
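A slightly more robust variant creates the destination directory first (so cp doesn't fail with "No such file or directory") and matches the .png suffix case-insensitively. The directory names here (saved_page/, anki_media/) are placeholders; the first few lines just mock a downloaded page so the example is self-contained:

```shell
# Mock a downloaded page standing in for your saved files:
mkdir -p saved_page
touch saved_page/diagram.png saved_page/PHOTO.PNG

# anki_media/ is a hypothetical destination -- substitute your real
# Anki collection.media path. mkdir -p ensures it exists before copying.
mkdir -p anki_media
find saved_page -type f -iname '*.png' -exec cp '{}' anki_media/ \;

ls anki_media
```

Using -iname instead of -name also picks up files saved with an uppercase extension like .PNG.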
