Tutorial: Why can't curl download the same URL in a different format?



Question:

curl downloads http://mysite.com/Lunacy%20Disc%202%20of%202%20(U)(Saturn).zip

but not

http://mysite.com/Lunacy Disc 2 of 2 (U)(Saturn).zip  

Why is this the case?

Do I need to convert it to the first format?

Using the URL generated via urlencode($url) fails.


Solution:1

Two problems:

  1. urlencode will also encode the slashes (and the colon) on you. It's meant to encode query string values for use in URLs, not full URLs.
  2. urlencode encodes spaces as +. You need rawurlencode if you want spaces as %20.
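
Both points can be seen in a minimal sketch. The helper name encode_url_path is hypothetical, and it assumes the URL has only a scheme, a host, and a not-yet-encoded path (no query string or fragment); it applies rawurlencode to each path segment so the slashes survive.

```php
<?php
// Hypothetical helper: percent-encode only the path, one segment at a time,
// so the scheme, the host, and the slashes are left untouched.
// Assumes an unencoded URL without a query string or fragment.
function encode_url_path(string $url): string
{
    // Split into "scheme://host" and the optional path.
    if (!preg_match('#^([a-z][a-z0-9+.-]*://[^/]+)(/.*)?$#i', $url, $m)) {
        return $url; // not a shape this sketch handles
    }
    $segments = explode('/', $m[2] ?? '');
    return $m[1] . implode('/', array_map('rawurlencode', $segments));
}

$url = 'http://mysite.com/Lunacy Disc 2 of 2 (U)(Saturn).zip';

// urlencode mangles the whole URL: slashes become %2F and spaces become +.
echo urlencode($url), "\n";

// Encoding per segment keeps the structure and gives %20 for spaces.
echo encode_url_path($url), "\n";
```

Note that rawurlencode also turns the parentheses into %28 and %29, which a server should decode to the same file name as the literal parentheses in the question's first URL.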


Solution:2

To convert a URL to the "first format", you can use a PHP function such as rawurlencode (urlencode is close, but it encodes spaces as + rather than %20).


Now, for the "why", the answer can probably be found in the RFC 1738 - Uniform Resource Locators (URL).

Quoting some paragraphs:

Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

No corresponding graphic US-ASCII:

URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded.

A space is octet 20 hexadecimal (encoded as %20), which is outside the range 00-1F, so it does not have to be encoded for that reason. But, a bit later:

Unsafe:

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs.

And here, you know why the space character has to be escaped/encoded too ;-)
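
The two rules quoted above can be turned into a small check. This is only a sketch with a hypothetical function name, not a full RFC 1738 validator: it covers just the control characters, the non-ASCII octets, and the space, and ignores the other unsafe and reserved characters the RFC lists.

```php
<?php
// Hypothetical helper: list the octets of $s that the quoted rules say must
// be encoded: control characters (00-1F, 7F), non-ASCII octets (80-FF),
// and the "unsafe" space (20). Each hit is returned in %XX form.
function octets_needing_encoding(string $s): array
{
    $found = [];
    for ($i = 0; $i < strlen($s); $i++) {
        $code = ord($s[$i]);
        if ($code <= 0x20 || $code >= 0x7F) {
            $found[] = sprintf('%%%02X', $code);
        }
    }
    return $found;
}

// The two spaces in this file name are reported as %20.
print_r(octets_needing_encoding('Lunacy Disc 2.zip'));
```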


Solution:3

urlencode() does indeed fail with curl. If your problem is just with spaces, you can substitute them manually:

$url = str_replace(' ', '%20', $url);  
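
To show where that substitution fits, here is a rough sketch of a download helper built on PHP's curl extension. The function name and the choice of options are assumptions for illustration, not something taken from the question.

```php
<?php
// Hypothetical helper: replace spaces with %20, then fetch the URL with
// PHP's curl extension. Returns the response body, or false on failure.
function fetch_with_encoded_spaces(string $url)
{
    $url = str_replace(' ', '%20', $url);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    $body = curl_exec($ch);

    if ($body === false) {
        error_log('curl error: ' . curl_error($ch));
    }
    return $body;
}
```

Calling fetch_with_encoded_spaces('http://mysite.com/Lunacy Disc 2 of 2 (U)(Saturn).zip') would then request the %20 form of the URL.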


Solution:4

You need to URL-encode the spaces (in your example; other characters require it too) for transmission across the internet. The encoding ensures that the protocols handling the URL don't cut off or otherwise mangle the string. In HTTP, for instance, the request line uses spaces to separate the method, the target, and the protocol version, so an unencoded space would end the target early.


Solution:5

http://mysite.com/Lunacy Disc 2 of 2 (U)(Saturn).zip

That is not a valid URL. Accessing URLs like this may work in your browser because most modern browsers will automatically encode the URL for you if required. The curl library apparently does not do this automatically.


Solution:6

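Why? Because some characters have special meanings in a URL, such as # (which introduces the fragment, i.e. the HTML anchor).

So encoding functions encode every character except alphanumeric ones (and a few safe symbols such as - _ .), whether or not that particular character strictly needs it.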
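
The effect of # can be seen with PHP's parse_url. The file name below is made up for illustration.

```php
<?php
// Hypothetical file name containing '#', used only for illustration.
$raw = 'http://mysite.com/Track#2.zip';

// Unencoded, everything after '#' is parsed as the fragment, not the path.
$parts = parse_url($raw);
echo $parts['path'], "\n";     // /Track
echo $parts['fragment'], "\n"; // 2.zip

// rawurlencode turns '#' into %23, so it stays part of the file name.
echo rawurlencode('Track#2.zip'), "\n"; // Track%232.zip
```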

