Tutorial: Trouble Percent-Encoding Spaces in Java



Question:

I am using the URLUTF8Encoder.java class from W3C (www.w3.org/International/URLUTF8Encoder.java).

Currently, it encodes every space ' ' as a plus sign '+'.

I am having difficulty modifying the code to percent-encode the space as '%20' instead. Unfortunately, I am not too familiar with hex. Can anyone help me out? I need to modify this snippet:

    else if (ch == ' ') { // space
        sbuf.append('+');

in the following code:

    final static String[] hex = {
        "%00", "%01", "%02", "%03", "%04", "%05", "%06", "%07", "%08", "%09", "%0A", "%0B", "%0C", "%0D", "%0E", "%0F",
        "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17", "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F",
        "%20", "%21", "%22", "%23", "%24", "%25", "%26", "%27", "%28", "%29", "%2A", "%2B", "%2C", "%2D", "%2E", "%2F",
        "%30", "%31", "%32", "%33", "%34", "%35", "%36", "%37", "%38", "%39", "%3A", "%3B", "%3C", "%3D", "%3E", "%3F",
        "%40", "%41", "%42", "%43", "%44", "%45", "%46", "%47", "%48", "%49", "%4A", "%4B", "%4C", "%4D", "%4E", "%4F",
        "%50", "%51", "%52", "%53", "%54", "%55", "%56", "%57", "%58", "%59", "%5A", "%5B", "%5C", "%5D", "%5E", "%5F",
        "%60", "%61", "%62", "%63", "%64", "%65", "%66", "%67", "%68", "%69", "%6A", "%6B", "%6C", "%6D", "%6E", "%6F",
        "%70", "%71", "%72", "%73", "%74", "%75", "%76", "%77", "%78", "%79", "%7A", "%7B", "%7C", "%7D", "%7E", "%7F",
        "%80", "%81", "%82", "%83", "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C", "%8D", "%8E", "%8F",
        "%90", "%91", "%92", "%93", "%94", "%95", "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E", "%9F",
        "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7", "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF",
        "%B0", "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9", "%BA", "%BB", "%BC", "%BD", "%BE", "%BF",
        "%C0", "%C1", "%C2", "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB", "%CC", "%CD", "%CE", "%CF",
        "%D0", "%D1", "%D2", "%D3", "%D4", "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD", "%DE", "%DF",
        "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6", "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF",
        "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8", "%F9", "%FA", "%FB", "%FC", "%FD", "%FE", "%FF"
    };

    public static String encode(String s) {
        StringBuffer sbuf = new StringBuffer();
        int len = s.length();
        for (int i = 0; i < len; i++) {
            int ch = s.charAt(i);
            if ('A' <= ch && ch <= 'Z') {        // 'A'..'Z'
                sbuf.append((char) ch);
            } else if ('a' <= ch && ch <= 'z') { // 'a'..'z'
                sbuf.append((char) ch);
            } else if ('0' <= ch && ch <= '9') { // '0'..'9'
                sbuf.append((char) ch);
            } else if (ch == ' ') {              // space
                sbuf.append('+');
            } else if (ch == '-'
                    || ch == '_'                 // unreserved
                    || ch == '.' || ch == '!' || ch == '~' || ch == '*'
                    || ch == '\'' || ch == '(' || ch == ')') {
                sbuf.append((char) ch);
            } else if (ch <= 0x007f) {           // other ASCII
                sbuf.append(hex[ch]);
            } else if (ch <= 0x07FF) {           // non-ASCII <= 0x7FF
                sbuf.append(hex[0xc0 | (ch >> 6)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            } else {                             // 0x7FF < ch <= 0xFFFF
                sbuf.append(hex[0xe0 | (ch >> 12)]);
                sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }

Thanks!


Solution:1

I won't ask why you're doing this; I'll just answer your question directly. Please read the other answers to decide whether you really want to modify this code. If you simply remove this code:

    else if (ch == ' ') { // space
        sbuf.append('+');
    }

it will do what you want, because the space character (0x20) will then fall through to the branch that handles all other ASCII characters, which appends hex[0x20], i.e. "%20":

    else if (ch <= 0x007f) { // other ASCII
        sbuf.append(hex[ch]);
    }
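A minimal, self-contained sketch of the encoder with the space branch removed, assuming Java 8 or later. To keep it short, the lookup table is built with String.format instead of the 256 literals in the question; that is an assumption of this sketch, not part of the W3C file, and the class name Percent20Encoder is hypothetical.

```java
public class Percent20Encoder {

    // Equivalent to the W3C literal table: hex[i] == "%XX" (uppercase) for i in 0..255.
    // Built programmatically here only to keep the sketch short.
    static final String[] hex = new String[256];
    static {
        for (int i = 0; i < 256; i++) {
            hex[i] = String.format("%%%02X", i);
        }
    }

    // Same logic as the W3C encode(), minus the "ch == ' '" branch,
    // so a space falls through to the "other ASCII" case and becomes hex[0x20] = "%20".
    // Like the original, it works one UTF-16 char at a time, so characters outside
    // the BMP (surrogate pairs) are not turned into proper 4-byte UTF-8.
    public static String encode(String s) {
        StringBuilder sbuf = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            int ch = s.charAt(i);
            if (('A' <= ch && ch <= 'Z') || ('a' <= ch && ch <= 'z')
                    || ('0' <= ch && ch <= '9')) {
                sbuf.append((char) ch);                 // letters and digits copied as-is
            } else if (ch == '-' || ch == '_' || ch == '.' || ch == '!'
                    || ch == '~' || ch == '*' || ch == '\'' || ch == '(' || ch == ')') {
                sbuf.append((char) ch);                 // unreserved marks copied as-is
            } else if (ch <= 0x007F) {
                sbuf.append(hex[ch]);                   // other ASCII, including space
            } else if (ch <= 0x07FF) {
                sbuf.append(hex[0xC0 | (ch >> 6)]);     // 2-byte UTF-8 sequence
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            } else {
                sbuf.append(hex[0xE0 | (ch >> 12)]);    // 3-byte UTF-8 sequence
                sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello World")); // Hello%20World
        System.out.println(encode("a+b"));         // a%2Bb
    }
}
```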


Solution:2

You might want to check out the Apache Commons Codec package (http://commons.apache.org/codec/); it's probably a lot more robust. The class you're using is about 14 years old and produces only one kind of encoding, application/x-www-form-urlencoded, which REQUIRES spaces to be encoded as '+'. If you're trying to do standard URL (percent) encoding, which wants spaces as %20, you'll need to use a different package entirely.
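If the goal is %20 inside a URL path rather than a form-encoded value, one standard-library option (not mentioned in this answer, offered only as a hedged sketch) is the multi-argument java.net.URI constructor, which percent-encodes characters that are illegal in each component. The host example.com, the path, and the helper name buildUrl are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathExample {
    // Hypothetical helper: builds an http URL from components. The multi-argument
    // URI constructor quotes characters that are illegal in the path, so a space becomes %20.
    static String buildUrl(String host, String path) {
        try {
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("example.com", "/my file.txt"));
        // http://example.com/my%20file.txt
    }
}
```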


Solution:3

Why are you using this class instead of the API method?

java.net.URLEncoder.encode("your string", "utf-8");
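A minimal runnable version of that call, showing that the standard API also turns spaces into '+'. The String-charset overload declares the checked UnsupportedEncodingException, so this sketch wraps it in a small helper; the names UrlEncoderExample and formEncode are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncoderExample {
    // Wraps URLEncoder.encode; the checked exception cannot occur in practice
    // because every Java platform is required to support UTF-8.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // application/x-www-form-urlencoded: a space becomes '+', a literal '+' becomes %2B
        System.out.println(formEncode("your string")); // your+string
        System.out.println(formEncode("a+b"));         // a%2Bb
    }
}
```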

And why is it a problem that spaces are encoded as + characters? That is exactly how form encoding (application/x-www-form-urlencoded), which URLEncoder implements, is supposed to work.


Solution:4

Just do this:

    import java.net.URLEncoder;

    String str = "Hello World+You";
    String encodedStr = URLEncoder.encode(str, "UTF-8");
    encodedStr = encodedStr.replace("+", "%20");
    System.out.println("Encoded String: " + encodedStr); // Encoded String: Hello%20World%2BYou
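The replace is safe because URLEncoder already turns a literal '+' in the input into %2B, so every '+' left in its output stands for a space. A small sketch wrapping the trick in a hypothetical helper, percentEncode, and checking with URLDecoder that the result round-trips:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class Percent20Helper {
    // Hypothetical helper: form-encode, then swap '+' for "%20".
    // Safe because a literal '+' in the input has already become "%2B".
    static String percentEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    // Decodes %XX escapes (and '+') back to characters.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String original = "Hello World+You";
        String encoded = percentEncode(original);
        System.out.println(encoded);                          // Hello%20World%2BYou
        System.out.println(decode(encoded).equals(original)); // true
    }
}
```

Note that URLEncoder still leaves '*' unencoded and encodes '~' as %7E, so the output is not byte-for-byte identical to strict RFC 3986 percent-encoding.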


Solution:5

It's working correctly: when the value is decoded as form data, + yields a space just as %20 does.

Maybe try java.net.URLEncoder.encode("url", "UTF-8")?
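A quick sketch checking that claim with java.net.URLDecoder, which decodes both '+' and %20 to a space. This equivalence holds for query strings and form bodies; in a URL path, servers generally treat '+' as a literal plus sign, so it is not interchangeable with %20 there. The class and helper names are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PlusVsPercent20 {
    // Form-decodes a string as UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Both encodings of the space decode to the same string.
        System.out.println(decode("Hello+World"));   // Hello World
        System.out.println(decode("Hello%20World")); // Hello World
    }
}
```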

