Tutorial: Where can I find a list of escape characters required for my JSON ajax return type?



Question:

I have an ASP.NET MVC action that is returning a JSON object.

The JSON:

{status: "1", message:"", output:"<div class="c1"><div class="c2">User generated text, so can be anything</div></div>"}  

Currently my HTML is breaking it. There will be user generated text in the output field, so I have to make sure I escape ALL things that need to be escaped.

Does anyone have a list of everything I need to escape?

I'm not using any JSON libraries, just building the string myself.


Solution:1

Take a look at http://json.org/. It gives a slightly different list of escape sequences than the one Chris proposed:

\"  \\  \/  \b  \f  \n  \r  \t  \u four-hex-digits  


Solution:2

Here is a list of special characters that you can escape when creating a JavaScript string literal:

  • \b  Backspace (ASCII code 08)
  • \f  Form feed (ASCII code 0C)
  • \n  New line
  • \r  Carriage return
  • \t  Tab
  • \v  Vertical tab
  • \'  Apostrophe or single quote
  • \"  Double quote
  • \\  Backslash character

Note that \v and \' are JavaScript escapes only. JSON does not define them, and a strict JSON parser rejects them.

Reference: String literals

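Some of these are more optional than others. For instance, a JavaScript string literal is perfectly valid whether you escape the tab character or leave in a tab literal. Strict JSON is less forgiving: it requires every control character (U+0000 to U+001F), tab included, to be escaped. Either way, you should certainly be handling the backslash and quote characters.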
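To see how a strict parser treats these sequences, here is a minimal sketch using Python's standard json module. Python is an assumption here (the question is about ASP.NET), and the helper name is_valid_json is made up for illustration.

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if Python's strict JSON parser accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Each value below is a complete JSON document: one double-quoted string.
escaped_tab = '"a\\tb"'          # backslash + t, the JSON escape for tab
literal_tab = '"a\tb"'           # a raw U+0009 character inside the string
vertical_tab_escape = '"a\\vb"'  # \v is a JavaScript escape, not a JSON one
apostrophe_escape = '"it\\\'s"'  # \' is likewise not defined by JSON

for name, doc in [
    ("escaped tab", escaped_tab),
    ("literal tab", literal_tab),
    ("\\v escape", vertical_tab_escape),
    ("\\' escape", apostrophe_escape),
]:
    print(name, "->", "accepted" if is_valid_json(doc) else "rejected")
```

In its default strict mode, json.loads accepts only the first of the four. Other parsers may be more lenient, so it seems safest to escape every control character and to avoid \v and \' entirely.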


Solution:3

As explained in section 9 of the official ECMA specification (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf), the following characters have to be escaped in JSON:

  • U+0022 (", the quotation mark)
  • U+005C (\, the backslash or reverse solidus)
  • U+0000 to U+001F (the ASCII control characters)

In addition, to safely embed JSON in HTML, the following characters also have to be escaped:

  • U+002F (/)
  • U+0027 (')
  • U+003C (<)
  • U+003E (>)
  • U+0026 (&)
  • U+0085 (Next Line)
  • U+2028 (Line Separator)
  • U+2029 (Paragraph Separator)

Some of the above characters can be escaped with the following short escape sequences defined in the standard:

  • \" represents the quotation mark character (U+0022).
  • \\ represents the reverse solidus character (U+005C).
  • \/ represents the solidus character (U+002F).
  • \b represents the backspace character (U+0008).
  • \f represents the form feed character (U+000C).
  • \n represents the line feed character (U+000A).
  • \r represents the carriage return character (U+000D).
  • \t represents the character tabulation character (U+0009).

The other characters that need to be escaped use the \uXXXX notation, that is, \u followed by the four hexadecimal digits that encode the code point.

The \uXXXX notation can also be used instead of a short escape sequence, or to optionally escape any other character from the Basic Multilingual Plane (BMP).
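The rules above could be sketched as a small hand-written escaper. This is an illustrative Python sketch under the assumptions listed in this answer, not a replacement for a tested JSON library; escape_json_string, SHORT_ESCAPES, HTML_UNSAFE and the html_safe flag are hypothetical names.

```python
import json

# Short escape sequences defined by the JSON standard
# (\/ is omitted here because escaping "/" is optional in plain JSON).
SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}

# Extra characters to escape when the JSON is embedded in HTML,
# taken from the second list in this answer.
HTML_UNSAFE = set("/'<>&\u0085\u2028\u2029")

def escape_json_string(text: str, html_safe: bool = False) -> str:
    """Return `text` as a double-quoted JSON string literal."""
    out = []
    for ch in text:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20 or (html_safe and ch in HTML_UNSAFE):
            # No short form available: fall back to \uXXXX.
            out.append('\\u%04x' % ord(ch))
        else:
            out.append(ch)
    return '"' + ''.join(out) + '"'

html_fragment = '<div class="c1">User text & more</div>'
literal = escape_json_string(html_fragment, html_safe=True)
# A standard parser should recover the original text from the literal.
assert json.loads(literal) == html_fragment
```

Characters outside the BMP are passed through unchanged in this sketch, which is fine as long as the response is sent as UTF-8.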


Solution:4

Right away, I can tell that at least the double quotes in the HTML tags are gonna be a problem. Those are probably all you'll need to escape for it to be valid JSON; just replace

"  

with

\"  

As for outputting user-input text, you do need to make sure you run it through HttpUtility.HtmlEncode() to avoid XSS attacks and to make sure that it doesn't screw up the formatting of your page.
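As a rough sketch of that two-step idea (HTML-encode the user text first, then JSON-encode the whole value), here is a Python version. It uses html.escape as a stand-in for HttpUtility.HtmlEncode and json.dumps instead of hand-built strings; both substitutions are assumptions made for illustration, and build_response is a hypothetical name.

```python
import html
import json

def build_response(user_text: str) -> str:
    """Wrap HTML-encoded user text in markup and return it as JSON."""
    # Step 1: neutralise HTML in the user-supplied part only.
    safe_text = html.escape(user_text)  # & < > " ' become entities
    output = '<div class="c1"><div class="c2">' + safe_text + '</div></div>'
    # Step 2: let the JSON encoder escape quotes, backslashes and
    # control characters in the finished markup.
    return json.dumps({"status": "1", "message": "", "output": output})

payload = build_response('<script>alert("x")</script>')
print(payload)
```

The order matters: the HTML encoding applies only to the user text, while the JSON escaping applies to the entire output value, including the quotes in the surrounding tags.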


Solution:5

From the spec:

All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus [backslash] (U+005C), and the control characters U+0000 to U+001F

Just because e.g. Bell (U+0007) doesn't have a single-character escape code does not mean that you don't need to escape it. Use the Unicode escape sequence \u0007.
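For comparison, this is what a standard encoder does with such a character. The sketch uses Python's json module, which is an assumption on my part since the question does not use a library.

```python
import json

bell = "\x07"  # U+0007 has no short escape in JSON

# json.dumps falls back to the \uXXXX form for control characters.
encoded = json.dumps("ding" + bell)
decoded = json.loads(encoded)

print(encoded)
assert decoded == "ding" + bell
```

A hand-built encoder would need the same fallback for every code point below U+0020 that has no short escape.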


Solution:6

The JSON reference states:

   any-Unicode-character-except-"-or-\-or-control-character

Then lists the standard escape codes:

  • \"  Standard JSON quote
  • \\  Backslash (escape char)
  • \/  Forward slash
  • \b  Backspace (ASCII code 08)
  • \f  Form feed (ASCII code 0C)
  • \n  Newline
  • \r  Carriage return
  • \t  Horizontal tab
  • \u  followed by four hex digits

From this I concluded that the quote, the backslash, and the control characters must be escaped, and that escaping anything else is optional. You can encode every character as \uXXXX if you wish, or only the characters outside the printable 7-bit ASCII range, that is, those whose Unicode value is not in \u0020 <= x <= \u007E (32 - 126). Prefer the standard short escape codes where they exist, since they are shorter and give better readability and performance.

Additionally you can read point 2.5 (Strings) from RFC 4627.

You may (or may not) want to (further) escape other characters depending on where you embed that JSON string, but that is outside the scope of this question.
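The "escape everything outside 32 - 126" option could look roughly like the following Python sketch. The names SHORT and escape_ascii_only are hypothetical, and the surrogate-pair handling for characters above U+FFFF is my addition, since \uXXXX holds only four hex digits.

```python
import json

# Short escapes are tried first because they are shorter and easier to read.
SHORT = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
         '\n': '\\n', '\r': '\\r', '\t': '\\t'}

def escape_ascii_only(text: str) -> str:
    """Return a JSON string literal containing only printable ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in SHORT:
            out.append(SHORT[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)  # printable ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append('\\u%04x' % cp)
        else:
            # Above the BMP: emit a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append('\\u%04x\\u%04x' % (high, low))
    return '"' + ''.join(out) + '"'

sample = 'caf\u00e9 "quoted"\n'
literal = escape_ascii_only(sample)
print(literal)
# A standard parser should turn the literal back into the original text.
assert json.loads(literal) == sample
```

The output is pure ASCII, so it survives transports and pages with an unknown or mismatched character encoding, at the cost of a longer payload.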

