Tutorial: ColdFusion XMLFormat() not converting all characters



Question:

I am using XMLFormat() to encode some text for an XML document. However, when I go to read the XML file I created I get an invalid character error. Why does XMLFormat() not properly encode all characters?

I'm running CF8.


Solution:1

Are you sure you are writing the file in the right encoding? You can't just do

<cffile action="write" file="foo.xml" output="#xml#" />  

as the result very likely diverges from the character set your XML is in. Unless otherwise noted (by an encoding declaration), XML files are treated as UTF-8, and you should do:

<cffile action="write" file="foo.xml" output="#xml#" charset="utf-8" />
<!--- and --->
<cffile action="read" file="foo.xml" variable="xml" charset="utf-8" />
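To make the round trip concrete, here is a minimal sketch that builds a small document, writes it as UTF-8, reads it back, and parses it. The variable names (userComment, xmlPath, xmlIn, doc), the note/body element names, and the sample text are made up for illustration. The key point is that the encoding in the XML declaration and the charset given to cffile agree.

```cfml
<!--- Hypothetical sample data; Chr(233) is "é", so the template itself stays pure ASCII. --->
<cfset userComment = "Caf" & Chr(233) & " au lait & 1 < 2">

<!--- Build the document. The encoding named in the declaration must match
      the charset passed to cffile below. --->
<cfsavecontent variable="xml"><?xml version="1.0" encoding="UTF-8"?>
<cfoutput><note><body>#XMLFormat(userComment)#</body></note></cfoutput></cfsavecontent>

<!--- ExpandPath gives an absolute path relative to the current page. --->
<cfset xmlPath = ExpandPath("foo.xml")>
<cffile action="write" file="#xmlPath#" output="#xml#" charset="utf-8">

<!--- Read it back with the same charset and parse it to confirm it is well-formed. --->
<cffile action="read" file="#xmlPath#" variable="xmlIn" charset="utf-8">
<cfset doc = XmlParse(xmlIn)>
<cfoutput>#HTMLEditFormat(doc.note.body.XmlText)#</cfoutput>
```

If XmlParse still throws an invalid character error with matching charsets, the problem is probably in the data itself rather than in the file encoding.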


Solution:2

I feel that this is a bug in XMLFormat. I am not sure who the original author of the snippet below is but here is an approach to catch the extra characters via regex...

  <cfset myText = xmlFormat(myText)>
  <cfscript>
    i = 1; // ColdFusion string positions start at 1
    tmp = '';
    i = ReFind('[^\x00-\x7F]', myText, i, false); // find the first high character and save its position
    while (i GT 0)
    {
      tmp = '&##x#FormatBaseN(Asc(Mid(myText,i,1)),16)#;'; // convert the high character to a hex numeric character reference
      myText = Insert(tmp, myText, i); // insert the reference right after the high character
      myText = RemoveChars(myText, i, 1); // delete the now-redundant high character
      i = ReFind('[^\x00-\x7F]', myText, i + Len(tmp), false); // resume scanning after the inserted reference
    }
    // myText now holds the encoded string; inside a function, finish with: return myText;
  </cfscript>


Solution:3

Also, do not forget to put <cfprocessingdirective pageencoding="utf-8"> at the top of your template.


Solution:4

If you're trying to return your XML directly to the browser for the user to download, you might want to try something like this:

<cfheader name="Content-Disposition" charset="utf-8" value="attachment; filename=export.xml">
<cfcontent variable="#someXMLPacket#" type="text/xml" reset="true">

Or, if you want it returned as a web page (à la REST), then this should do the trick:

<cfcontent variable="#someXMLPacket#" type="text/xml; charset=utf-8" reset="true">
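One caveat with the snippets above: as far as I know, the variable attribute of cfcontent expects binary data, so passing a plain string may fail. A minimal sketch, assuming someXMLPacket is a string holding the finished XML, is to convert it to UTF-8 bytes first with CharsetDecode. The xmlBytes name is made up for this example.

```cfml
<!--- Assumption: someXMLPacket is a string containing the complete XML document. --->
<!--- CharsetDecode turns the string into a binary object using the given encoding. --->
<cfset xmlBytes = CharsetDecode(someXMLPacket, "utf-8")>

<!--- Offer the document as a download and state the charset in the content type. --->
<cfheader name="Content-Disposition" value="attachment; filename=export.xml">
<cfcontent type="text/xml; charset=utf-8" variable="#xmlBytes#" reset="true">
```

Drop the cfheader line to have the browser display the XML instead of downloading it.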

Hope that helps.


Solution:5

Unfortunately, XMLFormat is just not an all-inclusive solution. It has a very limited list of characters that it will replace [documentation].

You'll need to do custom encoding of characters that are invalid for XML but not covered by XMLFormat.

It's definitely not very efficient, but a potential solution would be to loop over the content of typically-suspect fields (anything user-generated, for starters) character-by-character, checking the ascii code, and if it's above 255, either omit the character or properly encode it.
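A related case worth checking: XML 1.0 forbids most control characters below x20 (everything except tab, line feed, and carriage return), and these often sneak in through pasted user input. The sketch below strips them before calling XMLFormat. The function name stripInvalidXmlChars and the variable names are hypothetical. It also assumes that the string's underlying Java replaceAll method can be called from CFML, which is a common trick but not part of XMLFormat itself.

```cfml
<cffunction name="stripInvalidXmlChars" returntype="string" output="false"
    hint="Hypothetical helper: removes control characters that XML 1.0 does not allow.">
  <cfargument name="text" type="string" required="true">
  <!--- XML 1.0 allows tab (x09), line feed (x0A) and carriage return (x0D);
        every other character below x20 is illegal, even as a numeric reference.
        replaceAll here is java.lang.String's method, so the pattern uses Java regex syntax. --->
  <cfreturn JavaCast("string", arguments.text).replaceAll("[\x00-\x08\x0B\x0C\x0E-\x1F]", "")>
</cffunction>

<!--- Usage: clean first, then escape. myText is assumed to hold the raw field value. --->
<cfset safeText = XMLFormat(stripInvalidXmlChars(myText))>
```

A single regex pass like this avoids the character-by-character loop, at the cost of silently dropping the offending characters rather than reporting them.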


Solution:6

This was a huge issue for me as well, and it turns out that charset is the main factor: you need to explicitly specify the correct charset.

In my case the XML contained foreign-language text, and it would not parse correctly until I specified the correct charset.

