Tutorial: How to handle XML that contains nested XML using C# XmlReader?



Question:

I'm using C# to interact with a database that has an exposed REST API. The table that I'm interested in contains forum posts, some of which themselves contain XML.

Whenever my result set contains a post that has XML, my application throws the following error:

Exception Details: System.Xml.XmlException: '>' is an unexpected token. The expected token is '"' or '''. Line 1, position 62.

And this is the line that fails:

Line 44: ds.ReadXml(xmlData);

And this is the code I'm using:

        var webClient = new WebClient();

        string searchString = searchValue.Text;

        string requestUrl = "http://myserver/restapi.ashx/search.xml?pagesize=4&pageindex=0&query=";
        requestUrl += searchString;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ProhibitDtd = false;

        XmlReader xmlData = XmlReader.Create(webClient.OpenRead(requestUrl), settings);

        DataSet ds = new DataSet();
        ds.ReadXml(xmlData);
        Repeater1.DataSource = ds.Tables[1];
        Repeater1.DataBind();

And this is the type of XML record that it's choking on (the content of the <Body> node is causing the problem):

  <SearchResults PageSize="1" PageIndex="0" TotalCount="342">
    <SearchResult>
      <ContentId>994</ContentId>
      <Title>Help Files: What are they written in?</Title>
      <Url>http://myserver/linktest.aspx</Url>
      <Date>2008-10-16T16:18:00+01:00</Date>
      <ContentType>post</ContentType>
      <Body><div class="ForumPostBodyArea">
        <div class="ForumPostContentText">
          <p>Can anyone see anything obviously wrong with this xml, when its fired to CRM Its creating 13 null records.</p>
          <p>&lt;?xml version="1.0" encoding="UTF-8"?&gt;&lt;soap:Envelope xmlns:typens="<a href="http://tempuri.org/type">http://tempuri.org/type</a>" soap:encodingStyle="<a href="http://schemas.xmlsoap.org/soap/encoding/">http://schemas.xmlsoap.org/soap/encoding/</a>" xmlns:soap="<a href="http://schemas.xmlsoap.org/soap/envelope/">http://schemas.xmlsoap.org/soap/envelope/</a>" xmlns:xsi="<a href="http://www.w3.org/2001/XMLSchema-instance">http://www.w3.org/2001/XMLSchema-instance</a>" xmlns:soapenc="<a href="http://schemas.xmlsoap.org/soap/encoding/">http://schemas.xmlsoap.org/soap/encoding/</a>" xmlns:wsdlns="<a href="http://tempuri.org/wsdl/">http://tempuri.org/wsdl/</a>" xmlns:xsd="<a href="http://www.w3.org/2001/XMLSchema%22%3E%3Csoap:Header%3E%3CSessionHeader%3E%3CsessionId">http://www.w3.org/2001/XMLSchema"&gt;&lt;soap:Header&gt;&lt;SessionHeader&gt;&lt;sessionId</a>  xsi:type="xsd:long"&gt;18208442035524&lt;/sessionId&gt;&lt;/SessionHeader&gt;&lt;/soap:Header&gt;&lt;soap:Body&gt;&lt;typens:add&gt;&lt;entityname  xsi:type="xsd:string"&gt;lead&lt;/entityname&gt;&lt;records  xsi:nil="true" xsi:type="typens:ewarebase" /&gt;&lt;status  xsi:type="xsd:string"&gt;PreRegistration&lt;/status&gt;&lt;requester  xsi:type="xsd:string"&gt;Mimnagh&lt;/requester&gt;&lt;personfirstname  xsi:type="xsd:string"&gt;Sean&lt;/personfirstname&gt;&lt;personlastname  xsi:type="xsd:string"&gt;Test2&lt;/personlastname&gt;&lt;personsalutation  xsi:type="xsd:string"&gt;Mr&lt;/personsalutation&gt;&lt;details xsi:type="xsd:string"&gt;test project  details&lt;/details&gt;&lt;description xsi:type="xsd:string"&gt;test  description details&lt;/description&gt;&lt;comments  xsi:type="xsd:string"&gt;test project  comments&lt;/comments&gt;&lt;personemail  xsi:type="xsd:string"&gt;smimnagh@mac.com&lt;/personemail&gt;&lt;personphonenumber  xsi:type="xsd:string"&gt;12334566777&lt;/personphonenumber&gt;&lt;type  xsi:type="xsd:string"&gt;PreReg&lt;/type&gt;&lt;companyname  xsi:type="xsd:string"&gt;Site  Client&lt;/companyname&gt;&lt;/typens:add&gt;&lt;/soap:Body&gt;&lt;/soap:Envelope&gt;</p>
          <p>Many thanks</p>
        </div>
      </div>
      </Body>
      <Tags>
        <Tag>xml</Tag>
      </Tags>
      <IndexedAt>2010-07-08T11:53:46.848+01:00</IndexedAt>
    </SearchResult>
  </SearchResults>

Is there something that I can do with the XmlReader to make it ignore whatever's causing the problem?

Please note that I can't change the XML prior to consuming it - so if it's malformed then I wonder if there's a way to ignore or modify that particular record without generating an error?

Thanks!


Solution:1

It looks like some of your quotes need escaping in the contents of some of your elements. Try using

&quot;  

for quote marks that aren't wrapping attribute values.

UPDATE:

Because the data you want to read isn't strictly XML (it's nearly XML), your best bet is one of the following:

  1. You, or your boss if you have one, scream at the third party because they're not sending you well-formed XML.
  2. Perform some horrible hack to try to convert whatever you might get into XML.

If you have to go with point 2, the simplest thing that pops into my head is to read the characters of the 'XML' one at a time, keeping track of whether you are inside or outside angle brackets. If you find any " characters and you're not within any angle brackets, replace the " with

&quot;  

But note that doing that is a complete last resort.
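The bracket-tracking hack described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (`QuoteEscaper`, `EscapeQuotesOutsideTags`) are made up for the example, and the whole response is assumed to fit in memory as a string. It will get confused by a stray `<` in text or a `>` inside an attribute value, which is part of why this is a last resort.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class QuoteEscaper
{
    // Replaces every double quote found outside angle brackets with &quot;.
    // Quotes inside <...> (attribute delimiters) are left untouched.
    public static string EscapeQuotesOutsideTags(string input)
    {
        var result = new StringBuilder(input.Length);
        int depth = 0; // number of '<' seen that have not yet been closed by '>'

        foreach (char c in input)
        {
            if (c == '<')
            {
                depth++;
            }
            else if (c == '>' && depth > 0)
            {
                depth--;
            }

            if (c == '"' && depth == 0)
            {
                result.Append("&quot;");
            }
            else
            {
                result.Append(c);
            }
        }

        return result.ToString();
    }
}
```

The cleaned string could then be handed to the DataSet through a `StringReader`, for example `ds.ReadXml(new StringReader(cleaned))`, instead of reading straight from the response stream.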


Solution:2

The content of your <Body> element is not well formed. XML is very strict about syntax. Either wrap the content in a CDATA section or escape the string properly.
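For whoever produces the feed, the two options could look something like the sketch below, using LINQ to XML. The class and method names (`BodyEmbedding`, `WithCData`, `WithEscaping`) are invented for illustration, and it assumes the post markup is available as a plain string before it is written into the response.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper showing two ways to embed arbitrary markup in <Body>.
public static class BodyEmbedding
{
    // Option A: wrap the markup in a CDATA section so the parser
    // treats it as plain character data.
    public static string WithCData(string html)
    {
        return new XElement("Body", new XCData(html)).ToString();
    }

    // Option B: add the markup as text and let LINQ to XML escape
    // the reserved characters when it serializes the element.
    public static string WithEscaping(string html)
    {
        return new XElement("Body", html).ToString();
    }
}
```

Either way, a consumer that parses the result gets the original markup back as the text value of `<Body>`, so `ds.ReadXml` would no longer see it as nested elements.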

