Tutorial: Parsing Media RSS using XMLReader



Question:

<rss version="2.0"
     xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title>Title of RSS feed</title>
        <link>http://www.google.com</link>
        <description>Details about the feed</description>
        <pubDate>Mon, 24 Nov 08 21:44:21 -0500</pubDate>
        <language>en</language>
        <item>
            <title>Article 1</title>
            <description><![CDATA[How to use StackOverflow.com]]></description>
            <link>http://youtube.com/?v=y6_-cLWwEU0</link>
            <media:player url="http://youtube.com/?v=y6_-cLWwEU0" />
            <media:thumbnail url="http://img.youtube.com/vi/y6_-cLWwEU0/default.jpg"
                width="120" height="90" />
            <media:title>Jared on StackOverflow</media:title>
            <media:category label="Tags">tag1,tag2</media:category>
            <media:credit>Jared</media:credit>
            <enclosure url="http://youtube.com/v/y6_-cLWwEU0.swf"
                length="233"
                type="application/x-shockwave-flash"/>
        </item>
    </channel>
</rss>

I decided to use XMLReader to parse my large XML files. I am having trouble getting the data inside each item, especially the thumbnail.

Here's my code:

//////////////////////////////

$itemList = array();
$i=0;
$xmlReader = new XMLReader();
$xmlReader->open('XMLFILE');
while($xmlReader->read()) {
    if($xmlReader->nodeType == XMLReader::ELEMENT) {
        if($xmlReader->localName == 'title') {
            $xmlReader->read();
            $itemList[$i]['title'] = $xmlReader->value;
        }
        if($xmlReader->localName == 'description') {
            // move to its textnode / child
            $xmlReader->read();
            $itemList[$i]['description'] = $xmlReader->value;
        }
        if($xmlReader->localName == 'media:thumbnail') {
            // move to its textnode / child
            $xmlReader->read();
            $itemList[$i]['media:thumbnail'] = $xmlReader->value;
            $i++;
        }
    }
}
////////////////

Is it advisable to use DOMXPath, given that I am parsing a huge XML file? I would really appreciate your advice.


Solution 1:

xtian,

If memory usage is a concern, I would recommend staying away from DOM/XPath, as it requires the whole file to be read into memory first. XMLReader only reads in a chunk at a time (probably 8K, as that seems to be the standard PHP chunk size).

I have rewritten what you originally posted so that it captures the following elements contained within an <item> element:

  1. title
  2. description
  3. media:thumbnail
  4. media:title

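The thing to remember is that XMLReader::localName returns the element name without its namespace prefix (e.g. media:thumbnail's localName is thumbnail), so a comparison such as localName == 'media:thumbnail' never matches. Be careful with this: if you match on localName alone, the media:title value could overwrite the title value. Also note that media:thumbnail is an empty element whose data lives in its attributes, so there is no text node to read; use getAttribute() instead.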
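To make the difference concrete, here is a minimal sketch that prints what XMLReader reports for the two title elements. The inline feed fragment and the $seen variable are illustrative assumptions, not part of the original post; the properties name, localName, prefix and namespaceURI are standard XMLReader properties.

```php
<?php
// Hypothetical inline fragment, trimmed from the feed in the question.
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Article 1</title>
      <media:title>Jared on StackOverflow</media:title>
    </item>
  </channel>
</rss>
XML;

$reader = new XMLReader();
$reader->XML($xml); // load from a string; use open() for a file

$seen = array();
while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    // Both <title> and <media:title> share the same localName.
    if ($reader->localName === 'title') {
        $seen[] = array(
            'name'         => $reader->name,         // qualified name, prefix included
            'localName'    => $reader->localName,    // name without the prefix
            'prefix'       => $reader->prefix,       // "" when there is no prefix
            'namespaceURI' => $reader->namespaceURI, // "" when there is no namespace
        );
    }
}
$reader->close();

print_r($seen);
```

Matching on name (or on localName together with namespaceURI) keeps the two elements apart; matching on localName alone does not.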

Here is what I re-wrote:

<?php
define ('XMLFILE', dirname(__FILE__) . '/Rss.xml');
echo "<pre>";

$items = array ();
$i = 0;

$xmlReader = new XMLReader();
$xmlReader->open (XMLFILE, null, LIBXML_NOBLANKS);

$isParserActive = false;
$simpleNodeTypes = array ("title", "description", "media:title");

while ($xmlReader->read ())
{
    $nodeType = $xmlReader->nodeType;

    // Only deal with Beginning/Ending Tags
    if ($nodeType != XMLReader::ELEMENT && $nodeType != XMLReader::END_ELEMENT)
    {
        continue;
    }
    else if ($xmlReader->name == "item")
    {
        if (($nodeType == XMLReader::END_ELEMENT) && $isParserActive)
        {
            $i++;
        }
        $isParserActive = ($nodeType != XMLReader::END_ELEMENT);
    }

    if (!$isParserActive || $nodeType == XMLReader::END_ELEMENT)
    {
        continue;
    }

    $name = $xmlReader->name;

    if (in_array ($name, $simpleNodeTypes))
    {
        // Skip to the text node
        $xmlReader->read ();
        $items[$i][$name] = $xmlReader->value;
    }
    else if ($name == "media:thumbnail")
    {
        $items[$i]['media:thumbnail'] = array (
            "url" => $xmlReader->getAttribute("url"),
            "width" => $xmlReader->getAttribute("width"),
            "height" => $xmlReader->getAttribute("height")
        );
    }
}

var_dump ($items);

echo "</pre>";

?>

If you have any questions on how this works, I would be more than happy to answer them for you.
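On the DOM question, one possible middle ground (a different technique from the rewrite above, offered only as a sketch) is to keep streaming with XMLReader but expand each <item> into a small DOM/SimpleXML fragment. Only one item is held as a tree at a time, so memory should stay roughly bounded by the largest item rather than the whole file. The inline sample, the MEDIA_NS constant and the variable names are assumptions made for illustration; for a real feed you would call open() with your file path instead of XML().

```php
<?php
// Hypothetical inline sample, trimmed from the feed in the question.
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Title of RSS feed</title>
    <item>
      <title>Article 1</title>
      <description><![CDATA[How to use StackOverflow.com]]></description>
      <media:thumbnail url="http://img.youtube.com/vi/y6_-cLWwEU0/default.jpg"
                       width="120" height="90" />
      <media:title>Jared on StackOverflow</media:title>
    </item>
  </channel>
</rss>
XML;

// Namespace URI declared on the <rss> element for the media: prefix.
const MEDIA_NS = 'http://search.yahoo.com/mrss/';

$reader = new XMLReader();
$reader->XML($xml);

$doc   = new DOMDocument();
$items = array();

// Advance to the first <item> element, if there is one.
while ($reader->read() && $reader->name !== 'item') {
    // keep reading
}

while ($reader->name === 'item') {
    // expand() builds a DOM subtree for the current <item> only.
    $item  = simplexml_import_dom($doc->importNode($reader->expand(), true));
    $media = $item->children(MEDIA_NS);        // media:* child elements
    $thumb = $media->thumbnail->attributes();  // un-namespaced attributes

    $items[] = array(
        'title'           => (string) $item->title,
        'description'     => (string) $item->description,
        'media:title'     => (string) $media->title,
        'media:thumbnail' => array(
            'url'    => (string) $thumb->url,
            'width'  => (int) $thumb->width,
            'height' => (int) $thumb->height,
        ),
    );

    // Jump to the next <item>, skipping the subtree just handled.
    $reader->next('item');
}
$reader->close();

print_r($items);
```

Because SimpleXML separates children by namespace, title and media:title cannot collide here, at the cost of building a small tree per item.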

