Tutorial: Regex for Parsing Simple Text-Based Datafile


Can anyone give me a hand with a touch of regex?

I'm reading in a list of "locations" for a simple text adventure (the kind that were so popular back in the day). However, I'm unsure how to parse the input.

The locations all follow the format:

<location_name>, [<item>]
    [direction, location_name]

Such as:

Albus Square, Flowers, Traffic Cone
    NORTH, Franklandclaw Lecture Theatre
    WEST, Library of Enchanted Books
    SOUTH, Furnesspuff College

Library of Enchanted Books
    EAST, Albus Square
    UP, Reading Room

(Subsequent locations are separated by a blank line.)

I'm storing these as Location objects with the structure:

public class Location {
    private String name;
    private Map<Direction, Location> links;
    private List<Item> items;
}

I use a method to retrieve the data from a URL and create the Location objects from the text it reads, but I'm completely stuck on how to do this. I think regex would help. Can anyone lend me a much-needed hand?


Agree w/ willcodejavaforfood, regex could be used but isn't a big boost here.

Sounds like you just need a little algorithm help (sloppy p-code follows)...

currloc = null
while( line from file )
    if line is blank
        continue    // blank lines only separate locations
    if line begins w/ whitespace
        (dir, loc) = split( trim( line ), ", " )
        add dir, loc to currloc
    else
        newlocdata = split( line, ", " )
        currloc = new location named newlocdata[0]
        for i = 1 to size( newlocdata ) - 1
            item = newlocdata[i]
            add item to currloc
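For anyone who wants that loop in Java, here is a minimal sketch. It assumes direction lines are indented and location lines are not, that items are kept as plain strings (the question's Item class isn't shown), and that the Direction enum has the six values listed below. LocationFileParser is a made-up name. Locations are created lazily by name, so a link can point at a location that appears later in the file.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Assumed set of directions; the question does not list them.
enum Direction { NORTH, SOUTH, EAST, WEST, UP, DOWN }

class Location {
    final String name;
    final Map<Direction, Location> links = new EnumMap<>(Direction.class);
    final List<String> items = new ArrayList<>(); // item names only (assumption)

    Location(String name) { this.name = name; }
}

class LocationFileParser {
    /** Parses the whole data file and returns the locations keyed by name. */
    static Map<String, Location> parse(String text) {
        Map<String, Location> byName = new LinkedHashMap<>();
        Location current = null;
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // blank lines only separate locations
            }
            String[] parts = line.trim().split("\\s*,\\s*");
            if (Character.isWhitespace(line.charAt(0))) {
                // Indented line: "DIRECTION, target location name"
                if (current == null || parts.length != 2) {
                    throw new IllegalArgumentException("Bad direction line: " + line);
                }
                Direction dir = Direction.valueOf(parts[0].toUpperCase(Locale.ROOT));
                current.links.put(dir, byName.computeIfAbsent(parts[1], Location::new));
            } else {
                // Unindented line: location name followed by zero or more items
                current = byName.computeIfAbsent(parts[0], Location::new);
                for (int i = 1; i < parts.length; i++) {
                    current.items.add(parts[i]);
                }
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        String data = String.join("\n",
                "Albus Square, Flowers, Traffic Cone",
                "    NORTH, Franklandclaw Lecture Theatre",
                "",
                "Library of Enchanted Books",
                "    EAST, Albus Square");
        for (Location loc : parse(data).values()) {
            System.out.println(loc.name + " items=" + loc.items + " exits=" + loc.links.keySet());
        }
    }
}
```

Note that a location which is only ever mentioned as a link target still ends up in the map, just with no items or links of its own.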


You don't want to use a text-only format for this:

  • What happens when you have more than a single flower item? Are they all the same? Can't an adventurer collect a bouquet by picking single flowers at several locations?

  • There will probably be several rooms with the same name ("cellar", "street corner"), i.e. filler rooms which add to the atmosphere but nothing to the game. They don't get a description of their own, though. How to keep them apart?

  • What if a name contains a comma?

  • Eventually, you'll want to use Unicode for foreign names or formatting instructions.

Since this is structured data which can contain lots of odd cases, I suggest using XML for this:

<locations>
    <location>
        <name>Albus Square</name>
        <summary>Short description for returning adventurer</summary>
        <description>Long text here ... with formatting, etc.</description>
        <items>
            <item>Flowers</item>
            <item>Traffic Cone</item>
        </items>
        <directions>
            <north>Franklandclaw Lecture Theatre</north>
            <west>Library of Enchanted Books</west>
            <south>Furnesspuff College</south>
        </directions>
    </location>
    <location>
        <name>Library of Enchanted Books</name>
        <directions>
            <east>Albus Square</east>
            <up>Reading Room</up>
        </directions>
    </location>
</locations>

This allows for much greater flexibility, solves a lot of issues like formatting description text, Unicode characters, etc. plus you can use more than a single item/location with the same name by using IDs (numbers) instead of text.

Use JDom or DecentXML to parse the game config.
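If you'd rather not pull in a library, the DOM parser that ships with the JDK (javax.xml.parsers) can read a file like this too. The following is only a sketch of that alternative, not JDom or DecentXML code. XmlLocationReader and Entry are made-up names, exits are stored as tag name to target name, and the element names are assumed to match the example XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlLocationReader {
    /** Hypothetical holder for one <location> element. */
    static class Entry {
        final String name;
        final List<String> items = new ArrayList<>();
        final Map<String, String> exits = new LinkedHashMap<>(); // e.g. "north" -> target name
        Entry(String name) { this.name = name; }
    }

    static List<Entry> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse location XML", ex);
        }
        List<Entry> result = new ArrayList<>();
        NodeList locations = doc.getElementsByTagName("location");
        for (int i = 0; i < locations.getLength(); i++) {
            Element loc = (Element) locations.item(i);
            Node nameNode = loc.getElementsByTagName("name").item(0);
            if (nameNode == null) {
                throw new IllegalArgumentException("<location> without <name>");
            }
            Entry entry = new Entry(nameNode.getTextContent().trim());
            NodeList items = loc.getElementsByTagName("item");
            for (int j = 0; j < items.getLength(); j++) {
                entry.items.add(items.item(j).getTextContent().trim());
            }
            Node directions = loc.getElementsByTagName("directions").item(0);
            if (directions != null) {
                NodeList children = directions.getChildNodes();
                for (int k = 0; k < children.getLength(); k++) {
                    Node child = children.item(k);
                    // Skip the whitespace text nodes between the direction elements.
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        entry.exits.put(child.getNodeName(), child.getTextContent().trim());
                    }
                }
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<locations><location><name>Albus Square</name>"
                + "<items><item>Flowers</item></items>"
                + "<directions><north>Franklandclaw Lecture Theatre</north></directions>"
                + "</location></locations>";
        for (Entry e : read(xml)) {
            System.out.println(e.name + " items=" + e.items + " exits=" + e.exits);
        }
    }
}
```

Turning the exit names into real links between Location objects would be a second pass over the returned list, once every location is known.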


Can't get my head into Java-mode right now, so here's some pseudo-code that should do it:

// One chunk per location: records are separated by blank lines
Data = MyString.split('\n\n++\s*+');

for ( i=0 ; i<Data.length ; i++ )
{
    // First line is the location itself, the rest are directions
    CurLocation = Data[i].split('\n\s*+');

    LocationInfo = CurLocation[0].split(',\s*+');
    LocationName = LocationInfo[0];

    for ( n=1 ; n<LocationInfo.length ; n++ )
    {
        Items[n-1] = LocationInfo[n];
    }

    for ( n=1 ; n<CurLocation.length ; n++ )
    {
        // Each remaining line is "direction, location_name"
        DirectionInfo = CurLocation[n].split(',\s*+');
        DirectionName = DirectionInfo[0];

        for ( x=1 ; x<DirectionInfo.length ; x++ )
        {
            DirectionLocation[x-1] = DirectionInfo[x];
        }
    }
}
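Roughly the same split-based idea in Java might look like the sketch below. The regexes are my own variants rather than the exact ones above (\R matches any line break), SplitParser and Entry are made-up names, and exits are kept as direction name to location name strings instead of real Location links.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitParser {
    /** Hypothetical holder for one parsed record. */
    static class Entry {
        final String name;
        final List<String> items = new ArrayList<>();
        final Map<String, String> exits = new LinkedHashMap<>(); // direction -> location name
        Entry(String name) { this.name = name; }
    }

    static List<Entry> parse(String text) {
        List<Entry> result = new ArrayList<>();
        // A blank line (possibly containing spaces) separates two locations.
        for (String block : text.trim().split("\\R\\s*\\R\\s*")) {
            if (block.isEmpty()) {
                continue; // empty input
            }
            // Splitting on the line break plus the whitespace after it strips the indentation.
            String[] lines = block.split("\\R\\s*");
            String[] header = lines[0].trim().split("\\s*,\\s*");
            Entry entry = new Entry(header[0]);
            entry.items.addAll(Arrays.asList(header).subList(1, header.length));
            for (int n = 1; n < lines.length; n++) {
                // Limit 2: everything after the first comma is the target name.
                String[] dir = lines[n].trim().split("\\s*,\\s*", 2);
                if (dir.length != 2) {
                    throw new IllegalArgumentException("Bad direction line: " + lines[n]);
                }
                entry.exits.put(dir[0], dir[1]);
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        String data = String.join("\n",
                "Albus Square, Flowers, Traffic Cone",
                "    NORTH, Franklandclaw Lecture Theatre",
                "",
                "Library of Enchanted Books",
                "    EAST, Albus Square");
        for (Entry e : parse(data)) {
            System.out.println(e.name + " items=" + e.items + " exits=" + e.exits);
        }
    }
}
```

Unlike a line-by-line loop, this doesn't care whether the direction lines are indented; it relies only on the blank line between locations.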


Can you change the format of the data? That format is clunky. I suspect that you're busy reinventing the square wheel... This screams "Just use XML" to me.


I think using XML is overkill (shooting sparrows with cannons) while regexps are "underkill" (using a too weak tool, scrubbing floors with a toothbrush).

The right balance sounds like it's "the .ini format" or "mail headers with sections". For python there are library docs at http://docs.python.org/library/configparser.html.

A brief example:

[albus_square]
name: Albus Square
items: Flowers, Traffic Cone
north: lecture_theatre
west: library_enchanted_books
south: furnesspuff_college

I'd assume there's a Java library for this format. As another poster has pointed out, you might have name collisions, so I took the liberty of adding a "name:" field. The name in the square brackets would be the unique identifier.
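Even without a library, a format this simple can be read with a few lines of plain Java. The sketch below is hand-rolled and covers only what the example uses: "[section]" headers and "key: value" lines, plus comment lines starting with # or ; (an assumption borrowed from common .ini usage). IniSketch is a made-up name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IniSketch {
    /** Returns section id -> (key -> value), preserving file order. */
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // "[albus_square]" starts a new section; the id is the unique key
                current = new LinkedHashMap<>();
                sections.put(line.substring(1, line.length() - 1).trim(), current);
                continue;
            }
            int sep = line.indexOf(':');
            if (current == null || sep < 0) {
                throw new IllegalArgumentException("Unexpected line: " + raw);
            }
            current.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return sections;
    }

    public static void main(String[] args) {
        String ini = String.join("\n",
                "[albus_square]",
                "name: Albus Square",
                "items: Flowers, Traffic Cone",
                "north: lecture_theatre");
        parse(ini).forEach((id, fields) -> System.out.println(id + " -> " + fields));
    }
}
```

The "items" value still comes back as one string; splitting it with something like value.split("\\s*,\\s*") gives the individual item names, and the direction keys can be resolved against the section ids once the whole file is loaded.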
