Tutorial: How to parse a text file with C#



Question:

By text formatting I meant something more complicated.

At first I began manually adding the 5000 lines from the text file in question into my project.

The text file has 5000 lines of varying length. For example:

1   1   ITEM_ETC_GOLD_01    골ë"œ(소)   xxx xxx xxx_TT_DESC 0   0   3   3   5   0   180000  3   0   1   0   0   255 1   1   0   0   0   0   0   0   0   0   0   0   -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_money_small.bsr    xxx xxx xxx 0   2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1   í'œí˜„í•  골ë"œì˜ ì–'(param1이상) -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0    1   4   ITEM_ETC_HP_POTION_01   HP 회복 약초    xxx SN_ITEM_ETC_HP_POTION_01    SN_ITEM_ETC_HP_POTION_01_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   60  0   0   0   1   21  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_01.ddj   xxx xxx 50  2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 120 HP회복ì–'   0   HP회복ì–'(%)    0   MP회복ì–'   0   MP회복ì–'(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0    1   5   ITEM_ETC_HP_POTION_02   HP 회복약 (소)  xxx SN_ITEM_ETC_HP_POTION_02    SN_ITEM_ETC_HP_POTION_02_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   110 0   0   0   2   39  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_02.ddj   xxx xxx 50  2   0   0   2   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 220 HP회복ì–'   0   HP회복ì–'(%)    0   MP회복ì–'   0   MP회복ì–'(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0  

The gap between the first value (1) and the second value (1/4/5) is not a run of spaces, it's a tab. There are no spaces in that text file.

What I want:

I want to get the second integer (in the three lines I posted above, the second integers are 1, 4 and 5) and the string in the middle of each line that indicates the path (it starts with "item\" and ends with the file extension ".ddj").

My problem:

When I google "Text formatting C#", all I get is how to open a text file and how to write a text file in C#. I don't know how to search for text inside a text file. Also, I can't search for the first integer: if it's a small integer like in the three lines I posted above, I won't be able to find the correct location, because "1", for example, might also exist in a different location.

My question:

It would be best if I wrote a program that deletes everything except what I need.

The other way I can think of is to search directly inside that file, but as I mentioned above, I might get the wrong location for the second integer if it's too small.

Please suggest something; I can't format all this by hand.


Solution:1

OK, here's what we do: open the file, read it line by line, and split it by tabs. Then we grab the second integer and loop through the rest to find the path.

    using System.IO;

    ...

    // The using block closes the file even if parsing throws.
    using (StreamReader reader = File.OpenText("filename.txt")) {
        string line;
        while ((line = reader.ReadLine()) != null) {
            string[] items = line.Split('\t');
            int myInteger = int.Parse(items[1]); // Here's your integer.

            // Now let's find the path.
            string path = null;
            foreach (string item in items) {
                if (item.StartsWith("item\\") && item.EndsWith(".ddj")) {
                    path = item;
                }
            }

            // At this point, `myInteger` and `path` contain the values we want
            // for the current line (`path` stays null if the line has no .ddj
            // field). We can then store those values or print them, or
            // anything else we like.
        }
    }


Solution:2

Another solution, this time making use of regular expressions:

    using System.IO;
    using System.Text.RegularExpressions;

    ...

    Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // File.OpenText is the static helper; FileInfo.OpenText is an instance method.
    using (StreamReader reader = File.OpenText("filename.txt")) {
        string line;
        while ((line = reader.ReadLine()) != null) {
            Match match = parts.Match(line);
            if (match.Success) {
                // Captures are exposed through the Groups indexer.
                int number = int.Parse(match.Groups[1].Value);
                string path = match.Groups[2].Value;

                // At this point, `number` and `path` contain the values we want
                // for the current line. We can then store those values or print them,
                // or anything else we like.
            }
        }
    }

That expression's a little complex, so here it is broken down:

    ^        Start of string.
    \d+      "\d" means "digit" (0-9). The "+" means "one or more",
             so this means "one or more digits".
    \t       This matches a tab.
    (\d+)    This also matches one or more digits. This time, though, we capture
             it using parentheses, which means we can read it back through the
             match's Groups property (Groups[1]).
    \t       Another tab.
    .+?      "." means "any character", so "one or more of anything". In addition,
             it's lazy: it grabs only as much as it needs for the rest of the
             regex to match, instead of everything in sight.
    \t       Another tab.

    (item\\[^\t]+\.ddj)
             Here's the meat. This matches "item\<one or more of anything but a tab>.ddj"
             and captures it as Groups[2].
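To see the two capture groups in action, here is a minimal sketch that runs the same pattern against a shortened, made-up line. The sample line and the `RegexDemo` and `TryExtract` names are assumptions for illustration only; real lines have many more fields.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Same pattern as above: skip the first number, capture the second,
    // then lazily skip ahead to the first tab-delimited field that looks
    // like item\....ddj.
    static readonly Regex Parts =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // Returns true and fills in number/path when the line matches.
    public static bool TryExtract(string line, out int number, out string path)
    {
        Match match = Parts.Match(line);
        if (!match.Success)
        {
            number = 0;
            path = "";
            return false;
        }
        number = int.Parse(match.Groups[1].Value);
        path = match.Groups[2].Value;
        return true;
    }

    static void Main()
    {
        // Hypothetical, shortened line (not copied from the real file).
        string line = "1\t4\tITEM_ETC_HP_POTION_01\txxx\titem\\etc\\drop_ch_bag.bsr\titem\\etc\\hp_potion_01.ddj\txxx";
        if (TryExtract(line, out int number, out string path))
        {
            Console.WriteLine(number + " -> " + path);
        }
    }
}
```

Note that the `.bsr` field is skipped because the pattern insists on the `.ddj` extension, so a line that has no `.ddj` field at all simply does not match.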


Solution:3

You could do something like:

    using (TextReader rdr = OpenYourFile()) {
        string line;
        while ((line = rdr.ReadLine()) != null) {
            string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
            int theInt = Convert.ToInt32(fields[1]);
        }
    }

The reason you didn't find relevant results when searching for 'formatting' is that the operation you are performing is called 'parsing'.
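One caveat when parsing this way: `Convert.ToInt32` throws if the field is not a number, and `fields[1]` throws if a line has fewer than two fields. A defensive variant might look like the sketch below; the `FieldParsing` and `TryGetSecondInt` names and the sample lines are made up for illustration.

```csharp
using System;
using System.Globalization;

static class FieldParsing
{
    // Returns true only if the line has a second tab-separated field
    // and that field is a valid integer.
    public static bool TryGetSecondInt(string line, out int value)
    {
        value = 0;
        string[] fields = line.Split('\t');
        return fields.Length > 1
            && int.TryParse(fields[1], NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out value);
    }

    static void Main()
    {
        // Hypothetical sample lines, not taken from the real file.
        string[] samples = { "1\t5\tITEM_ETC_HP_POTION_02", "line without tabs", "1\tabc" };
        foreach (string s in samples)
        {
            Console.WriteLine(TryGetSecondInt(s, out int n) ? n.ToString() : "skipped");
        }
    }
}
```

Whether to skip bad lines silently or report them depends on how much you trust the file; the sketch only shows the skipping variant.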


Solution:4

As already mentioned, I would highly recommend using regular expressions (in System.Text.RegularExpressions) to get this kind of job done.

Combined with a solid tool like RegexBuddy, regular expressions can handle most complex text-record parsing situations and get you results quickly. The tool makes it really easy.

Hope that helps.


Solution:5

Try regular expressions. You can find a certain pattern in your text and replace it with whatever you want. I can't give you the exact code right now, but you can test out your expressions using this tool:

http://www.radsoftware.com.au/regexdesigner/
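As a rough illustration of the find-and-replace idea, the sketch below uses `Regex.Replace` to shrink a whole line down to just the second integer and the `.ddj` path. The pattern is one possible choice rather than the only one, and the `LineReducer` and `Reduce` names and the sample line are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineReducer
{
    // Matches a whole line; group 1 is the second integer,
    // group 2 is the item\....ddj path.
    static readonly Regex Whole =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj).*$");

    // Replaces a matching line with "<number><TAB><path>".
    // Lines that do not match are returned unchanged.
    public static string Reduce(string line)
    {
        return Whole.Replace(line, "$1\t$2");
    }

    static void Main()
    {
        // Hypothetical, shortened line (not copied from the real file).
        string line = "1\t5\tITEM_ETC_HP_POTION_02\txxx\titem\\etc\\hp_potion_02.ddj\txxx\t50";
        Console.WriteLine(Reduce(line));
    }
}
```

Because non-matching lines come back unchanged, a caller that wants to drop them would need to check `Regex.IsMatch` first.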


Solution:6

You could open the file up and use StreamReader.ReadLine to read the file in line-by-line. Then you can use String.Split to break each line into pieces (use a \t delimiter) to extract the second number.

Because the number of fields differs from line to line, the path is not at a fixed position, so you would need to search each line for the pattern 'item\*.ddj'.

To delete an item you could (for example) keep all of the file's contents in memory and write out a new file when the user clicks 'Save'.
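The steps described in this answer could be sketched roughly as follows: read line by line, split on tabs, scan the fields for the path, and write only the wanted values to a new output. The `ItemFilter` and `Filter` names and the sample data are hypothetical. The sketch works on a `TextReader`/`TextWriter` pair so it can be tried without a file; for a real file, `File.OpenText(...)` and `File.CreateText(...)` would supply them.

```csharp
using System;
using System.IO;

static class ItemFilter
{
    // Writes "<second integer><TAB><path>" for every input line that has
    // both; other lines are skipped. Returns the number of lines written.
    public static int Filter(TextReader input, TextWriter output)
    {
        int written = 0;
        string line;
        while ((line = input.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');
            if (fields.Length < 2 || !int.TryParse(fields[1], out int id))
                continue;

            // The path's position varies, so scan the fields for it.
            string path = Array.Find(fields,
                f => f.StartsWith(@"item\", StringComparison.Ordinal)
                  && f.EndsWith(".ddj", StringComparison.Ordinal));
            if (path == null)
                continue;

            output.WriteLine(id + "\t" + path);
            written++;
        }
        return written;
    }

    static void Main()
    {
        // In-memory stand-in for the real file (made-up, shortened lines).
        string sample =
            "1\t1\tITEM_ETC_GOLD_01\txxx\titem\\etc\\drop_ch_money_small.bsr\txxx\n" +
            "1\t4\tITEM_ETC_HP_POTION_01\txxx\titem\\etc\\drop_ch_bag.bsr\titem\\etc\\hp_potion_01.ddj\n";
        using (var reader = new StringReader(sample))
        {
            Filter(reader, Console.Out);
        }
    }
}
```

Writing to a separate output instead of editing the source file in place keeps the original data intact if something goes wrong halfway through.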


Solution:7

One way that I've found really useful in situations like this is to go old-school and use the Jet OLEDB provider, together with a schema.ini file, to read large tab-delimited files using ADO.NET. Obviously, this method is really only useful if you know the format of the file to be imported.

    using System.Data;
    using System.Data.OleDb;
    using System.IO;

    ...

    public void ImportCsvFile(string filename)
    {
        FileInfo file = new FileInfo(filename);

        // The data source is the directory; the file name acts as the table name.
        // HDR=Yes treats the first line of the file as column names.
        using (OleDbConnection con = new OleDbConnection(
                   "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
                   file.DirectoryName + "\";" +
                   "Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))
        {
            using (OleDbCommand cmd = new OleDbCommand(
                       string.Format("SELECT * FROM [{0}]", file.Name), con))
            {
                con.Open();

                // Using a DataReader to process the data
                using (OleDbDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Process the current reader entry...
                    }
                }

                // Using a DataTable to process the data
                using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
                {
                    DataTable tbl = new DataTable("MyTable");
                    adp.Fill(tbl);

                    foreach (DataRow row in tbl.Rows)
                    {
                        // Process the current row...
                    }
                }
            }
        }
    }

Once you have the data in a nice format like a DataTable, filtering out the data you need becomes pretty trivial.
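As a sketch of what that filtering might look like, the example below walks a DataTable and keeps only the rows whose path column holds an `item\....ddj` value. The table here is built by hand, and the column names `Id` and `IconPath` are invented; after a real import the names depend on the file's header row or on schema.ini.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class TableFilter
{
    // Collects (id, path) pairs from rows whose path column holds an
    // item\....ddj value. Column names are parameters because they
    // depend on how the file was imported.
    public static List<KeyValuePair<int, string>> IdsAndPaths(
        DataTable tbl, string idColumn, string pathColumn)
    {
        var result = new List<KeyValuePair<int, string>>();
        foreach (DataRow row in tbl.Rows)
        {
            // "as string" yields null for DBNull or non-string values.
            string path = row[pathColumn] as string;
            if (path != null
                && path.StartsWith(@"item\", StringComparison.Ordinal)
                && path.EndsWith(".ddj", StringComparison.Ordinal))
            {
                result.Add(new KeyValuePair<int, string>(
                    Convert.ToInt32(row[idColumn]), path));
            }
        }
        return result;
    }

    static void Main()
    {
        // Hand-built table standing in for the one filled by the adapter.
        var tbl = new DataTable("MyTable");
        tbl.Columns.Add("Id", typeof(int));
        tbl.Columns.Add("IconPath", typeof(string));
        tbl.Rows.Add(1, "xxx");
        tbl.Rows.Add(4, @"item\etc\hp_potion_01.ddj");

        foreach (var pair in IdsAndPaths(tbl, "Id", "IconPath"))
        {
            Console.WriteLine(pair.Key + "\t" + pair.Value);
        }
    }
}
```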

