Tutorial: Parsing a file in C++


I have some string data in the following format: "Ronit","abc""defgh","abcdef,"fdfd",

Can somebody suggest some good C++ code that returns the comma-separated tokens, splitting only on commas that are not inside a string?

I.e. it should return

  1. "Ronit"
  2. "abc""defgh"
  3. "abcdef,"fdfd"

To be clearer:

Thank you all for your kind help.

Below is the sample file I am given as input.

The first line tells me how many columns I have #


"user1","user,user2","user3", "userrr





"","user2,"", "", #

Below is the expected output for the CSV file. Please give me code that compiles so that I can test it. Thanks again for your kind help.

1st Row, 1)user1, 2)user,user2 3)user3 4)userrrr4

Note: rr4 is on the next line.

2nd Row, 1)user1 2)user2 3)user3 4)us er4

Note: er4 is on the next line.

3rd row, 1)user1 2)user,2 3)user3 4)user4

4th row 1) 2) user2 3) 4)


The C++ String Toolkit Library (StrTk) has the following solution to your problem:

```cpp
#include <iostream>
#include <string>
#include <deque>
#include "strtk.hpp"

int main()
{
   std::deque<std::string> word_list;
   strtk::for_each_line("data.txt",
                        [&word_list](const std::string& line)
                        {
                           const std::string delimiters = "\t\r\n ,,.;:'\""
                                                          "!@#$%^&*_-=+`~/\\"
                                                          "()[]{}<>";
                           strtk::parse(line,delimiters,word_list);
                        });

   std::cout << strtk::join(" ",word_list) << std::endl;

   return 0;
}
```

More examples can be found here.


This looks like CSV parsing to me (even if the input is not technically a file); you could take a look at this question and answer.
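For reference, here is a minimal sketch of CSV-style splitting under RFC 4180 quoting rules. The function name `parse_csv_record` is hypothetical, not part of any library. Commas inside quotes do not split, and a doubled quote `""` inside a quoted field stands for one literal quote, so `"abc""defgh"` decodes to `abc"defgh`. Under these rules the asker's `"abcdef,"fdfd"` is not well-formed, so this sketch does not reproduce the exact output requested above, and it handles one line at a time (fields that continue onto the next line are not joined).

```cpp
#include <string>
#include <vector>

// Split one CSV record into fields (RFC 4180-style quoting):
// commas inside quotes do not split, and "" inside a quoted field
// is an escaped literal quote. Surrounding quotes are removed.
std::vector<std::string> parse_csv_record(const std::string& line) {
  std::vector<std::string> fields;
  std::string field;
  bool in_quotes = false;
  for (std::string::size_type i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (in_quotes) {
      if (c == '"') {
        if (i + 1 < line.size() && line[i + 1] == '"') {
          field += '"';       // "" -> one literal quote
          ++i;
        } else {
          in_quotes = false;  // closing quote
        }
      } else {
        field += c;           // any character, including ',', is data here
      }
    } else if (c == '"') {
      in_quotes = true;       // opening quote
    } else if (c == ',') {
      fields.push_back(field);  // unquoted comma ends the field
      field.clear();
    } else {
      field += c;
    }
  }
  fields.push_back(field);  // last field (empty if the line ends with ',')
  return fields;
}
```

For example, `parse_csv_record("\"user1\",\"user,user2\",\"user3\"")` yields the three fields `user1`, `user,user2` and `user3`.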


Just download Boost and use Boost.Tokenizer.
It's the best solution there is.


The following assumes that the input comes from some stream (you tagged the question C++, after all). If that's not the case, look into string streams.

```cpp
#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// format_error is not defined in the answer; this minimal stand-in
// (an assumption) just lets the code compile.
struct format_error : std::runtime_error {
  format_error(const std::string& msg, const std::string& context)
    : std::runtime_error(msg + ": " + context) {}
};

std::string read_quoted_string(std::istream& is)
{
  is >> std::ws;
  std::string garbage;
  std::getline(is,garbage,'"'); // everything up to opening quote
  if(!garbage.empty()) throw format_error("garbage outside of quotes", garbage);
  if(!is.good()) return std::string();

  std::string a_string;
  std::getline(is,a_string,'"'); // the string up to closing quote
  if(!is) return std::string();
  return a_string;
}

std::vector<std::string> split_input(std::istream& is)
{
  std::vector<std::string> result;
  while(is) {
    const std::string a_string = read_quoted_string(is);
    if(is) {
      result.push_back(a_string);
      is >> std::ws;
      std::string garbage;
      std::getline(is,garbage,','); // next delimiter
      if(!garbage.empty()) throw format_error("garbage outside of quotes", garbage);
    }
  }
  // a_string is out of scope here, so no token text is reported
  if(!is.eof()) throw format_error("error reading token", "");
  return result;
}
```

This isn't the fastest possible solution, but it is simple and very likely fast enough.
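To illustrate the string-stream suggestion: if the data is already in a `std::string`, wrapping it in `std::istringstream` gives a stream that functions taking `std::istream&` (such as `split_input` above) can read. The helper below is a hypothetical name used only for illustration; it calls `std::getline` with `','` as the delimiter, so unlike `split_input` it treats quotes as ordinary characters.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Wrap an in-memory string in a stream and read comma-separated
// pieces with std::getline. Quotes get no special treatment here.
std::vector<std::string> pieces_from_string(const std::string& data) {
  std::istringstream iss(data);
  std::vector<std::string> pieces;
  std::string piece;
  while (std::getline(iss, piece, ','))
    pieces.push_back(piece);
  return pieces;
}
```

Because quotes are not special, `"user,user2"` comes back as two pieces, `"user` and `user2"`; that is why the answer above reads quoted strings explicitly.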


This returns the split tokens exactly as you asked:

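```cpp
#include <string>
#include <vector>
using namespace std;

// Split s at every comma that directly follows a double quote.
// Each token keeps its surrounding quotes; any text after the
// final `",` is not returned.
vector<string> splitqc(std::string const& s) {
  vector<string> tokens;
  char last=0;
  string::size_type start=0;
  for (string::size_type i=0,n=s.length();i!=n;++i) {
    char c=s[i];
    if (c==',' && last=='"') {
      tokens.push_back(s.substr(start,i-start)); // include the closing quote
      start=i+1;
    }
    last=c;
  }
  return tokens;
}
```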

Here's a more general facility. The functor f is called with each token and should return false to stop early. Note that the tokens won't include the closing quote that is part of your delimiter; you'd have to add that yourself:

```cpp
#include <string>

// Calls f on each piece of csv between occurrences of delim
// (f returns false to stop early). The piece after the last
// delimiter is passed too, even if it is empty.
template <class Func>
inline void split_noquote(
    const std::string &csv,
    Func f,
    const std::string &delim=","
    )
{
    using namespace std;
    string::size_type pos=0,nextpos;
    string::size_type delim_len=delim.length();
    if (delim_len==0) delim_len=1;
    while((nextpos=csv.find(delim,pos)) != string::npos) {
        if (! f(string(csv,pos,nextpos-pos)) )
            return;
        pos=nextpos+delim_len;
    }
    if (csv.length()!=0)
        f(string(csv,pos,csv.length()-pos));
}
```

Usage: split_noquote(s,func,"\",")


It's not the best way, but you can use the strtok function.
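A minimal sketch of the `strtok` route (the wrapper name `split_with_strtok` is hypothetical). It also shows why it is not the best way here: `std::strtok` writes `'\0'` into the buffer it scans, keeps hidden state between calls (so it is not reentrant), splits on every delimiter character including commas inside quotes, and treats a run of delimiters as one, so empty fields disappear.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split on commas with std::strtok. strtok modifies the buffer it
// scans, so it works on a writable, null-terminated copy of s.
std::vector<std::string> split_with_strtok(const std::string& s) {
  std::vector<char> buf(s.begin(), s.end());
  buf.push_back('\0');
  std::vector<std::string> tokens;
  for (char* tok = std::strtok(buf.data(), ","); tok != nullptr;
       tok = std::strtok(nullptr, ",")) {
    tokens.push_back(tok);
  }
  return tokens;
}
```

On `"user1","user,user2"` this returns three tokens, `"user1"`, `"user` and `user2"`, so the quoted comma is split apart.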


I don't think something like "abcdef,"fdfd" can be parsed with ordinary quoting rules: one of its quotes is not terminated, so most languages and data formats would reject it. It should be "abcdef,fdfd". Assuming all strings are properly terminated, the following function will give the output you want.

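```cpp
#include <istream>
#include <string>

// Reads characters into dest up to the next delim that is not inside
// double quotes. The quotes themselves are kept in dest; the delimiter
// is consumed but not stored.
std::istream& tokenize_quoted_strings(std::istream& in,
                                      std::string& dest,
                                      char delim)
{
  dest.erase();
  char ch = 0;
  bool in_quotes = false;
  while (in)
    {
      if (!in.get(ch)) break;
      if (!in_quotes && ch == delim) break;
      dest.push_back(ch);
      if (ch == '"') in_quotes = !in_quotes;
    }
  return in;
}
```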

The following function uses tokenize_quoted_strings to split a string into a vector of tokens:

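```cpp
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> StringList;

void tokenize_line(const std::string& line,
                   StringList& tokens)
{
  std::istringstream iss(line);
  std::string token;
  tokens.clear();
  while (tokenize_quoted_strings(iss, token, ','))
    tokens.push_back(token);
  // The last field has no trailing delimiter, so reading it hits
  // end-of-input and the loop above exits without storing it.
  if (!token.empty())
    tokens.push_back(token);
}
```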


```cpp
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>

int main()
{
  std::ifstream in("test.txt");
  std::string line;
  StringList tokens;
  while (std::getline(in, line))
    {
      tokenize_line(line, tokens);
      size_t sz = tokens.size();
      for (size_t i=0; i<sz; ++i)
        std::cout << (i+1) << ") " << tokens[i] << ' ';
      std::cout << '\n';
    }
  return 0;
}
```

Note that it does not handle C-style escaped quotes such as \".
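If C-style escapes do need to be respected, one possible extension (an assumption, not part of the answer above) is to copy the character after a backslash verbatim while inside quotes, so `\"` does not toggle the quote state. The function name `split_with_escapes` is hypothetical; it follows the same conventions as `tokenize_quoted_strings` plus the last-field handling of `tokenize_line`: quotes and escapes stay in the tokens, and a trailing empty field is dropped.

```cpp
#include <string>
#include <vector>

// Split line on delim, ignoring delimiters inside double quotes.
// Inside quotes, a backslash escapes the next character, so \" does
// not close the quoted section. Quotes and backslashes are kept.
std::vector<std::string> split_with_escapes(const std::string& line, char delim) {
  std::vector<std::string> tokens;
  std::string token;
  bool in_quotes = false;
  for (std::string::size_type i = 0; i < line.size(); ++i) {
    char ch = line[i];
    if (!in_quotes && ch == delim) {  // unquoted delimiter ends a token
      tokens.push_back(token);
      token.clear();
      continue;
    }
    token.push_back(ch);
    if (in_quotes && ch == '\\' && i + 1 < line.size()) {
      token.push_back(line[++i]);     // keep the escaped character as-is
    } else if (ch == '"') {
      in_quotes = !in_quotes;
    }
  }
  if (!token.empty()) tokens.push_back(token);  // last field, if any
  return tokens;
}
```

With this rule the input `"a\",b",c` splits into `"a\",b"` and `c`, whereas plain quote toggling would treat the escaped quote as a closing quote and split at the comma after it.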
