Tutorial: C++ tokenize a string using a regular expression


I'm trying to teach myself some C++ from scratch at the moment.
I'm well-versed in Python, Perl, and JavaScript but have only encountered C++ briefly, in a classroom setting, in the past. Please excuse the naivete of my question.

I would like to split a string using a regular expression but have not had much luck finding a clear, definitive, efficient and complete example of how to do this in C++.

In Perl this action is common, and can be accomplished trivially:

    /home/me$ cat test.txt
    this is  aXstringYwith, some problems
    and anotherXY line with   similar issues

    /home/me$ cat test.txt | perl -e'
    > while(<>){
    >   my @toks = split(/[\sXY,]+/);
    >   print join(" ",@toks)."\n";
    > }'
    this is a string with some problems
    and another line with similar issues

I'd like to know how best to accomplish the equivalent in C++.

I think I found what I was looking for in the boost library, as mentioned below.

boost regex-token-iterator (why don't underscores work?)

I guess I didn't know what to search for.

    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>

    using namespace std;

    // argv is unused, but standard C++ requires both parameters when argc is taken
    int main(int argc, char**)
    {
      string s;
      do{
        if(argc == 1)
          {
            cout << "Enter text to split (or \"quit\" to exit): ";
            // stop on end-of-input as well as on "quit"
            if(!getline(cin, s) || s == "quit") break;
          }
        else
          s = "This is a string of tokens";

        boost::regex re("\\s+");
        // -1 selects the text between matches, i.e. the tokens
        boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
        boost::sregex_token_iterator j;  // default-constructed end iterator

        unsigned count = 0;
        while(i != j)
          {
            cout << *i++ << endl;
            count++;
          }
        cout << "There were " << count << " tokens found." << endl;

      }while(argc == 1);
      return 0;
    }


The Boost libraries are usually a good choice; in this case, Boost.Regex. There is even an example for splitting a string into tokens that already does what you want. Basically it comes down to something like this:

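    // Separator class; the comma is included to match the Perl example's [\sXY,]+
    boost::regex re("[\\sXY,]+");
    std::string s;

    while (std::getline(std::cin, s)) {
      // -1 selects the text between matches, i.e. the tokens
      boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
      boost::sregex_token_iterator j;  // default-constructed end iterator
      while (i != j) {
        std::cout << *i++ << " ";
      }
      std::cout << std::endl;
    }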
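For anyone without Boost, roughly the same thing can be sketched with the standard library, assuming a C++11 (or later) compiler with a working <regex>. The helper name split_regex is made up for this example, and dropping empty pieces is a choice of this sketch, not something the Boost example does:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// split `text` on every match of `sep` and return the non-empty pieces.
std::vector<std::string> split_regex(const std::string& text,
                                     const std::regex& sep)
{
    std::vector<std::string> tokens;
    // Submatch index -1 selects the text *between* matches, not the matches.
    std::sregex_token_iterator it(text.begin(), text.end(), sep, -1);
    const std::sregex_token_iterator end;  // default-constructed end iterator
    for (; it != end; ++it) {
        // A separator at the very start yields one empty leading piece; skip it.
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

Calling split_regex(line, std::regex("[\\sXY,]+")) on each input line and printing the pieces with a space in between should then behave much like the Perl one-liner, apart from the skipped empty leading field.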


Check out Boost.Regex. I think you can find your answer here:

C++: what regex library should I use?


If you want to minimize use of iterators, and pithify your code, the following should work:

    #include <string>
    #include <iostream>
    #include <boost/regex.hpp>

    int main()
    {
      const boost::regex re("[\\sXY,]+");

      for (std::string s; std::getline(std::cin, s); )
      {
        std::cout << regex_replace(s, re, " ") << std::endl;
      }
    }
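The same replace-instead-of-split idea can be sketched with std::regex_replace from the C++11 standard library, if Boost is not available. The function name collapse_separators is invented for this example. Note that a line starting with separators keeps a single leading space, since that run is replaced too:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace each run of separators with one space.
std::string collapse_separators(const std::string& line)
{
    // Same character class as the Perl example: whitespace, 'X', 'Y', ','.
    static const std::regex sep("[\\sXY,]+");
    return std::regex_replace(line, sep, " ");
}
```

This never builds a token list, so it only fits when the goal is the joined output line and not the individual tokens.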


Unlike in Perl, regular expressions are not built into the C++ language itself.

Before C++11 you need to use an external library, such as PCRE; since C++11 the standard library provides <regex>.


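Regular expressions are part of TR1, which is included in Visual C++ 2008 SP1 (including the Express edition) and G++ 4.3, although libstdc++'s regex implementation stayed incomplete until GCC 4.9.

The header is <regex> (<tr1/regex> with G++) and the namespace is std::tr1. It works well with the STL.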

Getting started with C++ TR1 regular expressions

Visual C++ Standard Library : TR1 Regular Expressions
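As a rough illustration of how the regex iterators plug into STL containers, here is a sketch that matches the tokens themselves instead of the separators. It assumes a C++11 compiler and therefore uses namespace std; with TR1 the same names would live in std::tr1. The function name match_tokens is made up for this example:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return every maximal run of non-separator characters.
std::vector<std::string> match_tokens(const std::string& text)
{
    // Negated class: anything that is NOT whitespace, 'X', 'Y' or ','.
    static const std::regex token("[^\\sXY,]+");
    // With the default submatch index 0 the iterator yields the matches themselves.
    std::sregex_token_iterator first(text.begin(), text.end(), token);
    std::sregex_token_iterator last;  // default-constructed end iterator
    // Each sub_match converts to std::string, so the range constructor can be used.
    return std::vector<std::string>(first, last);
}
```

Because only non-empty runs can match, leading or trailing separators never produce empty tokens with this approach.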
