Tutorial: Matching an arbitrary number of words with a PCRE regex into strings



Question:

I am using PCRE for some regex parsing and I need to search a string for words in a specific pattern (let's say all words in a string of words separated by commas) and put them into a string vector.

How would I go about doing that?


Solution:1

Sorry for the rough code, but I am in a hurry...

    #include <pcre.h>
    #include <cstdio>
    #include <cstring>
    #include <string>
    #include <vector>

    std::vector<std::string> find_words(const char* txt)
    {
        std::vector<std::string> words;
        const char* error;
        int erroffset;
        int ovector[3];   /* one match pair is enough: the pattern has no capture groups */

        pcre* re = pcre_compile(
            "\\w+",                          /* the pattern */
            PCRE_CASELESS | PCRE_MULTILINE,  /* options */
            &error,                          /* for error message */
            &erroffset,                      /* for error offset */
            NULL);                           /* use default character tables */
        if (re == NULL) {
            printf("Compilation failed at offset %d: %s\n", erroffset, error);
            return words;
        }

        int subject_length = (int)strlen(txt);
        int offset = 0;
        while (offset < subject_length) {
            int rc = pcre_exec(
                re,              /* the compiled pattern */
                NULL,            /* no extra data - we didn't study the pattern */
                txt,             /* the subject string */
                subject_length,  /* the length of the subject */
                offset,          /* start where the previous match ended */
                0,               /* default options */
                ovector,         /* output vector for substring information */
                3);              /* number of elements in the output vector */
            if (rc < 0) {
                /* PCRE_ERROR_NOMATCH just means there are no more words */
                if (rc != PCRE_ERROR_NOMATCH)
                    printf("Matching error %d\n", rc);
                break;
            }

            /* Match succeeded: ovector[0] and ovector[1] delimit the word */
            words.push_back(std::string(txt + ovector[0], ovector[1] - ovector[0]));
            offset = ovector[1];
        }

        pcre_free(re);   /* release memory used for the compiled pattern */
        return words;
    }


Solution:2

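    #include <pcrecpp.h>
    #include <string>
    #include <vector>

    std::string wordstring = "w1, w2, w3";
    std::string word;
    pcrecpp::StringPiece inp_w(wordstring);
    // [^,\s]+ instead of \S+, so the comma is not captured as part of the word
    pcrecpp::RE w_re("([^,\\s]+),?\\s*");
    std::vector<std::string> outwords;

    while (w_re.FindAndConsume(&inp_w, &word)) {
        outwords.push_back(word);
    }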
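
If PCRE is not available, the same "find a word, store it, continue after it" loop can be sketched with the standard library's `std::regex` instead. This swaps PCRE for a different regex engine and is not part of either solution above. The function name `split_words` and the pattern `[^,\s]+` are assumptions chosen for illustration, and the pattern treats any run of characters that is neither a comma nor whitespace as a word.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answers): collects every run of
// characters that is neither a comma nor whitespace into a vector of strings.
std::vector<std::string> split_words(const std::string& text)
{
    static const std::regex word_re("[^,\\s]+");
    std::vector<std::string> words;

    // std::sregex_iterator walks over all non-overlapping matches in order,
    // which plays the role of the explicit offset bookkeeping in Solution 1.
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), end;
         it != end; ++it) {
        words.push_back(it->str());
    }
    return words;
}
```

For the input used in Solution 2, `split_words("w1, w2, w3")` returns a vector holding `w1`, `w2` and `w3`.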
