Tutorial :Need help understanding a C function


Here's the deal: I'm trying, as a learning experience, to convert a C program to C++. This program takes a text file and applies modifications to it according to user-inputted rules. Specifically, it applies sounds changes to a set of words, using rules formatted like "s1/s2/env". s1 represents the characters to be changed, s2 represents what to change it into, and env is the context in which the change should be applied.

I'm sorry that I don't describe this in more depth, but the question would be too long, and the author's site already explains it.

The function I'm having trouble is TryRule. I understand that it's supposed to see if a given rule applies to a given string, but I'm having trouble understanding exactly how it does it. The poor explanation of the parameters confuses me: for example, I don't understand why the strings "s1" and "s2" have to be passed back, or what does "i" represent.

This is the code:

/*  **  TryRule  **  **  See if a rule s1->s2/env applies at position i in the given word.  **  **  If it does, we pass back the index where s1 was found in the  **  word, as well as s1 and s2, and return TRUE.  **  **  Otherwise, we return FALSE, and pass garbage in the output variables.  */  int TryRule( char *word, int i, char *Rule, int *n, char **s1, char **s2, char *varRep )      {          int j, m, cont = 0;          int catLoc;          char *env;          int  optional = FALSE;          *varRep = '\0';            if (!Divide( Rule, s1, s2, &env ) || !strchr( env, '_' ))              return(FALSE);            for (j = 0, cont = TRUE; cont && j < strlen(env); j++)          {              switch( env[j] )              {                  case '(':                      optional = TRUE;                      break;                    case ')':                      optional = FALSE;                      break;                    case '#':                      cont = j ? (i == strlen(word)) : (i == 0);                       break;                    case '_':                      cont = !strncmp( &word[i], *s1, strlen(*s1) );                      if (cont)                      {                          *n = i;                          i += strlen(*s1);                      }                      else                      {                          cont = TryCat( *s1, &word[i], &m, &catLoc );                          if (cont && m)                          {                              int c;                              *n = i;                              i += m;                                for (c = 0; c < nCat; c++)                                  if ((*s2)[0] == Cat[c][0] && catLoc < strlen(Cat[c]))                                      *varRep = Cat[c][catLoc];                          }                          else if (cont)                              cont = FALSE;                      }                      break;                    default:                      cont = TryCat( &env[j], &word[i], &m, &catLoc );                      if (cont && !m)                      {                          /* no category applied */                          cont = i < strlen(word) && word[i] == env[j];                          m = 1;                      }                      if (cont)                          i += m;                      if (!cont && optional)                          cont = TRUE;              }          }          if (cont && printRules)              printf( "   %s->%s /%s applies to %s at %i\n",               *s1, *s2, env, word, *n );        return(cont);  }  


This code is... tough to read. I looked at the original file, and it could really use some better variable names. I especially love this part from one of the function comments:

/*  ** (Stuff I removed)  **  ** Warning: For now, we don't have a way to handle digraphs.   **  ** We also return TRUE if (<- It really just stops here!)  */  

I can see the challenge. I agree with wmeyer about the variables. I think I understand things, so I'm going to attempt to translate the function into pseudo code.

Word: The string we are looking at
i: The index in the string we're looking at
Rule: The text of the rule (i.e. "v/b/_")
n: A variable to return the index into the string we found the match for the _, I think
s1: Returns the first part of the rule, decoded out of Rule
s2: Returns the second part of the rule, decoded out of Rule
varRep: Returns the character matched in the category, if a category matched, I think

int TryRule( char *word, int i, char *Rule,                  int *n, char **s1, char **s2, char *varRep ) {          Prepare a bunch of variables we''ll use later          Mark that we''re not working on an optional term          Set varRep''s first char to null, so it''s an empty string            if (We can parse the rule into it''s parts                OR there is no _ in the environment (which is required))              return FALSE // Error, we can't run, the rule is screwy            for (each character, j, in env (the third part of the rule)) {              if (cont is TRUE) {                  switch (the character we''re looking at, j) {                      if the character is opening paren:                          set optional to TRUE, marking it''s an optional character                      if the character is closing paren:                          set optional to FALSE, since we''re done with optional stuff                      if the character is a hash mark (#):                          // This is rather complicated looking, but it's not bad                          // This uses a ? b : c, which means IF a THEN b ELSE c                          // Remember i is the position in the word we are looking at                          // Hash marks match the start or end of a word                          // J is the character in the word                            if (j >= 0) {                              // We're not working on the first character in the rule                              // so the # mark we found is to find the end of a word                                if (i == the length of the word we''re looking at) {                                  // We've found the end of the word, so the rule matches                                    continue = true;   // Keep going                              } else {                                  // We're not at the end of a word, but we found a hash                                  // Rule doesn't match, so break out of the main loop by setting                                  //     continue to false                                    continue = false;                              }                          } else {                              // OK, the hash mark is the first part of env,                              // so it signifies the start of a word                                continue = (i == 0);   // Continue holds if we                                                     // are matching the first                                                     // character in *word or not                          }                      if the character is an _ (the match character):                          // This gets complicated                            continue = if word starting at character i ISN''T s1, the search string;                            if (continue == TRUE) {                              // There was no match, so we'll go look at the next word                              n = the index of the word start that didn''t match   // Not sure why                              i = i (start index to look) + length of s1 (word we just matched)                              // This means i now holds the index of the start of the next word                          } else {                              // TryCat sees if the character we're trying to match is a category                                continue = s1 is a category in the program                                            && the category contains the character at word[i]                                // If continue holds false, s1 was a category and we found no match                              // If continue holds true, s1 either wasn't a category (so m = 0)                              //     or s1 WAS a category, m contains 1, and catLoc holds which                              //     character in the category definition was matched                                if (we found a match of some sort                                     && s1 was a category (indicated by m == 1)) {                                  n = index of the character in the word we found a match                                  i = the index of the next character (m is always 1, so this is ugly)                                    for (each category defined) {                                      if (first character of s2                                             == the category''s name                                          && where in the category definition we matched                                                is less than the length of the category we''re on) {                                             varRep = the character matched in the category                                          }                                  }                                    // Now the above seems EXACTLY like the TryCat function. You'd                                  // think varRep would always hold the same value as catLoc. I                                  // believe this loop is so that later rules also get applied?                              } else {                                  continue = FALSE; // Because we didn't match a letter or category                              }                          }                      Any other character:                          continue = the character we''re looking at is a category in the program                                        && the category contains the character at word[i]                            if (there was a match AND it wasn''t a category (m == 0, just a letter)) {                              m = 1;                              continue if and only if there are characters left in the word                                   (i < strlen()) && the current character is at word[i]                                   (we matched a literal character, instead of a category)                          }                            if (continue)                              i = i + m // Remember, M is always 1 or 0                                        // So this is basically IF continue THEN i++ END IF                          if ((continue == FALSE) && (optional == TRUE))                              // We didn't find a match, but we're working on an optional part                              // So continue anyway                              continue = TRUE;                  end switch               end if continue == true          }      }        if (continue && printRules)          print out a little debug statement showing what we matched        return continue;   // At this point, if continue is false we can't keep matching  }  

I hope this helps. You may need to read it a few times. It took me over 45 minutes to write this, almost entirely because of trying to decipher exactly what's going on in some of the cases around TryCat. Add in about 5 minutes for constantly trying to hit the Tab key and getting my cursor send to the next field (stupid HTML text box).

Sorry this is so big, you'll probably have to do a bunch of horizontal scrolling.


Given that you are converting from C to C++ you should be refactoring the code to become more readable as well.

One major problem with this code is that the variables have terrible names and I'd wager even the original writer of the routine would need to spend some time analysing it.

Just renaming the variables to be more precise would give you a greater hand in understanding what the code does.

Take a look at some questions tagged under refactoring for some help. There is also Refactoring by Martin Fowler


I think you need the whole code to understand this fragment.

It looks like "word", "i" and "Rule" are input variables, the rest are pure output variables.

"i" is the current index within "word", i.e. TryRule only looks at "word" starting at "word[i]".

In "s1" the functions returns the left side of the rule that was applied. In "s2" the right side of that rule.

In "n" the function returns the position within "word" where the rule applies.

No idea what "varRep" is.

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »