Why is my implementation of wc off by one word?


[Solved] Writing parsing code is a trap. A line with 15 spaces is counted as 15 words, and every blank line is counted as a word too. Back to flex and bison for me.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        FILE *fp = NULL;
        int iChars = 0, iWords = 0, iLines = 0;
        int ch;

        /* if there is a command line arg, then try to open it as the file
           otherwise, use stdin */
        fp = stdin;
        if (argc == 2) {
            fp = fopen(argv[1], "r");
            if (fp == NULL) {
                fprintf(stderr, "Unable to open file %s. Exiting.\n", argv[1]);
                exit(1);
            }
        }

        /* read until the end of file, counting chars, words, lines */
        while ((ch = fgetc(fp)) != EOF) {
            if (ch == '\n') {
                iWords++;
                iLines++;
            }

            if (ch == '\t' || ch == ' ') {
                iWords++;
            }

            iChars++;
        }

        /* all done. If the input file was not stdin, close it */
        if (fp != stdin) {
            fclose(fp);
        }

        printf("chars: %d,\twords: %d,\tlines: %d.\n", iChars, iWords, iLines);
    }

TEST DATA foo.sh

    #!/home/ojblass/source/bashcrypt/a.out
    This is line 1
    This is line 2
    This is line 3

ojblass@linux-rjxl:~/source/bashcrypt> wc foo.sh

5 13 85 foo.sh

ojblass@linux-rjxl:~/source/bashcrypt> a.out foo.sh

chars: 85, words: 14, lines: 5.


You are counting \n as a word even for a blank line.
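To see where the extra word comes from, the question's counting rule can be reduced to a function over a string. This is only a minimal sketch: `naive_words` is a hypothetical name, not something from the original program, and it reproduces the word counter alone.

```c
/* Same rule as the loop in the question: every newline, tab, or space
   bumps the word counter, regardless of what came before it.
   naive_words is a hypothetical helper used for illustration only. */
long naive_words(const char *s)
{
    long words = 0;
    for (; *s != '\0'; s++) {
        if (*s == '\n' || *s == '\t' || *s == ' ')
            words++;
    }
    return words;
}
```

For foo.sh, wc reports 5 lines, so the file holds 5 newlines, and the three "This is line N" lines contribute 9 spaces: 5 + 9 = 14. The newline that ends the blank line has no word in front of it, which is the one word too many compared with the 13 that wc prints.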


Your algorithm is wrong. If the test file contains two blank characters in a row, the word counter is incremented twice, but it should be incremented only once.

One solution is to remember the last character read. If the current character is a separator (blank, tab, newline, ...) and the previous character was part of a word (anything that is not whitespace, not only alphanumerics), increment the word counter.
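The idea of remembering the previous character can be sketched as a tiny state machine. The names `wc_counts`, `wc_feed`, and `wc_count_string` are hypothetical, and the sketch makes two assumptions that go slightly beyond the answer: it uses `isspace` so that any run of non-whitespace counts as a word (as wc does), and it counts a word when it starts rather than when it ends, so a last word with no trailing newline is not lost.

```c
#include <ctype.h>

/* Running totals plus one bit of state: are we inside a word right now?
   All names here are hypothetical, chosen for this sketch. */
struct wc_counts {
    long chars;
    long words;
    long lines;
    int  in_word;   /* 1 while the previous character was non-whitespace */
};

/* Feed one character (for example, the value returned by fgetc). */
void wc_feed(struct wc_counts *c, int ch)
{
    c->chars++;
    if (ch == '\n')
        c->lines++;

    if (isspace((unsigned char)ch)) {
        c->in_word = 0;      /* a run of whitespace only ends the word */
    } else if (!c->in_word) {
        c->in_word = 1;      /* first character of a new word */
        c->words++;
    }
}

/* Convenience wrapper for trying the counter on an in-memory string. */
struct wc_counts wc_count_string(const char *s)
{
    struct wc_counts c = {0, 0, 0, 0};
    for (; *s != '\0'; s++)
        wc_feed(&c, (unsigned char)*s);
    return c;
}
```

In the program from the question, the body of the `while ((ch = fgetc(fp)) != EOF)` loop would shrink to a single `wc_feed(&counts, ch);` call. Blank lines and repeated spaces then leave the word count unchanged, because whitespace only resets `in_word`.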

