Tutorial: Finding a function name and counting its LOC



Question:

So you know off the bat, this is a project I've been assigned. I'm not looking for an answer in code, but more a direction.

What I've been told to do is go through a file and count the actual lines of code, while also recording each function's name and its individual line count. The problem I'm having is working out, while reading the file, whether a given line is the start of a function.

So far, I can only think of keeping a string array of data types (int, double, char, etc.), searching each line for one of them, then for the parentheses, and then checking for the absence of a semicolon (so I know it isn't just a declaration of the function).

So my question is: is this how I should go about it, or are there other methods you would recommend?

The code I will be counting is C++.


Solution:1

Three approaches come to mind.

  1. Use regular expressions. This is fairly similar to what you're thinking of. Look for lines that look like function definitions. This is fairly quick to do, but can go wrong in many ways.

    char *s = "int main() {"  

    is not a function definition, but sure looks like one.

    char
    * /* eh? */
    s
    (
    int /* comment? // */ a
    )
    // hello, world /* of confusion
    {

    is a function definition, but doesn't look like one.

    Good: quick to write, can work even in the face of syntax errors; bad: can easily misfire on things that look like (or fail to look like) the "normal" case.

    Variant: First run the code through, e.g., GNU indent. This will take care of some (but not all) of the misfires.

  2. Use a proper lexer and parser. This is a much more thorough approach, but you may be able to re-use an open source lexer/parser (e.g., from gcc).

    Good: Will be 100% accurate (will never misfire). Bad: One missing semicolon and it spews errors.

  3. See if your compiler has some debug output that might help. This is a variant of (2), but using your compiler's lexer/parser instead of your own.


Solution:2

Your idea can work in 99% (or more) of the cases. Only a real C++ compiler can do 100%, in which case I'd compile in debug mode (g++ -g -S prog.cpp; the -g flag is what makes the compiler emit debug information), and get the function names and line numbers from the debug information of the assembly output (prog.s).

My thoughts for the 99% solution:

  • Ignore comments and strings.
  • Document that you ignore preprocessor directives (#include, #define, #if).
  • Anything between a toplevel { and } is a function body, except after typedef, class, struct, union, namespace and enum.
  • If you have a class, struct or union, you should be looking for method bodies inside it.
  • The function name is sometimes tricky to find, e.g. in long (*f(int))(char); (a function f that returns a pointer to a function).
  • Make sure your parser works with template functions and template classes.


Solution:3

For recording function names I use PCRE and the regex

"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"  

and then filter out names like "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course it's not a perfect solution but a good one.


Solution:4

I feel that doing the parsing manually is going to be quite a difficult task. I would probably use an existing tool such as RSM, redirect the output to a CSV file (assuming you are on Windows), and then parse the CSV file to gather the required information.


Solution:5

Find a decent SLOC count program, e.g., SLOCCounter. Not only can you count SLOC, but you have something against which to compare your results. (Update: here's a long list of them.)

Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.


Solution:6

How about writing a shell script to do this? An AWK program perhaps.

