Tutorial: What is the fastest way to find out how many non-empty lines are in a file, using Java?



Question:

What is the fastest way to find out how many non-empty lines are in a file, using Java?


Solution:1

I am with Limbic System on the NIO recommendation. I've added an NIO method to Daphna's test code and benchmarked it against his two methods:

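    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileChannel.MapMode;

    public static void timeNioReader() throws IOException {
        long bef = System.currentTimeMillis();

        File file = new File("/Users/stu/test.txt");
        boolean emptyLine = true;
        int     counter = 0;

        // try-with-resources closes the stream and its channel when done
        try (FileInputStream in = new FileInputStream(file);
             FileChannel fc = in.getChannel()) {
            // Map the whole file into memory (a single mapping is limited to 2 GB)
            MappedByteBuffer buf = fc.map(MapMode.READ_ONLY, 0, file.length());

            while (buf.hasRemaining()) {
                byte element = buf.get();

                if (element == '\r' || element == '\n') {
                    // A line break only ends a non-empty line if content was seen
                    if (!emptyLine) {
                        counter += 1;
                        emptyLine = true;
                    }
                } else {
                    emptyLine = false;
                }
            }
        }

        // Count a final line that has no trailing line break
        if (!emptyLine) {
            counter += 1;
        }

        long after = System.currentTimeMillis() - bef;

        System.out.println("timeNioReader      Time: " + after + " Result: " + counter);
    }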

Here are the warmed-up results for an 89 MB file (times in milliseconds):

    timeBufferedReader Time: 947 Result: 747656
    timeFileReader     Time: 670 Result: 747656
    timeNioReader      Time: 251 Result: 747656

NIO is about 2.7x faster than the FileReader version and about 3.8x faster than the BufferedReader version!

With a 6.4 MB file the results are even better, although the warm-up time is much longer.

    // JVM start, warming up
    timeBufferedReader Time: 121 Result: 53404
    timeFileReader     Time: 65 Result: 53404
    timeNioReader      Time: 40 Result: 53404

    // still warming up
    timeBufferedReader Time: 107 Result: 53404
    timeFileReader     Time: 60 Result: 53404
    timeNioReader      Time: 20 Result: 53404

    // ripping along
    timeBufferedReader Time: 79 Result: 53404
    timeFileReader     Time: 56 Result: 53404
    timeNioReader      Time: 16 Result: 53404

Make of it what you will.
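
One caveat with the mapped-buffer method above: it maps the whole file in a single call, and FileChannel.map accepts at most Integer.MAX_VALUE bytes per mapping, so it will not work as written on files larger than about 2 GB. Here is a minimal sketch of a variant that stays with NIO but reads the file in fixed-size chunks, carrying the "is the current line empty" state from one chunk to the next. The class name ChunkedLineCounter, its method names and the 64 KB buffer size are my own choices for illustration, and I have not benchmarked it against the numbers above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original test code.
class ChunkedLineCounter {
    private boolean emptyLine = true; // no content seen yet on the current line
    private int counter = 0;          // non-empty lines terminated so far

    // Consume every remaining byte of one chunk, keeping state between calls.
    void accept(ByteBuffer buf) {
        while (buf.hasRemaining()) {
            byte element = buf.get();
            if (element == '\r' || element == '\n') {
                if (!emptyLine) {
                    counter += 1;
                    emptyLine = true;
                }
            } else {
                emptyLine = false;
            }
        }
    }

    // Total count, including a last line that has no trailing line break.
    int finish() {
        return emptyLine ? counter : counter + 1;
    }

    static int countNonEmptyLines(Path path) throws IOException {
        ChunkedLineCounter lineCounter = new ChunkedLineCounter();
        try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ)) {
            // Buffer size is an arbitrary assumption; tune it on your platform.
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            while (fc.read(buf) != -1) {
                buf.flip();              // switch from filling to draining
                lineCounter.accept(buf);
                buf.clear();             // make the buffer ready for the next read
            }
        }
        return lineCounter.finish();
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file so the sketch runs without any setup.
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, "Hello\n\n\nEvil\n\nWorld".getBytes(StandardCharsets.UTF_8));
            System.out.println("Result: " + countNonEmptyLines(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Like the methods above, this treats both '\r' and '\n' as line breaks, so a "\r\n" pair is not counted as an extra line, and a line containing only spaces still counts as non-empty.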


Solution:2

The easiest way would be to use BufferedReader and check which lines are empty. However, this is relatively slow, because it needs to create a String object for every line in the file. A faster way is to read the file into a char array in chunks using read(), and then iterate through each chunk counting line breaks.

Here's the code for the two options; the second one took about 50% of the time on my machine.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public static void timeBufferedReader() throws IOException {
        long bef = System.currentTimeMillis();
        int counter = 0;

        // The reader buffer size is the same as the array size I use in the other function
        try (BufferedReader reader = new BufferedReader(new FileReader("test.txt"), 1024 * 10)) {
            String line;
            // readLine() returns null at end of file; ready() is not an end-of-file test
            while ((line = reader.readLine()) != null) {
                if (line.length() > 0)
                    counter++;
            }
        }

        long after = System.currentTimeMillis() - bef;

        System.out.println("Time: " + after + " Result: " + counter);
    }

    public static void timeFileReader() throws IOException {
        long bef = System.currentTimeMillis();
        char[] buf = new char[1024 * 10];
        boolean emptyLine = true;
        int     counter = 0;

        try (FileReader reader = new FileReader("test.txt")) {
            int len;
            // read() returns -1 at end of file
            while ((len = reader.read(buf, 0, buf.length)) != -1) {
                for (int i = 0; i < len; i++) {
                    if (buf[i] == '\r' || buf[i] == '\n') {
                        if (!emptyLine) {
                            counter += 1;
                            emptyLine = true;
                        }
                    }
                    else emptyLine = false;
                }
            }
        }

        // Count a final line that has no trailing line break
        if (!emptyLine) counter += 1;

        long after = System.currentTimeMillis() - bef;

        System.out.println("Time: " + after + " Result: " + counter);
    }


Solution:3

The easiest would be with a Scanner (yes, I like verbose code; you can make it physically shorter). Scanner also takes a File, a Reader, etc., so you can pass it whatever you have.

    import java.util.Scanner;

    public class Main
    {
        public static void main(final String[] argv)
        {
            final Scanner scanner;
            final int     lines;

            scanner = new Scanner("Hello\n\n\nEvil\n\nWorld");
            lines   = countLines(scanner);
            System.out.println("lines = "  + lines);
        }

        private static int countLines(final Scanner scanner)
        {
            int lines;

            lines = 0;

            while(scanner.hasNextLine())
            {
                final String line;

                line = scanner.nextLine();

                if(line.length() > 0)
                {
                    lines++;
                }
            }

            return lines;
        }
    }
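
To illustrate the point about Scanner accepting different sources, here is a rough sketch that feeds the same counting loop from a Reader and from a File. The class name ScannerSources and the temporary file are assumptions made only so the example runs on its own; the counting logic is the same as in countLines above, just written more compactly.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
class ScannerSources {
    // Same logic as countLines above: count lines whose length is > 0.
    static int countLines(Scanner scanner) {
        int lines = 0;
        while (scanner.hasNextLine()) {
            if (scanner.nextLine().length() > 0) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "Hello\n\n\nEvil\n\nWorld";

        // Source 1: any Reader (Scanner accepts a Readable)
        try (Scanner fromReader = new Scanner(new StringReader(text))) {
            System.out.println("from Reader: lines = " + countLines(fromReader));
        }

        // Source 2: a File (a temporary one, created just for this demo)
        File tmp = File.createTempFile("lines", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), text.getBytes(StandardCharsets.UTF_8));
        try (Scanner fromFile = new Scanner(tmp, "UTF-8")) {
            System.out.println("from File:   lines = " + countLines(fromFile));
        }
    }
}
```

Closing the Scanner (here via try-with-resources) also closes the underlying file or reader.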


Solution:4

If it really must be the fastest possible, you should look into NIO. And then, test your code on your target platform to see if it's really and truly better using NIO. I was able to get an order of magnitude improvement in some code I was playing with for the Netflix Prize. It involved parsing thousands of files into a more compact, quick-loading binary format. NIO was a big help on my (slow) development laptop.
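
For the "test it on your target platform" part, something along these lines may help: a small timing harness that runs a counting task a few times untimed first, since JVM warm-up can skew the first measurements, and then records each measured run. This is only a sketch. WarmedUpTimer, its method names and the in-memory workload are placeholders I made up for illustration; you would swap in your own file-reading method (wrapping any IOException), and a dedicated benchmarking tool will give more trustworthy numbers.

```java
import java.util.function.IntSupplier;

// Hypothetical timing harness for comparing line-counting methods.
class WarmedUpTimer {
    // Every run stores its result here so the JIT cannot drop the work as dead code.
    static volatile int sink;

    // Runs the task warmupRuns times untimed, then returns the elapsed
    // nanoseconds of each of the measuredRuns timed runs.
    static long[] time(IntSupplier task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.getAsInt();
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            sink = task.getAsInt();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Stand-in workload: counts non-empty lines in an in-memory string.
    static int countNonEmptyLines(String text) {
        boolean emptyLine = true;
        int counter = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                if (!emptyLine) {
                    counter++;
                    emptyLine = true;
                }
            } else {
                emptyLine = false;
            }
        }
        // Count a last line that has no trailing line break
        return emptyLine ? counter : counter + 1;
    }

    public static void main(String[] args) {
        // Build some test input: alternating non-empty and empty lines.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("line ").append(i).append("\n\n");
        }
        String text = sb.toString();

        long[] nanos = time(() -> countNonEmptyLines(text), 5, 3);
        for (long n : nanos) {
            System.out.println("Time: " + (n / 1000) + " us Result: " + sink);
        }
    }
}
```

The numbers of warm-up and measured runs (5 and 3 here) are arbitrary; raise them until the measured times stop drifting.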

