Tutorial: Read a file in chunks in Ruby



Question:

I need to read a file in 1 MB chunks. Is there a cleaner way to do this in Ruby?

    FILENAME = "d:\\tmp\\file.bin"
    MEGABYTE = 1024 * 1024
    size = File.size(FILENAME)

    open(FILENAME, "rb") do |io|
      read = 0
      while read < size
        left = (size - read)
        cur = left < MEGABYTE ? left : MEGABYTE
        data = io.read(cur)
        read += data.size
        puts "READ #{cur} bytes" #yield data
      end
    end


Solution:1

Adapted from the Ruby Cookbook page 204:

    FILENAME = "d:\\tmp\\file.bin"
    MEGABYTE = 1024 * 1024

    class File
      def each_chunk(chunk_size = MEGABYTE)
        yield read(chunk_size) until eof?
      end
    end

    open(FILENAME, "rb") do |f|
      f.each_chunk { |chunk| puts chunk }
    end

Disclaimer: I'm a Ruby newbie and haven't tested this.
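
Since the answer above is untested, a quick check along these lines may help. This is only a sketch: the scratch file, the 2500-byte sample and the helper name chunk_sizes_of are assumptions made for the demo, not part of the original answer.

```ruby
require "tempfile"

# Same patch as in the answer above, with the default chunk size inlined.
class File
  def each_chunk(chunk_size = 1024 * 1024)
    yield read(chunk_size) until eof?
  end
end

# Hypothetical helper for this check: returns the byte size of every chunk.
def chunk_sizes_of(path, chunk_size)
  sizes = []
  File.open(path, "rb") do |f|
    f.each_chunk(chunk_size) { |chunk| sizes << chunk.bytesize }
  end
  sizes
end

Tempfile.create("chunk_demo") do |tmp|
  tmp.binmode
  tmp.write("x" * 2500)              # 2500 bytes of sample data
  tmp.flush
  p chunk_sizes_of(tmp.path, 1024)   # => [1024, 1024, 452]
end
```

Because the `until eof?` condition is checked before each read, an empty file yields nothing and the block never receives nil.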


Solution:2

Alternatively, if you don't want to monkeypatch File:

    until my_file.eof?
      do_something_with( my_file.read( bytes ) )
    end

For example, streaming an uploaded tempfile into a new file:

    # tempfile is a File instance
    File.open( new_file, 'wb' ) do |f|
      # Read in small 64 KiB (65,536-byte) chunks to limit memory usage
      f.write(tempfile.read(2**16)) until tempfile.eof?
    end
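
To try the streaming copy without a real upload, something like the following should work. It is a minimal sketch: copy_in_chunks and CHUNK_SIZE are hypothetical names, and two scratch Tempfiles stand in for the uploaded file and the destination.

```ruby
require "tempfile"

CHUNK_SIZE = 2**16 # 65,536 bytes (64 KiB)

# Hypothetical helper: copies whatever is left in `source` to `dest_path`,
# holding at most one chunk in memory at a time.
def copy_in_chunks(source, dest_path, chunk_size = CHUNK_SIZE)
  File.open(dest_path, "wb") do |dest|
    dest.write(source.read(chunk_size)) until source.eof?
  end
end

Tempfile.create("upload") do |source|
  source.binmode
  source.write("a" * 200_000)
  source.rewind                   # start reading from the beginning

  Tempfile.create("copy") do |dest|
    copy_in_chunks(source, dest.path)
    puts File.size(dest.path)     # should match the 200_000 bytes written above
  end
end
```

If the goal is only to copy one stream into another, the built-in IO.copy_stream(src, dst) may be enough and avoids the explicit loop.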


Solution:3

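You can use IO#each(sep, limit) with sep set to nil, so that limit alone determines the chunk size (an empty string would instead switch to paragraph mode and split on blank lines). For example: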

    chunk_size = 1024
    # The block form of File.open closes the file when the block finishes
    File.open('/path/to/file.txt') do |file|
      file.each(nil, chunk_size) do |chunk|
        puts chunk
      end
    end
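
To see what each(nil, limit) actually yields, a small experiment like this can help. It is a sketch under assumptions: the helper name each_limit_sizes and the scratch file are made up for the demo, and the file is opened in binary mode so that limit is counted in plain bytes.

```ruby
require "tempfile"

# Hypothetical helper: collects the byte size of each chunk yielded by IO#each.
def each_limit_sizes(path, chunk_size)
  File.open(path, "rb") do |file|
    # Without a block, each returns an Enumerator we can map over.
    file.each(nil, chunk_size).map(&:bytesize)
  end
end

Tempfile.create("each_demo") do |tmp|
  tmp.binmode
  tmp.write("x" * 2500)
  tmp.flush
  p each_limit_sizes(tmp.path, 1024)   # chunks of at most 1024 bytes each
end
```

In text mode with a multibyte encoding, Ruby may read a few extra bytes so that a character is not split in the middle, so chunk sizes there are not guaranteed to be exact.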


Solution:4

If you check the Ruby docs for IO (http://ruby-doc.org/core-2.2.2/IO.html), there's an example that reads a file line by line:

    IO.foreach("testfile") {|x| print "GOT ", x }

The only caveat: since this process can read the temp file faster than the stream that generates it, IMO a delay should be added.

    IO.foreach("/tmp/streamfile") {|line|
      ParseLine.parse(line)
      sleep 0.3 # pause, as this process will discontinue if it doesn't allow some buffering
    }
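
Note that IO.foreach splits on the line separator rather than on a byte count, so it suits line-oriented text more than fixed-size binary chunks. A small sketch of what it yields (lines_of and the sample file are assumptions made for the demo):

```ruby
require "tempfile"

# Hypothetical helper: returns everything IO.foreach yields for a file.
def lines_of(path)
  # Without a block, IO.foreach returns an Enumerator.
  IO.foreach(path).to_a
end

Tempfile.create("lines_demo") do |tmp|
  tmp.write("first\nsecond\nthird\n")
  tmp.flush
  p lines_of(tmp.path)   # => ["first\n", "second\n", "third\n"]
end
```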


Solution:5

    FILENAME = "d:/tmp/file.bin"

    class File
      MEGABYTE = 1024 * 1024

      def each_chunk(chunk_size = MEGABYTE)
        yield self.read(chunk_size) until self.eof?
      end
    end

    open(FILENAME, "rb") do |f|
      f.each_chunk {|chunk| puts chunk }
    end

It works, mbarkhau. I just moved the constant definition to the File class and added a couple of "self"s for clarity's sake.
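
One possible refinement, not part of the answers above: returning an Enumerator when no block is given lets each_chunk combine with map, with_index and friends. A sketch, assuming the same File patch and a scratch file for the demo:

```ruby
require "tempfile"

class File
  MEGABYTE = 1024 * 1024

  def each_chunk(chunk_size = MEGABYTE)
    # Assumption for this variant: hand back an Enumerator if no block is given.
    return enum_for(:each_chunk, chunk_size) unless block_given?

    yield read(chunk_size) until eof?
  end
end

Tempfile.create("enum_demo") do |tmp|
  tmp.binmode
  tmp.write("x" * 2500)
  tmp.flush

  File.open(tmp.path, "rb") do |f|
    f.each_chunk(1024).with_index do |chunk, i|
      puts "chunk #{i}: #{chunk.bytesize} bytes"
    end
  end
end
```

The Enumerator reads lazily from the open file, so it has to be consumed inside the File.open block, before the file is closed.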

