Tutorial: Which is faster in memory, ints or chars? And file-mapping or chunk reading?


Okay, so I've written a (rather unoptimized) program before to encode images to JPEGs. Now, however, I am working with MPEG-2 transport streams and the H.264-encoded video within them. Before I dive into programming all of this, I am curious what the fastest way to deal with the actual file is.

Currently I am file-mapping the .mts file into memory to work on it, although I am not sure if it would be faster to (for example) read 100 MB of the file into memory in chunks and deal with it that way.

These files require a lot of bit-shifting and such to read flags, so I am wondering whether, when I reference some of the memory, it is faster to read 4 bytes at once as an integer or 1 byte as a character. I thought I read somewhere that x86 processors are optimized to a 4-byte granularity, but I'm not sure if this is true...



Memory-mapped files are usually the fastest option available if you need the file's contents synchronously. (There are some asynchronous APIs that allow the O/S to reorder things for a slight speed increase sometimes, but that doesn't sound helpful in your application.)

The main advantage you're getting with the mapped files is that you can work in memory on the file while it is still being read from disk by the O/S, and you don't have to manage your own locking/threaded file reading code.
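To illustrate how little bookkeeping the mapped approach needs, here is a minimal Python sketch. It is not the asker's code: the helper names and the synthetic demo file are made up for illustration. The only format facts it relies on are that transport stream packets are 188 bytes long and begin with the sync byte 0x47.

```python
import mmap
import os
import tempfile

TS_PACKET_SIZE = 188  # MPEG-2 transport stream packets are 188 bytes long
SYNC_BYTE = 0x47      # every packet starts with this byte


def count_synced_packets(path):
    """Map the file read-only and count packets that start with the sync byte."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in as it is touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            good = 0
            last_start = len(view) - TS_PACKET_SIZE
            for offset in range(0, last_start + 1, TS_PACKET_SIZE):
                if view[offset] == SYNC_BYTE:  # indexing a mmap yields an int
                    good += 1
            return good


def make_fake_stream(packet_count):
    """Hypothetical helper: write packets made of a sync byte plus zero padding."""
    packet = bytes([SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)
    fd, path = tempfile.mkstemp(suffix=".mts")
    with os.fdopen(fd, "wb") as f:
        f.write(packet * packet_count)
    return path


demo_path = make_fake_stream(1000)
synced = count_synced_packets(demo_path)
os.remove(demo_path)
print(synced)
```

Note that there is no read loop and no buffer management here: the mapping is indexed like an ordinary byte array.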

As for memory references: on x86, memory is going to be read an entire cache line at a time no matter what you're actually working with. The extra time associated with non-byte-granular operations comes from the fact that integers need not be aligned to a 4-byte boundary. For example, performing an ADD will take more time if its operand isn't aligned on a 4-byte boundary, but for something like a memory copy there will be little difference. If you are working with inherently character data, it's going to be faster to keep it that way than to read everything as integers and bit-shift things around.

If you're doing H.264 or MPEG-2 encoding, the bottleneck is probably going to be CPU time rather than disk I/O in any case.


If you have to access the whole file, it is always faster to read it to memory and do the processing there. Of course, it's also wasting memory, and you have to lock the file somehow so you won't get concurrent access by some other application, but optimization is about compromises anyway. Memory mapping is faster if you're skipping (large) parts of the file, because you don't have to read them at all then.

Yes, accessing memory at 4-byte (or even 8-byte) granularity is faster than accessing it byte-wise. Again it's a compromise - depending on what you have to do with the data afterwards, and how skilled you are at fiddling with the bits in an int, it might not be faster overall.
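To make the trade-off concrete, here is a small Python sketch that decodes the 4-byte transport stream packet header in both styles: once from a single 32-bit integer, and once byte by byte. The function and class names are invented for this example. It says nothing about which variant is faster in your language; it only shows what the bit-fiddling looks like. The stream is big-endian, so on x86 a native 32-bit load would also need a byte swap before shifting.

```python
import struct
from typing import NamedTuple


class TsHeader(NamedTuple):
    sync_byte: int
    transport_error: int
    payload_unit_start: int
    transport_priority: int
    pid: int
    scrambling_control: int
    adaptation_field_control: int
    continuity_counter: int


def parse_header_as_int(packet):
    """Read the header as one big-endian 32-bit integer, then shift and mask."""
    (word,) = struct.unpack_from(">I", packet, 0)
    return TsHeader(
        sync_byte=(word >> 24) & 0xFF,
        transport_error=(word >> 23) & 0x1,
        payload_unit_start=(word >> 22) & 0x1,
        transport_priority=(word >> 21) & 0x1,
        pid=(word >> 8) & 0x1FFF,            # 13-bit packet identifier
        scrambling_control=(word >> 6) & 0x3,
        adaptation_field_control=(word >> 4) & 0x3,
        continuity_counter=word & 0xF,
    )


def parse_header_as_bytes(packet):
    """Decode the same fields one byte at a time."""
    b0, b1, b2, b3 = packet[0], packet[1], packet[2], packet[3]
    return TsHeader(
        sync_byte=b0,
        transport_error=(b1 >> 7) & 0x1,
        payload_unit_start=(b1 >> 6) & 0x1,
        transport_priority=(b1 >> 5) & 0x1,
        pid=((b1 & 0x1F) << 8) | b2,         # PID straddles two bytes
        scrambling_control=(b3 >> 6) & 0x3,
        adaptation_field_control=(b3 >> 4) & 0x3,
        continuity_counter=b3 & 0xF,
    )


sample = bytes([0x47, 0x41, 0x00, 0x1A])  # made-up header for illustration
header = parse_header_as_int(sample)
print(header)
```

The integer version needs one load and handles the PID with a single shift and mask, while the byte version has to stitch the PID together from two bytes. Whether that matters in practice is exactly what needs measuring.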

As for everything regarding optimization:

  1. measure
  2. optimize
  3. measure
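The "measure" steps can be sketched as a tiny harness. This is a hedged Python example rather than a benchmark to trust: the two candidate functions, the 1 MiB chunk size, and the 4 MiB random test file are all arbitrary choices made for the demo, and the file will already be in the O/S cache when it is read.

```python
import mmap
import os
import tempfile
import timeit

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this demo


def checksum_chunked(path):
    """Candidate 1: read the file in fixed-size chunks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            total += sum(chunk)
    return total


def checksum_mapped(path):
    """Candidate 2: map the file and walk over it in slices."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            total = 0
            for start in range(0, len(view), CHUNK_SIZE):
                total += sum(view[start:start + CHUNK_SIZE])
            return total


def best_time(fn, path, repeat=3):
    """Return the fastest of several runs, which is less sensitive to noise."""
    return min(timeit.repeat(lambda: fn(path), number=1, repeat=repeat))


fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(os.urandom(4 * CHUNK_SIZE))

candidates = (checksum_chunked, checksum_mapped)
results = {fn.__name__: best_time(fn, demo_path) for fn in candidates}
same_answer = checksum_chunked(demo_path) == checksum_mapped(demo_path)
os.remove(demo_path)

for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Checking that both candidates return the same answer before comparing their times avoids optimizing something that is quietly wrong. For real numbers, run it against your actual .mts files and your actual parsing work.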


These are sequential bit-streams - you basically consume them one bit at a time without random-access.

You don't need to put a lot of effort into explicitly buffering reads and such in this scenario: the operating system will be buffering them for you anyway. I've written H.264 parsers before, and the time is completely dominated by the decoding and manipulation, not the IO.
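For a sense of what consuming the stream bit by bit looks like, here is a minimal bit reader sketched in Python. The class and method names are my own, and it makes no attempt at speed. The read_ue method decodes the unsigned Exp-Golomb code that H.264 uses for many header fields (the ue(v) descriptor).

```python
class BitReader:
    """Consume a bytes-like object most-significant-bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits from the start of data

    def read_bit(self):
        if self.pos >= len(self.data) * 8:
            raise EOFError("read past end of bit-stream")
        byte = self.data[self.pos >> 3]            # which byte holds this bit
        bit = (byte >> (7 - (self.pos & 7))) & 1   # bit 7 of each byte comes first
        self.pos += 1
        return bit

    def read_bits(self, count):
        """Read count bits as an unsigned integer."""
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        """Decode an unsigned Exp-Golomb code: n zeros, a one, then n more bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


reader = BitReader(bytes([0b10100110, 0b01000000]))
first = reader.read_bits(3)   # bits 101   -> 5
second = reader.read_ue()     # bits 00110 -> 5
third = reader.read_ue()      # bits 010   -> 1
print(first, second, third)
```

A production parser would normally keep a 32- or 64-bit cache word and refill it rather than fetching one bit at a time, but the interface stays the same.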

My recommendation is to use a standard library for parsing these bit-streams.

Flavor is such a parser, and the website even includes examples of MPEG-2 (PS) and various H.264 parts like M-Coder. Flavor builds native parsing code from a C++-like language; here's a quote from the MPEG-2 PS spec:

    class TargetBackgroundGridDescriptor extends BaseProgramDescriptor : unsigned int(8) tag = 7
    {
        unsigned int(14) horizontal_size;
        unsigned int(14) vertical_size;
        unsigned int(4) aspect_ratio_information;
    }

    class VideoWindowDescriptor extends BaseProgramDescriptor : unsigned int(8) tag = 8
    {
        unsigned int(14) horizontal_offset;
        unsigned int(14) vertical_offset;
        unsigned int(4) window_priority;
    }
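To give an idea of what such a description boils down to, here is a hand-written Python sketch for the first class. This is not Flavor's generated output. It assumes the three fields are packed most-significant-bit first in declaration order, and that the tag (plus anything BaseProgramDescriptor reads) has already been consumed, so only the 4-byte body is parsed. The function names and the pack helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetBackgroundGrid:
    horizontal_size: int           # 14 bits
    vertical_size: int             # 14 bits
    aspect_ratio_information: int  # 4 bits


def parse_target_background_grid(body):
    """Unpack the 14 + 14 + 4 = 32 bits declared in the class description."""
    if len(body) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(body[:4], "big")
    return TargetBackgroundGrid(
        horizontal_size=(word >> 18) & 0x3FFF,
        vertical_size=(word >> 4) & 0x3FFF,
        aspect_ratio_information=word & 0xF,
    )


def pack(horizontal, vertical, aspect):
    """Hypothetical inverse, used only to build test input."""
    return ((horizontal << 18) | (vertical << 4) | aspect).to_bytes(4, "big")


grid = parse_target_background_grid(pack(720, 576, 3))
print(grid)
```

The appeal of a generator like Flavor is that the shifts and masks above are derived from the field widths instead of being typed by hand.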


Regarding the best size to read from memory, I'm sure you will enjoy reading this post about memory access performance and cache effects.


One thing to consider about memory-mapping files is that if a file is larger than the available address range, only a portion of it can be mapped at a time. To access the remainder of the file, the first part has to be unmapped and the next part mapped in its place.
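Mapping a file one window at a time might look like the following Python sketch. The for_each_window function and its handler callback are names I made up. The one hard constraint it reflects is that the offset passed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import os
import tempfile

# Offsets passed to mmap must be multiples of this platform-specific value.
GRANULARITY = mmap.ALLOCATIONGRANULARITY


def for_each_window(path, window_size, handler):
    """Map the file one window at a time and call handler(offset, view)."""
    if window_size <= 0 or window_size % GRANULARITY != 0:
        raise ValueError("window_size must be a positive multiple of the granularity")
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < file_size:
            length = min(window_size, file_size - offset)
            # Map only this window; it is unmapped when the with-block exits.
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as view:
                handler(offset, view)
            offset += length


# Demo on a small synthetic file of two and a half windows.
windows = []
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(2 * GRANULARITY + GRANULARITY // 2))

for_each_window(demo_path, GRANULARITY,
                lambda offset, view: windows.append((offset, len(view))))
os.remove(demo_path)
print(windows)
```

A real parser also has to cope with a packet that straddles two windows, for example by carrying the leftover bytes over to the next call.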

Since you're decoding MPEG streams, you may want to use a double-buffered approach with asynchronous file reading. It works like this:

    blocksize = 65536 bytes (or whatever)
    currentblock = new byte [blocksize]
    nextblock = new byte [blocksize]
    read currentblock
    while processing
       asynchronously read nextblock
       parse currentblock
       wait for asynchronous read to complete
       swap nextblock and currentblock
    endwhile
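One possible rendering of that loop in Python uses a single worker thread for the read-ahead. This is a sketch under assumptions: process_double_buffered and parse_block are invented names, a thread pool stands in for a true asynchronous I/O API, and an in-memory stream stands in for the file.

```python
import io
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 65536  # same block size as the pseudocode


def process_double_buffered(stream, parse_block, block_size=BLOCK_SIZE):
    """Parse one block while a worker thread reads the next one."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        current = stream.read(block_size)
        while current:
            # Start reading the next block in the background.
            pending = pool.submit(stream.read, block_size)
            # Parse the current block while that read is in flight.
            parse_block(current)
            # Wait for the read to finish, then swap the buffers.
            current = pending.result()


seen = []
source = io.BytesIO(bytes(range(256)) * 1000)  # 256,000 bytes of stand-in data
process_double_buffered(source, lambda block: seen.append(len(block)))
print(seen)
```

With a real file opened in binary mode, the blocking read releases the interpreter lock, so the read and the parse can genuinely overlap. Only one read is ever outstanding, so the stream is never accessed from two threads at once.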
