Tutorial: Multiple threads reading from the same file



Question:

My platform is Windows Vista 32-bit, with Visual C++ Express 2008.

For example:

If I have a file that contains 4000 bytes, can I have 4 threads read from the file at the same time, with each thread accessing a different section of the file?

Thread 1 reads bytes 0-999, thread 2 reads bytes 1000-1999, etc.

Please give an example in C.


Solution:1

If the threads only read the file, you don't need to worry about synchronization or race conditions.

Just open the file for shared reading with a separate handle per thread and everything will work (i.e., open the file inside each thread instead of sharing one file handle, since a shared handle also shares a single file pointer).

    #include <stdio.h>
    #include <windows.h>

    DWORD WINAPI mythread(LPVOID param)
    {
        int i = (int)(INT_PTR)param;   /* thread index, passed by value */
        BYTE buf[1000];
        DWORD numread = 0;

        /* Each thread opens its own handle, so each has its own file pointer. */
        HANDLE h = CreateFileA("c:\\test.txt", GENERIC_READ, FILE_SHARE_READ,
            NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        SetFilePointer(h, i * 1000, NULL, FILE_BEGIN);
        if (ReadFile(h, buf, sizeof(buf), &numread, NULL) && numread >= 3)
            printf("buf[%d]: %02X %02X %02X\n", i + 1, buf[0], buf[1], buf[2]);

        CloseHandle(h);
        return 0;
    }

    int main()
    {
        int i;
        HANDLE h[4];

        for (i = 0; i < 4; i++)
            h[i] = CreateThread(NULL, 0, mythread, (LPVOID)(INT_PTR)i, 0, NULL);

        // for (i = 0; i < 4; i++) WaitForSingleObject(h[i], INFINITE);
        WaitForMultipleObjects(4, h, TRUE, INFINITE);

        for (i = 0; i < 4; i++)
            CloseHandle(h[i]);

        return 0;
    }


Solution:2

In all honesty, even writing to the same file is not a big problem.

By far the easiest way is to just memory-map the file. The OS will then give you a void* pointing to where the file is mapped into memory. Cast that to a char*, and make sure that each thread uses a non-overlapping subarray.

    /* myOS_memory_map and myOS_start_thread are placeholders for your
       platform's memory-mapping and thread-creation calls. */
    void foo(char* begin, char* end) { /* .... */ }

    void* base_address = myOS_memory_map("example.binary");
    myOS_start_thread(&foo, (char*)base_address, (char*)base_address + 1000);
    myOS_start_thread(&foo, (char*)base_address + 1000, (char*)base_address + 2000);
    myOS_start_thread(&foo, (char*)base_address + 2000, (char*)base_address + 3000);
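
For a concrete version of the mapping step, here is a minimal sketch using POSIX mmap. That is an assumption on my part, since the placeholders above are OS-neutral; on Windows the corresponding calls would be CreateFileMapping and MapViewOfFile. The helper names map_file_readonly and sum_range are made up for illustration, and sum_range just stands in for whatever work each thread does on its slice. Starting the threads is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only (hypothetical helper, POSIX only).
   Returns NULL on failure or for an empty file; on success stores
   the file size in *size_out. Release with munmap(). */
static const unsigned char *map_file_readonly(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    void *base = MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return base;
}

/* Stand-in for the per-thread work: touches only bytes in [begin, end). */
static unsigned long sum_range(const unsigned char *begin, const unsigned char *end)
{
    unsigned long sum = 0;
    for (const unsigned char *p = begin; p != end; ++p)
        sum += *p;
    return sum;
}
```

Each thread would then be handed one pair such as (base, base + 1000) or (base + 1000, base + 2000), and the mapping should only be released once all threads have finished.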


Solution:3

You can certainly have multiple threads reading from a data structure; race conditions can potentially occur if any writing is taking place.

To avoid such race conditions, you need to define the boundaries within which each thread can read. If you have a fixed number of data segments and a matching number of threads, that is easy.
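
As a rough sketch of what "defining the boundaries" could look like, independent of any threading library: the helper below splits a file of a given size into contiguous, non-overlapping half-open ranges, one per thread. The names byte_range and range_for_thread are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open byte range [begin, end) assigned to one reader thread. */
typedef struct {
    size_t begin;
    size_t end;
} byte_range;

/* Split file_size bytes into thread_count contiguous ranges.
   The first (file_size % thread_count) ranges get one extra byte,
   so every byte belongs to exactly one range. */
static byte_range range_for_thread(size_t file_size, size_t thread_count,
                                   size_t index)
{
    assert(thread_count > 0 && index < thread_count);

    size_t base  = file_size / thread_count;
    size_t extra = file_size % thread_count;

    byte_range r;
    r.begin = index * base + (index < extra ? index : extra);
    r.end   = r.begin + base + (index < extra ? 1 : 0);
    return r;
}
```

For the 4000-byte file and 4 threads from the question, this gives thread 0 the range [0, 1000), thread 1 the range [1000, 2000), and so on.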

As for an example in C you would need to provide some more information, like the threading library you are using. Attempt it first, then we can help you fix any issues.


Solution:4

I don't see any real advantage to doing this. You may have multiple threads reading from the device, but your bottleneck will not be the CPU but rather disk I/O speed.

If you are not careful, you may even slow the process down (but you will need to measure it to know for certain).


Solution:5

Windows supports overlapped I/O, which allows a single thread to asynchronously queue multiple I/O requests for better performance. This could conceivably be used by multiple threads simultaneously as long as the file you are accessing supports seeking (i.e. this is not a pipe).

Passing FILE_FLAG_OVERLAPPED to CreateFile() allows simultaneous reads and writes on the same file handle; otherwise, Windows serializes them. Specify the file offset using the Offset and OffsetHigh members of the OVERLAPPED structure.
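
The offset has to be split because Offset and OffsetHigh are each 32 bits wide. A small sketch of that arithmetic, kept portable by using a stand-in struct instead of the real OVERLAPPED type (offset_parts and split_offset are hypothetical names for illustration only):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the two offset members of OVERLAPPED (both 32-bit DWORDs).
   Hypothetical type, used here so the sketch compiles anywhere. */
typedef struct {
    uint32_t offset_low;    /* would go into OVERLAPPED.Offset     */
    uint32_t offset_high;   /* would go into OVERLAPPED.OffsetHigh */
} offset_parts;

/* Split a 64-bit file offset into its low and high 32-bit halves. */
static offset_parts split_offset(uint64_t file_offset)
{
    offset_parts p;
    p.offset_low  = (uint32_t)(file_offset & 0xFFFFFFFFu);
    p.offset_high = (uint32_t)(file_offset >> 32);

    /* Sanity check: the two halves recombine to the original offset. */
    assert((((uint64_t)p.offset_high << 32) | p.offset_low) == file_offset);
    return p;
}
```

For a thread reading the chunk that starts at byte 1000, the low half is 1000 and the high half is 0; the high half only becomes non-zero for offsets of 4 GB and beyond.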

For more information see Synchronization and Overlapped Input and Output.


Solution:6

The easiest way is to open the file within each parallel instance, opening it as read-only.

The people who say there may be an I/O bottleneck are probably wrong. Any modern operating system caches file reads, which means the first read of a file will be the slowest and any subsequent reads will be much faster. A 4000-byte file can even fit inside the processor's cache.


Solution:7

You shouldn't need to do anything particularly clever if all the threads are doing is reading. You can read the file as many times in parallel as you like, as long as you don't lock it exclusively. Writing is another matter, of course...

I do have to wonder why you'd want to, though. It will likely perform badly, since your HDD will waste a lot of time seeking back and forth rather than reading it all in one (relatively) uninterrupted sweep. For small files (like your 4000-byte example), where that might not be such a problem, it doesn't seem worth the trouble.


Solution:8

It is possible, though I'm not sure it will be worth the effort. Have you considered reading the entire file into memory within a single thread and then allowing multiple threads to access that data?
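
A minimal sketch of the "read it all first" step in standard C, assuming the file is small enough to fit in memory. The helper name read_whole_file is made up for illustration; thread creation is left out, since it depends on the threading API you use.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd buffer (hypothetical helper).
   Returns NULL on failure; on success stores the byte count in
   *size_out. The caller frees the buffer. */
static unsigned char *read_whole_file(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    unsigned char *data = NULL;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        data = malloc(size > 0 ? (size_t)size : 1);
        if (data != NULL && fread(data, 1, (size_t)size, fp) != (size_t)size) {
            free(data);   /* short read: treat as failure */
            data = NULL;
        }
    }
    fclose(fp);

    if (data != NULL)
        *size_out = (size_t)size;
    return data;
}
```

After the load, each thread can be given a pointer into the buffer plus a length (for example buf + 1000 and 1000). Since nobody writes to the buffer, no locking is needed; just don't free it until every thread has been joined.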


Solution:9

Reading: No need to lock the file. Just open the file as read-only or with shared read access.

Writing: Use a mutex to ensure the file is only written to by one thread at a time.


Solution:10

As others have noted already, there is no inherent problem in having multiple threads read from the same file, as long as they each have their own file descriptor/handle. However, I'm a little curious about your motives. Why do you want to read a file in parallel? If you're only reading a file into memory, your bottleneck is likely the disk itself, in which case multiple threads won't help you at all (they'll just clutter your code).

And as always when optimizing, you should not attempt it until (1) you have an easy-to-understand, working solution, and (2) you've measured your code to know where you should optimize.


Solution:11

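    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    using namespace std;

    std::mutex mtx;   // protects cout only

    void worker(int n)
    {
        char memblock[1000];

        // Each thread opens its own stream, so the reads need no lock.
        ifstream file("D:\\test.txt", ios::in | ios::binary);
        if (!file.is_open())
        {
            std::lock_guard<std::mutex> lock(mtx);
            cout << "Unable to open file" << endl;
            return;
        }

        file.seekg(n * 999, ios::beg);
        file.read(memblock, 999);
        memblock[file.gcount()] = '\0';   // terminate after the bytes actually read

        std::lock_guard<std::mutex> lock(mtx);
        cout << memblock << endl;
    }

    int main()
    {
        vector<std::thread> vec;
        for (int i = 0; i < 3; i++)
        {
            vec.push_back(std::thread(&worker, i));
        }

        std::for_each(vec.begin(), vec.end(), [](std::thread& th)
        {
            th.join();
        });
        return 0;
    }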


Solution:12

You need a way to sync those threads. There are different approaches to mutual exclusion: http://en.wikipedia.org/wiki/Mutual_exclusion


Solution:13

He wants to read from a file in different threads. I guess that should be ok if the file is opened as read-only by each thread.

I hope you don't want to do this for performance though, since you will have to scan large parts of the file for newline characters in each thread.
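
To show where that scanning comes from, here is a small sketch that assumes the file is line-oriented and already in memory. A byte offset chosen by simple division usually lands in the middle of a line, so each thread has to move its start forward to the next line boundary. The function name align_to_line_start is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Move pos forward to the start of the next line in buf[0..size),
   unless pos is already at a line start. Returns size if no later
   line begins after pos. */
static size_t align_to_line_start(const char *buf, size_t size, size_t pos)
{
    assert(buf != NULL && pos <= size);

    if (pos == 0 || pos == size)
        return pos;

    /* A position is a line start when the previous byte is a newline. */
    while (pos < size && buf[pos - 1] != '\n')
        ++pos;
    return pos;
}
```

Each thread would apply this to both ends of its nominal range, so that the adjusted ranges still cover the file exactly once. The cost of the scan depends on line length, so it only becomes significant when lines are long.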

