Tutorial :How can I implement an algorithm for multicore in Java?



Question:

Modern computers have more and more cores. We want to change our current linear algorithm to use these cores.

A splitting of any algorithm to use different threads only makes sense if there is a free processor.

Are there any good libraries that can help to parallelize some steps if there are free processors?

I will give some examples.

  • If there is only one processor it makes no sense to create multiple threads. It will reduce the speed.
  • If there run 2 processes (requests on a server) on a core duo it make also no sense to start threads.
  • If there only one process on a core duo it make sense.

The abstract algorithm has 4 steps A, B, C and D. Steps A, B and C can execute parallel. Step D needs the results from A, B and C.

Edit: I means an mathematic algorithm. No IO, No events, etc


Solution:1

I think you need a ConcurrentContext from Javolution. See at http://javolution.org/target/site/apidocs/javolution/context/ConcurrentContext.html


Solution:2

This isn't necessarily true.

Depending on the algorithm, it often makes sense to split it into multiple threads even if there is only a single core available. If there is any waiting on sockets, IO, etc, you can get benefits from this. If there are 2 processes, the "other" process may not be using 100% of the other core, so threading can help here. Trust your OS in this case to handle it correctly.

You can always check the processor count with Runtime.availableProcessors() to determine how to split it into separate threads. Alternatively, you can use a threadpool, which should scale correctly with more processors.

In general, though, I would design your algorithm to use more than one processor if the algorithm makes sense to parallelize. Most systems will have more cores/processors available, and you can always tweak your implementation later if you find it needs it. If the process is long running, the overhead of generating the thread will be worth it - if it's already fast, it may be more worthwhile looking at other places to optimize.


Solution:3

Look at the various concurrent classes in Java 5 and onwards. You most likely want a ThreadPoolExecutor - http://java.sun.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html.

The appropriate value of the ThreadPool will most likely vary from system to system depending on work load and hardware architechture. Make it user adjustable.


Solution:4

For some ideas have a look at JSR166 and JSR166y (something like fork-join system with work stealing (166) and parallel array (166y) ).

Some nice reading and overview of the future directions for Java. Looks not that bad (strong support for high level concurrent and parallel programming).


Solution:5

I often have a fixed thread pool which is dynamically the same number of thread as the number of processors (see Runtime) I add tasks to this thread pool so it uses all the processors available.

I don't believe you should try re-inventing the process scheduler in the operating system. It does a good job so let it do what it does well.


Solution:6

Having more threads/processes than cores is not necessarily a bad thing. If your code is strictly mathematical with little I/O and no side effects, then yes, having a 1:1 correspondence between cores and threads is optimal. But this is not usually the case. I/O takes eons compared to clock cycles. Why stall the core completely while waiting on I/O when the OS can swap in another thread to keep chugging along?

The problem is there aren't a lot of languages/compilers that will make the concurrency decision for you. You need to design your program to take advantage of concurrency. And you probably need to design your program for multiple target environment, usually not under your control. So usually, best practice is to make threads for things whatever it makes sense to parallelize and let the thread scheduler handle it. The thread scheduler should be tuned for use on the specific hardware in question far better than you can tune your program for "whatever hardware comes along".


Solution:7

The abstract algorithm has 4 steps A, B, C and D. Steps A, B and C can execute parallel. Step D needs the results from A, B and C.

This is a one-liner using the Ateji PX notation, an extension of the Java language:

[ A(); || B(); || C(); ]; D();  

Your responsability as a programmer is to express where there is a potential for parallel execution, that's the role of the parallel bars "||" in the code. The scheduler now is able to make the best use of available hardware, namely run A, B and C on three different cores whenever available.

This is a very high-level view, more parallelism may possibly be exhibited inside A, B or C.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »