Parallel programming

This post will shed some light on parallel programming in .NET: the reasons why you would want to use it, what it is, and how it is accomplished in .NET with techniques from the TPL (Task Parallel Library).

What this post will cover

  • Why you need parallel programming, what is it, and how it is done
  • Parallel programming in .Net with Pfx
  • PLINQ

What this post will not cover

  • A fine-grained analysis of parallel internals, such as cache invalidation

We will start off with why you need parallel programming.

Why parallel programming

The so-called “free lunch” is over, free lunch in the sense of performance. Until chip manufacturers hit an upper limit, application programmers could simply wait for a newer and faster chip to enter the market instead of optimizing their code for performance. With the arrival of multi-core processors, application programmers need to change their code to make use of these cores; that is, they need to employ parallelization techniques. This is what Pfx (Parallel Framework Extensions) tries to accomplish, and to get widely adopted, its API must be almost as easy to use as writing synchronous code.

With the reasons that drive parallelism in place, let us look at what parallelism exactly is.

What is parallel programming

In general, the need for parallelism stems from the need for real-time answers to real problems in a finite amount of time. With the ever-increasing complexity of the problems to solve, and with ever-growing data sets, you need ever more processing power, in the form of more cores/threads working in parallel.

Is it worth parallelizing everything?

The question is rather: can everything be parallelized? It is essential to understand how much of an algorithm can be parallelized, because that determines the potential speedup.
The reference for this is Amdahl's law, which states that the speedup gained by parallelizing is limited by the sequential fraction of the program. No matter how many processors/cores are used, you cannot be faster than the sequential part.
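To make that concrete, Amdahl's law can be written as speedup(n) = 1 / ((1 − p) + p / n), where p is the parallelizable fraction of the program and n the number of cores. A small sketch (the numbers are just an example, not from any particular benchmark):

```csharp
using System;

class AmdahlDemo
{
    // Amdahl's law: Speedup(n) = 1 / ((1 - p) + p / n),
    // where p is the parallelizable fraction and n the number of cores.
    static double Speedup(double p, int n) => 1.0 / ((1.0 - p) + p / n);

    static void Main()
    {
        // Even with 90% of the work parallelizable, 8 cores give
        // roughly a 4.7x speedup, not 8x; the limit as n grows
        // without bound is 1 / (1 - p) = 10x.
        Console.WriteLine(Speedup(0.9, 8)); // ~4.7
    }
}
```

The takeaway is that the sequential 10% dominates quickly: adding more cores beyond a certain point buys you almost nothing.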

Running things in parallel is no silver bullet; you need to consider the overhead involved. It boils down to this: using x cores for a computation will not make the computation x times faster.

Before parallelizing, you should always first create the best implementation of the algorithm in a single-threaded environment.
Besides not gaining much speed when you parallelize right away, you also get the problems of mutable shared state, such as race conditions.

The best single-threaded version you can come up with can then serve as a benchmark, both performance-wise and consistency-wise.

So now to some categories of parallelization:

Coarse-grained and fine-grained

You can separate parallelism into two categories: coarse-grained parallelism and fine-grained parallelism.

Coarse-grained parallelism means that you start a new thread for a computation and run it on that thread to completion. This is mostly what servers do to handle incoming requests.

Fine-grained parallelism, on the other hand, is used mostly on the client side. With the ever-increasing number of cores that you want to utilize, you could dedicate one thread/core to one specific task, but eventually you would run out of tasks and waste resources. So on the client side it is best to separate the computation/algorithm into parallelizable steps.

Task/Data parallelism

Another notion in parallelism is the difference between task-focused parallelism (not the Task class, but an abstract idea of a computation) and data-focused parallelism.

  • Task-based parallelism: a computation is separated into several tasks that are all part of the algorithm and execute in parallel.
  • Data-based parallelism: each task runs the same logic, but on a different part of the data set.
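As a rough sketch of the two flavors in TPL terms (the method bodies are placeholders, not a real algorithm):

```csharp
using System;
using System.Threading.Tasks;

class ParallelismKinds
{
    static void Main()
    {
        int[] data = { 1, 2, 3, 4 };

        // Task-based parallelism: different parts of the algorithm run in parallel.
        Parallel.Invoke(
            () => Console.WriteLine("compute statistics"),
            () => Console.WriteLine("write report"));

        // Data-based parallelism: the same logic runs over different parts of the data.
        Parallel.ForEach(data, item => Console.WriteLine(item * item));
    }
}
```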

Now that we have a finer perspective on the what and the why, let us look at the how.

How does it work in .NET

The main goal for the parallel API in .NET and the TPL was an easy-to-use API that resembles the synchronous programming style. The main entry point for this in the TPL is the Parallel class and its static methods:

  • Parallel.Invoke
  • Parallel.For
  • Parallel.ForEach

These are high-level abstractions over the world of parallel programming that make your code look quite similar to synchronous code.
All of them apply the Fork/Join pattern, which means that on invocation they create TPL tasks, run them until all are complete, and then return.
You can control the parallel execution with the following ParallelOptions:

  • The maximum number of tasks to be used (MaxDegreeOfParallelism)
  • A CancellationToken
  • Which TaskScheduler to use

Limiting the maximum number of tasks can make sense when you know the exact hardware specification you are optimizing for, so you can make use of all cores/tasks.

Cancellation, when signaled, stops the creation of new tasks; the ones already started will finish, though.

As the task scheduler you can supply your own implementation, giving you fine-grained control over when which task will run, with which priority, and so on.
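Put together, configuring these options could look like this (a minimal sketch; the values are just examples):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class OptionsDemo
{
    static void Main()
    {
        using var cts = new CancellationTokenSource();

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount, // cap the number of concurrent tasks
            CancellationToken = cts.Token,                       // allow cooperative cancellation
            TaskScheduler = TaskScheduler.Default                // or your own scheduler implementation
        };

        Parallel.For(0, 10, options, i => Console.WriteLine(i));
    }
}
```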

Parallel.Invoke

The first method we will look at helps to run multiple blocks of code in parallel by the use of Parallel.Invoke.
Consider the calculation of a sum with three independent computations, which can therefore be parallelized in this way:
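A minimal sketch of what this could look like (the ranges and the SumRange helper are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;

class InvokeDemo
{
    static void Main()
    {
        long part1 = 0, part2 = 0, part3 = 0;

        // Three independent computations, each writing to its own variable,
        // so there is no mutable shared state between them.
        Parallel.Invoke(
            () => part1 = SumRange(1, 1_000_000),
            () => part2 = SumRange(1_000_000, 2_000_000),
            () => part3 = SumRange(2_000_000, 3_000_000));

        // Parallel.Invoke has returned, so all three parts are complete.
        Console.WriteLine(part1 + part2 + part3);
    }

    static long SumRange(int fromInclusive, int toExclusive)
    {
        long sum = 0;
        for (int i = fromInclusive; i < toExclusive; i++) sum += i;
        return sum;
    }
}
```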

From my point of view that seems pretty close to synchronous code.

The issue with Parallel.Invoke is that you cannot supply a timeout for it, which, as you might remember, is something you should always do when you run something asynchronously (which is what happens with parallel programming). Also, it cannot abort tasks that are already running; it only stops spawning new ones. So, to avoid code that might wait indefinitely, you can use a CancellationToken instead:
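A sketch of Parallel.Invoke with a CancellationToken (the one-second timeout and the LongRunning helper are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class CancelDemo
{
    static void Main()
    {
        // Request cancellation automatically after one second
        // instead of potentially waiting forever.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(1));
        var options = new ParallelOptions { CancellationToken = cts.Token };

        try
        {
            Parallel.Invoke(options,
                () => LongRunning(cts.Token),
                () => LongRunning(cts.Token));
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Cancellation was requested.");
        }
    }

    static void LongRunning(CancellationToken token)
    {
        while (true)
        {
            token.ThrowIfCancellationRequested(); // cooperative cancellation
            Thread.Sleep(100);                    // simulate work
        }
    }
}
```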

Since Parallel.Invoke by its nature will still wait when cancellation is requested, and because cancellation is cooperative, you need to let your actions handle cancellation themselves.

Error handling

To handle errors, you simply wrap your call to Parallel.Invoke in a try/catch block:
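For example (the throwing actions are illustrative); note that Parallel.Invoke collects the exceptions thrown by its actions into a single AggregateException:

```csharp
using System;
using System.Threading.Tasks;

class ErrorDemo
{
    static void Main()
    {
        try
        {
            Parallel.Invoke(
                () => throw new InvalidOperationException("first failure"),
                () => throw new InvalidOperationException("second failure"));
        }
        catch (AggregateException ex)
        {
            // All exceptions from the parallel actions arrive here,
            // wrapped in one AggregateException.
            foreach (var inner in ex.InnerExceptions)
                Console.WriteLine(inner.Message);
        }
    }
}
```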

Summary

Parallel.Invoke is easy to use, but it abstracts away the underlying Task nature, which may make it harder to identify a potential problem.

Next we will look at how you can parallelize loops with Pfx.
