Parallel Loops

Loops are generally good candidates for parallelization, because they consist of many small computations that together consume a lot of CPU time and are therefore worth speeding up. But be aware that a parallelized loop executes its body out of order, because of the way the Parallel methods fork and join the tasks. The question is: does order matter for your loop/algorithm? If so, parallelization is off the table.

Parallel.For

The first of the parallel loops from TPL/Pfx is Parallel.For, which is simply a parallel version of the standard .NET for loop. So instead of

for (int i = 0; i < length; i++)
{
     // something where order does not matter
}

you will write

Parallel.For(0, length, i =>
{
    // same computation as above, yet in parallel
});

Parallel.For does not know the ideal number of Tasks to create up front; internally it more or less guesses. If individual iterations take longer to compute, more and more Tasks are added to the execution of the Parallel.For. This is not always optimal, but it usually provides a respectable speedup.
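If the heuristic works against you, you can cap the number of concurrent workers yourself via ParallelOptions. A minimal sketch (the bound of 4 is an arbitrary value for illustration):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        // At most 4 iterations run concurrently.
        Parallel.For(0, 100, options, i =>
        {
            Console.WriteLine(i);
        });
    }
}
```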

Parallel.ForEach

Parallel.ForEach is the parallel equivalent of the synchronous foreach loop. It works by consuming an IEnumerable<T> and farming out the elements pulled through the enumerator to multiple tasks, thus processing the elements in parallel.

IEnumerable<int> items = Enumerable.Range(0, 20);

Parallel.ForEach(items, item =>
{
    Console.WriteLine(item);
});

Parallel.ForEach can be applied to anything that implements IEnumerable<T>. A drawback of Parallel.For is that it can only increment by one and only works with int or long. This can obviously be worked around, but a custom iterator combined with Parallel.ForEach is much more approachable.

Parallel.ForEach(MyIterator(1, 2, 0.1), Console.WriteLine);

private static IEnumerable<double> MyIterator(double start, double end, double step)
{
    for (double current = start; current < end; current += step)
    {
         yield return current;
    }
}

Because Parallel.ForEach consumes an IEnumerable<T> and does not know up front how many items will be yielded, it needs to apply some sort of partitioning strategy to obtain chunks of the data set to farm out for parallel execution.

To aid this process, the best approach is to provide an IList<T> or an array to Parallel.ForEach, because internally it uses these types to determine an appropriate chunk size. Alternatively, you can implement your own partitioner and pass it to Parallel.ForEach.
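As a sketch of the second option, System.Collections.Concurrent ships a Partitioner factory that creates range partitions over an index space, so each Task receives a whole chunk of indices instead of single elements (the Math.Sqrt work is a stand-in for any cheap per-element computation):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var results = new double[10_000];

        // Partitioner.Create(from, to) yields (fromInclusive, toExclusive) ranges.
        Parallel.ForEach(Partitioner.Create(0, results.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                results[i] = Math.Sqrt(i); // cheap work, batched per range
            }
        });

        Console.WriteLine(results[9_999]);
    }
}
```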

ParallelLoopState

Because the body of a parallel loop is represented by a delegate, there is no break statement to stop the loop from executing. This can be overcome with an overload that provides a ParallelLoopState object.

This works for both Parallel.For and Parallel.ForEach. The ParallelLoopState object has a Break() method that emulates the break statement and tries to recreate the semantics of a regular loop: all iterations before the one that called Break() are still completed. It is obviously not an exact match, but it comes as close as it gets.

If you want to stop without completing the earlier iterations, use the Stop() method instead.

Parallel.For(0, 20, (i, loopState) =>
{
     if (i > 12)
     {
         loopState.Break(); // or use Stop()
         return;
     }
});

ParallelLoopState also allows you to observe the loop state and determine whether the loop should terminate early.

  • IsExceptional –> indicates that another Task threw an unhandled exception
  • IsStopped –> indicates that a Task called Stop()
  • ShouldExitCurrentIteration –> indicates that Break() or Stop() was called
  • LowestBreakIteration –> a long? holding the iteration in which Break() was called, or null when Break() was not called
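These flags are meant to be checked inside the loop body. A minimal sketch of that pattern, using LowestBreakIteration so that iterations below the break point still run while later ones bail out early (the break at 100 is an arbitrary illustration value):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        Parallel.For(0, 1_000, (i, loopState) =>
        {
            if (i == 100)
            {
                loopState.Break();
                return;
            }

            // Skip iterations above the one that called Break().
            if (loopState.LowestBreakIteration is long lowest && i > lowest)
            {
                return;
            }

            Console.WriteLine(i);
        });
    }
}
```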

Nested Loops

Nested loops parallelize more efficiently, because the parallel loop does not need to invoke a delegate for every single iteration. Note that pretty much every loop can be split into two loops.

Parallel.For(0, 10, i =>
{
    for (int j = 0; j < 100; j++)
    {
        Console.WriteLine(i * 100 + j);
    }
});
instead of
Parallel.For(0, 1000, i =>
{
     Console.WriteLine(i);
});

PLINQ

PLINQ is simply the parallel form of LINQ over IEnumerable<T>: add an AsParallel() call to a plain synchronous LINQ statement. One benefit you get is readable, scalable code, because LINQ by nature operates on immutable sequences. So the conversion to PLINQ looks easy, but as always it comes with some caveats.
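A minimal sketch of such a conversion, where the only change from the sequential query is the AsParallel() call (the even-squares computation is made up for illustration):

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var sumOfEvenSquares = Enumerable.Range(0, 1_000)
            .AsParallel()          // the rest of the query now runs in parallel
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .Sum();

        Console.WriteLine(sumOfEvenSquares);
    }
}
```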

For one thing, there is no extra delegate-invocation overhead as with Parallel.For/Parallel.ForEach, because LINQ is built around delegates anyway. Obviously LINQ is not the most performant approach (it was never designed for that), so PLINQ might give you better performance than LINQ, yet for optimum performance you will still need to implement the same loop by hand with for/foreach/while.

Another benefit is that with the functional style of LINQ you are very unlikely to run into race conditions, which makes it much easier for programmers with less experience in asynchronous code.

From a partitioning standpoint, PLINQ also works best with arrays and lists. As stated before, do not simply parallelize everything; benchmark, and implement the best possible single-threaded algorithm first.

Does ordering matter for PLINQ?

As with the parallel loops, the order supplied by the IEnumerable will not be preserved if you run a LINQ statement AsParallel. If the order is important to you, call the AsOrdered() method directly after AsParallel().

var orderedResultFromParallel = myEnumerable
    .AsParallel()
    .AsOrdered()
    .Where(x => x.HasSomeDesiredState());

Obviously this comes at a performance cost, because the partitioner is more constrained in its work.
Another way would be to use an anonymous type in your LINQ statement that has an index property:

var input = new [] { "first", "second", "third", "fourth", "fifth"};

var upperCaseResult = input
    .Select((value, index) => new { Index = index, Value = value })
    .AsParallel()
    .Select(anonymousType => new { Index = anonymousType.Index, Value = anonymousType.Value.ToUpper() });

foreach (var item in upperCaseResult)
{
     Console.WriteLine($"{item.Index}: {item.Value}");
}

You can fine-tune how the query is executed with the following additional methods.

  • WithDegreeOfParallelism() –> determines the number of Tasks used
  • WithExecutionMode() –> e.g. used to force parallel execution; otherwise the runtime might decide it performs better synchronously
  • WithCancellation() –> for early cancellation via a CancellationToken
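A sketch combining all three (the degree of 2 and the five-second timeout are arbitrary illustration values):

```csharp
using System;
using System.Linq;
using System.Threading;

class Program
{
    static void Main()
    {
        // Cancel the query automatically after 5 seconds.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));

        var result = Enumerable.Range(0, 1_000)
            .AsParallel()
            .WithDegreeOfParallelism(2)
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .WithCancellation(cts.Token)
            .Select(n => n * n)
            .ToArray();

        Console.WriteLine(result.Length);
    }
}
```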

If you consume the result of a PLINQ query, try to use ForAll, because it keeps the processing parallel; with foreach, all items need to be merged back onto one thread, which impedes performance.
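A minimal sketch of ForAll — the side effect runs directly on the worker threads, so no merge step is needed:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        Enumerable.Range(0, 20)
            .AsParallel()
            .Where(n => n % 2 == 0)
            .ForAll(n => Console.WriteLine(n)); // output order is not deterministic
    }
}
```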

Classic Map Reduce is well suited for PLINQ.

// Map Reduce under construction
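As a sketch of what that could look like, a word-count query maps lines to words, groups them, and reduces each group to a count (the input lines are made up for illustration):

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        string[] lines = { "the quick brown fox", "the lazy dog", "the end" };

        var wordCounts = lines
            .AsParallel()
            .SelectMany(line => line.Split(' '))                   // map
            .GroupBy(word => word)                                 // group
            .Select(g => new { Word = g.Key, Count = g.Count() }); // reduce

        foreach (var wc in wordCounts)
        {
            Console.WriteLine($"{wc.Word}: {wc.Count}");
        }
    }
}
```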

Summary

Parallel programming is essential to keep up with the ever increasing complexity and huge amounts of data that need to be processed in a performant way.

With TPL and Pfx, .NET offers ways to do this that are quite approachable from an API standpoint.

There is the static Parallel class, which offers Parallel.Invoke(…) and can be supplied with an arbitrary number of computations to run in parallel. For loops you can choose between Parallel.For and Parallel.ForEach, which are the parallel counterparts of their synchronous namesakes. Both use delegate invocation and process iterations in no guaranteed order.

The parallel loops can be controlled with the ParallelLoopState object.

Another way that is even more readable is the use of PLINQ, which is an extension of simple LINQ with a performance boost.

Parallel programming might be the way to go, but before you parallelize something you should always build the best possible single-threaded implementation first, even if it only ends up serving as a benchmark for your performance.

Categories: TPL
