Problem solutions and solution descriptions

Multithreading • Block matrices • Comparing algorithms

Consider a modification of Strassen’s algorithm for square matrix multiplication with a *smaller*
number of summations between blocks than in the ordinary algorithm — 15 instead of 18 — and the
same number of multiplications as in the ordinary algorithm — 7. We will use Java Streams.

Recursive partitioning of matrices into blocks during multiplication pays off only up to a certain limit, after which it stops being worthwhile, since Strassen’s algorithm makes poor use of the cache of the execution environment. Therefore, for small blocks we will use a parallel version of nested loops, and for large blocks we will perform the recursive partitioning in parallel.

We determine the boundary between the two algorithms experimentally — we tune it to the cache of the execution environment. The benefit of Strassen’s algorithm becomes more evident on sizable matrices — the gap relative to the nested-loop algorithm grows and depends on the execution environment. Let’s compare the running time of the two algorithms.
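The recursive scheme described above can be sketched as follows. This is a minimal illustration, not the optimized 15-summation variant from the text: it uses the classic seven products with 18 block summations, assumes power-of-two sizes (padding is omitted), and the `LEAF` threshold of 64 is a placeholder to be tuned experimentally; all class and method names are illustrative.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class Strassen {
    // Threshold below which we fall back to parallel nested loops;
    // in practice it is tuned experimentally to the cache size.
    static final int LEAF = 64;

    // Base case: cache-friendly i-k-j loops, rows processed in parallel.
    static double[][] naive(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) c[i][j] += aik * b[k][j];
            }
        });
        return c;
    }

    static double[][] add(double[][] x, double[][] y) {
        int n = x.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) r[i][j] = x[i][j] + y[i][j];
        return r;
    }

    static double[][] sub(double[][] x, double[][] y) {
        int n = x.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) r[i][j] = x[i][j] - y[i][j];
        return r;
    }

    // Copy an h×h block starting at (row, col).
    static double[][] block(double[][] m, int row, int col, int h) {
        double[][] r = new double[h][h];
        for (int i = 0; i < h; i++) System.arraycopy(m[row + i], col, r[i], 0, h);
        return r;
    }

    // Write an h×h block into the result at (row, col).
    static void put(double[][] dst, double[][] src, int row, int col) {
        for (int i = 0; i < src.length; i++)
            System.arraycopy(src[i], 0, dst[row + i], col, src.length);
    }

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        if (n <= LEAF) return naive(a, b);     // small blocks: parallel loops
        int h = n / 2;
        double[][] a11 = block(a, 0, 0, h), a12 = block(a, 0, h, h),
                   a21 = block(a, h, 0, h), a22 = block(a, h, h, h);
        double[][] b11 = block(b, 0, 0, h), b12 = block(b, 0, h, h),
                   b21 = block(b, h, 0, h), b22 = block(b, h, h, h);
        // The seven block products of classic Strassen,
        // computed recursively in a parallel stream.
        double[][][] m = Stream.<double[][][]>of(
                new double[][][]{add(a11, a22), add(b11, b22)},   // M1
                new double[][][]{add(a21, a22), b11},             // M2
                new double[][][]{a11, sub(b12, b22)},             // M3
                new double[][][]{a22, sub(b21, b11)},             // M4
                new double[][][]{add(a11, a12), b22},             // M5
                new double[][][]{sub(a21, a11), add(b11, b12)},   // M6
                new double[][][]{sub(a12, a22), add(b21, b22)})   // M7
            .parallel()
            .map(p -> multiply(p[0], p[1]))
            .toArray(double[][][]::new);
        double[][] c = new double[n][n];
        put(c, add(sub(add(m[0], m[3]), m[4]), m[6]), 0, 0); // C11 = M1+M4-M5+M7
        put(c, add(m[2], m[4]), 0, h);                       // C12 = M3+M5
        put(c, add(m[1], m[3]), h, 0);                       // C21 = M2+M4
        put(c, add(sub(m[0], m[1]), add(m[2], m[5])), h, h); // C22 = M1-M2+M3+M6
        return c;
    }
}
```

The 15-summation variant keeps this recursive skeleton and only replaces the block formulas; the `LEAF` value is exactly the experimental boundary discussed above.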

Matrix multiplication in parallel streams

Multithreading • Nested loops • Comparing algorithms

Consider an algorithm for multiplying rectangular matrices using Java Streams. Let’s take the
*optimized version* of the algorithm with three nested loops and replace the outer loop with
a stream. Let’s compare the running time of the two algorithms.

We process the rows of the resulting matrix in parallel and populate each row layer by layer.
Thanks to the parallel use of the cache of the execution environment on multi-core machines, the
computation time can be reduced more than twofold. To check, let’s take two rectangular
matrices `L×M` and `M×N`.
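The idea can be sketched like this (class and method names are illustrative): the outer loop over the `L` rows becomes a parallel stream, and each row of the result is accumulated layer by layer in `i-k-j` order.

```java
import java.util.stream.IntStream;

public class StreamMultiply {
    // Multiplies an L×M matrix by an M×N matrix.
    // Rows of the result are processed in parallel; within each row the
    // i-k-j order fills the row layer by layer, which is cache-friendly.
    static double[][] multiply(double[][] a, double[][] b) {
        int l = a.length, m = b.length, n = b[0].length;
        double[][] c = new double[l][n];
        IntStream.range(0, l).parallel().forEach(i -> {
            double[] ci = c[i];               // row of the result
            for (int k = 0; k < m; k++) {
                double aik = a[i][k];
                double[] bk = b[k];           // row of b, read sequentially
                for (int j = 0; j < n; j++) ci[j] += aik * bk[j];
            }
        });
        return c;
    }
}
```

Each stream task touches only its own row of `c`, so no synchronization is needed between the parallel workers.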

Transpose • Comparing algorithms

Consider the algorithm for rotating a matrix 180 degrees. Unlike the *transpose* algorithm,
here the rows and columns of the resulting matrix are not swapped but mirrored.

Let’s write a method in Java to rotate an `m×n` matrix 180 degrees. As an example,
let’s take a rectangular `4×3` matrix.
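A minimal sketch of such a method (names are illustrative): both coordinates are mirrored, so the element at `(i, j)` comes from `(m−1−i, n−1−j)`, and the dimensions stay `m×n`.

```java
public class Rotate180 {
    // Rotates an m×n matrix 180 degrees: r[i][j] = a[m-1-i][n-1-j].
    static int[][] rotate180(int[][] a) {
        int m = a.length, n = a[0].length;
        int[][] r = new int[m][n];            // dimensions are unchanged
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                r[i][j] = a[m - 1 - i][n - 1 - j];
        return r;
    }
}
```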

Transpose • Comparing algorithms

Consider the algorithm for rotating a matrix 90 degrees clockwise and anticlockwise. This
algorithm is similar to *matrix transpose*, with the difference that here, for each element,
one of the coordinates is mirrored.

Let’s write a method in Java to rotate an `m×n` matrix 90 degrees. An additional parameter
sets the direction of rotation: clockwise or anticlockwise. As an example, let’s take a
rectangular `4×3` matrix.
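A minimal sketch of such a method (names are illustrative): as in a transpose, the result is `n×m`, but one coordinate is mirrored — which one depends on the direction.

```java
public class Rotate90 {
    // Rotates an m×n matrix 90 degrees; the boolean selects the direction.
    // Clockwise:     r[j][m-1-i] = a[i][j]  (row index is mirrored)
    // Anticlockwise: r[n-1-j][i] = a[i][j]  (column index is mirrored)
    static int[][] rotate90(int[][] a, boolean clockwise) {
        int m = a.length, n = a[0].length;
        int[][] r = new int[n][m];            // dimensions swap, as in transpose
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                if (clockwise) r[j][m - 1 - i] = a[i][j];
                else           r[n - 1 - j][i] = a[i][j];
        return r;
    }
}
```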

Optimizing matrix multiplication

Permutations • Nested loops • Comparing algorithms

Consider an algorithm for multiplying matrices using three nested loops. By definition, the
complexity of such an algorithm should be `O(n³)`, but there are particularities related to the
execution environment — the speed of the algorithm depends on the order in which the loops
are nested.

Let’s compare different permutations of nested loops and the execution time of the algorithms.
Let’s take two matrices, `L×M` and `M×N` → three loops → six permutations:
`LMN`, `LNM`, `MLN`, `MNL`, `NLM`, `NML`.

The algorithms that run faster than the others are those that write data to the resulting matrix
*row-wise in layers*: `LMN` and `MLN` — the percentage difference from the other algorithms is
substantial and depends on the execution environment.
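Two of the six permutations can be sketched like this (names are illustrative): `LMN` — `i` over `L`, then `k` over `M`, then `j` over `N` — writes the result row-wise in layers, while `NML` strides down columns of `a` and `c`; a rough `System.nanoTime()` helper is included for comparing them, assuming warmed-up runs.

```java
public class LoopOrder {
    // LMN order: fills each row of c layer by layer — sequential memory access.
    static double[][] multiplyLMN(double[][] a, double[][] b) {
        int l = a.length, m = b.length, n = b[0].length;
        double[][] c = new double[l][n];
        for (int i = 0; i < l; i++)
            for (int k = 0; k < m; k++)
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // NML order: the innermost loop jumps between rows of a and c —
    // a new cache line on almost every step.
    static double[][] multiplyNML(double[][] a, double[][] b) {
        int l = a.length, m = b.length, n = b[0].length;
        double[][] c = new double[l][n];
        for (int j = 0; j < n; j++)
            for (int k = 0; k < m; k++)
                for (int i = 0; i < l; i++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Rough timing in nanoseconds; for fair numbers, warm up the JIT first.
    static long time(Runnable r) {
        long t = System.nanoTime();
        r.run();
        return System.nanoTime() - t;
    }
}
```

Both methods compute the same product; only the traversal order — and therefore the cache behaviour — differs.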

© Golovin G.G., Code with comments, translation from Russian, 2024