Multithreading • Block matrices • Comparing algorithms
Consider a modification of Strassen’s algorithm for square matrix multiplication with fewer number of summations between blocks than in the ordinary algorithm — 15 instead of 18 and the same number of multiplications as in the ordinary algorithm — 7. We will use Java Streams.
Recursive partitioning of matrices into blocks during multiplication makes sense up to a certain limit, and then it loses its sense, since the Strassen’s algorithm does not use cache of the execution environment. Therefore, for small blocks we will use a parallel version of nested loops, and for large blocks we will perform recursive partitioning in parallel.
We determine the boundary between the two algorithms experimentally — we adjust it to the cache of the execution environment. The benefit of Strassen’s algorithm becomes more evident on sizable matrices — the difference with the algorithm using nested loops becomes larger and depends on the execution environment. Let’s compare the operating time of two algorithms.
Matrix multiplication in parallel streams
Multithreading • Nested loops • Comparing algorithms
Consider an algorithm for multiplying rectangular matrices using Java Streams. Let’s take the optimized version of the algorithm using three nested loops and replace the outer loop with a stream. Let’s compare the operating time of two algorithms.
We process the rows of the resulting matrix in parallel mode, and populate each row layerwise. Due to
the parallel use of cache of the execution environment on multi-core machines, the computation time can
be reduced by more than two times. To check, let’s take two rectangular matrices {L×M
} and {M×N
}.
Transpose • Comparing algorithms
Consider the algorithm for rotating a matrix 180 degrees. Unlike the transpose algorithm, here in the resulting matrix the rows and columns are not swapped, but are mirrored.
Let’s write a method in Java to rotate a matrix {m×n
} 180 degrees. As an example,
let’s take a rectangular matrix {4×3
}.
Transpose • Comparing algorithms
Consider the algorithm for rotating a matrix 90 degrees clockwise and anticlockwise. This algorithm is similar to matrix transpose, with the difference that here, for each point, one of the coordinates is mirrored.
Let’s write a method in Java to rotate a matrix {m×n
} 90 degrees. The additional parameter
sets the direction of rotation: clockwise or anticlockwise. As an example, let’s take a
rectangular matrix {4×3
}.
Optimizing matrix multiplication
Permutations • Nested loops • Comparing algorithms
Consider an algorithm for multiplying matrices using three nested loops. The complexity of such
an algorithm by definition should be O(n³)
, but there are particularities related to the execution
environment — the speed of the algorithm depends on the sequence in which the loops are executed.
Let’s compare different permutations of nested loops and the execution time of the algorithms.
Let’s take two matrices: {L×M
} and {M×N
} → three loops → six permutations:
LMN
, LNM
, MLN
, MNL
, NLM
, NML
.
The algorithms that work faster than others are those that write data to the resulting matrix
row-wise in layers: LMN
and MLN
, — the percentage difference to other algorithms is substantial
and depends on the execution environment.
© Golovin G.G., Code with comments, translation from Russian, 2025