High Performance Linear Algebra
This course covers topics in linear algebra, focusing on how to implement them on modern computer architectures and clusters. It addresses both the theoretical aspects underlying the routines we want to implement and the implementation aspects needed to achieve high performance, while making precise what we actually mean by "high performance".
Main topics
- Vector and Matrix Products
  - Inner product
  - Outer product
  - Matrix-vector product
  - Matrix-matrix product
  - Row, column, and submatrix partitioning
- LU Factorization
  - ijk forms of Gaussian elimination
  - Row, column, and submatrix partitioning
  - Partial pivoting and alternatives
- Cholesky Factorization
  - ijk forms of Cholesky factorization
  - Memory access patterns
  - Data dependences
- Triangular, Band, and Tridiagonal Systems
  - Row vs. column partitioning
  - Fan-in and fan-out algorithms
  - Wavefront algorithms
  - Cyclic algorithms
  - Cyclic reduction
- Sparse BLAS
  - Matrix-vector product
  - Matrix-matrix product
  - Matrix storage formats
  - Implementation of Krylov iterative solvers
- Distributed Sparse and Dense BLAS
  - Data distribution
  - Handling of the parallel environment
  - Programming models
- How to write efficient threaded and GPU-accelerated implementations of the different linear algebra routines
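As a taste of the "ijk forms" theme that recurs throughout the course, here is a minimal Python sketch (function names are illustrative; the course itself works with compiled, parallel code). The two functions compute the same matrix-matrix product, but reorder the loops: with row-major storage, the ikj form streams over contiguous rows of B and C in its inner loop, which is the cache-friendlier access pattern.

```python
def matmul_ijk(A, B):
    """Classic ijk ordering: the inner k-loop accumulates the dot product
    of row i of A with column j of B (strided access into B)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_ikj(A, B):
    """ikj ordering: the inner j-loop performs a saxpy-style update,
    sweeping contiguously along row k of B and row i of C."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C
```

Both orderings produce identical results; they differ only in memory access pattern, which is exactly what makes one form faster than another on a real memory hierarchy.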
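The "row vs. column partitioning" distinction for triangular systems can also be sketched in a few lines (again, names are illustrative). Both functions solve the lower-triangular system L x = b by forward substitution; the row-oriented form computes each unknown as an inner product, while the column-oriented (saxpy) form broadcasts each computed unknown into the remaining right-hand side, which is the variant underlying fan-out parallel algorithms.

```python
def forward_row(L, b):
    """Row-oriented (inner-product) forward substitution for L x = b."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):          # dot product with already-computed x
            s -= L[i][j] * x[j]
        x[i] = s / L[i][i]
    return x

def forward_col(L, b):
    """Column-oriented (saxpy) forward substitution: once x[j] is known,
    its contribution is subtracted from all remaining entries of b."""
    n = len(b)
    b = list(b)                     # work on a copy; b becomes x in place
    for j in range(n):
        b[j] /= L[j][j]
        for i in range(j + 1, n):
            b[i] -= L[i][j] * b[j]
    return b
```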
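Finally, to illustrate the "matrix storage formats" item under Sparse BLAS, here is a sketch of a sparse matrix-vector product using the standard Compressed Sparse Row (CSR) format: the nonzero values are stored contiguously in `val`, their column indices in `col_ind`, and `row_ptr[i]:row_ptr[i+1]` delimits the nonzeros of row i.

```python
def csr_matvec(val, col_ind, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n = len(row_ptr) - 1            # number of rows
    y = [0.0] * n
    for i in range(n):
        # loop over the nonzeros of row i only
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col_ind[k]]
    return y
```

This kernel touches only the stored nonzeros, which is why storage format choice dominates the performance of the Krylov iterative solvers covered later in the course.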
The final exam consists of the implementation and testing of a scalable linear algebra algorithm, or of its discussion and use within an application.