#### Intel<sup>®</sup> oneAPI Math Kernel Library (oneMKL)



October 2022

Gennady Fedorov, Technical Consulting Engineer, Intel Architecture, Graphics & Software (IAGS)

## Intel® oneAPI Base Toolkit

Core set of frequently used tools and libraries for developing high-performance applications across diverse architectures—CPU, GPU, FPGA.

#### Who Uses It?

- A broad range of developers across industries
- Add-on toolkit users because this is the base for all toolkits

#### **Top Features/Benefits**

- Data Parallel C++ (DPC++) compiler, library, and analysis tools
- DPC++ Compatibility tool helps migrate existing CUDA code
- Python distribution includes accelerated scikit-learn, NumPy, SciPy libraries
- Optimized performance libraries for threading, math, data analytics, deep learning, and video/image/signal processing








# Agenda

- What's New, Domain Area Updates
- BLAS\_64/Lapack\_64 API Extensions
- MKL GPU Verbose mode
- Demo
- References

# What's New for Intel® oneAPI MKL

- Data Parallel C++ (DPC++) APIs maximize performance and cross-architecture portability
- Introduces C and Fortran OpenMP offload for Intel® GPU acceleration
  - Support for Intel® Processor Graphics (GPU): <u>https://software.intel.com/content/www/us/en/develop/articles/oneapi-math-kernel-library-system-requirements.html</u>
- oneAPI MKL Specification: <u>https://spec.oneapi.com/versions/latest/elements/oneMKL/source/domains/domains.html</u>
- oneMKL open-source interfaces: <a href="https://github.com/oneapi-src/oneMKL">https://github.com/oneapi-src/oneMKL</a>
- Intel MKL continues to provide support for the same C and Fortran APIs for CPUs


#### What's Inside Intel® MKL



# Intel® oneAPI Math Kernel Library (oneMKL), cont




#### Intel<sup>®</sup> oneAPI MKL, BLAS update

- BLAS, Netlib interfaces
  - USM and Buffer API: all routines
  - C/Fortran offloading: all routines
- OpenMP offload supports the OpenMP\* 5.1 specification
- BLAS Extensions

BLAS Level 1 Routines and Functions

- cblas\_?asum
- cblas\_?axpy
- cblas\_?copy
- cblas\_?dot
- cblas\_?sdot
- cblas\_?dotc
- cblas\_?dotu
- cblas\_?nrm2
- cblas\_?rot
- cblas\_?rotg
- cblas\_?rotm
- cblas\_?rotmg
- cblas\_?scal
- cblas\_?swap
- cblas\_i?amax
- cblas\_i?amin
- cblas\_?cabs1

BLAS Level 2 Routines

- cblas\_?gbmv
- cblas\_?gemv
- cblas\_?ger
- cblas\_?gerc
- cblas\_?geru
- cblas\_?hbmv
- cblas\_?hemv
- cblas\_?her
- cblas\_?her2
- cblas\_?hpmv
- cblas\_?hpr
- cblas\_?hpr2
- cblas\_?sbmv
- cblas\_?spmv
- cblas\_?spr
- cblas\_?spr2
- cblas\_?symv
- cblas\_?syr
- cblas\_?syr2
- cblas\_?tbmv
- cblas\_?tbsv
- cblas\_?tpmv
- cblas\_?tpsv
- cblas\_?trmv
- cblas\_?trsv

BLAS Level 3 Routines

- cblas\_?gemm
- cblas\_?hemm
- cblas\_?herk
- cblas\_?her2k
- cblas\_?symm
- cblas\_?syrk
- cblas\_?syr2k
- cblas\_?trmm
- cblas\_?trsm

#### Intel® oneAPI MKL, BLAS update, cont.

- BLAS, Netlib interfaces
  - USM and Buffer API: all routines
  - C/Fortran offloading: all routines
- BLAS Extensions

| CPU                                        | OpenMP Offload Intel GPU                   |
|--------------------------------------------|--------------------------------------------|
| {AXPY,GEMM,TRSM}_BATCH (group and strided) | {AXPY,GEMM,TRSM}_BATCH (group and strided) |
| GEMMT, AXPBY, GEMM3M                       | GEMMT                                      |
| Integer GEMM (s8u8)                        | N/A                                        |
| Bfloat16 GEMM                              | N/A                                        |
| JIT GEMM API                               | N/A                                        |
| PACK GEMM API                              | N/A                                        |
| COMPACT GEMM API                           | N/A                                        |
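To make the "strided" batch variant in the table concrete, here is a minimal plain C++ reference of what a strided batched AXPY computes. It is a sketch of the semantics only, not the oneMKL implementation. The function name `axpy_batch_strided_ref` and its vector-based signature are hypothetical, and non-negative increments and strides are assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference semantics (assumption: non-negative incx/incy and strides):
// for every batch b and element i,
//   y[b*stridey + i*incy] += alpha * x[b*stridex + i*incx]
// Hypothetical helper for illustration; not an MKL entry point.
void axpy_batch_strided_ref(std::int64_t n, double alpha,
                            const std::vector<double>& x,
                            std::int64_t incx, std::int64_t stridex,
                            std::vector<double>& y,
                            std::int64_t incy, std::int64_t stridey,
                            std::int64_t batch_size) {
    for (std::int64_t b = 0; b < batch_size; ++b) {
        for (std::int64_t i = 0; i < n; ++i) {
            const auto xi = static_cast<std::size_t>(b * stridex + i * incx);
            const auto yi = static_cast<std::size_t>(b * stridey + i * incy);
            y[yi] += alpha * x[xi];
        }
    }
}
```

With `x = {1, 2, 3, 4}`, `y = {10, 20, 30, 40}`, `n = 2`, `alpha = 2`, unit increments, strides of 2, and a batch size of 2, the two length-2 vectors are updated independently in a single call.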

#### Intel<sup>®</sup> oneAPI MKL, Sparse BLAS update

- Supported API:
  - gemm, gemv, trmv, trsv, symv
  - Buffer and USM API
  - C/C++ OpenMP Offloading (+ sp2m, mkl\_sparse\_?\_mm)
- Improved performance of DPC++ oneapi::mkl::sparse::matmat for small to medium sizes
- Limitations:
  - CSR format only; no plans for CSC, COO, BSR, DIA, or SKY
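Because only the CSR format is supported, it may help to recall what a CSR matrix-vector product looks like. The following is a plain C++ reference sketch of `y = alpha*A*x + beta*y`, not the oneMKL sparse API. The `CsrMatrix` struct and `csr_gemv_ref` function are hypothetical names, and zero-based indexing is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container (zero-based indexing assumed).
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // stored nonzero values
};

// Reference y = alpha * A * x + beta * y for a CSR matrix A.
void csr_gemv_ref(double alpha, const CsrMatrix& a,
                  const std::vector<double>& x,
                  double beta, std::vector<double>& y) {
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = alpha * sum + beta * y[i];
    }
}
```

For the 2x3 matrix `[[1, 0, 2], [0, 3, 0]]`, the CSR arrays are `row_ptr = {0, 2, 3}`, `col_idx = {0, 2, 1}`, and `values = {1, 2, 3}`.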

#### Intel® oneAPI MKL, LAPACK update

DPC++ interfaces for selected routines from the Linear Algebra PACKage:

- Linear Equation Routines for solving, factoring, and inverting: QR, LU, Bunch-Kaufman, Cholesky, GETRI, GETRS, POTRS, TRTRS, ...
- Singular Value and Eigenvalue Problem Routines: GESVD, HEEVD, SYEVD, SYTRD, HEGVD, HETRD, ORGBR, ...
- USM and Buffer API
- C/Fortran Offloading
- OpenMP offload to support the OpenMP\* 5.1 specification

#### Intel<sup>®</sup> oneAPI MKL, LAPACK update, cont.

#### LAPACK Like Extensions:

- Additional routines that extend the functionality of the LAPACK routines. These include routines to compute many independent factorizations, linear equation solutions, and similar operations.
- GEQRF\_BATCH, GETRF\_BATCH, GETRI\_BATCH, GETRS\_BATCH, ORGQR\_BATCH, POTRF\_BATCH, POTRS\_BATCH, UNGQR\_BATCH

- <a href="https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/lapack.html">https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/lapack/lapack.html</a>
- Buffer and USM API.
- C/Fortran Offloading (OpenMP offload to support the OpenMP\* 5.1 specification)

#### Intel® oneAPI MKL, FFT update

- Supported modes:
  - Buffer and USM API
  - The same precision, domains, dimensions\*, placement, batch, layout modes
  - C/Fortran Offloading all FFT API (Level0 or OpenCL backends)

descriptor\_t desc({N2, N1});

desc.set\_value(oneapi::mkl::dft::config\_param::NUMBER\_OF\_TRANSFORMS, BATCH);

desc.set\_value(oneapi::mkl::dft::config\_param::FWD\_DISTANCE, N1\*N2);

desc.set\_value(oneapi::mkl::dft::config\_param::BWD\_DISTANCE, N1\*N2);

desc.set\_value(oneapi::mkl::dft::config\_param::BACKWARD\_SCALE, (1.0/(N1\*N2)));

desc.commit(queue);

\* 1, 2, and 3 dimensions

#### config\_param

enum class config\_param {

FORWARD\_DOMAIN, DIMENSION, LENGTHS, PRECISION,

FORWARD\_SCALE, BACKWARD\_SCALE,

NUMBER\_OF\_TRANSFORMS,

COMPLEX\_STORAGE, REAL\_STORAGE, CONJUGATE\_EVEN\_STORAGE,

PLACEMENT,

INPUT\_STRIDES, OUTPUT\_STRIDES,

FWD\_DISTANCE, BWD\_DISTANCE,

WORKSPACE, ORDERING, TRANSPOSE, PACKED\_FORMAT, COMMIT\_STATUS };


## BLAS\_64/Lapack\_64 API Extensions

- Use BLAS and LAPACK through both the 32-bit (LP64) and 64-bit (ILP64) integer interfaces in the same application
- BLAS\_64 and LAPACK\_64 NetLib interfaces
- Declaration: mkl\_blas\_64.h, mkl\_lapack.h
- Limitations:
  - Intel64 only.
  - no Fortran API at this moment
  - no mkl\_lapacke.h (LAPACKE\_cgetrf(\*....)) available since v.2022 update 2
  - CPU only
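One way to picture mixing the two interfaces in a single program is a small dispatcher that uses the 32-bit path whenever the size fits and falls back to the 64-bit path otherwise. The sketch below uses plain C++ stand-ins: `axpy_lp64`, `axpy_ilp64`, `fits_lp64`, and `axpy_dispatch` are hypothetical names, not MKL functions. In a real application the two stand-ins would presumably be replaced by a regular LP64 routine and its `_64`-suffixed counterpart declared in the headers listed above.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for an LP64 entry point (32-bit integer arguments).
void axpy_lp64(std::int32_t n, double alpha, const double* x, double* y) {
    for (std::int32_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Hypothetical stand-in for the matching _64 entry point (64-bit integers).
void axpy_ilp64(std::int64_t n, double alpha, const double* x, double* y) {
    for (std::int64_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// True when n can be passed through a 32-bit integer argument.
bool fits_lp64(std::int64_t n) {
    return n >= 0 && n <= std::numeric_limits<std::int32_t>::max();
}

// Picks the 32-bit path when possible and the 64-bit path otherwise.
// Returns the integer width that was used (32 or 64).
int axpy_dispatch(std::int64_t n, double alpha, const double* x, double* y) {
    if (fits_lp64(n)) {
        axpy_lp64(static_cast<std::int32_t>(n), alpha, x, y);
        return 32;
    }
    axpy_ilp64(n, alpha, x, y);
    return 64;
}
```

The design point is that both integer widths coexist in one binary, so only the calls that actually need more than 2^31 - 1 elements pay for the wider interface.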



## MKL Verbose mode

- CPU support
  - set/export MKL\_VERBOSE=0|1 (default: 0)
  - mkl\_verbose(int), mkl\_verbose\_output\_file(char\*)
- BLAS, FFT, LAPACK, ScaLAPACK
  - BLAS: excludes JIT, imatcopy, omatcopy, ...
  - ScaLAPACK:
    - P?POTRF, P?TRTRI, PDSYEV{D, R, X} and PZHEEV{D, R, X}.
    - All MPI ranks print MKL\_VERBOSE output.
  - RNG: TBD; VML: no plans
  - SpBLAS, Solvers

# MKL Verbose mode, cont.

#### Examples:

MKL\_VERBOSE oneMKL 2021.0 Update 4 Product build 20210904 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel\_thread

MKL\_VERBOSE **DGEMM**(N,N,1280,1280,0x7ffe1f04eb78,0x2b3062d68080,1280,0x2b30639e9080,1280,0x7ffe1f04eb80,0x2b306466a080,1280) 7.09ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:28

MKL\_VERBOSE FFT(dcbi<mark>5x13x7</mark>,tLim:22,desc:0xeb6d80) 93.11us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:44
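Verbose lines like the ones above follow a regular enough shape that they can be post-processed, for example to total the time spent per routine. The parser below is a minimal sketch based only on the two sample lines shown here. The struct `VerboseCall` and the function `parse_mkl_verbose` are hypothetical names, and the accepted time units (`s`, `ms`, `us`, `ns`) are an assumption rather than a documented list.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>

struct VerboseCall {
    std::string routine;  // for example "DGEMM" or "FFT"
    double seconds;       // elapsed time converted to seconds
    int threads;          // value of the NThr field, or -1 if absent
};

// Parses one "MKL_VERBOSE ROUTINE(args) <time><unit> ..." line.
// Returns std::nullopt for lines that do not match, such as the version banner.
std::optional<VerboseCall> parse_mkl_verbose(const std::string& line) {
    const std::string prefix = "MKL_VERBOSE ";
    if (line.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
    const std::size_t open = line.find('(', prefix.size());
    if (open == std::string::npos) return std::nullopt;
    const std::size_t close = line.find(')', open);
    if (close == std::string::npos) return std::nullopt;

    VerboseCall call;
    call.routine = line.substr(prefix.size(), open - prefix.size());
    // The banner contains spaces before its first '(' and is rejected here.
    if (call.routine.empty() || call.routine.find(' ') != std::string::npos) {
        return std::nullopt;
    }

    std::istringstream rest(line.substr(close + 1));
    std::string token;
    if (!(rest >> token)) return std::nullopt;
    char* end = nullptr;
    const double value = std::strtod(token.c_str(), &end);
    if (end == token.c_str()) return std::nullopt;
    const std::string unit(end);
    double factor = 0.0;
    if (unit == "s") factor = 1.0;
    else if (unit == "ms") factor = 1e-3;
    else if (unit == "us") factor = 1e-6;
    else if (unit == "ns") factor = 1e-9;
    else return std::nullopt;
    call.seconds = value * factor;

    call.threads = -1;
    while (rest >> token) {
        if (token.compare(0, 5, "NThr:") == 0) {
            call.threads = std::atoi(token.c_str() + 5);
        }
    }
    return call;
}
```

Feeding it the output captured with `MKL_VERBOSE=1` line by line yields one record per call, while the banner line is skipped.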

## MKL GPU Verbose mode, cont.

GPU verbose mode extends the existing environment variable and run-time functions.

To change the verbose mode, do one of the following:

- Set the environment variable *MKL\_VERBOSE*:

|                                | CPU Targets        | GPU Targets                               |
|--------------------------------|--------------------|-------------------------------------------|
| (default) Set MKL_VERBOSE to 0 | to disable verbose | to disable verbose                        |
| Set MKL_VERBOSE to 1           | to enable verbose  | to enable verbose without timing          |
| Set MKL_VERBOSE to 2           | to enable verbose  | to enable verbose with synchronous timing |

- Or call the support function mkl\_verbose(int mode):

|                               | CPU Targets        | GPU Targets                               |
|-------------------------------|--------------------|-------------------------------------------|
| (default) Call mkl_verbose(0) | to disable verbose | to disable verbose                        |
| Call mkl_verbose(1)           | to enable verbose  | to enable verbose without timing          |
| Call mkl_verbose(2)           | to enable verbose  | to enable verbose with synchronous timing |

## Intel<sup>®</sup> oneMKL Resources

| Resource                            | Link                                                                                                     |
|-------------------------------------|----------------------------------------------------------------------------------------------------------|
| Intel® oneMKL Product Page          | https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html                               |
| Get Started with Intel® oneMKL      | https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html    |
| Intel® oneMKL Developer Reference   | https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top.html      |
| Intel® oneMKL Developer Guide       | https://www.intel.com/content/www/us/en/develop/documentation/onemkl-windows-developer-guide/top.html    |
| Intel® oneMKL Specification         | https://spec.oneapi.io/versions/latest/elements/oneMKL/source/index.html                                 |
| Intel® oneMKL Open-Source Interface | https://github.com/oneapi-src/oneMKL                                                                     |
| Intel® oneMKL Release Notes         | https://cqpreview.intel.com/content/www/us/en/developer/articles/release-notes/onemkl-release-notes.html |
| Intel® oneMKL Forum                 | https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bd-p/oneapi-math-kernel-library          |

## Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation. Learn more at intel.com or from the OEM or retailer.

Your costs and results may vary.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

**Optimization Notice:** Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804. <a href="https://software.intel.com/en-us/articles/optimization-notice">https://software.intel.com/en-us/articles/optimization-notice</a>

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
