Blog

NCCL Basics

Published:

An introduction to the NVIDIA Collective Communications Library (NCCL) for efficient multi-GPU communication.

Cpp Tricks

Published:

A collection of useful C++ tricks and tips for developers.

ScaLAPACK

Published:

How to use ScaLAPACK for parallel linear algebra computations on distributed-memory systems.

MPI

Published:

Distributed-memory programming with MPI.

Why Is Qiskit Estimator Fast?

Published:

A look at why Qiskit Estimator is fast and how it optimizes performance and efficiency in quantum simulations.

CPU Arch Optimizations

Published:

An in-depth look at CPU architecture optimizations and their impact on performance.

HIF

Published:

Hierarchical interpolative factorization.

LLM

Published:

Learning a bit about large language models.

GPU human docs

Published:

A collection of human-readable GPU documentation.