Matrix Multiplication Using Nested Loops

LAS: Locality-Aware Scheduling for GEMM-Accelerated Convolutions in GPUs

Abstract: This article presents a graphics processing unit (GPU) scheduling scheme that maximizes the exploitation of data locality in deep neural networks (DNNs). Convolution is one of the ...

IEEE

A Novel Hilbert Curve for Cache-Locality Preserving Loops

Abstract: Modern microprocessors offer a rich memory hierarchy including various levels of cache and registers. Some of these memories (like main memory, L3 cache) are big but slow and shared among ...

GitHub

FLUX: A Deep Learning Framework in C++ Built from First Principles

FLUX is an educational deep learning framework that reimplements the core functionality of PyTorch and TensorFlow from scratch, using only C++ and the Standard Template Library. No external ...

unite

Flash Attention: Revolutionizing Transformer Efficiency

As transformer models grow in size and complexity, they face significant challenges in terms of computational efficiency and memory usage, particularly when dealing with long sequences. Flash ...

C&EN

Efficient and Parallel Implementation of Real and Complex Response Functions Employing the Second-Order Algebraic-Diagrammatic Construction Scheme for the Polarization Propagator

Division of Theoretical Chemistry and Biology, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm SE-100 44, Sweden ...

GitHub

Counting and printing prime numbers of an array.c

Triaxial closed-loop measurement based on a single-beam zero-field optically pumped magnetometer

ANNarchy: a code generation approach to neural simulations on parallel hardware