Becoming the Fastest
Going Big and Small for 2025
2024 marked a significant evolution in Burn's architecture. Traditional deep learning frameworks often require developers to compromise between performance, portability, and flexibility; we aimed to transcend these trade-offs. Looking ahead to 2025, we are committed to applying this philosophy across the entire computing stack, encompassing everything from embedded devices to data centers.
Becoming the Fastest: Introduction
In the rapidly evolving landscape of artificial intelligence, one truth stands paramount: size matters. However, the future of AI shouldn't be constrained by hardware monopolies or software limitations, and this is where Burn and CubeCL come in.
Technical Posts
Optimal Performance without Static Graphs by Fusing Tensor Operation Streams
This post explores Burn's tensor operation stream strategy, which optimizes models through an eager API by creating custom kernels with fused operations. Our custom GELU experiment reveals a speedup of up to 78 times on our WGPU backend.
Autotune for GPU Kernels: Ensuring Consistent Peak Performance
Crafting high-performance GPU kernels for common deep learning operations, such as matrix multiplication (matmul) and reduction, requires finesse. The speed of these kernels varies with the input shapes and the GPU device in use, so the fastest kernel may change with the context. In Burn, Autotune automates kernel selection at runtime, allowing one to create a plethora of kernel variations with confidence that the best-performing one will be executed in every situation.
Creating High Performance Asynchronous Backends With Burn-Compute
Developing new high-performance deep learning backends in Burn has become remarkably easy: a backend can readily gain advanced capabilities such as asynchronous computation, intelligent memory management, and autotuning. The Burn-Compute crate lays the architectural foundation for in-house backends, equipping them with these features to maximize efficiency.
Burn's New Cross-Platform GPU Backend
Introducing Burn's new Cross-Platform GPU Backend built using WGPU. Burn now supports running deep learning models on a variety of hardware configurations, leveraging graphics APIs such as Vulkan, DirectX 11/12, Metal, OpenGL, and WebGPU. We discuss the possible applications in various domains and glimpse into the promising future of the framework.
Reduced Memory Usage: Burn's Rusty Approach to Tensor Handling
The latest release of Burn includes significant changes to its memory management strategy, and tensor-allocated memory can now be reused much more often. Overall, these changes significantly reduce memory usage compared to PyTorch, especially on the CPU.
A Case for Rust in Deep Learning
In this blog post, we'll explore the case for Rust in deep learning and why it may be a better option than Python. With its ability to handle complexity through safe and concurrent abstractions, Rust has the potential to tackle this field's biggest challenges in a way that Python cannot.
Tutorials
Building Blocks #1: Dataset & Data Loading
Burn provides key components that serve as the building blocks of the framework and your deep learning projects. The first entry in the Building Blocks series explores the dataset and batcher traits, and how they fit into Burn's data loading process.
Transitioning From PyTorch to Burn
In this updated tutorial, we'll implement the popular ResNet family of models and import ImageNet pre-trained weights available online.
Release Notes
Burn 0.15.0 Release Notes
This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.
Burn 0.14.0 Release Notes
This release marks the debut of our CubeCL integration, which brings cross-platform GPU programming capabilities directly to Rust. As always, it also includes numerous bug fixes, performance enhancements, new tensor operations, and improved documentation.
Burn 0.13.0 Release Notes
Burn 0.13 introduces major performance enhancements, new tensor operations, improved autodiff, Just-in-Time backend refactoring, and numerous feature additions across modules, optimizers, and backends.