Navigating the C++ Jungle: A Complete Guide to Contributing to PyTorch Internals

If you’ve ever used PyTorch and whispered to yourself, “I’d love to contribute, but… that C++ wilderness? No thanks,” then this blog post by Edward Z. Yang is your golden map through the jungle. In his detailed and eminently useful guide, “PyTorch Internals,” Edward demystifies the architecture behind one of the most important deep learning libraries of our time. From unpacking the logical and physical layers of tensors—yes, those mysterious strides and views!—to understanding the dispatch system that routes your code to the right kernels, this comprehensive talk lays out everything you need to know if you’re serious about hacking on PyTorch. Whether you want to write your own kernel, fix some legacy code from the TH “badlands,” or just grok how the autograd magic happens under the hood, this post is basically the bootcamp you didn’t know you needed. Edward says it best: “If the largeness of PyTorch’s C++ codebase is the first gatekeeper... the efficiency of your workflow is the second.” So grab your machete—we’re hacking through PyTorch internals!

Key Points:

  • Tensors are PyTorch’s central data structure, consisting of an underlying storage plus metadata: size, dtype (data type), device (e.g., CPU or CUDA), and stride—the last of which enables memory-efficient views of the data (see the metadata-and-views sketch after this list).

  • Strides allow tensors to share data and form views without copying, a concept key to PyTorch’s performance and memory efficiency. For example, tensor[1, :] doesn’t copy data; it’s just a different view of the same memory, as the metadata-and-views sketch below demonstrates.

  • In PyTorch, execution of operations involves a two-level dispatch:

    • First dispatch: Based on device and data layout (e.g., CPU vs. CUDA).
    • Second dispatch: Based on dtype (e.g., int vs. float) using macros like AT_DISPATCH_ALL_TYPES.
      This structure routes each call to the kernel implementation that matches the tensor’s device, layout, and dtype (a toy dispatch sketch appears after this list).
  • Tensor extensions are built on the “trinity” of device, layout, and dtype: every valid tensor type is a point in the Cartesian product of these three parameters (see the trinity sketch below). If you need to go beyond this, a wrapper class is preferred—unless autograd support through backward is needed.

  • PyTorch supports automatic differentiation by wrapping tensors with metadata (AutogradMeta) and recording operations for backward execution. This is the machinery behind the loss.backward() call that trains your model (see the autograd sketch below).

  • Main directories in PyTorch’s massive C++ codebase:

    • torch/: Python interface.
    • torch/csrc/: Python-C++ bindings and autograd engine.
    • aten/: "A Tensor Library"—where most operator kernels live.
    • c10/: Core abstractions like Tensor and Storage.
  • Writing new PyTorch ops involves:

    • Declaring operator schemas.
    • Implementing kernel logic with error checking and type dispatching.
    • Providing the standard variants: functional (abs), out (abs_out, which writes into a preallocated tensor), and in-place (abs_); the three are compared in a sketch after this list.
    • Tools include TensorIterator for iteration and broadcasting boilerplate, TensorAccessor for typed element access, and Vec256 for SIMD vectorization.
  • The infamous legacy TH (Torch) code is written in a macro-heavy C style, manually refcounted, and should generally be avoided or modernized into ATen.

  • Pro tips for contributors:

    • Avoid editing headers to minimize rebuild times.
    • Use local environments with ccache and avoid CUDA builds on your laptop.
    • Start with triaged issues and documentation PRs to get familiar without deep dives.
  • Excellent starting resources include PyTorch’s CONTRIBUTING.md, native_functions.yaml for operator schemas, and issues with “triaged” or “small” labels.
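
A few runnable sketches to make the points above concrete (each assumes only a working PyTorch install, unless noted). First, tensor metadata and views: a row slice shares storage with its base tensor rather than copying it.

```python
import torch

t = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Logical metadata: size, dtype, device...
print(t.size())     # torch.Size([2, 3])
print(t.dtype)      # torch.float32
print(t.device)     # cpu

# ...and stride: how many storage elements to skip per step in each dimension.
print(t.stride())   # (3, 1) -- row-major layout over a flat storage of 6 floats

# Slicing returns a view, not a copy: same storage, different offset and strides.
row = t[1, :]
print(row.storage_offset())   # 3 -- the view starts 3 elements into the storage
print(row.data_ptr() == t.data_ptr() + 3 * t.element_size())  # True

# Mutating the view mutates the base tensor, proving no copy was made.
row[0] = 100.0
print(t[1, 0])      # tensor(100.)
```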
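
The two levels of dispatch live in C++, but the shape of the idea fits in a few lines of Python. This is a deliberately toy sketch: FakeTensor, DISPATCH_TABLE, and cpu_strided_add are made-up names for illustration, not PyTorch internals. In the real codebase, the first dispatch keys on device and layout, and the second happens inside the kernel via macros like AT_DISPATCH_ALL_TYPES.

```python
from dataclasses import dataclass
from typing import List

# A stand-in tensor type for this toy model (hypothetical, illustration only).
@dataclass
class FakeTensor:
    data: List[float]
    dtype: str
    device: str = "cpu"
    layout: str = "strided"

def cpu_strided_add(a, b):
    # Second dispatch: branch on dtype. In C++, AT_DISPATCH_ALL_TYPES expands
    # to a switch over the runtime dtype that selects a templated kernel.
    if a.dtype == "float32":
        return [x + y for x, y in zip(a.data, b.data)]
    if a.dtype == "int64":
        return [int(x) + int(y) for x, y in zip(a.data, b.data)]
    raise TypeError(f"no kernel for dtype {a.dtype}")

# First dispatch: a table keyed on (device, layout). The CUDA entry would live
# in a separately compiled file, just as CUDA kernels do in PyTorch.
DISPATCH_TABLE = {
    ("cpu", "strided"): cpu_strided_add,
}

def add(a, b):
    kernel = DISPATCH_TABLE[(a.device, a.layout)]
    return kernel(a, b)

print(add(FakeTensor([1.0, 2.0], "float32"),
          FakeTensor([3.0, 4.0], "float32")))  # [4.0, 6.0]
```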
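
The trinity itself is visible from the Python side: factory functions accept all three parameters, and each combination names a distinct tensor type. A minimal sketch (the sparse line assumes your build includes sparse support; the commented-out CUDA line assumes a GPU):

```python
import torch

# Each point in the (device, layout, dtype) product is a distinct tensor type.
dense_f32 = torch.zeros(2, 2, dtype=torch.float32, device="cpu")
dense_i64 = torch.zeros(2, 2, dtype=torch.int64, device="cpu")
sparse_f32 = torch.zeros(2, 2, dtype=torch.float32, layout=torch.sparse_coo)

print(dense_f32.device, dense_f32.layout, dense_f32.dtype)
# cpu torch.strided torch.float32
print(sparse_f32.layout)   # torch.sparse_coo

# cuda_f16 = torch.zeros(2, 2, dtype=torch.float16, device="cuda")  # needs a GPU
```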
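
The autograd point in action: setting requires_grad=True makes PyTorch attach autograd metadata to the tensor and record every operation performed on it, so backward() can walk the recorded graph in reverse.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # attaches autograd metadata under the hood
y = (x * 3).sum()                         # each op records a node for backward

print(y.grad_fn)   # <SumBackward0 ...> -- the recorded graph, walked in reverse
y.backward()       # the same machinery behind loss.backward() in training
print(x.grad)      # tensor of 3s: d(sum(3x))/dx = 3
```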
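
Finally, the three operator variants from the “writing new ops” bullet, as they surface in Python (the out variant is exposed as the out= keyword argument):

```python
import torch

t = torch.tensor([-1.0, 2.0, -3.0])

# abs: functional variant, allocates and returns a new tensor.
r1 = torch.abs(t)

# abs_out: writes into a preallocated tensor (the out= kwarg in Python).
r2 = torch.empty(3)
torch.abs(t, out=r2)

# abs_: in-place variant, mutates its input and returns it.
t.abs_()

print(r1, r2, t)   # all three now hold [1., 2., 3.]
```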

Notable quote: “If you have written Java in the old days… substrings of strings had a similar problem [as tensor views], because by default no copy is made.” —Edward Z. Yang, on why understanding memory layout is so important.

This guide is a must-read for anyone curious about unlocking contribution superpowers in PyTorch. Recommended not just for devs, but also for power users who want to understand what’s happening under the hood when they slice tensor[1, :] on an 8 GB model.