-
A Deterministic View of Diffusion Models
In this post, we present a deterministic perspective on diffusion models. In this approach, neural networks are trained as an inverse function of the deterministic diffusion mapping that progressively corrupts images at each time step. This method simplifies the derivation of diffusion models, enabling us to fully explain and derive them using only a few straightforward mathematical equations.
-
Understanding Transformers and GPT: An In-depth Overview
In this post, we delve into the technical details of the widely used transformer architecture by deriving all formulas involved in its forward and backward passes step by step. By doing so, we can implement these passes ourselves and often achieve more efficient performance than using autograd methods. Additionally, we introduce the technical details on the construction of the popular GPT-3 model using the transformer architecture.
-
Machine Learning Fundamentals
Supplementary materials for the book "Machine Learning Fundamentals", published by Cambridge University Press.