Fangfang Lee on Linear Algebra for ML

IBM Developer Advocate Fangfang Lee explains how linear algebra, including concepts like vectors, matrices, and SVD, is fundamental to machine learning.

6 min read
Fangfang Lee, Developer Advocate at IBM, stands against a black background, gesturing towards mathematical concepts written on a board.
How Linear Algebra Powers Machine Learning (ML) — IBM on YouTube

In a recent IBM Think Series video, Fangfang Lee, a Developer Advocate at IBM, breaks down the fundamental role of linear algebra in machine learning. Lee explains how seemingly abstract mathematical concepts apply directly to the way computers process and learn from data, particularly in the context of machine learning models.

Who is Fangfang Lee?

Fangfang Lee serves as a Developer Advocate at IBM, a position that places her at the intersection of technology and the developer community. In this capacity, she is instrumental in translating complex technical concepts into accessible information for developers, fostering understanding and adoption of IBM's technologies, particularly in the realm of artificial intelligence and data science. Her role involves not just explaining how technologies work, but also illustrating their practical applications and the underlying principles that make them powerful.

The Language of Machines: Data Representation

Lee begins by addressing a core challenge in machine learning: how computers, which fundamentally operate on numbers, can interpret and process diverse forms of data like images, audio, and text. She explains that these raw data inputs must be translated into a language that computers can understand and manipulate. This translation process is where linear algebra becomes indispensable. Lee outlines the four fundamental numerical structures used in this translation:

  • Scalars: A single numerical value, representing a magnitude or quantity.
  • Vectors: A one-dimensional array of numbers, representing a sequence or list of magnitudes.
  • Matrices: A two-dimensional array of numbers, organized into rows and columns, capable of representing relationships between data points.
  • Tensors: Multi-dimensional arrays of numbers, capable of representing more complex data structures with multiple features or dimensions.
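The four structures above can be sketched in NumPy, where each is an array distinguished by its number of dimensions (the values here are invented for illustration):

```python
import numpy as np

# Scalar: a single numerical value (a 0-dimensional array)
scalar = np.array(3.14)

# Vector: a one-dimensional array of numbers
vector = np.array([1.0, 2.0, 3.0])

# Matrix: a two-dimensional array organized into rows and columns
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

# Tensor: a multi-dimensional array, here 3-D
tensor = np.zeros((2, 3, 4))

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3
```

The `ndim` attribute makes the hierarchy explicit: each structure generalizes the one before it by adding a dimension.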

Lee emphasizes that linear algebra provides the mathematical framework to represent, manipulate, and understand the relationships within these data structures, which is critical for training and operating machine learning models.

Linear Algebra as the Foundation for Machine Learning

The core thesis of Lee's explanation is that linear algebra is not just an academic subject but a practical tool that underpins machine learning. She illustrates how raw data, whether it's an image, a piece of text, or an audio clip, is converted into these mathematical objects. For instance, an image can be represented as a matrix where each entry corresponds to a pixel's intensity or color value. Text can be tokenized and converted into vectors, with each dimension representing a word or concept.
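As a rough sketch of these two conversions (the pixel values, vocabulary, and sentence are all invented for illustration):

```python
import numpy as np

# A tiny 3x3 "grayscale image": each entry is a pixel intensity (0-255).
image = np.array([[  0, 128, 255],
                  [ 64, 192,  32],
                  [255,   0, 128]])

# A toy bag-of-words encoding: count how often each vocabulary word
# appears in a tokenized sentence, yielding a vector per document.
vocab = ["linear", "algebra", "powers", "machine", "learning"]
sentence = "machine learning machine".split()
text_vector = np.array([sentence.count(word) for word in vocab])

print(image.shape)   # (3, 3)
print(text_vector)   # [0 0 0 2 1]
```

Real systems use richer encodings (learned embeddings rather than word counts), but the principle is the same: raw data becomes a numerical array that linear algebra can operate on.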

Lee states, "Computers cannot process images, text, audios or videos directly like humans. Instead, we need to translate these inputs into a language they can understand." This translation is achieved by leveraging linear algebra, allowing data to be represented as scalars, vectors, matrices, or tensors. She further elaborates that within the high-dimensional space these objects define, linear algebra enables the measurement, comparison, and learning of patterns and relationships within the data. This capability is crucial for tasks such as image recognition, natural language processing, and recommendation systems.

Key Mathematical Concepts in ML

Lee highlights specific linear algebra concepts and their relevance:

  • Scalars: Represent single data points or parameters.
  • Vectors: Represent features of data, such as the pixel values of an image or the word embeddings of text.
  • Matrices: Represent datasets or transformations, where rows might correspond to data samples and columns to features, or where matrices represent linear transformations applied to data.
  • Tensors: Generalize matrices to higher dimensions, used for more complex data like video (which has time, height, width, and color channels) or for representing parameters in deep neural networks.

She explains that the operations performed on these structures—such as dot products, matrix multiplication, and decomposition—are the building blocks of machine learning algorithms. For example, the calculation of similarity between two data points, represented as vectors, can be done using cosine similarity, which relies on the dot product and vector magnitudes.
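The cosine-similarity calculation Lee describes can be written directly from its definition (the example vectors are invented; `b` is deliberately a scaled copy of `a` to show that parallel vectors score 1.0):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their magnitudes.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, so similarity is 1.0
c = np.array([-1.0, 0.0, 1.0])  # a different direction

print(cosine_similarity(a, b))  # 1.0
print(cosine_similarity(a, c))  # a value strictly between 0 and 1
```

Because the formula divides by the magnitudes, it measures only direction, not length, which is why it is a common choice for comparing embeddings.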

Singular Value Decomposition (SVD) for Data Understanding

A significant portion of Lee's explanation focuses on Singular Value Decomposition (SVD), a powerful linear algebra technique widely used in machine learning. SVD decomposes a matrix A into three other matrices, A = UΣV^T, where Σ (Sigma) is a diagonal matrix of singular values and U and V^T capture the corresponding directions in the data. Lee explains that this decomposition is instrumental in extracting meaningful information and reducing the complexity of data.
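The decomposition can be seen concretely with NumPy's built-in SVD (the matrix entries are invented for illustration):

```python
import numpy as np

# An example 3x2 matrix.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Reduced SVD: A = U @ diag(sigma) @ Vt
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, sigma.shape, Vt.shape)   # (3, 2) (2,) (2, 2)

# Multiplying the three factors back together recovers A
# (up to floating-point error).
A_reconstructed = U @ np.diag(sigma) @ Vt
print(np.allclose(A, A_reconstructed))  # True
```

NumPy returns the singular values as a 1-D array in descending order, which is convenient for the truncation step used in dimensionality reduction.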

"SVD is one of the most important operations in machine learning," Lee states, highlighting its utility. She illustrates how SVD can break down a large matrix (representing a dataset) into smaller, more manageable matrices that capture the most significant features or patterns within the data. This process is fundamental for:

  • Dimensionality Reduction: By keeping only the most significant singular values and their corresponding vectors, SVD can reduce the number of features while retaining most of the important information, making models more efficient and less prone to overfitting.
  • Feature Extraction: The resulting matrices from SVD can be interpreted as latent features that represent underlying patterns in the data, which can be used for tasks like recommendation systems or topic modeling.
  • Noise Reduction: By discarding less significant singular values, SVD can effectively filter out noise from the data.

Lee uses the analogy of a movie dataset, where users (rows) rate movies (columns). SVD can decompose this large, sparse matrix into smaller matrices representing user latent features and movie latent features, enabling a recommendation engine to suggest movies based on user preferences.
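The movie-rating analogy can be sketched as a truncated SVD: keep only the top k singular values, so each user and each movie is summarized by k latent features. The ratings matrix below is entirely invented for illustration:

```python
import numpy as np

# Toy user-by-movie ratings matrix (rows: 4 users, columns: 5 movies).
R = np.array([[5.0, 4.0, 1.0, 1.0, 0.0],
              [4.0, 5.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 5.0, 4.0, 5.0],
              [0.0, 1.0, 4.0, 5.0, 4.0]])

U, sigma, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the top k=2 singular values: each user and each movie
# is now described by 2 latent features instead of raw ratings.
k = 2
user_features  = U[:, :k] * sigma[:k]   # 4 users  x 2 latent features
movie_features = Vt[:k, :]              # 2 latent features x 5 movies

# The rank-2 approximation produces scores for every user-movie pair,
# including ones the user has not rated.
R_approx = user_features @ movie_features
print(R_approx.shape)  # (4, 5)
```

Production recommenders typically handle the missing entries more carefully (e.g. via matrix factorization optimized only over observed ratings), but this captures the latent-feature idea Lee describes.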

Practical Applications in ML Frameworks

Lee connects these linear algebra concepts to practical machine learning frameworks. She notes that libraries like PyTorch, Keras, and TensorFlow are built upon these mathematical operations. "When you use these libraries, you are indirectly using linear algebra," she explains. For instance, when a text is converted into numerical representations (embeddings) and then fed into a neural network, operations like matrix multiplication and dot products are performed on these vectors and matrices.
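At its core, a single dense layer of a neural network is exactly the kind of operation Lee mentions: a matrix-vector product, a bias addition, and a nonlinearity. A minimal sketch in plain NumPy (shapes and random values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(4)        # input vector, e.g. a 4-dim embedding
W = rng.standard_normal((3, 4))   # weight matrix: 4 inputs -> 3 outputs
b = rng.standard_normal(3)        # bias vector

# Dense layer: matrix-vector product, plus bias, through a ReLU.
y = np.maximum(W @ x + b, 0.0)

print(y.shape)  # (3,)
```

Frameworks like PyTorch and TensorFlow implement the same algebra with hardware-accelerated tensor types and automatic differentiation, but the underlying operation is this matrix multiplication.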

The ability to represent data as matrices and tensors, and to perform operations such as SVD, allows machine learning models to learn complex relationships and make predictions. This is exemplified by the process of calculating similarity between text documents or images, which often involves converting them into vectors and then computing their cosine similarity, a direct application of dot product and vector norms.

Conclusion: The Indispensable Role of Linear Algebra

Fangfang Lee's explanation underscores that linear algebra is not merely a theoretical subject but a foundational pillar of modern machine learning. From the basic representation of data as scalars, vectors, matrices, and tensors, to advanced decomposition techniques like SVD for dimensionality reduction and feature engineering, these mathematical tools enable computers to process, analyze, and learn from vast amounts of data. Understanding these concepts is crucial for anyone seeking to delve deeper into the mechanics of artificial intelligence and build effective machine learning models.