PEP 465 – A dedicated infix operator for matrix multiplication

Author:
Nathaniel J. Smith <njs at pobox.com>
Status:
Final
Type:
Standards Track
Created:
20-Feb-2014
Python-Version:
3.5
Post-History:
13-Mar-2014
Resolution:
Python-Dev message
Table of Contents
  • Abstract
  • Specification
  • Motivation
    • Executive summary
    • Background: What’s wrong with the status quo?
    • Why should matrix multiplication be infix?
    • Transparent syntax is especially crucial for non-expert programmers
    • But isn’t matrix multiplication a pretty niche requirement?
    • So @ is good for matrix formulas, but how common are those really?
    • But isn’t it weird to add an operator with no stdlib uses?
  • Compatibility considerations
  • Intended usage details
    • Semantics
    • Adoption
  • Implementation details
  • Rationale for specification details
    • Choice of operator
    • Precedence and associativity
    • (Non)-Definitions for built-in types
    • Non-definition of matrix power
  • Rejected alternatives to adding a new operator
  • Discussions of this PEP
  • References
  • Copyright

Abstract

This PEP proposes a new binary operator to be used for matrix multiplication, called @. (Mnemonic: @ is * for mATrices.)

Specification

A new binary operator is added to the Python language, together with the corresponding in-place version:

====  =========================  ========================
Op    Precedence/associativity   Methods
====  =========================  ========================
@     Same as *                  __matmul__, __rmatmul__
@=    n/a                        __imatmul__
====  =========================  ========================

No implementations of these methods are added to the builtin or standard library types. However, a number of projects have reached consensus on the recommended semantics for these operations; see Intended usage details below for details.
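For illustration only (this is not part of the specification), here is a minimal sketch of how an array-like class might hook into the new protocol on Python 3.5+. The Matrix2x2 class is hypothetical, but the method names are exactly the ones listed in the table above:

class Matrix2x2:
    # Hypothetical 2x2 matrix type, used only to illustrate the protocol.

    def __init__(self, rows):
        self.rows = rows                      # e.g. [[1, 2], [3, 4]]

    def __matmul__(self, other):              # evaluates  self @ other
        if not isinstance(other, Matrix2x2):
            return NotImplemented
        a, b = self.rows, other.rows
        return Matrix2x2([[sum(a[i][k] * b[k][j] for k in range(2))
                           for j in range(2)] for i in range(2)])

    def __rmatmul__(self, other):             # evaluates  other @ self
        return NotImplemented                 # no foreign types to interoperate with here

    def __imatmul__(self, other):             # evaluates  self @= other
        result = self.__matmul__(other)
        if result is NotImplemented:
            return NotImplemented
        self.rows = result.rows               # update in place
        return self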

For details on how this operator will be implemented in CPython, see Implementation details.

Motivation

Executive summary

In numerical code, there are two important operations which compete for use of Python’s * operator: elementwise multiplication, and matrix multiplication. In the nearly twenty years since the Numeric library was first proposed, there have been many attempts to resolve this tension [13]; none have been really satisfactory. Currently, most numerical Python code uses * for elementwise multiplication, and function/method syntax for matrix multiplication; however, this leads to ugly and unreadable code in common circumstances. The problem is bad enough that significant amounts of code continue to use the opposite convention (which has the virtue of producing ugly and unreadable code in different circumstances), and this API fragmentation across codebases then creates yet more problems. There does not seem to be any good solution to the problem of designing a numerical API within current Python syntax – only a landscape of options that are bad in different ways. The minimal change to Python syntax which is sufficient to resolve these problems is the addition of a single new infix operator for matrix multiplication.

Matrix multiplication has a singular combination of features which distinguish it from other binary operations, which together provide a uniquely compelling case for the addition of a dedicated infix operator:

  • Just as for the existing numerical operators, there exists a vast body of prior art supporting the use of infix notation for matrix multiplication across all fields of mathematics, science, and engineering; @ harmoniously fills a hole in Python’s existing operator system.
  • @ greatly clarifies real-world code.
  • @ provides a smoother onramp for less experienced users, who are particularly harmed by hard-to-read code and API fragmentation.
  • @ benefits a substantial and growing portion of the Python user community.
  • @ will be used frequently – in fact, evidence suggests it may be used more frequently than // or the bitwise operators.
  • @ allows the Python numerical community to reduce fragmentation, and finally standardize on a single consensus duck type for all numerical array objects.

Background: What’s wrong with the status quo?

When we crunch numbers on a computer, we usually have lots and lots of numbers to deal with. Trying to deal with them one at a time is cumbersome and slow – especially when using an interpreted language. Instead, we want the ability to write down simple operations that apply to large collections of numbers all at once. The n-dimensional array is the basic object that all popular numeric computing environments use to make this possible. Python has several libraries that provide such arrays, with numpy being at present the most prominent.

When working with n-dimensional arrays, there are two different ways we might want to define multiplication. One is elementwise multiplication:

[[1, 2],     [[11, 12],     [[1 * 11, 2 * 12],
 [3, 4]]  x   [13, 14]]  =   [3 * 13, 4 * 14]]

and the other is matrix multiplication:

[[1, 2],     [[11, 12],     [[1 * 11 + 2 * 13, 1 * 12 + 2 * 14],
 [3, 4]]  x   [13, 14]]  =   [3 * 11 + 4 * 13, 3 * 12 + 4 * 14]]

Elementwise multiplication is useful because it lets us easily and quickly perform many multiplications on a large collection of values, without writing a slow and cumbersome for loop. And this works as part of a very general schema: when using the array objects provided by numpy or other numerical libraries, all Python operators work elementwise on arrays of all dimensionalities. The result is that one can write functions using straightforward code like a * b + c / d, treating the variables as if they were simple values, but then immediately use this function to efficiently perform this calculation on large collections of values, while keeping them organized using whatever arbitrarily complex array layout works best for the problem at hand.

Matrix multiplication is more of a special case. It’s only defined on 2d arrays (also known as “matrices”), and multiplication is the only operation that has an important “matrix” version – “matrix addition” is the same as elementwise addition; there is no such thing as “matrix bitwise-or” or “matrix floordiv”; “matrix division” and “matrix to-the-power-of” can be defined but are not very useful, etc. However, matrix multiplication is still used very heavily across all numerical application areas; mathematically, it’s one of the most fundamental operations there is.
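In numpy terms (shown here only as a familiar illustration of the two operations, not as part of this proposal), the 2x2 example above looks like:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

a * b          # elementwise: [[11, 24], [39, 56]]
np.dot(a, b)   # matrix product: [[37, 40], [85, 92]]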

Because Python syntax currently allows for only a single multiplication operator *, libraries providing array-like objects must decide: either use * for elementwise multiplication, or use * for matrix multiplication. And, unfortunately, it turns out that when doing general-purpose number crunching, both operations are used frequently, and there are major advantages to using infix rather than function call syntax in both cases. Thus it is not at all clear which convention is optimal, or even acceptable; often it varies on a case-by-case basis.

Nonetheless, network effects mean that it is very important that we pick just one convention. In numpy, for example, it is technically possible to switch between the conventions, because numpy provides two different types with different __mul__ methods. For numpy.ndarray objects, * performs elementwise multiplication, and matrix multiplication must use a function call (numpy.dot). For numpy.matrix objects, * performs matrix multiplication, and elementwise multiplication requires function syntax. Writing code using numpy.ndarray works fine. Writing code using numpy.matrix also works fine. But trouble begins as soon as we try to integrate these two pieces of code together. Code that expects an ndarray and gets a matrix, or vice-versa, may crash or return incorrect results. Keeping track of which functions expect which types as inputs, and return which types as outputs, and then converting back and forth all the time, is incredibly cumbersome and impossible to get right at any scale. Functions that defensively try to handle both types as input and DTRT, find themselves floundering into a swamp of isinstance and if statements.
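For contrast with the ndarray convention just described, the numpy.matrix convention looks roughly like this (numpy.matrix is exactly the type the surrounding text discourages; it appears here only to make the API split concrete):

import numpy as np

am = np.matrix([[1, 2], [3, 4]])
bm = np.matrix([[11, 12], [13, 14]])

am * bm              # matrix multiplication
np.multiply(am, bm)  # elementwise multiplication now requires a function call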

PEP 238 split / into two operators: / and //. Imagine the chaos that would have resulted if it had instead split int into two types: classic_int, whose __div__ implemented floor division, and new_int, whose __div__ implemented true division. This, in a more limited way, is the situation that Python number-crunchers currently find themselves in.

In practice, the vast majority of projects have settled on the convention of using * for elementwise multiplication, and function call syntax for matrix multiplication (e.g., using numpy.ndarray instead of numpy.matrix). This reduces the problems caused by API fragmentation, but it doesn’t eliminate them. The strong desire to use infix notation for matrix multiplication has caused a number of specialized array libraries to continue to use the opposing convention (e.g., scipy.sparse, pyoperators, pyviennacl) despite the problems this causes, and numpy.matrix itself still gets used in introductory programming courses, often appears in StackOverflow answers, and so forth. Well-written libraries thus must continue to be prepared to deal with both types of objects, and, of course, are also stuck using unpleasant funcall syntax for matrix multiplication. After nearly two decades of trying, the numerical community has still not found any way to resolve these problems within the constraints of current Python syntax (see Rejected alternatives to adding a new operator below).

This PEP proposes the minimum effective change to Python syntax that will allow us to drain this swamp. It splits * into two operators, just as was done for /: * for elementwise multiplication, and @ for matrix multiplication. (Why not the reverse? Because this way is compatible with the existing consensus, and because it gives us a consistent rule that all the built-in numeric operators also apply in an elementwise manner to arrays; the reverse convention would lead to more special cases.)

So that’s why matrix multiplication doesn’t and can’t just use *. Now, in the rest of this section, we’ll explain why it nonetheless meets the high bar for adding a new operator.

Why should matrix multiplication be infix?

Right now, most numerical code in Python uses syntax like numpy.dot(a, b) or a.dot(b) to perform matrix multiplication. This obviously works, so why do people make such a fuss about it, even to the point of creating API fragmentation and compatibility swamps?

Matrix multiplication shares two features with ordinary arithmetic operations like addition and multiplication on numbers: (a) it is used very heavily in numerical programs – often multiple times per line of code – and (b) it has an ancient and universally adopted tradition of being written using infix syntax. This is because, for typical formulas, this notation is dramatically more readable than any function call syntax. Here’s an example to demonstrate:

One of the most useful tools for testing a statistical hypothesis is the linear hypothesis test for OLS regression models. It doesn’t really matter what all those words I just said mean; if we find ourselves having to implement this thing, what we’ll do is look up some textbook or paper on it, and encounter many mathematical formulas that look like:

S = (Hβ − r)^T (H V H^T)^-1 (Hβ − r)

Here the various variables are all vectors or matrices (details for the curious: [5]).

Now we need to write code to perform this calculation. In current numpy, matrix multiplication can be performed using either the function or method call syntax. Neither provides a particularly readable translation of the formula:

import numpy as np
from numpy.linalg import inv, solve

# Using dot function:
S = np.dot((np.dot(H, beta) - r).T,
           np.dot(inv(np.dot(np.dot(H, V), H.T)), np.dot(H, beta) - r))

# Using dot method:
S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)

With the @ operator, the direct translation of the above formula becomes:

S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

Notice that there is now a transparent, 1-to-1 mapping between the symbols in the original formula and the code that implements it.

Of course, an experienced programmer will probably notice that this is not the best way to compute this expression. The repeated computation of Hβ − r should perhaps be factored out; and, expressions of the form dot(inv(A), B) should almost always be replaced by the more numerically stable solve(A, B). When using @, performing these two refactorings gives us:

# Version 1 (as above)
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

# Version 2
trans_coef = H @ beta - r
S = trans_coef.T @ inv(H @ V @ H.T) @ trans_coef

# Version 3
S = trans_coef.T @ solve(H @ V @ H.T, trans_coef)

Notice that when comparing between each pair of steps, it’s very easy to see exactly what was changed. If we apply the equivalent transformations to the code using the .dot method, then the changes are much harder to read out or verify for correctness:

# Version 1 (as above)
S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)

# Version 2
trans_coef = H.dot(beta) - r
S = trans_coef.T.dot(inv(H.dot(V).dot(H.T))).dot(trans_coef)

# Version 3
S = trans_coef.T.dot(solve(H.dot(V).dot(H.T)), trans_coef)

Readability counts! The statements using @ are shorter, contain more whitespace, can be directly and easily compared both to each other and to the textbook formula, and contain only meaningful parentheses. This last point is particularly important for readability: when using function-call syntax, the required parentheses on every operation create visual clutter that makes it very difficult to parse out the overall structure of the formula by eye, even for a relatively simple formula like this one. Eyes are terrible at parsing non-regular languages. I made and caught many errors while trying to write out the ‘dot’ formulas above. I know they still contain at least one error, maybe more. (Exercise: find it. Or them.) The @ examples, by contrast, are not only correct, they’re obviously correct at a glance.

If we are even more sophisticated programmers, and writing code that we expect to be reused, then considerations of speed or numerical accuracy might lead us to prefer some particular order of evaluation. Because @ makes it possible to omit irrelevant parentheses, we can be certain that if we do write something like (H @ V) @ H.T, then our readers will know that the parentheses must have been added intentionally to accomplish some meaningful purpose. In the dot examples, it’s impossible to know which nesting decisions are important, and which are arbitrary.

Infix @ dramatically improves matrix code usability at all stages of programmer interaction.

Transparent syntax is especially crucial for non-expert programmers

A large proportion of scientific code is written by people who are experts in their domain, but are not experts in programming. And there are many university courses run each year with titles like “Data analysis for social scientists” which assume no programming background, and teach some combination of mathematical techniques, introduction to programming, and the use of programming to implement these mathematical techniques, all within a 10-15 week period. These courses are more and more often being taught in Python rather than special-purpose languages like R or Matlab.

For these kinds of users, whose programming knowledge is fragile, the existence of a transparent mapping between formulas and code often means the difference between succeeding and failing to write that code at all. This is so important that such classes often use the numpy.matrix type which defines * to mean matrix multiplication, even though this type is buggy and heavily disrecommended by the rest of the numpy community for the fragmentation that it causes. This pedagogical use case is, in fact, the only reason numpy.matrix remains a supported part of numpy. Adding @ will benefit both beginning and advanced users with better syntax; and furthermore, it will allow both groups to standardize on the same notation from the start, providing a smoother on-ramp to expertise.

But isn’t matrix multiplication a pretty niche requirement?

The world is full of continuous data, and computers are increasingly called upon to work with it in sophisticated ways. Arrays are the lingua franca of finance, machine learning, 3d graphics, computer vision, robotics, operations research, econometrics, meteorology, computational linguistics, recommendation systems, neuroscience, astronomy, bioinformatics (including genetics, cancer research, drug discovery, etc.), physics engines, quantum mechanics, geophysics, network analysis, and many other application areas. In most or all of these areas, Python is rapidly becoming a dominant player, in large part because of its ability to elegantly mix traditional discrete data structures (hash tables, strings, etc.) on an equal footing with modern numerical data types and algorithms.

We all live in our own little sub-communities, so some Python users may be surprised to realize the sheer extent to which Python is used for number crunching – especially since much of this particular sub-community’s activity occurs outside of traditional Python/FOSS channels. So, to give some rough idea of just how many numerical Python programmers are actually out there, here are two numbers: In 2013, there were 7 international conferences organized specifically on numerical Python [3] [4]. At PyCon 2014, ~20% of the tutorials appear to involve the use of matrices [6].

To quantify this further, we used Github’s “search” function to look at what modules are actually imported across a wide range of real-world code (i.e., all the code on Github). We checked for imports of several popular stdlib modules, a variety of numerically oriented modules, and various other extremely high-profile modules like django and lxml (the latter of which is the #1 most downloaded package on PyPI). Starred lines indicate packages which export array- or matrix-like objects which will adopt @ if this PEP is approved:

Count of Python source files on Github matching given search terms
(as of 2014-04-10, ~21:00 UTC)

================ ========== =============== ======= ===========
module           "import X" "from X import"   total total/numpy
================ ========== =============== ======= ===========
sys                 2374638           63301 2437939        5.85
os                  1971515           37571 2009086        4.82
re                  1294651            8358 1303009        3.12
numpy ************** 337916 ********** 79065 * 416981 ******* 1.00
warnings             298195           73150  371345        0.89
subprocess           281290           63644  344934        0.83
django                62795          219302  282097        0.68
math                 200084           81903  281987        0.68
threading            212302           45423  257725        0.62
pickle+cPickle       215349           22672  238021        0.57
matplotlib           119054           27859  146913        0.35
sqlalchemy            29842           82850  112692        0.27
pylab *************** 36754 ********** 41063 ** 77817 ******* 0.19
scipy *************** 40829 ********** 28263 ** 69092 ******* 0.17
lxml                  19026           38061   57087        0.14
zlib                  40486            6623   47109        0.11
multiprocessing       25247           19850   45097        0.11
requests              30896             560   31456        0.08
jinja2                 8057           24047   32104        0.08
twisted               13858            6404   20262        0.05
gevent                11309            8529   19838        0.05
pandas ************** 14923 *********** 4005 ** 18928 ******* 0.05
sympy                  2779            9537   12316        0.03
theano *************** 3654 *********** 1828 *** 5482 ******* 0.01
================ ========== =============== ======= ===========

These numbers should be taken with several grains of salt (see footnote for discussion: [12]), but, to the extent they can be trusted, they suggest that numpy might be the single most-imported non-stdlib module in the entire Pythonverse; it’s even more-imported than such stdlib stalwarts as subprocess, math, pickle, and threading. And numpy users represent only a subset of the broader numerical community that will benefit from the @ operator. Matrices may once have been a niche data type restricted to Fortran programs running in university labs and military clusters, but those days are long gone. Number crunching is a mainstream part of modern Python usage.

In addition, there is some precedent for adding an infix operator to handle a more-specialized arithmetic operation: the floor division operator //, like the bitwise operators, is very useful under certain circumstances when performing exact calculations on discrete values. But it seems likely that there are many Python programmers who have never had reason to use // (or, for that matter, the bitwise operators). @ is no more niche than //.

So @ is good for matrix formulas, but how common are those really?

We’ve seen that @ makes matrix formulas dramatically easier to work with for both experts and non-experts, that matrix formulas appear in many important applications, and that numerical libraries like numpy are used by a substantial proportion of Python’s user base. But numerical libraries aren’t just about matrix formulas, and being important doesn’t necessarily mean taking up a lot of code: if matrix formulas only occurred in one or two places in the average numerically-oriented project, then it still wouldn’t be worth adding a new operator. So how common is matrix multiplication, really?

When the going gets tough, the tough get empirical. To get a rough estimate of how useful the @ operator will be, the table below shows the rate at which different Python operators are actually used in the stdlib, and also in two high-profile numerical packages – the scikit-learn machine learning library, and the nipy neuroimaging library – normalized by source lines of code (SLOC). Rows are sorted by the ‘combined’ column, which pools all three code bases together. The combined column is thus strongly weighted towards the stdlib, which is much larger than both projects put together (stdlib: 411575 SLOC, scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [7]

The dot row (marked ******) counts how common matrix multiply operations are in each codebase.

====  ======  ============  ====  ========
 op   stdlib  scikit-learn  nipy  combined
====  ======  ============  ====  ========
   =    2969          5536  4932      3376  / 10,000 SLOC
   -     218           444   496       261
   +     224           201   348       231
  ==     177           248   334       196
   *     156           284   465       192
   %     121           114   107       119
  **      59           111   118        68
  !=      40            56    74        44
   /      18           121   183        41
   >      29            70   110        39
  +=      34            61    67        39
   <      32            62    76        38
  >=      19            17    17        18
  <=      18            27    12        18
 dot ***** 0 ********** 99 ** 74 ****** 16
   |      18             1     2        15
   &      14             0     6        12
  <<      10             1     1         8
  //       9             9     1         8
  -=       5            21    14         8
  *=       2            19    22         5
  /=       0            23    16         4
  >>       4             0     0         3
   ^       3             0     0         3
   ~       2             4     5         2
  |=       3             0     0         2
  &=       1             0     0         1
 //=       1             0     0         1
  ^=       1             0     0         0
 **=       0             2     0         0
  %=       0             0     0         0
 <<=       0             0     0         0
 >>=       0             0     0         0
====  ======  ============  ====  ========

These two numerical packages alone contain ~780 uses of matrix multiplication. Within these packages, matrix multiplication is used more heavily than most comparison operators (< != <= >=). Even when we dilute these counts by including the stdlib into our comparisons, matrix multiplication is still used more often in total than any of the bitwise operators, and 2x as often as //. This is true even though the stdlib, which contains a fair amount of integer arithmetic and no matrix operations, makes up more than 80% of the combined code base.

By coincidence, the numeric libraries make up approximately the same proportion of the ‘combined’ codebase as numeric tutorials make up of PyCon 2014’s tutorial schedule, which suggests that the ‘combined’ column may not be wildly unrepresentative of new Python code in general. While it’s impossible to know for certain, from this data it seems entirely possible that across all Python code currently being written, matrix multiplication is already used more often than // and the bitwise operations.

But isn’t it weird to add an operator with no stdlib uses?

It’s certainly unusual (though extended slicing existed for some time before builtin types gained support for it, Ellipsis is still unused within the stdlib, etc.). But the important thing is whether a change will benefit users, not where the software is being downloaded from. It’s clear from the above that @ will be used, and used heavily. And this PEP provides the critical piece that will allow the Python numerical community to finally reach consensus on a standard duck type for all array-like objects, which is a necessary precondition to ever adding a numerical array type to the stdlib.

Compatibility considerations

Currently, the only legal use of the @ token in Python code is at statement beginning in decorators. The new operators are both infix; the one place they can never occur is at statement beginning. Therefore, no existing code will be broken by the addition of these operators, and there is no possible parsing ambiguity between decorator-@ and the new operators.
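A small sketch of why the two uses cannot collide (assuming Python 3.5+ and a numpy version that implements @; the particular functions used are otherwise arbitrary):

from functools import lru_cache
import numpy as np

@lru_cache()                                  # '@' at the start of a statement: a decorator
def identity(n):
    return np.eye(n)

result = identity(2) @ np.array([1.0, 2.0])   # '@' inside an expression: matrix multiply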

Another important kind of compatibility is the mental cost paid by users to update their understanding of the Python language after this change, particularly for users who do not work with matrices and thus do not benefit. Here again, @ has minimal impact: even comprehensive tutorials and references will only need to add a sentence or two to fully document this PEP’s changes for a non-numerical audience.

Intended usage details

This section is informative, rather than normative – it documents the consensus of a number of libraries that provide array- or matrix-like objects on how @ will be implemented.

This section uses the numpy terminology for describing arbitrary multidimensional arrays of data, because it is a superset of all other commonly used models. In this model, the shape of any array is represented by a tuple of integers. Because matrices are two-dimensional, they have len(shape) == 2, while 1d vectors have len(shape) == 1, and scalars have shape == (), i.e., they are “0 dimensional”. Any array contains prod(shape) total entries. Notice that prod(()) == 1 (for the same reason that sum(()) == 0); scalars are just an ordinary kind of array, not a special case. Notice also that we distinguish between a single scalar value (shape == (), analogous to 1), a vector containing only a single entry (shape == (1,), analogous to [1]), a matrix containing only a single entry (shape == (1, 1), analogous to [[1]]), etc., so the dimensionality of any array is always well-defined. Other libraries with more restricted representations (e.g., those that support 2d arrays only) might implement only a subset of the functionality described here.
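In numpy notation, these distinctions look like the following (an illustration of the terminology only, not new semantics):

import numpy as np

np.array(1).shape       # ()      -- a scalar, i.e. 0-dimensional
np.array([1]).shape     # (1,)    -- a vector containing a single entry
np.array([[1]]).shape   # (1, 1)  -- a matrix containing a single entry
np.prod(())             # 1.0     -- a scalar array has prod(shape) == 1 entry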

Semantics

The recommended semantics for @ for different inputs are listed below; a short numpy sketch illustrating these shape rules follows the list:

  • 2d inputs are conventional matrices, and so the semantics are obvious: we apply conventional matrix multiplication. If we write arr(2, 3) to represent an arbitrary 2x3 array, then arr(2, 3) @ arr(3, 4) returns an array with shape (2, 4).
  • 1d vector inputs are promoted to 2d by prepending or appending a ‘1’ to the shape, the operation is performed, and then the added dimension is removed from the output. The 1 is always added on the “outside” of the shape: prepended for left arguments, and appended for right arguments. The result is that matrix @ vector and vector @ matrix are both legal (assuming compatible shapes), and both return 1d vectors; vector @ vector returns a scalar. This is clearer with examples.
    • arr(2, 3) @ arr(3, 1) is a regular matrix product, and returns an array with shape (2, 1), i.e., a column vector.
    • arr(2, 3) @ arr(3) performs the same computation as the previous (i.e., treats the 1d vector as a matrix containing a single column, shape = (3, 1)), but returns the result with shape (2,), i.e., a 1d vector.
    • arr(1, 3) @ arr(3, 2) is a regular matrix product, and returns an array with shape (1, 2), i.e., a row vector.
    • arr(3) @ arr(3, 2) performs the same computation as the previous (i.e., treats the 1d vector as a matrix containing a single row, shape = (1, 3)), but returns the result with shape (2,), i.e., a 1d vector.
    • arr(1, 3) @ arr(3, 1) is a regular matrix product, and returns an array with shape (1, 1), i.e., a single value in matrix form.
    • arr(3) @ arr(3) performs the same computation as the previous, but returns the result with shape (), i.e., a single scalar value, not in matrix form. So this is the standard inner product on vectors.

    An infelicity of this definition for 1d vectors is that it makes @ non-associative in some cases ((Mat1 @ vec) @ Mat2 != Mat1 @ (vec @ Mat2)). But this seems to be a case where practicality beats purity: non-associativity only arises for strange expressions that would never be written in practice; if they are written anyway then there is a consistent rule for understanding what will happen (Mat1 @ vec @ Mat2 is parsed as (Mat1 @ vec) @ Mat2, just like a - b - c); and, not supporting 1d vectors would rule out many important use cases that do arise very commonly in practice. No-one wants to explain to new users why to solve the simplest linear system in the obvious way, they have to type (inv(A) @ b[:, np.newaxis]).flatten() instead of inv(A) @ b, or perform an ordinary least-squares regression by typing solve(X.T @ X, X @ y[:, np.newaxis]).flatten() instead of solve(X.T @ X, X @ y). No-one wants to type (a[np.newaxis, :] @ b[:, np.newaxis])[0, 0] instead of a @ b every time they compute an inner product, or (a[np.newaxis, :] @ Mat @ b[:, np.newaxis])[0, 0] for general quadratic forms instead of a @ Mat @ b. In addition, sage and sympy (see below) use these non-associative semantics with an infix matrix multiplication operator (they use *), and they report that they haven’t experienced any problems caused by it.

  • For inputs with more than 2 dimensions, we treat the last two dimensions as being the dimensions of the matrices to multiply, and ‘broadcast’ across the other dimensions. This provides a convenient way to quickly compute many matrix products in a single operation. For example, arr(10, 2, 3) @ arr(10, 3, 4) performs 10 separate matrix multiplies, each of which multiplies a 2x3 and a 3x4 matrix to produce a 2x4 matrix, and then returns the 10 resulting matrices together in an array with shape (10, 2, 4). The intuition here is that we treat these 3d arrays of numbers as if they were 1d arrays of matrices, and then apply matrix multiplication in an elementwise manner, where now each ‘element’ is a whole matrix. Note that broadcasting is not limited to perfectly aligned arrays; in more complicated cases, it allows several simple but powerful tricks for controlling how arrays are aligned with each other; see [10] for details. (In particular, it turns out that when broadcasting is taken into account, the standard scalar * matrix product is a special case of the elementwise multiplication operator *.)

    If one operand is >2d, and another operand is 1d, then the above rules apply unchanged, with 1d->2d promotion performed before broadcasting. E.g., arr(10, 2, 3) @ arr(3) first promotes to arr(10, 2, 3) @ arr(3, 1), then broadcasts the right argument to create the aligned operation arr(10, 2, 3) @ arr(10, 3, 1), multiplies to get an array with shape (10, 2, 1), and finally removes the added dimension, returning an array with shape (10, 2). Similarly, arr(2) @ arr(10, 2, 3) produces an intermediate array with shape (10, 1, 3), and a final array with shape (10, 3).

  • 0d (scalar) inputs raise an error. Scalar * matrix multiplication is a mathematically and algorithmically distinct operation from matrix @ matrix multiplication, and is already covered by the elementwise * operator. Allowing scalar @ matrix would thus both require an unnecessary special case, and violate TOOWTDI.
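The following sketch shows how these shape rules play out in numpy, which implements the consensus semantics described above for its ndarray type (the arrays are placeholders; only the shapes matter):

import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 4))
v = np.ones(3)
S = np.ones((10, 2, 3))

(A @ B).shape    # (2, 4)      -- ordinary 2d matrix product
(A @ v).shape    # (2,)        -- 1d vector promoted to (3, 1), extra dimension removed
(v @ B).shape    # (4,)        -- 1d vector promoted to (1, 3) on the left
v @ v            # 3.0         -- vector @ vector is the scalar inner product
(S @ B).shape    # (10, 2, 4)  -- 'broadcast' matrix multiply over the leading dimension
(S @ v).shape    # (10, 2)     -- promotion plus broadcasting
# 0d operands are rejected: 3 @ A raises an error rather than scaling A.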

Adoption

We group existing Python projects which provide array- or matrix-like types based on what API they currently use for elementwise and matrix multiplication.

Projects which currently use * for elementwise multiplication, and function/method calls for matrix multiplication:

The developers of the following projects have expressed an intention to implement @ on their array-like types using the above semantics:

  • numpy
  • pandas
  • blaze
  • theano

The following projects have been alerted to the existence of the PEP, but it’s not yet known what they plan to do if it’s accepted. We don’t anticipate that they’ll have any objections, though, since everything proposed here is consistent with how they already do things:

  • pycuda
  • panda3d

Projects which currently use * for matrix multiplication, and function/method calls for elementwise multiplication:

The following projects have expressed an intention, if this PEP is accepted, to migrate from their current API to the elementwise-*, matmul-@ convention (i.e., this is a list of projects whose API fragmentation will probably be eliminated if this PEP is accepted):

  • numpy (numpy.matrix)
  • scipy.sparse
  • pyoperators
  • pyviennacl

The following projects have been alerted to the existence of the PEP, but it’s not known what they plan to do if it’s accepted (i.e., this is a list of projects whose API fragmentation may or may not be eliminated if this PEP is accepted):

  • cvxopt

Projects which currently use * for matrix multiplication, and which don’t really care about elementwise multiplication of matrices:

There are several projects which implement matrix types, but from a very different perspective than the numerical libraries discussed above. These projects focus on computational methods for analyzing matrices in the sense of abstract mathematical objects (i.e., linear maps over free modules over rings), rather than as big bags full of numbers that need crunching. And it turns out that from the abstract math point of view, there isn’t much use for elementwise operations in the first place; as discussed in the Background section above, elementwise operations are motivated by the bag-of-numbers approach. So these projects don’t encounter the basic problem that this PEP exists to address, making it mostly irrelevant to them; while they appear superficially similar to projects like numpy, they’re actually doing something quite different. They use * for matrix multiplication (and for group actions, and so forth), and if this PEP is accepted, their expressed intention is to continue doing so, while perhaps adding @ as an alias. These projects include:

  • sympy
  • sage

Implementation details

New functions operator.matmul and operator.__matmul__ are added to the standard library, with the usual semantics.
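For example (assuming Python 3.5+ and operands whose type implements the protocol, such as numpy arrays):

import operator
import numpy as np

a = np.ones((2, 3))
b = np.ones((3, 2))

c = a @ b                  # infix form
d = operator.matmul(a, b)  # functional form; operator.__matmul__ is an alias
assert (c == d).all()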

A corresponding function PyObject* PyObject_MatrixMultiply(PyObject *o1, PyObject *o2) is added to the C API.

A new AST node is added named MatMult, along with a new token ATEQUAL and new bytecode opcodes BINARY_MATRIX_MULTIPLY and INPLACE_MATRIX_MULTIPLY.

Two new type slots are added; whether this is to PyNumberMethods or a new PyMatrixMethods struct remains to be determined.

Rationale for specification details

Choice of operator

Why @ instead of some other spelling? There isn’t any consensus across other programming languages about how this operator should be named [11]; here we discuss the various options.

Restricting ourselves only to symbols present on US English keyboards, the punctuation characters that don’t already have a meaning in Python expression context are: @, backtick, $, !, and ?. Of these options, @ is clearly the best; ! and ? are already heavily freighted with inapplicable meanings in the programming context, backtick has been banned from Python by BDFL pronouncement (see PEP 3099), and $ is uglier, even more dissimilar to * and ⋅, and has Perl/PHP baggage. $ is probably the second-best option of these, though.

Symbols which are not present on US English keyboards start at a significant disadvantage (having to spend 5 minutes at the beginning of every numeric Python tutorial just going over keyboard layouts is not a hassle anyone really wants). Plus, even if we somehow overcame the typing problem, it’s not clear there are any that are actually better than @. Some options that have been suggested include:

  • U+00D7 MULTIPLICATION SIGN: A × B
  • U+22C5 DOT OPERATOR: A ⋅ B
  • U+2297 CIRCLED TIMES: A ⊗ B
  • U+00B0 DEGREE: A ° B

What we need, though, is an operator that means “matrix multiplication, as opposed to scalar/elementwise multiplication”. There is no conventional symbol with this meaning in either programming or mathematics, where these operations are usually distinguished by context. (And U+2297 CIRCLED TIMES is actually used conventionally to mean exactly the wrong things: elementwise multiplication – the “Hadamard product” – or outer product, rather than matrix/inner product like our operator). @ at least has the virtue that it looks like a funny non-commutative operator; a naive user who knows maths but not programming couldn’t look at A * B versus A × B, or A * B versus A ⋅ B, or A * B versus A ° B and guess which one is the usual multiplication, and which one is the special case.

Finally, there is the option of using multi-character tokens. Some options:

  • Matlab and Julia use a .* operator. Aside from being visually confusable with *, this would be a terrible choice for us because in Matlab and Julia, * means matrix multiplication and .* means elementwise multiplication, so using .* for matrix multiplication would make us exactly backwards from what Matlab and Julia users expect.
  • APL apparently used +.×, which by combining a multi-character token, confusing attribute-access-like . syntax, and a unicode character, ranks somewhere below U+2603 SNOWMAN on our candidate list. If we like the idea of combining addition and multiplication operators as being evocative of how matrix multiplication actually works, then something like +* could be used – though this may be too easy to confuse with *+, which is just multiplication combined with the unary + operator.
  • PEP 211 suggested ~*. This has the downside that it sort of suggests that there is a unary * operator that is being combined with unary ~, but it could work.
  • R uses %*% for matrix multiplication. In R this forms part of a general extensible infix system in which all tokens of the form %foo% are user-defined binary operators. We could steal the token without stealing the system.
  • Some other plausible candidates that have been suggested: >< (= ascii drawing of the multiplication sign ×); the footnote operator [*] or |*| (but when used in context, the use of vertical grouping symbols tends to recreate the nested parentheses visual clutter that was noted as one of the major downsides of the function syntax we’re trying to get away from); ^*.

So, it doesn’t matter much, but @ seems as good or better than any of the alternatives:

  • It’s a friendly character that Pythoneers are already used to typing in decorators, but the decorator usage and the math expression usage are sufficiently dissimilar that it would be hard to confuse them in practice.
  • It’s widely accessible across keyboard layouts (and thanks to its use in email addresses, this is true even of weird keyboards like those in phones).
  • It’s round like * and ⋅.
  • The mATrices mnemonic is cute.
  • The swirly shape is reminiscent of the simultaneous sweeps over rows and columns that define matrix multiplication.
  • Its asymmetry is evocative of its non-commutative nature.
  • Whatever, we have to pick something.

Precedence and associativity

There was a long discussion [15] about whether @ should be right- or left-associative (or even something more exotic [18]). Almost all Python operators are left-associative, so following this convention would be the simplest approach, but there were two arguments that suggested matrix multiplication might be worth making right-associative as a special case:

First, matrix multiplication has a tight conceptual association with function application/composition, so many mathematically sophisticated users have an intuition that an expression like RSx proceeds from right-to-left, with first S transforming the vector x, and then R transforming the result. This isn’t universally agreed (and not all number-crunchers are steeped in the pure-math conceptual framework that motivates this intuition [16]), but at the least this intuition is more common than for other operations like 2⋅3⋅4 which everyone reads as going from left-to-right.

Second, if expressions like Mat @ Mat @ vec appear often in code, then programs will run faster (and efficiency-minded programmers will be able to use fewer parentheses) if this is evaluated as Mat @ (Mat @ vec) than if it is evaluated like (Mat @ Mat) @ vec.

However, weighing against these arguments are the following:

Regarding the efficiency argument, empirically, we were unable to find any evidence that Mat @ Mat @ vec type expressions actually dominate in real-life code. Parsing a number of large projects that use numpy, we found that when forced by numpy’s current funcall syntax to choose an order of operations for nested calls to dot, people actually use left-associative nesting slightly more often than right-associative nesting [17]. And anyway, writing parentheses isn’t so bad – if an efficiency-minded programmer is going to take the trouble to think through the best way to evaluate some expression, they probably should write down the parentheses regardless of whether they’re needed, just to make it obvious to the next reader that the order of operations matters.

In addition, it turns out that other languages, including those with much more of a focus on linear algebra, overwhelmingly make their matmul operators left-associative. Specifically, the @ equivalent is left-associative in R, Matlab, Julia, IDL, and Gauss. The only exceptions we found are Mathematica, in which a @ b @ c would be parsed non-associatively as dot(a, b, c), and APL, in which all operators are right-associative. There do not seem to exist any languages that make @ right-associative and * left-associative. And these decisions don’t seem to be controversial – I’ve never seen anyone complaining about this particular aspect of any of these other languages, and the left-associativity of * doesn’t seem to bother users of the existing Python libraries that use * for matrix multiplication. So, at the least we can conclude from this that making @ left-associative will certainly not cause any disasters. Making @ right-associative, OTOH, would be exploring new and uncertain ground.

And another advantage of left-associativity is that it is much easier to learn and remember that @ acts like *, than it is to remember first that @ is unlike other Python operators by being right-associative, and then on top of this, also have to remember whether it is more tightly or more loosely binding than *. (Right-associativity forces us to choose a precedence, and intuitions were about equally split on which precedence made more sense. So this suggests that no matter which choice we made, no-one would be able to guess or remember it.)

On net, therefore, the general consensus of the numerical community is that while matrix multiplication is something of a special case, it’s not special enough to break the rules, and @ should parse like * does.

(Non)-Definitions for built-in types

No __matmul__ or __matpow__ are defined for builtin numeric types (float, int, etc.) or for the numbers.Number hierarchy, because these types represent scalars, and the consensus semantics for @ are that it should raise an error on scalars.
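A quick check of what this means in practice (assuming a Python 3.5+ interpreter):

try:
    2 @ 3
except TypeError as exc:
    print(exc)   # unsupported operand type(s) for @: 'int' and 'int'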

We do not – for now – define a __matmul__ method on the standard memoryview or array.array objects, for several reasons. Of course this could be added if someone wants it, but these types would require quite a bit of additional work beyond __matmul__ before they could be used for numeric work – e.g., they have no way to do addition or scalar multiplication either! – and adding such functionality is beyond the scope of this PEP. In addition, providing a quality implementation of matrix multiplication is highly non-trivial. Naive nested loop implementations are very slow and shipping such an implementation in CPython would just create a trap for users. But the alternative – providing a modern, competitive matrix multiply – would require that CPython link to a BLAS library, which brings a set of new complications. In particular, several popular BLAS libraries (including the one that ships by default on OS X) currently break the use of multiprocessing [8]. Together, these considerations mean that the cost/benefit of adding __matmul__ to these types just isn’t there, so for now we’ll continue to delegate these problems to numpy and friends, and defer a more systematic solution to a future proposal.

There are also non-numeric Python builtins which define __mul__ (str, list, …). We do not define __matmul__ for these types either, because why would we even do that.

Non-definition of matrix power

Earlier versions of this PEP also proposed a matrix power operator, @@, analogous to **. But on further consideration, it was decided that the utility of this was sufficiently unclear that it would be better to leave it out for now, and only revisit the issue if – once we have more experience with @ – it turns out that @@ is truly missed. [14]

Rejected alternatives to adding a new operator

Over the past few decades, the Python numeric community has explored a variety of ways to resolve the tension between matrix and elementwise multiplication operations. PEP 211 and PEP 225, both proposed in 2000 and last seriously discussed in 2008 [9], were early attempts to add new operators to solve this problem, but suffered from serious flaws; in particular, at that time the Python numerical community had not yet reached consensus on the proper API for array objects, or on what operators might be needed or useful (e.g., PEP 225 proposes 6 new operators with unspecified semantics). Experience since then has now led to consensus that the best solution, for both numeric Python and core Python, is to add a single infix operator for matrix multiply (together with the other new operators this implies like @=).

We review some of the rejected alternatives here.

Use a second type that defines __mul__ as matrix multiplication: As discussed above (Background: What’s wrong with the status quo?), this has been tried for many years via the numpy.matrix type (and its predecessors in Numeric and numarray). The result is a strong consensus among both numpy developers and developers of downstream packages that numpy.matrix should essentially never be used, because of the problems caused by having conflicting duck types for arrays. (Of course one could then argue we should only define __mul__ to be matrix multiplication, but then we’d have the same problem with elementwise multiplication.) There have been several pushes to remove numpy.matrix entirely; the only counter-arguments have come from educators who find that its problems are outweighed by the need to provide a simple and clear mapping between mathematical notation and code for novices (see Transparent syntax is especially crucial for non-expert programmers). But, of course, starting out newbies with a dispreferred syntax and then expecting them to transition later causes its own problems. The two-type solution is worse than the disease.

Add lots of new operators, or add a new generic syntax for defining infix operators: In addition to being generally un-Pythonic and repeatedly rejected by BDFL fiat, this would be using a sledgehammer to smash a fly. The scientific python community has consensus that adding one operator for matrix multiplication is enough to fix the one otherwise unfixable pain point. (In retrospect, we all think PEP 225 was a bad idea too – or at least far more complex than it needed to be.)

Add a new @ (or whatever) operator that has some other meaning in general Python, and then overload it in numeric code: This was the approach taken by PEP 211, which proposed defining @ to be the equivalent of itertools.product. The problem with this is that when taken on its own terms, it’s pretty clear that itertools.product doesn’t actually need a dedicated operator. It hasn’t even been deemed worthy of a builtin. (During discussions of this PEP, a similar suggestion was made to define @ as a general purpose function composition operator, and this suffers from the same problem; functools.compose isn’t even useful enough to exist.) Matrix multiplication has a uniquely strong rationale for inclusion as an infix operator. There almost certainly don’t exist any other binary operations that will ever justify adding any other infix operators to Python.

Add a .dot method to array types so as to allow “pseudo-infix” A.dot(B) syntax: This has been in numpy for some years, and in many cases it’s better than dot(A, B). But it’s still much less readable than real infix notation, and in particular still suffers from an extreme overabundance of parentheses. See Why should matrix multiplication be infix? above.

Use a ‘with’ block to toggle the meaning of * within a single code block: E.g., numpy could define a special context object so that we’d have:

c = a * b   # element-wise multiplication

with numpy.mul_as_dot:
    c = a * b   # matrix multiplication

However, this has two serious problems: first, it requires that every array-like type’s __mul__ method know how to check some global state (numpy.mul_is_currently_dot or whatever). This is fine if a and b are numpy objects, but the world contains many non-numpy array-like objects. So this either requires non-local coupling – every numpy competitor library has to import numpy and then check numpy.mul_is_currently_dot on every operation – or else it breaks duck-typing, with the above code doing radically different things depending on whether a and b are numpy objects or some other sort of object. Second, and worse, with blocks are dynamically scoped, not lexically scoped; i.e., any function that gets called inside the with block will suddenly find itself executing inside the mul_as_dot world, and crash and burn horribly – if you’re lucky. So this is a construct that could only be used safely in rather limited cases (no function calls), and which would make it very easy to shoot yourself in the foot without warning.

Use a language preprocessor that adds extra numerically-oriented operators and perhaps other syntax: (As per recent BDFL suggestion: [1]) This suggestion seems based on the idea that numerical code needs a wide variety of syntax additions. In fact, given @, most numerical users don’t need any other operators or syntax; it solves the one really painful problem that cannot be solved by other means, and that causes painful reverberations through the larger ecosystem. Defining a new language (presumably with its own parser which would have to be kept in sync with Python’s, etc.), just to support a single binary operator, is neither practical nor desirable. In the numerical context, Python’s competition is special-purpose numerical languages (Matlab, R, IDL, etc.). Compared to these, Python’s killer feature is exactly that one can mix specialized numerical code with code for XML parsing, web page generation, database access, network programming, GUI libraries, and so forth, and we also gain major benefits from the huge variety of tutorials, reference material, introductory classes, etc., which use Python. Fragmenting “numerical Python” from “real Python” would be a major source of confusion. A major motivation for this PEP is to reduce fragmentation. Having to set up a preprocessor would be an especially prohibitive complication for unsophisticated users. And we use Python because we like Python! We don’t want almost-but-not-quite-Python.

Use overloading hacks to define a “new infix operator” like *dot*, as in a well-known Python recipe: (See: [2]) Beautiful is better than ugly. This is… not beautiful. And not Pythonic. And especially unfriendly to beginners, who are just trying to wrap their heads around the idea that there’s a coherent underlying system behind these magic incantations that they’re learning, when along comes an evil hack like this that violates that system, creates bizarre error messages when accidentally misused, and whose underlying mechanisms can’t be understood without deep knowledge of how object oriented systems work.
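For readers who have not seen the recipe, the trick looks roughly like this – a minimal sketch of the general idea rather than the cited recipe itself, with a throwaway matmul_2x2 helper standing in for a real matrix multiply:

class Infix:
    # a *dot* b is really parsed as (a * dot) * b, so the helper object
    # has to intercept both multiplications.
    def __init__(self, func):
        self.func = func
    def __rmul__(self, left):     # handles  a * dot  ->  partially applied operator
        return Infix(lambda right: self.func(left, right))
    def __mul__(self, right):     # handles  (a * dot) * b  ->  func(a, b)
        return self.func(right)

def matmul_2x2(a, b):
    # Plain nested-list 2x2 matrix multiply, just for the demo.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

dot = Infix(matmul_2x2)
A = [[1, 2], [3, 4]]
B = [[11, 12], [13, 14]]
A *dot* B   # [[37, 40], [85, 92]]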

Use a special “facade” type to support syntax like arr.M * arr: This is very similar to the previous proposal, in that the .M attribute would basically return the same object as arr *dot would, and thus suffers the same objections about ‘magicalness’. This approach also has some non-obvious complexities: for example, while arr.M * arr must return an array, arr.M * arr.M and arr * arr.M must return facade objects, or else arr.M * arr.M * arr and arr * arr.M * arr will not work. But this means that facade objects must be able to recognize both other array objects and other facade objects (which creates additional complexity for writing interoperating array types from different libraries who must now recognize both each other’s array types and their facade types). It also creates pitfalls for users who may easily type arr * arr.M or arr.M * arr.M and expect to get back an array object; instead, they will get a mysterious object that throws errors when they attempt to use it. Basically with this approach users must be careful to think of .M* as an indivisible unit that acts as an infix operator – and as infix-operator-like token strings go, at least *dot* is prettier looking (look at its cute little ears!).

Discussions of this PEP

Collected here for reference:

References

Copyright

This document has been placed in the public domain.
