The now-famous 2017 paper “Attention Is All You Need” kicked off the Large Language Model revolution by introducing the Transformer, a very effective neural-network architecture for Artificial Intelligence. With it (and a lot of fast hardware), we have developed language models that can do things that, just five or ten years ago, we would never have imagined computers doing. We can now ask these models, in plain human language, to create programs for us, and they do, far faster than we humans could even type the code in.
A huge piece of the puzzle, of course, is the raw speed at which modern computers can do calculations. Specifically, matrix multiplications. Phenomena like reasoning and natural-language processing can require trillions upon trillions of floating-point calculations to be made — and if we want to get any results before the heat death of the Universe, the computers doing these calculations had better be fast.
But could we, with a lot of patience, have run language models decades ago, if only we knew the algorithms?
As a test, I decided to compare the speed of my very first computer — a Timex/Sinclair 1000 — against the speed of Scientia (my current Core i9 workstation) at matrix multiplications. While there are far too many variables to make this a proper apples-to-apples comparison (for one, the Sinclair is running interpreted BASIC instead of compiled C), the difference in raw speed is striking.
For reasonable accuracy, I chose as the benchmark for each machine a matrix size that would take it at least a few minutes to compute. Additionally, I tried to keep the display output to a minimum, since the result I’m interested in is calculation speed. For the Sinclair, I chose the largest matrix multiplication that I could easily fit into 16 KB of memory: a size of 30×30. To multiply two 30×30 matrices, a total of 30×30×30 = 27,000 multiplications and the same number of additions must be made. That is 54,000 floating-point operations in all (half of them multiplications and half of them additions).
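To make the operation count concrete, here is a minimal Python sketch of the schoolbook triple loop. It is not the BASIC program the Sinclair ran; the function name `naive_matmul` and the test matrices are made up purely for illustration. It counts each multiply and each add as it goes.

```python
def naive_matmul(a, b):
    """Multiply two square matrices (lists of lists) the schoolbook way,
    counting every floating-point multiply and add along the way."""
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    mults = adds = 0
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i][k] * b[k][j]  # one multiply, one add
                mults += 1
                adds += 1
            result[i][j] = total
    return result, mults, adds


n = 30
# Illustrative inputs: an arbitrary matrix and the identity matrix.
a = [[float(i + j) for j in range(n)] for i in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

product, mults, adds = naive_matmul(a, identity)
print(mults, adds, mults + adds)
```

Multiplying by the identity is just a convenient sanity check: the product should come back equal to the original matrix.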
The Sinclair (or, rather, a 1x-speed emulation of one) completed this in roughly twelve and a half minutes, for a speed of about 72 FLOPS (floating-point operations per second). Even this is blazingly fast compared to truly old-school machines like ENIAC or the Mark I relay computer.
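The 72 FLOPS figure is simply the operation count divided by the elapsed time. A quick sketch of that arithmetic, with variable names chosen only for illustration:

```python
# Operation count for a 30x30 naive matrix multiplication:
# n^3 multiplications plus n^3 additions.
n = 30
total_ops = 2 * n**3

# Roughly twelve and a half minutes, expressed in seconds.
elapsed_seconds = 12.5 * 60

sinclair_flops = total_ops / elapsed_seconds
print(f"{total_ops} operations in {elapsed_seconds:.0f} s = {sinclair_flops:.0f} FLOPS")
```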
…But how much faster is a fairly modest, modern (2020-era) workstation?
Running the same 30×30 matrix multiplication on Scientia took about 34 milliseconds, some 22,000 times faster. But much of that 34 milliseconds is simply the overhead of compiling the code, displaying the window, and so forth, so such a small run understates the real difference. At 3000×3000, it took about three minutes. Since the cost of naïve square matrix multiplication grows with the cube of the matrix dimension, a matrix 100 times larger on each side is not 100 times harder, but rather one million times harder.
Scientia (using a single CPU core, a fraction of the available memory, and not using the GPU) finished the 3000×3000 multiplication, 54 billion floating-point operations, in 180 seconds, for a resulting speed of about 300 MFLOPS (300 million floating-point operations per second). That’s roughly 4.2 million times faster than the Sinclair!
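The cubic scaling and the resulting speed can be checked in a few lines. This is only a back-of-the-envelope sketch using the timings quoted above; `matmul_ops` is a hypothetical helper name.

```python
def matmul_ops(n):
    """Floating-point operations in a naive n x n matrix multiplication:
    n^3 multiplications plus n^3 additions."""
    return 2 * n**3


# How much more work is 3000x3000 than 30x30?
scale = matmul_ops(3000) / matmul_ops(30)

# Workstation throughput from the measured 180-second run.
workstation_flops = matmul_ops(3000) / 180.0

# Speedup relative to the Sinclair's measured 72 FLOPS.
SINCLAIR_FLOPS = 72.0
speedup = workstation_flops / SINCLAIR_FLOPS

print(f"work scale factor: {scale:,.0f}")
print(f"workstation:       {workstation_flops / 1e6:,.0f} MFLOPS")
print(f"speedup:           {speedup / 1e6:.3f} million times")
```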
And that’s not even the whole picture. AI models, with their heavy reliance on matrix multiplications, naturally work best when one or more GPUs are available to handle the calculations. GPUs use thousands of (simpler) compute cores, compared to the handful of more complex cores available on a CPU. Matrix multiplication naturally lends itself to such parallelization, whether across multiple CPU cores (Scientia’s CPU, representative of a typical modern computer, has eight cores capable of sixteen threads) or across the thousands of specialized cores available in a GPU (Scientia’s RTX 4070 has nearly six thousand such cores).
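The reason matrix multiplication parallelizes so well is that every row (indeed, every element) of the result can be computed independently of the others. The sketch below illustrates that decomposition in Python by handing each row to a worker thread. It is meant to show the structure only: the function names are invented for this example, and Python threads will not actually speed up pure-Python arithmetic the way real CPU cores or GPU cores speed up compiled code.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row(row, b):
    """Compute one row of the product. It needs only this one row of A
    and all of B, so rows can be handed to different workers."""
    inner = len(b)
    cols = len(b[0])
    return [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]


def parallel_matmul(a, b, workers=8):
    """Split the product into independent per-row tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input rows.
        return list(pool.map(lambda row: multiply_row(row, b), a))


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(parallel_matmul(a, b))
```

A GPU takes the same idea much further, assigning small tiles of the output matrix to thousands of cores at once.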
To be fair, the Sinclair is handicapped (by perhaps a factor of 10x to 20x) due to running interpreted BASIC rather than compiled code. Even in “fast mode” where the display is turned off so the Z80 processor can focus on computation, the process is very inefficient (basically doing everything in translation instead of using efficiently-compiled machine code.)
But running a single-CPU-core matrix multiplication test on a modern PC also leaves the vast majority of its compute capability unused. The RTX 4070 GPU is theoretically capable of some 29 teraflops: 29 trillion floating-point operations per second(!)
That’s about 400 billion times faster than the Sinclair. That’s how far we’ve come in 40-some years. And that (plus terabytes and terabytes of training data) is what makes the AI magic work. Even if the Sinclair could fit a trained LLM into the maximum 64 KB of memory it can access (it can’t, by many orders of magnitude), a response that takes a modern workstation ten seconds to produce would, at that ratio, take on the order of 130,000 years running on Sinclair BASIC.
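For anyone who wants to check or vary the arithmetic, here is a small sketch of the comparison. It assumes the GPU's theoretical peak of 29 teraflops and the Sinclair's measured 72 FLOPS; real workloads rarely sustain peak throughput, so the time estimate is very sensitive to the ratio you plug in, and the constant names are just illustrative.

```python
SINCLAIR_FLOPS = 72.0      # measured on the 30x30 benchmark
GPU_PEAK_FLOPS = 29e12     # theoretical peak, not sustained throughput

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Raw throughput ratio between the two machines.
ratio = GPU_PEAK_FLOPS / SINCLAIR_FLOPS

# Scale a ten-second job by that ratio (an assumption: it treats the
# whole job as running at the peak rate on the fast machine).
modern_seconds = 10.0
sinclair_seconds = modern_seconds * ratio
years = sinclair_seconds / SECONDS_PER_YEAR

print(f"speed ratio: {ratio:.3g}")
print(f"a {modern_seconds:.0f}-second job would take about {years:,.0f} years")
```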
I love living in the future.








