There’s something deeply satisfying about a startup that looks at the entire GPU-driven AI infrastructure stack and says, “What if we just… didn’t do any of that?” That’s basically [Taalas](https://taalas.com/) in a nutshell.
This Toronto-based chip company, founded by Ljubisa Bajic (yes, the same guy who co-founded Tenstorrent), is taking what might be the most aggressive approach to AI inference I’ve seen in years. Instead of running model weights through traditional memory hierarchies, they literally etch the weights directly into the transistors of the chip itself. The model *is* the chip. Their tagline — “The model is The Computer” — isn’t marketing fluff. It’s the actual architecture.
Their first product, the HC1, is built on TSMC’s 6nm process with 53 billion transistors, and it runs the full Llama 3.1 8B model entirely on a single chip. No HBM. No advanced packaging. No liquid cooling. Just a ~200W chip spitting out over 17,000 tokens per second. To put that in context, that’s roughly 73x faster than an Nvidia H200 on the same model, while using a fraction of the power. Those numbers got my attention immediately.
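Back-of-envelope, those figures also imply a huge efficiency gap, not just a speed one. Here's a quick sketch in Python; the HC1 numbers come from the claims above, but the H200's ~700 W board power is my own assumption, and the H200 throughput is just the 73x claim run backwards, so treat the final ratio as illustrative:

```python
# Rough perf-per-watt comparison from the published numbers.
# HC1 figures are from the post; the H200's ~700 W board power is
# my own assumption, and its throughput is simply 17,000 / 73.

hc1_tps = 17_000         # tokens/sec, Llama 3.1 8B on a single HC1
hc1_watts = 200          # approximate chip power quoted above

h200_tps = hc1_tps / 73  # ~233 tokens/sec implied by the 73x claim
h200_watts = 700         # assumed H200 board power (not from the post)

hc1_eff = hc1_tps / hc1_watts      # ~85 tokens/sec/W
h200_eff = h200_tps / h200_watts   # ~0.33 tokens/sec/W

print(f"HC1:  {hc1_eff:.1f} tok/s/W")
print(f"H200: {h200_eff:.2f} tok/s/W")
print(f"Implied efficiency ratio: {hc1_eff / h200_eff:.0f}x")
```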
The secret sauce is what Taalas calls their “mask ROM recall fabric”: a structure where a single transistor can store 4 bits of model data and perform multiplication simultaneously. They pair this with SRAM for dynamic state like KV caches and fine-tuned weights, so the chip isn’t completely static. And when they need to customize a chip for a different model, they change only two masks in the manufacturing process. That’s a wild level of hardware efficiency.
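For intuition, here's a toy model in Python of what “weights frozen into the chip” means. To be clear, this is purely my own conceptual sketch, not Taalas's actual circuit: 4-bit quantized weights get frozen as immutable constants at “fabrication time,” while everything dynamic (activations, KV-cache-style state) stays in ordinary mutable memory.

```python
import numpy as np

# Toy model of the ROM-weights + SRAM split described above.
# This is a software analogy of my own, not the real hardware:
# weights are quantized to 4-bit codes once and frozen, like a
# mask ROM; only activations and cache state change at runtime.

rng = np.random.default_rng(0)

def fabricate(weights: np.ndarray):
    """Quantize float weights to 4-bit codes, once, like etching a mask."""
    scale = np.abs(weights).max() / 7           # symmetric 4-bit range [-8, 7]
    codes = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    codes.setflags(write=False)                 # frozen: no runtime updates
    return codes, scale

# "Tape-out": the weights become immutable constants of the chip.
W = rng.standard_normal((256, 256)).astype(np.float32)
codes, scale = fabricate(W)

# Runtime: only the activation vector is read/written.
x = rng.standard_normal(256).astype(np.float32)
y = (codes.astype(np.float32) @ x) * scale      # multiply against baked-in codes

print("max quantization error vs. float weights:", np.abs(y - W @ x).max())
```

Swapping models in this analogy means re-running `fabricate`, i.e. a new tape-out, which is exactly why the two-mask customization step matters so much to the economics.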
The company just [announced a $169 million raise](https://siliconangle.com/2026/02/19/taalas-raises-169m-funding-develop-model-specific-ai-chips/) on February 19th, bringing total funding to $219 million, backed by Quiet Capital, Fidelity, and legendary semiconductor investor Pierre Lamond. The announcement blew up across tech media — [EE Times](https://www.eetimes.com/taalas-specializes-to-extremes-for-extraordinary-token-speed/) did a deep technical breakdown, [The Next Platform](https://www.nextplatform.com/2026/02/19/taalas-etches-ai-models-onto-transistors-to-rocket-boost-inference/) covered the architecture in detail, and the story landed on [Techmeme’s front page](https://www.techmeme.com/260219/p36) pulling in discussion from all corners.
Now, the obvious trade-off here is flexibility: you’re baking a specific model into silicon, so you can’t just swap in a new model with a firmware update. Taalas addresses this by planning a pipeline of chips: an upcoming 20B-parameter chip is expected this summer, and the next-gen HC2 is targeting frontier-scale models. If inference demand for popular models like Llama stabilizes (which it arguably already has for many production workloads), this approach starts to make a lot of economic sense. The team has worked together for 20+ years and holds 14 patents, so they’re not exactly winging it.
I’m genuinely curious where this goes. It’s the kind of from-first-principles rethinking that either flames out spectacularly or reshapes an entire market. Given the caliber of the team and the funding they’ve pulled in, I wouldn’t bet against them.
