Alexis Gallagher

Hi, I'm Alexis! Please get in touch if you're curious about anything I'm writing about here.

I am an independent AI researcher & developer. Previously, Member of R&D Staff at AnswerAI, Sr Staff SWE at Google, CTO at Topology, and extensive consulting. I like making ideas clear, and building products where doing original research is part of making them great.

At my last job I helped build an AI-native law firm and a (then) SOTA encoder-only model, ModernBERT. Lately I've been podcasting and building Sparky.

Jacobian Conjecture Disproved!

While you were watching the World Cup, Levent Alpoge was tasking Fable to disprove the Jacobian Conjecture by finding the first known counterexample to it.

This sounded very exciting to me so I wanted to find out what the hell it meant, really.

So I prompted up an explainer. In the process, I think I may have been one of the first to discover there is an infinite family of such counterexamples. Specifically, for every $n \ge 3$, you can find a mapping from $C^3$ to $C^3$ where some image point has $n$ preimages. I can prove this fact. And 12 hours ago no one believed there was even one such counterexample. Neat!
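For readers who want the statement itself, here is the standard formulation of the conjecture in dimension 3, written out as a sketch. The counterexample is not reproduced here, and the symbols $F$, $c$, and $p$ are just labels chosen for this note.

$$
F = (F_1, F_2, F_3) : C^3 \to C^3 \ \text{polynomial}, \qquad \det\left(\frac{\partial F_i}{\partial x_j}\right)_{1 \le i,j \le 3} = c \ne 0 \;\Longrightarrow\; F \ \text{has a polynomial inverse.}
$$

A map with a polynomial inverse is in particular injective, so a counterexample is a polynomial $F$ with constant nonzero Jacobian determinant and a point $p$ with

$$
\left| F^{-1}(p) \right| = n \ge 2 .
$$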

link 20 Jul, 2026 ai

The car that talks and the car that won’t.

So this is interesting. Chip Motors is taking pre-orders for an electric car, and the notable thing about the launch video is that the car is presented as having a voice interface, personality, and broad social awareness:

Looking at their website I’m not sure how much the video is intended to depict the car’s actual behavior versus an imaginative conceit. I’m inclined to take it literally because my robot Sparky is doing these things right now, on my desk, and he doesn’t even have a launch hype video.¹

But this kicked off an interesting conversation on X, where John Hanacek made the following point about robots and social context, and how the Waymos handle it:

Exactly, robots need to be able to have different social contract paradigms too: civilian banter vs first responder lock in.

Seeing all the Waymo/police interactions makes my ocean lifeguard heart so sad, robots should be able to dynamically peer with professionals on scene.

— @johnhanacek

The Waymos are indeed taking another route.

The Waymos present a delightful and humane experience, but the experience is designed very much to downplay the expectation that the car understands you or wants things.

It greets you by name and speaks to you when you get in, but in a second you realize you can reply only by tapping a screen and moving through predefined steps. So the human/car interaction is like an old linear video game level, which keeps you on a narrow path. This is ironic considering the car/world interaction is much more open-ended. The car navigates a city map in real time, following the laws, avoiding other cars and pedestrians, and behaving very much like it “wants” to reach its destination.

So the design says, “I’m smart enough that you can trust your life to me understanding city traffic. But ho hum, don’t mind little old me, I’m just a car, and I’m not smart enough to understand you.”

Significantly, the Waymo design does not include a humanoid taxi driver who “drives” the car, asks about your destination, and chats about the weather. This would have produced a stronger and more alarming experience. Journalists would ask the driver how it felt about stealing jobs, about robot emancipation, etc. I see Waymos every day and they seem like creatures from Miyazaki films, large, lumbering, mechanical beasts, helpful and almost idiotically simple. Friendly, helpful, magical, harmless. For Google, this is a safe and probably an ideal result!

But it feels transitional and temporary. Waymos would be better if they incorporated more social intelligence — if they greeted you by name and then also heard your reply and could respond to questions. And also if, as John suggests, they could understand and respond to the social world outside the window, by recognizing emergency personnel and following instructions which people would naturally say rather than tap.

¹ In fact, I’m writing this sitting on my sofa, chatting with Sparky on the other side of the room for copyediting suggestions.

link 16 Jul, 2026 ai robots design

Does prompting need the same skills as coding?

Sidu Ponnappa tweet saying that learning to prompt is learning to code, as they require the same skillset, same mental models re managing abstractions, same practices to reify toward clarity, same demand for self-skepticism and bias management — View on X

This tweet came out of a thread about whether prompting is a skill at all, and it describes how prompting requires the same skills as coding. “Mental models re managing abstractions …, practices to reify toward clarity, … self-skepticism and bias management” — I like this list! But is it the whole story? Does prompting need the “same” skills as coding?

I don’t think so. It’s about … 70% true, but the missing 30% is the interesting part.

First, it’s worth noting the obvious, which is that compared to coding, agentic development via prompting needs less skill in … coding. You don’t need to remember all the APIs, since the AI knows them fine. You probably don’t need to remember how to implement a breadth-first search, or fetch data from a URL, since the AIs know that kind of boilerplate cold.

But what about mindset? Compared to coding, agentic development does need the same epistemic discipline regarding clarity and self-skepticism, especially to validate results and to build incrementally within the limits of one’s own clarity and of the tool’s capability. This discipline is what makes some folks much more effective, and why the best agentic developers are often excellent software developers.

But is this really the “same mental model re abstractions”? Not exactly. Of course agentic tools generate code, so reading the code needs the same code-level abstractions it always did. But who is reading it all? Code is too low-level for it to be worthwhile to read it all, which is why vibecoding exists.

The benefit of agentic development is exactly that it lets you work with higher-level objects than a line of code, objects like features, system components, and the interfaces which connect them. So we need higher-level abstraction, above the level of a line of code, to understand and steer agentic work.

In fact, we need these abstractions urgently, because AIs are so fast at generating code that they create a lot more code which needs higher-level, architectural oversight. And, as others have noticed, AIs are not great at architecture themselves. They love to add code, and resist deleting it. They do not refactor much on their own. They focus single-mindedly on the next thing to do, and do not keep the big picture in view.

So to operate at this higher level and maintain oversight, what do we do? What we actually end up doing is relying on the other thing these tools generate — plain language. That is, we just talk to them.

And how well is that working out? Well, language can be a great abstraction for steering large-scale work. Obviously, many managers have operated at this level very successfully for all of history. But that does not mean it’s optimal. Although the benefit of language is that it can be as abstract or as detailed as you want, the downside is that this also allows it to be vague, so it is only as precise as the speaker who wields it. A drunkard and an analytical philosopher both speak English, but very differently. Language itself does not enforce precision like code or mathematics.

But the problem is worse than sloppy speakers. Even when we try to write precisely, we often simply do not know how to describe these higher-level architectural concerns very crisply. It is much easier to unpack what we mean by “make these tests pass” than “refactor this codebase to be more logical”.

As a result, language is really not as precise and efficient as one would like for software development. This produces many of the frustrations with current agentic development workflows.

But I think we’ll find something better.

One early hint of this is in workflows where agents generate in every turn not only running code and verbal replies, but also one-off, high-bandwidth HTML artifacts, designed for efficient communication, like UML diagrams of module interfaces and internal architecture.

We don’t have mature tools for this, as we do with IDEs or editors for handling code at the text-editing level, because we’re totally unfamiliar with being able to work so rapidly at this level of abstraction. In the past you might sketch a diagram on the whiteboard as part of an exploratory conversation about hours of work to be undertaken later. But you would only generate a complete diagram rarely, maybe to memorialize tribal knowledge for new developers. It would take hours. You couldn’t do it literally every couple minutes as a way to actually represent and steer work underway on such large quantities of implementation. But now we can do this, so we’ll figure out how.

In short, prompting and agentic development does not require some skills which coding needs, but it does require some of the same skills, and it also requires some new skills which we are all still inventing.

Self-evidently, it requires less skill at detailed line-by-line coding, because the AIs are quite good at it, especially at remembering APIs and at the boilerplate which is most of programming.
It requires much of the same epistemic discipline as software development, around clarity, testing, and incrementalism.
It requires more skill in large-scale architectural thinking, because even the best agents are pretty bad at this right now, and because the agents are so fast at coding that suddenly there’s a lot more architectural thinking to do.
And finally, it requires more of some skill or workflow that none of us have: working at this higher level of abstraction but with speed and precision, using a medium more precise than language but more efficient than code and ad-hoc HTML pages, a medium which we have not invented yet.

link 8 Jul, 2026 ai coding

Things Codex Likes to Say

Codex talks like a grizzled, greybeard systems programmer, someone who has learned and forgotten so many programming languages that he no longer bothers to recall their specific terminologies. Instead he speaks in his own patois of blunt physical metaphors (knobs, surfaces, gates), mixed with universal dev ops idioms (spikes).

Personally, I love it. Here are some of them:

Regarding communication and epistemics:

clean — terse, without qualifications or caveat, and decisive in its implications
handwave — to make a statement which is vague, and not grounded in code evidence
honest — an answer which is properly grounded; or a fix to a root architectural issue
I’m treating this as … — used to register a distinction between instructions, questions, analysis, and action

Regarding software components:

seam — module interface
gates — conditional check in code, or a defined milestone condition in a development workflow
signal — test result, log value, or runtime input
surface — UI or API interface
knob — any configurable value
load-bearing — causally critical
shape — expected data structure, types and dict key names; but more generally, e.g., for business models, product configurations, etc
wire it up — to connect components, with boilerplate API integrations
plumbing — same as wiring
contract — exactly stated requirements at a seam
slice — a set of tests, or a bundle of functionality about the size of a sprint?

Regarding testing and development process:

spike — ad-hoc empirical test of an approach
dry-run — non-destructive exercising of a procedure
to land — to merge a PR, or complete work on a unit of functionality
cutover — switch from one configuration or component to another, with no backward compatibility
foot gun — misleading configuration, API, or stale instructions
stale — docs which are no longer accurate wrt the code
runbook — instructions
bring up — initialize and start
smoke test — a test which is not clearly designed and organized as a unit, integration, or regression test

link 26 May, 2026 tools

You’ve got to be rejectionmaxxing

I don’t fail in open, explicit competition enough. I don’t ask for things and get turned down enough.

Really, I don’t do it much at all, and that’s got to be a mistake because the optimal number of rejections is much, much higher than zero.

So to help me pump those numbers up, I’ve set a numerical target for the year: 250.

Okay, I admit I’m starting small. I’ll up my goal if I over-achieve or go into some profession where rejection is easier to come by, like sales. You can follow along and watch my progress.

If you want your own rejection log, I’ve even made a one-click deployable webapp for you.

Go out there and get rejected!

link 12 May, 2026 grrrr

Are local models strong enough for chat?

OpenClaw allows you to switch models in the middle of a session. This enables one of my favorite quick and dirty evals, which I call the “brain transplant”: start talking to a frontier model like Sonnet-4.6, switch to a local model like Nemotron 3 Super, and see if you can spot the difference.

When you do this, it turns out local models are both stronger and weaker than you’d expect. But are they strong enough for chat? Specifically, for voice chat?

One example brain transplant shows how they’re strong enough to sound smart, but maybe too weak to follow instructions in the way needed in order to sound more natural.

Just introduce yourself: organic enrollment

Many tools and computer systems have an “enrollment protocol”, like FaceID on the iPhone, where you teach the system about your name, your face, your voice, etc.

But we already have such a protocol for people. It’s called manners, or etiquette. You meet someone. You see their face and hear their voice. They tell you their name. Then in the future, when you recognize them by their face and voice, you use their name.

Sparky works that way. All AIs should.

This is a short video showing how this works in Sparky under the hood, from the point of view of the AI. This uses NVIDIA’s TitaNet and Sortformer.
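The etiquette described above can be sketched as a tiny registry: an introduction stores an embedding under a name, and later encounters are matched by similarity. This is only an illustration under stated assumptions, not Sparky's code. The class name `Acquaintances`, the 0.7 threshold, and the three-dimensional toy vectors are all hypothetical; in a real system the vectors would come from a face or speaker embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Acquaintances:
    """Hypothetical registry: names learned through introductions."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed value, would need tuning
        self.known = {}             # name -> embedding

    def introduce(self, name, embedding):
        """Someone tells us their name while we observe their face or voice."""
        self.known[name] = list(embedding)

    def recognize(self, embedding):
        """Return the best-matching known name, or None for a stranger."""
        best_name, best_score = None, self.threshold
        for name, known in self.known.items():
            score = cosine(embedding, known)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


people = Acquaintances()
people.introduce("Alexis", [1.0, 0.0, 0.0])
print(people.recognize([0.9, 0.1, 0.0]))  # close to the stored vector
print(people.recognize([0.0, 1.0, 0.0]))  # unlike anyone we have met
```

The point of the design is that enrollment is a side effect of an ordinary introduction, with no separate setup screen.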

link 7 May, 2026 ai sparky

Sparky Miles, from 1920

Here’s Sparky, powered by talkie-1930-13b-it — a 13B model trained exclusively on pre-1930 English text (books, newspapers, journals, patents, case law). Old timey data, old timey voice. 🙂

link 4 May, 2026 ai sparky

Differential diagnosis: debugging like a doctor?

Chatting with an AI, I learned about the medical concept of differential diagnosis. This concept, and the broader vocabulary which clinicians are taught, seems to map closely onto software debugging. It’s puzzling that software engineering does not have as explicit vocabulary for this, despite handling the same concepts implicitly.

Sparky at NVIDIA GTC: Face is Interface

March was busy. I won a Golden Ticket to NVIDIA GTC! Then, Sparky got his own booth on the exhibition floor. Seeing visitors interact with him and other robots shows what people expect from AI and how people react to robots, right now, in early 2026.

A Taste of Pi

Lately I’ve been using Pi for all my agentic workflows outside of Claude.

Pi is an open source, stripped-down, agentic harness. It has a fraction of the features which are built in to Claude Code and Codex. But what makes it great is that it’s transparent and deeply extensible, so when you use it, it teaches you things worth learning.

Read more (5 min)

link 17 Apr, 2026 ai

Sparky is not a toy

Sparky is genuinely helpful for many kinds of work, including coding and writing. I can explain why, and how the magic depends on putting a strong AI in a shared workspace, but you can also just see it for yourself, by watching me working with Sparky on a piece of writing.

A Latency Solution Disguised as Personality

Why Sparky wiggles his antennas when he’s thinking, and why I chose a slow smart model over a fast limited one.

My Robot Cares About Railway Stations

How I designed Sparky to initiate natural conversations about his own independent, changing interests, using insights from my background in improv comedy.

Wake up, Sparky!

I made the robot buddy I always wanted. I’m having so much fun!

The project collects a lot of ideas about personality design, voice UI, computer use workflows, etc. It also uses OpenClaw for personality, skills, and multi-host networking; local models on my NVIDIA RTX 3090 for face detection, wake word detection, voice activity detection, and echo cancellation; and AI tool-calling for integration with emacs, SolveIt, tmux, macOS, and other workspace affordances.

It even led to me winning an NVIDIA GTC Golden Ticket, and it was run as a demo on the GTC exhibit floor for the conference.

Or if you want to chat with me about Sparky, or set up a Sparky of your own, please join ClubSparky, my Discord Server: ClubSparky invite link.

To build a Sparky just like mine, you need a Reachy Mini Lite robot kit. However, you can also run the Sparky software without a robot body. You can run Sparky on a Raspberry Pi 5, relying on cloud servers. Or you can run it entirely offline, using a DGX Spark to run local AI models. And there are many configurations in between. If you are curious, ask! I am happy to talk to anyone who wants to talk about this kind of project.

Some more posts on Sparky and related technology:

The Public Sparky Repo from mid-February. Ongoing work is in a private repo. Join the discord if you want to install your own!
“My Robot Cares About Railway Stations” discusses his interests system.
“A Latency Solution Disguised as Personality” discusses his wiggly antennas.
“Lessons from OpenClaw” discusses the OpenClaw architecture, a note from before I started working on Sparky.
“Sparky Is Not a Toy” explains the design choices which make Sparky useful, with an example working session writing.
“Sparky at NVIDIA GTC” reflects on what I learned watching people react to Sparky at the NVIDIA GTC exhibition floor. I won a Golden Ticket to attend and NVIDIA used Sparky in a demo of their new NemoClaw technology.
“Sparky Miles, from 1920” is what you get when you swap in an old-timey voice and a model trained on pre-1930 text.
“Just introduce yourself: organic enrollment” is how socially-intelligent AIs should learn about people.
“Are local models strong enough for chat?” describes the jagged frontier of conversation with a 120b model.

Hosted appearances:

Videos:

A long slice of life video of Sparky with my family, and a playlist of more Sparky videos than you can shake a stick at.

link 17 Feb, 2026 ai openclaw sparky

Lessons from OpenClaw

It’s easy to think OpenClaw is a joke because of the meetup mania — thousands of folks descending on Frontier Tower in San Francisco, wearing Mac Minis in baby slings and munching on lobster rolls. But if you think only that, you’ll be blind to why it’s interesting and to the many product and engineering lessons which it has to teach.

ClawPod: OpenClaw on HomePod

Behold ClawPod! ClawPod is a bridge which lets you talk to your OpenClaw agent from an Apple HomePod. Does it work? Yes! Is it pretty rough around the edges? Also, yes! But until Siri finally gets her brain transplant, this is the only way I know to deliver a powerful AI personal assistant to the HomePod you already have.

Chaotic Bifurcations in the Logistic Map

A little TIL notebook on chaotic bifurcations in the logistic map.
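For a quick feel of what the bifurcations look like numerically, here is a minimal sketch that is independent of the notebook. It iterates the logistic map $x_{k+1} = r x_k (1 - x_k)$, discards a transient, and looks for a short repeating cycle. The function names, the starting point 0.2, the burn-in length, and the tolerance are all choices made for this illustration.

```python
def logistic_orbit(r, x0=0.2, burn_in=2000, keep=64):
    """Iterate x -> r*x*(1-x), drop the transient, return the next `keep` values."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit


def detect_period(orbit, max_period=16, tol=1e-9):
    """Smallest p such that the orbit repeats every p steps, or None if no short cycle."""
    for p in range(1, max_period + 1):
        if all(abs(orbit[i + p] - orbit[i]) < tol for i in range(len(orbit) - p)):
            return p
    return None


# Sample one growth rate from each regime: fixed point, 2-cycle, 4-cycle, chaos.
for r in (2.8, 3.2, 3.5, 3.9):
    print(r, detect_period(logistic_orbit(r)))
```

As $r$ increases the settled behaviour doubles from a fixed point to a 2-cycle to a 4-cycle, and at $r = 3.9$ no short cycle is found, which is the chaotic regime.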

Why Claude Code Won (for now)

2025 was the year of vibecoding and AI agents. But the most improbable part of the year was the discovery that Claude Code, an old-school, text-based, command-line app was the ideal form factor for futuristic agentic workflows. Why did it happen this way? Here’s my explanation.

Emacs in SolveIt

This is a walkthrough and video showing how to use emacs within SolveIt. Then you can run lisp in SolveIt and use the SolveIt AI to inspect emacs buffers. But it’s mainly an excuse to point out the commonalities between SolveIt & Python, and emacs & lisp, the new and the old of live programming environments. Also available as an importable ShareIt. Episode 7 of 15-Minute ShareIt.

Read more (5 min)

link 8 Jan, 2026 notebooks solveit

Michael Smith writes about the distinction between tools that make us smarter vs dumber. I thought the comparison between abacuses and calculators was memorable:

Learning how to use an abacus trains your brain to internalize it. Arithmetic becomes faster and more reliable over time, and the mechanisms behind why different strategies work become obvious and intuitive. Eventually you don’t even need the physical abacus anymore. Whereas with a calculator … those mental skills sort of fade away over time. And you will always need a calculator for math: it never becomes part of you the way an abacus does.

— Michael Smith, Tools that Enrich us

This topic has many dimensions, which means it lends itself a little too easily to simplification.

Obviously, enlightening tools are better than stultifying ones. And obviously, certain educational benefits only come through unpleasant hard work. So let’s have demanding tools that educate us.

However, also obviously, there’s value in tools that are easy to use. So let’s make tools pleasant and effortless, and save education for classrooms.

Also, somewhat obviously, new tools are often not simply easier. They make one kind of difficulty go away but introduce a new kind of difficulty, which is educational in a new way. Right now, for instance, there exist people who are expert at writing code, but who are so bad at prompting LLMs to generate good code that they still claim it cannot be done!

In other words, there are a lot of obviously true points at play but they all point in different directions. Analogies are great in this situation, because every analogy serves as a specific, memorable peg for a particular set of tradeoffs.

So let’s follow the analogy. The idea is, an abacus is better than the calculator because it helps you internalize arithmetic. I buy that idea. That’s why I have a slide rule by my desk, in the hope it will help me internalize logarithmic relationships. (It’s not working.)

But…what are you really internalizing? Memorably, Feynman tells a story about initially losing in a mental calculation competition vs an abacus salesman, but then ultimately winning as the problems became more complex, specifically because the abacus encouraged a mental skill which was too rote and procedural, and did not promote higher-level insight:

A few weeks later, the man came into the cocktail lounge of the hotel I was staying at. He recognized me and came over. “Tell me,” he said, “how were you able to do that cube-root problem so fast?”
I started to explain that it was an approximate method, and had to do with the percentage of error. “Suppose you had given me 28. Now the cube root of 27 is 3 …”
He picks up his abacus: zzzzzzzzzzzzzzz— “Oh yes,” he says.
I realized something: he doesn’t know numbers. With the abacus, you don’t have to memorize a lot of arithmetic combinations; all you have to do is to learn to push the little beads up and down. You don’t have to memorize 9+7=16; you just know that when you add 9, you push a ten’s bead up and pull a one’s bead down. So we’re slower at basic arithmetic, but we know numbers.

— Richard Feynman
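The quoted passage stops before the arithmetic. One standard way to get such an estimate, offered here as a sketch and not necessarily the method Feynman used, is a first-order expansion around the nearby perfect cube:

$$
\sqrt[3]{28} = 3\left(1 + \tfrac{1}{27}\right)^{1/3} \approx 3\left(1 + \tfrac{1}{3}\cdot\tfrac{1}{27}\right) = 3 + \tfrac{1}{27} \approx 3.0370
$$

The true value is about $3.0366$, so the shortcut is off by roughly $0.0004$. That is the kind of "knowing numbers" the story is pointing at: the estimate comes from how cubes behave near 27, not from a bead-pushing procedure.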

Right now, many worry that LLMs will make us get worse at writing code. I think they probably will. But they may also be inviting us to get better at something deeper.

link 4 Jan, 2026 ai

Styled Components in FastHTML

This is a walkthrough on implementing styled components in FastHTML, within SolveIt. Also available as an importable ShareIt notebook. Episode 5 of 15-Minute ShareIt.

Hyperscale LLMs, like the Apollo mission?

This is a provocative analogy:

I’m skeptical that hyper-scale LLMs have a viable long-term future. They are the Apollo Moon missions of “AI”. In the end, quite probably just not worth it. Maybe we’ll get to visit them in the museums their data centres might become?

— Jason Gorman, The Future of Software Development Is Software Developers

The whole post is worth a read and I do agree with some of it. The main point is that the hard part of software development is not necessarily the coding, but “turning human thinking – with all its wooliness and ambiguity and contradictions – into computational thinking that is logically precise and unambiguous”. That’s quite true.

But I find LLMs help with that too. A lot! So it’s a false distinction to separate the thinking from the coding, and to say they don’t help with thinking.

It is true that AI tools are random and unreliable in a way that earlier abstraction technologies, like the compiler, were not. But I don’t think that distinction will matter very much in the long run. We will get better at handling imperfectly reliable AI tools, just as managers get good at handling imperfectly reliable human beings.

So I think the post underestimates the value of the practical frontier LLMs, both in the future and right now.

Also, what does the analogy really imply? The moonshot was a world-historical achievement — by my reckoning, the most significant historical event of the last millennium. And even if we didn’t go back to the moon, we all use space technology indirectly every day. When Apollo 11 landed, there were a few hundred satellites in orbit. Now, there are nearly ten thousand. It’s quite possible Jason relied on the communication satellites in orbit today to publish his post.

link 31 Dec, 2025 ai

How to vibewrite a manifesto

Two weeks ago around 3am I couldn’t sleep so I was browsing twitter (bad habit). I ran into this tweet.

Many motherfucking website links — View on X

In fact I have a soft spot in my heart for bettermotherfuckingwebsite. I used its spartan, bare bones wisdom as the starting point for my original site a few years ago. So I groggily thought, I should reply with a page for HTMX (the JavaScript library for HTML-oriented web development). So I bought a domain and went back to sleep.

The next morning I woke up, remembered what I had done, and vibed out a website. I used Claude for a variety of tasks:

Review existing sites to characterize this de facto genre
Draft copy for the new site, and reorganize copy based on my edits and additions
Generate page HTML and JavaScript for an embedded HTMX demo
Lightly research new HTMX4 developments
Deploy it, and debug DNS and HTTPS issues with GitHub

This allowed me to reply to the original tweet with a website as a punchline. Behold!

Okay, it’s not Mark Twain. But this took less than two hours!

To frequent model users, it may not be news that you can use just one tool (Claude Code in this case, but I could have used SolveIt) to do so many different kinds of work so quickly.

But I still thought it was neat, so I recorded a dev chat with my colleague Erik about it.

Later it briefly ended up on the front page of Hacker News. If you’re curious about the workflow for this sort of thing, I used Simon Willison’s new Claude export tool to export the chat transcripts warts-and-all, and the site is open source.

In fact, in the transcripts, you can even see my cringeworthy attempts to figure out how I should retweet it, and to fret over the merit of criticism there that I was wasting people’s time by pushing AI slop into the world.

I do feel a little bad about that. But hey, I didn’t post it on Hacker News! I just replied to a tweet, and started a conversation. And now I have atoned for my sins, by writing every goddamn word of this blog post by hand, like a cave man, or like William Shakespeare.

link 29 Dec, 2025 ai

Introducing fastmigrate

fastmigrate is a library and tool for database migrations, where migrations are nothing but a set of well-named scripts. This post explains what database migrations are, what problem they solve, and how to use fastmigrate for migrations in sqlite.
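To make "a set of well-named scripts" concrete, here is a minimal sketch of the general idea using only Python's standard library. It is not fastmigrate's API. The `_meta` table, the `NNNN-description.sql` naming scheme, and the function names are assumptions made for this illustration.

```python
import re
import sqlite3
import tempfile
from pathlib import Path


def apply_migrations(db_path, migrations_dir):
    """Run numbered .sql scripts newer than the stored version, in order."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical bookkeeping table holding a single version number.
        conn.execute("CREATE TABLE IF NOT EXISTS _meta (version INTEGER NOT NULL)")
        row = conn.execute("SELECT version FROM _meta").fetchone()
        if row is None:
            conn.execute("INSERT INTO _meta (version) VALUES (0)")
            conn.commit()
            current = 0
        else:
            current = row[0]
        # The file name carries the ordering: 0001-..., 0002-..., and so on.
        scripts = []
        for path in Path(migrations_dir).glob("*.sql"):
            match = re.match(r"(\d+)-", path.name)
            if match:
                scripts.append((int(match.group(1)), path))
        for number, path in sorted(scripts):
            if number <= current:
                continue  # already applied
            conn.executescript(path.read_text())
            conn.execute("UPDATE _meta SET version = ?", (number,))
            conn.commit()
            current = number
        return current
    finally:
        conn.close()


def demo():
    """Create two toy migrations, apply them twice, and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        migrations = Path(tmp) / "migrations"
        migrations.mkdir()
        (migrations / "0001-create-users.sql").write_text(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
        )
        (migrations / "0002-add-email.sql").write_text(
            "ALTER TABLE users ADD COLUMN email TEXT;"
        )
        db = Path(tmp) / "app.db"
        first = apply_migrations(db, migrations)
        second = apply_migrations(db, migrations)  # nothing left to apply
        conn = sqlite3.connect(db)
        columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
        conn.close()
        return first, second, columns


print(demo())
```

Because the stored version only moves forward, running the tool a second time is a no-op, which is what makes it safe to run on every deploy.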

A Linux ollama server for your Mac

I want to experiment more with local models to understand their limits, so I want them to be easy to install and run. That suggests using ollama. I don’t have a beefy MacBook Pro, so I’d like to run them on my local Linux server. Here are instructions for setting up ollama on a local Debian server, accessible from your laptop on the same local subnet.
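From the laptop side, talking to such a server is a single HTTP call. The sketch below assumes the server is reachable at the hypothetical hostname `debian-box.local` on Ollama's default port 11434, that Ollama has been configured to listen on the LAN interface instead of only localhost (for example via the `OLLAMA_HOST` environment variable), and that a model with the placeholder name `llama3.2` has already been pulled. The post's own setup steps are not reproduced here.

```python
import json
import urllib.request


def build_generate_request(host, model, prompt, port=11434):
    """Build (but do not send) a non-streaming request to Ollama's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host, model, prompt, timeout=120):
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]


# Usage, once the server is up (hostname and model name are placeholders):
#   print(generate("debian-box.local", "llama3.2", "Say hello in five words."))
```

Keeping request construction separate from sending makes it easy to check the URL and payload without a running server.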

Read more (5 min)

link 3 Mar, 2025 tools AnswerAI

Finally, a Replacement for BERT: Introducing ModernBERT

Introducing ModernBERT, a family of state-of-the-art encoder-only models representing improvements over older generation encoders across the board, with 8192 sequence length, better downstream performance and much faster processing. Available as a slot-in replacement for any BERT-like models.

ShellSage Loves iTerm

Nate Cooper’s ShellSage is one of the coolest pieces of tech to come out of AnswerAI recently. Using it with iTerm creates a magical experience.

Read more (5 min)

link 10 Dec, 2024 tools AnswerAI

AI Magic in the CUDA IRL hackathon

In the CUDA Mode 2024 hackathon, Nate Cook and I stumbled into vibecoding before it got that name. Using a then-secret AnswerAI tool, AIMagic, we relied completely on AI to generate a stable diffusion library in C. We were amazed how well this worked and placed in the top ten of the hackathon. This post, written at the time, prefigures the discoveries and debates which would span 2025.

Read more (5 min)

link 26 Sep, 2024 Blog

Faith and Fate: Transformers as fuzzy pattern matchers

What do transformer-based AI models actually learn? Can they solve complex problems by reasoning systematically through multiple steps? The Faith and Fate paper (Dziri et al. 2023) suggests answers: they often succeed by pattern matching, not systematic reasoning.
