Headless desktop control for Linux
phantom lets a script, a person, or an AI drive any app on a Linux machine: type into it, read what it shows, click — in the background, without stealing focus, even with no screen attached. One line to a socket does it.
The gap
Wayland allows no global input synthesis and no screen scraping, for good security reasons. So the tools that drove the Linux desktop, xdotool and friends, stopped working, and nothing headless replaced them. phantom takes a different route: it sits on the connections an app already uses. That means it needs no screen, targets one window at a time, never steals your focus, and never asks the app to cooperate. The app doesn't even have to know phantom is there.
What it does
Everything is one short command sent to a socket — so a shell script, a chat bot, or a message from your phone can do it.
Type and click into a specific window — in the background, without moving your mouse or taking focus.
Read the text a window is showing, or save a picture of just that window. No full-screen capture, no bringing it to the front.
Watch the exact text a program reads and writes at the system-call boundary — and feed it input. For tools with no other interface.
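The "one short command sent to a socket" idea can be sketched in Rust with the standard library alone. This is a minimal sketch, not phantom's actual client: the function name send_command is hypothetical, and the framing (one newline-terminated command, one newline-terminated reply) is an assumption, since this document does not spell out the socket protocol or the socket path.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Send one command line over an already-connected Unix socket and
/// return the first reply line with its trailing newline removed.
///
/// ASSUMPTION: newline-terminated request and reply. The real phantom
/// wire format is not described in this document.
fn send_command(stream: &mut UnixStream, command: &str) -> io::Result<String> {
    // Write the command followed by a single newline.
    stream.write_all(command.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read exactly one line of reply. The BufReader is dropped afterwards,
    // so this sketch suits one-shot use, not a long-lived session.
    let mut reply = String::new();
    BufReader::new(&*stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```

A caller would obtain the stream with UnixStream::connect(path), where path is wherever phantom listens (not stated here), and pass a command string modelled on the phantomctl arguments, for example `act @gedit type "hello from a script"`.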
Quickstart
A text editor shows the point best. The text appears in the window and a picture is saved — and you never touched it.
    # build: no dependencies to fetch
    $ cargo build --release

    # start phantom, then launch any app "through" it
    $ ./target/release/phantom phantom-0
    $ WAYLAND_DISPLAY=phantom-0 gedit

    # from another terminal, drive that window
    $ ./target/release/phantomctl list
    @gedit (gedit — Untitled Document 1)
    $ ./target/release/phantomctl act @gedit type "hello from a script"
    $ ./target/release/phantomctl sense @gedit shot /tmp/gedit.png
    saved /tmp/gedit.png (just the window, no full screen)
How it works
No agent inside the app, no plugin, no accessibility API required. Three boundaries, all standard.
Graphical apps talk to Wayland over a socket. Started with WAYLAND_DISPLAY=phantom-0, an app talks to phantom instead — which passes everything through to the real server, so the app looks and behaves as usual, while phantom reads its pixels and text and sends it keystrokes. Or run it with --headless and phantom answers Wayland itself — the app needs no real compositor at all.
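To make the pass-through idea concrete, here is a small sketch of how a proxy on the Wayland socket could find message boundaries in the byte stream before forwarding it. Every Wayland message starts with an 8-byte header in host byte order: a 32-bit object id, then a 32-bit word holding the total message size in its upper 16 bits and the opcode in its lower 16 bits. The names MessageHeader, parse_header and split_messages are hypothetical and are not taken from src/wire.rs. File descriptors, which travel as ancillary data alongside the bytes, are outside this sketch.

```rust
/// The fixed 8-byte header that starts every Wayland message.
#[derive(Debug, PartialEq, Eq)]
struct MessageHeader {
    object_id: u32,
    opcode: u16,
    /// Total message length in bytes, header included.
    size: u16,
}

/// Parse a header from the front of `buf`.
/// Returns None if fewer than 8 bytes are available or the size field
/// is smaller than the header itself.
fn parse_header(buf: &[u8]) -> Option<MessageHeader> {
    if buf.len() < 8 {
        return None;
    }
    // Both words use the host's native byte order.
    let object_id = u32::from_ne_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let word = u32::from_ne_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let size = (word >> 16) as u16;
    let opcode = (word & 0xffff) as u16;
    if size < 8 {
        return None;
    }
    Some(MessageHeader { object_id, opcode, size })
}

/// Split a buffer into complete messages, returning each header with its
/// argument bytes. A trailing partial message is left out, so the caller
/// can wait for more data.
fn split_messages(mut buf: &[u8]) -> Vec<(MessageHeader, &[u8])> {
    let mut out = Vec::new();
    while let Some(header) = parse_header(buf) {
        let len = header.size as usize;
        if buf.len() < len {
            break; // incomplete message
        }
        out.push((header, &buf[8..len]));
        buf = &buf[len..];
    }
    out
}
```

A proxy built this way can inspect or log each message and still forward the original bytes unchanged, which is what lets the app behave as usual.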
A screen only ever shows finished pixels, never meaning. A program states what it actually means where it reads and writes data through the kernel. phantom-trace watches that boundary and can change what crosses it.
A fake input device the kernel treats as real hardware — so even the lock screen and text consoles accept it. Needs one udev rule, shipped in setup/.
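The text above does not name the kernel mechanism. On Linux, virtual input devices are commonly created through uinput, so the sketch below assumes that and shows only the event encoding, not device setup. It builds the raw bytes for one key tap using the kernel's input_event layout on 64-bit Linux: a 16-byte timestamp, a 16-bit type, a 16-bit code and a 32-bit value. The function names encode_event and key_tap are hypothetical, and 32-bit systems use a different layout.

```rust
// Event types and codes from the Linux input subsystem.
const EV_SYN: u16 = 0x00;
const EV_KEY: u16 = 0x01;
const SYN_REPORT: u16 = 0;
const KEY_A: u16 = 30;

/// Size of `struct input_event` on 64-bit Linux (ASSUMPTION: 64-bit target).
const EVENT_SIZE: usize = 24;

/// Encode one input event in native byte order.
/// The 16 timestamp bytes are left as zero in this sketch.
fn encode_event(kind: u16, code: u16, value: i32) -> [u8; EVENT_SIZE] {
    let mut raw = [0u8; EVENT_SIZE];
    raw[16..18].copy_from_slice(&kind.to_ne_bytes());
    raw[18..20].copy_from_slice(&code.to_ne_bytes());
    raw[20..24].copy_from_slice(&value.to_ne_bytes());
    raw
}

/// Bytes for pressing and releasing one key. Each state change is
/// followed by a SYN_REPORT, which marks the end of an event group.
fn key_tap(code: u16) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 * EVENT_SIZE);
    out.extend_from_slice(&encode_event(EV_KEY, code, 1)); // press
    out.extend_from_slice(&encode_event(EV_SYN, SYN_REPORT, 0));
    out.extend_from_slice(&encode_event(EV_KEY, code, 0)); // release
    out.extend_from_slice(&encode_event(EV_SYN, SYN_REPORT, 0));
    out
}
```

Under these assumptions, key_tap(KEY_A) yields 96 bytes that could be written to an already configured virtual device. Because the events arrive as if from hardware, no display server is involved, which fits the claim that lock screens and text consoles accept them.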
No third-party crates: the Wayland wire protocol and the system calls are written out by hand in src/wire.rs and src/sys.rs.
Where it fits
phantom makes the machine operable. It has no opinion about who operates it.
You can run phantom entirely on its own. zyrkel is what you reach for when you want the machine to operate itself.
Status
phantom is version 0.0.1 — small on purpose. Here's exactly where it stands.
With --headless, phantom answers Wayland directly, so a real app connects and runs with no compositor at all.