How-To Series · Episode 22 / 59 · Module 4: Eyes, Ears, Voice

Hermes · Vision & Paste

Paste a screenshot. Ask. The agent reads it.

After this video: you can hand the agent any image and have it work from there.

Copy any image to your clipboard, run /paste (or press Ctrl/Cmd+V), type your question, and send. The image goes to the model as a base64 vision content block. You can attach several images to one message; Ctrl+C clears them.

Three attach modes: /paste (most reliable), Cmd/Ctrl+V (layered), and Mac screenshot/file:// auto-attach.

Clipboard helpers by OS: macOS needs nothing, Linux X11 needs xclip, Wayland needs wl-clipboard, and WSL2 works through PowerShell. Image paste does not work over SSH.

/terminal-setup fixes IDE-terminal key intercepts in VS Code, Cursor, and Windsurf.

Every pasted image is saved to ~/.hermes/images/, so you can re-attach it later with @.
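To make "base64 vision content block" concrete, here is a minimal shell sketch of the encoding step. It assumes an OpenAI-style image_url block carrying a data: URL; the function name image_content_block is hypothetical, and the exact JSON Hermes sends depends on the model provider and is not described in the source doc.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a Hermes command. Assumes an OpenAI-style
# {"type":"image_url",...} block; real payloads may differ by provider.
image_content_block() {
  local path="$1"
  local mime="${2:-image/png}"   # default MIME type is an assumption
  local b64
  # GNU base64 wraps lines at 76 chars; strip newlines so the data URL is one line.
  b64=$(base64 < "$path" | tr -d '\n')
  printf '{"type":"image_url","image_url":{"url":"data:%s;base64,%s"}}\n' "$mime" "$b64"
}
```

Usage, with any PNG you have on disk: image_content_block ./screenshot.png. Base64 inflates the payload by about a third, which is one reason large screenshots cost more tokens and bandwidth than their file size suggests.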

About these resources. Every command in this video comes from the Vision & Image Paste doc.

Sources · What this video distills

1 docs page · every command below traces to it
Primary · /paste, Cmd/Ctrl+V, Mac auto-attach, platform setup, /terminal-setup
Vision & Image Paste

Commands shown · Copy and paste

each shows the source doc it came from
Explicit paste · from source
/paste
IDE-terminal fix · from source
/terminal-setup
Linux X11 · from source
sudo apt install xclip
Linux Wayland · from source
sudo apt install wl-clipboard
Mac speed-up (optional) · from source
brew install pngpaste
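If pasting an image does nothing, a first check is whether your session has the helper listed for your platform. The sketch below is a hypothetical diagnostic, not a Hermes command: the detection heuristics (SSH_CONNECTION/SSH_TTY, uname, /proc/version mentioning Microsoft, WAYLAND_DISPLAY, DISPLAY) are assumptions about typical setups, not how Hermes itself decides. The dump step uses the standard xclip, wl-paste, and pngpaste invocations.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic, not part of Hermes. Heuristics are assumptions.

# Print which platform case from the clipboard-helper notes applies.
clipboard_helper() {
  if [ -n "${SSH_CONNECTION:-}" ] || [ -n "${SSH_TTY:-}" ]; then
    echo "ssh"        # image paste does not work over SSH
  elif [ "$(uname -s)" = "Darwin" ]; then
    echo "macos"      # works out of the box; pngpaste is optional
  elif grep -qi microsoft /proc/version 2>/dev/null; then
    echo "wsl2"       # clipboard is reached through PowerShell
  elif [ -n "${WAYLAND_DISPLAY:-}" ]; then
    echo "wayland"    # needs wl-clipboard (provides wl-paste)
  elif [ -n "${DISPLAY:-}" ]; then
    echo "x11"        # needs xclip
  else
    echo "unknown"    # no graphical session detected
  fi
}

# Try to write the current clipboard image to a PNG with the matching tool.
# A failed dump may still leave an empty file behind because of the redirect.
dump_clipboard_png() {
  local out="$1"
  case "$(clipboard_helper)" in
    x11)     xclip -selection clipboard -t image/png -o > "$out" ;;
    wayland) wl-paste --type image/png > "$out" ;;
    macos)
      if command -v pngpaste >/dev/null; then
        pngpaste "$out"
      else
        echo "pngpaste not installed (optional on macOS)" >&2
        return 1
      fi
      ;;
    *)
      echo "no test dump for this platform" >&2
      return 1
      ;;
  esac
}
```

Usage: copy a screenshot, then run clipboard_helper followed by dump_clipboard_png /tmp/clip.png. A non-empty PNG means the OS side works, and any remaining problem is likely the terminal intercepting the keystroke, which is what /terminal-setup addresses.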

Related in the series · Episodes that connect to this one

E15 · Drop Files With @
E20 · Browser Automation
E23 · Image Generation