How-To Series · Episode 22 / 59 · Module 4: Eyes, Ears, Voice

Hermes · Vision & Paste

Paste a screenshot. Ask. The agent reads it.

After this video · You can now hand the agent any image and have it work from there.

Copy any image to your clipboard, run /paste (or press Cmd/Ctrl+V), type your question, and send. The image goes to the model as a base64 vision content block. You can attach several images to one message; Ctrl+C clears the pending attachments.

Three attach modes: /paste (most reliable), Cmd/Ctrl+V (layered), and Mac screenshot/file:// auto-attach.

OS clipboard helpers: Mac needs nothing (pngpaste is an optional speed-up), Linux X11 needs xclip, Wayland needs wl-clipboard, WSL2 works via PowerShell, and pasting over SSH does not work.

/terminal-setup fixes IDE-terminal key intercepts in VS Code, Cursor, and Windsurf. Every pasted image is saved to ~/.hermes/images/ so you can re-attach it later with @.
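To make "base64 vision content block" concrete, here is a minimal sketch of how an image file could be wrapped for a vision model. The exact payload Hermes sends is not described in this episode; the JSON shape below follows the widely used OpenAI-style `image_url` part with a data URL, and both that shape and the function name `image_block` are assumptions for illustration. The sketch also assumes the file is a PNG.

```shell
# Hypothetical helper: wrap an image file as a base64 vision content block.
# Assumes an OpenAI-style "image_url" part; Hermes' real payload may differ.
image_block() {
  # $1: path to a PNG file
  # Encode the file and strip the line wraps some base64 tools insert.
  b64=$(base64 < "$1" | tr -d '\n')
  printf '{"type":"image_url","image_url":{"url":"data:image/png;base64,%s"}}\n' "$b64"
}

# Usage: pass an image path as the first script argument.
if [ -n "${1:-}" ]; then
  image_block "$1"
fi
```

The point is only that the image travels inline as text inside the request, which is why it needs no separate upload step.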

About these resources. Every command in this video comes from the Vision & Image Paste doc.

New words here · Plain English

one sentence each · full glossary

Vision · The ability for the AI to see and understand images. Paste a screenshot and ask about it.

Screenshot · A captured image of what is on your screen. Hermes can read and analyze them.

Sources · What this video distills

1 docs page · every command below traces to it

Primary · /paste, Cmd/Ctrl+V, Mac auto-attach, platform setup, /terminal-setup

Vision & Image Paste

Commands shown · Copy and paste

each shows the source doc it came from

Explicit paste · from source

/paste

IDE-terminal fix · from source

/terminal-setup

Linux X11 · from source

sudo apt install xclip

Linux Wayland · from source

sudo apt install wl-clipboard

Mac speed bump · from source

brew install pngpaste
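If you are unsure which of the install commands above applies to your machine, a small check like the following can help. This is a sketch, not part of Hermes: `clipboard_hint` is a hypothetical function, and it assumes that `uname -s`, `$XDG_SESSION_TYPE`, and `$SSH_CONNECTION` describe your session accurately. WSL2 is not detected here and falls into the last branch.

```shell
# Hypothetical helper: map a session description to the clipboard tool it needs.
# Inputs are passed as arguments so the mapping is easy to check by hand.
clipboard_hint() {
  os="$1"        # output of `uname -s`, e.g. Darwin or Linux
  session="$2"   # value of $XDG_SESSION_TYPE, e.g. x11 or wayland
  ssh="$3"       # non-empty when the shell runs over SSH
  if [ -n "$ssh" ]; then
    echo "SSH session: clipboard paste does not work"
  elif [ "$os" = "Darwin" ]; then
    echo "macOS: nothing to install"
  elif [ "$session" = "wayland" ]; then
    echo "sudo apt install wl-clipboard"
  elif [ "$session" = "x11" ]; then
    echo "sudo apt install xclip"
  else
    echo "unknown session: see the Vision & Image Paste doc"
  fi
}

# Report the hint for the current shell.
clipboard_hint "$(uname -s)" "${XDG_SESSION_TYPE:-}" "${SSH_CONNECTION:-}"
```

The SSH check comes first because a remote shell has no access to your local clipboard, whatever the remote machine runs.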

Going deeper · Related Hermes docs

further reading · not sources of facts shown above

Browser Automation

vision reads what the accessibility tree misses

Context References

@ to re-attach saved images later
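Before re-attaching with @, it can help to see what has already been saved. The sketch below lists the newest files in the image folder named in this episode. The function name `recent_images` is hypothetical, and the exact @ syntax for re-attaching is covered in the Context References doc, not here.

```shell
# Hypothetical helper: list the most recently saved pasted images, newest first.
recent_images() {
  dir="${1:-$HOME/.hermes/images}"   # folder named in this episode
  count="${2:-5}"                    # how many file names to show
  if [ ! -d "$dir" ]; then
    echo "no image directory at $dir" >&2
    return 1
  fi
  ls -t "$dir" | head -n "$count"
}

# Show the five newest images, or a notice if none have been saved yet.
recent_images || true
```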

Next in the series · Episodes that build on this

E15

Drop Files With @

E20

Browser Automation

E23

Image Generation