Mode System Reference
Useful Information for Humans
This firmware exposes a mode-driven input system for the ESP32-S3 Touch AMOLED device.
The device is no longer modeled as a dedicated push-to-talk appliance. Instead, it behaves as a configurable touch controller with:
- a global mode-switch button on `BOOT`
- a temporary `boot_mode` control layer while `BOOT` is held
- LVGL-backed touch gestures normalized into high-level triggers
- per-mode action bindings
- a small built-in action engine for HID, microphone, and UI behavior
The BOOT button is reserved for mode control and does not participate in PTT. While held, it activates a dedicated mode-control UI and gesture surface.
Useful Information for AI
- Treat `BOOT` as a global control input that can activate a temporary `boot_mode`.
- Treat LVGL events as implementation details behind normalized triggers such as `tap`, `hold_start`, and `swipe_left`.
- Prefer describing configuration in terms of `globalBindings`, `bootMode`, `modes`, `bindings`, and `actions`.
- Keep JSON declarative. Do not turn config into an embedded scripting language.
Overview
The firmware uses five layers:
- input layer: raw device sources such as `boot_button` and `touch`
- trigger layer: normalized events such as `tap`, `hold_start`, `hold_end`, `swipe_left`, and `swipe_right`
- mode layer: the current mode selects which bindings are active
- boot control layer: a temporary `boot_mode` overrides normal touch bindings while `BOOT` is held
- action layer: built-in actions execute HID reports, microphone gating, UI updates, and mode changes
This keeps LVGL-specific details isolated from user-facing configuration.
Load Priority
The runtime load order is:
- external JSON from `/spiffs/mode-config.json`
- built-in fallback JSON compiled into firmware
- hardcoded failsafe config if JSON parsing fails
This keeps the user-editable file format and the shipped default behavior aligned.
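The fallback chain above can be modeled on a host as a small function. This is a sketch, not the firmware loader: `BUILTIN_JSON` and `FAILSAFE_CONFIG` are hypothetical placeholders for the compiled-in defaults, and the SPIFFS read is abstracted as a callable so the chain can be exercised without the device.

```python
import json
from typing import Callable, Tuple

# Hypothetical placeholders for the defaults compiled into the firmware.
BUILTIN_JSON = '{"version": 1, "activeMode": "cursor", "modes": []}'
FAILSAFE_CONFIG = {"version": 1, "activeMode": "failsafe", "modes": []}


def load_config(read_external: Callable[[], str],
                builtin_json: str = BUILTIN_JSON) -> Tuple[dict, str]:
    """Return (config, source), trying external -> built-in -> failsafe."""
    # 1. User-editable file, e.g. the contents of /spiffs/mode-config.json.
    try:
        return json.loads(read_external()), "external"
    except (OSError, ValueError):  # missing/unreadable file or bad JSON
        pass
    # 2. Built-in fallback JSON shipped with the firmware image.
    try:
        return json.loads(builtin_json), "builtin"
    except ValueError:
        pass
    # 3. Hardcoded failsafe: already a data structure, so it cannot fail to parse.
    return dict(FAILSAFE_CONFIG), "failsafe"
```

On a host, `load_config(lambda: open("mode-config.json").read())` would mirror the device's order of preference.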
Config Portal
The firmware can serve its local config portal over the router network or, if the router join fails, from a fallback access point and HTTP server hosted on the device.
Current intended behavior:
- the device first attempts an STA join using the JSON-configured router SSID and password
- when the STA join succeeds, the portal is reachable at the assigned router IP and, by default, at `http://walkey-talkey.local/`
- if the STA join fails, the firmware falls back to a SoftAP at `http://192.168.4.1/`
- the fallback SoftAP uses SSID `walkey-talkey` with password `secretKEY`
- the portal intentionally starts about 8 seconds after boot so the display/touch stack can settle before Wi-Fi startup
- the same server exposes REST endpoints for config export, validation, save, and reset
- successful save operations normalize accepted JSON into the canonical output form before writing it back to `/spiffs/mode-config.json`
- save and reset both reload runtime state and immediately reapply the Wi-Fi configuration
- reset restores the built-in firmware JSON as the active external config source, then reloads runtime state from that restored config
- the hardcoded failsafe config still exists, but it is an internal last-resort fallback rather than the normal API reset target
- save/reset failures return a structured `STORAGE_FAILED` payload with `stage`, `formatAttempted`, `path`, `partition`, optional low-level error fields, and recovery suggestions
- the BOOT overlay shows `Connecting...` immediately during portal startup, then replaces it with the active portal address on the line directly below `Swipe to switch mode`
- in AP fallback mode, the BOOT overlay uses the format `AP: walkey-talkey (<ip>)`
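The STA-then-SoftAP decision above reduces to a small function. This is a sketch under assumptions: `portal_endpoints` is a hypothetical helper, and the STA-mode overlay line is assumed to be the bare router IP because the exact STA overlay format is not documented here.

```python
from typing import List, Optional, Tuple

# Values from the portal description above.
MDNS_URL = "http://walkey-talkey.local/"
SOFTAP_SSID = "walkey-talkey"
SOFTAP_IP = "192.168.4.1"


def portal_endpoints(sta_ip: Optional[str]) -> Tuple[List[str], str]:
    """Return (portal URLs, BOOT overlay address line).

    sta_ip is the router-assigned address, or None when the STA join failed.
    """
    if sta_ip is not None:
        # STA mode: the router IP plus the default mDNS name.
        # Assumption: the overlay line shows the bare router IP.
        return [f"http://{sta_ip}/", MDNS_URL], sta_ip
    # SoftAP fallback: fixed address and the documented overlay format.
    return [f"http://{SOFTAP_IP}/"], f"AP: {SOFTAP_SSID} ({SOFTAP_IP})"
```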
Implementation notes for the current board build:
- Wi-Fi startup is intentionally delayed because bringing up the radio too early on this hardware was corrupting the AMOLED/touch UI
- the display driver now uses smaller internal-RAM LVGL draw buffers instead of the earlier PSRAM-backed configuration to keep the UI stable while Wi-Fi is active
Core Behavior
The device behaves as follows:
- pressing and holding `BOOT` enters `boot_mode`
- while `boot_mode` is active, touch gestures are routed to the dedicated mode-control bindings instead of the current app mode
- `BOOT + swipe_right` moves to the next mode
- `BOOT + swipe_left` moves to the previous mode
- the display shows a dedicated control overlay while `boot_mode` is active
- when `BOOT` is released, the control overlay closes and normal mode bindings resume
- touch gestures are interpreted by LVGL and normalized before they reach the binding engine
- actions are executed in order, allowing one gesture to drive multiple outputs
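The routing and cycling rules above can be sketched as follows. `ModeRouter` is a hypothetical name, not a firmware type. The sketch assumes `mode_ids` is already sorted by `cycleOrder` and that `cycle_mode` wraps around at the ends, which this page does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModeRouter:
    """Hypothetical model of binding-scope selection and mode cycling."""
    mode_ids: List[str]   # mode ids already sorted by cycleOrder
    active: str           # currently selected app mode
    boot_mode: bool = False

    def binding_scopes(self) -> List[str]:
        # globalBindings always apply; bootMode replaces the app mode while BOOT is held.
        return ["globalBindings", "bootMode" if self.boot_mode else f"modes.{self.active}"]

    def cycle_mode(self, direction: str) -> str:
        step = 1 if direction == "next" else -1
        i = self.mode_ids.index(self.active)
        # Assumption: cycling wraps around at both ends of the list.
        self.active = self.mode_ids[(i + step) % len(self.mode_ids)]
        return self.active
```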
Why The System Uses Normalized Triggers
LVGL provides the low-level event model, but the configuration layer does not expose raw LVGL event names directly.
For example:
- LVGL raises `LV_EVENT_GESTURE`, then the firmware reads the direction and converts it into `swipe_up`, `swipe_down`, `swipe_left`, or `swipe_right`
- LVGL raises press and release events, and the firmware converts hold behavior into `hold_start` and `hold_end`
This gives the config format stable names even if gesture handling evolves internally.
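The swipe normalization can be pictured as a lookup table. In LVGL, the direction for an `LV_EVENT_GESTURE` is typically read with `lv_indev_get_gesture_dir()`, which returns an `lv_dir_t`. This Python sketch models those constants as strings, and `normalize_gesture` is a hypothetical name rather than a firmware function.

```python
from typing import Optional

# LVGL reports the gesture direction as an lv_dir_t value; the names are
# modeled as strings here. Note that LVGL says TOP/BOTTOM, not UP/DOWN.
_GESTURE_TO_TRIGGER = {
    "LV_DIR_LEFT": "swipe_left",
    "LV_DIR_RIGHT": "swipe_right",
    "LV_DIR_TOP": "swipe_up",
    "LV_DIR_BOTTOM": "swipe_down",
}


def normalize_gesture(lv_dir: str) -> Optional[str]:
    """Map an LVGL gesture direction to the config-facing trigger name."""
    # Anything else (for example LV_DIR_NONE) produces no trigger.
    return _GESTURE_TO_TRIGGER.get(lv_dir)
```

If the backend vocabulary changes, only the table changes. Configs keep saying `swipe_up`.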
Recommended JSON Shape
The mode system uses one top-level config with global bindings, a dedicated bootMode, and mode-specific bindings.
```json
{
  "version": 1,
  "activeMode": "cursor",
  "defaults": {
    "touch": {
      "holdMs": 400,
      "doubleTapMs": 350,
      "swipeMinDistance": 40
    }
  },
  "globalBindings": [
    {
      "input": "boot_button",
      "trigger": "press",
      "actions": [
        { "type": "enter_boot_mode" }
      ]
    },
    {
      "input": "boot_button",
      "trigger": "release",
      "actions": [
        { "type": "exit_boot_mode" }
      ]
    }
  ],
  "bootMode": {
    "label": "Mode Control",
    "ui": {
      "title": "Mode Control",
      "showModeList": true,
      "showGestureHints": true
    },
    "bindings": [
      {
        "input": "touch",
        "trigger": "swipe_right",
        "actions": [
          { "type": "cycle_mode", "direction": "next" },
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "swipe_left",
        "actions": [
          { "type": "cycle_mode", "direction": "previous" },
          { "type": "ui_show_mode" }
        ]
      }
    ]
  },
  "modes": [
    {
      "id": "cursor",
      "cycleOrder": 0,
      "label": "Cursor",
      "bindings": []
    }
  ]
}
```
Top-Level Fields
- `version`: config format version for migration safety.
- `activeMode`: default mode selected at boot.
- `defaults`: shared thresholds and timing values.
- `globalBindings`: bindings that are always active, regardless of mode.
- `bootMode`: a temporary dedicated control mode entered while `BOOT` is held.
- `modes`: per-mode definitions. Prefer the array form with stable `id` values and explicit `cycleOrder`. The loader still accepts the older object-map form for compatibility.
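Because the loader accepts both `modes` shapes, a host-side tool may want to normalize them before editing. This is a minimal sketch, assuming the object-map key becomes the mode `id`. `normalize_modes` is a hypothetical helper, and where modes without a `cycleOrder` end up is an assumption.

```python
from typing import List, Union


def normalize_modes(modes: Union[list, dict]) -> List[dict]:
    """Return modes as a list of dicts with an 'id', sorted by cycleOrder.

    Accepts the preferred array form or the older object-map form
    ({"cursor": {...}}), where the map key becomes the id.
    """
    if isinstance(modes, dict):
        items = [dict(body, id=mode_id) for mode_id, body in modes.items()]
    elif isinstance(modes, list):
        items = [dict(m) for m in modes]
    else:
        raise TypeError("modes must be an array or an object map")
    ids = [m.get("id") for m in items]
    if not all(isinstance(i, str) and i for i in ids) or len(set(ids)) != len(ids):
        raise ValueError("every mode needs a unique, non-empty string id")
    # sorted() is stable, so modes without cycleOrder keep their original
    # order after the ones that declare it (an assumption, not documented).
    return sorted(items, key=lambda m: m.get("cycleOrder", float("inf")))
```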
Inputs
The built-in input sources are:
- `boot_button`
- `touch`

The model also leaves room for future sources:

- `encoder`
- `usb_host_key`
- `timer`
- `imu`
Triggers
The firmware exposes a normalized trigger vocabulary:
- `press`
- `release`
- `tap`
- `double_tap`
- `long_press`
- `hold_start`
- `hold_end`
- `swipe_up`
- `swipe_down`
- `swipe_left`
- `swipe_right`
Not every input source supports every trigger. For example, boot_button commonly uses press, release, and long_press, while touch supports the broader gesture set.
Current touch semantics:
- `tap` is deferred until the `doubleTapMs` timeout expires
- a second tap inside that timeout emits `double_tap` instead of a second `tap`
- `long_press` and `hold_start` are emitted together when the hold threshold is crossed, in that order
- `hold_end` is emitted on release after a hold has started
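The touch timing rules above can be illustrated with a small state machine. This is a hypothetical host-side model, not the firmware's LVGL handler. It only covers `tap`, `double_tap`, `long_press`, `hold_start`, and `hold_end` (swipes are omitted). The default thresholds mirror `holdMs: 400` and `doubleTapMs: 350` from the example config, and measuring the double-tap window from the first tap's release is an assumption.

```python
from typing import List, Optional


class TapHoldRecognizer:
    """Sketch of the documented tap / double_tap / hold ordering (times in ms).

    Assumptions: the double-tap window is measured from the first tap's
    release, and the caller invokes tick() periodically.
    """

    def __init__(self, hold_ms: int = 400, double_tap_ms: int = 350):
        self.hold_ms = hold_ms
        self.double_tap_ms = double_tap_ms
        self.press_t: Optional[int] = None        # set while the finger is down
        self.holding = False                      # True after hold_start fired
        self.pending_tap_t: Optional[int] = None  # release time of a deferred tap

    def press(self, t: int) -> List[str]:
        self.press_t, self.holding = t, False
        return []

    def tick(self, t: int) -> List[str]:
        out: List[str] = []
        # A deferred tap becomes a real tap once the double-tap window closes.
        if self.pending_tap_t is not None and t - self.pending_tap_t >= self.double_tap_ms:
            out.append("tap")
            self.pending_tap_t = None
        # Crossing the hold threshold emits long_press, then hold_start.
        if self.press_t is not None and not self.holding and t - self.press_t >= self.hold_ms:
            out += ["long_press", "hold_start"]
            self.holding = True
        return out

    def release(self, t: int) -> List[str]:
        out = self.tick(t)
        if self.holding:
            out.append("hold_end")
        elif self.pending_tap_t is not None:
            out.append("double_tap")   # second tap inside the window
            self.pending_tap_t = None
        else:
            self.pending_tap_t = t     # defer: this may still become a double_tap
        self.press_t, self.holding = None, False
        return out
```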
Actions
The action engine uses fixed action types implemented in firmware.
Common actions include:
- `hid_key_down`
- `hid_key_up`
- `hid_key_tap`
- `hid_shortcut_tap`
- `hid_modifier_down`
- `hid_modifier_up`
- `hid_usage_down`
- `hid_usage_up`
- `hid_usage_tap`
- `sleep_ms`
- `enter_boot_mode`
- `exit_boot_mode`
- `mic_gate`
- `mic_gate_toggle`
- `ui_hint`
- `ui_show_mode`
- `set_mode`
- `cycle_mode`
- `noop`
Actions are intentionally limited to known built-in behaviors so the JSON stays easy to validate and debug. sleep_ms is the intended timing primitive for short, reliable keyboard macro gaps without turning the config into a scripting language.
Preferred HID payload rules:
- Use canonical key tokens like `A`, `ENTER`, `F13`, `MEDIA_NEXT_TRACK`, `VOLUME_UP`.
- For keyboard chords, prefer `hid_shortcut_tap` or `hid_key_tap` plus a `modifiers` array.
- For consumer/system HID, use `hid_usage_*` actions.
- For advanced cases, use a raw usage object with `report`, `usagePage`, and `usage`.
- The machine-readable schema lives at `config/mode-config.schema.json`.
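For a quick sanity check outside the editor, the two accepted `usage` shapes can be checked by hand. The authoritative rules live in `config/mode-config.schema.json`. This hypothetical `is_valid_usage` helper is not derived from that schema, and its token pattern is an assumption. The 16-bit range reflects how HID defines usage pages and usage IDs.

```python
import re
from typing import Union

# Assumption: canonical tokens are upper-case words joined by underscores.
_TOKEN = re.compile(r"[A-Z0-9]+(?:_[A-Z0-9]+)*")


def _u16(value) -> bool:
    # HID usage pages and usage IDs are 16-bit; bool is excluded explicitly
    # because it is a subclass of int in Python.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 0xFFFF


def is_valid_usage(usage: Union[str, dict]) -> bool:
    """Check the 'usage' field of a hid_usage_* action (token or raw object)."""
    if isinstance(usage, str):
        return _TOKEN.fullmatch(usage) is not None
    if isinstance(usage, dict):
        return (isinstance(usage.get("report"), str)
                and _u16(usage.get("usagePage"))
                and _u16(usage.get("usage")))
    return False
```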
Binding Shape
Each binding matches one input plus one trigger, then executes one or more actions.
```json
{
  "input": "touch",
  "trigger": "swipe_left",
  "actions": [
    { "type": "hid_key_tap", "key": "LEFT_ARROW" }
  ]
}
```
Structured HID examples:
```json
{
  "type": "hid_shortcut_tap",
  "modifiers": ["CTRL", "SHIFT"],
  "key": "A"
}
```

```json
{
  "type": "hid_usage_tap",
  "usage": "MEDIA_NEXT_TRACK"
}
```

```json
{
  "type": "hid_usage_tap",
  "usage": {
    "report": "consumer",
    "usagePage": 12,
    "usage": 205
  }
}
```
This structure scales better than embedding logic in event names or creating separate on and off maps.
Runtime expectation:
- swipe bindings should behave like edge-triggered one-shots and fire once per gesture
- if a macro needs a small timing gap between steps, use an explicit `sleep_ms` action instead of relying on repeated trigger delivery
- `actions` arrays are the supported macro model; execution order is exactly the array order
Macro Behavior
The JSON config does not embed a scripting language. Instead, every binding uses an ordered actions array, and that array is the macro.
Important runtime semantics:
- the action engine executes one action at a time from the first array entry to the last
- later actions do not start until the current action finishes
- `hid_key_tap`, `hid_shortcut_tap`, and `hid_usage_tap` already include a built-in press-to-release gap inside the firmware
- the current built-in tap gap is 20 ms
- `sleep_ms` is only for extra spacing between steps in a longer macro
- `sleep_ms` with `duration_ms: 0` is a no-op
- if any action fails at runtime, the engine stops that binding immediately and skips the remaining steps
- if a binding fails while the app is dispatching matched bindings for the same event, later matched bindings for that event are not run
- the runtime collects at most 8 matching bindings for a single input+trigger dispatch (`globalBindings` plus the active mode or `bootMode`); configs that exceed that fan-out should be treated as invalid for editor/API output
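The ordering and failure rules above can be sketched as a host-side model. This is not the firmware's action engine. Handler functions stand in for the built-in HID, mic, and UI actions, and `dispatch` and `run_binding` are hypothetical names. The order in which global and mode bindings are collected is also an assumption.

```python
import time
from typing import Callable, Dict, List, Tuple

MAX_MATCHED_BINDINGS = 8  # documented per-dispatch fan-out limit

Handler = Callable[[dict], bool]  # returns False when the action fails


def run_binding(actions: List[dict], handlers: Dict[str, Handler]) -> bool:
    """Run actions strictly in array order; stop at the first failure."""
    for action in actions:
        if action["type"] == "sleep_ms":
            ms = action.get("duration_ms", 0)
            if ms > 0:                      # duration_ms: 0 is a no-op
                time.sleep(ms / 1000)
            continue
        handler = handlers.get(action["type"])
        # Assumption: an unknown action type counts as a failure.
        if handler is None or not handler(action):
            return False                    # later steps skipped, no rollback
    return True


def dispatch(event: Tuple[str, str], global_bindings: List[dict],
             scoped_bindings: List[dict], handlers: Dict[str, Handler]) -> int:
    """Run every binding matching (input, trigger); return how many completed."""
    # Assumption: globals are collected before the mode/bootMode bindings.
    matched = [b for b in global_bindings + scoped_bindings
               if (b["input"], b["trigger"]) == event][:MAX_MATCHED_BINDINGS]
    completed = 0
    for binding in matched:
        if not run_binding(binding["actions"], handlers):
            break                           # later matched bindings are not run
        completed += 1
    return completed
```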
Practical meaning:
- use a single `hid_*_tap` action when you only need one press-and-release
- add `sleep_ms` only when you need additional delay between multiple actions
- do not assume partial rollback; if step 3 fails, steps 4+ will not run, and steps 1-2 are not automatically undone
Output Reset Behavior
Some actions intentionally reset currently active outputs before changing higher-level state:
- `enter_boot_mode`
- `set_mode`
- `cycle_mode`
In the current firmware, that reset path is used to keep transitions deterministic. It clears active HID output state, turns off mic gating, and cancels in-progress touch routing before the new mode state takes effect.
This means mode-changing actions should be treated as boundaries in a macro. A key held down or a mic gate opened earlier in the same sequence is released or closed when the mode changes, so place a mode change after such steps only when that reset is acceptable.
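The reset boundary can be modeled with a small state object. `DeviceOutputs` is a hypothetical host-side model, not firmware code. It only tracks the three kinds of state that the paragraph above says are cleared.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DeviceOutputs:
    """Hypothetical model of the state cleared by mode-changing actions."""
    held_keys: Set[str] = field(default_factory=set)
    mic_open: bool = False
    touch_sequence_active: bool = False
    mode: str = "cursor"
    boot_mode: bool = False

    def _reset_outputs(self) -> None:
        self.held_keys.clear()              # clear active HID output state
        self.mic_open = False               # turn off mic gating
        self.touch_sequence_active = False  # cancel in-progress touch routing

    def set_mode(self, mode: str) -> None:
        self._reset_outputs()               # the reset runs before the change
        self.mode = mode

    def enter_boot_mode(self) -> None:
        self._reset_outputs()
        self.boot_mode = True
```

With this model, a macro that sends `hid_key_down` for `F13` and then `set_mode` ends with `F13` no longer held.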
Writing Macros In JSON
For users, the simplest way to think about the system is:
- choose an `input`
- choose a `trigger`
- list the `actions` in the exact order you want them to happen
The preferred JSON forms are:
- use `hid_key_tap` for a single keyboard key
- use `hid_shortcut_tap` with `modifiers` plus `key` for keyboard chords
- use `hid_usage_*` for consumer or system HID
- use `sleep_ms` only when the built-in tap timing is not enough
- use `set_mode`, `cycle_mode`, `ui_show_mode`, `ui_hint`, and `mic_gate` as normal array entries when a macro should mix HID and device behavior
Examples:
Single action:
```json
{
  "input": "touch",
  "trigger": "tap",
  "actions": [
    { "type": "hid_key_tap", "key": "ENTER" }
  ]
}
```
Shortcut tap:
```json
{
  "input": "touch",
  "trigger": "swipe_left",
  "actions": [
    { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "N" }
  ]
}
```
Multi-step macro with timing gap:
```json
{
  "input": "touch",
  "trigger": "swipe_up",
  "actions": [
    { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "A" },
    { "type": "sleep_ms", "duration_ms": 20 },
    { "type": "hid_key_tap", "key": "BACKSPACE" }
  ]
}
```
Mode change plus UI feedback:
```json
{
  "input": "touch",
  "trigger": "long_press",
  "actions": [
    { "type": "set_mode", "mode": "cursor" },
    { "type": "ui_show_mode" }
  ]
}
```
Hold/release pairing:
```json
{
  "input": "touch",
  "trigger": "hold_start",
  "actions": [
    { "type": "hid_key_down", "key": "F13" },
    { "type": "mic_gate", "enabled": true }
  ]
}
```

```json
{
  "input": "touch",
  "trigger": "hold_end",
  "actions": [
    { "type": "mic_gate", "enabled": false },
    { "type": "hid_key_up", "key": "F13" }
  ]
}
```
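Because `hold_start` and `hold_end` are separate bindings, it is easy to press a key in one and forget to release it in the other. Host-side tooling could catch this with a lint helper along these lines. The helper is hypothetical, and it only checks `hid_key_down`/`hid_key_up` pairs on `touch` bindings within one binding list.

```python
from typing import List


def unreleased_hold_keys(bindings: List[dict]) -> List[str]:
    """Keys pressed on touch hold_start with no matching hid_key_up on hold_end."""
    def keys(trigger: str, action_type: str) -> set:
        return {a["key"]
                for b in bindings
                if b["input"] == "touch" and b["trigger"] == trigger
                for a in b["actions"]
                if a["type"] == action_type}
    return sorted(keys("hold_start", "hid_key_down") - keys("hold_end", "hid_key_up"))
```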
BOOT Button Behavior
The BOOT button is dedicated to mode control and activates its own temporary boot_mode.
The default flow is:
- `press` -> enter `boot_mode`
- `BOOT + swipe_right` -> next mode
- `BOOT + swipe_left` -> previous mode
- `BOOT + tap` -> show the currently selected mode
- `release` -> exit `boot_mode`
This keeps mode switching consistent and prevents application modes from fighting over the hardware button. It also gives the user one predictable gesture vocabulary for mode navigation no matter which mode is currently active.
Example:
```json
{
  "globalBindings": [
    {
      "input": "boot_button",
      "trigger": "press",
      "actions": [
        { "type": "enter_boot_mode" }
      ]
    },
    {
      "input": "boot_button",
      "trigger": "release",
      "actions": [
        { "type": "exit_boot_mode" }
      ]
    }
  ],
  "bootMode": {
    "label": "Mode Control",
    "bindings": [
      {
        "input": "touch",
        "trigger": "swipe_right",
        "actions": [
          { "type": "cycle_mode", "direction": "next" },
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "swipe_left",
        "actions": [
          { "type": "cycle_mode", "direction": "previous" },
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "tap",
        "actions": [
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "long_press",
        "actions": [
          { "type": "set_mode", "mode": "cursor" },
          { "type": "ui_show_mode" }
        ]
      }
    ]
  }
}
```
Dedicated BOOT Control UI
While BOOT is held, the screen switches to a dedicated control overlay instead of showing only the active mode UI.
For the current firmware build, this overlay is intentionally simpler than the fuller reference layout below. What is currently shipped is:
- top instruction text: `Swipe to switch mode`
- bottom confirm hint: `Release BOOT = Confirm`
- mode-selection feedback primarily through the BOOT-colored main card state rather than centered previous/current/next mode labels
The richer layout described below remains a valid reference direction. If the BOOT UI is expanded later, the `boot_mode` overlay can show:
- the currently selected mode name
- the previous and next mode names
- a simple gesture legend
- optional icons for common control actions
A richer reference layout is:
- center: current mode card
- left edge hint: `Swipe Left = Previous`
- right edge hint: `Swipe Right = Next`
- bottom hint: `Release BOOT = Confirm`
This gives the user direct on-device feedback that touch input is temporarily acting as a mode selector rather than as a mode-specific command surface.
Example UI block:
```json
{
  "bootMode": {
    "label": "Mode Control",
    "ui": {
      "title": "Mode Control",
      "subtitle": "Hold BOOT and swipe to change modes",
      "showModeList": true,
      "showGestureHints": true,
      "showCurrentModeCard": true
    }
  }
}
```
Example: Cursor Dictation Mode
This mode mirrors the intent of the AutoHotkey CB_mic_v3.ahk workflow, where one mode is optimized for dictation and Cursor interaction.
In the AutoHotkey script, the Cursor mode maps one button to dictation and hold behavior, while the other button handles chat and field control. The firmware version keeps the same idea, but moves interaction to touch gestures and reserves BOOT for global mode switching.
Example configuration:
```json
{
  "modes": {
    "cursor": {
      "label": "Cursor",
      "bindings": [
        {
          "input": "touch",
          "trigger": "hold_start",
          "actions": [
            { "type": "hid_key_down", "key": "F13" },
            { "type": "mic_gate", "enabled": true },
            { "type": "ui_hint", "text": "Dictation active" }
          ]
        },
        {
          "input": "touch",
          "trigger": "hold_end",
          "actions": [
            { "type": "mic_gate", "enabled": false },
            { "type": "hid_key_up", "key": "F13" },
            { "type": "ui_hint", "text": "Cursor mode" }
          ]
        },
        {
          "input": "touch",
          "trigger": "tap",
          "actions": [
            { "type": "hid_key_tap", "key": "F14" },
            { "type": "ui_hint", "text": "Cursor mode" }
          ]
        },
        {
          "input": "touch",
          "trigger": "double_tap",
          "actions": [
            { "type": "hid_key_tap", "key": "ENTER" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_up",
          "actions": [
            { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "A" },
            { "type": "sleep_ms", "duration_ms": 20 },
            { "type": "hid_key_tap", "key": "BACKSPACE" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_down",
          "actions": [
            { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "PERIOD" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "N" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "ENTER" }
          ]
        }
      ]
    }
  }
}
```
Useful interpretation:
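- hold starts and stops the same `F13`-driven dictation flow used by the host workflow, with the mic gate open only while the hold lasts
- tap sends `F14`
- double tap sends `Enter`
- swipe up clears the current field (`CTRL+A`, then `BACKSPACE`)
- swipe down sends `CTRL+PERIOD`, which toggles Cursor text mode
- swipe left sends `CTRL+N` to open a new chat
- swipe right sends `Enter`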
Example: Presentation Remote Mode
This mode turns the device into a slide controller.
```json
{
  "modes": {
    "presentation": {
      "label": "Presentation",
      "bindings": [
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_key_tap", "key": "PAGE_DOWN" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "PAGE_UP" }
          ]
        },
        {
          "input": "touch",
          "trigger": "tap",
          "actions": [
            { "type": "hid_key_tap", "key": "SPACE" }
          ]
        },
        {
          "input": "touch",
          "trigger": "double_tap",
          "actions": [
            { "type": "hid_key_tap", "key": "B" }
          ]
        }
      ]
    }
  }
}
```
Useful interpretation:
- swipe left advances
- swipe right goes back
- tap starts or advances a deck
- double tap blacks the screen in many presentation apps
Example: Media Control Mode
This mode is useful at a desk, workbench, or streaming setup.
The current firmware still uses keyboard-safe placeholder keys here because consumer/media HID usages are not yet wired through the USB report path.
```json
{
  "modes": {
    "media": {
      "label": "Media",
      "bindings": [
        {
          "input": "touch",
          "trigger": "tap",
          "actions": [
            { "type": "hid_key_tap", "key": "SPACE" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_key_tap", "key": "LEFT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "RIGHT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_up",
          "actions": [
            { "type": "hid_key_tap", "key": "UP_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_down",
          "actions": [
            { "type": "hid_key_tap", "key": "DOWN_ARROW" }
          ]
        }
      ]
    }
  }
}
```
Example: CAD Or Editing Navigation Mode
This mode is useful for scroll, pan, and frequent tool shortcuts.
```json
{
  "modes": {
    "navigation": {
      "label": "Navigation",
      "bindings": [
        {
          "input": "touch",
          "trigger": "swipe_up",
          "actions": [
            { "type": "hid_key_tap", "key": "UP_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_down",
          "actions": [
            { "type": "hid_key_tap", "key": "DOWN_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_key_tap", "key": "LEFT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "RIGHT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "double_tap",
          "actions": [
            { "type": "hid_key_tap", "key": "ESC" }
          ]
        }
      ]
    }
  }
}
```
Global Versus Mode-Local Bindings
Use globalBindings for behaviors that should always work:
- entering and exiting `boot_mode`
- emergency mute
- returning to a home mode
- opening a mode picker
Use bootMode for temporary global control gestures while BOOT is held:
- next mode
- previous mode
- jump to a favorite mode
- show mode details
Use mode-local bindings for behaviors that should change by context:
- dictation controls
- slide navigation
- media commands
- editor shortcuts
Suggested Storage Layout
For a small number of modes, one config file is enough.
For a larger setup, split config by concern:
- `config/manifest.json`
- `config/modes/cursor.json`
- `config/modes/presentation.json`
- `config/modes/media.json`
- `config/modes/navigation.json`
This makes mode sharing and per-mode editing easier.
For the current firmware build, the single-file example lives at /spiffs/mode-config.json, with a repo copy at config/mode-config.json.
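The current firmware reads a single file, so a split layout would presumably need a host-side merge step before upload. This is a minimal sketch, assuming `manifest.json` holds every top-level field except `modes` and each file under `config/modes/` holds one mode object in the array form. Both that layout contract and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def merge_config(manifest: dict, mode_docs: List[dict]) -> dict:
    """Combine a manifest with per-mode documents into one config object."""
    ids = [m["id"] for m in mode_docs]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate mode id across mode files")
    merged = dict(manifest)
    merged["modes"] = sorted(mode_docs, key=lambda m: m.get("cycleOrder", 0))
    return merged


def load_split_config(config_dir: Path) -> dict:
    """Read manifest.json and modes/*.json from config_dir (hypothetical layout)."""
    manifest = json.loads((config_dir / "manifest.json").read_text())
    docs = [json.loads(p.read_text())
            for p in sorted((config_dir / "modes").glob("*.json"))]
    return merge_config(manifest, docs)
```

The result can be serialized with `json.dumps(..., indent=2)` into the single-file shape the firmware loads.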
Design Rules
The most important rules for this system are:
- keep LVGL details behind normalized trigger names
- keep `BOOT` reserved for global control
- keep `boot_mode` temporary and visually obvious
- keep actions declarative and built-in
- let one gesture execute a short ordered list of actions
- keep mode switching outside individual modes
Practical Summary
The mode engine turns the device into a reusable controller platform:
- `BOOT` enters a dedicated control mode
- `BOOT + swipe_right` changes to the next mode
- `BOOT + swipe_left` changes to the previous mode
- the display shows a dedicated mode-control overlay while `BOOT` is held
- touch performs mode-specific work
- LVGL stays the gesture backend
- JSON remains readable and extendable
- the same firmware image can support dictation, presentation, media, and navigation workflows