Mode System Reference

Useful Information for Humans

This firmware exposes a mode-driven input system for the ESP32-S3 Touch AMOLED device.

The device is no longer modeled as a dedicated push-to-talk (PTT) appliance. Instead, it behaves as a configurable touch controller with:

a global mode-switch button on BOOT
a temporary boot_mode control layer while BOOT is held
LVGL-backed touch gestures normalized into high-level triggers
per-mode action bindings
a small built-in action engine for HID, microphone, and UI behavior

The BOOT button is reserved for mode control and does not participate in PTT. While held, it activates a dedicated mode-control UI and gesture surface.

Useful Information for AI

Treat BOOT as a global control input that can activate a temporary boot_mode.
Treat LVGL events as implementation details behind normalized triggers such as tap, hold_start, and swipe_left.
Prefer describing configuration in terms of globalBindings, bootMode, modes, bindings, and actions.
Keep JSON declarative. Do not turn config into an embedded scripting language.

Overview

The firmware uses five layers:

input layer: Raw device sources such as boot_button and touch.
trigger layer: Normalized events such as tap, hold_start, hold_end, swipe_left, and swipe_right.
mode layer: A current mode selects which bindings are active.
boot control layer: A temporary boot_mode overrides normal touch bindings while BOOT is held.
action layer: Built-in actions execute HID reports, microphone gating, UI updates, and mode changes.

This keeps LVGL-specific details isolated from user-facing configuration.

Load Priority

The runtime load order is:

external JSON from /spiffs/mode-config.json
built-in fallback JSON compiled into firmware
hardcoded failsafe config if JSON parsing fails

This keeps the user-editable file format and the shipped default behavior aligned.
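The load order above can be modeled as a chain of fallbacks. The following Python sketch is only an illustration of that priority, not the firmware's loader: BUILTIN_JSON, FAILSAFE_CONFIG, and load_config are hypothetical names, and the sketch assumes that a missing or unparsable source simply falls through to the next one.

```python
import json
from pathlib import Path

# Hypothetical stand-ins: the real built-in JSON and failsafe config live in firmware.
BUILTIN_JSON = '{"version": 1, "activeMode": "cursor", "modes": []}'
FAILSAFE_CONFIG = {"version": 1, "activeMode": "failsafe", "modes": []}


def load_config(external_path="/spiffs/mode-config.json", builtin_json=BUILTIN_JSON):
    """Return (config, source) following the documented priority order."""
    # 1. External, user-editable JSON file.
    try:
        text = Path(external_path).read_text(encoding="utf-8")
        return json.loads(text), "external"
    except (OSError, ValueError):
        pass  # missing, unreadable, or malformed: fall through
    # 2. Built-in fallback JSON compiled into the firmware.
    try:
        return json.loads(builtin_json), "builtin"
    except ValueError:
        pass
    # 3. Hardcoded failsafe config as the last resort.
    return dict(FAILSAFE_CONFIG), "failsafe"
```

Returning the source label alongside the config makes it easy to report which layer actually supplied the running configuration.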

Config Portal

The firmware can expose its local config portal over the router network or, if the router cannot be joined, over a fallback device-hosted access point with its own HTTP server.

Current intended behavior:

the device first attempts STA join using the JSON-configured router SSID and password
when STA join succeeds, the portal is reachable at the assigned router IP and by default at http://walkey-talkey.local/
if STA join fails, the firmware falls back to a SoftAP at http://192.168.4.1/
the fallback SoftAP uses SSID walkey-talkey with password secretKEY
the portal intentionally starts about 8 seconds after boot so the display/touch stack can settle before Wi-Fi startup
the same server exposes REST endpoints for config export, validation, save, and reset
successful save operations normalize accepted JSON into the canonical output form before writing it back to /spiffs/mode-config.json
save and reset both reload runtime state and immediately reapply the Wi-Fi configuration
reset restores the built-in firmware JSON as the active external config source, then reloads runtime state from that restored config
the hardcoded failsafe config still exists, but it is an internal last-resort fallback rather than the normal API reset target
save/reset failures return a structured STORAGE_FAILED payload with stage, formatAttempted, path, partition, optional low-level error fields, and recovery suggestions
the BOOT overlay shows Connecting... immediately during portal startup, then replaces it with the active portal address on the line directly below Swipe to switch mode
in AP fallback mode, the BOOT overlay uses the format AP: walkey-talkey (<ip>)
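The STA-first, AP-fallback behavior in the list above can be summarized in a few lines. This Python sketch is a model only: portal_endpoints and its return shape are hypothetical, and the URLs, SSID, and overlay format are the values stated in this section.

```python
AP_SSID = "walkey-talkey"
AP_IP = "192.168.4.1"
MDNS_URL = "http://walkey-talkey.local/"


def portal_endpoints(sta_ip):
    """Describe where the portal is reachable.

    sta_ip is the router-assigned address after a successful STA join,
    or None when the join failed and the SoftAP fallback is used.
    """
    if sta_ip is not None:
        return {"mode": "sta", "urls": [f"http://{sta_ip}/", MDNS_URL]}
    return {
        "mode": "ap",
        "urls": [f"http://{AP_IP}/"],
        # Documented BOOT overlay format in AP fallback mode.
        "overlay": f"AP: {AP_SSID} ({AP_IP})",
    }
```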

Implementation notes for the current board build:

Wi-Fi startup is intentionally delayed because bringing up the radio too early on this hardware was corrupting the AMOLED/touch UI
the display driver now uses smaller internal-RAM LVGL draw buffers instead of the earlier PSRAM-backed configuration to keep the UI stable while Wi-Fi is active

Core Behavior

The device behaves as follows:

pressing and holding BOOT enters boot_mode
while boot_mode is active, touch gestures are routed to the dedicated mode-control bindings instead of the current app mode
BOOT + swipe_right moves to the next mode
BOOT + swipe_left moves to the previous mode
the display shows a dedicated control overlay while boot_mode is active
when BOOT is released, the control overlay closes and normal mode bindings resume
touch gestures are interpreted by LVGL and normalized before they reach the binding engine
actions are executed in order, allowing one gesture to drive multiple outputs
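The routing rule above (boot_mode bindings replace the current mode's bindings while BOOT is held, and global bindings always apply) can be sketched as follows. This is a hedged Python model rather than the firmware's code: active_bindings and match_bindings are hypothetical helper names, and the sketch assumes the array form of modes.

```python
def active_bindings(config, current_mode_id, boot_mode_active):
    """Collect the bindings eligible for dispatch right now."""
    bindings = list(config.get("globalBindings", []))  # always active
    if boot_mode_active:
        # While BOOT is held, the mode-control bindings take over.
        bindings += config.get("bootMode", {}).get("bindings", [])
    else:
        for mode in config.get("modes", []):
            if mode["id"] == current_mode_id:
                bindings += mode.get("bindings", [])
    return bindings


def match_bindings(bindings, input_name, trigger):
    """Keep only the bindings that match one input plus one trigger."""
    return [b for b in bindings
            if b["input"] == input_name and b["trigger"] == trigger]
```

With this model, the same swipe_right gesture resolves to cycle_mode while BOOT is held and to the active mode's own binding otherwise.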

Why The System Uses Normalized Triggers

LVGL provides the low-level event model, but the configuration layer does not expose raw LVGL event names directly.

For example:

LVGL raises LV_EVENT_GESTURE, then the firmware reads the direction and converts it into swipe_up, swipe_down, swipe_left, or swipe_right
LVGL raises press and release events, and the firmware converts hold behavior into hold_start and hold_end

This gives the config format stable names even if gesture handling evolves internally.
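As a rough illustration of that translation step, the sketch below maps LVGL gesture directions to trigger names. The LV_DIR_* names are LVGL's direction constants, represented here as strings; the exact mapping used by the firmware is an assumption, and normalize_gesture is a hypothetical helper.

```python
# LVGL direction constants, represented by name for illustration.
GESTURE_TO_TRIGGER = {
    "LV_DIR_TOP": "swipe_up",
    "LV_DIR_BOTTOM": "swipe_down",
    "LV_DIR_LEFT": "swipe_left",
    "LV_DIR_RIGHT": "swipe_right",
}


def normalize_gesture(gesture_dir):
    """Translate a gesture direction into a config-level trigger name.

    Returns None for directions the config vocabulary does not expose.
    """
    return GESTURE_TO_TRIGGER.get(gesture_dir)
```

Because bindings only ever see the right-hand names, the left-hand side can change with the LVGL version without touching user configs.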

Recommended JSON Shape

The mode system uses one top-level config with global bindings, a dedicated bootMode, and mode-specific bindings.

{
  "version": 1,
  "activeMode": "cursor",
  "defaults": {
    "touch": {
      "holdMs": 400,
      "doubleTapMs": 350,
      "swipeMinDistance": 40
    }
  },
  "globalBindings": [
    {
      "input": "boot_button",
      "trigger": "press",
      "actions": [
        { "type": "enter_boot_mode" }
      ]
    },
    {
      "input": "boot_button",
      "trigger": "release",
      "actions": [
        { "type": "exit_boot_mode" }
      ]
    }
  ],
  "bootMode": {
    "label": "Mode Control",
    "ui": {
      "title": "Mode Control",
      "showModeList": true,
      "showGestureHints": true
    },
    "bindings": [
      {
        "input": "touch",
        "trigger": "swipe_right",
        "actions": [
          { "type": "cycle_mode", "direction": "next" },
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "swipe_left",
        "actions": [
          { "type": "cycle_mode", "direction": "previous" },
          { "type": "ui_show_mode" }
        ]
      }
    ]
  },
  "modes": [
    {
      "id": "cursor",
      "cycleOrder": 0,
      "label": "Cursor",
      "bindings": []
    }
  ]
}

Top-Level Fields

version: Config format version for migration safety.
activeMode: Default mode selected at boot.
defaults: Shared thresholds and timing values.
globalBindings: Bindings that are always active, regardless of mode.
bootMode: A temporary dedicated control mode entered while BOOT is held.
modes: Per-mode definitions. Prefer the array form with stable id values and explicit cycleOrder. The loader still accepts the older object-map form for compatibility.
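To make the two accepted modes forms concrete, here is a small Python sketch that converts either form into a list ordered by cycleOrder and then steps through it. Both helpers are hypothetical. Two details are assumptions rather than documented behavior: that the object-map form cycles in key order, and that cycling wraps around at the ends.

```python
def normalize_modes(modes):
    """Return modes as a list sorted by cycleOrder (array or object-map input)."""
    if isinstance(modes, dict):
        # Older object-map form: the key becomes the id, and key order
        # stands in for cycleOrder unless the entry sets one (assumption).
        modes = [{"id": mode_id, "cycleOrder": index, **body}
                 for index, (mode_id, body) in enumerate(modes.items())]
    return sorted(modes, key=lambda mode: mode.get("cycleOrder", 0))


def cycle_mode(modes, current_id, direction):
    """Return the id of the next or previous mode, wrapping at the ends."""
    ids = [mode["id"] for mode in modes]
    step = 1 if direction == "next" else -1
    return ids[(ids.index(current_id) + step) % len(ids)]
```

Stable id values matter here: the current mode is tracked by id, so reordering cycleOrder does not change which mode is active.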

Inputs

The built-in input sources are:

boot_button
touch

The model also leaves room for future sources:

encoder
usb_host_key
timer
imu

Triggers

The firmware exposes a normalized trigger vocabulary:

press
release
tap
double_tap
long_press
hold_start
hold_end
swipe_up
swipe_down
swipe_left
swipe_right

Not every input source supports every trigger. For example, boot_button commonly uses press, release, and long_press, while touch supports the broader gesture set.

Current touch semantics:

tap is deferred until the doubleTapMs timeout expires
a second tap inside that timeout emits double_tap instead of a second tap
long_press and hold_start are emitted together when the hold threshold is crossed, in that order
hold_end is emitted on release after a hold has started
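The timing rules above can be modeled as a small state machine. The Python sketch below is an approximation for reasoning about the semantics, not the firmware's implementation: TouchNormalizer is a hypothetical class, timestamps are plain millisecond integers, swipes are ignored, and measuring the double-tap window from the first tap's release is an assumption.

```python
class TouchNormalizer:
    """Turn press/release timestamps into tap, double_tap, and hold triggers."""

    def __init__(self, hold_ms=400, double_tap_ms=350):
        self.hold_ms = hold_ms
        self.double_tap_ms = double_tap_ms
        self.pressed_at = None      # time of the current press, if any
        self.holding = False        # True once the hold threshold was crossed
        self.pending_tap_at = None  # release time of a tap awaiting its window

    def press(self, now):
        self.pressed_at = now
        self.holding = False
        return []

    def tick(self, now):
        """Advance time; emit triggers whose deadlines have passed."""
        events = []
        if (self.pressed_at is not None and not self.holding
                and now - self.pressed_at >= self.hold_ms):
            self.holding = True
            events += ["long_press", "hold_start"]  # together, in this order
        if (self.pending_tap_at is not None
                and now - self.pending_tap_at >= self.double_tap_ms):
            self.pending_tap_at = None
            events.append("tap")  # deferred tap: no second tap arrived in time
        return events

    def release(self, now):
        events = self.tick(now)
        if self.pressed_at is None:
            return events
        self.pressed_at = None
        if self.holding:
            self.holding = False
            events.append("hold_end")
        elif self.pending_tap_at is not None:
            self.pending_tap_at = None
            events.append("double_tap")  # replaces the second tap
        else:
            self.pending_tap_at = now  # wait for a possible second tap
        return events
```

One practical consequence of the deferral is that a single tap binding fires roughly doubleTapMs after the finger lifts, which is worth keeping in mind for latency-sensitive actions.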

Actions

The action engine uses fixed action types implemented in firmware.

Common actions include:

hid_key_down
hid_key_up
hid_key_tap
hid_shortcut_tap
hid_modifier_down
hid_modifier_up
hid_usage_down
hid_usage_up
hid_usage_tap
sleep_ms
enter_boot_mode
exit_boot_mode
mic_gate
mic_gate_toggle
ui_hint
ui_show_mode
set_mode
cycle_mode
noop

Actions are intentionally limited to known built-in behaviors so the JSON stays easy to validate and debug. sleep_ms is the intended timing primitive for short, reliable keyboard macro gaps without turning the config into a scripting language.

Preferred HID payload rules:

Use canonical key tokens like A, ENTER, F13, MEDIA_NEXT_TRACK, VOLUME_UP.
For keyboard chords, prefer hid_shortcut_tap or hid_key_tap plus a modifiers array.
For consumer/system HID, use hid_usage_* actions.
For advanced cases, use a raw usage object with report, usagePage, and usage.
The machine-readable schema lives at config/mode-config.schema.json.
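Because inputs, triggers, and action types are all fixed vocabularies, a binding can be sanity-checked with simple set membership. The sketch below is a minimal Python illustration using the names listed in this document; validate_binding is a hypothetical helper, and the schema file mentioned above remains the authoritative definition.

```python
INPUTS = {"boot_button", "touch"}
TRIGGERS = {
    "press", "release", "tap", "double_tap", "long_press", "hold_start",
    "hold_end", "swipe_up", "swipe_down", "swipe_left", "swipe_right",
}
ACTIONS = {
    "hid_key_down", "hid_key_up", "hid_key_tap", "hid_shortcut_tap",
    "hid_modifier_down", "hid_modifier_up", "hid_usage_down", "hid_usage_up",
    "hid_usage_tap", "sleep_ms", "enter_boot_mode", "exit_boot_mode",
    "mic_gate", "mic_gate_toggle", "ui_hint", "ui_show_mode", "set_mode",
    "cycle_mode", "noop",
}


def validate_binding(binding):
    """Return a list of problems; an empty list means the binding looks valid."""
    errors = []
    if binding.get("input") not in INPUTS:
        errors.append(f"unknown input: {binding.get('input')!r}")
    if binding.get("trigger") not in TRIGGERS:
        errors.append(f"unknown trigger: {binding.get('trigger')!r}")
    actions = binding.get("actions")
    if not isinstance(actions, list) or not actions:
        errors.append("actions must be a non-empty array")
        return errors
    for index, action in enumerate(actions):
        if not isinstance(action, dict) or action.get("type") not in ACTIONS:
            errors.append(f"actions[{index}]: unknown or missing type")
    return errors
```

This check covers names only. Per-action payload fields such as key, modifiers, or duration_ms are left to the schema.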

Binding Shape

Each binding matches one input plus one trigger, then executes one or more actions.

{
  "input": "touch",
  "trigger": "swipe_left",
  "actions": [
    { "type": "hid_key_tap", "key": "LEFT_ARROW" }
  ]
}

Structured HID examples:

{
  "type": "hid_shortcut_tap",
  "modifiers": ["CTRL", "SHIFT"],
  "key": "A"
}

{
  "type": "hid_usage_tap",
  "usage": "MEDIA_NEXT_TRACK"
}

{
  "type": "hid_usage_tap",
  "usage": {
    "report": "consumer",
    "usagePage": 12,
    "usage": 205
  }
}

This structure scales better than embedding logic in event names or creating separate on and off maps.
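One practical detail for the raw usage object shown above: JSON has no hexadecimal literals, while the HID Usage Tables list usage pages and usage IDs in hex. The values 12 and 205 in the raw example are decimal for 0x0C (the Consumer page) and 0xCD (Play/Pause). The short Python sketch below shows the conversion; raw_usage is a hypothetical helper, not part of the firmware or its tooling.

```python
import json


def raw_usage(report, usage_page, usage):
    """Build a raw usage object; integers serialize as decimal in JSON."""
    return {"report": report, "usagePage": usage_page, "usage": usage}


# Hex values as they appear in the HID Usage Tables.
action = {"type": "hid_usage_tap", "usage": raw_usage("consumer", 0x0C, 0xCD)}
print(json.dumps(action))
```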

Runtime expectation:

swipe bindings should behave like edge-triggered one-shots and fire once per gesture
if a macro needs a small timing gap between steps, use an explicit sleep_ms action instead of relying on repeated trigger delivery
actions arrays are the supported macro model; execution order is exactly the array order

Macro Behavior

The JSON config does not embed a scripting language. Instead, every binding uses an ordered actions array, and that array is the macro.

Important runtime semantics:

the action engine executes one action at a time from the first array entry to the last
later actions do not start until the current action finishes
hid_key_tap, hid_shortcut_tap, and hid_usage_tap already include a built-in press-to-release gap inside the firmware
the current built-in tap gap is 20 ms
sleep_ms is only for extra spacing between steps in a longer macro
sleep_ms with duration_ms: 0 is a no-op
if any action fails at runtime, the engine stops that binding immediately and skips the remaining steps
if a binding fails while the app is dispatching matched bindings for the same event, later matched bindings for that event are not run
the runtime collects at most 8 matching bindings for a single input+trigger dispatch (globalBindings plus the active mode or bootMode); configs that exceed that fan-out should be treated as invalid for editor/API output

Practical meaning:

use a single hid_*_tap action when you only need one press-and-release
add sleep_ms only when you need additional delay between multiple actions
do not assume partial rollback; if step 3 fails, steps 4+ will not run, and steps 1-2 are not automatically undone
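The execution rules in this section can be captured in a short model. The Python sketch below illustrates the documented semantics under stated assumptions: run_binding and dispatch are hypothetical names, execute stands in for the firmware's per-action handler and returns True on success, and bindings beyond the cap of 8 are assumed to be dropped rather than rejected at runtime.

```python
MAX_MATCHED_BINDINGS = 8  # documented cap on bindings collected per dispatch


def run_binding(actions, execute):
    """Run actions strictly in array order; return False at the first failure."""
    for action in actions:
        if not execute(action):
            # Remaining steps are skipped; earlier steps are not undone.
            return False
    return True


def dispatch(matched_bindings, execute):
    """Run each matched binding in turn; a failing binding stops the dispatch."""
    # Assumption: bindings beyond the cap are simply not collected.
    for binding in matched_bindings[:MAX_MATCHED_BINDINGS]:
        if not run_binding(binding["actions"], execute):
            return False  # later matched bindings for this event do not run
    return True
```

Because there is no rollback, a macro that presses a key with hid_key_down should place the matching hid_key_up as early as its logic allows, so that a later failure is less likely to leave the key held.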

Output Reset Behavior

Some actions intentionally reset currently active outputs before changing higher-level state:

enter_boot_mode
set_mode
cycle_mode

In the current firmware, that reset path is used to keep transitions deterministic. It clears active HID output state, turns off mic gating, and cancels in-progress touch routing before the new mode state takes effect.

This means mode-changing actions should be treated as boundaries in a macro. Any key still held down and any active mic gate are cleared when the mode change runs, so place such steps before a mode change only when losing that state is acceptable.
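The effect of that reset on a macro can be seen in a tiny model of output state. This Python sketch is illustrative only: OutputState and apply_action are hypothetical, and only the actions needed for the example are modeled.

```python
RESETTING_ACTIONS = {"enter_boot_mode", "set_mode", "cycle_mode"}


class OutputState:
    """Tracks the outputs that the documented reset path clears."""

    def __init__(self):
        self.keys_down = set()
        self.mic_gate = False

    def reset(self):
        self.keys_down.clear()
        self.mic_gate = False


def apply_action(state, action):
    """Apply one action to the modeled output state."""
    kind = action["type"]
    if kind in RESETTING_ACTIONS:
        state.reset()  # outputs are cleared before the mode state changes
    elif kind == "hid_key_down":
        state.keys_down.add(action["key"])
    elif kind == "hid_key_up":
        state.keys_down.discard(action["key"])
    elif kind == "mic_gate":
        state.mic_gate = action["enabled"]
```

Running hid_key_down for F13, then mic_gate with enabled true, then set_mode leaves no keys held and the mic gate off, which is the boundary behavior described above.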

Writing Macros In JSON

For users, the simplest way to think about the system is:

choose an input
choose a trigger
list the actions in the exact order you want them to happen

The preferred JSON forms are:

use hid_key_tap for a single keyboard key
use hid_shortcut_tap with modifiers plus key for keyboard chords
use hid_usage_* for consumer or system HID
use sleep_ms only when the built-in tap timing is not enough
use set_mode, cycle_mode, ui_show_mode, ui_hint, and mic_gate as normal array entries when a macro should mix HID and device behavior

Examples:

Single action:

{
  "input": "touch",
  "trigger": "tap",
  "actions": [
    { "type": "hid_key_tap", "key": "ENTER" }
  ]
}

Shortcut tap:

{
  "input": "touch",
  "trigger": "swipe_left",
  "actions": [
    { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "N" }
  ]
}

Multi-step macro with timing gap:

{
  "input": "touch",
  "trigger": "swipe_up",
  "actions": [
    { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "A" },
    { "type": "sleep_ms", "duration_ms": 20 },
    { "type": "hid_key_tap", "key": "BACKSPACE" }
  ]
}

Mode change plus UI feedback:

{
  "input": "touch",
  "trigger": "long_press",
  "actions": [
    { "type": "set_mode", "mode": "cursor" },
    { "type": "ui_show_mode" }
  ]
}

Hold/release pairing:

{
  "input": "touch",
  "trigger": "hold_start",
  "actions": [
    { "type": "hid_key_down", "key": "F13" },
    { "type": "mic_gate", "enabled": true }
  ]
}

{
  "input": "touch",
  "trigger": "hold_end",
  "actions": [
    { "type": "mic_gate", "enabled": false },
    { "type": "hid_key_up", "key": "F13" }
  ]
}

BOOT Button Behavior

The BOOT button is dedicated to mode control and activates its own temporary boot_mode.

The default flow is:

press -> enter boot_mode
BOOT + swipe_right -> next mode
BOOT + swipe_left -> previous mode
BOOT + tap -> show the currently selected mode
release -> exit boot_mode

This keeps mode switching consistent and prevents application modes from fighting over the hardware button. It also gives the user one predictable gesture vocabulary for mode navigation no matter which mode is currently active.

Example:

{
  "globalBindings": [
    {
      "input": "boot_button",
      "trigger": "press",
      "actions": [
        { "type": "enter_boot_mode" }
      ]
    },
    {
      "input": "boot_button",
      "trigger": "release",
      "actions": [
        { "type": "exit_boot_mode" }
      ]
    }
  ],
  "bootMode": {
    "label": "Mode Control",
    "bindings": [
      {
        "input": "touch",
        "trigger": "swipe_right",
        "actions": [
          { "type": "cycle_mode", "direction": "next" },
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "swipe_left",
        "actions": [
          { "type": "cycle_mode", "direction": "previous" },
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "tap",
        "actions": [
          { "type": "ui_show_mode" }
        ]
      },
      {
        "input": "touch",
        "trigger": "long_press",
        "actions": [
          { "type": "set_mode", "mode": "cursor" },
          { "type": "ui_show_mode" }
        ]
      }
    ]
  }
}

Dedicated BOOT Control UI

While BOOT is held, the screen switches to a dedicated control overlay instead of showing only the active mode UI.

For the current firmware build, this overlay is intentionally simpler than the fuller reference layout below. What is currently shipped is:

top instruction text: Swipe to switch mode
bottom confirm hint: Release BOOT = Confirm
mode-selection feedback primarily through the BOOT-colored main card state rather than centered previous/current/next mode labels

The richer layout described below remains a valid reference direction if the BOOT UI is expanded later.

In that expanded form, the boot_mode overlay can show:

the currently selected mode name
the previous and next mode names
a simple gesture legend
optional icons for common control actions

A richer reference layout is:

center: current mode card
left edge hint: Swipe Left = Previous
right edge hint: Swipe Right = Next
bottom hint: Release BOOT = Confirm

This gives the user direct on-device feedback that touch input is temporarily acting as a mode selector rather than as a mode-specific command surface.

Example UI block:

{
  "bootMode": {
    "label": "Mode Control",
    "ui": {
      "title": "Mode Control",
      "subtitle": "Hold BOOT and swipe to change modes",
      "showModeList": true,
      "showGestureHints": true,
      "showCurrentModeCard": true
    }
  }
}

Example: Cursor Dictation Mode

This mode mirrors the intent of the AutoHotkey CB_mic_v3.ahk workflow, where one mode is optimized for dictation and Cursor interaction.

In the AutoHotkey script, the Cursor mode maps one button to dictation and hold behavior, while the other button handles chat and field control. The firmware version keeps the same idea, but moves interaction to touch gestures and reserves BOOT for global mode switching.

Example configuration (written in the older object-map form of modes, which the loader still accepts):

{
  "modes": {
    "cursor": {
      "label": "Cursor",
      "bindings": [
        {
          "input": "touch",
          "trigger": "hold_start",
          "actions": [
            { "type": "hid_key_down", "key": "F13" },
            { "type": "mic_gate", "enabled": true },
            { "type": "ui_hint", "text": "Dictation active" }
          ]
        },
        {
          "input": "touch",
          "trigger": "hold_end",
          "actions": [
            { "type": "mic_gate", "enabled": false },
            { "type": "hid_key_up", "key": "F13" },
            { "type": "ui_hint", "text": "Cursor mode" }
          ]
        },
        {
          "input": "touch",
          "trigger": "tap",
          "actions": [
            { "type": "hid_key_tap", "key": "F14" },
            { "type": "ui_hint", "text": "Cursor mode" }
          ]
        },
        {
          "input": "touch",
          "trigger": "double_tap",
          "actions": [
            { "type": "hid_key_tap", "key": "ENTER" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_up",
          "actions": [
            { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "A" },
            { "type": "sleep_ms", "duration_ms": 20 },
            { "type": "hid_key_tap", "key": "BACKSPACE" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_down",
          "actions": [
            { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "PERIOD" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "N" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "ENTER" }
          ]
        }
      ]
    }
  }
}

Useful interpretation:

hold starts and stops the same F13-driven dictation flow used by the host workflow
tap sends F14
double tap sends Enter
swipe up clears the current field (Ctrl+A, then Backspace)
swipe down sends Ctrl+Period, which toggles Cursor text mode
swipe left sends Ctrl+N to start a new chat
swipe right sends Enter

Example: Presentation Remote Mode

This mode turns the device into a slide controller.

{
  "modes": {
    "presentation": {
      "label": "Presentation",
      "bindings": [
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_key_tap", "key": "PAGE_DOWN" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "PAGE_UP" }
          ]
        },
        {
          "input": "touch",
          "trigger": "tap",
          "actions": [
            { "type": "hid_key_tap", "key": "SPACE" }
          ]
        },
        {
          "input": "touch",
          "trigger": "double_tap",
          "actions": [
            { "type": "hid_key_tap", "key": "B" }
          ]
        }
      ]
    }
  }
}

Useful interpretation:

swipe left advances one slide (Page Down)
swipe right goes back one slide (Page Up)
tap starts or advances a deck
double tap blacks the screen in many presentation apps

Example: Media Control Mode

This mode is useful at a desk, workbench, or streaming setup.

The current firmware still uses keyboard-safe placeholder keys here because consumer/media HID usages are not yet wired through the USB report path.

{
  "modes": {
    "media": {
      "label": "Media",
      "bindings": [
        {
          "input": "touch",
          "trigger": "tap",
          "actions": [
            { "type": "hid_key_tap", "key": "SPACE" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_key_tap", "key": "LEFT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "RIGHT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_up",
          "actions": [
            { "type": "hid_key_tap", "key": "UP_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_down",
          "actions": [
            { "type": "hid_key_tap", "key": "DOWN_ARROW" }
          ]
        }
      ]
    }
  }
}

Example: CAD Or Editing Navigation Mode

This mode is useful for scroll, pan, and frequent tool shortcuts.

{
  "modes": {
    "navigation": {
      "label": "Navigation",
      "bindings": [
        {
          "input": "touch",
          "trigger": "swipe_up",
          "actions": [
            { "type": "hid_key_tap", "key": "UP_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_down",
          "actions": [
            { "type": "hid_key_tap", "key": "DOWN_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_left",
          "actions": [
            { "type": "hid_key_tap", "key": "LEFT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "swipe_right",
          "actions": [
            { "type": "hid_key_tap", "key": "RIGHT_ARROW" }
          ]
        },
        {
          "input": "touch",
          "trigger": "double_tap",
          "actions": [
            { "type": "hid_key_tap", "key": "ESC" }
          ]
        }
      ]
    }
  }
}

Global Versus Mode-Local Bindings

Use globalBindings for behaviors that should always work:

entering and exiting boot_mode
emergency mute
returning to a home mode
opening a mode picker

Use bootMode for temporary global control gestures while BOOT is held:

next mode
previous mode
jump to a favorite mode
show mode details

Use mode-local bindings for behaviors that should change by context:

dictation controls
slide navigation
media commands
editor shortcuts

Suggested Storage Layout

For a small number of modes, one config file is enough.

For a larger setup, split config by concern:

config/manifest.json
config/modes/cursor.json
config/modes/presentation.json
config/modes/media.json
config/modes/navigation.json

This makes mode sharing and per-mode editing easier.

For the current firmware build, the single-file example lives at /spiffs/mode-config.json, with a repo copy at config/mode-config.json.

Design Rules

The most important rules for this system are:

keep LVGL details behind normalized trigger names
keep BOOT reserved for global control
keep boot_mode temporary and visually obvious
keep actions declarative and built-in
let one gesture execute a short ordered list of actions
keep mode switching outside individual modes

Practical Summary

The mode engine turns the device into a reusable controller platform:

BOOT enters a dedicated control mode
BOOT + swipe_right changes to the next mode
BOOT + swipe_left changes to the previous mode
the display shows a dedicated mode-control overlay while BOOT is held
touch performs mode-specific work
LVGL stays the gesture backend
JSON remains readable and extendable
the same firmware image can support dictation, presentation, media, and navigation workflows