
Fix: ESP32 KISS modem becomes permanently unresponsive under TX backpressure #2646

Draft
agessaman wants to merge 2 commits into meshcore-dev:dev from agessaman:fix/kiss-modem-esp32-usb-lockup


@agessaman
Contributor

Problem

On ESP32-S3 boards (Heltec V4, Station G2), the KISS modem would eventually stop responding and stay unresponsive: SetHardware PING (0x17) timed out, and the condition survived a host service restart, forcing the pyMC_repeater client into no-radio mode. nRF52 (RAK4631) running the same firmware was unaffected.

Root cause

The ESP32 modem ran over Arduino TinyUSB USBCDC (ARDUINO_USB_MODE=0), which wedges under TX backpressure: USBCDC::write() busy-spins indefinitely while "connected" (its tx_timeout_ms only guards the lock acquire, not the send loop), and RX events post to a 5-deep queue with portMAX_DELAY. Since loop() is single-threaded and the ESP32-S3 has no DTR/RTS→EN reset, nothing recovers it. nRF52 is immune (hardware UART, non-blocking FIFO writes).

Confirmed by a deterministic repro (flood the modem while never reading replies → permanent wedge) and a heartbeat LED that kept blinking while the data path was frozen, isolating the hang to the TinyUSB layer, below app code.

Fix

Switch the ESP32 KISS envs to the ESP32-S3 USB-Serial-JTAG peripheral (HWCDC, ARDUINO_USB_MODE=1), whose write() is bounded (tries/timeout countdown, not an infinite spin) and which posts RX from ISR.

  • variants/{heltec_v4,station_g2}/platformio.ini: use build_unflags to remove the board default ARDUINO_USB_MODE=0, then set ARDUINO_USB_MODE=1 (a bare -U is reordered after the -D by SCons).
  • examples/kiss_modem/: transport-neutral hardening—non-blocking writes that drop a frame instead of stalling loop() when the TX buffer is full, plus setTxTimeoutMs(). A dropped reply is harmless; the host retries.
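
The drop-instead-of-stall behaviour described in the second bullet might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `FakeSerial`, `FrameWriter`, `trySend` and `hostDrain` are hypothetical names used only for illustration, and `availableForWrite()` is modeled on the Arduino `Print` method of the same name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a serial transport with a bounded TX buffer.
struct FakeSerial {
    size_t capacity;               // total TX buffer size in bytes
    std::vector<uint8_t> pending;  // bytes queued but not yet read by the host

    explicit FakeSerial(size_t cap) : capacity(cap) {}

    // Free space left in the TX buffer.
    size_t availableForWrite() const { return capacity - pending.size(); }

    // Queue n bytes; callers are expected to check for space first.
    size_t write(const uint8_t* buf, size_t n) {
        assert(n <= availableForWrite());
        pending.insert(pending.end(), buf, buf + n);
        return n;
    }

    // Simulates the host reading everything that was queued.
    void hostDrain() { pending.clear(); }
};

// Hypothetical writer that never waits for buffer space.
struct FrameWriter {
    FakeSerial& serial;
    unsigned dropped;  // number of frames discarded because of backpressure

    explicit FrameWriter(FakeSerial& s) : serial(s), dropped(0) {}

    // Send an already-encoded frame only if it fits right now.
    // Returns false (and counts a drop) instead of blocking the caller.
    bool trySend(const uint8_t* frame, size_t n) {
        if (serial.availableForWrite() < n) {
            ++dropped;
            return false;
        }
        serial.write(frame, n);
        return true;
    }
};
```

In this sketch a host that stops reading only costs the modem dropped replies; the calling loop never waits on the transport, and sending resumes once the host drains the buffer.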

Validation

  • Builds clean for heltec_v4_kiss_modem, Station_G2_kiss_modem, RAK_4631_kiss_modem.
  • nm: HWCDC linked, zero TinyUSB symbols.
  • On hardware (Heltec V4): the flood that permanently wedged TinyUSB now recovers in a retry or two, which the host's continuous-read + ping-retry handles.

A Wrinkle

ARDUINO_USB_MODE=1 changes the USB descriptor, so the /dev/serial/by-id/ path becomes usb-Espressif_USB_JTAG_serial_debug_unit_<MAC>-if00.

agessaman added 2 commits May 29, 2026 23:45
…fer stalls. Introduce frame write management and update related methods for improved handling of serial communication. Adjust platform configurations for USB-CDC to enhance reliability on ESP32.
…nd timeout handling for USB-CDC. Enhance documentation for better understanding of frame management and serial communication flow.
}
}

void KissModem::writeFrame(uint8_t type, const uint8_t* data, uint16_t len) {
Member


Just wondering if a better approach is to pre-calc the total frame length, then introduce a new method like:
bool canWriteFrame(size_t len);

And for logic to be:
writeFrame( ... ) {
  size_t total_len = ... ;
  if (!canWriteFrame(total_len)) return;  // bail, all or nothing
  _serial.write( ... ); ...
}
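
As an illustration of the pre-calculation step suggested above, here is a minimal sketch that assumes standard KISS framing (FEND, one unescaped type byte, escaped payload, FEND). `kissEncodedLength` is a hypothetical helper, and this two-argument `canWriteFrame` is a free-function variant of the proposed method, written this way only so the example stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint8_t KISS_FEND = 0xC0;  // frame delimiter
constexpr uint8_t KISS_FESC = 0xDB;  // escape byte

// On-wire size of a frame: FEND + type byte + escaped payload + FEND.
// Assumption: only payload bytes equal to FEND or FESC are escaped,
// and each such byte expands to two bytes.
size_t kissEncodedLength(const uint8_t* data, size_t len) {
    assert(data != nullptr || len == 0);
    size_t total = 3;  // leading FEND, type byte, trailing FEND
    for (size_t i = 0; i < len; ++i) {
        total += (data[i] == KISS_FEND || data[i] == KISS_FESC) ? 2 : 1;
    }
    return total;
}

// All-or-nothing check: true only if the whole encoded frame fits
// into the space the transport reports as free.
bool canWriteFrame(size_t freeBytes, size_t encodedLen) {
    return encodedLen <= freeBytes;
}
```

One consequence of this design is that a frame whose encoded length exceeds the transport's total TX buffer would never pass the check, so the buffer size matters for this approach.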

Contributor


I was thinking about the same thing, but it looks like those buffers are quite small (around 64 bytes in some cases). I need some sleep first, but it's an interesting one to look into.

