No Description

Philippe G 51c178ca46 duration unit change 3 years ago
.github 955692f8ad Update esp-idf-v4.3-build.yml 3 years ago
build-scripts d0bcc72bce Update config scripts and compare tool 3 years ago
components 51c178ca46 duration unit change 3 years ago
main dc1e258d64 use network manager events for AirPlay and Spotify 3 years ago
plugin 6f4ed0679e fix 32 bits sample size L/R swap - release 4 years ago
server_certs bdec9d25c6 change file permission 4 years ago
test 81756a7649 cpp state machine for ethernet 4 years ago
.cproject b0b489704e Merge and reset component names 5 years ago
.gitattributes a5b37a13c9 :confetti_ball: Added .gitattributes & .gitignore files 6 years ago
.gitignore 5120029643 Firmware update improvements - include webpack build files 4 years ago
.gitmodules e6b7ed28e7 Included sub-module for state machine 3 years ago
.project 2ab14d62be Merged with httpd - work in progress 5 years ago
CMakeLists.txt 9dfe90c26f alignment to 4.0 + misc cspot fixes 3 years ago
Dockerfile 99722a6f94 Update Dockerfile 4 years ago
Makefile f613487c4d adjusting makefiles for http compile in linux 5 years ago
Makefile_std.mk c666ad8d63 add esp-dsp 5 years ago
README.md 4dce4b307b Update README.md 4 years ago
TODO 86b64d0415 headphone, bass/treble, battery from LMS 5 years ago
build_flash_cmd.sh 0acb0dc3e7 fix system freezing on telnet activation 5 years ago
flash_cmd.txt f998ea2a52 retrofit to gcc8/CMake 5 years ago
generate_debug_scripts.cmake 8a81fe821f fixing the binary app_name for squeezelite 5 years ago
makeBuildDocs.sh b1f234d460 Added in better build instructions. Added script for generating documentation and scripts to add to release zip. (#12) 6 years ago
partitions-debug.csv 811451f24e cmake on esp-idf V4.0 - testing version - release 5 years ago
partitions.csv 811451f24e cmake on esp-idf V4.0 - testing version - release 5 years ago
repo.xml a51ff972ea repo update 5 years ago
sdkconfig 9dfe90c26f alignment to 4.0 + misc cspot fixes 3 years ago
sdkconfig-backup 811451f24e cmake on esp-idf V4.0 - testing version - release 5 years ago
sdkconfig.defaults 898998efb0 big merge 3 years ago
sdkconfig_compare.js d0bcc72bce Update config scripts and compare tool 3 years ago
sdkconfig_minimal 8fbe1159f5 Reworking BT output 5 years ago
squeezelite.cmake 81756a7649 cpp state machine for ethernet 4 years ago
version.txt 7457632990 Auto stash before merge of "master-cmake" and "origin/master-cmake" 4 years ago

README.md

Cross-Build

Squeezelite-esp32

What is this

Squeezelite-esp32 is an audio software suite made to run on espressif's ESP32 wifi (b/g/n) and bluetooth chipset. It offers the following capabilities

  • Stream your local music and connect to all major on-line music providers (Spotify, Deezer, Tidal, Qobuz) using Logitech Media Server - a.k.a LMS and enjoy multi-room audio synchronization. LMS can be extended by numerous plugins and can be controlled using a Web browser or dedicated applications (iPhone, Android). It can also send audio to UPnP, Sonos, ChromeCast and AirPlay speakers/devices.
  • Stream from a Bluetooth device (iPhone, Android)
  • Stream from an AirPlay controller (iPhone, iTunes ...) and enjoy synchronization multiroom as well (although it's AirPlay 1 only)

Depending on the hardware connected to the ESP32, you can send audio to a local DAC, to SPDIF or to a Bluetooth speaker. The bare minimum required hardware is a WROVER module with 4MB of Flash and 4MB of PSRAM (https://www.espressif.com/en/products/modules/esp32). With that module standalone, just apply power and you can stream to a Bluetooth speaker. You can also send audio to most I2S DAC as well as to SPDIF receivers using just a cable or an optical transducer.

But squeezelite-esp32 is highly extensible and you can add

  • Buttons and Rotary Encoder and map/combine them to various functions (play, pause, volume, next ...)
  • IR receiver (no pullup resistor or capacitor needed, just the 38kHz receiver)
  • Monochrome, GrayScale or Color displays using SPI or I2C (supported drivers are SH1106, SSD1306, SSD1322, SSD1326/7, SSD1351, ST7735, ST7789 and ILI9341).
  • Ethernet using a Microchip LAN8720 with RMII interface or Davicom DM9051 over SPI.

Other features include

  • Resampling
  • 10-bands equalizer
  • Automatic initial setup using any WiFi device
  • Full web interface for further configuration/management
  • Firmware over-the-air update

To control the equalizer or use the display on LMS, a new player model is required and this is provided through a plugin that is part of LMS' 3rd party repositories

Performances

(opinions presented here so I = @philippe44) The main build of squeezelite-esp32 is a 16 bits internal core with all calculations in 32 bits or float precision. This is a design choice I've made to preserve CPU performances (it is already stretching a lot the esp32 chipset) and optimize memory usage as we only have 4MB of usable RAM. Some might correctly comment that the WROVER module have 8MB of RAM, but the processor is only able to address 4MB and the remaining 4MB must be paginated by smaller blocks and I don't have patience to that.

Now, when I did the porting of squeezelite to esp32, I've also made the core 16 or 32 bits compatible at compile-time. So far, it works in 32 bits but less tests have been done. You can chose to compile it in 32 bits mode. I'm not very interested above 16 bits samples because it does not bring anything (I have an engineering background in theory of information).

| Capability |16 bits|32 bits| comment | |----------------------------|-------|-------|--------------------------------------------------------------------| | max sampling rate | 192k | 96k | 192k is very challenging, especially when combined with display | | max bit depth | 16 | 24 | 24 bits are truncated in 16 bits mode | | spdif |16 bits|20 bits| | | mp3, aac, opus, ogg/vorbis | 48k | 48k | | | alac, flac, ogg/flac | 96k | 96k | | | pcm, wav, aif | 192k | 96k | | | equalizer | Y | N | 48kHz max (after resampling) - equalization skipped on >48k tracks | | resampling | Y | N | | | cross-fade | 10s | <5s | depends on buffer size and sampling rate |

The esp32 must run at 240 MHz, with Quad-SPI I/O at 80 MHz and a clock of 40 Mhz. Still, it's a lot to run, especially knowing that it has a serial Flash and PSRAM, so kudos to Espressif for their chipset optimization. Now, to have all the decoding, resampling, equalizing, gain, display, spectrum/vu is a very (very) delicate equilibrium between use of internal /external RAM, tasks priorities and buffer handling. It is not perfect and the more you push the system to the limit, the higher the risk that some files would not play (see below). In general, the display will always have the lowest priority and you'll notice slowdown in scrolling and VU/Spectrum refresh rates. Now, even display thread has some critical section and impacts the capabilities. For example, a 16 bits-depth color display with low SPI speed might prevent 24/96 flac to work but still work with pcm 24/96

In 16 bits mode, although 192 kHz is reported as max rate, it's highly recommended to limit reported sampling rate to 96k (-Z 96000). Note that some high-speed 24/96k on-line streams might stutter because of TCP/IP stack performances. It is usually due to the fact that the server sends small packets of data and the esp32 cannot receive encoded audio fast enough, regardless of task priority settings (I've tried to tweak that a fair bit). The best option in that case is to let LMS proxy the stream as it will provide larger chunks and a "smoother" stream that can then be handled.

Note as well that some codecs consume more CPU than others or have not been optimized as much. I've done my best to tweak these, but that level of optimization includes writing some assembly which is painful. One very demanding codec is AAC when files are encoded with SBR. It allows reconstruction of upper part of spectrum and thus higher sampling rate, but the codec spec is such that this is optional, you can decode simply lower band and accept lower sampling rate - See the AAC_DISABLE_SBR option below.

Supported Hardware

Any esp32-based hardware with at least 4MB of flash and 4MB of PSRAM will be capable of running squeezelite-esp32 and there are various boards that include such chip. A few are mentionned below, but any should work. You can find various help & instructions here

For the sake of clarity, WROOM modules DO NOT work as they don't include PSRAM. Some designs might add it externally, but it's (very) unlikely.

Raw WROVER module

Per above description, a WROVER module is enough to run Squeezelite-esp32, but that requires a bit of tinkering to extend it to have analogue audio or hardware buttons (e.g.)

Please note that when sending to a Bluetooth speaker (source), only 44.1 kHz can be used, so you either let LMS do the resampling, but you must make sure it only sends 44.1kHz tracks or enable internal resampling (using -R) option. If you connect a DAC, choice of sample rates will depends on its capabilities. See below for more details.

Most DAC will work out-of-the-box with simply an I2S connection, but some require specific commands to be sent using I2C. See DAC option below to understand how to send these dedicated commands. There is build-in support for TAS575x, TAS5780, TAS5713 and AC101 DAC.

SqueezeAMP

This is the main hardware companion of Squeezelite-esp32 and has been developped together. Details on capabilities can be found here and here.

if you want to rebuild, use the squeezelite-esp32-SqueezeAmp-sdkconfig.defaults configuration file.

NB: You can use the pre-build binaries SqueezeAMP4MBFlash which has all the hardware I/O set properly. You can also use the generic binary I2S4MBFlash in which case the NVS parameters shall be set to get the exact same behavior

  • set_GPIO: 12=green,13=red,34=jack,2=spkfault
  • batt_config: channel=7,scale=20.24
  • dac_config: model=TAS57xx,bck=33,ws=25,do=32,sda=27,scl=26,mute=14:0
  • spdif_config: bck=33,ws=25,do=15

ESP32-A1S

Works with ESP32-A1S module that includes audio codec and headset output. You still need to use a demo board like this or an external amplifier if you want direct speaker connection. Note that there is a version with AC101 codec and another one with ES8388 (see below)

The board shown above has the following IO set

  • amplifier: GPIO21
  • key2: GPIO13, key3: GPIO19, key4: GPIO23, key5: GPIO18, key6: GPIO5 (to be confirmed with dip switches)
  • key1: not sure, using GPIO36 in a matrix
  • jack insertion: GPIO39 (inserted low)
  • D4 -> GPIO22 used for green LED (active low)
  • D5 -> GPIO19 (muxed with key3)
  • The IO connector also brings GPIO5, GPIO18, GPIO19, GPIO21, GPIO22 and GPIO23 (don't forget it's muxed with keys!)
  • The JTAG connector uses GPIO 12, 13, 14 and 15 (see dip switch) but these are also used for SD-card (and GPIO13 is key2 as well)
  • It's always possible to re-use GPIOO (download at boot) and GPIO1/GPIO3 which are RX/TX of UART0 but you'll lose trace

(note that some GPIO need pullups)

So a possible config would be

  • set_GPIO: 21=amp,22=green:0,39=jack:0
  • a button mapping:

    [{"gpio":5,"normal":{"pressed":"ACTRLS_TOGGLE"}},{"gpio":18,"pull":true,"shifter_gpio":5,"normal":{"pressed":"ACTRLS_VOLUP"}, "shifted":{"pressed":"ACTRLS_NEXT"}}, {"gpio":23,"pull":true,"shifter_gpio":5,"normal":{"pressed":"ACTRLS_VOLDOWN"},"shifted":{"pressed":"ACTRLS_PREV"}}]
    

    for AC101

  • dac_config: model=AC101,bck=27,ws=26,do=25,di=35,sda=33,scl=32

for ES8388

  • dac_config: model=ES8388,bck=5,ws=25,do=26,sda=18,scl=23,i2c=16

    T-WATCH2020 by LilyGo

    This is a fun smartwatch based on ESP32. It has a 240x240 ST7789 screen and onboard audio. Not very useful to listen to anything but it works. This is an example of a device that requires an I2C set of commands for its dac (see below). There is a build-option if you decide to rebuild everything by yourself, otherwise the I2S default option works with the following parameters

  • dac_config: model=I2S,bck=26,ws=25,do=33,i2c=106,sda=21,scl=22

  • dac_controlset: { "init": [ {"reg":41, "val":128}, {"reg":18, "val":255} ], "poweron": [ {"reg":18, "val":64, "mode":"or"} ], "poweroff": [ {"reg":18, "val":191, "mode":"and"} ] }

  • spi_config: dc=27,data=19,clk=18

  • display_config: SPI,driver=ST7789,width=240,height=240,cs=5,back=12,speed=16000000,HFlip,VFlip

    ESP32-WROVER + I2S DAC

    Squeezelite-esp32 requires esp32 chipset and 4MB PSRAM. ESP32-WROVER meets these requirements. To get an audio output an I2S DAC can be used. Cheap PCM5102 I2S DACs work others may also work. PCM5012 DACs can be hooked up via:

I2S - WROVER
VCC - 3.3V
3.3V - 3.3V
GND - GND
FLT - GND
DMP - GND
SCL - GND
BCK - (BCK - see below)
DIN - (DO - see below)
LCK - (WS - see below) FMT - GND
XMT - 3.3V

Use the squeezelite-esp32-I2S-4MFlash-sdkconfig.defaults configuration file.

SqueezeAmpToo !

And the super cool project https://github.com/rochuck/squeeze-amp-too

Configuration

To access NVS, in the webUI, go to credits and select "shows nvs editor". Go into the NVS editor tab to change NFS parameters. In syntax description below <> means a value while [] describe optional parameters.

I2C

The NVS parameter "i2c_config" set the i2c's gpio used for generic purpose (e.g. display). Leave it blank to disable I2C usage. Note that on SqueezeAMP, port must be 1. Default speed is 400000 but some display can do up to 800000 or more. Syntax is

sda=<gpio>,scl=<gpio>[,port=0|1][,speed=<speed>]

Please note that you can not use the same GPIO or port as the DAC

SPI

The esp32 has 4 SPI sub-systems, one is unaccessible so numbering is 0..2 and SPI0 is reserved for Flash/PSRAM. The NVS parameter "spi_config" set the spi's gpio used for generic purpose (e.g. display). Leave it blank to disable SPI usage. The DC parameter is needed for displays. Syntax is

data|mosi=<gpio>,clk=<gpio>[,dc=<gpio>][,host=1|2][,miso=<gpio>]

Default "host" is 1. The "miso" parameter is only used when SPI bus is to be shared with other peripheral (e.g. ethernet, see below), otherwise it can be omitted. Note that "data" can also be named "mosi".

DAC/I2S

The NVS parameter "dac_config" set the gpio used for i2s communication with your DAC. You can define the defaults at compile time but nvs parameter takes precedence except for SqueezeAMP and A1S where these are forced at runtime. Syntax is

bck=<gpio>,ws=<gpio>,do=<gpio>[,mck][,mute=<gpio>[:0|1][,model=TAS57xx|TAS5713|AC101|I2S][,sda=<gpio>,scl=gpio[,i2c=<addr>]]

if "model" is not set or is not recognized, then default "I2S" is used. The option "mck" is used for some codecs that require a master clock (although they should not). Only GPIO0 can be used as MCLK and be aware that this cannot coexit with RMII Ethernet (see ethernet section below). I2C parameters are optional and only needed if your DAC requires an I2C control (See 'dac_controlset' below). Note that "i2c" parameters are decimal, hex notation is not allowed.

So far, TAS57xx, TAS5713, AC101, WM8978 and ES8388 are recognized models where the proper init sequence/volume/power controls are sent. For other codecs that might require an I2C commands, please use the parameter "dac_controlset" that allows definition of simple commands to be sent over i2c for init, power on and off using a JSON syntax:

{ init: [ {"reg":<register>,"val":<value>,"mode":<nothing>|"or"|"and"}, ... {{"reg":<register>,"val":<value>,"mode":<nothing>|"or"|"and"} ],
  poweron: [ {"reg":<register>,"val":<value>,"mode":<nothing>|"or"|"and"}, ... {{"reg":<register>,"val":<value>,"mode":<nothing>|"or"|"and"} ],
  poweroff: [ {"reg":<register>,"val":<value>,"mode":<nothing>|"or"|"and"}, ... {{"reg":<register>,"val":<value>,"mode":<nothing>|"or"|"and"} ] }

This is standard JSON notation, so if you are not familiar with it, Google is your best friend. Be aware that the '...' means you can have as many entries as you want, it's not part of the syntax. Every section is optional, but it does not make sense to set i2c in the 'dac_config' parameter and not setting anything here. The parameter 'mode' allows to or the register with the value or to and it. Don't set 'mode' if you simply want to write. Note that all values must be decimal. You can use a validator like this to verify your syntax

NB: For specific builds (all except I2S), all this is ignored. For know codecs, the built-in sequences can be overwritten using dac_controlset

Please note that you can not use the same GPIO or port as the I2C

SPDIF

The NVS parameter "spdif_config" sets the i2s's gpio needed for SPDIF.

SPDIF is made available by re-using i2s interface in a non-standard way, so although only one pin (DO) is needed, the controller must be fully initialized, so the bit clock (bck) and word clock (ws) must be set as well. As i2s and SPDIF are mutually exclusive, you can reuse the same IO if your hardware allows so.

You can define the defaults at compile time but nvs parameter takes precedence except for SqueezeAMP where these are forced at runtime.

Leave it blank to disable SPDIF usage, you can also define them at compile time using "make menuconfig". Syntax is

bck=<gpio>,ws=<gpio>,do=<gpio>

NB: For well-known configuration, this is ignored

To optimize speed, a bit-manipulation trick is used and as a result, the bit depth is limited to 20 bits, even in 32 bits mode. As said before, this is more than enough for any human ear. In theory, it could be extended up to 23 bits but I don't see the need. Now, you can also get SPDIF using a specialized chip that offers a I2S interface like a DAC but spits out SPDIF (optical and coax). Refers to DAC chapter then.

If you want coax, you can also use a poor-man's trick to generate signal from a 3.3V GPIO. All that does is dividing the 3.3V to generate a 0.6V peak-to-peak and then remove DC

                          100nF
GPIO  ----210ohm-----------||---- coax S/PDIF signal out
                    |
                  110ohm
                    |
Ground -------------------------- coax signal ground

Display

The NVS parameter "display_config" sets the parameters for an optional display. Syntax is

I2C,width=<pixels>,height=<pixels>[address=<i2c_address>][,reset=<gpio>][,HFlip][,VFlip][driver=SSD1306|SSD1326[:1|4]|SSD1327|SH1106]
SPI,width=<pixels>,height=<pixels>,cs=<gpio>[,back=<gpio>][,reset=<gpio>][,speed=<speed>][,HFlip][,VFlip][driver=SSD1306|SSD1322|SSD1326[:1|4]|SSD1327|SH1106|SSD1675|ST7735|ST7789|ILI9341[:16|18][,rotate]]
  • back: a LED backlight used by some older devices (ST7735). It is PWM controlled for brightness
  • reset: some display have a reset pin that is should normally be pulled up if unused
  • VFlip and HFlip are optional can be used to change display orientation
  • rotate: for non-square drivers, move to portrait mode. Note that width and height must be inverted then
  • Default speed is 8000000 (8MHz) but SPI can work up to 26MHz or even 40MHz
  • SH1106 is 128x64 monochrome I2C/SPI here
  • SSD1306 is 128x32 monochrome I2C/SPI here
  • SSD1322 is 256x64 grayscale 16-levels SPI in multiple sizes here - it is very nice
  • SSD1326 is 256x32 monochrome or grayscale 16-levels SPI here
  • SSD1327 is 128x128 16-level grayscale SPI here - artwork can be up to 96x96 with vertical vu-meter/spectrum
  • SSD1351 is 128x128 65k/262k color SPI here
  • SSD1675 is an e-ink paper and is experimental as e-ink is really not suitable for LMS du to its very low refresh rate
  • ST7735 is a 128x160 65k color SPI here. This needs a backlight control
  • ST7789 is a 240x320 65k (262k not enabled) color SPI here. It also exist with 240x240 displays. See rotate for use in portrait mode
  • ILI9341 is another 240x320 65k (262k capable) color SPI. I've not used it much, the driver it has been provided by one external contributor to the project

You can tweak how the vu-meter and spectrum analyzer are displayed, as well as size of artwork through a dedicated menu in player's settings (don't forget to add the plugin).

The NVS parameter "metadata_config" sets how metadata is displayed for AirPlay and Bluetooth. Syntax is

[format=<display_content>][,speed=<speed>][,pause=<pause>]
  • 'speed' is the scrolling speed in ms (default is 33ms)

  • 'pause' is the pause time between scrolls in ms (default is 3600ms)

  • 'format' can contain free text and any of the 3 keywords %artist%, %album%, %title%. Using that format string, the keywords are replaced by their value to build the string to be displayed. Note that the plain text following a keyword that happens to be empty during playback of a track will be removed. For example, if you have set format=%artist% - %title% and there is no artist in the metadata then only