or what to do when the connection is lost
When developing hardware-related software, you spend most of your time struggling with compiler or hardware limitations. In addition, most of the hardware being developed will be used in an IoT context.
Regardless of which IoT device we look at, whether wearables in the everyday lives of most people or sensors in large production halls, it is important to guarantee the highest possible reliability.
In the worst case, a failure of IoT devices can lead to a standstill of machines.
But what happens if a microcontroller does fail?
Well, I think you have to distinguish by application. Consumer devices, to stay with the example of wearables, will probably undergo a simple reboot. If that doesn’t help, they are usually declared broken. With microcontrollers in large production halls, much more is usually at stake. There they are often used in conjunction with sensors in machines that are in continuous operation.
Here it is clear that a simple restart or even the replacement of a part can cost the company very large sums of money. To avoid this, a great deal of effort must be invested in fail-safety.
How can you achieve great resilience?
First of all, you should consider what can lead to failures. Most failures that are genuinely caused by hardware, e.g. a lost connection to a sensor, cannot be solved in software. However, you should always keep the option of restarting the device open, as this is often the last resort.
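The idea of treating a restart as the last resort can be sketched as a small escalation policy: retry a failing operation a few times, and only then request a reboot. This is a minimal sketch under assumptions. `RecoveryPolicy`, `RecoveryAction` and the retry limit are hypothetical names chosen for illustration, and the actual reboot (on an ESP32 this would typically be a call to `esp_restart()`) is left to the caller.

```cpp
#include <cassert>

// Hypothetical result type: what the caller should do after a failure.
enum class RecoveryAction { Retry, Restart };

// Counts consecutive failures and escalates to a restart once a limit is hit.
class RecoveryPolicy {
public:
    explicit RecoveryPolicy(int max_retries) : max_retries_(max_retries) {
        assert(max_retries > 0);  // a limit of zero would restart immediately
    }

    // Call after every failed attempt; returns what to do next.
    RecoveryAction on_failure() {
        ++failures_;
        return failures_ >= max_retries_ ? RecoveryAction::Restart
                                         : RecoveryAction::Retry;
    }

    // Call after a successful attempt to clear the failure count.
    void on_success() { failures_ = 0; }

private:
    int max_retries_;
    int failures_ = 0;
};
```

With a limit of three, the first two failures yield `Retry` and the third yields `Restart`. A success in between resets the counter.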
Assuming that the device connects to other devices or networks (WLAN, mobile network, Bluetooth), you should make as many as possible of the settings that could change configurable on the fly.
For this I wrote a Settings class in the following example, which I make globally available so that the settings can be edited from anywhere. I also write the settings directly to the NVS (non-volatile storage) so that they can be loaded after a restart without any data loss:
Settings.hpp
Settings.cpp
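A minimal sketch of how such a class might look, not the original listing. To keep it runnable anywhere, persistence goes through a hypothetical `KeyValueStore` interface, with an in-memory stand-in (`InMemoryStore`) in place of NVS. On an ESP32, an implementation of that interface would wrap the ESP-IDF NVS API instead. All class, method and key names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical storage interface. On an ESP32 an implementation would wrap
// the NVS API; here it is kept abstract so the sketch runs anywhere.
class KeyValueStore {
public:
    virtual ~KeyValueStore() = default;
    virtual void write(const std::string& key, const std::string& value) = 0;
    virtual std::optional<std::string> read(const std::string& key) const = 0;
};

// In-memory stand-in for non-volatile storage, for illustration only.
class InMemoryStore : public KeyValueStore {
public:
    void write(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    std::optional<std::string> read(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> data_;
};

// Keeps a cached copy of each setting and writes every change straight
// through to the store, so the last state can be reloaded after a restart.
class Settings {
public:
    explicit Settings(KeyValueStore& store) : store_(store) {}

    void set(const std::string& key, const std::string& value) {
        assert(!key.empty());      // an empty key cannot be looked up again
        cache_[key] = value;
        store_.write(key, value);  // persist immediately
    }

    // Returns the cached value, then the stored value, then the fallback.
    std::string get(const std::string& key, const std::string& fallback = "") {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        if (auto stored = store_.read(key)) {
            cache_[key] = *stored;
            return *stored;
        }
        return fallback;
    }

private:
    KeyValueStore& store_;
    std::map<std::string, std::string> cache_;
};
```

A value written through one `Settings` instance can be read back by a fresh instance constructed over the same store, which models loading the settings after a restart. Making the class globally available, as described above, could be done by exposing a single shared instance.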
We have now seen that it is possible to update individual settings relatively easily. However, there may also be cases where the entire firmware has to be replaced. I have already written a detailed Medium article that deals with exactly this topic.
OTA (over-the-air) updates, rebooting a device, and submitting simple configuration settings are all quite simple via Azure IoT.
But what do you do when the device is no longer connected to the Internet?
If we have made most of the settings configurable but can no longer reach the device via the Internet, a Bluetooth interface is advantageous. This requires being able to interact with the device directly via Bluetooth. Bluetooth Low Energy has a range of about 10 meters, so you have to be on site, but physical access to the device is not necessary.
With Bluetooth Low Energy (BLE), Bluetooth connections have become much more economical in terms of power consumption.
Let’s stay with the example of a large production hall with machines in which our hardware and software are located. Due to changes in the network, we have lost the Internet connection.
Here we use the BLE module on the chip, which acts as a peripheral and provides a GATT server. Other BLE-capable devices can then still connect to the microcontroller and change configurations, send commands, and so on. In our case we could, for example, change the WLAN password to re-establish network connectivity.
For the integration of BLE, Espressif (the manufacturer of the ESP32) provides very good examples on GitHub. There are several ways to implement the Bluetooth connection and several host stacks to choose from. The choices include NimBLE, Bluedroid or Bluetooth Mesh. In my personal opinion, NimBLE was the easiest to implement, but it might create a lot of overhead that is not necessary for simple applications.
In order to make settings changeable via Bluetooth, we connect our program code to the GATT server, which must be provided by the device. A GATT server can expose different services (predefined or self-defined), which tell the other party how to interpret the values that arrive. In this way, it is possible to group several settings in one service. The individual settings are called characteristics in BLE, and each can carry flags that indicate which operations are permitted on it. The most important ones are BLE_GATT_CHR_F_WRITE, BLE_GATT_CHR_F_READ and BLE_GATT_CHR_F_NOTIFY. These flags are also available with the suffix ENC, e.g. BLE_GATT_CHR_F_WRITE_ENC, which means that the operation is only permitted over an encrypted link, i.e. after successful pairing. This would result in the following code for the registration of configuration services as an example:
For each setting, one characteristic of a service is defined, and its access_cb callback reads or modifies the setting. Some settings should probably be password protected. Here we have chosen a hardcoded password, which must first be written to the UNLOCK characteristic before, for example, the WIFI_PASSWORD characteristic can be read.
Restarting modules or the entire device can also be realized via the Bluetooth connection, as can be seen in the example of the WIFI_RESTART characteristic.
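The unlock-then-access behaviour described above can be modelled independently of any Bluetooth stack. The following is a minimal sketch of logic that an access callback could delegate to, not the original listing and not NimBLE code. `ConfigService`, `Characteristic`, `Result` and `on_disconnect` are hypothetical names, and requiring an unlock for WIFI_RESTART as well as for WIFI_PASSWORD is an assumption that the text does not state.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical identifiers for the three characteristics named in the text.
enum class Characteristic { Unlock, WifiPassword, WifiRestart };
enum class Result { Ok, NotAuthorized };

// Stack-independent model of the logic behind the characteristic callbacks.
class ConfigService {
public:
    ConfigService(std::string unlock_password, std::string wifi_password)
        : unlock_password_(std::move(unlock_password)),
          wifi_password_(std::move(wifi_password)) {
        assert(!unlock_password_.empty());  // an empty password would not protect anything
    }

    // Handles a write to one of the characteristics.
    Result write(Characteristic chr, const std::string& value) {
        switch (chr) {
        case Characteristic::Unlock:
            // A wrong password also re-locks the service.
            unlocked_ = (value == unlock_password_);
            return unlocked_ ? Result::Ok : Result::NotAuthorized;
        case Characteristic::WifiPassword:
            if (!unlocked_) return Result::NotAuthorized;
            wifi_password_ = value;
            return Result::Ok;
        case Characteristic::WifiRestart:
            // Assumption: restarting also requires a prior unlock.
            if (!unlocked_) return Result::NotAuthorized;
            restart_requested_ = true;
            return Result::Ok;
        }
        return Result::NotAuthorized;
    }

    // Handles a read of the WIFI_PASSWORD characteristic.
    Result read_wifi_password(std::string& out) const {
        if (!unlocked_) return Result::NotAuthorized;
        out = wifi_password_;
        return Result::Ok;
    }

    // Lock again when the peer disconnects, so each session must unlock anew.
    void on_disconnect() { unlocked_ = false; }

    bool restart_requested() const { return restart_requested_; }

private:
    std::string unlock_password_;
    std::string wifi_password_;
    bool unlocked_ = false;
    bool restart_requested_ = false;
};
```

In this model, reading or writing the Wi-Fi password is rejected until the correct value has been written to the unlock characteristic, and the restart request is only recorded as a flag that the main loop would act on. A hardcoded password compared with `==` is the simplest possible gate. Combining it with the encrypted flag variants mentioned earlier would be a natural hardening step.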
If the device can no longer be reached in either way, it must be replaced. The previously used software can then be deployed on the replacement device.
Conclusion
It is not easy, and perhaps even impossible, to build a completely fail-safe microcontroller system. The power supply, for one, cannot be guaranteed permanently. Nevertheless, one can try to include as many safety techniques as possible, so that detected errors can be corrected without major effort and settings can be adjusted afterwards to changed conditions.
