Survey-grade GNSS (GPS) with Quectel LC29HDA using RTK

This post summarises the essential steps in putting together a survey-grade GNSS unit using the LC29HDA from Quectel for a very modest financial outlay; a single LC29HDA module with a good quality antenna was bought for a little over £40 (they were on offer). This was found to give repeatable positioning to within a few cm.

This is not a full tutorial but it should be fairly easy to find background information on GNSS and RTK (although I found this e-book to be a good guide to some of the theory and practice), and for Bluetooth and LoRa. I will not repeat what can be easily found on the web or in datasheets. I will, however, provide some “beginner level” information where I consider this is helpful in allowing readers to do their own background reading, as it is sometimes hard to work out what is relevant.

For some initial background reading: look up what the difference between GNSS and GPS is (even though we colloquially use GPS when we should say GNSS); find out what “rover” and “base” mean in the context of RTK.

Two Aims

There are essentially two ways of getting a survey-grade position using RTK: using a third party base station and the NTRIP protocol over the internet, or running your own base station and feeding the RTCM messages direct into the rover. The first case will be fine if there is a nearby (typically <30km) accessible third party base station and a strong enough “mobile data” signal at the survey location. My setup for this is an Android phone with SWMaps and a connection to the GNSS unit using Bluetooth. The second case is for when the first does not apply. My setup for this uses the LoRa radio technology to communicate RTCM messages from the base station direct to the rover GNSS unit (still connected to the phone via Bluetooth). More on the detailed setups later!

The LC29HDA can be configured to work as a base and as a rover (in spite of what some sites say, the base station does not require purchase of an LC29HBS).

Preparations

This is aimed at the beginners. There are three things to consider to avoid looking at your shiny GNSS module and saying “what next?”: electrical connection, communication with your PC, and powering. A bit of soldering will also be required.

Electrical Connection

You could use solderless breadboard if you have it but I often use what tend to be called “DuPont” connectors. These are 0.1″ pitch connectors which are widely available in both double-ended sockets and socket-to-pin versions.

Powering

You need to be a bit careful with voltages as the communications (see below) for the modules in question are at 3.3V but the boards used are powered by more than 3.3V and have a voltage regulator on-board (top right in the LC29HDA module pictured below) which converts the supply to what the inner electronics require. The ability to power the modules with higher voltages is helpful in that it allows us to use something like a “power bank” (5V, typically used for mobile phone charging) or an 18650 Lithium cell (working voltage about 3.7V, max voltage about 4.2V). My preferred option is to use 18650 Lithium cells; these are rechargeable, readily available (being used in “vaping” devices), are available in sufficiently high capacities, and you can get battery holders for them. Beware, though, that there are LOTS of people selling cheap 18650 cells which do not have the advertised capacity and may not have sensible safety devices inside; I always get mine from a specialist seller or a proper electronics retailer such as CPC.

PC Communications

The modules communicate using a digital protocol known as UART, which uses a series of pulses to encode the binary format of the messages. We don’t need to worry about this level of detail, other than to be sure that the voltage of the pulses matches, when connecting modules together (which they do for the modules I mention). In order to communicate between a PC and modules, we need an adapter that can convert between the UART digital signals and appear as a USB serial port (a “COM” port on Windows). We also need some software to display what is received on the serial port, and to allow us to send messages. The software I tend to use is called Termite but there are plenty of options for Windows, MacOS, and linux, and plenty of guides on the web to choosing and using serial terminal software. Similarly, there are plenty of guides to connecting the adapters (the most important thing is to connect the TX from the module to the RX of the adapter and vice versa, noting that TX=transmit means data coming out of the connector).

There are several options for the UART to USB serial adapter, but consideration has to be given to the communication voltage (or risk burning out your module!). Some of these have a removable jumper to change the voltage from 3.3V to 5V, while others have a “solder jumper”.

One option is to power the module from a battery and to only connect the ground (0V), RX and TX of the adapter.

The other option is to use the adapter to provide power to the module. This is where care is required because if you set the adapter to use 3.3V logic then its Vcc pin will be at 3.3V and this may not be sufficient to properly power the module (recall that these have a voltage regulator on board which will “drop” some voltage, leaving the module potentially under-powered). There is also the issue that the maximum current which the 3.3V supply from the adapter board can support may be less than the module(s) require; the LC29HDA peaks at about 100mA.

My preferred option is to use an FTDI232 serial adapter board with a jumper set to the 3.3V position and to plug a DuPont connector onto the bare pin next to the jumper, as this should be at 5V (check your adapter!). There are currently boards on the market with solderable holes on the sides which can also be used to access the 5V supplied by the USB connector (I suspect many of these use clone FTDI232 chips, but they probably work fine). The 5V supply from the USB port should be good to 500mA, plenty for our uses.

LC29HDA GNSS Modules and Antennas

Note that there are several very different models which start “LC29H”. The DA supports RTK at an update frequency of 1Hz, which is fine for survey use, while another model gives 10Hz for use cases such as drones. Some models do not support RTK at all. These are dual frequency band (L1 + L5) devices, which is important for high accuracy. Note that this is different to being multi-constellation (which they are), which means they can use satellites from GPS, GLONASS, Galileo, BeiDou, etc systems.

The Quectel website has several brochures and data sheets to refer to. You will also need the “protocol specification”, which describes the messages which can be sent to/from the LC29H (although most will not be directly relevant), and I recommend using their QGNSS software.

I bought a module and the more expensive antenna (probably a Quectel YB0017AA, although it is not branded) from Panda Wireless on Aliexpress (note that the picture is from their listing and is for the EA variant, also that the antenna is actually ~6cm square and the LC29HDA board is 3cm long). Ensure you use a dual band L1 + L5 antenna. The YB0017AA is claimed to be IP65 rated so should be weather-proof in all reasonable conditions. Panda Wireless were attentive and clearly wanted to be sure that I knew what I was doing; I suspect they have had too many returns from people who did not!

The board has a backup battery (top left in image) which is rechargeable but will only work if the module is powered for quite some time (at least 1 day?). This will allow the GNSS to more quickly get a fix if powered down for up to a few hours. A “cold start” needs about 30s to get a position fix.

At this point, connect the antenna (ideally place it outside) and the LC29HDA module to your serial adapter (TX/RX cross-over) and connect to your PC. It will take about 30s for a position fix after which the green PPS LED will flash. Set QGNSS to refer to the serial port associated with the adapter, using a 115200 baud connection speed (the default for LC29H) and explore the features of QGNSS.

Once you’ve explored the graphical presentation of position and GNSS performance, take a look at the data/messages coming out of the module. There are several viewers under the “View” menu but I found the most useful was the Console tool (“Tools” menu) because the module emits lots of messages every second. Use the “Protocol Package” tab and move the splitter (two short vertical bars) leftwards. This allows you to home in on a particular message type and see its latest contents decoded.

Now would be a good time to look up the format of NMEA messages and to consult the protocol specification document to correlate the explanations with the messages coming out of the GNSS module.

As an extra, try using Termite (or alternative) to connect and see the messages. You will need to set it to 115200 baud (and the default of 8 data bits, no parity and 1 stop bit). Be sure to disconnect QGNSS first; only one program can use a serial port at a given time.
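For the curious, the “8 data bits, no parity and 1 stop bit” setting can be pictured with a few lines of Python. This is only an illustrative sketch of standard UART framing (the function names are made up for the illustration, not taken from any Quectel tool): each byte goes out as a start bit, eight data bits least-significant first, then a stop bit, so every byte costs 10 bit-times on the wire.

```python
def frame_8n1(byte):
    """Return the line levels for one byte sent as 8N1 (start, 8 data bits LSB first, stop)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least-significant bit first
    return [0] + data_bits + [1]  # start bit is low, stop bit is high


def max_bytes_per_second(baud):
    """Upper bound on throughput for 8N1: every byte costs 10 bit-times."""
    return baud // 10


if __name__ == "__main__":
    print(frame_8n1(ord("$")))           # every NMEA sentence starts with '$'
    print(max_bytes_per_second(115200))  # ceiling for the LC29H default baud rate
```

The second function is a handy sanity check when deciding whether a slower link further down the chain can keep up with the module.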

Concerning Firmware

The firmware is the software program running inside the LC29HDA. The version is shown at the bottom of QGNSS and the modules supplied to me had version LC29HDANR11A03S_RSA. This works fine but does not have the NMEA GST message which contains information about the location accuracy. I contacted Quectel who provided me with LC29HDANR11A04S_RSA, which does. The firmware can be updated using QGNSS. During this process, QGNSS says you should reset the module by pressing a button! All you need to do is to briefly disconnect power from the LC29HDA (leaving the serial adapter connected to the PC).

Using NTRIP for RTK

Since the default mode for the LC29HDA is to function as a rover, all we need to do now is to acquire NTRIP data from a server and use it to generate RTCM3 messages to send to the GNSS module. QGNSS can do this; see under the Tools menu.

There are plenty of commercial NTRIP services but RTK2Go has a public “NTRIP Caster” and quite a few people who send their base station data to the RTK2Go server for distribution to anyone for free. A map of base stations allows you to see if there is one close enough (as a rough guide, a base station closer than ~10km should give you ~2cm level fixes and over 30km might struggle to get a fix, depending on conditions). My nearest station is called WEBBPARTNERS so in the QGNSS NTRIP Client setting I provide “rtk2go.com”, “2101” for the port, and my email address as the username (no password needed for client access). Then just “Update NTRIP Source Table” and choose the nearest station then “Connect to Host”.

Initially, the fix quality will appear as “RTK Float”, while the algorithm is working out position ambiguities, but after a while it will show a “RTK Fix”.
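The Float/Fix status is also visible in the raw NMEA stream: the GGA sentence carries a fix-quality field, where the standard values 4 and 5 mean RTK fixed and RTK float respectively. The following is a minimal sketch of pulling that field out. The function name and the sample sentence are made up for illustration, and the sketch does not verify the checksum.

```python
FIX_QUALITY = {
    "0": "No fix",
    "1": "GNSS fix",
    "2": "Differential fix",
    "4": "RTK Fix",
    "5": "RTK Float",
}


def gga_fix_quality(sentence):
    """Return a label for the fix-quality field of a GGA sentence, or None if not GGA."""
    body = sentence.strip().lstrip("$").split("*")[0]  # drop '$' and any checksum
    fields = body.split(",")
    if not fields[0].endswith("GGA") or len(fields) < 7:
        return None
    return FIX_QUALITY.get(fields[6], "Other (" + fields[6] + ")")


if __name__ == "__main__":
    # Made-up sample sentence, checksum omitted.
    sample = "$GNGGA,120000.000,5315.0000,N,00155.0000,W,5,20,0.8,300.0,M,48.0,M,1.0,0000"
    print(gga_fix_quality(sample))
```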

NB: for RTK to be able to get a “fix”, the antenna must have a good wide view of sky. A window cill is likely to be no good.

Bluetooth

Although it would be possible to make a wired connection via USB to a phone/tablet, it seems quite an ugly approach, and Bluetooth is the obvious contender. What we need here is something equivalent to the UART serial to USB adapter but with a wireless link in the middle. So the bluetooth module needs to speak UART to communicate with the GNSS module and the Bluetooth messages need to be interpreted as serial communications on the phone/tablet/PC. We need a Bluetooth-serial module.

My preferred software, SWMaps, supports both Bluetooth Classic and Bluetooth Low Energy (BLE). Initial trials with BLE were not successful as the NMEA messages got truncated and overlapped by the time they were passed to SWMaps. I believe this is probably down to the BLE modules I tested being v4.0 (most cheap modules use this version) having too short a message size (20 bytes) to transfer the GNSS output at the programmed BLE connection rate. It was possible to reduce the message truncation by dropping the GNSS baud rate but the LC29H documentation recommends 115200 baud and it was clear from experiments that there is an unavoidable bottle-neck. Rather than investigate BLE v5.0 (or v4.2) modules I opted to go the Bluetooth Classic route. This also means that Windows 11 can connect via Bluetooth too*; once connected, another serial port will appear (actually usually two appear, only one of which works!!). If you use iOS I understand you will have to use BLE.

[* – this may be a hardware issue but I consistently failed to be able to connect a BLE serial device, although I believe BLE audio may be possible]
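The suspected BLE bottle-neck can be put into rough numbers. The sketch below is back-of-envelope arithmetic only: the 20 byte payload comes from the text above, but the connection interval and the number of packets per interval are hypothetical values, since cheap modules differ and these were not measured here.

```python
def uart_bytes_per_second(baud):
    """8N1 framing costs 10 bit-times per byte."""
    return baud / 10


def ble_bytes_per_second(payload_bytes, interval_ms, packets_per_interval=1):
    """Rough notification throughput for a BLE serial bridge."""
    return payload_bytes * packets_per_interval * 1000 / interval_ms


if __name__ == "__main__":
    # Hypothetical link: 20-byte payload, one packet every 30 ms.
    link = ble_bytes_per_second(20, 30)
    print("UART can deliver up to", uart_bytes_per_second(115200), "bytes/s")
    print("BLE link carries about", round(link), "bytes/s")
```

If the module's NMEA output in bytes per second exceeds the BLE figure, something has to be dropped, which would be consistent with the truncated messages described above.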

The HC-06 and HC-05 modules have been available for several years and should be Bluetooth Classic, but note that some suppliers are selling devices they describe as HC-06/05 which are actually BLE (only). BLE and Bluetooth Classic are not at all interoperable. Check that the description says something like “bluetooth V2.0 SPP protocol” somewhere. HC-06 will work fine, but so will HC-05 (it can also work in “master” role). Both should allow for a 3.6V-6V power supply but have a 3.3V logic level (see earlier). We do not need “EN” and “STATE”, so a 4 pin device will be fine. Bare modules that have outlines like a postage stamp won’t do.

Search the web for more information on the HC-06, especially the “AT Commands” which are used to configure the module.

Connect the HC-06 to your USB serial adapter, crossing over TX/RX. Establish a connection using Termite with a baud rate of 9600 (which should be the supplied default). NB: this is not connecting via Bluetooth, but by the USB connection.

For some basic testing, you can either make a Bluetooth connection from your PC (probably you have to buy an adapter for a desktop) and then run a second instance of Termite, or install a suitable app on your phone. Suitable apps on Android include Bluetooth Serial Monitor by ArduinoGetStarted or Serial Bluetooth by Kai Morich. These both support both BLE and Bluetooth Classic and so can be used to check your HC-06 is kosher or not.

Once the HC-06 is connected to your PC/phone (LED should go permanently on, a PIN may be required depending on the as-supplied setup of the module), it should be possible to send text messages from Termite to the app (or other Termite window).

To prepare the BT module for use with the GNSS, first disconnect the BT, leaving just the USB serial adapter as the means of communication and use the AT+BAUD command to change the baud rate to 115200 (you will probably also have to issue an AT+RESET to make this take effect). If it works you’ll then have to change Termite to use 115200 too. Maybe also change the module name and decide whether you do/don’t want a PIN.

Remove the USB serial adapter and connect the GNSS module to the BT module, with a shared power supply (battery) and RX/TX crossover. You should now be able to connect to the BT module and view the NMEA messages on your phone/PC along the lines noted above. On a PC it should be possible to use QGNSS after setting it to use the serial port associated with the BT connection.

It should now be straight-forward to make a connection via Bluetooth from SWMaps (or whatever), noting that only one connection at a time is possible. The default configuration of the LC29HDA modules emits almost all of the NMEA messages which SWMaps can use; it does not emit the GST message (see above, under firmware). For different mapping or logging software you might find there are different requirements; check the documentation and compare to the NMEA messages summarised in the QGNSS Console.

The GST messages can be enabled, given the right firmware, by using the $PQTMCFGMSGRATE command in the GNSS Console (Protocol Package tab). The command is:

$PQTMCFGMSGRATE,W,GST,1*0B

To make this apply after a power cycle, save the setting using:

$PQTMSAVEPAR*5A

The part following the “*” is the checksum and QGNSS Console will helpfully generate it for you. Do some background reading on the checksum.
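As a starting point for that reading: the NMEA checksum is the XOR of every character between the “$” and the “*”, written as two hexadecimal digits. A minimal sketch (function names are made up for illustration) which reproduces the checksums of the two commands above:

```python
from functools import reduce


def nmea_checksum(body):
    """XOR of all characters between '$' and '*', as two upper-case hex digits."""
    return "{:02X}".format(reduce(lambda acc, ch: acc ^ ord(ch), body, 0))


def is_valid(sentence):
    """Check a full sentence such as '$PQTMSAVEPAR*5A'."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].rpartition("*")
    return nmea_checksum(body) == given.upper()


if __name__ == "__main__":
    print(nmea_checksum("PQTMCFGMSGRATE,W,GST,1"))
    print(is_valid("$PQTMSAVEPAR*5A"))
```

This is useful if you want to send commands from a plain serial terminal such as Termite, which will not add the checksum for you.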

Using NTRIP for RTK Revisited

This was already described for QGNSS but it is now a snip to turn into a mobile station; all we need to do now is to set SWMaps (or whatever) to acquire NTRIP data using the same information as previously, which it then converts to RTCM3 messages and sends over the BT connection to the GNSS module.

The first of the two aims is now satisfied. Magic!

LoRa – Radio Link for Base Station

There are lots of options for making the wireless connection from a base station to the rover, with the essential requirement being something which can translate our UART in/out messages to a wireless protocol. I used LoRa, specifically the E22 modules from Ebyte because I’d already experimented with them and been impressed by the signal range, ability to cope with non-clear line of sight, ease of use, and price. The E22 modules are available in different frequency bands, not all of which are licenced in all regions. The 868MHz band is probably the best for Europe. Two modules will be required, of course, but buying an extra one with a built-in USB adapter does make exploring/testing easier.

In addition to the frequency band, the E22 module is available with different powers and connection arrangements. The higher powers are thought to not be legal in the UK (do your homework!). I suggest the E22-900T22D (and buy an antenna of the correct frequency band too).

EByte provides adequate documentation and a software configuration tool, which makes setting the modules up easier. See the documentation for how to use the M0/M1 pins to put the module into configuration mode. Here are the settings for the rover:

Note that the baud rate is set to match the GNSS module and that I have chosen an air data rate of 19200 baud. The default air rate of 2400 baud gives about 5km range but is too slow for the spew of RTCM messages which must be communicated. The higher data rate will give a smaller range, but hopefully sufficient. The rover and base station must use the same air data rate, channel and “net id”. I have set the rover address to be 0, whereas for the base station this MUST be 65535 (this is the FFFF hexadecimal address), which is required for the messages to be broadcast – see the E22 documentation.

Once configured, connect both M0 and M1 pins to 0V/ground to put the module into “transparent” mode. Connect the USB serial adapter to one E22, noting that these modules are intended to be powered by 5V supply* but have 3.3V UART logic. Use an E22-900T22U or a second USB serial adapter to make up the other end of the LoRa data link, open two Termite windows and connect them to each end, and exchange a few messages.

[* – the datasheet indicates >=5V for good output power, but I have found that a 3.7V 18650 Lithium cell is not noticeably worse in terms of practical range]

Configuring a LC29HDA Base Station

This entails issuing a number of commands using the QGNSS Console (Protocol Package) window and performing module restarts at appropriate points. I did not find a way to initiate an elegant restart, so resorted to briefly disconnecting the power supply to the GNSS module (while leaving the serial adapter connected, in order to avoid dropping the serial connection, although this is just a nuisance).

Change the mode to be a base station then save the change (otherwise it only lasts until a power cycle). Restart the module after these two commands:

$PQTMCFGRCVRMODE,W,2*29
$PQTMSAVEPAR*5A

You should now see that the Console is showing only RTCM messages and no longer NMEA messages. Several different types of message are seen, reflecting the range of satellite constellations; refer to RTCM message documentation to see what they are.

The essential next step is for the base station to know where it is. To do this, the $PQTMCFGSVIN command is used with an a-priori known location or the GNSS module can determine its location by gathering position data for a long period of time; the protocol specification document recommends 12 hours! Since we’re mostly not going to know a precise location, the automatic “survey-in” procedure is likely to be the way to go. Additionally, if we don’t need the absolute position accuracy to be at the cm level, a shorter automatic survey-in period might be acceptable. For example, if we get to 10cm accuracy and: a) record and re-use the same geospatial position and the determined coordinates, and b) before each survey session record the location of a few well-defined/documented points, we should have good cm-level relative accuracy within the survey and enough information for a later adjustment to absolute accuracy, if required. Top quality absolute accuracy is also going to require consideration of datums and epochs which take account of continental drift, for example. This is not in scope for this post!
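If you do have a known location, it will usually be written as latitude, longitude and ellipsoidal height, whereas fixed base positions are commonly entered as ECEF X/Y/Z in metres. Whether $PQTMCFGSVIN expects ECEF is an assumption here, so check the protocol specification before relying on it. The sketch below is the standard WGS84 geodetic-to-ECEF conversion; the function name is made up for illustration.

```python
import math

WGS84_A = 6378137.0                 # semi-major axis, metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, height_m):
    """Convert WGS84 latitude/longitude (degrees) and ellipsoidal height (m) to ECEF metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + height_m) * math.sin(lat)
    return x, y, z


if __name__ == "__main__":
    # Made-up coordinates, roughly in the Peak District.
    print(geodetic_to_ecef(53.25, -1.95, 350.0))
```

Note that the height must be height above the ellipsoid, not height above mean sea level; mixing the two introduces an error of tens of metres in the UK.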

For the purpose of testing/demonstration, a survey-in period of 2 minutes is a reasonable starting point. With a moderately good sky view I got a 0.25m accuracy estimated by the GNSS in only 60s. To monitor the progress of the survey-in process, we should first enable the $PQTMSVINSTATUS messages (see the protocol specification). The set of messages to send to the GNSS module is:

$PQTMCFGMSGRATE,W,PQTMSVINSTATUS,1,1*58
$PQTMCFGSVIN,W,1,120,0,0,0,0*21
$PQTMSAVEPAR*5A

Then restart the module and observe the $PQTMSVINSTATUS messages in the Console (Protocol Package). Once the GNSS has got a position fix, the PQTMSVINSTATUS messages should start to change and continue to do so for as long as the survey-in period (120s in the command above). Watch the accuracy improve and the status message finally show valid=2.
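To try a different survey-in period, the checksum of the $PQTMCFGSVIN command has to be recalculated. Here is a small sketch which regenerates the three commands with only the duration made variable; every other field is copied verbatim from the commands above, and the function names are made up for illustration. Check the protocol specification for any limits on the duration.

```python
def with_checksum(body):
    """Wrap an NMEA/PQTM command body as '$<body>*<XOR checksum>'."""
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)
    return "${}*{:02X}".format(body, checksum)


def survey_in_commands(duration_s):
    """Commands from the post, with only the survey-in duration made variable."""
    if duration_s <= 0:
        raise ValueError("duration must be a positive number of seconds")
    return [
        with_checksum("PQTMCFGMSGRATE,W,PQTMSVINSTATUS,1,1"),
        with_checksum("PQTMCFGSVIN,W,1,{},0,0,0,0".format(int(duration_s))),
        with_checksum("PQTMSAVEPAR"),
    ]


if __name__ == "__main__":
    for command in survey_in_commands(600):  # e.g. a 10 minute survey-in
        print(command)
```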

Connecting the Base and Rover

The second aim is now in sight!

A trial run without involving the LoRa radio link can be attempted by making a wire connection between the base and rover. The TX output from the base should be connected to the RX input of the rover (and any other connection to the rover RX removed). It is still OK to have either a USB serial adapter or BT adapter connected to the rover TX (only) and this can be used to establish a connection from QGNSS to the rover to see it get the RTK Fix, at which time the variation of the location seen via the “Deviation Map” will suddenly drop to a few cm.

The final step is to replace the wire with a LoRa link. The rover GNSS should have its TX connected to the BT RX – this will send the NMEA messages out over Bluetooth. The rover RX should be connected to the TX of the E22 module with address = 0. This is how it gets the RTCM RTK messages. The base station GNSS TX should be connected to the RX of the E22 module with address 65535 (FFFF hex). The base station will probably also need either another BT module or a USB serial adapter connected so that the survey-in command can be sent (and survey-in process monitored via the $PQTMSVINSTATUS messages). It is OK for the base station GNSS TX to be connected to both LoRa E22 RX and USB serial adapter RX. Use separate power sources for each of the rover and base.

Set the base station up somewhere and do not move it, connect your phone to the rover via Bluetooth, watch the RTK Fix be established… and be amazed. This is truly awesome technology!

 

 

SIMCom Y7080 with MQTT (and MQTTS) – Success!

Following on from my earlier article describing use of the SIMCom Y7080 CAT-NB (NB-IoT) module, which ended with me “running into the sand” with MQTT and having to update the firmware, here is the report of success!

For reference, see the SIMCom Y70XX Series MQTT(S) Application note.

In this article, I will be using the mosquitto test broker, which has listeners on several ports according to different scenarios for SSL/TLS and authentication.

Preliminaries

I’m assuming a mobile/cellular connection has already been made (see earlier article). I’m also going to jump straight in with encrypted communications, i.e. MQTTS aka MQTT with SSL/TLS. In the cases given below I am not, however, concerned with either: a) verification of the server certificate or b) using a client certificate to authenticate to the server. Since the server is one I control and trust, (a) is not necessary, and conventional user/password authentication will be sufficient instead of (b) since the connection is encrypted. Only using encryption makes the setup of the modem somewhat easier but it still involves the modem and server undertaking some behind-the-scenes communications to transfer the encryption keys. This means using our limited bandwidth and takes quite a bite out of a typical NB-IoT subscription data allowance; the payload for a single simple MQTT message is likely to be very much smaller than the connection data cost.

Initial set-up is straight-forward, involving starting the modem’s MQTT service and one client within it (the 0 is the handle for the MQTT service):

AT+CMQTTSTART
AT+CMQTTACCQ=0,"client_123",1

Note that client id in AT+CMQTTACCQ is what will appear to the broker and that the final parameter must be 1 for a MQTTS (TLS) connection. Note also that for the encryption-only use case, it is not necessary to do any additional SSL configuration, as described in the Y70XX SSL Application Note. That 1 is all you need!

DNS

Using domain name servers to look up an IP address is nice but it does introduce an additional communication overhead; while this is rarely noticeable in most scenarios, the low bandwidth of NB-IoT makes using numerical IP addresses an attractive option. Things get a bit messy if you use a cloud service; I’ve noticed that the IP address for my Azure IoT Hub has changed (see my other article on talking MQTT with Azure IoT Hub).

It is easy to look up the IP address using nslookup (same command on linux and windows – yay!). For test.mosquitto.org it is 91.121.93.94 .

The Y7080 can also query DNS, using either servers whose addresses are provided by the mobile network operator or falling back on factory-defined servers (in China?).

Check the servers with:

AT+QIDNSCFG?

I got two Class A private addresses, so these must belong to the mobile network operator:
PrimaryDns:10.105.16.254
SecondaryDns:10.105.144.254

Querying DNS is as simple as:

AT+CMDNS=test.mosquitto.org

This is an asynchronous operation, so it returns “OK” very quickly and then provides the IP address as a separate message +CMDNS:91.121.93.94. Bad news comes in the form of +CMDNS:QUERY_DNS_FAILED; on the Vodafone platform this will happen if you query a domain name which is not listed in your APN Access List (see my previous article).
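If a script is driving the modem, the two outcomes described above can be told apart with a few lines of Python. This is a minimal sketch that handles only the two response forms quoted in this article; the function name is made up for illustration.

```python
import ipaddress


def parse_cmdns(line):
    """Return the IP address from a '+CMDNS:' line, or None if the lookup failed."""
    prefix = "+CMDNS:"
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError("not a +CMDNS response: " + line)
    value = line[len(prefix):].strip()
    if value == "QUERY_DNS_FAILED":
        return None
    return str(ipaddress.ip_address(value))  # raises ValueError on anything unexpected


if __name__ == "__main__":
    print(parse_cmdns("+CMDNS:91.121.93.94"))
    print(parse_cmdns("+CMDNS:QUERY_DNS_FAILED"))
```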

Repeating the CMDNS command straight-away will give a much faster response, so there must be caching somewhere (given the speed, I suppose this is internal to the modem). I’ve not found the cache duration documented anywhere but it is only of the order of a few minutes.

Making a Connection

Although using authentication will be essential for any in-service data logging, it makes sense to get a more simple situation working first. The connection to the simple MQTTS service of test.mosquitto.org without authentication uses port 8883:

AT+CMQTTCONNECT=0,"tcp://91.121.93.94:8883",60,1

Note that the Y7080 does require the “tcp://” and that the port is not a separate parameter.

This is another asynchronous command. Be prepared to wait several seconds, hoping for the successful message: +CMQTTCONNECT: 0,0. If you forget to make the last parameter of AT+CMQTTACCQ be 1 then the modem will very quickly follow that with +CMQTTCONNLOST: 0,2. This can also happen with test.mosquitto.org at quite frequent and seemingly-random times; the price of a free public service.

You can check whether the modem is connected with:

AT+CMQTTCONNECT?

A bare +CMQTTCONNECT: 0 tells of no connection, whereas a live connection will give a response containing the server URL etc.

Now the simple case works, it is a simple matter to add a username and password, using port 8885. The test.mosquitto.org documentation indicates we can use username=”rw” and password “readwrite” for a user with read and write access, so simply connect using:

AT+CMQTTCONNECT=0,"tcp://91.121.93.94:8885",60,1,rw,readwrite

Publishing Some Data

For each message, three steps are required (even if the topic stays the same). First declare the topic in two stages, with the first stage declaring the length of the topic, e.g. for a five character long topic:

AT+CMQTTTOPIC=0,5

The Y7080 will respond with a prompt of “>” and wait for the topic string. I found that providing too few characters caused a broker disconnect when publishing. Conversely, providing too many characters simply caused the modem to use the first 5. Unlike the AT+MQPUB described in my previous article, the topic is provided using normal characters, without the need to convert to hexadecimal.

The payload is declared in similar fashion, with a “>” prompt to enter the payload string:

AT+CMQTTPAYLOAD=0,8

The payload string can be either shorter or longer than the length declared in AT+CMQTTPAYLOAD; the modem quietly does the right thing, sending shorter if shorter and truncating if longer.

And finally send it with (I am using QoS of 1 here):

AT+CMQTTPUB=0,1,60

The final command is, as you might by now expect, asynchronous. A short time after “OK”, success is indicated by +CMQTTPUB=0,0.
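Since the declared lengths have to match what is then typed at the “>” prompt, it is less error-prone to generate the three commands from the topic and payload. The following is a sketch only: the function name is made up, it assumes the modem counts the length in bytes (UTF-8 here), and it treats the 60 in AT+CMQTTPUB as presumably a timeout in seconds, copied from the example above.

```python
def publish_steps(topic, payload, client=0, qos=1, timeout_s=60):
    """Return (command, data) pairs; data is what to send at the '>' prompt, if any."""
    topic_bytes = topic.encode("utf-8")      # assumption: length is counted in bytes
    payload_bytes = payload.encode("utf-8")
    return [
        ("AT+CMQTTTOPIC={},{}".format(client, len(topic_bytes)), topic_bytes),
        ("AT+CMQTTPAYLOAD={},{}".format(client, len(payload_bytes)), payload_bytes),
        ("AT+CMQTTPUB={},{},{}".format(client, qos, timeout_s), None),
    ]


if __name__ == "__main__":
    for command, data in publish_steps("hello", "12345678"):
        print(command, data)
```

Each command still needs its “OK” (and the final asynchronous result) to be awaited before the next one is sent; that part depends on how the serial port is being driven and is left out here.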

Tidy-up

Tidy-up is a three-stage process. These must be done in the correct order otherwise error responses are given:

AT+CMQTTDISC=0,120
AT+CMQTTREL=0
AT+CMQTTSTOP

 

 

Coal Mining History Field Guide – West of Buxton: Burbage and Upper Goyt Valley

This field guide contains itineraries above Burbage, at Thatch Marsh and Cisterns Clough, and in the upper Goyt Valley near Derbyshire Bridge. It describes various mining history surface features, and a little of the relevant history and geology. These are not “walking guides”.

The main download is a zip file which contains a written field guide, maps, and digital location data for GPS devices.


If you make use of this guide, please let me know by adding a comment. You can also post corrections and suggestions (but please note I am unlikely to respond to requests for more information).

Links to all mining history field guides may be found on the Peak District Mining History Field Guides index page.

Mobile Phone (Cellular) Networks for IoT – DIY Devices Using SimCom Modules

Although there are a few vendors of mobile phone network hardware for Arduino, Raspberry Pi, etc there are plenty of other options, but a lack of good documentation and a handful of practical issues. This article comprises some notes on getting a SimCom Y7080 working over the LTE CAT-NB network in the UK, but should generalise fairly well.

I have two use cases in mind which underpin the direction which this article takes. Both require only low bandwidth (low data transfer rates) but should ideally have good coverage (fortunately, these two things go together). These are: a) static sensors reporting readings, and b) tracker reporting GPS location.

Choosing Hardware

There are plenty of people selling mobile network modules, including “trackers” on AliExpress and similar sites. The variety of options is rather baffling. Broadly speaking these are in one of three categories:

  1. Obsolete hardware designed for GSM (2G), GPRS. I’ll include 3G here too. These networks are being (or have been) taken down in most developed countries. The hardware is cheap for a reason!
  2. Hardware which works on the same network services as normal 4G/5G mobile phones. This might be right for you, but the higher bandwidth required for voice means poor coverage in rural areas or indoors.
  3. Hardware supporting either LTE CAT-NB or LTE CAT-M1 (LTE = 4G). The former is also known as “NB-IoT” (but this term is used very loosely!) and is the lower-bandwidth (but better coverage) of the two, although both are relatively lower bandwidth than “normal” 4G. LTE CAT-M1 is better for devices which are likely to move between towers/cells and LTE CAT-NB is better for static devices (especially if they are likely to be in difficult terrain). This is where I am focussing this article. NB: LTE CAT1 is different to CAT-M1; it is a medium-bandwidth LTE standard.

A second consideration is the geographical region because different regions use different frequencies. These are usually given as a “band” number, e.g. “band 20” or B20. Hardware suppliers should list the bands they support and seem to segment their products into “global”, “Europe”, “Japan” etc. Check the bands that your local network providers use for LTE CAT-NB/M1.

I ended up drilling down to “breakout” boards based on SIMCom modules, as these seem to have a good compromise between price and available documentation, as well as having GNSS (i.e. GPS) as an option. AND Global make a selection of SIMCom-based boards and sell on AliExpress; I have a preference for choosing a supplier who seems to be more than just a reseller of random modules they do not understand! The SIM7000 module supports both CAT-NB and CAT-M1, while the Y7080 module supports only CAT-NB. As I’m getting the board to play/experiment, I opted for the cheaper Y7080E (E = European bands) board with the GPS option at a little over £12 + tax + shipping. The AND boards also have helpful power supply specifications, allowing 2.2V-4.2V for NB only, or 3.0V-4.2V with GNSS, perfect for operation from a LiPo cell (I like my 18650!), while working with 3.3V logic. Note that AND also list a SIMCom A7670 board in their “nb-iot breakout board” category, but it uses LTE CAT1. However: before you go and buy one, see the end of this post!

Getting a SIM Card

This was MUCH harder than I expected. There are two significant issues: a) making sure the SIM card, and the mobile network operators which are accessible with it in your area, support one or both of CAT-NB or CAT-M1 according to your hardware and use case, and b) finding a supplier who will deal with a non-business customer. For (a), it’s often just difficult to find out the specific detail among the supposedly multi-country and multi-network offers. Additionally, many suppliers are peddling their cloud platforms and I just want a SIM! And, much to my irritation, it seems like almost everyone wants to charge >£8 for delivery; dudes, it’s just a SIM card!

I ended up getting a Vodafone SIM card from AllioT. Their specific focus on Vodafone in the UK, which I had identified as the network with good coverage for me, was an important deciding factor. This does work, although they do charge a lot for delivery (the terrible DPD, who took 4 attempts to deliver it).

Other options I tried first, and failed with:

  • Soracom were very quick to send out a card (without stupid postage costs), and the charging rate seems quite good. Unfortunately, it only supports CAT-M1, and not CAT-NB, and the Y7080 is CAT-NB only.
  • Onomondo dish out 5 free SIM cards (30 day limit). Sounds good for some DIY testing but they come pre-activated so you really only get 30 days. The cards also took a while to arrive. It also turned out that they didn’t have a CAT-NB/M1 service in the UK, on the free trial plan at least. That took a while to work out.

Options for the future:

  • Olivia Wireless looked quite reasonable and explicitly mentioned NB-IoT on Vodafone in the UK. First off, I only found their 5 year package, so I thought I’d look at them for a long term deployment. Since then I found their sensible PAYG. They also offer “pooled data” allowances. Worth a look.
  • m2m data connect also mention Vodafone in the UK and have CAT-NB and CAT-M1. The rates look OK but I’d already made contact with AllioT.

Discarded options:

  • 1NCE won’t sell to private individuals
  • Hologram didn’t look like they had the right network coverage for the UK.
  • Wherever SIM looked promising until “we do not sell individual M2M SIM cards and our offers are not aimed at private individuals”.
  • Things Mobile have too many extra charges in the small print, and have some bad reviews.
  • Lots of others which either look unsuitable (or suitability isn’t clear) for my region, seem pricey, or other issues, such as InfiniSIM, KeySIM, EMnify.

Getting it Working #1 – Just Connect!

So… my Vodafone SIM finally arrived but the Vodafone portal didn’t work properly (no SIM listed, lots of 500 errors). After failed attempts to get my module to connect, I contacted Alliot and got a different URL for the portal [the one at iotportal.vodafone.com is pure shit, while m2mportal.vodafone.com works] and was able to activate the SIM. Yay! I set the status to “active.test” and it seems not to be counting the data used against my 1M/month quota; it looks like there is a 100k test allowance.

Using the datasheet from SIMCom (Y70XX Series AT Command Manual) and an onomondo blog post + discovering the APN is “lpwan” from the Vodafone portal got me started. Here is the summary. I assume readers are familiar with making a serial connection and using a terminal to issue AT commands. FWIW, I use a USB serial adapter and Termite for interactive exploration. Consult the AT Command Manual to understand the parameters and responses.

If you’ve been messing about, or even with a newly-delivered module, it makes sense to factory reset:

AT+RESET

Check the type of hardware:

ATI

The factory settings of the modem may be to use power saving mode (PSM). This is great for real use but having it shut down when you are reading the manual or thinking is a nuisance. Check the PSM with:

AT+CPSMS?

My response shows +CPSMS:1,,,01001000,00100001 , so PSM is active and causes a shut down after 1 minute (see the manual), so I’ll turn it off (this preserves the periods) with:

AT+CPSMS=0
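
The two bit strings in the +CPSMS response can be decoded by hand from the manual, but a small helper makes it easier to check what a module has been set to. This is a minimal Python sketch, assuming the Y7080 follows the standard 3GPP timer encoding (top three bits select a unit, bottom five bits are a count); the function and table names are my own illustration, not anything from the SIMCom documentation.

```python
# Unit tables from the 3GPP "GPRS timer" encodings (assumed to apply to the Y7080).
# Values are seconds per count; None means "timer deactivated".
T3412_EXT_UNITS = {  # periodic TAU timer (4th field of +CPSMS)
    0b000: 600, 0b001: 3600, 0b010: 36000, 0b011: 2,
    0b100: 30, 0b101: 60, 0b110: 320 * 3600, 0b111: None,
}
T3324_UNITS = {  # active time (5th field of +CPSMS)
    0b000: 2, 0b001: 60, 0b010: 360, 0b111: None,
}


def decode_timer(bits, units):
    """Return the timer in seconds, or None if deactivated or the unit is not in the table."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected an 8-character bit string")
    unit = units.get(int(bits[:3], 2))
    if unit is None:
        return None
    return unit * int(bits[3:], 2)


# The values from the response quoted above.
print(decode_timer("01001000", T3412_EXT_UNITS))  # periodic update timer, seconds
print(decode_timer("00100001", T3324_UNITS))      # active time, seconds
```

Under those assumptions the second value comes out as 60 seconds, which agrees with the 1 minute shut-down noted above.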

Tell the modem to issue Unsolicited Response Codes (URCs) when the network state changes. This avoids the need to keep polling for status.

AT+CEREG=1

Check whether the modem is set to auto-connect:

AT+COPS?

That should return +COPS:0 . If not (another onomondo post is quite helpful concerning good practices for connecting/disconnecting), send the command:

AT+COPS=0

Specify a PDN for connection, in this case with id=1, and save it:

AT+CGDCONT=1,"IP","lpwan"
AT+NV=SAVE

That should trigger a reboot ending in the message ^SIMST:1 to tell you the SIM card is successfully initialised.

A nice extra is to ask the modem to report its IP address when connecting. You can see if this is enabled with:

AT+NIPINFO?

And if that returns +NIPINFO:0 (i.e. it is turned off), turn reporting on with:

AT+NIPINFO=1

Connect to the network with id=1:

AT+CGACT=1,1

Give it a while to connect. This can take a seriously long time for the first connection: minutes! You should eventually see two messages which confirm connection. The first declares that the PDN is active: +CGEV:ME PDN ACT 1 . The second shows the date/time, e.g. +CTZEU:+04,0,2024/01/10,13:44:05 . You will probably also see +CEREG:5 (connected and roaming) and if you followed the suggestion above, a line starting +NIPINFO:1,"IP" .

Now check which network/operator is connected:

AT+COPS?

In my case this gave +COPS:0,2,"23415",9 , showing I had indeed connected to the Vodafone UK CAT-NB network (in the standard 3GPP form of this reply, the 2 is the format field and just means the operator is given as a numeric code). The “23415” decodes to 234 = UK and 15 = Vodafone (these country and network codes appear in the Vodafone portal under devices > details, if you dig about a bit!). The final 9 means the modem is using the NB access technology (it would be 7 for CAT-M1).
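
If you want a microcontroller or a logging script to check the operator rather than reading it by eye, the reply is easy to pick apart. A minimal Python sketch follows; it assumes the numeric operator format shown above and only names the two access technology codes mentioned in this article. The function name and the dictionary keys are my own choices.

```python
import re

# Access technology codes mentioned in the text; other values are passed through as numbers.
ACT_NAMES = {7: "CAT-M1", 9: "CAT-NB"}


def parse_cops(line):
    """Split a numeric-format +COPS? reply into MCC, MNC and access technology."""
    m = re.match(r'\+COPS:\s*(\d+),(\d+),"(\d{5,6})",(\d+)', line)
    if not m:
        raise ValueError("not a numeric-format +COPS reply: " + line)
    oper = m.group(3)
    act = int(m.group(4))
    return {
        "mode": int(m.group(1)),
        "mcc": oper[:3],  # mobile country code, 234 = UK
        "mnc": oper[3:],  # mobile network code, 15 = Vodafone
        "act": ACT_NAMES.get(act, str(act)),
    }


print(parse_cops('+COPS:0,2,"23415",9'))
```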

You can also see which networks are available with:

AT+COPS=?

In my case, this returned +COPS:(2,,,"23415",9),,(0-4),(2) , showing only one network is available and that it is connected. The “,,(0-4),(2)” at the end is just boiler-plate; see the manual.

As an extra, maybe look at the signal strength.

AT+CESQ
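
The numbers which come back are index values rather than dBm. As a rough guide, here is a Python sketch which converts the last two fields (RSRQ and RSRP, the LTE ones) using the mapping in the 3GPP AT command standard. I am assuming the Y7080 follows that standard six-field layout, so check the manual; the sample reply is made up for illustration.

```python
def decode_cesq(line):
    """Convert the RSRQ and RSRP indices of a +CESQ reply to approximate dB / dBm."""
    fields = [int(v) for v in line.split(":", 1)[1].split(",")]
    if len(fields) != 6:
        raise ValueError("expected six fields in +CESQ reply")
    rsrq_idx, rsrp_idx = fields[4], fields[5]
    # 255 means "not known or not detectable" in the 3GPP definition.
    # Each index covers a small range; the lower edge of that range is returned.
    rsrq = None if rsrq_idx == 255 else rsrq_idx * 0.5 - 20.0
    rsrp = None if rsrp_idx == 255 else rsrp_idx - 141
    return {"rsrq_db": rsrq, "rsrp_dbm": rsrp}


# Made-up reply, for illustration only.
print(decode_cesq("+CESQ:99,99,255,255,20,49"))
```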

If you’ve finished, rather than just switching off, try using the following, which will preserve settings for a faster re-connection next time, then remove power when the modem responds with +POWERDOWN:0,-1 .

AT+FASTOFF

Next time you turn the module on (and have the serial port connected), you should find it connects without intervention and reports the same kind of things which were obtained using the manual approach above, e.g.:

+POWERON:0
^SIMST:1
+CEREG:2
+CGEV:ME PDN ACT 1
+NIPINFO:1,"IP","10.198.71.109"
+CEREG:5
+CTZEU:+04,0,2024/01/11,21:59:41
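
For unattended use, the host controller needs to decide from these lines when the connection is ready. The following is a minimal Python sketch of that decision, working on lines already read from the serial port; the function name and the keys of the returned dictionary are hypothetical, and it only recognises the messages shown in this article.

```python
def summarise_urcs(lines):
    """Collect the key facts from a list of unsolicited response lines."""
    state = {"sim_ready": False, "registered": False, "pdn_active": False, "ip": None}
    for raw in lines:
        line = raw.strip()
        if line == "^SIMST:1":
            state["sim_ready"] = True
        elif line.startswith("+CEREG:"):
            # 1 = registered on the home network, 5 = registered and roaming
            stat = line.split(":", 1)[1].split(",")[0].strip()
            state["registered"] = stat in ("1", "5")
        elif line.startswith("+CGEV:") and "PDN ACT" in line:
            state["pdn_active"] = True
        elif line.startswith("+NIPINFO:"):
            parts = line.split(":", 1)[1].split(",")
            if len(parts) >= 3:
                state["ip"] = parts[2].strip().strip('"')
    return state


boot_log = [
    "+POWERON:0",
    "^SIMST:1",
    "+CEREG:2",
    "+CGEV:ME PDN ACT 1",
    '+NIPINFO:1,"IP","10.198.71.109"',
    "+CEREG:5",
    "+CTZEU:+04,0,2024/01/11,21:59:41",
]
print(summarise_urcs(boot_log))
```

Anything not recognised is simply ignored, which seems the safer choice given how many different messages the modem can emit.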

Getting Started #2 – Ping and NTP

The important preparation is to make sure that the destination servers are not blocked by a firewall. In my case, the Vodafone access control list (ACL) was initially set to only allow one IP address through (which I had provided to AllIot when getting set up) and only one ACL is associated with each APN. I bet there is something similar on other operators. I found the ACL settings under Administration > APN Access Lists. It works with either a numeric IP address or a host name (FQDN). If you don’t do this, you will get time-out errors.

Once that is done, the basic process is straight-forward. Note that the modem initially returns an “OK”, then does the ping (etc), and then returns the result. Check the documentation for the meaning of all the parameters. This example only does one ping, of the 8.8.8.8 Google DNS server, and has a 10s timeout.

AT+NPING="8.8.8.8",32,1,10,1

Getting date and time from an NTP server is also straight-forward. Here I query one of the usual UK NTP servers.

AT+CMNTP=131.111.8.171

This is also an asynchronous command which will respond immediately with “OK” then give something like: +CMNTP:0,"24/01/10,17:14:57+4" . If you get +CMNTP:2 then the request timed out.
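
To use the time in a script, the reply needs turning into a proper date/time value. Here is a minimal Python sketch, assuming the reply has the shape shown above. The trailing zone field is kept as raw text because I have not confirmed its units, and a bare status with no timestamp (such as the time-out case) gives no date. The function name is my own.

```python
from datetime import datetime


def parse_cmntp(line):
    """Return (status, datetime or None, raw zone suffix or None) from a +CMNTP line."""
    body = line.split(":", 1)[1].strip()
    status, _, rest = body.partition(",")
    if not rest:
        return int(status), None, None
    stamp = rest.strip().strip('"')
    # Split off the trailing zone field (e.g. "+4"); its units are not interpreted here.
    cut = max(stamp.rfind("+"), stamp.rfind("-"))
    zone = stamp[cut:] if cut > 0 else ""
    when = datetime.strptime(stamp[:cut] if cut > 0 else stamp, "%y/%m/%d,%H:%M:%S")
    return int(status), when, zone


print(parse_cmntp('+CMNTP:0,"24/01/10,17:14:57+4"'))
```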

Time-outs can happen if the signal strength is too low, even if you have an established connection.

Next Steps #1 – HTTP Requests

Set the APN access list (aka ACL) if required!

This is just a simple example to show how to make a HTTP GET request. The Y7080 is quite nice for this, whereas some other modules require several more commands. Remember that you are working with a low bandwidth channel and that the response will come through as the raw text or bytes; use a URL pointing to something simple. I put up a simple test page on this site.

The first step is to prepare the modem to communicate with a particular host (in this case, my website, for which the “www” part is not needed):

AT+HTTPCREATE=http://hilltop-cottage.info

This returns a “handle” for use in what follows by replying with +HTTPCREATE:0. For the Y7080, you can only have one handle open at a time (see the close command below), so it will be 0.

Now use the handle to make a request for my test page, providing the path to the “page”. See the documentation for the parameters (spoiler: the first is the handle and the second means GET).

AT+HTTPSEND=0,0,"/test.txt"

As for ping and NTP, the process is asynchronous, with the modem first replying “OK” then going off to try the request. I found that it then replies +REQUESTSUCCESS even in some error conditions. The one I discovered was that a poor network signal can lead to +REQUESTSUCCESS being followed by +HTTPDICONN:0,-1 , which means some kind of network error. In this case, you can try the HTTPSEND again, without needing to repeat the HTTPCREATE. You might get +BADREQUEST if the handle is not currently in use and this sometimes happens even with everything done correctly (just repeating the HTTPSEND might be enough).

If it really is a successful fetch then prepare for several messages to come through the serial port.

First will be the HTTP headers, which are introduced by something like +HTTPNMIH:0,0,304 (the values are handle, a continuation flag, and the length of the header string). Each header is followed by newline characters, and the first line should be HTTP/1.1 200 OK . Then comes the content. Since this is a minimal page, there will only be one chunk of content. For my test page it should be introduced by +HTTPNMIC:0,0,23,23 (the values are as for NMIH but there are now two lengths, the first is the total length, and the second is the length of this chunk). Then you should see the content of test.txt, “A plain text test file”. Receipt of larger content, which needs more than 1 chunk, makes use of the continuation flag; it is set to 1 until the last chunk. Finally, the modem will signal that the connection to the HTTP server was closed by sending +HTTPDICONN:0,-2 .
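
For content which arrives in more than one chunk, the pieces have to be stitched together on the host side. This is a minimal Python sketch of that, using the field order described above (handle, continuation flag, total length, chunk length). It assumes the serial data has already been split into header line and content pairs, and the two-chunk example is invented because my test page only needs one chunk.

```python
def reassemble(chunks):
    """Join (+HTTPNMIC header line, content bytes) pairs into one body.

    Returns the body and a flag saying whether its length matches the
    total length declared in the headers.
    """
    body = b""
    total = 0
    for header, content in chunks:
        fields = header.split(":", 1)[1].split(",")
        _handle, more, total, length = (int(v) for v in fields)
        body += content[:length]
        if more == 0:  # the flag drops to 0 on the last chunk
            break
    return body, len(body) == total


# Hypothetical two-chunk response, invented for illustration.
chunks = [
    ("+HTTPNMIC:0,1,10,6", b"Hello,"),
    ("+HTTPNMIC:0,0,10,4", b" you"),
]
print(reassemble(chunks))
```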

Free up the HTTP connection handle by providing the handle to HTTPCLOSE:

AT+HTTPCLOSE=0

Complications come in when dealing with HTTPS, and there is a SIMCom application note (see the link to the AT commands reference, above) dealing with this, but there may be bandwidth issues with all the overhead of encrypted HTTP using CAT-NB. If you need to HTTP POST, then HTTPS is essential so HTTP is probably not the way to go for sending data vs lower overhead protocols.

That said, here is the experience in POSTing simple messages. I used the httpbin.org service, specifically its /post endpoint, which responds with some JSON containing your payload and headers etc IF the request is a POST, otherwise returning a HTTP status code of 405.

Getting HTTP POST to work was not quite as simple as the documentation led me to believe. It turned out to be necessary to set the Content-Length header, rather than relying on the modem to do this. Failing to do this causes some very strange responses when attempting requests against httpbin.org. Oddly, issuing a request using Postman from my PC and suppressing the Content-Length header does not cause any problems, so I suspect some proxying in the Vodafone platform is at fault. Also, with the module and antenna in exactly the same position (admittedly with a fairly low -92dBm received signal strength), attempts at POST seem more prone to failure with +HTTPDICONN:0,-1. Anyway, here is the transcript of a simple interaction (omitting “OK”) using the same formatting convention as above:

AT+HTTPCREATE="http://httpbin.org"

+HTTPCREATE:0

AT+HTTPHEADER=0,Content-Length:5\r\n
AT+HTTPCONTENT=0,flood
AT+HTTPSEND=0,1,"/post"

+REQUESTSUCCESS

+HTTPNMIH:0,0,230
HTTP/1.1 200 OK
Date: Fri, 12 Jan 2024 16:54:44 GMT
Content-Type: application/json
Content-Length: 294
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

+HTTPNMIC:0,0,294,294
{
  "args": {},
  "data": "flood",
  "files": {},
  "form": {},
  "headers": {
    "Content-Length": "5",
    "Host": "httpbin.org",
    "X-Amzn-Trace-Id": "Root=1-65a16ed4-2094703473d8153049b1d240"
  },
  "json": null,
  "origin": "46.108.136.9",
  "url": "http://httpbin.org/post"
}

AT+HTTPCLOSE=0

Notice that the header must end with “\r\n” (i.e. carriage return and line feed control characters to signify the end of the header), and also that the second parameter of HTTPSEND is 1. Also, in spite of the documentation indicating it is required, there is no “\r\n” on the HTTPCONTENT. No explicit Content-Type was set, so httpbin has returned the payload as “data”; it is just a string of bytes. If the HTTP service expects JSON, or treats it differently, as httpbin does, then the incantation is a bit different. Also, while the documentation suggests you can use commands like AT+HTTPCONTENT=0,{"RPM":22}, I found that this did not work in practice; the double-quote marks are clearly causing a problem as httpbin says it thinks the data is “RPMPOST /p”. I resorted to using the hex encoded approach. Here is an edited-down transcript:

AT+HTTPCREATE=http://httpbin.org
AT+HTTPHEADER=0,Content-Type:application/json\r\n
AT+HTTPHEADER=0,Content-Length:10\r\n
AT+HTTPCONTENT=0,7b2252504d223a32327d,1
AT+HTTPSEND=0,1,"/post"
AT+HTTPCLOSE=0

Giving:

+HTTPNMIC:0,0,359,359
{
  "args": {},
  "data": "{\"RPM\":22}",
  "files": {},
  "form": {},
  "headers": {
    "Content-Length": "10",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "X-Amzn-Trace-Id": "Root=1-65a28db6-4775b6bd41c402ac1f9ac7dd"
  },
  "json": {
    "RPM": 22
  },
  "origin": "46.190.188.99",
  "url": "http://httpbin.org/post"
}

Note that the Content-Length was still required (see earlier notes about this potentially being a Vodafone platform issue) and that I found the \r\n termination of the content (0d0a in hex) is not required (and these characters are not counted in the Content-Length).
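
Working out the hex string and the matching Content-Length by hand gets tedious, so here is a minimal Python sketch which builds the command sequence used in the transcript above from a payload. The function name is hypothetical, and it assumes compact JSON (no spaces) is acceptable to the receiving service.

```python
import json


def post_commands(handle, path, payload):
    """Return the AT commands for a hex-encoded JSON POST on an open handle."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return [
        # The \r\n is sent as the literal four characters, as in the transcript.
        f"AT+HTTPHEADER={handle},Content-Type:application/json\\r\\n",
        f"AT+HTTPHEADER={handle},Content-Length:{len(body)}\\r\\n",
        # The trailing 1 marks the content as hex encoded.
        f"AT+HTTPCONTENT={handle},{body.hex()},1",
        f'AT+HTTPSEND={handle},1,"{path}"',
    ]


for command in post_commands(0, "/post", {"RPM": 22}):
    print(command)
```

The Content-Length is taken from the encoded bytes rather than the text, which matters as soon as the payload contains any non-ASCII characters.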

Then, after a bit of persistence, the solution to the HTTPCONTENT quotes problem was discovered to be surrounding the content in double quote and escaping the inner double-quotes, like so:

AT+HTTPCONTENT=0,"{\"RPP\":23}"

Next Steps #2 – MQTT Misadventures

The SIMCom application note for Y70XX MQTT(S) which I found appears to be wrong (misadventure #1 but see the end of this article!), but the module I have recognises the commands in the AT commands manual.

My use cases are data source applications, so I’ll look at PUBLISH first.

For real use, with a broker on the public internet, MQTTS and authentication is a must, but a basic test on an open testing broker is a good place to start and mqtt://public.mqtthq.com:1883 should fit the bill, but there is also test.mosquitto.org.

Misadventure #2 coming up!

AT+MQNEW=public.mqtthq.com,1883,5000,200

A reply of +MQNEW:0 is good but if you get +CME ERROR:8002, check the APN ACL in the Vodafone portal! Guess who spent several minutes being baffled.

Try to make a connection (use a sensibly unique client id!):

AT+MQCON=0,4,"client_id",1000,1,0

The broker just disconnects at almost every attempt, responding with:

OK
+MQDISCON:0

I found no way around this. Initially, I thought it was because of the version 4 required by the AT command; there is no MQTT version 4! However, this might be a different interpretation of “protocol version” as outlined in this blog post explaining why there is no version 4 of MQTT (i.e. version 3.1.1 = 4!). Persisting with multiple attempts, each beginning with MQNEW, eventually did work; a connection was established, shown by +MQCONNACK:0,0 and I was able to publish (note that the payload is a set of hex bytes) to a topic “aardvark” and see it in another client.

AT+MQPUB=0,"aardvark",0,0,0,4,61626364

I surmise that these open public brokers have quite aggressive connection controls, and that the low bandwidth makes the connection dialogue too slow. Maybe it is load dependent because trying again on UK Saturday lunch time seems much more likely to connect OK, and my PC client is just quietly re-trying in the background. Once connected (CONNACK), at least the connection seems durable.

Disconnection with AT+MQDISCONN=0 is not the exact counterpart of AT+MQCON; after disconnection a new AT+MQNEW is required.

Although it is outside my core use cases, there may be utility in MQTT SUBSCRIBE. If another device/service sends a “retained” message to the broker, the modem can come online and subscribe to pick it up, giving a way of asynchronous messaging to the device (which will only poll occasionally to conserve battery power). Assuming I’ve published a message of “World” to topic aardwolf from somewhere else, the AT commands and replies look like this (responses edited down):

AT+MQNEW=test.mosquitto.org,1883,5000,200
AT+MQCON=0,4,"client_911",1000,1,0
AT+MQSUB=0,aardwolf

+MQSUBACK:0,aardwolf,0
+MQPUB:0,aardwolf,0,1,0,5,576F726C64
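
Since payloads go in both directions as hex bytes, a pair of helpers saves some head-scratching. This is a minimal Python sketch; the function names are mine, and I am assuming the three values between the topic and the length are QoS, retained and duplicate flags, which is consistent with the transcripts here but should be checked against the AT commands manual.

```python
def mqpub_command(conn, topic, text, qos=0, retained=0, dup=0):
    """Build an AT+MQPUB command with the payload hex encoded."""
    data = text.encode("utf-8")
    return f'AT+MQPUB={conn},"{topic}",{qos},{retained},{dup},{len(data)},{data.hex()}'


def decode_mqpub(line):
    """Return (topic, text) from a +MQPUB message received after subscribing."""
    fields = line.split(":", 1)[1].split(",")
    topic, length, hexdata = fields[1], int(fields[-2]), fields[-1]
    data = bytes.fromhex(hexdata)
    if len(data) != length:
        raise ValueError("payload length does not match the declared length")
    return topic.strip('"'), data.decode("utf-8")


print(mqpub_command(0, "aardvark", "abcd"))
print(decode_mqpub("+MQPUB:0,aardwolf,0,1,0,5,576F726C64"))
```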

OK, so now for the final episode of the misadventure. When I spotted that the MQTT(S) application note had different commands to the AT Commands document, I just got on using the commands from the latter. Only later did I notice that it did not mention encrypted/TLS/MQTTS. I tried providing the connection info for my Azure IoT Hub, hoping that it would auto-negotiate. It did not; I got +CME ERROR:8002.

I surmise that the hardware version I have (I get Y7080E R2212 in response to ATI) has firmware which predates MQTTS support. Version 1.00 of the MQTT(S) application note is dated 2022-11-30. It clearly states “This document applies to SIMCom Y7025 Series, Y7026 Series, Y7028 Series ,Y7012 Series, Y7080E Series”. So much for that. It looks like buying a Y7080 rather than a SIM7080 was a false economy. The trouble is the SSL config for SIM7080 looks rather complicated and doesn’t appear to have a simple “just encrypt” option without having to install certificates etc. I also have concerns about firmware version, so I don’t think just ordering a SIM7080 is the smart move either. Back to the drawing board. First idea = try to get the firmware updated. There is an application note for firmware updating and firmware release notes for Y7028… wish me luck!

Updating Firmware

The SimCom website says you need to contact technical support for a patch, which looks like it is a small “diff” and they need your current and desired firmware versions. The full version of the module, including firmware is found using:

AT+NV=GET,PRODUCTVER

I got 2212B03V01Y7080E . SimCom tech support wanted to know where I got the module from and my country, but seemed happy with my answers and eventually got back to me with what I think is a full firmware update file, for 2212B07V01Y7080E, not just a “diff”, and the comment:

Y7080E comes with GPS, and FOTA has only a few tens of K of available space. For example, B03->B05 is not enough space, so only minor upgrades can be made, such as frequency flips with minimal changes. So is there a way to upgrade B03 to B07? Yes, we provide firmware upgrade tool for upgrading. [links to download tool and firmware]

Stumbling through the documentation it looked like I needed to hold BOOT to VDD_EXT then use their tool to upload the new firmware. A close inspection of the PCB alongside the SimCom module pin-out suggested two test points, “T3” and “EXT” were what I needed, so two short “Du Pont” connectors were soldered in. Unfortunately, the tool was only partly in English, with the “download” pop-up (yes the firmware upload process is called “download”) in Chinese. Google translating the word “upgrade” showed me the correct Chinese characters to allow selection of the correct drop-down. So, I connected my usual USB-Serial adapter and … it failed. It turned out that it is NOT necessary to play about with pulling BOOT to VDD_EXT. Just connect the serial adapter and run the SimCom tool. It worked.

My firmware is now updated! And now the missing MQTT commands are available (see misadventure #1, above).

A return to MQTT looking for glory appears in another article; this one is quite long enough.

 

Millers Dale and Monsal Dale Mining History Field Guides

This field guide contains two itineraries around mining history sites (and some geology and other historical notes) near to Millers Dale in the Peak District: Maury Rake and Tideswell Dale. Three others are in preparation and will be added in due course…

The main download is a zip file which contains a written field guide, maps, and digital location data for GPS devices.


If you make use of this guide, please let me know by adding a comment. You can also post corrections and suggestions (but please note I am unlikely to respond to requests for more information).

Links to all mining history field guides may be found on the Peak District Mining History Field Guides index page.

Azure IoT Hub and MQTT – Capture to CosmosDB

Azure IoT Hub is not a proper MQTT Broker but does have sufficient MQTT support to allow it to be used with MQTT devices and can be easily coupled with Azure Cosmos DB to record the message payloads. Both these Azure services are available with Free Tier pricing, subject to usage limits (daily quota of 8000 Hub messages) which would be quite acceptable for domestic/hobbyist/diy IoT use*. In contrast to the open public free MQTT brokers, the Azure setup offers privacy and security.

[* – On the whole, I favour keeping what I can on the LAN and using Mosquitto on a Raspberry Pi, but my particular use case is devices using the “mobile carriers'” LTE CAT-NB (NB-IoT) or LTE CAT-M1 networks with modules based on the SimCom Y7080]

Having struggled to extract the necessary information from Microsoft documentation, here are some notes. They presume an Azure subscription and use of MQTT Explorer, but can surely be applied to other MQTT clients and physical devices.

Preparing Azure Cosmos DB

There are three layers to a CosmosDB setup: the “DB account”, the “database”, and the “container”. Only one “DB account” can be created within the Free Tier and the throughput, which is limited to 1000RU/s, can only be split between databases in units of 400RU/s. Consequently, it probably makes sense to run one database in one account and to work at the container level. As far as I can see, the free RU allocation is more than adequate for the kinds of use outlined above. Create a “NoSQL” account.

Once you have created a DB account, use Azure Portal to access the Data Explorer for that account. I think the defaults for “New Container” are probably fine but be sure to set automatic indexing and note that the partition key name will be required for the next step (using a key name of “/partition_key” is as good as anything, noting the initial slash).

Data Explorer also allows you to browse and query the records in the container (“Items”) and to alter settings (“Settings”). A little trick for deleting records while experimenting (there is no equivalent to SQL DELETE!) is to use Settings to temporarily set the “Time to Live” to “On” and give a time of 0 or a few seconds (note you do need to hit “Save”).

Preparing Azure IoT Hub

Creating an IoT Hub in Azure Portal deserves little comment, except to note that you should leave the default Networking > Connectivity configuration = “Public Access” (and not change to the “recommended” option). In what follows, I will use {hubName} as a placeholder for the name assigned in this step.

It is essential that a “device” is created in Azure, to allow a real device or MQTT client application to interact with the hub. In Azure Portal, refer to Device management > Devices. Add a device with the default settings (this is not an “IoT Edge Device”, authentication should be “Symmetric key” (it will use “Shared Access Signature”) with auto-generated keys). I will use {deviceId} for the ID assigned in this step.

Note that the Hub is not a store; its role is to receive and route messages. In this case, I wish to route messages to CosmosDB. This is achieved in Azure Portal by using Hub settings > Message routing. A useful routing setup actually comprises three parts: the route, an endpoint, and an optional enrichment (which is referred to later).

First add a “Custom endpoint” for Cosmos DB, entering the Cosmos DB account, database, and container. Use the same partition key name as assigned previously and leave the key value template as it is. Then add a route (the name is arbitrary) to use that endpoint and select “Device Telemetry Messages” as the Data source. To route all messages to Cosmos DB, leave the “Routing query” as the default value of “true”.

At this point, you should see a notification that messages will not now flow to the built-in endpoint. That is OK. Since the routing query was left as the default, the “fallback route” feature should be disabled.

Using MQTT Explorer

This is the hardest bit to get right, and where the important details are rather lost in the MS documentation.

Before any device/client can connect, it will need a Shared Access Signature Token (SAS Token). Weirdly, Azure Portal does not let you create a SAS token for IoT Hub devices and the keys it does show are NOT what you need. SAS Tokens can be created with the Azure CLI utilities, but since I use VSCode for development (both for Python and for microcontrollers using PlatformIO), I opted to use the “Azure IoT Hub” extension (ignore the note recommending use of the Azure IoT Tools extension pack; that is now discontinued!). This will add an “AZURE IOT HUB” entry to the VSCode Explorer panel, from which a right-click context menu allows you to generate a SAS Token for any device.
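
For anyone who would rather script it, a token can also be computed directly from the device’s symmetric key. The following is a minimal Python sketch of the signing scheme as I understand it from the Microsoft documentation (an HMAC-SHA256 over the URL-encoded resource URI and the expiry time); it is not what the extension itself runs. The hub name, device ID and key below are placeholders, and the function name is my own.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri, key_b64, expiry_epoch):
    """Build a SAS token string for the given resource, valid until expiry_epoch."""
    encoded_uri = urllib.parse.quote(resource_uri, safe="")
    to_sign = f"{encoded_uri}\n{expiry_epoch}".encode("utf-8")
    key = base64.b64decode(key_b64)  # the device key is stored base64 encoded
    digest = hmac.new(key, to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry_epoch}"


# Placeholder values: substitute your own {hubName}, {deviceId} and device key.
fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")
token = generate_sas_token(
    "myhub.azure-devices.net/devices/dev1", fake_key, int(time.time()) + 3600
)
print(token)
```

The whole string, including the “SharedAccessSignature” prefix, is what goes in the password field.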

A new MQTT Explorer connection should have the following settings ({…} denotes placeholders mentioned above):

  • Turn “encryption” on (the MQTT interaction must use Transport Layer Security, commonly known as Secure Sockets Layer, SSL)
  • Protocol: mqtt://
  • Host: {hubName}.azure-devices.net (this is the current domain used – check Azure Portal)
  • Port: 8883
  • Username: {hubName}/{deviceID}
  • Password: {SAS token}
  • Advanced > MQTT Client ID: {deviceID}

MS documentation refers to “api-version” in Host and Username but I found this to be redundant and the value doesn’t appear in Azure Portal AFAIK.

The IoT Hub is not a proper MQTT broker, and the topic is prescribed; it must be: devices/{deviceId}/messages/events/

It should now be possible to make a connection and send a message without MQTT Explorer indicating “disconnected” errors. Use the Cosmos DB Data Explorer to check for new records (note there is a “refresh” icon). Look in the record JSON and find the element “Body”. This contains the message but in base64 encoded form. The original message can be found by decoding using an online tool such as base64decode.
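
If you are pulling records out of Cosmos DB with a script, the decoding is a one-liner. A minimal Python sketch, with a made-up record containing only the “Body” element (real records have many more elements):

```python
import base64


def decode_body(record):
    """Return the original message text from a record's base64 encoded Body."""
    return base64.b64decode(record["Body"]).decode("utf-8")


# Made-up record for illustration.
print(decode_body({"Body": "aGVsbG8="}))
```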

Improving the Message

It is possible to avoid having to deal with base64 encoded Body values by declaring the content type with the message (base64 is a helpful way of allowing arbitrary binary data to be expressed as a text string). This can be done by decorating the MQTT topic. To tell IoT Hub that the message payload is JSON, set the topic to: devices/{deviceId}/messages/events/$.ct=application%2Fjson&$.ce=utf-8 . A message sent with the new topic should appear with JSON in the Body of the Cosmos DB record, which should also now contain new elements declaring the content type and encoding. Note that the MQTT Explorer selection of raw/XML/JSON does not work as expected.

An alternative to working with the message Body is to put the data in the topic, leaving the message empty and so not needing the .ct and .ce decorations. For example, using a topic such as devices/{deviceId}/messages/events/rh=80.2&temp=14.2 gives you JSON in CosmosDB containing elements “rh” and “temp” inside a “Properties” container element.
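
Both kinds of topic decoration follow the same pattern, so they can be built by one helper. This is a minimal Python sketch; the function name and its arguments are my own invention, and “dev1” stands in for {deviceId}.

```python
from urllib.parse import quote


def telemetry_topic(device_id, properties=None, json_body=False):
    """Build the IoT Hub telemetry topic, optionally with properties appended."""
    pairs = []
    if json_body:
        # Declare the content type and encoding of the message payload.
        pairs += [("$.ct", "application/json"), ("$.ce", "utf-8")]
    pairs += list((properties or {}).items())
    decoration = "&".join(
        "{}={}".format(quote(str(k), safe="$."), quote(str(v), safe=""))
        for k, v in pairs
    )
    return "devices/{}/messages/events/{}".format(device_id, decoration)


print(telemetry_topic("dev1", json_body=True))
print(telemetry_topic("dev1", {"rh": 80.2, "temp": 14.2}))
```

URL-encoding the values is what turns the “/” in the content type into %2F.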

Finally, the message enrichment feature of IoT Hub can be used to inject information into the Cosmos DB records, although this is quite limited and probably not particularly useful. The process relies on the Azure concept of the “device twin”, which is its metadata counterpart to the real device. The following process allows you to associate values with a device in Azure Portal and see them come through to Cosmos DB:

  • Add a tag to the device, e.g. add a tag named “foo” with value “bar”.
  • In the Enrich messages tab add an entry with a Value of “$twin.tags.foo” and choose a Name (e.g. “Foo”) which you wish to appear in the Cosmos DB document.
  • Choose the Cosmos DB endpoint.

Cosmos DB records will now contain an element called “Foo” inside the “Properties” container, with a value of “bar”.

Postscript

Since writing this, I have discovered that Cosmos DB has some irritating limitations on aggregation functions for cross-partition queries. The notes above will fragment data across partitions (usually a good idea for scaling) but will require cross-partition queries. Since you get 20GB of data per partition, the loss of querying power by partitioning is really not justified for the scenario above.

Uploading ESP8266 or ESP32 Firmware OTA from Android

Using OTA firmware loading is rather useful for IoT devices once they are “in the wild”. I adopted the precaution of using a hardware jumper to enable OTA because my devices spend most of their time in deep sleep to save power. The down-side of this is that I need to visit the device before each upload. Rather than wander about with a laptop, I thought that using my phone would be more convenient. Setting this up and using is fairly straight-forward but took a bit of thinking-through and desk research. So, for others interested in doing the same…

Termux

The power-house is Termux, which runs a (command-line only) Linux environment on your phone. I installed the APK from f-droid, which seems to be common practice over Google Play Store.

Install python with:

pkg install python

Working with phone storage is not as simple as you might think; see the termux wiki. For example, I believe it is not possible to see the filesystem which you see inside Termux when the phone is connected to a PC for file transfer. My operating procedure is to copy firmware binaries into the phone’s Downloads folder and then to use the file manager on my phone to move it over to the Termux “home” as described on the wiki page.

The second issue is that working on the phone screen is a pain! I had previously installed scrcpy for use in demoing some software and this works a treat in making life easier. Keyboard joy! This just works. Read the documentation about enabling USB debugging in developer mode.

Doing the OTA

I’m using ESP8266s, so I downloaded the espota.py script from the esp8266/Arduino GitHub repository. You just need the one file. I expect the same procedure as I describe here would also work for ESP32 users with otatool.py.

What follows is based on having created a new directory in the termux “home”, placing espota.py in that directory and then creating one directory for each kind of device, which contains a small Bash script and the firmware binary file. I am also using PlatformIO inside VSCode.

Compiled firmware can be found in the project folder at: .pio\build\{env}, where {env} is the environment name in platformio.ini. Use the firmware.bin file (not the .elf file).

Create the bash script for ease of use (I call it upload.sh), substituting xxx.xxx.xxx.xxx with the device IP address:

#!/bin/bash
python ../espota.py -d -i xxx.xxx.xxx.xxx -f firmware.bin

You can either use “nano” to edit these files inside Termux or move the .sh file between your phone and PC as noted above. Check the espota documentation for options to upload filesystem or if you’ve set up authentication.

Make it executable using:

chmod +x upload.sh

This is simply run by using (in the appropriate directory):

./upload.sh

A nice modification if you have multiple devices of the same type is to change upload.sh to allow the IP address to be entered at run-time, in response to a prompt. All my devices are on the same subnet (192.168.1.x), so I only need enter the last component. The modified script is:

#!/bin/bash
read -p "Last part of IP address: "
python ../espota.py -d -i 192.168.1.$REPLY -f firmware.bin
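
The same prompt-and-upload step could also be written in Python, which makes it easy to reject a mistyped address before espota.py is called. This is only a sketch: the function names are mine, and it assumes the same directory layout, subnet, and espota.py options as upload.sh.

```python
import subprocess
import sys

SUBNET = "192.168.1"  # assumption: same subnet as in upload.sh


def build_command(last_part, firmware="firmware.bin"):
    """Return the espota.py argument list for one device on SUBNET."""
    value = int(last_part)  # raises ValueError if the input is not a number
    if not 1 <= value <= 254:
        raise ValueError(f"{last_part!r} is not a usable host address")
    return [sys.executable, "../espota.py", "-d",
            "-i", f"{SUBNET}.{value}", "-f", firmware]


def main():
    """Prompt for the last part of the address, then run the upload."""
    command = build_command(input("Last part of IP address: "))
    subprocess.run(command, check=True)  # raises if espota.py exits with an error
```

To use it as a script, save it alongside firmware.bin, add a final line that calls main(), and run it with python.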

HM-10 (and HM-11) BLE – the good, the clones, and the down-right fake!

The HM-10 and HM-11 modules, originated by Jinan Huamao Technology, look quite useful for tinkering about with Bluetooth; their documentation is better than average and there is supplementary information on the web (e.g. Martyn Currey’s blog). The scourge of clones for this kind of module is not news to me, but it turned out to be harder than I expected to get a satisfactory product from ebay or Aliexpress.

I would ideally like a genuine product, and while there are sellers proclaiming “genuine” in their listings, this often comes with an excessive price. Other than that, price seems to be a poor guide to quality. I consider a “clone” to be something which is functionally correct but not original and maybe with lower quality hardware. A module with the wrong firmware listed as “HM-10” is a fake, and if the core Bluetooth chip (for these modules it is the Texas Instruments CC2541) is not correct it is a down-right fake. The last case is useless as it won’t be possible to load the HM-10 (or HM-11) firmware, which is available for download. Re-flashing is not particularly difficult, but generally involves some “hacky” soldering. Partially to alleviate the hassle of re-flashing but also to give me access to more of the CC2541’s pins (the HM-10 has lots of GPIO, ADCs, DS18B20, DHT11, and a poor man’s PWM), I designed some breakout boards and got a batch made. The Eagle CAD files are on GitHub, along with some links to software/instructions on reflashing.

It’s not that I wasn’t aware of the potential for fakery; I tried to assess listings by checking for ambiguous or garbled wording (almost universal!), clearly inaccurate descriptions, and checked the photos to see “CC2541” on the chip. I thought the most likely outcome was getting a module with an out-dated version of the hardware. I was wrong!

HM-10 Fail #1

On May 6th 2023, I bought two “HM-10” modules on breakouts from ebay seller Alimodule. The total price was £8.02, by no means the cheapest on sale. The breakout has a 5V regulator and [probably] some level shifting to allow it to be used on a 5V Arduino. I really wanted 3.3V supply and logic to work with an ESP8266, but I reckoned these would be OK to experiment with, and that I could butcher the breakout to achieve this. The downside of these boards is that they only expose the UART + status and key pins. The upside is that they are widely available.

After some stumbling about trying to follow the Huamao manual (and failing) it became clear that the firmware was wrong. The device appears as “BT05” (which I later found out to imply some Bolutek firmware, although I never found documentation for it) and the result of AT+HELP shows the commands are wrong for HM-10.

I tried multiple times to flash new firmware, using both ESP8266 CCLoader and Raspberry Pi CC2541 programmer (see the GitHub link above) and got nowhere. The firmware appeared to upload correctly but the module appeared to be bricked. After SOME TIME, I realised that the chip id which is reported by the RPi programmer showed the chip was a CC2540, a TOTAL FAKE, even though the chip is clearly marked CC2541. Although these two chips are compatible at the source code level (I believe), they are not compatible at the level of compiled firmware. I tried REALLY HARD to find some firmware to run on the CC2540 but failed, so while the product which arrived would have worked as a simple serial-over-ble device, I now have junk!

I got a full refund, but no compensation for the lost time in working out I had a deliberate fake.

Avoid Alimodule!

HM-10 Fail #2

This time and for all the attempts below, I got “stamp” SMD modules to suit my shiny new breakout PCBs.

Ebay seller h-quality_electronic had a somewhat garbled description but a clear title – “HM-10 4.0 Bluetooth UART Transceiver Module Serial Port CC2541” – and a picture which looked OK; although it was missing one of the two SMD crystals (a common sign of a cost-reduced knock-off), the chip was right. I ordered 2 pieces for £6.80 delivered.

Unfortunately, it was the same story as before. The firmware claimed to be BT05 (and this was marked on the board) but many of the commands shown by executing AT+HELP simply returned “ERROR”. The chip id was read to disclose that this was another total fake with a CC2540 marked as CC2541. Seriously, someone went to the trouble of faking the markings on chips on a module only retailing for a few pounds!

This one ended with ebay customer support issuing me with a refund. Avoid this seller.

HM-10 Fail #3

Ebay seller cayin35 had a credible listing; it only mentioned CC2541, gave specific firmware versions, and the picture looked right. With only a few inconsistencies in the description I went for it: 2 items delivered for £6.82.

Sadly, the same story: crappy BT05 firmware with commands that don’t work and a chip which turned out to be a CC2540 when interrogated by a programmer. More TOTAL FAKERY.

A full refund again and another seller to avoid!

Now I am getting a bit pissed off. It really shouldn’t be this hard to get an item as it is described. This isn’t a matter of minor detail; these devices are NOT CC2541 and NOT HM-10.

HM-10 Success (partial)

Four boards from Aliexpress seller “Advanced Tech” cost me £12.91 including tax and postage. The description seemed mostly accurate for a HM-10, with the exception of chip ambiguity at the bottom of the description: “HM-10 CC2541 CC2540 4.0 Bluetooth UART Transceiver Module Transparent Serial Port”. I believe very early HM-10s did use the CC2540, which might explain how this comes about, but still. The picture did show a CC2541 and two SMD crystals (which is generally a sign of not cutting corners), so I gave it a shot.

These arrived today, August 10th 2023. It really has taken since May to cycle through order – wait – receive – test …

The hardware appears OK and the items received do look like the listing picture. The firmware advertises itself as HMSoft via a BLE scanner but once I started to interact with the device using AT commands it quickly became clear this is crap: AT+HELP? lists commands which don’t all work (e.g. AT+VERSION returned nothing). AT+VERS? (which is the command used in a real HM-10) DID return a version string: “HMSoftV004”. Obviously nonsense. Firmware is a FAIL!

The Bluetooth chip is, however, a CC2541 (or at least it reports a chip id which matches, when queried by a programmer). And yes, miracle… I managed to flash it with a v540 firmware using the Raspberry Pi CC2541-programmer software. This isn’t the latest firmware but it is what I had in “.bin” form, and it supports updating over the serial interface using AT+SBLUP.

I’d have preferred not to have the hassle of a re-flash, but I’ll consider trying this seller in future as the evidence so far is that trying a new seller is more likely to waste my time and give me more junk than not.

HM-11 Fail

My first attempt to get some HM-11 was from ebay seller satisfyelectronics. Two items for £4.05 including tax and postage should maybe have alerted me to fakery but the title lacked the usual ambiguity about chip: “Bluetooth 4.0 module BLE CC2541 low power NEW HM-11 S“.

When the modules arrived it was immediately clear these were not HM-11 devices. The chip is not a Texas Instruments chip (CC254x)! By comparison with the Huamao datasheet and my HM-11 breakout boards it is obvious that the module solder pads are all wrong. I should have spotted that in the photo. I can’t even test these and don’t know what they really are.

What egregious mis-description! I got a full refund.

Avoid satisfyelectronics!

HM-11 Success!

Aliexpress seller “fYD Open Source Hardware“. I bought 4 for £12.98 including tax and postage. The listing did give me some cause for concern as the title included “CC2540” (wrong!) and “CC2541”. The picture did show a CC2541 and two SMD crystals (which is generally a sign of not cutting corners), so I gave it a shot.

These were either genuine or good fakes. Hooray! The firmware was not the latest, being v6xx, but that is new enough to allow for firmware updating via the serial port using “AT+SBLUP” and the Huamao upload tool. The listing actually stated firmware v508.

I suppose there is no guarantee that they still have the same batch, and some of their other listings had garbled descriptions, but I would try them again.

Final Words

It is pretty clear that pictures and descriptions are not a good guide. If you find yourself in the same situation, make sure you check the chip id. If it is wrong demand a refund and avoid that seller.

Using Z-Stack ZNP to Make DIY Zigbee Devices to Work With Zigbee2MQTT – Part 2

At the end of part 1, I had got as far as joining my ZNP board to the Zigbee2MQTT (Z2M) coordinator PAN, and seeing the series of attempts by Z2M to interview my device. This is the point at which Z-Tool is evidently limited; it is not really feasible to respond to Z2M messages in a timely manner. To be fair, Z-Tool does have a scripting facility using a Javascript engine, but I didn’t find documentation on the ZNP API to match, so… I used Python and worked at the level of UART messages. Working at this level will make it easier to transfer what is learned to an MCU-based device, while using Python reduces the friction of a compile-upload-test cycle while learning. The Python code is available on github. This is not meant to be a library/package and was substantially written as a learning activity. There are published Python packages, zigpy and zigpy-znp, but I felt that: a) using a full-featured package was likely to make it harder to understand how things work; b) the comment “Zigpy is tightly integrated with Home Assistant’s ZHA component” suggested I would run into issues and complexity. Additionally, the Python code is just to demonstrate what the interactions should be for an MCU-based system, for which I will need to be working at a lower level than a Python object model. That said, there are some potentially-useful modules in zigpy-znp, which are not tightly integrated to ZHA and would be worth considering if using a Raspberry Pi etc as the basis for a Zigbee ZNP-based device.

Staying Sane and in Control

By the very nature of experimentation, there will be changes in intent and design, errors, etc. Staying in control is helped by removing previously-joined devices from Z2M, making sure the device is set up from a restart, and then re-joining. This isn’t always necessary but…

There is also ample opportunity to get inconsistent set-up in Z2M. If the cluster specification in AF_REGISTER changes in an incompatible way (e.g. the change is not simply an addition), the information which Z2M retains, even when removing and re-joining can mess things up. In this case, stop Z2M, go to the “data” folder of the Z2M installation and remove coordinator_backup.json and database.db, and restart Z2M. This will completely mess up any existing Zigbee network.

Setting up a sand-box coordinator + Z2M installation is probably a good idea to avoid messing up an existing network but also because having lots of devices and even one router is going to make Wireshark captures get very busy indeed. Use a different Zigbee channel for the sandbox; as well as avoiding contention between the radios, this means you can set the sniffer to that channel so that Wireshark only captures the sandbox messages.

I’ve also found it is generally a good plan to keep Z2M with joining disabled until you want to join a device, after having started Wireshark and generally getting organised.

Z2M Requests Attribute Values – the “Interview”

Stepping back from the Z2M interview…

I am interested in three kinds of interaction between the ZNP board and coordinator/Z2M: Z2M requests attribute values, Z2M requests an action, and the ZNP board sends reports. All of these are in the context of Zigbee Clusters and use Zigbee Cluster Library (ZCL) messages embedded within the ZNP commands sent over the serial interface to the ZNP board. It is important to note that both the ZNP messages and the ZCL messages embedded within them (as the “data” of the ZNP message) have parts with the same (or similar) names and functions: a sequence identifier, a command identifier, and a body/payload/data. The use of a Zigbee sniffer, such as ZBOSS + Wireshark, really helps with interpreting this nested structure and debugging (e.g. to spot malformed ZCL).

Reference [R6] (see part 1) describes the structure of a ZCL message frame (section 2.4, “Command Frame Formats”), lists the permitted ZCL command ids, and specifies the content of the ZCL message payload for each type of command (section 2.5). The commands relevant to my “three kinds of interaction” are: 0x00 (read attributes), 0x01 (read attributes response), and 0x0a (report attributes). These are all “global” commands in the sense that these command ids have the same meaning for all clusters. The alternative kind of command is “specific”, meaning that the command id has a meaning which is specified separately for each cluster. Whether the command id is interpreted as global or specific is indicated in the ZCL frame control field (FCF) via bits 0 and 1 of the FCF (see [R6]). The way Zigbee is designed, things like turning (e.g.) a lamp on and off are achieved by sending cluster-specific commands, rather than by writing attributes, whereas the current state of the lamp is determined by reading attributes. This is why I said “Z2M requests an action” in my list of kinds of interaction, above.
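
As a rough illustration of how the FCF byte breaks down, here is a minimal sketch of a decoder. The bit positions are my reading of the frame control field description in [R6]; the class and function names are made up for this example. The three sample values in the usage note (0x10, 0x18, and 0x01) are the ones which appear in the Wireshark captures in this post.

```python
from typing import NamedTuple


class FrameControl(NamedTuple):
    cluster_specific: bool          # frame type (bits 0-1) == 0b01
    manufacturer_specific: bool     # bit 2
    from_server: bool               # bit 3: False = client to server
    disable_default_response: bool  # bit 4


def decode_fcf(fcf):
    """Split a ZCL frame control byte into its flags (bits 5-7 are ignored)."""
    return FrameControl(
        cluster_specific=(fcf & 0x03) == 0x01,
        manufacturer_specific=bool(fcf & 0x04),
        from_server=bool(fcf & 0x08),
        disable_default_response=bool(fcf & 0x10),
    )
```

For example, decode_fcf(0x10) describes a global command sent towards the server with the default response disabled, decode_fcf(0x18) is the same but in the reverse direction, and decode_fcf(0x01) is a cluster-specific command for which a default response is expected.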

Returning to the interview, which is a set of messages requesting the values of attributes in the Basic Cluster.

ZNP passes requests to read attributes to its serial interface as an AF_INCOMING_MSG (see [R2]). The message contains information about the endpoint, source network id (this will be 0x0000, the coordinator), cluster identifier, etc and a ZCL message as its “data” component. For the interview, the cluster id will be 0x0000 (Basic Cluster) and the ZCL command id will be 0x00 (read attributes). Consulting section 2.5.1 in [R6], we can see that the ZCL message comprises a standard ZCL header + a list of one or more two-byte attribute identifiers (Z2M will sometimes issue requests for one attribute per incoming message, and sometimes request several attributes at the same time). The meaning of the attribute identifiers for the Basic Cluster is given in section 3.2.2.2 of [R6].

Care must be taken when handling the cluster id and attribute ids (and any multi-byte component) because little-endian byte order is used. Wireshark is your friend in checking for byte-order bugs.
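
To make the byte order concrete, here is a sketch which unpacks the body of an AF_INCOMING_MSG using little-endian struct formats. The field layout is an assumption based on my reading of [R2] (it is consistent with the message bodies logged in this post), and the names IncomingMsg and parse_af_incoming_msg are invented for the example.

```python
import struct
from typing import NamedTuple


class IncomingMsg(NamedTuple):
    cluster_id: int
    src_addr: int
    src_endpoint: int
    dst_endpoint: int
    zcl: bytes


# Assumed layout: GroupId(2) ClusterId(2) SrcAddr(2) SrcEndpoint(1)
# DstEndpoint(1) WasBroadcast(1) LinkQuality(1) SecurityUse(1)
# TimeStamp(4) TransSeqNumber(1) Len(1), followed by Len bytes of ZCL.
# The "<" prefix selects little-endian with no padding.
_HEADER = struct.Struct("<HHHBBBBBIBB")


def parse_af_incoming_msg(body):
    """Extract the addressing fields and the embedded ZCL bytes."""
    (_group, cluster, src, src_ep, dst_ep,
     _bcast, _lqi, _sec, _time, _seq, length) = _HEADER.unpack_from(body)
    start = _HEADER.size
    # Any bytes after the ZCL payload (three in the logged messages) are ignored.
    return IncomingMsg(cluster, src, src_ep, dst_ep, body[start:start + length])
```

Note how the two bytes “06 00” on the wire become cluster id 0x0006 once they are read as a little-endian value.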

The ZNP message which should be sent in response to an AF_INCOMING_MSG is AF_DATA_REQUEST (yes, it is called “REQUEST”), which will in turn give rise to ZNP sending both AF_DATA_REQUEST_RSP and AF_DATA_CONFIRM messages. A sequence identifier in AF_DATA_REQUEST matches that in AF_INCOMING_MSG. My approach is to always wait for a “RSP” before proceeding and to log AF_DATA_CONFIRM messages as they arrive (this is OK for the Z2M interview, but maybe not always). Note that these messages do not necessarily mean that the attribute values you sent back have actually been handled by Z2M.
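
A sketch of how a ZCL payload might be wrapped into a complete AF_DATA_REQUEST frame is given here. The field order is my reading of [R2], the default options (0x00) and radius (0x10) are simply copied from the bytes in my logs, and the function names are illustrative only.

```python
import struct
from functools import reduce
from operator import xor

SOF = 0xFE
AF_DATA_REQUEST = (0x24, 0x01)  # command bytes as they appear on the wire


def znp_frame(cmd0, cmd1, data):
    """Frame a ZNP command: SOF, length, 2 command bytes, data, XOR checksum."""
    body = bytes([len(data), cmd0, cmd1]) + data
    fcs = reduce(xor, body, 0)  # XOR of everything between SOF and FCS
    return bytes([SOF]) + body + bytes([fcs])


def af_data_request(dst_addr, dst_ep, src_ep, cluster, zcl,
                    trans_id=0, options=0x00, radius=0x10):
    """Build an AF_DATA_REQUEST frame carrying the given ZCL bytes."""
    header = struct.pack("<HBBHBBBB", dst_addr, dst_ep, src_ep, cluster,
                         trans_id, options, radius, len(zcl))
    return znp_frame(*AF_DATA_REQUEST, header + zcl)
```

Calling af_data_request(0x0000, 1, 2, 0x0006, bytes.fromhex("18 29 0b 01 00")) gives the same bytes as the TX line in the “turning the LED on” log in this post.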

Here is a fragment of the annotated log which my Python script emits for the interview, comprising the interactions for a single attribute in the Basic Cluster:

RX body: Cmd: 4481 Body: 00 00 00 00 00 00 01 01 00 31 00 3b 3d 01 00 00 07 10 10 00 05 00 04 00 82 3d 1d
Incoming ZCL for endpoint 1, cluster id = 0000 is: 10 10 00 05 00 04 00
ZCL Command = read attributes (0x00)
Sending response...
TX: fe 24 24 01 00 00 01 01 00 00 00 00 10 1a 18 10 01 05 00 00 42 08 5a 4e 50 2d 54 65 73 74 04 00 00 42 05 41 52 43 31 32 02
[Response] RX body: Cmd: 6401 Body: 00
AF_DATA_REQUEST_RSP success? True
--------------
RX body: Cmd: 4480 Body: 00 01 00
AF_DATA_CONFIRM for TransId=0 on Endpoint=1 had Status=0

The request is a composite request for the values of two attributes: Manufacturer Name (0x0004) and Model Identifier (0x0005). The response (notice the ZCL command is now 0x01 = read attributes response, and that the ZCL FCF shows the client-server direction is reversed) contains strings for the attribute values. Strings are data type 0x42 (table 2-11 in [R6]) and the “value” given in the data has the length of the string as the first byte, followed by the character codes. Section 2.5.2 of [R6] shows how each attribute requested gets a chunk in the response comprising: 2 bytes of attribute id + 1 byte status (=0x00) + 1 byte data type (=0x42) + 1 byte string length + n bytes of string characters.
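
The chunk structure can be sketched in a few lines of Python. This is not the code from my repository, just a minimal illustration which assumes a three-byte ZCL header (no manufacturer code) and that every supported attribute is a character string; the attribute table and function name are invented for the example. Status 0x86 is, as I understand [R6], the “unsupported attribute” status, in which case no data type or value follows.

```python
# Hypothetical attribute table: attribute id -> string value.
BASIC_ATTRIBUTES = {0x0004: "ARC12", 0x0005: "ZNP-Test"}


def respond_to_read_attributes(request, table=BASIC_ATTRIBUTES):
    """Build the ZCL read attributes response for a read attributes request."""
    _fcf, seq, cmd = request[0], request[1], request[2]
    if cmd != 0x00:
        raise ValueError("not a read attributes command")
    ids = request[3:]
    response = bytearray([0x18, seq, 0x01])  # FCF, same sequence no., command
    for i in range(0, len(ids) - 1, 2):
        attr_id = int.from_bytes(ids[i:i + 2], "little")
        response += ids[i:i + 2]             # attribute id, already little-endian
        if attr_id in table:
            text = table[attr_id].encode("ascii")
            response += bytes([0x00, 0x42, len(text)]) + text
        else:
            response.append(0x86)            # status only, no type or value
    return bytes(response)
```

Feeding it the ZCL bytes from the log (10 10 00 05 00 04 00) reproduces the 26 ZCL bytes at the end of the TX line, starting 18 10 01.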

The Wireshark log is helpful to understand and verify what went on. The messages are built up in multiple (nested) layers. The ZCL forms the inner-most, with the Zigbee Application Support Layer above. These are the two most relevant for understanding the ZNP interactions; the other layers are concerned with the underlying network interactions.

The “Read Attributes” and “Read Attributes Response” to match the Python log above are both marked as having protocol = “Zigbee HA” – Wireshark has detected we’re using the Home Automation profile. The Wireshark fragment for the request is:

ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
  Frame Control Field: Data (0x00)
  Destination Endpoint: 1
  Cluster: Basic (0x0000)
  Profile: Home Automation (0x0104)
  Source Endpoint: 1
  Counter: 145
ZigBee Cluster Library Frame, Command: Read Attributes, Seq: 16
  Frame Control Field: Profile-wide (0x10)
  Sequence Number: 16
  Command: Read Attributes (0x00)
  Attribute: Model Identifier (0x0005)
  Attribute: Manufacturer Name (0x0004)

And for the response:

ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
  Frame Control Field: Data (0x00)
  Destination Endpoint: 1
  Cluster: Basic (0x0000)
  Profile: Home Automation (0x0104)
  Source Endpoint: 1
  Counter: 3
ZigBee Cluster Library Frame, Command: Read Attributes Response, Seq: 16
  Frame Control Field: Profile-wide (0x18)
  Sequence Number: 16
  Command: Read Attributes Response (0x01)
  Status Record, String: ZNP-Test
  Status Record, String: ARC12

Digging into the component parts of the ZCL in Wireshark, and seeing how elements relate to the bytes-level view should clarify the comment above: “each attribute requested gets a chunk in the response comprising: 2 bytes of attribute id + 1 byte status (=0x00) + 1 byte data type (=0x42) + 1 byte string length + n bytes of string characters”.

Beyond the Interview – a Switch and a LED – the Generic OnOff Cluster

Defining two endpoints on the ZNP board, one for a switch and one for a LED, is sufficient to explore the “three kinds of interaction” through: the Z2M web dashboard, Wireshark, and serial interface (whether via Z-Tool, Python, etc). NB: I use “switch” to mean something like a push button or toggle switch, rather than as a synonym for a relay (the practice of using it as a synonym for relay should be strenuously avoided as it makes the word “switch” become ambiguous).

The AF_REGISTER command (see part 1) is used twice, once for each endpoint. Both the LED and switch use cluster 0x0006 (generic On/Off). For the switch, I added cluster 0x0006 as both an input and output to endpoint 1. The input allows the switch to be remote-controlled from Z2M whereas the output is used to inform the coordinator/Z2M of a change in switch state. State changes are sent as ZCL reports (ZCL command id = 0x0a) whenever the switch changes state. The same kind of ZCL message can also be used for periodic reporting, e.g. for a sensor. I put the LED on endpoint 2. In this case, the appearance of 0x0006 in the in-cluster list is what allows the LED to be switched on and off (analogous to remote controlling the switch) and in the out-cluster list it means the LED state is reportable (in my Python I use a periodic report for this, for demonstration purposes, but an event-driven report would be more sensible in practice).
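
For illustration, the data part of the two AF_REGISTER commands could be assembled as sketched here. The field order is my reading of the AF_REGISTER entry in [R2] (command bytes 0x24 0x00), the profile id is the Home Automation profile (0x0104), and the device ids used in the usage note are placeholders for the example rather than values taken from my code.

```python
import struct


def af_register_data(endpoint, device_id, in_clusters, out_clusters,
                     profile_id=0x0104, device_version=0, latency=0):
    """Build the data bytes of AF_REGISTER (to be framed with cmd 0x24 0x00)."""
    data = struct.pack("<BHHBB", endpoint, profile_id, device_id,
                       device_version, latency)
    for clusters in (in_clusters, out_clusters):
        data += bytes([len(clusters)])                        # cluster count
        data += b"".join(struct.pack("<H", c) for c in clusters)
    return data


# Placeholder device ids (assumptions for this sketch only).
SWITCH_DEVICE_ID = 0x0000
LED_DEVICE_ID = 0x0100

switch_data = af_register_data(1, SWITCH_DEVICE_ID, [0x0006], [0x0006])
led_data = af_register_data(2, LED_DEVICE_ID, [0x0006], [0x0006])
```

Both endpoints list cluster 0x0006 once as an input and once as an output, as described above; the result still needs the usual SOF, length, command, and FCS framing before it is sent.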

After issuing ZDO_STARTUP_FROM_APP, that is all that is needed to set the ZNP board up. The real work is done handling incoming ZCL and preparing outgoing ZCL. We do, of course, need to have a converter JS file in Z2M too. There is a rather hacky converter in the github repo, which will work but may not be optimal!

I will now outline what happens for each of the remaining “three kinds of interaction” (reading the state of the LED/switch simply being the same pattern as reading the Basic Cluster attributes) in the context of this simple test-device.

Reporting Switch Changes and LED State

These two are identical in the structure of the ZCL, which is very close to that used in the “read attributes response”; the only differences in the ZCL are the command id (0x0a) and that the report data structure does not have a status byte. See the frame format for the Report Attributes Command [R6]. The attribute id for on/off state is 0x0000 and for the ZCL we need its data type (boolean, type id = 0x10) and one byte for on (0x01) or off (0x00). Remember that the switch and LED are on different endpoints, but also recall that the endpoint is included in the data sent via an AF_DATA_REQUEST ZNP message. This is the message to use although, again, having “REQUEST” in the name seems wrong for a report; it’s just the way it is! As for the Read Attributes Response case, expect AF_DATA_REQUEST_RSP and AF_DATA_CONFIRM in response to the report message.
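
A minimal sketch of building the ZCL for such a report follows; the function names are mine, and the FCF value of 0x18 is simply the one seen in my logs.

```python
def report_attribute(seq, attr_id, type_id, value):
    """ZCL bytes for a Report Attributes command carrying one attribute."""
    header = bytes([0x18, seq & 0xFF, 0x0A])   # FCF, sequence no., command id
    # Unlike a read attributes response there is no status byte here.
    return header + attr_id.to_bytes(2, "little") + bytes([type_id]) + value


def report_on_off(seq, is_on):
    """Report the OnOff attribute (0x0000, boolean type 0x10)."""
    return report_attribute(seq, 0x0000, 0x10, b"\x01" if is_on else b"\x00")
```

report_on_off(0, True) gives 18 00 0a 00 00 10 01, which is the ZCL part of the report I logged when pressing the button on endpoint 1.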

Reference to section 3.8 in [R6] describes other attributes which a generic on/off device may support.

Here is my Python log for reporting when I pressed the button linked to endpoint 1:

TX: fe 11 24 01 00 00 01 01 06 00 00 00 10 07 18 00 0a 00 00 10 01 26
[Response] RX body: Cmd: 6401 Body: 00
AF_DATA_REQUEST_RSP (for report) success? True
--------------
RX body: Cmd: 4480 Body: 00 01 00
AF_DATA_CONFIRM for TransId=0 on Endpoint=1 had Status=0

And the Wireshark capture to match:

ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
  Frame Control Field: Data (0x00)
  Destination Endpoint: 1
  Cluster: On/Off (0x0006)
  Profile: Home Automation (0x0104)
  Source Endpoint: 1
  Counter: 2
ZigBee Cluster Library Frame, Command: Report Attributes, Seq: 0
  Frame Control Field: Profile-wide (0x18)
  Sequence Number: 0
  Command: Report Attributes (0x0a)
  Attribute Field
    Attribute: OnOff (0x0000)
    Data Type: Boolean (0x10)
    On/off Control: On (0x01)

Remote Control

This is about using the Z2M front end or MQTT publish activity to control the ZNP board device.

Turning the LED on and off and remote-controlling the switch are identical as far as the ZCL goes. In our case, the only difference is the endpoint which comes through with the AF_INCOMING_MSG. The ZCL has a different structure to the previous cases seen and the ZCL FCF is what signals the difference. Recall that attribute requests had an FCF of 0x10, while the responses and reports had 0x18. Those commands have the same meaning no matter what cluster is involved. The important conceptual point is that the ZCL for turning something on or off is not the same as setting the attribute for state; it is an action which has specific meaning only within the context of the genOnOff cluster (0x0006). The FCF signals the fact that the command is “local or specific to a cluster” (see [R6]) via bit 0, which is unset for “global” (any-cluster) commands and set for local commands; checking the FCF is key to handling these messages correctly. The remainder of the ZCL after the FCF is very simple; first there is the usual sequence number, then there is a single command byte, where 0x01 means “turn on” and 0x00 means “turn off”. Section 3.8.2 of [R6] describes four other commands for use cases such as off-with-fade, on-with-timed-off, etc.

Having received the command, we should reply to Z2M because the FCF says a default response is expected (bit 4, “disable default response” is not set). A default response is simply a ZCL message with a command id of 0x0b and a short payload indicating the command which was sent and a byte for success/fail (see [R6] 2.5.12). This is a global, not a cluster-local, command. Our default response message leads to an AF_DATA_REQUEST_RSP and AF_DATA_CONFIRM, as for any other AF_DATA_REQUEST.
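
Putting the last two paragraphs together, the handling of an incoming on/off command might look something like this sketch. It only deals with the on (0x01) and off (0x00) commands, always reports success, and uses a function name which I have made up for the example.

```python
def handle_on_off_command(zcl):
    """Return (new_state, default_response_or_None) for an on/off command."""
    fcf, seq, cmd = zcl[0], zcl[1], zcl[2]
    if (fcf & 0x03) != 0x01:
        raise ValueError("not a cluster-specific command")
    if cmd not in (0x00, 0x01):
        raise NotImplementedError(f"command 0x{cmd:02x} is not handled here")
    new_state = cmd == 0x01
    reply = None
    if not fcf & 0x10:  # "disable default response" is not set
        # Global command 0x0b: echo the command id, then status 0x00 (success).
        reply = bytes([0x18, seq, 0x0B, cmd, 0x00])
    return new_state, reply
```

The reply bytes are the ZCL only; they still need to be sent back using AF_DATA_REQUEST, with the source endpoint set to the endpoint which received the command.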

Here is the message log from my Python script for turning the LED on:

RX body: Cmd: 4481 Body: 00 00 06 00 00 00 01 02 00 3c 00 40 16 03 00 00 03 01 29 01 82 3d 1d
Incoming ZCL for endpoint 2, cluster id = 0006 is: 01 29 01
Local/specific command to cluster 0006
=> device on endpoint 2 set to: on
Sending response...
TX: fe 0f 24 01 00 00 01 02 06 00 00 00 10 05 18 29 0b 01 00 01
[Response] RX body: Cmd: 6401 Body: 00
AF_DATA_REQUEST_RSP success? True
--------------
RX body: Cmd: 4480 Body: 00 02 00
AF_DATA_CONFIRM for TransId=0 on Endpoint=2 had Status=0

And here is the Wireshark capture for the same on command:

ZigBee Application Support Layer Data, Dst Endpt: 2, Src Endpt: 1
  Frame Control Field: Data (0x00)
  Destination Endpoint: 2
  Cluster: On/Off (0x0006)
  Profile: Home Automation (0x0104)
  Source Endpoint: 1
  Counter: 151
ZigBee Cluster Library Frame
  Frame Control Field: Cluster-specific (0x01)
  Sequence Number: 41
  Command: On (0x01)

and the default response (just the ZCL part this time):

ZigBee Cluster Library Frame, Command: Default Response, Seq: 41
  Frame Control Field: Profile-wide (0x18)
  Sequence Number: 41
  Command: Default Response (0x0b)
  Response to Command: 0x01
  Status: Success (0x00)

Footnote

While these two posts are a long way short of a comprehensive tutorial, I hope they provide enough structure to make it easier to understand what is going on and to work out the details. I found it very helpful to work carefully through the content of messages both at the UART/ZNP level and at the Zigbee sniffer (Wireshark) level, correlating the bytes with the relevant documentation.

If you find errors, especially where these are misunderstandings of how ZNP and Zigbee work, please do comment.

Using Z-Stack ZNP to Make DIY Zigbee Devices to Work With Zigbee2MQTT

Z-Stack ZNP (Zigbee Network Processor) allows you to make a Zigbee device by running ZNP on the Zigbee chip (CC2530 etc) and a microcontroller to read sensors, actuate motors, display stuff etc, using a basic communication protocol (UART serial or SPI) between the two. This saves you from having to compile code for the Zigbee chip (which requires a costly IDE for the CC2530) and means you can, for example, work with user-friendly libraries such as Arduino (whether for Arduino hardware, ESP8266, etc…).

Unfortunately, the documentation for ZNP is not great. Information is spread over several documents. There are errors. There is a distinct lack of information which “walks you through”. This article is my notes on how I got things to work, with a few caveats: I am working with a Zigbee2MQTT environment running @KoenKK’s Z-Stack HA 1.2 coordinator firmware and am explicitly working only within the scope of the Zigbee 1.2 Home Automation Profile on a CC2530 device. I see no reason why what follows could not be applied to Zigbee 3 or to chips other than CC2530 with a few tweaks.

Getting Started

Download Z-Stack Home 1.2 from the Texas Instruments Z-Stack archive (it should be the one named “Z-STACK-HOME ZigBee Home Automation Solutions”, with an installer file name of “Z-Stack_Home_1.2.2a.exe”). This provides you with the Z-Tool and some documentation. Do not use the firmware included! There is source code, which can sometimes be useful to work out details which the documentation does not explain.

Download the @KoenKK Home 1.2 coordinator firmware for your hardware (CC2530 etc). This is almost the same as the ZNP firmware available in the TI Z-Stack download but has a few modifications and different compiler flags set. These differences are essential! Although this is designated as “coordinator”, it is possible to change the role to end-device using ZNP. This is what we’ll do.

I’m using my E18 boards (see my E18 breakout post), but any basic development board based around the same CC2530 chip should work just the same, so long as it exposes the UART pins (see below). Ebyte sell a dev board with an E18 already mounted and an onboard USB-serial converter (but you will need to use jumper wires to connect the CC2530 UART I/O to this). I will use “ZNP board” to refer to this hardware.

Flash the ZNP board with the @KoenKK firmware.

I’ll be using Zigbee2MQTT (hereafter “Z2M”) and the ZBOSS/Wireshark sniffer, as I outline in my DIY Zigbee post. The sniffer is very useful for observing the sequence and content of messages and will helpfully show when you try to send malformed messages (these are often not reported by ZNP, which cheerfully responds with “success” messages).

For the purposes of exploration and learning, I use a PC to communicate with the ZNP board, rather than a micro-controller. This makes it much easier to experiment, while knowing that the same UART messages sent from an MCU would have the same effect.

The system outline is: [PC – USB/Serial adapter] << UART serial >> [ZNP board] << Zigbee wireless >> [Zigbee coordinator dongle] << serial tty >> [Zigbee2MQTT].

At the PC end, the ZNP commands can be generated and sent “as you please”. I used Z-Tool (see below), some Python code, and (mostly just to show it worked) Termite with the “hexadecimal view” filter, my favourite serial terminal. ZNP commands are just a bunch of bytes!

While the TI ZNP firmware is compiled to require hardware flow control (which Arduino libraries do not have), and to require configuration (SPI vs UART) via a hardware pin, the @KoenKK firmware does not (see the “patch” file in the github repo if you are interested to see the changes). It only supports UART comms and uses P0.2 for RX and P0.3 for TX. Remember to cross-over TX and RX between the USB-serial adapter and the ZNP board. It also exposes four GPIOs via ZNP on P0.6, P0.7, P1.6, and P1.7.

Unfortunately, the UART comms option doesn’t come with any power saving support, whereas the SPI option does (although how effective it is I do not know). The bottom line is that the ZNP board seems to draw approximately 23mA continuously, which would be hopeless for a battery device. For a sensor/output-oriented device it would be feasible to use a MOSFET to kill the power to the E18 before putting the MCU to sleep, but devices which should be remote-controlled or queried will just have to be mains powered. I don’t suppose this is such a big deal as such devices are likely to have more energy consumption than a battery is likely to be suitable for.

Note: I will use “Termite” to mean “use the serial terminal of your choice” and “Wireshark” to mean “use the Zigbee sniffer of your choice”.

Documents

The main references which I found to be useful, specifically those relevant to HA 1.2, are:

[R1] “Z-Stack ZNP Interface Specification” – outline of ZNP, esp the physical interface (for various SoCs and devkits) and message frame structure.

[R2] “Z-Stack Monitor and Test API” – basic message and response structure for commands shown in Z-tool (except the Simple API). This is the core reference for ZNP message structures.

[R3] “Z-Stack Simple API” – general description of Simple API + index of configuration ids.

[R4] “Developing a ZigBee System Using CC2530 ZNP” – this contains a very useful walkthrough of ZNP commands but doesn’t explain much. (Lamentably, TI seem to have removed this from their online document library.)

[R5] “ZigBee Home Automation Application Profile” – contains prescribed profile Id, device Ids, and cluster Ids applicable to HA 1.2. There is also a more recent Zigbee Cluster Library specification [R6] which is applicable to Zigbee 3.0 (and will surely have incompatible elements alongside stuff which works!). HOWEVER: this HA profile does not document the ZCL frame structure or the Basic Cluster attributes; there is presumably a Z1.2 document for these… but [R6] seems good enough. (Lamentably, the Zigbee Alliance, now CSA, no longer make this available. Bad people!)

[R6] “07-5123-08-Zigbee-Cluster-Library” – revision 8 of the Zigbee Cluster Library, aka “ZCL”.

[R1] to [R3] may be found in the Documents folder of the Z-Stack installation.

First Test – Termite and Z-Tool

Having connected P0.2 and P0.3 as noted above, connect with Termite at 115200 baud (at this point it is best not to have the sniffer dongle also attached, so you can easily see which the correct serial port is). Hit the ZNP board reset button and watch for its reset indication message to appear in Termite. This message is called “SYS_RESET_RESPONSE” (but some of the documentation calls it “SYS_RESET_IND”). Note that the message is framed: there is a “start of frame” (SOF) byte 0xfe, followed by the message body, and terminated by a single “FCS” checksum byte. The checksum uses the XOR8 algorithm (look it up). The message body starts with 1 length byte, followed by a 2 byte command and then a variable length of data. Review the message alongside the documentation for the message frame structure and the SYS_RESET_RESPONSE message type. Get some paper and check the checksum! Note that the command id (0x4180) is useful for looking up in the documentation when the “friendly” name has been changed.

The same message structure is the basis for all ZNP messages.
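As a minimal sketch of that framing in Python (the names xor8 and build_frame are my own for illustration, not part of any TI tooling): the FCS is the XOR of every byte between the SOF and the FCS itself, i.e. the length byte, the two command bytes, and the data. The example reproduces the first TX line from the Message Log further down.

```python
from functools import reduce

SOF = 0xFE  # start-of-frame marker


def xor8(data: bytes) -> int:
    """XOR all bytes together; used here as the ZNP frame check sequence."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def build_frame(cmd: int, payload: bytes = b"") -> bytes:
    """Wrap a 2-byte command id and its data as SOF | LEN | CMD | DATA | FCS."""
    body = bytes([len(payload)]) + cmd.to_bytes(2, "big") + payload
    return bytes([SOF]) + body + bytes([xor8(body)])


# Command 0x2605 with data 87 01 02, as in the first TX line of the Message Log
frame = build_frame(0x2605, bytes([0x87, 0x01, 0x02]))
print(frame.hex(" "))  # fe 03 26 05 87 01 02 a4
```

Note that the length byte counts only the data, not the command bytes or the FCS.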

Close Termite’s connection to the serial port and start up Z-tool (it can be found in the Tools folder of the Z-Stack installation). Unfortunately, Z-tool defaults to the wrong baud rate. Look in Tools > Settings and change it to 115200 for the serial port in question. The setting persists for each serial port.

It should scan for and find your device (use Tools > Scan for devices to repeat the scan). It can take a while for the ZNP board to become visible on a cold start, but it should come up quickly once you’ve seen it appear in Termite. Hit the ZNP board reset button again and correlate the Z-Tool presentation with the raw serial message in Termite.

Now experiment with a few simple commands such as SYS_VERSION, SYS_PING, and SYS_RESET. UTIL_GET_DEVICE_INFO is also quite informative. You should be able to see the hardware MAC address (aka IEEEAddr) and the ShortAddress, which is used within the PAN. DeviceType is a little confusing; it shows the roles the firmware is capable of operating as, rather than the role which it currently has. The DeviceState for an unjoined device should be 0 = “DEV_HOLD”. More on device states later.

If feeling adventurous, wire some LEDs to the GPIO pins mentioned above and experiment with UTIL_LED_CONTROL and SYS_GPIO (you will definitely need to read the documentation carefully). Note that some SYS_GPIO settings interfere with UTIL_LED_CONTROL.

Also: try composing raw byte sequences using Termite, remembering the SOF and FCS bytes. You will find that ZNP will not respond if you make a mistake; in general, do not expect to receive an error message but DO expect to receive an explicit response message. The response might be a simple “success” response, some data, or a deferred indicator/callback as for the case of a restart.

None of these ZNP messages have gone beyond the ZNP board; they are only communicating with the ZNP firmware.

Preparing the ZNP Board as an End Device

So far, the ZNP board still thinks it is a coordinator; some configuration settings must be changed.

First, though, we should give some thought to the start-up procedure. Whether we’re thinking of a PC or a MCU communicating with the ZNP board, we need to know when it is ready. Refer to [R1]. In practice this amounts to waiting for SYS_RESET_RESPONSE (command id 0x4180) before attempting to send ZNP messages.
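A rough sketch of that wait, in Python. The function names are hypothetical, and the reader works on any object with a read(n) method returning bytes; a pyserial Serial object opened with a timeout should fit, but that is an assumption rather than something shown here. The demonstration uses an in-memory stream with an illustrative payload (not captured from a real board).

```python
import io

SOF = 0xFE
SYS_RESET_IND = 0x4180  # Z-Tool calls this SYS_RESET_RESPONSE


def read_frame(port):
    """Read one frame; return (cmd, data), or None on timeout/EOF/bad FCS."""
    while True:  # naive resync: discard bytes until an SOF is seen
        b = port.read(1)
        if not b:
            return None
        if b[0] == SOF:
            break
    header = port.read(3)  # LEN, CMD0, CMD1
    if len(header) < 3:
        return None
    rest = port.read(header[0] + 1)  # data bytes followed by the FCS
    if len(rest) < header[0] + 1:
        return None
    fcs = 0
    for x in header + rest[:-1]:
        fcs ^= x
    if fcs != rest[-1]:
        return None
    return int.from_bytes(header[1:3], "big"), rest[:-1]


def wait_for_reset(port, attempts=10):
    """Return the reset indication data once seen, else None after `attempts` reads."""
    for _ in range(attempts):
        frame = read_frame(port)
        if frame and frame[0] == SYS_RESET_IND:
            return frame[1]
    return None


# Illustrative only: two junk bytes, then a reset frame with made-up data
body = bytes([0x06, 0x41, 0x80, 0x00, 0x02, 0x00, 0x02, 0x06, 0x03])
fcs = 0
for x in body:
    fcs ^= x
stream = io.BytesIO(b"\x00\x11" + bytes([SOF]) + body + bytes([fcs]))
print(wait_for_reset(stream) is not None)
```

An MCU implementation would follow the same shape: discard anything before the SOF, check the FCS, and only start sending commands once 0x4180 has been seen.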

Although this configuration is only needed once, I think it makes sense to execute these commands with a script each time I change anything. For an MCU-based device in use, I would not normally perform these steps unless a jumper or button is active at start/reset time.

The command to use is ZB_WRITE_CONFIGURATION (Simple API) and each command should cause a ZB_WRITE_CONFIGURATION_RSP response from ZNP. Refer to [R2] for the structure of the ZB_WRITE_CONFIGURATION message and response (in the doc, “SREQ” is the request and “SRSP” is the response). The values for ConfigId are hidden in section 5.3 of [R3]. Note that Z-Tool expects the configIds to be provided in decimal form!

The following should be set:

  • ZCD_NV_LOGICAL_TYPE (configid=0x87=135d), value=0x02 for end device.
  • ZCD_NV_PAN_ID (configid=0x83=131d), a value of 0xffff means “don’t care”, which is appropriate for a single PAN environment as we want the device to join any PAN it finds.
  • ZCD_NV_CHANLIST (configid=0x84=132d), value = 4 bytes bitmask of channels to use. The LSB is channel 0, so the bitmask for just channel 11 would be 0x00 00 08 00. This is the compiled default and leaving it as-is does not stop the end device connecting to a coordinator on a different channel. Setting ZCD_NV_CHANLIST to match the coordinator channel will reduce scan-and-connect-to-PAN time because the end device will try channels in ZCD_NV_CHANLIST first, and then try others. I use channel 16 = binary 1 0000 0000 0000 0000 = 0x00 01 00 00. Note that in both Z-Tool and in the raw ZNP message, the value is set in little-endian byte order, so the bytes to send (in order) for channel 16 would be 00, 00, 01, 00 (remembering to use decimal form of each byte in Z-Tool).
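The channel bitmask arithmetic is easy to get wrong by hand, so here is a small Python sketch of it (channel_mask_bytes is a name I have made up). It sets one bit per channel, restricts input to the 2.4GHz Zigbee channels 11 to 26, and packs the result as 4 little-endian bytes.

```python
import struct


def channel_mask_bytes(*channels: int) -> bytes:
    """Return the 4-byte little-endian ZCD_NV_CHANLIST value for the given channels."""
    mask = 0
    for ch in channels:
        if not 11 <= ch <= 26:
            raise ValueError(f"channel {ch} is outside 11..26")
        mask |= 1 << ch  # bit 0 is channel 0, so channel n is bit n
    return struct.pack("<I", mask)


value = channel_mask_bytes(16)
print(value.hex(" "))  # 00 00 01 00 (bytes in the order they are sent)
print(list(value))     # [0, 0, 1, 0] (decimal form, as Z-Tool wants it)
```

Passing several channels, e.g. channel_mask_bytes(11, 16), gives a mask with both bits set.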

I also set the ZCD_NV_POLL_RATE. This modifies the interval at which the device contacts the coordinator with IEEE 802.15.4 “Data Request” packets once joined (see Wireshark sniffer). The compiled default is 1000ms. The documentation wrongly states this is config_id 0x24 and has 1 byte length. It is actually config_id = 0x35 and is a 4 byte value (in ms) expressed in little-endian form! I use a value of 15s, which matches the PTVO firmware. Expressing in ms and in little-endian order, this requires the series of bytes: 0x98 0x3A 0x00 0x00.
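For illustration, the poll rate bytes and the data portion of the ZB_WRITE_CONFIGURATION message (ConfigId, length, value) can be produced like this in Python. The helper name write_config_payload is mine, and the SOF, length, command, and FCS bytes that wrap this data are left out.

```python
import struct

ZCD_NV_POLL_RATE = 0x35  # the id that actually works, not the documented 0x24


def write_config_payload(config_id: int, value: bytes) -> bytes:
    """Data part of ZB_WRITE_CONFIGURATION: ConfigId | Len | Value."""
    return bytes([config_id, len(value)]) + value


poll_ms = 15 * 1000                   # 15 s expressed in milliseconds
value = struct.pack("<I", poll_ms)    # 4 bytes, little-endian
print(value.hex(" "))                 # 98 3a 00 00
print(write_config_payload(ZCD_NV_POLL_RATE, value).hex(" "))  # 35 04 98 3a 00 00
```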

A likely cause of problems when “messing about” is the way ZNP stores network state in non-volatile memory. While this is a good thing for devices in real use, speeding up their re-connection, it is helpful to always have a clean start when experimenting… So, for experimental work, set the ZCD_NV_STARTUP_OPTION (configid=3) to a value of 2. This will clear the network state on restart. Issue a ZB_SYSTEM_RESET command to clear state immediately.

At this point the ZNP board is still unconnected from the network. It is not joined to a PAN and you will not see any network traffic to/from it in Wireshark (or other sniffer).

Joining a PAN

This involves two steps: first, declaring to ZNP what “clusters” it offers/consumes, and second, telling ZNP to attempt to join a PAN.

There are some concepts and associated names for them which need to be understood before the first step can be completed.

Endpoint is a way of identifying separate features offered by a single physical device (identified by its MAC address and two byte PAN-network address). Consider an endpoint as a “virtual” device which shares a single physical device. For example, a Zigbee device controlling three separate lights would have three endpoints.

ClusterId is a way of identifying the set of attributes which a device feature supports. A simple on/off device has a cluster id of 0x0006. There is a Basic Cluster, id = 0x0000, which all devices should support. The Z2M “interview” involves the coordinator sending requests to read attributes in the Basic Cluster such as the device model and manufacturer. See [R5] and [R6].

InCluster, OutCluster, Server, Client are very confusing terms used in the Zigbee specifications. They often seem to be the wrong way round! One way of deciding what they should be is to create a device using PTVO Configurable Firmware builder and then observe the messages using Wireshark. The InCluster list contains “server” features and the OutCluster list contains “client” features. Some examples: a light is a “server”, so is set up as an input cluster; a switch is a “client”, and a temperature sensor a “server”. I see that PTVO devices put the Basic Cluster as both an input and output cluster, but things work just fine with it defined as only an input cluster.

The first step, declaring the clusters, uses the AF_REGISTER command (use Z-Tool), with the following values:

  • Endpoint = 1 (actually a free choice, but 1 seems logical!)
  • AppProfId must be set to the Home Automation Profile ID, which is 0x0104.
  • AppDeviceId should be chosen from the values given in [R5]. I used 0x0000.
  • AppDevVer can be freely chosen.
  • Set a length of 1 for the InClusters list and add the Basic Cluster Id.
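To show how those values end up as bytes, here is a Python sketch of the AF_REGISTER data. The field order (including the LatencyReq byte and the trailing OutClusters count, neither of which is in the list above) is my reading of the ZNP documentation and of the TX bytes in the Message Log below, so treat it as an assumption to check against [R2]. The function name is made up.

```python
import struct

HA_PROFILE_ID = 0x0104
BASIC_CLUSTER = 0x0000


def af_register_payload(endpoint, profile_id, device_id, device_version,
                        in_clusters, out_clusters=(), latency=0):
    """Assumed layout: EP | ProfId | DevId | DevVer | Latency | nIn | In.. | nOut | Out.."""
    # Multi-byte fields are little-endian, like the other ZNP values above
    data = struct.pack("<BHHBB", endpoint, profile_id, device_id,
                       device_version, latency)
    for clusters in (in_clusters, out_clusters):
        data += bytes([len(clusters)])
        data += b"".join(struct.pack("<H", c) for c in clusters)
    return data


payload = af_register_payload(
    endpoint=1,
    profile_id=HA_PROFILE_ID,
    device_id=0x0000,
    device_version=1,          # free choice; 1 matches the log below
    in_clusters=[BASIC_CLUSTER],
)
print(len(payload), payload.hex(" "))  # 11 01 04 01 00 00 01 00 01 00 00 00
```

The 11 bytes printed match the data portion of the AF_REGISTER line in the Message Log (length byte 0x0b).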

After defining the endpoint, set Z2M to accept join requests and start Wireshark (remember to set the channel to match the Z2M coordinator).

Now for the second step. This will trigger network activity and a series of ZNP callback messages to inform you of the progress of joining. The command to use is ZDO_STARTUP_FROM_APP, setting a delay of 0 (appears as “Mode” in Z-Tool). The ZNP callbacks should comprise one ZDO_STARTUP_FROM_APP_RSP (with a network status code of 1) and several ZDO_STATE_CHANGE_IND messages. The status codes given by ZDO_STATE_CHANGE_IND are not in the documentation but can be found by referring to Components\stack\zdo\ZDApp.h in your Z-Stack 1.2 installation (find the devStates_t enum). The values I see follow the pattern: 2 DEV_NWK_DISC {potentially repeated} -> 3 DEV_NWK_JOINING -> 5 DEV_END_DEVICE_UNAUTH -> 6 DEV_END_DEVICE. Z-Tool incorrectly decodes status = 2 as “INVALID_PARAMETER”. If you see status 2 arriving multiple times it means you did not set the channel list to match the coordinator, and if you only ever see status 2, the coordinator is probably off-line or not accepting join requests. Status 6 means the device has joined the PAN as an end device and it should appear in the Z2M web interface.
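Since Z-Tool mis-decodes these codes, a script can do better with a small lookup. The Python sketch below only includes the values quoted in this post (other entries exist in the devStates_t enum but are left out rather than guessed), and the diagnose helper, which is my own invention, simply encodes the rules of thumb from the paragraph above.

```python
# State codes mentioned in this post; see devStates_t in ZDApp.h for the rest
DEV_STATES = {
    0: "DEV_HOLD",
    2: "DEV_NWK_DISC",
    3: "DEV_NWK_JOINING",
    5: "DEV_END_DEVICE_UNAUTH",
    6: "DEV_END_DEVICE",
}


def describe_state(code):
    """Friendly name for a ZDO_STATE_CHANGE_IND status code."""
    return DEV_STATES.get(code, f"state {code} (see ZDApp.h)")


def diagnose(codes):
    """Rough interpretation of the sequence of states seen so far."""
    if codes and codes[-1] == 6:
        return "joined as end device"
    if codes and all(c == 2 for c in codes):
        return "only discovery seen: coordinator off-line or not permitting joins?"
    return "join in progress or not yet started"


seen = [2, 2, 3, 5, 6]  # example sequence following the pattern described above
print(" -> ".join(describe_state(c) for c in seen))
print(diagnose(seen))   # joined as end device
```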

You will now see a bunch of AF_INCOMING_MSG messages. These are from the coordinator to the ZNP device and are caused by Z2M’s “interview”, which is trying to read the attributes in the Basic Cluster. It will eventually give up. The ZNP board is now joined into the PAN but is not properly described in Z2M and can do nothing useful. Still: getting this far is an achievement.

Aside: there are also Simple API commands for this process, but I find the responses less informative.

Message Log

Here are the raw ZNP messages sent over UART and the responses (generated by a Python script – see Part 2). The responses have been parsed out to command + body, whereas the TX bytes include the SOF and FCS.

TX: fe 03 26 05 87 01 02 a4
[Response] RX body: Cmd: 6605 Body: 00
TX: fe 04 26 05 83 02 ff ff a6
[Response] RX body: Cmd: 6605 Body: 00
TX: fe 06 26 05 84 04 00 00 01 00 a4
[Response] RX body: Cmd: 6605 Body: 00
TX: fe 03 26 05 03 01 02 20
[Response] RX body: Cmd: 6605 Body: 00
TX: fe 06 26 05 35 04 98 3a 00 00 b6
[Response] RX body: Cmd: 6605 Body: 00

And here is the AF_REGISTER and “start”.

TX: fe 0b 24 00 01 04 01 00 00 01 00 01 00 00 00 2b
[Response] RX body: Cmd: 6400 Body: 00
TX: fe 01 25 40 00 64
[Response] RX body: Cmd: 6540 Body: 01

In Wireshark, you should see an Association Request, followed (but probably not immediately) by an Association Response. Note that the protocol is IEEE 802.15.4, rather than “Zigbee”, as these are network-level interactions. See that the request and response source and destination use a MAC address and that the short address (2 bytes) which will subsequently be used (and is shown in Z2M) is provided in the response.

Further down the Wireshark log, you should see a “Transport Key” message being sent to the short address, now given as “Zigbee” protocol, followed by a “Device Announcement” where the newly-joined device broadcasts its presence.

To be Continued…

Part 2 will look at the request and response messages involved in the Z2M “interview”, and beyond to build a simple device to demonstrate reporting and remote control.