Data management

Online since Mon 28 September 2020 in Infrastructure

The water production data collected by the project's loggers will be sent to our servers (and to the public) via a Raspberry Pi (raspi from now on) sitting at the water utility.

The proposed data flow is shown below.

Data flow from utility to servers as of 2020-09-28

The raspi receives data from the logger (in this case via an SD card inserted into a card reader), prepares it for the server, and then calls the server to notify it that data is ready for pickup. That's the Call Home module in the diagram.

The server will then securely retrieve the data from the site and run the post-processing algorithms, which include our CFD-based discharge curve. When all this is done, the data will be appended to the database and made available to the public.

Our implementation partner, ULAG, will keep nightly backups of the DB.

The development of this infrastructure is under way; a set of raspis is already being tested.

Transformation of data

As the data flows from one device to the next, it gets transformed. The data format at each stage is chosen to simplify the work done at that stage. The raw format delivered by the logger is transformed into a table format at the raspi. The raspi also generates some metadata that, among other things, helps us verify that the data is not corrupted.
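To make the idea concrete, here is a minimal sketch of what the raspi stage could look like. It is not our actual implementation: the raw format (one "timestamp;value" pair per line), the column names, and the metadata fields are all assumptions made for illustration. The point is only that a checksum and a row count computed at the raspi let the server detect a corrupted or truncated transfer.

```python
import csv
import hashlib
import io


def to_table(raw_text: str) -> str:
    """Convert hypothetical raw logger lines 'timestamp;value' into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", "value"])  # assumed column names
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        timestamp, value = line.split(";")
        writer.writerow([timestamp, float(value)])
    return out.getvalue()


def make_metadata(table_text: str) -> dict:
    """Describe the table so that the receiver can check its integrity."""
    data = table_text.encode("utf-8")
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
        "rows": table_text.count("\n") - 1,  # data rows, header excluded
    }


def verify(table_text: str, metadata: dict) -> bool:
    """Recompute the metadata on the receiving side and compare."""
    return make_metadata(table_text) == metadata
```

On the server, calling verify() with the received table and the metadata produced at the raspi returns False if even a single byte changed on the way.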

At the server the data is put into an InfluxDB database, with the schema described in the diagram. Each operator (water utility) gets its own bucket (database in InfluxDB v1.8), which contains measurements (tables) that organize the data to simplify common queries, e.g. requests made by users or by the visualization tools.
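As an illustration of how a table row could end up in a measurement, the sketch below formats one point in InfluxDB line protocol (measurement, optional tags, fields, timestamp). The measurement, tag, and field names are hypothetical; the real schema is the one in the diagram. The function only builds the text line, so it needs no client library; the line would then be written to the operator's bucket.

```python
def _escape(text: str, special: str) -> str:
    """Backslash-escape the characters that line protocol treats as special."""
    for char in special:
        text = text.replace(char, "\\" + char)
    return text


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one point as an InfluxDB line protocol string (float fields only)."""
    name = _escape(measurement, ", ")
    # Tags are sorted by key, as recommended for write performance.
    tag_part = "".join(
        f",{_escape(key, ',= ')}={_escape(value, ',= ')}"
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(key, ',= ')}={float(value)}" for key, value in fields.items()
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical point: one discharge reading from a site called "plant A".
line = to_line("flow", {"site": "plant A"}, {"discharge": 1.5}, 1601287200000000000)
```

Keeping one bucket per operator means that a query for one utility never has to filter out the data of another.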

Call home and data retrieval

When a raspi has finished preparing the data that it got from the logger, it needs to tell the server where (IP) and when the data is available. The raspi knows where to call because our server has a fixed IP. Once the server has identified and verified the call, it retrieves the data over an SSH connection.

Ok, that's the idea. In practice, the whole process is implemented using the so-called Call Home protocol, which is just an SSH reverse tunnel.

There are two reasons to work like this: 1) the raspi sitting at the water utility has neither a fixed IP nor forwarded ports (to minimize the burden on the IT departments), and 2) we do not want our server to listen for incoming data connections. The server still needs an open port for the incoming call, but it keeps total control over the data retrieval process.
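For readers who have not used a reverse tunnel before, the sketch below builds the kind of OpenSSH command a raspi could run to call home. The user name, server address, and port numbers are placeholders, not our real configuration. The -R option asks the server to forward one of its own local ports back to the raspi's SSH port, so the server can later connect to the raspi through the tunnel even though the raspi has no public address.

```python
def reverse_tunnel_command(
    server: str, user: str, remote_port: int, local_port: int = 22
) -> list:
    """Build the argument list for an OpenSSH reverse tunnel (all values are placeholders)."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-o", "ExitOnForwardFailure=yes",  # fail early if the port cannot be forwarded
        "-R", f"{remote_port}:localhost:{local_port}",  # server port -> raspi SSH port
        f"{user}@{server}",
    ]


# Hypothetical values: 203.0.113.10 is an address reserved for documentation.
command = reverse_tunnel_command("203.0.113.10", "callhome", 2222)
```

The list can be passed to subprocess.run() on the raspi. While the tunnel is up, the server reaches the raspi by opening an SSH connection to its own localhost on the forwarded port (2222 in this example) and pulls the data from there.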

If you have any suggestions or questions, do not hesitate to contact us.

Stay tuned for updates on this topic!