Using extension scripts always leads to unsuccessfully terminated session

I’m trying to call a tiny extension script from a provision. Whenever it is being called the session is terminated unsuccessfully. I read this might be related to the script having a long runtime, but I stripped it down to the bare minimum:

ext/pppmon.js:

function addVals(args, callback) {
    callback(null, {});
}

exports.addValues = addVals;

The provision calling the extension only gets one parameter and passes it to the extension:

let manufacturer = declare('InternetGatewayDevice.DeviceInfo.Manufacturer', {value: Date.now() - (50 * 1000)}).value[0];
ext('pppmon', 'addValues', manufacturer)

If I comment the extension call, the log file looks fine:

2021-08-11T14:12:22.449Z [INFO] 100.64.32.191 00040E-FRITZ%21Box-444E6D6F0761: Inform; cpeRequestId="1498" informEvent="2 PERIODIC" informRetryCount=0
2021-08-11T14:12:22.807Z [INFO] 100.64.32.191 00040E-FRITZ%21Box-444E6D6F0761: ACS request; acsRequestId="17b358e96dd0100" acsRequestName="GetParameterValues"

When calling the extension I get:

2021-08-11T14:13:22.460Z [INFO] 100.64.32.191 00040E-FRITZ%21Box-444E6D6F0761: Inform; cpeRequestId="1499" informEvent="2 PERIODIC" informRetryCount=0
2021-08-11T14:13:22.821Z [INFO] 100.64.32.191 00040E-FRITZ%21Box-444E6D6F0761: ACS request; acsRequestId="17b358f81430100" acsRequestName="GetParameterValues"
2021-08-11T14:13:23.277Z [ERROR] 100.64.32.191 00040E-FRITZ%21Box-444E6D6F0761: Connection dropped
2021-08-11T14:13:23.278Z [WARN] 100.64.32.191 00040E-FRITZ%21Box-444E6D6F0761: Channel has faulted; channel="default" retries=0 faultCode="session_terminated" faultMessage="The TR-069 session was unsuccessfully terminated"
2021-08-11T14:13:23.323Z [ERROR] 100.64.32.191: Invalid session

We are at v1.2.3 and use Apache as a reverse proxy with disablereuse=On. I couldn’t attach PCAP files for pass and fail so I uploaded them here: WeTransfer

Genieacs runs multiple cwmp processes and per cwmp process it will launch the ext script at first use. This loading processes takes time. (I thought I have this issue, because on load my scripts connect to mariadb)
So at my production ACS on reboot the first 16-20 devices will have the problem of an unterminated Connection, but it does not matter for me.

If it is a new device, it will not receive the provisioned tag.
As long it does not have that tag, I set an inform Intervall of 30 seconds. So it will try again in 30 seconds.

I am running behind nginx.

Thanks, I did set the number of CWMP workers to one (GENIEACS_CWMP_WORKER_PROCESSES=1) and restarted genieacs-cwmp. As you said, the first time I’ll get a session_terminated. For the next couple of informs it seems fine. However, I still get an occasional session_terminated roughly in 10% of the cases. From my observation GenieACS won’t try to run the provision for next five informs after a fault happend. Afterwards the fault is cleared and normal operation continues.

In my tests I’m using a “monitoring” provision, which reacts on 2 PERIODIC and queries a couple of PHY parameters. This is used to draw diagrams of that device. So far I queried the NBI for the values. However, that doesn’t scale well when talking about thousands of devices.

So my idea was to let the provision call an extension script storing the preprocessed information in a file named after the serial number. Another external process will later loop over the files, gets the content, deletes the file and pushes the content into a times series database, from which our diagrams are created.

Getting a fault means that I’ll loose 5 datapoints in 10% of the cases, which is not ideal.

The provision + extension looks like:

let values = {};
const serialNumber = declare('DeviceID.SerialNumber', {value: Date.now()}).value[0];

// DeviceLog is not used for diagrams but for over stuff
declare('InternetGatewayDevice.DeviceInfo.DeviceLog', {value: Date.now()});
values['avg_ds_pwr'] = declare('InternetGatewayDevice.WANDevice.1.X_AVM-DE_Optical.Interface.1.OpticalSignalLevel', {value: Date.now()}).value[0] / 1000;
values['avg_us_pwr'] = declare('InternetGatewayDevice.WANDevice.1.X_AVM-DE_Optical.Interface.1.TransmitOpticalLevel', {value: Date.now()}).value[0] / 1000;

ext('pppmon', 'addValues', serialNumber, JSON.stringify(values));

The extension script is:

'use strict';

const fs = require('fs');

function addVals(args, callback) {
    const serial_num = args[0];
    const values = JSON.parse(args[1]);
    const columns = ['avg_ds_pwr', 'avg_ds_snr', 'avg_us_pwr', 'avg_us_snr'];

    fs.writeFile('/tmp/genieacs/' + serial_num, columns.map(k => (k in values) ? values[k] : 'null').join(','), () => {});
    callback(null, {});
}

exports.addValues = addVals;

Of course I could also use the log() function of a provision to get rid of the extension altogether:

let values = {};
const columns = ['avg_ds_pwr', 'avg_ds_snr', 'avg_us_pwr', 'avg_us_snr'];

declare('InternetGatewayDevice.DeviceInfo.DeviceLog', {value: Date.now()});
values['avg_ds_pwr'] = declare('InternetGatewayDevice.WANDevice.1.X_AVM-DE_Optical.Interface.1.OpticalSignalLevel', {value: Date.now()}).value[0] / 1000;
values['avg_us_pwr'] = declare('InternetGatewayDevice.WANDevice.1.X_AVM-DE_Optical.Interface.1.TransmitOpticalLevel', {value: Date.now()}).value[0] / 1000;

log(columns.map(k => (k in values) ? values[k] : 'null').join(','));

This seems much more stable. But this would mean I’ll spam the log and the external process needs to go through the whole log to find the last line of a device/serialnumber to push that into the time-series database.

Oh, your are using as fiber optic FRITZ!Box. I am looking forward to use them in my network.

Back to the topic, I only use logging for trouble devices. There I use a special provision to load the data into genieacs and the export it over the api from genieacs in an asynchronous way.

If your logging/monitoring system can will remove duplicated entry with the same timestamp, you could use the api in a similar way. Especially if your using numeric values.

The api also carries a timestamp for every value.
For those devices I have an inform time of 5 minutes and my cronjob with curl runs every 7 minutes. So I have a file for ever 7 minutes timestamp. (I store the compressed)
Unfortunately this is the best solution for my problem, because I want to preserve the logs from the device, especially if the device has a flipping connect.

I am using that approach to connect the serialnumbers to ip addresses from the device. I switched the logformat to json, to make automated process easy. I run a script every hour for that. In a day, I generate about 270M of logs, bevor compression.

It would also be possible to use a file watcher, e.g. inside node js, to parse the log.
https://nodejs.org/docs/latest/api/fs.html#fs_fs_watch_filename_options_listener

What inform intervals are you using for your standard devices?

Right now we are using a 5 minute inform interval, since we are coming from DOCSIS with a 5 minute SNMP polling cycle.

JSON logformat would be better for programmatic access. However, the format is less readable in case one needs to debug an ill-behaving device.

Can’t you query ManagementServer.ConnectionRequestURL to get the IP address? I think this is what GenieACS is using internally as well to retrieve the IP address of a device. We can query our RADIUS server for this, since all our CWMP devices are also using PPPoE.

Yes and no. I am using TR181 and TR098 devices and some devices are reporting wrong addresses or will report wrong addresses in the future.
e.G. when a device is running in dhcp mode with CGNAT and PCP it will report the external address from the pcp server as connection request url. In our network architecture it is possible, that some devices are only reachable over NAT/PCP and others are directly connection to the acs. So I need the IP that is connection to the acs.
Or if the devices is behind undetected nat, it will report the internal ip of that nat zone. In some cases it is behind the customers nat.
Currently, we are not running PCP on our CGNAT System but it is planed.

We are also using dynamic device / customer linking for provisioning with some of our partner networks. For debugging we have the possibility to use dynamic linking with almost every partner network. This allows us to work with customer-provided equipment.

The ‘Connection dropped’ error is fixed in the latest master branch. v1.2.6 will be out very soon and will include the fix for this issue.

3 Likes

Thx @zaidka . :slightly_smiling_face: Looking forward to the new release!