ACS sends HTTP auth header only on every second attempt

Hi,

I encountered a weird behavior with a Sagemcom ONT.

The ONT registers with the ACS just fine, there are no auth issues in the log, and the device ping works. But when I try to summon the device (or issue a reboot or refresh a parameter), I always get “No contact from CPE”. Periodic inform does work in this state.

When I looked at the Wireshark trace, I found something interesting:

Every odd connection-keepalive message sent by the ACS to the ONT contains the HTTP auth header. The ONT responds to these messages with 200 OK.

But every even connection-keepalive message sent by the ACS to the same ONT does not contain any HTTP auth header at all. The ONT responds to these messages with 401 Unauthorized.

This is a factory defaulted ONT.

Question: why does GenieACS not send the HTTP auth header in every message?

The same ACS works correctly with another ONT vendor…

This is an example of the same ONT contacted by the ACS just a couple of seconds apart, with two different results:

GenieACS version is: v1.2.9+20220822165235

This is by design. You don’t want any server spraying credentials if it’s not necessary. When GenieACS gets a 401 (Unauthorized) response from the CPE, it then tries to authenticate.
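The challenge-then-authenticate behavior described above can be sketched roughly as follows. This is a simplified simulation, not GenieACS code: `fake_cpe`, `connection_request`, and the credentials are hypothetical names made up for illustration, and Basic auth is used only to keep the example short (a real CPE may well use Digest instead).

```python
import base64


def basic_token(username, password):
    """Build the value of a Basic Authorization header."""
    raw = f"{username}:{password}".encode()
    return "Basic " + base64.b64encode(raw).decode()


def fake_cpe(headers, username="user", password="pass"):
    """Hypothetical CPE endpoint: 200 for valid credentials, 401 otherwise."""
    if headers.get("Authorization") == basic_token(username, password):
        return 200
    return 401


def connection_request(send, username, password):
    """Send without credentials first; authenticate only after a 401."""
    statuses = [send({})]
    if statuses[-1] == 401:
        # The CPE challenged us, so retry once with credentials attached.
        statuses.append(send({"Authorization": basic_token(username, password)}))
    return statuses


print(connection_request(fake_cpe, "user", "pass"))  # [401, 200]
```

Seen in a packet capture, this pattern looks like alternating requests without and with the auth header, which matches the trace described in the first post.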

Ah, good to know.

Only question is: why do I get “No contact from CPE” when I try to summon, refresh parameters, or issue a reboot?

Sounds like a permissions issue with the CPE. First thing is to disable all forms of authentication, both CPE → ACS and ACS → CPE. Then see if the error goes away.

I am not using “cwmp.auth” at all, the ManagementServer parameters are set by GenieACS automatically:

Device.ManagementServer.ConnectionRequestURL
Device.ManagementServer.ConnectionRequestPassword
Device.ManagementServer.ConnectionRequestUsername

Are there any other settings I need to check or change to make sure it is disabled properly?

Much appreciate your help!

if cwmp.auth is not set, then it defaults to AUTH(USERNAME, PASSWORD) under the covers.

I have seen some CPEs acknowledge to the ACS the SPV for the connection request username/password but not actually update the params on the CPE. Look in the TR-069 section of your CPE’s UI to see what it lists for credentials.

This part checked out on the CPE, same as on the ACS side.

Now I modified the prov script to push “acs” as both the username and password:

Device.ManagementServer.Username
Device.ManagementServer.Password

I think that is the default user and pass for the Huawei ONTs, which are working fine.

This is a “summon” of a Huawei ONT:

This is the “summon” of a Sagem ONT (“No contact from CPE”):

Those params are for the CPE → ACS authentication.

Go to Admin → Config and add/update cwmp.debug with DeviceID.ID = "<your_acs_id>", then look in the genieacs-debug.yaml file. Its location is set in your /opt/genieacs/genieacs.env file. If you do not have an entry for GENIEACS_DEBUG_FILE in genieacs.env, then add the entry and point it to a location the ACS has write permissions to. You will need to restart the CWMP process for the change to take effect.

Looking at the normal log, it is interesting that when I hit summon for a Huawei ONT, the log instantly shows informEvent="6 CONNECTION REQUEST" plus the response, but when I hit the summon button for the Sagem ONT, informEvent="6 CONNECTION REQUEST" only shows up in the log 6-7 seconds later, after the web GUI displays the “No contact from CPE” error.

Sounds like a timing issue. The way the connection request stuff works is the ACS sends a GET request to the ConnectionRequestURL endpoint of the CPE. Then it’s up to the CPE to send an Inform with the 6 CONNECTION REQUEST event to the ACS. It sounds like in this case the Sagem is sending the request to the ACS after some sort of internal timeout, maybe 10 seconds?
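To illustrate why a slow CPE ends up as “No contact from CPE”, here is a rough model of the wait described above. It is only a sketch under assumptions: `summon` and its parameters are hypothetical, the timer stands in for the CPE's delayed Inform, and this is not how GenieACS implements the wait internally. Delays are scaled down by a factor of 10 so the example runs quickly.

```python
import threading


def summon(cpe_inform_delay, wait_timeout):
    """Model a connection request followed by a bounded wait for the Inform.

    cpe_inform_delay: seconds until the (simulated) CPE sends its Inform.
    wait_timeout: seconds the (simulated) ACS is willing to wait.
    """
    inform_received = threading.Event()
    # The timer stands in for the CPE reacting to the connection request.
    timer = threading.Timer(cpe_inform_delay, inform_received.set)
    timer.start()
    arrived = inform_received.wait(timeout=wait_timeout)
    timer.cancel()  # Stop the simulated CPE if we already gave up.
    return "session started" if arrived else "No contact from CPE"


# CPE answers after "5 s", ACS waits "2 s" (both scaled down by 10).
print(summon(cpe_inform_delay=0.5, wait_timeout=0.2))
# Same CPE, but the ACS now waits "9 s".
print(summon(cpe_inform_delay=0.5, wait_timeout=0.9))
```

In this model the CPE is not broken at all; the Inform simply arrives after the waiting side has given up, which matches the late “6 CONNECTION REQUEST” entries in the log.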

After 5 seconds:

Can this timeout be increased just to confirm?

You would have to dig into the code. I don’t recall what it defaults to, I thought it was 2-3 seconds (in milliseconds in the code).

I assume you are referring to this parameter:

CONNECTION_REQUEST_TIMEOUT:{type:"int",default:2e3}

?

Yes, you can change that one in the genieacs.env file, set it to something like
CONNECTION_REQUEST_TIMEOUT=10000 for testing purposes. You will need to restart the cwmp process after making that change.

I did that. Interestingly, nothing changed: the “No contact from CPE” message arrives after exactly the same 4-5 seconds as before.

I tried with:

CONNECTION_REQUEST_TIMEOUT=10000
and
GENIEACS_CONNECTION_REQUEST_TIMEOUT=10000

No change: after 4 seconds the “No contact from CPE” message appears, and the CPE responds to the summon after around 5-6 seconds. I went through all the “ManagementServer” parameters on the ONT to see if there are any throttling parameters in there, but no.

Maybe there are other parameters that need changing as well?
@zaidka

@dchard
Do not listen to @Felipe. He does not know of what he speaks. Changing the EXT timeout has no effect on the connection timeout to the CPE.

I believe GENIEACS_EXT_TIMEOUT=1000 would only change the timeout for external scripts, which I don’t have. So it is quite clear it will not help.

The question is, why does the GENIEACS_CONNECTION_REQUEST_TIMEOUT=10000 parameter not change the behavior at all? The “No contact from CPE” message comes after exactly the same 4 seconds as with the default value, and the CPE responds at 5 seconds. It really seems like GenieACS does not apply this 10-second timeout.

MOD:

Interestingly, if I try to refresh the whole data model and push the “commit” button right before a periodic inform arrives from the ONT, the data refresh begins, but it ends in a “Session took too long to complete” error while the ONT is still sending the data. The ONT sends the data really slowly: on the Huawei ONT it takes about 6-8 seconds to refresh the whole data model, on the Sagem ONT it takes more than 4 minutes.

MOD2:

The solution (added to genieacs.env):

GENIEACS_CONNECTION_REQUEST_TIMEOUT=9000
GENIEACS_SESSION_TIMEOUT=60
GENIEACS_DEVICE_ONLINE_THRESHOLD=9000

It was not enough to just increase GENIEACS_CONNECTION_REQUEST_TIMEOUT; GENIEACS_DEVICE_ONLINE_THRESHOLD also needed to be increased. GENIEACS_SESSION_TIMEOUT is increased so that the ONT can finish large queries like a full data model refresh.

I was not able to edit, so here is an addition in case someone else needs it:

In the default configuration, GENIEACS_DEVICE_ONLINE_THRESHOLD is 2 seconds higher than GENIEACS_CONNECTION_REQUEST_TIMEOUT, so I increased GENIEACS_DEVICE_ONLINE_THRESHOLD to 11000 and kept GENIEACS_CONNECTION_REQUEST_TIMEOUT at 9000.

Works fine.
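For anyone who wants to sanity-check their own genieacs.env against the relationship described above, a small helper along these lines might do. It is only a sketch: `parse_env` and `check_timeouts` are hypothetical helpers, and the 2000 ms margin is an assumption taken from the observation in this thread, not a documented GenieACS requirement.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_timeouts(values, margin_ms=2000):
    """Return True if the online threshold exceeds the request timeout
    by at least margin_ms (the margin is an assumption from this thread)."""
    timeout = int(values["GENIEACS_CONNECTION_REQUEST_TIMEOUT"])
    threshold = int(values["GENIEACS_DEVICE_ONLINE_THRESHOLD"])
    return threshold - timeout >= margin_ms


env_text = """
GENIEACS_CONNECTION_REQUEST_TIMEOUT=9000
GENIEACS_SESSION_TIMEOUT=60
GENIEACS_DEVICE_ONLINE_THRESHOLD=11000
"""

print(check_timeouts(parse_env(env_text)))  # True
```

With both values set to 9000 the check returns False.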