Overloaded server - various issues

Hi,

I have a GenieACS server which is very overloaded.
It handles approximately 16,000 CPEs.

I am getting various errors that look like race conditions (I have not examined the code).
If I increase the number of CPUs the server has, the issue gets worse.

I am getting various CWMP errors, such as:
exceptionMessage="Cache snapshot does not exist" exceptionStack="Error: Cache snapshot does not exist\n at Bt.get (/opt/genieacs-source/genieacs/lib/local-cache.ts:29:26)

exceptionName="MongoServerError" exceptionMessage="Updating the path 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice' would create a conflict at 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice'" exceptionStack="MongoServerError: Updating the path 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice' would create a conflict at 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice'\n at (/opt/genieacs-source/genieacs/node_modules/mongodb/src/operations/update.ts:146:44)

exceptionName="MongoExpiredSessionError" exceptionMessage="Cannot use a session that has ended" exceptionStack="/usr/lib/node_modules/genieacs/node_modules/mongodb/src/sessions.ts:978

exceptionName="MongoNotConnectedError" exceptionMessage="Client must be connected before running operations" exceptionStack="/usr/lib/node_modules/genieacs/node_modules/mongodb/src/operations/execute_operation.ts:89

exceptionName="Error" exceptionMessage="Cache snapshot does not exist" exceptionStack="Error: Cache snapshot does not exist\n at Bt.get (/opt/genieacs-source/genieacs/lib/local-cache.ts:29:26)

Please help!

Kind Regards,

Johan

You probably need to provide more details on your setup. What is your inform interval? Do MongoDB and GenieACS run on the same server? What are your server specs?

Morning Akcoder,

Setup is (was - see below) a single server running Mongo/CWMP/UI/FS/NBI.
12G RAM / 2 x 6 core CPU.
VM running on Proxmox.

In the meantime, I have split it across 4 servers:
Server 1 (12G RAM / 2 x 6 core CPU) CWMP/UI/FS/NBI
Server 2 (8G RAM / 2 x 4 core CPU) CWMP
Server 3 (8G RAM / 2 x 4 core CPU) CWMP
Server 4 (8G RAM / 2 x 4 core CPU) Mongo

Nginx on Server 1 load-balances CWMP traffic across Servers 1, 2 and 3.
The load balancing is IP-based, so the same CPE reaches the same CWMP instance every time.
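To illustrate what IP-based stickiness implies, here is a minimal Python sketch. It assumes Nginx's `ip_hash` upstream directive is in use, which (per the Nginx docs) keys IPv4 clients on the first three octets of the address. The hash function below is a stand-in, not Nginx's actual algorithm, and the backend names and port are hypothetical placeholders for the three CWMP instances.

```python
import hashlib
import ipaddress

# Hypothetical backends standing in for the three CWMP instances.
BACKENDS = ("server1:7547", "server2:7547", "server3:7547")


def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Deterministically map a client IP to one backend.

    Illustrative only: mimics the idea behind nginx ip_hash (IPv4 keyed on
    the first three octets, IPv6 on the full address) but uses SHA-256
    rather than nginx's real hash.
    """
    addr = ipaddress.ip_address(client_ip)
    if addr.version == 4:
        # Drop the last octet, so every host in the same /24 shares a key.
        key = ".".join(str(addr).split(".")[:3])
    else:
        key = addr.exploded
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


if __name__ == "__main__":
    for ip in ("41.85.21.63", "41.85.21.200", "10.0.0.1"):
        print(ip, "->", pick_backend(ip))
```

One consequence, if `ip_hash` is what is configured: all CPEs sharing a /24 (for example, behind the same access subnet or CGNAT pool) land on the same CWMP instance, so the load across the three servers may be uneven.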

I've also increased the timeouts on all 3 CWMP servers:
GENIEACS_DEVICE_ONLINE_THRESHOLD=22000
GENIEACS_CONNECTION_REQUEST_TIMEOUT=20000

I still see a lot of this in "systemctl status genieacs-cwmp":

exceptionName="Error" exceptionMessage="Lock expired" exceptionStack="Error: Lock expired\n at Wt (/opt/genieacs-source/genieacs/lib/lock.ts:44:37)

exceptionName="Error" exceptionMessage="Cache snapshot does not exist" exceptionStack="Error: Cache snapshot does not exist\n at Bt.get (/opt/genieacs-source/genieacs/lib/local-cache.ts:29:26)

There is also a lot of this in /var/log/genieacs/genieacs-cwmp.log:

2024-02-27T13:30:27.871Z [INFO] 41.85.21.63 5895D8-ONT-CXNKD8A85BF8: ACS request; acsRequestId="18deac1dbef0107" acsRequestName="GetParameterNames"
2024-02-27T13:30:27.873Z [ERROR] 41.85.21.63 5895D8-ONT-CXNKD8A85BF8: Connection dropped

This happens on all three servers.

Kind Regards,

Johan Meiring

Oh yes, the inform interval was originally set to 10 minutes.
It has now been updated to 60 minutes.

ONTs are slowly backing off as they successfully connect.

You should probably set your inform interval to 6 hours. Unless you actually need an hourly inform interval, you are just burning up your infrastructure's capacity.
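A quick back-of-the-envelope check shows why the interval matters so much at this fleet size. The sketch below assumes the 16,000 CPEs spread their informs evenly over the interval, which is optimistic: reboots, power cuts and the period while devices pick up a new interval all produce bursts above the average.

```python
# Average periodic-inform rate = number of devices / inform interval.
# Assumes informs are spread evenly; real traffic is burstier.
DEVICES = 16_000


def informs_per_second(devices: int, interval_minutes: float) -> float:
    """Average number of periodic informs the ACS must handle per second."""
    return devices / (interval_minutes * 60)


if __name__ == "__main__":
    for label, minutes in (("10 minutes", 10), ("60 minutes", 60), ("6 hours", 360)):
        rate = informs_per_second(DEVICES, minutes)
        print(f"{label}: {rate:.2f} informs/s")
    # 10 minutes: 26.67 informs/s
    # 60 minutes: 4.44 informs/s
    # 6 hours: 0.74 informs/s
```

Each inform opens a full CWMP session (and, depending on provisions, several database round trips), so going from 10 minutes to 6 hours cuts the steady-state session rate by a factor of 36.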