Overloaded server - various issues

jmeiring · February 26, 2024, 8:54am

Hi,

I have a GenieACS server which is very overloaded.
Approximately 16000 CPEs

I am getting various errors which look like race conditions (I have not examined the code).
If I increase the number of CPUs the server has, the issue becomes worse.

Getting various cwmp errors like:
exceptionMessage="Cache snapshot does not exist" exceptionStack=
"Error: Cache snapshot does not exist\n at Bt.get (/opt/genieacs-source/genieacs/lib/local-cache.ts:29:26)

exceptionName="MongoServerError" exceptionMessage="Updating the path 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice' would create a conflict at 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice'" exceptionStack="MongoServerError: Updating the path 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice' would create a conflict at 'InternetGatewayDevice.LANDevice.1.WLANConfiguration.1.AssociatedDevice'\n at (/opt/genieacs-source/genieacs/node_modules/mongodb/src/operations/update.ts:146:44)

exceptionName="MongoExpiredSessionError" exceptionMessage="Cannot use a session that has ended" exceptionStack="/usr/lib/node_modules/genieacs/node_modules/mongodb/src/sessions.ts:978

exceptionName="MongoNotConnectedError" exceptionMessage="Client must be connected before running operations" exceptionStack="/usr/lib/node_modules/genieacs/node_modules/mongodb/src/operations/execute_operation.ts:89

exceptionName="Error" exceptionMessage="Cache snapshot does not exist" exceptionStack=
"Error: Cache snapshot does not exist\n at Bt.get (/opt/genieacs-source/genieacs/lib/local-cache.ts:29:26)

Please help!

Kind Regards,

Johan

akcoder · February 26, 2024, 5:39pm

You probably need to provide more details on your setup. What is your inform interval? Do MongoDB and GenieACS run on the same server? What are your server specs?

jmeiring · February 27, 2024, 1:35pm

Morning Akcoder,

Setup is (was - see below) a single server running Mongo/CWMP/UI/FS/NBI.
12G RAM / 2 x 6 core CPU.
VM running on Proxmox.

I have in the meantime split it over 4 different servers.
Server 1 (12G RAM / 2 x 6 core CPU) CWMP/UI/FS/NBI
Server 2 (8G RAM / 2 x 4 core CPU) CWMP
Server 3 (8G RAM / 2 x 4 core CPU) CWMP
Server 4 (8G RAM / 2 x 4 core CPU) Mongo

Nginx on Server 1 is load balancing to CWMP on server 1/2/3
Nginx load balancing is ip based, so the same CPE reaches the same CWMP instance every time
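
To illustrate what "ip based" balancing means here, the following is a minimal sketch of mapping a client IP to a fixed backend. The backend names and the `pick_backend` helper are hypothetical, and the hash is not the one Nginx uses internally; the point is only that a deterministic function of the source address sends the same CPE to the same CWMP instance as long as its IP and the backend list do not change.

```python
import hashlib

# Hypothetical labels for the three CWMP instances described above.
BACKENDS = ["server1", "server2", "server3"]


def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Deterministically map a client IP to one backend.

    Illustrative only: this is not Nginx's hashing scheme.
    """
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Repeated calls with the same address always return the same backend.
first = pick_backend("41.85.21.63")
second = pick_backend("41.85.21.63")
print(first == second)
```

One consequence of this kind of scheme is that CPEs sharing a public address (for example behind carrier NAT) all land on the same instance, so the load is not necessarily spread evenly.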

I’ve also increased the timeouts on all 3 CWMP servers
GENIEACS_DEVICE_ONLINE_THRESHOLD=22000
GENIEACS_CONNECTION_REQUEST_TIMEOUT=20000

Still see a lot of this in “systemctl status genieacs-cwmp”:

exceptionName="Error" exceptionMessage="Lock expired" exceptionStack="Error: Lock expired\n at Wt (/opt/genieacs-source/genieacs/lib/lock.ts:44:37)

exceptionName="Error" exceptionMessage="Cache snapshot does not exist" exceptionStack="Error: Cache snapshot does not exist\n at Bt.get (/opt/genieacs-source/genieacs/lib/local-cache.ts:29:26)

Also a lot of this in /var/log/genieacs/genieacs-cwmp.log:

2024-02-27T13:30:27.871Z [INFO] 41.85.21.63 5895D8-ONT-CXNKD8A85BF8: ACS request; acsRequestId="18deac1dbef0107" acsRequestName="GetParameterNames"
2024-02-27T13:30:27.873Z [ERROR] 41.85.21.63 5895D8-ONT-CXNKD8A85BF8: Connection dropped

Happens on all three servers

Kind Regards,

Johan Meiring

jmeiring · February 27, 2024, 1:36pm

Oh yes. Inform interval was originally set to 10 minutes.
Now updated to 60 minutes.

ONTs are slowly backing off as they successfully connect.

akcoder · February 28, 2024, 5:33pm

You should probably set your inform interval to 6 hours. Unless you have a need for an hourly inform interval, you are just burning up your infrastructure time.
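
As a rough back-of-the-envelope check on why the interval matters, here is a small sketch that estimates the average rate of periodic informs for the 16000 CPEs mentioned in this thread. It assumes informs are spread evenly over the interval, which is an idealisation; the `informs_per_second` helper is made up for illustration.

```python
def informs_per_second(cpe_count: int, interval_seconds: int) -> float:
    """Average periodic informs per second, assuming an even spread."""
    return cpe_count / interval_seconds


CPE_COUNT = 16000  # approximate fleet size from the first post

# The three intervals discussed in the thread.
intervals = [("10 minutes", 600), ("60 minutes", 3600), ("6 hours", 21600)]

for label, seconds in intervals:
    rate = informs_per_second(CPE_COUNT, seconds)
    print(f"{label:>10}: {rate:.2f} informs/s")
```

Under that assumption the fleet averages roughly 26.7 sessions per second at 10 minutes, 4.4 at 60 minutes and 0.74 at 6 hours. Each session involves several requests and database writes, so the real load on CWMP and MongoDB is a multiple of these figures.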
