Solutions

High CPU utilization causing slow traffic on the ProxySG appliance.

Solutions ID:    KB4397
Version:    5.0
Status:    Published
Published date:    04/26/2011
Updated:    05/08/2012
 

Problem Description

The CPU utilization of the SG510 is hitting 100% and Internet access is slow.

The CPU-Monitor is showing that most of the CPU is consumed by the TCPIP process:

CPU 0                                                100%
     TCPIP                                            37%
     HTTP and FTP                                     25%
     Policy evaluation - HTTP                         18%
     Object Store                                     10%
     DNS service                                       4%
     Access Logging                                    3%
     Authentication                                    2%
     Miscellaneous                                     1%

Interface statistics showed that traffic was on the high side for an SG510, peaking at about 20 Mbps. Even when traffic decreased to 8 Mbps, the CPU was still at 100%. The only time the ProxySG appliance's CPU utilization dipped was at the end of the day, when the users left the network.

As a test, access logging, ICAP, and policy tracing were disabled, and DRTR was configured in real-time mode. These changes made no difference in CPU utilization.

There were about 7000 connections in the TCP connection table, and over half of them were in the TIME_WAIT state, even though 2MSL was set to 30 seconds in the configuration.

Resolution

The event log contained many unknown user errors for different client IP addresses. To understand the problem, we added the following policy to the local policy layer:

<Exception>
    ; Log client details whenever one of these exceptions is returned
    exception.id=internal_error action.log_internal_error(yes)
    exception.id=authentication_failed action.log_internal_error(yes)

define action log_internal_error
    ; Message records the user agent, client IP, user, realm, and destination URL
    log_message("you are using $(request.header.User-Agent) from $(client.address) and as user $(user.name) in realm $(realm) going to $(url)")
end action log_internal_error

The above policy adds more detail to the errors logged in the event log. With that detail, it became clear that only a few sites were causing the unknown user errors. In this case, they were ovi.com and nokia.com.
 
Mobile phones were syncing their email with Microsoft Outlook. The requests carried an Internet Explorer user agent, and the users were not logged on to the domain. To fix the issue, we added a rule to bypass authentication for these sites. We also added an allow rule for these sites to the Web Access Layer (in the VPM) and placed it above the group= and user= rules. CPU utilization dropped to 40%.
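
For reference, the two rules described above might look roughly like the following in CPL. This is only a sketch and not the policy that was actually installed: the condition name mobile_sync_sites is hypothetical, and it assumes the existing authentication rule and the group=/user= rules live in earlier <Proxy> layers. In CPL, later layers take precedence over earlier ones, and within a layer the first matching rule wins, so placement relative to the existing policy matters.

```
; Sketch only. "mobile_sync_sites" is a hypothetical name for illustration.

<Proxy>
    ; Do not challenge requests to the mobile sync sites for credentials
    condition=mobile_sync_sites authenticate(no)

<Proxy>
    ; Allow the sync traffic without relying on group= or user= matches
    condition=mobile_sync_sites ALLOW

define condition mobile_sync_sites
    url.domain=ovi.com
    url.domain=nokia.com
end condition mobile_sync_sites
```

If the allow rule is created in the VPM instead, the equivalent is a rule at the top of the Web Access Layer with these two domains as the destination.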
 
The issue was caused by mobile sync traffic not being allowed out. The mobile sync client was not sending authentication headers, so its requests were rejected and the clients kept retransmitting them. As a result, the majority of the ProxySG appliance's CPU was consumed by TCP processing. On single-processor appliances, such as the SG510, this additional traffic can cause a bottleneck.