Symantec Management Platform (Notification Server)


CPU Usage high post SP1 Upgrade?

  • 1.  CPU Usage high post SP1 Upgrade?

    Posted Jun 25, 2014 10:29 AM

    Has anyone else seen this with the NS Server post SP1?

    Before SP1 our servers averaged below 20% CPU utilization.  Now they are almost pegging the CPU consistently.  Reports from a monitoring tool we use show the change lines up exactly with the day I upgraded to SP1.  We have a hierarchy with 3 NS servers and they all show the same symptoms.

     

    Any thoughts on where I should check to see why this is happening?

    The attached image shows the same server after the upgrade (top) and before it (bottom).

     



  • 2.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jun 25, 2014 12:46 PM

    Check your hierarchy health; it could have been broken in the upgrade process. I've had a problem before where hierarchy rules were failing and repeating in a loop.  That said, you should still confirm which process is using the CPU and check the logs and reports for hints.



  • 3.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jun 26, 2014 03:25 PM

    I wound up opening a ticket with support for help.  Waiting to hear from them.  I also found out our SQL server is having a ton of deadlocks and maintenance is failing now.  I don't know how much RAM we were using pre SP1, but now the SQL box is consistently at 75 GB of RAM in use too.



  • 4.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jun 26, 2014 03:38 PM

    Sounds like a good plan.

    Failing maintenance can cause SQL slowness, which can cause deadlocks, which can in turn cause maintenance to fail... You need good SQL skills to fix that.



  • 5.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jun 27, 2014 08:37 AM

    Yeah. Same issue here. CPU has been consistently pegged north of 80% when I haven't been resetting IIS or the Altiris services. Support has been understanding, but I hope they realize that we are in tune with our systems....we NOTICE the slightest of differences and SP1 has brought some major problems. Here are some screenshots of my NS with 80GB of RAM and 16 Cores.

    ALSO...has anyone seen their agents randomly disconnect from the NS? I have seen on MULTIPLE endpoints the SMA say "Disconnected from Notification Server" and then a minute later connect again. Normally I'd think it was a network issue, but this WASN'T happening pre-SP1.

     

     



  • 6.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jun 27, 2014 09:00 AM

    I had this agent disconnect issue pre SP1.  I haven't worked on it yet since it doesn't seem to affect normal operations, but it still needs to be fixed.

    I have a ton of this error, which I think is related to the agent disconnects:

    It was not possible to read POST data from agent 3e6636f5-2d42-4691-aa0a-74fcb7312917. This is expected behaviour when agent changed the network or network related issue just happened. Situation will be resolved when agent will retry to get list of task servers next time.
    -----------------------------------------------------------------------------------------------------
    Date: 6/27/2014 8:52:44 AM, Tick Count: 295837144, Host Name: STHA31884, Size: 557 B
    Process: w3wp (7584), Thread ID: 113, Module: w3wp.exe
    Priority: 2, Source: Altiris.TaskManagement.ClientTask.AgentWeb.GetClientTaskServers.WriteResponseImplementation
    File: C:\ProgramData\Symantec\SMP\Logs\a.log

    I need to open a case for this after some minor investigation, such as ruling out real network problems (which are unlikely).  If you see the same behavior, then yes, I had a similar problem pre SP1.  Maybe it would be better to start one thread per problem to gather comments.
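
    To gauge whether these errors are spread across many agents or concentrated on a few, the log could be tallied per agent GUID. The following is a minimal sketch, assuming only the message wording quoted above; the function name and the sample text are hypothetical, and a real a.log may need different handling (encoding, rotation, multiple files).

```python
import re
from collections import Counter

# Pattern built from the message text quoted in this post.
# The captured group is the agent GUID (8-4-4-4-12 hex digits).
POST_READ_FAILURE = re.compile(
    r"It was not possible to read POST data from agent "
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def count_post_read_failures(log_text: str) -> Counter:
    """Count 'not possible to read POST data' messages per agent GUID.

    Hypothetical helper: it only scans text that has already been read
    into memory and normalizes GUIDs to lower case.
    """
    return Counter(
        match.group(1).lower() for match in POST_READ_FAILURE.finditer(log_text)
    )


if __name__ == "__main__":
    # Illustrative sample only; it reuses the GUID from the entry above.
    sample = (
        "It was not possible to read POST data from agent "
        "3e6636f5-2d42-4691-aa0a-74fcb7312917. This is expected behaviour...\n"
        "Process: w3wp (7584), Thread ID: 113, Module: w3wp.exe\n"
        "It was not possible to read POST data from agent "
        "3e6636f5-2d42-4691-aa0a-74fcb7312917. This is expected behaviour...\n"
    )
    for agent, hits in count_post_read_failures(sample).most_common():
        print(agent, hits)
```

    A handful of agents with very high counts would point at those endpoints or their network path; a flat spread across all agents would point back at the server.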



  • 7.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jun 27, 2014 09:55 AM

    Well, this makes me feel a little better (well, not really), but at least it's not just me, and you have a way more powerful NS than I do.  Support hinted that I needed more power and that's possibly why it was using CPU like that; your server doing the same thing kinda disproves that in my mind.



  • 8.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 02, 2014 10:47 AM

    Yes, we are having the same problems with CPU utilization since our upgrade this past weekend.  

    Just this morning we started having the agent disconnected problem which is how I found this thread. When agents fail to connect, agent logs show "Network Operation: Operation 'Head' failed.  HTTP request Read Headers error, Bad SMP server version".  Note: These are agents on the latest SMP 7.5 agent with sub-agents, fully updated.  

    We have a case open on the CPU issue (since 6/27) but haven't made any progress.  Backline has a profiler trace and logs they are reviewing.  I get the impression in talking with front line, this isn't the only case but they have not been able to give me a workaround or tell me they have identified the cause of the issue.  I did not have these issues in our upgrade on DEV but then again I don't have 13k agents nor the number of filters, targets, SWD policies, patches, etc.  

    I've noticed that when clients are connected, they have a very difficult time updating config.  Often there is an instant reply from the server stating "Policy Request Failed: Unexpected response from the URL: https://[fqdn]/Altiris/NS/Agent/GetClientPolicies.aspx': The server is currently busy or paused (0x8004200C)".  I sometimes see errors in the server log from GetClientPoliciesHandler that show the policy request failed.  COM Exception errcode: 0x8000FFFF.  Errors in the Windows application event log show ASP.NET 2.0 warnings: HTTPException, Request timed out.  

    If anyone has any suggested workarounds or any hints on the possible cause for any of these issues, I'd love to hear them.  If you are having similar problems after 7.1, I'd like to hear that too so we can figure out which of these items are possibly related or not.  My own troubleshooting seems to be pointing to the policy config requests being the root cause of the utilization issue, but it's not much more than a gut feeling at this point.



  • 9.  RE: CPU Usage high post SP1 Upgrade?

    Broadcom Employee
    Posted Jul 02, 2014 02:56 PM

    I upgraded from ITMS 7.5 HF6 to the ITMS 7.5 SP1 release and haven't seen such CPU usage.

    Attached is the AppPool list from my two servers where the upgrade was completed; I see that it differs from your list of AppPools.



  • 10.  RE: CPU Usage high post SP1 Upgrade?

    Broadcom Employee
    Posted Jul 04, 2014 06:59 AM

    Hi,

    If you experience high CPU usage (for the w3wp process, for example) after an SP1 upgrade or install, try changing this Core Setting:

    "SingletonPolicyRequest", from its current value of "1" to the new value "0"

    The core setting file to change is located in (sysdrive)\ProgramData\Symantec\SMP\Settings

    "CoreSettings.config"

    Changes can be made with any simple text editor, such as Notepad.

    This can (potentially) lower CPU usage on some servers.

    Hope this helps,

    Juri.
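
    Before editing anything, it may help to confirm what the setting is currently set to. The sketch below is read-only and makes assumptions that are not confirmed in this thread: it treats the file as XML in which each setting is an element carrying "key" and "value" attributes, and the element names in the sample are hypothetical. The helper name is also made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_core_setting(config_xml: str, key: str):
    """Return the 'value' attribute of the first element whose 'key'
    attribute equals `key`, or None if no such element exists.

    Assumption: settings are stored as key/value attributes on XML
    elements. Nothing is written back; this only reads.
    """
    root = ET.fromstring(config_xml)
    for element in root.iter():
        if element.get("key") == key:
            return element.get("value")
    return None


# Hypothetical sample layout, for illustration only.
SAMPLE = """<configuration>
  <customSetting key="SingletonPolicyRequest" type="local" value="1" />
</configuration>"""

if __name__ == "__main__":
    print(find_core_setting(SAMPLE, "SingletonPolicyRequest"))
```

    To check a real server, the contents of CoreSettings.config would be read from the Settings folder named above and passed in as `config_xml`; take a backup copy of the file before making any manual change.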



  • 11.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 04, 2014 06:11 PM
    Interesting bit of info I just learned. Got a call from a network and security admin. He received a 3AM call about the domain controllers' CPUs being pegged at 100%. Long story short, multiple tools were showing endpoints sending way too much data to the DC. We started disabling the SMA on a few endpoints and all of a sudden they were falling off the list of problematic endpoints spamming the DC. Anyone want to check their AD environment?


  • 12.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 06, 2014 12:02 AM

    Since many of us seem to be suffering similar issues, here is an update of where we stand post SP1:

    • High CPU utilization.  Per recommendation of support we disabled SingletonPolicyRequest, which immediately solved our high CPU problem.  Juri posted additional instructions above.
    • Inability of clients to update config.  Unfortunately fixing CPU did not solve this issue, so they were not related as I originally thought.  This one appears to be caused by patch policies. I removed all my machines from the filter I use for our patch policies and it solved this problem (well, worked around it).  Our logs were bleeding red with all types of errors related to the inability to generate config policies for certain patches.  I have lots of detail on this if anyone is interested, and support is still working on it, but at least I have the ability to deploy software again.
    • Agents randomly disconnecting.  This issue appears to be gone, probably related to one of the two fixes above, but I won't know for sure until the holiday weekend is over and all agents are talking again.
    • Patch and Inventory install / upgrade policies not working as desired.  When the NS policy tries to install one of these agents on a new machine, it tries to install ALL the varieties instead of just the proper one.  For example, unix, mac, x86, x64.  Usually they fail and the right one sticks, but I found at least one case where the 32-bit inventory agent installed on an x64 server and then the agent would crash every time it started.  I need to do some work to figure out how common this problem may be and how to fix it.
    • I've not seen the AD storm Ray mentioned but haven't looked for it yet either.  Can't figure why agents would be querying AD DCs at all!


  • 13.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 08, 2014 08:23 AM

    Joe,

    I did learn through this troubleshooting process that a Basic Inventory does perform LDAP queries to get AD info. Makes sense, but yeah it is interesting.



  • 14.  RE: CPU Usage high post SP1 Upgrade?

    Broadcom Employee
    Posted Jul 08, 2014 10:26 AM

    Just some additional info on Ray's comment: an SMA restart will also cause LDAP requests. So if the SMA is restarted on a lot of clients, or if clients constantly send basic inventory, this could potentially cause a "PDC flooding" issue.

    Thank you,

    Alex.



  • 15.  RE: CPU Usage high post SP1 Upgrade?

    Broadcom Employee
    Posted Jul 08, 2014 02:32 PM

    Please don't apply this setting yet: it can cause policy assignment problems for clients.



  • 16.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 10, 2014 11:11 AM

    So is this something we should not do?  

    I did it and it did lessen my CPU load a lot

     



  • 17.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 10, 2014 12:25 PM

    So I started to see clients getting "service unavailable" from the NS when attempting to get new configuration.

     

    I changed this setting back and they started working again, but my CPU went back to 100%.



  • 18.  RE: CPU Usage high post SP1 Upgrade?
    Best Answer

    Posted Jul 10, 2014 03:56 PM

    Good news:  The cause of the high utilization issue has been identified and a hotfix is being prepared for release.  My source tells me it should be available in SIM by early next week.

    To confirm you are experiencing this issue, check "show expired packages" in your agent.  If you are like me you will see LOTS of disabled packages which should never have shown up on the machine being examined.  For example, on my Win 7 laptop I could see patches for different client and server operating systems and products I don't have installed.  Although these policies were showing as disabled (thankfully), they were included in the client config policy, resulting in a lot of overhead during client config requests. 

    If you have "AeX SWD Status" events enabled in the agent global settings (events tab), the agents would begin reporting status on all these inactive policies, flooding the NS with event notifications.   If you have that setting on, turn it off asap.



  • 19.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 10, 2014 04:06 PM

    Note:  I have the hotfix applied and SingletonPolicyRequest set back to True.  Utilization back around 20% and NS no longer flooded with events.



  • 20.  RE: CPU Usage high post SP1 Upgrade?

    Broadcom Employee
    Posted Jul 10, 2014 04:14 PM

    Hi JoeVan!

    1. Did you see any problems on your NS server or client computers after applying this pointfix?
    2. Are clients now correctly receiving the policies assigned to them?
    3. Did you revert only the "SingletonPolicyRequest" CoreSetting, or did you also re-enable "AeX SWD Status" events?

    Thanks,

    IP.



  • 21.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 10, 2014 05:11 PM

    1) no new issues yet.  only good results.

    2) yes, policies are now correct.  However, historical policies are cached on the client, so to truly see the difference it is necessary to delete the agent config XML and SWD XML and purge the expired items from the wks cache.  at this point I'm not planning on doing that on all agents unless I determine it is necessary.  In my case my SWD policy XML went from 1.7MB to 12KB (145x smaller).  Despite the local caching, the agent config generation process is much faster and what they download is much smaller.

    3) I re-enabled SingletonPolicyRequest.  No problem there.  I do not plan on re-enabling SWD Status.  We do have SWD Package and SWD Execution enabled and use those regularly.  As far as I recall I never used the SWD package event status for anything anyway.  Also, I'm not certain whether, if I turned that back on, agents which have those unwanted SWD statuses cached locally would start reporting on them again.



  • 22.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 11, 2014 02:47 AM

    http://www.symantec.com/docs/TECH222855

    But where can I get this Point fix?



  • 23.  RE: CPU Usage high post SP1 Upgrade?

    Broadcom Employee
    Posted Jul 11, 2014 04:17 AM

    Wonderful! Many thanks for your answers!

    So the remaining problem is to remove unnecessary policies/packages from clients.

    • As far as I can see from the details of a delivered SWU policy, the default is 365 days.

    [Attached image: SWU_Expire2.jpg]

    Do you have the same "Package files deleted after" number of days for expired SWU policies on clients?

    It seems "Windows Patch Remediation Settings" should be considered for purging unnecessary SWU packages.

     

    • But what about the domain controller problem? Is the DC still under high load, or is DC load normal after applying the point fix?

    Thank you!



  • 24.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 11, 2014 04:19 PM

    Igor,  the policies which were inappropriately added to the config file were not actually downloaded by the clients, so there is no concern around purging cached package files.



  • 25.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 11, 2014 04:25 PM

    I just got the point fix from my support rep who is working my case and it appears to have worked for me.  I'm down to 10-20% cpu now vs 100%.  Waiting to make sure it stays that way for a day or two before he closes my case, but I think it is fixed.   

    Now onto my other issues :)

     



  • 26.  RE: CPU Usage high post SP1 Upgrade?

    Posted Jul 17, 2014 02:43 AM

    Same here. High CPU was causing so many 'The server is currently busy or paused (0x8004200C)' errors. After applying the PointFix (from a call to Symantec Support), CPU is down and clients are checking in (sporadically).

    This is not to say that the 'The server is currently busy or paused (0x8004200C)' errors went away. Due to the number of systems needing to be upgraded, I can expect many of these during the upgrade.