System Center Operations Manager Team Blog

Setup and Upgrade in System Center Operations Manager 2012


For the Operations Manager 2012 release, we have made some big changes in the setup and upgrade areas to make the deployment experience simpler, more intuitive, and easier to use. In this post, I’m going to go through some of the major changes in the deployment experience to help you better prepare for OM 2012.

Operations Manager 2012 Setup

If you are familiar with previous versions of Operations Manager, you are likely familiar with the MSI installer-based setup process. One of the biggest changes we have made is to abstract out the MSI installers into a single setup experience, launched via setup.exe. Most components can be installed directly through this setup experience, without having to use MSI installers. By launching setup.exe, you get the bootstrapper screen:

[Screenshot: setup bootstrapper screen]

From the screen, you can use the “Install” link to launch install for components such as the management server, operations console, reporting, and web console. Additionally, the bootstrapper allows you to launch the MSI installers for agent, gateway server, ACS, and ACS for Unix/Linux.

Let’s walk through some of the screens within the setup experience.

[Screenshot: component selection screen]

On this screen, we can select the components to install in setup, as well as view descriptions of the different components. You will notice on this list that the operational database and data warehouse database are not explicitly listed. This is because both the operational database and the data warehouse are now installed by default when installing the first management server in a management group. The operational database has always been a mandatory component in a management group; a new change in 2012 is that the data warehouse is also mandatory. This is done in order to enable key scenarios, such as dashboards, out of the box for all users, even if they have a very simple all-in-one management group.

[Screenshot: prerequisite checker screen]

Another big change from prior releases is that in 2012, the prerequisite checker is integrated right into the setup experience. Additionally, the setup wizard provides resolution guidance for failed prerequisites, to help you resolve potential problems much more easily.

[Screenshot: management server selection screen]

Lastly, this screen, which is shown if you are installing a management server, asks whether you are creating the first management server in a management group or adding a management server to an existing management group. The first option implies that you are creating a new management group, and you will be asked to provide the necessary information to create an operational database and a data warehouse for this management group. If you go with the second option, it implies that you already have a management group, and you are adding an additional management server to it. As you know from a prior post on Topology Changes in 2012, adding a second management server to a management group is all you need in order to enable high availability of the SDK and Configuration services in OM 2012.

Agent Configuration Changes

Aside from the setup experience, another place where we have made major changes for 2012 is agent configuration. In the past, one of the biggest problems has been that there is no easy way to determine which management groups an agent reports to, as well as no easy or automatable way to change (add/remove) the management groups that an agent reports to without going into Add/Remove Programs. The first part of the agent configuration changes is the Agent Control Panel applet, available in the Control Panel under “Operations Manager Agent”.

[Screenshot: Operations Manager Agent control panel applet]

The applet lists all management groups that the agent belongs to, and provides the ability to add and remove management groups.

If you are interested in automating the process of adding or removing management groups from an agent, we have also added the Agent API that allows you to write scripts that can automate the agent configuration process. The Agent API is documented, along with samples on how to use it within scripts; for more information on how to use the Agent Configuration Library, please refer to http://msdn.microsoft.com/en-us/library/hh328987.aspx.
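As a hedged illustration of the kind of script this enables (the management group name, server, and port below are placeholders, and the snippet assumes it runs locally on the agent with administrative rights):

# Minimal sketch using the agent configuration COM object from the documentation linked above.
# 'Contoso_MG' and 'ms01.contoso.com' are placeholder values.
$cfg = New-Object -ComObject 'AgentConfigManager.MgmtSvcCfg'

# List the management groups this agent currently reports to
$cfg.GetManagementGroups() | ForEach-Object { $_.managementGroupName }

# Add a management group (name, management server, port) and reload the agent configuration
$cfg.AddManagementGroup('Contoso_MG', 'ms01.contoso.com', 5723)
$cfg.ReloadConfiguration()

# To remove a management group:
# $cfg.RemoveManagementGroup('Contoso_MG')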

R2 --> 2012 Upgrade

NOTE: OM 2007 R2 --> OM 2012 Beta Upgrade is NOT supported; please do not try out this upgrade in production environments.

The upgrade procedure from OM 2007 R2 to OM 2012 is designed with the mentality of leaving no management group behind; no matter what your R2 management group looks like, there is an upgrade path to bring it forward to OM 2012 without losing your data.

While the specific steps you will use to upgrade will depend on what your management group looks like, the high-level steps for upgrading your management group are the same in all cases:

1. Bring your secondary management servers and gateway servers up to (OM 2012) supported configuration. If needed, move your agents so that they report to servers of supported configuration.

2. Upgrade manually installed agents to OM 2012.

3. Upgrade secondary management servers to OM 2012.

4. Upgrade gateway servers to OM 2012.

5. Upgrade push-installed agents to OM 2012.

6. If the RMS meets supported configuration, run upgrade from RMS, which will upgrade the RMS to an OM 2012 management server, upgrade your database, and upgrade your data warehouse (or add one if one doesn’t exist).

7. If the RMS does not meet supported configuration, run upgrade from an OM 2012 management server, which will upgrade your database and your data warehouse (or add one if one doesn’t exist) and remove the RMS from the management group.

8. Upgrade Reporting server, console, web console, ACS.

The upgrade process can be complicated at times, particularly if your environment is complex and distributed, so the best place to start is with the upgrade flow diagrams available now. A sample is shown below, and the rest of the flow diagrams are available at http://technet.microsoft.com/en-us/systemcenter/om/hh204730.aspx.

[Sample upgrade flow diagram]

One additional point to call out in the OM 2007 R2 --> OM 2012 upgrade process is the Upgrade Helper MP. This MP is designed to help you walk through the upgrade process by showing you, via the monitors and health states you are already used to, whether your management servers, gateway servers, and agents (both Windows and XPlat) have been upgraded.

Relevant Links

Be sure to check out the OM 2012 Beta deployment guide, available now at http://technet.microsoft.com/en-us/library/hh278852.aspx. Also, check out the new supported configurations document for 2012, available at http://technet.microsoft.com/en-us/library/hh205990.aspx. Use the documents to help you get ready for upgrading at release time by making sure that you get your OM 2007 R2 management group up to supported hardware and operating system/service pack requirements as needed; this will help to ensure that your final upgrade goes much more smoothly. Also, for clean installs, use these documents to plan out what you will need for your 2012 management group, being sure to account for having at least 2 management servers to ensure that you have high availability.

I hope this post helps get you ready for building out or upgrading your environments. Stay tuned for the next post in our series about the Operations Manager 2012 Topology, which will cover resource pools.

Thanks,

Nishtha

Disclaimer

This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.


What Gets Monitored with System Center Operations Manager 2012 Network Monitoring


There are many questions about which components of network devices are monitored by System Center Operations Manager 2012 network monitoring. This post will hopefully clear up some of those questions by covering what gets monitored and the conditions under which that monitoring applies. Which components of a network device are monitored depends on three things:

  • What is discovered on the network device
  • Whether monitoring is available out of the box for the discovered component
  • Whether monitoring is enabled out of the box for the discovered component

What Gets Discovered

Devices will be discovered differently depending on the manufacturer, model, and device system Object Identifier (OID). For example, a Cisco Catalyst 3560 will be discovered with interfaces, processors, memory, fans, power supplies, and temperature sensors, but a Cisco 2950 will only get interfaces, processors, and memory, even though the device may have other components like fans, power supplies, and temperature sensors. For other devices, only the interfaces may be discovered and no peripheral components. The best way to determine what is discovered is to open the diagram view on a network node. Below you can see the memory (MEM), processor (PSR), and ports discovered in the diagram view.

Interface Discovery

Interfaces are not all discovered equally. Interfaces that implement the interface MIB (RFC 2863) and MIB-II (RFC 1213) standards can have more monitoring available than other interfaces. This can include OperStatus, AdminStatus, and performance counters like percent utilization and error packets. Devices that don’t implement these MIBs may only have the existence of the interface discovered, or the OperStatus may depend on a vendor-specific MIB. To figure out whether a particular interface is monitored using the standard MIBs, open the Health Explorer on an interface from the diagram view of the device.
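If you want to query these values directly as a sanity check, here is a minimal sketch using the Net-SNMP command-line tools (assumed to be installed separately; the IP address, community string, and interface index are placeholders):

# ifAdminStatus (.1.3.6.1.2.1.2.2.1.7) and ifOperStatus (.1.3.6.1.2.1.2.2.1.8)
# come from the standard interfaces MIB (MIB-II / RFC 2863).
$device    = '10.11.64.25'   # placeholder device address
$community = 'public'        # placeholder community string
$ifIndex   = 1               # interface index to check

& snmpget -v 2c -c $community $device ".1.3.6.1.2.1.2.2.1.7.$ifIndex"   # ifAdminStatus
& snmpget -v 2c -c $community $device ".1.3.6.1.2.1.2.2.1.8.$ifIndex"   # ifOperStatus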

  

Port 1 is able to be monitored using the standard MIBs. Under the rollup monitor “Interface Status” you can see that monitors for AdminStatus and OperStatus are present. Under the Performance monitor you can see that the High Discard rollup monitor contains specific monitors to check the health of the input and output rates the port is processing. However, in the picture you can see that even though the interface can be monitored using the standard MIBs, in this case no monitoring is enabled.

 

Interface 30 on the device is an example of an interface where performance counters will not be collected or monitored. The interface is monitored only through AdminStatus and OperStatus, as seen by the two status monitors under the Interface Status aggregate monitor. Looking under the High Discard Percentage aggregate monitor, you can see that the High Input Rate Discard monitor is missing compared to Port 1 above, indicating that performance counters won’t be collected.

 

 What Gets Monitored

 

Peripheral Component Monitoring

Processor and memory can be monitored out of the box on devices where those components are discovered. http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=26831 has information on which devices will have processor and memory monitored. Other peripherals like fans, power supplies, cards, and temperature sensors will not get monitoring out of the box when discovered. Through the Authoring pane in the Operations Manager console, SNMP-based rules and monitors can be created to monitor these components.

Port and Interface Monitoring

Rather than monitoring every interface out of the box, only those interfaces that were discovered as connected are monitored. This is done to avoid noisy alerts on interfaces that are not connected, and to avoid excess monitoring on interfaces that won’t return a valid performance counter. This means the default state for an interface is to have all monitoring disabled. For interfaces that are known to be connected, monitoring will be enabled, and you can also enable monitoring for any other interfaces you care about. Interfaces are only monitored if they are a member of one of three groups in Operations Manager. All three groups enable all the standard interface workflows, assuming the standard MIBs are supported. These groups will also enable any vendor-specific workflows for the interface.

Relay Network Adapters Group

This group contains interfaces that connect two devices. When a full discovery is run containing both devices, the interfaces linking the two devices are added to this group and monitoring becomes enabled.

Managed Computer Network Adapters Group

An agent computer that is directly connected to a device will have the connecting interface added to this group. For this to work, the management group must have the Windows operating system management pack for the agent’s operating system, the Windows Client Network Discovery management pack, and the Windows Server Network Discovery management pack. The full discovery of the agent’s operating system has to be completed, including the discovery of the agent’s network adapters. Then, when network discovery runs for the device, it should stitch the port on the device to the network adapter on the computer and add the port to this group.
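As a hedged way to confirm that the management packs this stitching depends on are imported (the display-name patterns below are assumptions based on this post; check your console for the exact names):

# Assumes the OM 2012 OperationsManager PowerShell module on a management server.
Import-Module OperationsManager
Get-SCOMManagementPack |
    Where-Object { $_.DisplayName -match 'Network Discovery|Operating System' } |
    Select-Object DisplayName, Version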

Critical Network Adapters Group

This group can be updated through the Operations Manager console under Authoring. You can add any interface to this group and its monitoring will be enabled. For example, if the interface connected to your web server isn’t monitored, adding that interface to this group will give you alerts on problems with it.

Advanced Network Adapters Group

There is also a fourth group, which behaves a little differently than the other three groups mentioned. This group turns on some extra advanced workflows for interfaces that won’t be enabled by the other groups. These workflows are disabled out of the box because they are often a duplication of performance counters that are already collected. These are advanced performance counters, like Cisco collision packets, which are already reported as part of the error packets in the monitoring enabled by the other three groups. If you want visibility into a particular performance metric, adding the interface to this group is one way to get that extra data.

 

Network Monitoring Troubleshooting

When trying to figure out whether your network monitoring is working correctly, ask these questions to see what monitoring is taking effect.

 Did Discovery Work?

Before any monitoring starts, the discovery of the device needs to complete successfully. The network monitoring has very similar dependencies to the discovery methods, as both are SNMP based. Be sure that in the discovery rule you specified that the device should be monitored via SNMP only or via ICMP and SNMP. A future blog post will cover troubleshooting network discovery.

What was Discovered?

The next thing to check is what was discovered on the network device. Use the Diagram view of the network device in the console to see which components of the device were discovered. Use the Health Explorer on the interfaces to see whether performance counter and status monitoring is available. If your device is not getting components discovered and is not getting performance counter monitoring, then the device likely doesn’t support the standard MIBs.

Interface Monitoring Enabled?

Check to see if the interface you want monitored is a member of one of the network monitoring groups. You can view this in the Authoring pane of the console under Groups.
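You can also list the members of one of these groups from PowerShell; here is a minimal sketch assuming the OM 2012 OperationsManager module and the group display names used in this post:

Import-Module OperationsManager
# Swap in 'Relay Network Adapters Group' or 'Managed Computer Network Adapters Group' as needed
$group = Get-SCOMGroup -DisplayName 'Critical Network Adapters Group'
# Interfaces contained in the group (and therefore enabled for monitoring)
$group.GetRelatedMonitoringObjects() | Select-Object DisplayName, HealthState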

  

Network Monitoring Management Pool Availability

When the network discovery rule was created, a management server pool was specified to monitor the devices. By default this will be the All Management Servers pool, but using a specific pool of network monitoring servers is advised. If the network device is behind a firewall or is remote, then using a specific pool will be necessary. Check the discovery rules in the Administration pane of the console to see which management pool should be monitoring the devices. Then check the resource pools in the Administration pane to be sure the management server resource pool only contains servers that can contact the network device. It might be necessary to create multiple network discovery rules and management server resource pools to be sure your network monitoring runs from the correct locations.
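A hedged sketch for checking pool membership from PowerShell (OM 2012 OperationsManager module assumed; the pool display name below is the default and may differ in your environment):

Import-Module OperationsManager
$pool = Get-SCOMResourcePool -DisplayName 'All Management Servers Resource Pool'
# Management servers that are members of the pool doing the network monitoring
$pool.Members | Select-Object DisplayName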

 

11013 Event - SNMP Get Timeout

When the SNMP workflows time out because a reply from the device was not received in time, the Health Service logs 11013 events to the Operations Manager log. With the out-of-the-box workflows, Operations Manager will retry the SNMP query on the next interval. There is a monitor in Operations Manager to detect these events.

 Log Name:      Operations Manager

Source:       Health Service Modules

Event ID:      11013 

SNMP GET request to IP Address 10.11.64.25 has timed out. This can be due to the device being offline or to the workflow using incorrect credentials.

  

Possible Resolutions:

  • Device is offline – bringing the device online will allow the SNMP query to succeed.
  • Device is overloaded – the device is too busy and cannot respond in time.
  • Another device in the path is having issues – misrouted packets, or a congested queue on a device between the Operations Manager server and the target device.
  • Device is “remote” – look at installing a gateway closer to the device and reporting the data back, as opposed to trying to monitor the device from a central management server.

  

11009 Event – SNMP Get failure

When a network device is queried for a particular value that is not present on the device, the Health Service will log an 11009 event in the Operations Manager log. The workflows that were using this value will be unloaded.

 

Log Name:      Operations Manager

Source:       Health Service Modules

Event ID:      11009

Error in SNMP GET response from IP Address: 10.11.64.68, Status: noSuchInstance(129).

 One or more workflows were affected by this. 

 OID: .1.3.6.1.2.1.10.7.2.1.2.268

Workflow name: System.NetworkManagement.MIB2_dot3.NetworkAdapter.InputPacketErrorPct

Instance name: PORT- 268 

Possible Resolutions

  • Stale discovery data – the device has been reconfigured since the last discovery and Operations Manager is attempting to monitor a component that no longer exists on the device.
  • If the device doesn’t support the workflow, then a solution is to disable the workflows utilizing that value for the device. This will prevent these workflows from loading and failing in the future.
  • Possibly a device issue – try updating the firmware and OS on the device.
  • Possibly a discovery issue where the instance is being discovered incorrectly. For example, Operations Manager is expecting to monitor a performance counter but this is a virtual interface and the counters are not present for the interface. Try running a rediscovery of the device.
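To quickly review recent SNMP timeout (11013) and failure (11009) events on the management server running the network monitoring workflows, here is a minimal sketch using the event source shown in the samples above:

# Run in an elevated PowerShell session on the monitoring management server.
Get-WinEvent -FilterHashtable @{
    LogName      = 'Operations Manager'
    ProviderName = 'Health Service Modules'
    Id           = 11013, 11009
} -MaxEvents 50 |
    Select-Object TimeCreated, Id, Message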

 

Disclaimer

 This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.


 

OpsMgr 2007 R2 now supports SQL 2008 R2 SP1


OM Community,

System Center Operations Manager 2007 R2 now supports SQL Server 2008 R2 SP1. Note: We will have the Supported Configuration posted in the next few weeks to make this more official, but feel free to go ahead and install it.

Thanks!

Nishtha Soni | Program Manager

KB: Configuration may not update in System Center Operations Manager 2007


Here’s a new Knowledge Base article we published today. This one talks about troubleshooting an issue where configuration doesn’t update in System Center Operations Manager 2007:

=====

Symptoms

You may experience one or more of the following symptoms in a System Center Operations Manager 2007 Management Group:

  • Newly installed agents display as "Not Monitored" in the Operations Console, yet existing agents are monitored.
  • One or more monitors on one or more agents may not change state when healthy or unhealthy conditions are met.
  • Agents show as being in maintenance mode in the Operations Console, yet the workflows are not actually unloaded by the System Center Management service on the monitored computer.
  • Configuration changes, new rules or monitors, or overrides are not applied to some agents.
  • The Operations Manager event log on one or more agents will display event 21026, indicating that the current configuration is still valid, even though the configuration for these agents should have been updated.
  • The file "OpsMgrConnector.Config.xml" in the management group folder under "Health Service State"\"Connector Configuration Cache" does not update for long periods of time relative to the rest of the management group on one or more agents.
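A quick, hedged way to check the last symptom on an agent is to look at the file’s last write time (the path below assumes the default OpsMgr 2007 agent installation folder):

$cache = Join-Path $env:ProgramFiles 'System Center Operations Manager 2007\Health Service State\Connector Configuration Cache'
Get-ChildItem -Path $cache -Recurse -Filter 'OpsMgrConnector.Config.xml' |
    Select-Object FullName, LastWriteTime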

In addition, the Operations Manager event log may display one or more events with ID 29106 when the System Center Management Configuration service restarts. For example:

Log Name: Operations Manager
Source: OpsMgr Config Service
Event ID: 29106
Level: Warning
Description:
The request to synchronize state for OpsMgr Health Service identified by "da4d36df-ce22-8930-e6d4-45b783e9fdb1" failed due to the following exception "System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.

Log Name: Operations Manager
Source: OpsMgr Config Service
Event ID: 29106
Level: Warning
Description:
The request to synchronize state for OpsMgr Health Service identified by "fc1c815b-c0c4-242d-ae27-30db4ef99b54" failed due to the following exception "Microsoft.EnterpriseManagement.Common.DataItemDoesNotExistException: TypedManagedEntityId = 'ac8f3d08-ee2a-ae21-0e46-19c3da794183' is deleted.

Collecting ETL logs against the Configuration service at INF level might reveal lines similar to those below:

3326 [ConfigurationChangeSetProvider.UpdateQueryTimestampFromResults] [configurationchangesetprovider_cs595]( 000000000343A92F )Timestamp = 04/11/2074 08:57:09.
3327 [DatabaseAccessor.NotifyOnChanges] [databasenotification_cs329]( 0000000002E4BD4E )Firing change notification.
3328 [ConfigurationEngine.DatabaseHelper.OnConfigurationChange] [configurationengine_cs499]( 00000000023546E1 )IsIncremental=True, NumberOfChanges=0
3329 [StateManager.CollectDirty] [statemanager_cs39]( 00000000035D75A8 )State=274cda45-6031-c0e2-3659-0072251f5655 is dirty
< large number of additional GUIDS >
3432 [StateManager.CollectDirty] [statemanager_cs39]( 00000000035D75A8 )State=6ec4fb2d-d1c1-72a8-32e6-fe26df42aba8 is dirty
3433 [StateManager.CollectDirty] [statemanager_cs45]( 00000000035D75A8 )NumberOfDirtyStates=104
3434 [ConfigurationEngine.CommunicationHelper.NotifyDirtyStatesTask.Run] [configurationengine_cs869]Completed successfully
3435 [DatabaseAccessor.GetPollingIntervalMillisecondsTimeSpan] [databaseaccessor_cs126]Database polling interval 0 milliseconds

 

Note that the timestamp in line 3326 is set to 04/11/2074. If this appears in ETL logging, use the SQL queries in the "More Information" section to confirm that the condition listed in the "Cause" section exists.

Cause

The System Center Management Configuration service uses a timestamp to determine when new configuration data needs to be calculated for agents and management servers. If the system clock on an agent is faster than the system clock on the RMS, discovery data from this agent will set the timestamp for one or more managed instances hosted by that agent to the current agent system clock time. The System Center Management Configuration service will delay calculating configuration updates for the instances on that agent until the system clock on the RMS is current with the timestamp for that discovery data. If the agent system clock was significantly faster than RMS system time when discovery data was sent, or the agent continues to send data with a future timestamp, then it is possible that the management group will experience the symptoms listed above.

Setting the agent system clock time to match the RMS system clock time will not reset the timestamp for the existing discovery data, and the issue will remain until the RMS system clock time exceeds the discovery data timestamps by the grooming interval, at which point the obsolete discovery data will be groomed normally.

Resolution

1) The system clocks for all agents and management servers in the management group must not significantly exceed the system clock on the RMS when submitting ANY data. If any agents or management servers have system clocks more than a few minutes faster than the RMS, they should be corrected first to avoid any additional data with future timestamps being added to the database (a quick way to compare clocks is shown after these steps).

2) The future timestamps for the discovery data that has already been submitted must be modified in the OperationsManager database to reflect the current time.

3) The System Center Management Configuration service and the System Center Management service on the RMS must be restarted after both of the above conditions are met.
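For step 1, a quick way to compare a computer's clock against the RMS is the built-in w32tm tool (RMS01 below is a placeholder for your RMS computer name):

# Run on the agent or management server you want to check; the offset against the
# target computer is reported for each sample.
w32tm /stripchart /computer:RMS01 /samples:3 /dataonly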

More Information

1) Use the following three queries to determine if this condition exists. The queries must be run against the OperationsManager database. If the timestamp with the greatest value in the table is greater than the current time (in UTC format), then the condition exists.

Select GetUTCDate()as 'Current Time',
MAX(TimeGeneratedOfLastSnapshot) as 'DiscoverySource Timestamp' from DiscoverySource

Select GetUTCDate()as 'Current Time',
MAX(timegenerated) as 'DiscoverySourceToTypedManagedEntity Timestamp' from DiscoverySourceToTypedManagedEntity

Select GetUTCDate()as 'Current Time',
MAX(timegenerated) as 'DiscoverySourceToRelationship Timestamp' from DiscoverySourceToRelationship

2) The following three queries can be used to determine which computers may have submitted discovery data with a future timestamp. If the system clocks on these agents are not current, set them to current time before taking any additional action.

-- Find all computers with DiscoverySource Timestamp more than one day in future --
Select DisplayName, *
from BaseManagedEntity
where BaseManagedEntityID in
(select BaseManagedEntityId from BaseManagedEntity BME
join DiscoverySource DS on DS.BoundManagedEntityId = BME.BaseManagedEntityId
where DS.TimeGeneratedOfLastSnapshot > DATEADD (d, 1, GETUTCDATE())
and FullName like 'Microsoft.Windows.Computer%')

-- Find all computers with DiscoverySourceToTypedManagedEntity Timestamp more than one day in future --
Select DisplayName, *
from BaseManagedEntity
where BaseManagedEntityID in
(select BaseManagedEntityId from BaseManagedEntity BME
join DiscoverySourceToTypedManagedEntity DSTME on DSTME.TypedManagedEntityId = BME.BaseManagedEntityId
where DSTME.TimeGenerated > DATEADD (d, 1, GETUTCDATE())
and FullName like 'Microsoft.Windows.Computer%')

-- Find all computers with DiscoverySourceToRelationship Timestamp more than one day in future --
Select DisplayName, *
from BaseManagedEntity
where BaseManagedEntityID in
(select BaseManagedEntityId from BaseManagedEntity BME
join DiscoverySource DS on DS.BoundManagedEntityId = BME.BaseManagedEntityId
join DiscoverySourceToRelationship DSR on DSR.DiscoverySourceId = DS.DiscoverySourceId
where DSR.TimeGenerated > DATEADD (d, 1, GETUTCDATE())
and FullName like 'Microsoft.Windows.Computer%')

3) To correct the existing data, run the following commands against the affected tables.

Update DiscoverySource
Set TimeGeneratedOfLastSnapshot = GETUTCDATE()
where TimeGeneratedOfLastSnapshot > GETUTCDATE()

Update DiscoverySourceToTypedManagedEntity
Set TimeGenerated = GETUTCDATE()
where TimeGenerated > GETUTCDATE()

Update DiscoverySourceToRelationship
Set TimeGenerated = GETUTCDATE()
where TimeGenerated > GETUTCDATE()

4) The following query can be used to see what additional data has been submitted to the database with a timestamp in the future. The tables related to maintenance mode should have several rows, assuming there are agents currently in maintenance mode which is scheduled to end at some time. All other tables should have timestamps with the current time, or in the past.

/* */
/* The following query will search all tables in the database */
/* for columns with datetime datatypes. It will then return */
/* the total number of rows in each table that have values */
/* greater than the configured number of days from present. */
/* Times are all in UTC format. The default increment is */
/* 3 days, but can be adjusted as needed. */
/* */
DECLARE @tabname AS sysname;
DECLARE @colname AS sysname;
DECLARE @fcontin AS tinyint;
DECLARE @query AS nvarchar(max);

CREATE TABLE #work
(
TableName sysname,
ColumnName sysname,
NumRows int
);

DECLARE cur_meta CURSOR FOR
SELECT t.Name 'Table',
c.Name 'Column'
FROM sys.columns c
INNER JOIN sys.tables t ON c.object_id = t.object_id
INNER JOIN sys.types y ON c.system_type_id = y.system_type_id
WHERE y.Name = 'datetime';

/* Change the increment in the DATEADD(dd,3,GETUTCDATE()) function */
/* as needed from the default of +3 days from current time */
OPEN cur_meta;
SET @fcontin = 1;
WHILE (@fcontin > 0)
BEGIN
FETCH cur_meta INTO @tabname, @colname;
IF (@@FETCH_STATUS < 0)
BREAK;
PRINT 'Table = '+ @tabname + ', Column = ' + @colname;
SET @query = 'SELECT ''' + @tabname
+ ''', ''' + @colname
+ ''', COUNT(*) FROM ' + QUOTENAME(@tabname)
+ ' WHERE ' + QUOTENAME(@colname) + ' > DATEADD(dd,3,GETUTCDATE())';
INSERT INTO #work
EXECUTE ( @query );
END
CLOSE cur_meta;
DEALLOCATE cur_meta;

SELECT *
FROM #work
ORDER BY 3 DESC;
DROP TABLE #work;

=====

For the most current version of this article please see the following:

2635742 : Configuration may not update in System Center Operations Manager 2007

J.C. Hornbeck | System Center & Security Knowledge Engineer

Get the latest System Center news on Facebook and Twitter:


App-V Team blog: http://blogs.technet.com/appv/
ConfigMgr Support Team blog: http://blogs.technet.com/configurationmgr/
DPM Team blog: http://blogs.technet.com/dpm/
MED-V Team blog: http://blogs.technet.com/medv/
Orchestrator Support Team blog: http://blogs.technet.com/b/orchestrator/
Operations Manager Team blog: http://blogs.technet.com/momteam/
SCVMM Team blog: http://blogs.technet.com/scvmm
Server App-V Team blog: http://blogs.technet.com/b/serverappv
Service Manager Team blog: http://blogs.technet.com/b/servicemanager
System Center Essentials Team blog: http://blogs.technet.com/b/systemcenteressentials
WSUS Support Team blog: http://blogs.technet.com/sus/

The Forefront Server Protection blog: http://blogs.technet.com/b/fss/
The Forefront Endpoint Security blog : http://blogs.technet.com/b/clientsecurity/
The Forefront Identity Manager blog : http://blogs.msdn.com/b/ms-identity-support/
The Forefront TMG blog: http://blogs.technet.com/b/isablog/
The Forefront UAG blog: http://blogs.technet.com/b/edgeaccessblog/

KB: Configuring the SharePoint 2010 Management Pack for System Center Operations Manager


Here’s another new Knowledge Base article we published today. This one goes through some common tips and troubleshooting for the SharePoint 2010 Management Pack for System Center Operations Manager:

=====

Summary

This article describes the steps to troubleshoot or configure the SharePoint 2010 Management Pack for the following scenarios:

  1. No Access to SharePoint Foundation 2010 and SharePoint Server 2010 Databases
  2. Configuring the Run As Account Association
  3. Unable to monitor multiple farms in local domain or remote domains
  4. How to run the Configuration Task
  5. Unable to run the "Configure SharePoint Management Pack" task in System Center 2012 Operations Manager
  6. Unable to monitor SharePoint 2010 Databases
  7. User Education - Isolating Discoveries
  8. User Education - Sync Time Overrides
  9. User Education - Adding Workflows to SharePoint Config file

Scenario 1 - No Access to SharePoint Foundation 2010 and SharePoint Server 2010 databases
Symptom:
  • Computers are populated in the "Unidentified Machines" view in the Operations Manager console under Monitoring -> SharePoint 2010 Products -> Unidentified Machines
  • Several views in the Console under Monitoring -> SharePoint 2010 Products are "blank" or "Not Monitored"; these views include:

    Administration - Not monitored

    Content Databases - Blank

    Diagram View - Not monitored

    Events - Blank

    Farms - Blank

    Performance - Blank

    Servers - Blank

    Service Front Ends - Blank

    Services - Blank

    Shared Services - Blank

    SPHA Rules - Blank

    Web Applications - Blank


Resolution: Set the proper permissions on the SharePoint Foundation and SharePoint Server 2010 databases. Enable debug tracing to determine where errors may occur.

Required Permissions
The required permissions for the configured run as account on an individual SharePoint farm are:
  • Local admin on all SharePoint 2010 Front End and Application Servers
  • Local admin on all SQL machines that host SharePoint 2010 databases
  • Full Farm Administrator rights within SharePoint 2010
  • DBO for all SharePoint databases

NOTE All SharePoint Foundation 2010 and SharePoint Server 2010 databases created during initial setup require the above permissions.
Below is a list of some of the databases in SharePoint Foundation 2010 and SharePoint Server 2010 which require DBO permissions. This is not a complete list as it depends on your specific configuration.
  • Application_Registry_Service
  • Bdc_Service_DB
  • Managed Metadata Service
  • PerformancePoint Service Application
  • Search_Service_CrawlStoreDB
  • Search_Service_DB
  • Search_Service_Application_PropertyStoreDB
  • Secure_Store_Service_DB
  • SharePoint_Config
  • SharePoint_AdminContent
  • StateService
  • User Profile Services Application_ProfileDB
  • User Profile Services Application_SocialDB
  • User Profile Services Application_SyncDB
  • User Profile Services Application_ReportingDB
  • User Profile Services Application_StagingDB
  • WebAnalyticsServiceApplication_ReportingDB
  • WebAnalyticsServiceApplication_StagingDB
  • WordAutomationServices
  • WSS_Content
  • WSS_Logging


NOTE The initial installation account for SharePoint 2010 Foundation and SharePoint 2010 Server already has the necessary permissions required in all databases created during initial installation. It is recommended that you use this installation account to configure the SharePoint Foundation 2010 and SharePoint Server 2010 Management Packs. If requirements for security call for the creation of a new account for the management pack administration and discovery, take into account that you will have to duplicate the same permissions already granted to the SharePoint installation account.

NOTE For a clustered installation of the root management server the SharePointMP.config file must exist in the following directory

For x86 versions of Windows Server operating systems: %Program Files (x86)\System Center Management Packs\Microsoft SharePoint 2010 Products OpsMgr 2007 MP en-us

For 64-bit versions of Windows Server 2008 or Windows Server 2008 R2, copy the SharePointMP.config file from the %ProgramFiles(x86)%\System Center Management Packs directory to the %Program Files\System Center Management Packs directory

To grant a new account full farm administrator rights:

  1. Open SharePoint 2010 Central Administration.
  2. On the left panel click on Security.
  3. In the middle pane right below Users click on “Manage the Farm Administrators Group”.
  4. If the account you used to initially install SharePoint is not already there, add the SharePoint Run As Account to the group.
  5. To add the Operations Manager SharePoint account, in the top left corner click the drop-down arrow next to “New” and choose Add Users.
  6. Click on the small book icon (browse).
  7. Type in the name of the Operations Manager SharePoint Action Account.
  8. Click on the search icon and wait until it returns the Operations Manager SharePoint Action Account.
  9. Click on the Add button.
  10. Click OK.

Enable Debug Tracing
Enabling debug tracing turns on debug tracing on those agent computers that run Windows PowerShell script-based discoveries and SPHA monitors. By default it is turned off. When it is enabled, the script-based discoveries and monitors write debug trace information to the Operations Manager event log channel on all agent computers, and all debug trace events have an event ID of 0.
To enable debug tracing do the following:
  1. In the Operations Console Select Monitoring.
  2. Select SharePoint 2010 Products.
  3. Select Administration view.
  4. On the Actions panel, click the task named “Set DebugTrace for SharePoint Management Pack”. A Run Task window will popup.
  5. To enable debug trace (the default option), click Run. To disable debug trace, click Override.
  6. Set the Enabled parameter value to “False” in the popup dialog.
  7. Click Override to close the dialog.
  8. Click Run.
  9. Wait for the task to finish in Task Status window, and then check the Task Output to ensure that the task completes successfully.
  10. Click Close.


How to use debug tracing
Run the “Set DebugTrace for SharePoint Management Pack” task, rerun the admin task, and then go to the Operations Manager event log channel on the server and check events with ID = 0. Note the timestamps of these events and then check the SharePoint ULS trace log for the corresponding entries.
For more information about the ULS trace log, see the SharePoint Foundation 2010 documentation on TechNet (http://technet.microsoft.com/en-us/sharepoint/ee263910.aspx ).

Configure the "More Secure Option"
The More Secure option will deliver the configured credentials only to the machines specified in this section. The credentials sent will be for the purpose of discovering and monitoring the SharePoint farms specified. The machines in this list should be the same machines specified in the SharePointMP.config file. The requirement is to have all distributed application components listed for each individual farm. This would include front end server and SQL servers that host the SharePoint databases or any component thereof.


To configure the More Secure Option do the following:


Option 1 - Create the Run As Account and configure

  1. Open the Operations Manager Console.
  2. Go to the Administration tab.
  3. Expand the Security node.
  4. Right-click Run As Accounts.
  5. Select Create Run As Account and Click Next.
  6. Set the Run As Account Type as "Windows", give it a Display Name and Click Next.
  7. Enter in the Credentials for the Active Directory Domain User Account and Click Next.
  8. Select "More Secure" option and add all of the servers that are part of the SharePoint farm. This will include all SharePoint Frontend, Application and SQL Servers for that SharePoint farm.
  9. Click Create.

 

Option 2 - Configure an already existing account
  1. Open the Operations Manager Console and navigate to the Administration Tab.
  2. Expand the Run As Configuration node and highlight Accounts.
  3. In the middle pane, open an existing Run As Account under Type: Windows, right-click the account, and choose Properties.
  4. Click on the Distribution Tab.
  5. Select "More Secure" option and add all of the servers that are part of the SharePoint farm. This will include all SharePoint Frontend, Application and SQL Servers for that SharePoint farm.
  6. Click OK.

NOTE When configuring credential distribution, ensure that all the servers that are part of the SharePoint farm are selected and included here. We recommend having one set of Operations Manager servers monitor only one SharePoint farm. We do not recommend having multi-homed agent computers (SharePoint servers that are monitored in multiple Operations Manager management groups).

Scenario 2 - Configuring the Run As Account Association
Symptoms

Several views in the Console under Monitoring -> SharePoint 2010 Products are "blank" or "Not Monitored"; these views include:

Administration - Not monitored

Content Databases - Blank

Diagram View - Not monitored

Events - Blank

Farms - Blank

Performance - Blank

Servers - Blank

Service Front Ends - Blank

Services - Blank

Shared Services - Blank

SPHA Rules - Blank

Web Applications - Blank

The following error message can be seen when the Run As Account association is not configured properly due to a syntax problem.
Example:

The Event Policy for the process started at 10:44:13 PM has detected errors in the output. The 'StdErr' policy expression: 
.+ 
matched the following output: 
Account OpsMgr SharePoint Action Account doesn't exist 
Failed to find RunAs account OpsMgr SharePoint Action Account 
Command executed: "C:\Windows\system32\cmd.exe" /c powershell.exe -NoLogo -NoProfile 
-Noninteractive "$ep = get-executionpolicy; if ($ep -gt 'RemoteSigned') {set-executionpolicy 
remotesigned} & '"C:\Program Files\System Center Operations Manager 2007\Health Service State\
Monitoring Host Temporary Files 32\9687\AdminTask.ps1"' 'SharePointMP.Config'" 
Working Directory: C:\Program Files\System Center Management Packs\ 
One or more workflows were affected by this. 
Workflow name: Microsoft.SharePoint.Foundation.2010.ConfigSharePoint 
Instance name: Microsoft SharePoint 2010 Farm Group 
Instance ID: {B7E9A5AF-62D1-CF79-0AE8-044AE7CECBD7} 
Management group: XXX 
Error Code: -2130771918 (Unknown error (0x80ff0032))

Machines that do not have SharePoint Foundation 2010 or SharePoint Server 2010 installed are discovered as SharePoint 2010 Servers.


Resolution - Configure the Run As Account association, configure the Machine Name association and configure the "More Secure Option".
Configure the "More Secure Option"

 

The More Secure option will deliver the configured credentials only to the machines specified in this section. The credentials sent will be for the purpose of discovering and monitoring the SharePoint farms specified. The machines in this list should be the same machines specified in the SharePointMP.config file. The requirement is to have all distributed application component listed for a specific farm. This would include front end server and SQL servers that host the SharePoint database.
To configure the More Secure Option do the following:


Option 1 - Create the Run As Account and configure

  1. Open the Operations Manager Console.
  2. Go to the Administration tab.
  3. Expand the Security node.
  4. Right-click Run As Accounts.
  5. Select Create Run As Account and Click Next.
  6. Set the Run As Account Type as "Windows", give it a Display Name and Click Next.
  7. Enter in the Credentials for the Active Directory Domain User Account and Click Next.
  8. Select "More Secure" option and add all of the servers that are part of the SharePoint farm. This will include all SharePoint Frontend, Application and SQL Servers for that SharePoint farm.
  9. Click Create.

 

Option 2 - Configure an already existing account
  1. Open the Operations Manager Console and navigate to the Administration Tab.
  2. Expand the Run As Configuration node and highlight Accounts.
  3. In the middle pane, open an existing Run As Account under Type: Windows, right-click the account, and choose Properties.
  4. Click on the Distribution Tab.
  5. Select "More Secure" option and add all of the servers that are part of the SharePoint farm. This will include all SharePoint Frontend, Application and SQL Servers for that SharePoint farm.
  6. Click OK.


NOTE When configuring credential distribution, ensure that all the servers that are part of the SharePoint farm are selected and included here. We recommend having one set of Operations Manager servers monitor only one SharePoint farm. We do not recommend having multi-homed agent computers (SharePoint servers that are monitored in multiple Operations Manager management groups).

Configuring the Run As Account Association
The Run As Account needs to be associated within the SharePoint Management Pack config file. If it is not configured correctly, you will not be able to discover the SharePoint servers.

To configure the SharePointMP.config file:

  1. Navigate to <drive>:\Program Files (x86)\System Center Management Packs\Microsoft SharePoint 2010 Products OpsMgr 2007 MP en-us
    NOTE For 64-bit version of Windows Server 2008 or Windows Server 2008 R2 copy the SharePointMP.config file to the %Program Files\System Center Management Packs from the %ProgramFiles(x86)%\System Center Management Packs directory
  2. Right click the SharePointMP.config file and choose edit
  3. Locate the section as shown below
    Example:
    <Association Account="SharePoint Discovery/Monitoring Account" Type="Agent">
  4. Change this section to reflect the “Display Name” of the Run As Account you have previously configured for the SharePoint farm.
    Now this section should look like this
    <Association Account="SPAdmin" Type="Agent">
    or
    <Association Account="Domain\SPAdmin" Type="Agent">


NOTE Do not confuse this with the actual Active Directory domain user account.


Configuration of Machine Names
Configure the machine names of all the servers that are part of the SharePoint farm so that they match the "More Secure" section of the Run As Account used for the SharePoint 2010 farm.

NOTE To confirm this name run a hostname command from a command prompt on the servers either locally or remotely for each computer that is part of the farm.
To configure the SharePointMP.config file:

  1. Navigate to <drive>:\Program Files (x86)\System Center Management Packs\Microsoft SharePoint 2010 Products OpsMgr 2007 MP en-us
    NOTE For 64-bit version of Windows Server 2008 or Windows Server 2008 R2 copy the SharePointMp.config file to the %Program Files\System Center Management Packs from the %ProgramFiles(x86)%\System Center Management Packs directory.
    NOTE For clustered root management servers, the same procedure must be performed on both nodes of the cluster.
  2. Right click the SharePointMP.config file and choose edit.
  3. Find the section as shown below
    Example:
    <Machine Name="" />
    <Machine Name="" />
    </Association>
  4. Change this section to include the SharePoint Server names for example:
    <Machine Name="SRV1" />
    <Machine Name="SRV2" />
    </Association>

Confirm the Run As Account has been configured
To confirm the Run As Account has been configured:
  1. Open the Operations Manager event log.
  2. Look for event ID 7026 - open this event - this should indicate that the run as account for the SharePoint MP has successfully logged on.

NOTE: An event ID 7000 in the Operations Manager event log indicates that the run as account for the SharePoint MP has failed to log on.
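A hedged way to look for these logon events from PowerShell on a SharePoint server (using the source name shown in the event samples below):

Get-WinEvent -FilterHashtable @{
    LogName      = 'Operations Manager'
    ProviderName = 'HealthService'
    Id           = 7026, 7000
} -MaxEvents 25 |
    Select-Object TimeCreated, Id, Message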

Log Name: Operations Manager
Source: HealthService
Date:
Event ID: 7000
Task Category: Health Service
Level: Error
Keywords: Classic
User: N/A
Computer:
Description:
The Health Service could not log on the RunAs account contoso\spadmin for management group <MGName>. The error is Logon failure: unknown user name or bad password.(1326L). This will prevent the health service from monitoring or performing actions using this RunAs account
Additionally you may also see the following events
Log Name: Operations Manager
Source: HealthService
Date:
Event ID: 7021
Task Category: Health Service
Level: Error
Keywords: Classic
User: N/A
Computer:
Description:
The Health Service was unable to validate any user accounts in management group <MGName>.

Log Name: Operations Manager
Source: HealthService
Date:
Event ID: 7015
Task Category: Health Service
Level: Error
Keywords: Classic
User: N/A
Computer:
Description:
The Health Service cannot verify the future validity of the RunAs account contoso\spadmin for management group <MGName>. The error is Logon failure: unknown user name or bad password.(1326L).


Scenario 3 - Unable to monitor multiple farms in local domain or remote domains


Symptom:

Only one server farm is discovered as seen from the Monitoring -> SharePoint 2010 Products -> Farms state view. Servers for other farms show up in the Monitoring -> SharePoint 2010 Products -> Unidentified Machines state view


Resolution: Configure the SharePointMP.config to discover more than one server farm


Required Permissions
The required permissions for each individual SharePoint farm run as account are:

  • Local admin on all SharePoint Front End and Application servers
  • Local admin on all SQL boxes that host SharePoint 2010 Databases
  • Full Farm Administrator rights within SharePoint 2010
  • DBO for all SharePoint databases


NOTE The initial installation account for SharePoint 2010 Foundation and SharePoint 2010 Server already has the necessary permissions required in all databases created during initial installation. It is recommended that you use this installation account to configure the SharePoint Foundation 2010 and SharePoint Server 2010 Management Packs. If requirements for security call for the creation of a new account for the management pack administration and discovery, take into account that you will have to duplicate the same permissions already granted to the SharePoint installation account.


For a clustered installation of the root management server the SharePointMP.config file must exist in the same directory as described above in each individual node of the cluster.


Configure the "More Secure Option"
The More Secure option will deliver the configured credentials only to the machines specified in this section.
The credentials sent will be for the purpose of discovering and monitoring the SharePoint farms specified. The machines in this list should be the same machines specified in the SharePointMP.config file. The requirement is to have all distributed application component listed for a specific farm. This would include front end server and SQL servers that host the SharePoint database.


To configure the More Secure Option do the following:

Option
1 - Create the Run As Account and configure

  1. Open the Operations Manager Console.
  2. Go to the Administration tab.
  3. Expand the Security node.
  4. Right-click Run As Accounts.
  5. Select Create Run As Account and Click Next.
  6. Set the Run As Account Type as "Windows", give it a Display Name and Click Next.
  7. Enter in the Credentials for the Active Directory Domain User Account and Click Next.
  8. Select "More Secure" option and add all of the servers that are part of the SharePoint farm. This will include all SharePoint Frontend, Application and SQL Servers for that SharePoint farm.
  9. Click Create.

 

Option 2 - Configure an already existing account
  1. Open the Operations Manager Console and navigate to the Administration Tab.
  2. Expand the Run As Configuration node and highlight Accounts.
  3. In the middle pane, open an existing Run As Account under Type: Windows, right-click the account, and choose Properties.
  4. Click on the Distribution Tab.
  5. Select "More Secure" option and add all of the servers that are part of the SharePoint farm. This will include all SharePoint Frontend, Application and SQL Servers for that SharePoint farm.
  6. Click OK.


NOTE When configuring credential distribution, ensure that all the servers that are part of the SharePoint farm are selected and included here. We recommend having one set of Operations Manager servers monitor only one SharePoint farm. We do not recommend having multi-homed agent computers (SharePoint servers that are monitored in multiple Operations Manager management groups).


Example Scenario: You have 3 farms residing in 2 different domains.


Contoso - SharePoint Farm Administrator 1 is associated with the farm administrator account for the first SharePoint farm in contoso.com domain and uses the Domain Account SPADMIN1


Contoso - SharePoint Farm Administrator 2 is associated with the farm administrator account for the second SharePoint farm in contoso.com domain and uses the Domain Account SPADMIN2
Fabrikam - SharePoint 2010 Farm Administrator is associated with the farm administrator account for the third SharePoint farm in fabrikam.com domain and uses the Domain Account FKSPADMIN


NOTE For the remote domain Fabrikam.com, it is assumed that you have a reliable link using an Operations Manager gateway server or a two-way full trust between the domains.


Use the display name of the Run As Account as shown under Administration -> Run As Configuration -> Accounts -> Type: Windows.
To configure the SharePointMP.config file:

  1. Navigate to the:
    Drive:\Program Files (x86)\System Center Management Packs\Microsoft SharePoint 2010 Products OpsMgr 2007 MP en-us
    NOTE For 64-bit version of Windows Server 2008 or Windows Server 2008 R2 copy the SharePointMP.config file to the %Program Files\System Center Management Packs from the %ProgramFiles(x86)%\System Center Management Packs directory
    NOTE For a clustered installation of the root management server the SharePointMP.config must exist in the same directory as described above in each individual node of the cluster.
  2. Right click the SharePointMP.config file and choose edit.
  3. Find the "Association" and "Machine Name" section in the SharePointMP.config file as shown in the example below
    </Annotation>
      <Association Account="SharePoint Discovery/Monitoring Account" Type="Agent">
        <Machine Name="" />
      </Association>
  4. Change the "Association" and "Machine Name" section to read as followed in this example:

<Association Account="Contoso - SharePoint Farm Administrator 1" Type="Agent"> <Machine Name="Contoso1" /> <Machine Name="Contoso2" /> <Machine Name="Contoso3" /> <Machine Name="Contoso4" /> <Machine Name="Contoso5" /> <Machine Name="Contoso6" /> </Association> <Association Account="Contoso - SharePoint Farm Administrator 2" Type="Agent"> <Machine Name="Constosrv1" /> <Machine Name="Constosrv2" /> <Machine Name="Constosrv3" /> </Association> <Association Account="Fabrikam - SharePoint 2010 Farm Administrator" Type="Agent"> <Machine Name="Fabrikam1" /> <Machine Name="Fabrikam2" /> <Machine Name=" Fabrikam3" /> </Association>

Confirm the Run As Account has been configured
To confirm the Run As Account has been configured:

  1. Open the Operations Manager event log.
  2. Look for event ID 7026 - open this event - this should indicate that the run as account for the SharePoint MP has successfully logged on.

 

NOTE An event ID 7000 in the Operations Manager event log indicates that the run as account for the SharePoint MP has failed to log on.


Log Name: Operations Manager
Source: HealthService
Date:
Event ID: 7000
Task Category: Health Service
Level: Error
Keywords: Classic
User: N/A
Computer: SRV1.contoso.com
Description:
The Health Service could not log on the RunAs account contoso\spadmin for management group <MGNAME>. The error is Logon failure: unknown user name or bad password.(1326L). This will prevent the health service from monitoring or performing actions using this RunAs account
Additionally you may also see the following events


Log Name: Operations Manager
Source: HealthService
Date:
Event ID: 7021
Task Category: Health Service
Level: Error
Keywords: Classic
User: N/A
Computer: SRV1.contoso.com
Description:
The Health Service was unable to validate any user accounts in management group <MGNAME>.


Log Name: Operations Manager
Source: HealthService
Date:
Event ID: 7015
Task Category: Health Service
Level: Error
Keywords: Classic
User: N/A
Computer: SP2010SRV1.contoso.com
Description:
The Health Service cannot verify the future validity of the RunAs account contoso\spadmin for management group <MGNAME>. The error is Logon failure: unknown user name or bad password.(1326L).


Scenario 4 - How to Run the Configuration Task


Symptom: Unable to run the configuration task; the following error(s) are generated


Example 1

Exception calling ".ctor" with "1" argument(s): "The user Contoso\SPAdmin does not have sufficient permission to perform the operation." 
Failed to connect to local management group 
Command executed: "C:\Windows\system32\cmd.exe" /c powershell.exe -NoLogo -NoProfile -Noninteractive "$ep = get-executionpolicy; if 
($ep -gt 'RemoteSigned') {set-executionpolicy remotesigned} & '"C:\Program Files\System Center Operations Manager 2007\Health Service 
State\Monitoring Host Temporary Files 49\5037\AdminTask.ps1"' 'SharePointMP.Config'" 
Working Directory: C:\Program Files\System Center Management Packs\ 
One or more workflows were affected by this. 
Workflow name: Microsoft.SharePoint.Foundation.2010.ConfigSharePoint 
Instance name: Microsoft SharePoint 2010 Farm Group 
Instance ID: {B7E9A5AF-62D1-CF79-0AE8-044AE7CECBD7} 
Management group: XXX
Error Code: -2130771918 (Unknown error (0x80ff0032)).

Example 2
The Event Policy for the process started at 10:44:13 PM has detected errors in the output. The 'StdErr' policy expression: 
.+ 
matched the following output: 
Account OpsMgr SharePoint Action Account doesn't exist 
Failed to find RunAs account OpsMgr SharePoint Action Account 
Command executed: "C:\Windows\system32\cmd.exe" /c powershell.exe -NoLogo -NoProfile -Noninteractive "$ep = get-executionpolicy; if 
($ep -gt 'RemoteSigned') {set-executionpolicy remotesigned} & '"C:\Program Files\System Center Operations Manager 2007\Health Service 
State\Monitoring Host Temporary Files 32\9687\AdminTask.ps1"' 'SharePointMP.Config'" 
Working Directory: C:\Program Files\System Center Management Packs\ 
One or more workflows were affected by this. 
Workflow name: Microsoft.SharePoint.Foundation.2010.ConfigSharePoint 
Instance name: Microsoft SharePoint 2010 Farm Group 
Instance ID: {B7E9A5AF-62D1-CF79-0AE8-044AE7CECBD7} 
Management group: XXX 
Error Code: -2130771918 (Unknown error (0x80ff0032)).


Resolution: Add the Run As Account to the Operations Manager Administrators Role


To add the Run As Account being used to execute the task

  1. Open the Operations Console.
  2. Navigate to Administration.
  3. Click on Security.
  4. Click on User Roles.
  5. Click on Operations Manager Administrators.
  6. Add the account running the task as part of the Operations Manager Administrators role.
    NOTE On the 64-bit versions of Windows Server 2008 and Windows Server 2008 R2, copy the SharePointMP.Config file from the %ProgramFiles(x86)%\System Center Management Packs directory to the %ProgramFiles%\System Center Management Packs directory; a sketch of this copy follows below.
    NOTE For a clustered installation of the root management server, the SharePointMP.Config file must exist in that same directory on each individual node of the cluster.
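
The copy described in the notes above can be scripted; the following is a minimal PowerShell sketch, assuming the default install locations (adjust the paths if the management pack files were placed elsewhere):

# Copy SharePointMP.Config from the x86 location to the location the admin task expects
# on 64-bit Windows Server 2008 / Windows Server 2008 R2 (default install paths assumed).
$source      = Join-Path ${env:ProgramFiles(x86)} 'System Center Management Packs\SharePointMP.Config'
$destination = Join-Path $env:ProgramFiles 'System Center Management Packs'
Copy-Item -Path $source -Destination $destination -Force
# For a clustered root management server, repeat the copy on each node of the cluster.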


Configure SharePoint Management Pack Task

The admin task configures the management pack by ensuring the existence of an override management pack, associating 'RunAs' account(s) to servers, enabling proxy settings, and initiating discoveries.


To run the "Configure SharePoint Management Pack" task do the following

  1. Open the Operations Manager Console.
  2. Click on the Monitoring tab of the console.
  3. Expand the SharePoint 2010 Products view.
  4. Click on the Administration state view.
  5. On the Actions Pane, under Microsoft SharePoint 2010 Farm Group Tasks, click on the Configure SharePoint Management Pack.
  6. Select the appropriate task credentials (preferably the SharePoint Admin Run As Account you have previously set up).
  7. Click Run and wait for the task to finish successfully.
  8. Click Close


Example of successful task


Configure SharePoint Management Pack Task Description
This admin task configures the management pack by ensuring the existence of an override management pack, associating 'RunAs' account(s) to servers, enabling proxy settings, and initiating discoveries.
Status:Success
Scheduled Time:
Start Time:
Submitted By:CONTOSO\SPADMIN
Run As:
Run Location:
Target:
Target Type:Microsoft SharePoint 2010 Farm Group
Category:Operations
Task Output:
Output
Load configuration file SharePointMP.Config
Configure Microsoft.SharePoint.Foundation.2010 version 14.0.4744.1000
Found override management pack Microsoft.SharePoint.Foundation.2010.Override version 1.0.0.0
Change 'SyncTime' configuration override to 20:06 for Microsoft.SharePoint.Foundation.2010.WSSInstallation.Discovery
Microsoft.SharePoint.Foundation.2010.WSSInstallation.Discovery does not have configuration TimeoutSeconds
Change 'SyncTime' configuration override to 20:08 for Microsoft.SharePoint.Foundation.2010.SPFarm.Discovery
Change 'SyncTime' configuration override to 20:14 for Microsoft.SharePoint.Foundation.2010.SPService.Discovery
Change 'SyncTime' configuration override to 20:20 for Microsoft.SharePoint.Foundation.2010.SPSharedService.Discovery
Change 'SyncTime' configuration override to 20:26 for Microsoft.SharePoint.Foundation.2010.SPHARule.Discovery
Change 'SyncTime' configuration override to 20:32 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.Availability
Change 'SyncTime' configuration override to 20:32 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.Security
Change 'SyncTime' configuration override to 20:32 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.Performance
Change 'SyncTime' configuration override to 20:32 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.Configuration
Change 'SyncTime' configuration override to 20:32 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.Custom
Change 'SyncTime' configuration override to 20:38 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.SPServer.Availability
Change 'SyncTime' configuration override to 20:38 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.SPServer.Security
Change 'SyncTime' configuration override to 20:38 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.SPServer.Performance
Change 'SyncTime' configuration override to 20:38 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.SPServer.Configuration
Change 'SyncTime' configuration override to 20:38 for Microsoft.SharePoint.Foundation.2010.SPHARuleMonitor.SPServer.Custom
SharePoint management pack configuration completed successfully
Error
None
Exit Code: 0


Scenario 5 - Unable to run the "Configure SharePoint Management Pack" task in System Center Operations Manager 2012

Symptom: The following error message is generated

The Event Policy for the process started at 6:51:29 PM has detected errors in the output. The 'StdErr' policy expression: 
.+ 
matched the following output: 
Exception calling "ImportManagementPack" with "1" argument(s): "This method from the System Center Operations Manager 2007 
R2 SDK is not supported to work with System Center Operations Manager 2012. Please migrate to the System Center Operations Manager 2012 SDK." 
Failed to create override management pack Microsoft.SharePoint.Foundation.2010.Override 
Command executed: "C:\Windows\system32\cmd.exe" /c powershell.exe -NoLogo -NoProfile -Noninteractive "$ep = get-executionpolicy; 
if ($ep -gt 'RemoteSigned') {set-executionpolicy remotesigned} & '"E:\Program Files\System Center 2012\Operations Manager\Server\Health Service 
State\Monitoring Host Temporary Files 11\7481\AdminTask.ps1"' 'SharePointMP.Config'" 
Working Directory: C:\Program Files\System Center Management Packs\ 
One or more workflows were affected by this. 
Workflow name: Microsoft.SharePoint.Foundation.2010.ConfigSharePoint 
Instance name: Microsoft SharePoint 2010 Farm Group 
Instance ID: {B7E9A5AF-62D1-CF79-0AE8-044AE7CECBD7} 
Management group: XXX
Error Code: -2130771918 (Unknown error (0x80ff0032)).


Resolution: Contact Microsoft Support to obtain an updated version of the management pack that is compatible with the System Center Operations Manager 2012 SDK.


Scenario 6 - Unable to monitor SharePoint 2010 Databases


Symptom:

  • Critical alerts are generated in the active alerts view under Monitoring -> SharePoint 2010 Products -> Active Alerts

SharePoint: Database Connection Failed Alert Description
Source: Configuration Database A critical incident has occurred where the connection to database Data Source=sp2010srv2;Initial Catalog=SharePoint_Config;Integrated Security=True;Enlist=False;Connect Timeout=15 failed.
Path: Configuration Database
Alert Monitor: SQL Database Connection Failed
Created:
Alert Context:
Date and Time
HRESULT -2147217805
Result Data Source could not be initialized
Error Message Format of the initialization string does not conform to the OLE DB specification.
Initialization Time 23
Open Time 0
Execution Time 0
Fetch Time 0
Result Set Input Data Item
  • The SQL Database Connection Failed monitors are showing critical under the following views (Monitoring -> SharePoint 2010 Products):
    • Configuration Databases
    • Content Databases
    • Shared Services
    • Diagram View

Resolution: Create a new override for the connection string value on the SQL Database Connection Failed monitors
To create the needed override, do the following:

1. From the Monitoring -> SharePoint 2010 Products ->Active Alerts view select an affected monitor
2. Under the Alert Details (bottom pane) take notice of the Alert Description. It should look like this


Example:
Alert Description
A critical incident has occurred where the connection to database Data Source=sp2010srv2;Initial Catalog=SharePoint_Config;Integrated Security=True;Enlist=False;Connect Timeout=15 failed


3. Copy and paste the text to a text editor such as notepad.exe
4. Right-click the monitor once again and select View or edit the settings of this monitor
5. In the SQL Database Connection Failed Properties window, select the Overrides tab and click the Override button
6. Select the option "For a specific object of class: XXX"


Example:
For a specific object of class: SharePoint Configuration Database


7. In the Select Object under matching objects select the appropriate matching object and click OK


Example:
Configuration Database


8. Override ConnectionString parameter value from


Example:
Provider=SQLOLEDB;$Target/Property[Type="Microsoft.SharePoint.Foundation.2010.SPDatabase"]/ConnectionString$


To


Provider=SQLOLEDB;Data Source=SP2010srv2;Initial Catalog=SharePoint_Config;Integrated Security=SSPI;Enlist=False;Connect Timeout=15


9. Create a new override management pack, or save to an existing override management pack other than the Default Management Pack, and save the changes by clicking OK


NOTE Since each individual database needs its own unique connection string that corresponds to its database name (Initial Catalog), you will need to modify the connection string previously copied from the alert description of the monitor and change Integrated Security from True to SSPI


Examples:
Data Source=sp2010srv2;Initial Catalog=SharePoint_Config;Integrated Security=SSPI;Enlist=False;Connect Timeout=15
Data Source=sp2010srv2;Initial Catalog=SharePoint_AdminContent_0ada3e0b-a0f6-4af5-a311-34bcedb1c4eb;Integrated Security=True;Enlist=False;Connect Timeout=15
Data Source=sp2010srv2;Initial Catalog=WSS_Content;Integrated Security=SSPI;Enlist=False;Connect Timeout=15
Data Source=sp2010srv2;Initial Catalog=Bdc_Service_DB_17ab85413d424b84ac58ea247e7f5b47;Integrated Security=SSPI;Enlist=False;Connect Timeout=15
Data Source=sp2010srv2;Initial Catalog=Search_Service_Application_CrawlStoreDB_04e2a4bcdb974275954c0ab090d8a0aa;Integrated Security=SSPI;Enlist=False;Connect Timeout=15


User Education - Sync Time Overrides


We recommend keeping the default values in place for sync time. If the default values are not appropriate for your environment, take special consideration of the performance impact that changing these values may cause.
SyncTime overrides are particularly useful during failed discovery troubleshooting. By overriding the default values you can configure the start time of different workflows and isolate discovery problems.
SyncTime (start time) is a property of discoveries and monitors. SyncTime is a string value in the format of "HH:mm". SyncTime, IntervalSeconds and Management Pack Import time together determine the exact run time of a given workflow.
The BaseStartTime attribute can have value in the form of "HH:mm" or integer. "HH:mm" format works as the start time alignment based on which
the cycle repeats. Integer format functions as setting the alignment start time to be the current time plus that many seconds. Be aware that
if you set integer value, every time you rerun the admin task, the cycle start time is recalculated.
The Length attribute specifies the length (in seconds) of each cycle.
The Spacing attribute specifies the spacing time (in seconds) between one workflow's timeout time and the next workflow's start time.
For example, if IntervalSeconds = 21600 (6 hours) and SyncTime = "01:15", the possible run time of the workflow is 1:15AM, 7:15AM, 1:15PM, 7:15PM; if the Management Pack is imported after 1:15AM but before 7:15AM, it will start at 7:15AM, if the Management Pack is imported after 1:15PM but before 7:15PM, it will start at 7:15PM. However, due to other factors such as network delay the actual start time may still vary. Do not change the default SyncTime value unless absolutely required.
So if you imported the MP at 03:00 PM, the interval is set to every 8 hours (28,800 seconds), and you configured the sync time to match the import time (03:00 PM), then the first sync will occur at 11:00 PM, that is, 8 hours after the sync time that was in effect when you imported the MP.
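
To illustrate the arithmetic, here is a minimal PowerShell sketch that computes the first run time of a workflow from a SyncTime alignment and IntervalSeconds; the values are the example values from the paragraph above (SyncTime "01:15", 6-hour interval) plus an assumed import time, not values read from the management group:

# Compute the first run time of a workflow from its SyncTime alignment and IntervalSeconds,
# given the time the management pack was imported (example values from the text above).
$syncTime        = [datetime]::ParseExact('01:15', 'HH:mm', $null)   # SyncTime "01:15" (today)
$intervalSeconds = 21600                                             # 6 hours
$importTime      = Get-Date -Hour 3 -Minute 30 -Second 0             # assume the MP was imported at 03:30 AM

$nextRun = $syncTime
while ($nextRun -lt $importTime) {
    $nextRun = $nextRun.AddSeconds($intervalSeconds)                 # walk forward along the alignment
}
'First scheduled run: {0:HH:mm}' -f $nextRun                         # 07:15 for these example values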
Possible error messages when not configuring this properly are shown below
Example 1

The Event Policy for the process started at 6:46:08 PM has detected errors in the output. The 'StdErr' policy expression: 
.+ 
matched the following output: 
Cycle length 60 is not long enough to ensure the order of workflows 
Please change cycle length to no less than 360 or decrease times, timeout values, and/or spacing 
Command executed: "C:\Windows\system32\cmd.exe" /c powershell.exe -NoLogo -NoProfile -Noninteractive "$ep = get-executionpolicy; if 
($ep -gt 'RemoteSigned') {set-executionpolicy remotesigned} & '"C:\Program Files\System Center Operations Manager 2007\Health Service 
State\Monitoring Host Temporary Files 22\9315\AdminTask.ps1"' 'SharePointMP.Config'" 
Working Directory: C:\Program Files\System Center Management Packs\ 
One or more workflows were affected by this. 
Workflow name: Microsoft.SharePoint.Foundation.2010.ConfigSharePoint 
Instance name: Microsoft SharePoint 2010 Farm Group 
Instance ID: {B7E9A5AF-62D1-CF79-0AE8-044AE7CECBD7} 
Management group: XXX 
Error Code: -2130771918 (Unknown error (0x80ff0032)).
Example 2
The Event Policy for the process started at 6:42:01 PM has detected errors in the output. The 'StdErr' policy expression: 
.+ 
matched the following output: 
Cycle length must be in whole minutes (times of 60) 
Length value 500 is undefined or invalid 
Command executed: "C:\Windows\system32\cmd.exe" /c powershell.exe -NoLogo -NoProfile -Noninteractive "$ep = get-executionpolicy; if 
($ep -gt 'RemoteSigned') {set-executionpolicy remotesigned} & '"C:\Program Files\System Center Operations Manager 2007\Health Service 
State\Monitoring Host Temporary Files 21\9314\AdminTask.ps1"' 'SharePointMP.Config'" 
Working Directory: C:\Program Files\System Center Management Packs\ 
One or more workflows were affected by this. 
Workflow name: Microsoft.SharePoint.Foundation.2010.ConfigSharePoint 
Instance name: Microsoft SharePoint 2010 Farm Group 
Instance ID: {B7E9A5AF-62D1-CF79-0AE8-044AE7CECBD7} 
Management group: XXX 
Error Code: -2130771918 (Unknown error (0x80ff0032)).


User Education - Isolating Discoveries


The following example sets the run time of the discovery to run 5 minutes after running the configuration task for a single workflow that has been failing.


</Annotation>
<WorkflowCycle BaseStartTime="+5" Length="6240" Spacing="15">
<Workflow Id="SPFarm.Discovery" Type="Discovery" Times="1" />


If you start this procedure at 7:35 PM, configure the override as follows (viewed from the Authoring -> Management Pack Objects -> Overrides view in the console) so that the discovery starts at 7:40 PM:
SyncTime Override Value = 19:40
Interval Seconds = 6240


Enable Debug Tracing


Enabling debug tracing turns on the debug trace on the agent computers that run Windows PowerShell script based discoveries and SPHA monitors. By default it is turned off. When it is enabled, the script based discoveries and monitors write debug trace information to the Operations Manager channel of the event log on all agent computers, and all of the debug trace events have an event ID of 0.
To enable debug tracing do the following:

  1. In the Operations Console Select Monitoring.
  2. Select SharePoint 2010 Products.
  3. Select Administration view.
  4. On the Actions panel, click the task named “Set DebugTrace for SharePoint Management Pack”. A Run Task window will popup.
  5. To enable debug trace (the default option), click Run. To disable debug trace, click Override.
  6. Set the Enabled parameter value to “False” in the popup dialog.
  7. Click Override to close the dialog.
  8. Click Run.
  9. Wait for the task to finish in Task Status window, and then check the Task Output to ensure that the task completes successfully.
  10. Click Close.


How to use debug tracing


Run the “Set DebugTrace For SharePoint Management Pack” task, rerun the admin task, and then go to the Operations Manager event channel on the server and check events with ID = 0. Note the timestamp in the event log, and then check the SharePoint ULS trace log for the corresponding entries.
For more information about the ULS trace log, see the SharePoint Foundation 2010 documentation on TechNet (http://technet.microsoft.com/en-us/sharepoint/ee263910.aspx ).
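
If you prefer to query the log from PowerShell rather than scrolling through the Event Viewer, here is a minimal sketch for pulling the debug trace events (event ID 0) from the Operations Manager channel on an agent computer:

# List the most recent SharePoint MP debug trace events (event ID 0) from the Operations Manager log.
Get-WinEvent -FilterHashtable @{ LogName = 'Operations Manager'; Id = 0 } -MaxEvents 50 |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, ProviderName, Message -AutoSize -Wrap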


For Advanced Users:


For further troubleshooting of failed discoveries you can use the Operations Manager 2007 R2 Workflow Analyzer, which is part of the Operations Manager 2007 R2 MP Authoring Resource Kit (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=18222).
You can also enable diagnostic tracing in System Center Operations Manager 2007 (http://support.microsoft.com/kb/942864).

User Education - Adding Workflows to SharePoint Config file

If you want to add workflows to discover both SharePoint Foundation 2010 and SharePoint 2010 Products, start from the default WorkflowCycle section:
<WorkflowCycle BaseStartTime="+300" Length="28800" Spacing="60">
<Workflow Id="WSSInstallation.Discovery" Type="Discovery" Times="1" />
<Workflow Id="SPFarm.Discovery" Type="Discovery" Times="1" />
<Workflow Id="SPService.Discovery" Type="Discovery" Times="4" />
<Workflow Id="SPSharedService.Discovery" Type="Discovery" Times="4" />
<Workflow Id="SPHARule.Discovery" Type="Discovery" Times="1" />
<Workflow Id="SPHARuleMonitor.Availability;SPHARuleMonitor.Security;SPHARuleMonitor.Performance;SPHARuleMonitor.Configuration;SPHARuleMonitor.Custom" Type="Monitor" Times="8" />
<Workflow Id="SPHARuleMonitor.SPServer.Availability;SPHARuleMonitor.SPServer.Security;SPHARuleMonitor.SPServer.Performance;SPHARuleMonitor.
SPServer.Configuration;SPHARuleMonitor.SPServer.Custom" Type="Monitor" Times="8" />
</WorkflowCycle>


Add the following section to the SharePointMp.config file
<Workflow Id="MOSSInstallation.Discovery;WACInstallation.Discovery;SearchExpressInstallation.Discovery;SearchStandardInstallation.Discovery" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="1" />
<Workflow Id="SPService.Discovery" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="4" />
<Workflow Id="SPSharedService.Discovery" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="4" />
<Workflow Id="SPSharedService.Discovery.WAC" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="4" />


The configuration file should now look like this
<WorkflowCycle BaseStartTime="+300" Length="28800" Spacing="60">
<Workflow Id="WSSInstallation.Discovery" Type="Discovery" Times="1" />
<Workflow Id="MOSSInstallation.Discovery;WACInstallation.Discovery;SearchExpressInstallation.Discovery;SearchStandardInstallation.Discovery" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="1" />
<Workflow Id="SPFarm.Discovery" Type="Discovery" Times="1" />
<Workflow Id="SPService.Discovery" Type="Discovery" Times="4" />
<Workflow Id="SPSharedService.Discovery" Type="Discovery" Times="4" />
<Workflow Id="SPService.Discovery" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="4" />
<Workflow Id="SPSharedService.Discovery" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="4" />
<Workflow Id="SPSharedService.Discovery.WAC" management pack="Microsoft.SharePoint.Server.2010" Type="Discovery" Times="4" />
<Workflow Id="SPHARule.Discovery" Type="Discovery" Times="1" />
<Workflow Id="SPHARuleMonitor.Availability;SPHARuleMonitor.Security;SPHARuleMonitor.Performance;SPHARuleMonitor.Configuration;SPHARuleMonitor.Custom" Type="Monitor" Times="8" />
<Workflow Id="SPHARuleMonitor.SPServer.Availability;SPHARuleMonitor.SPServer.Security;SPHARuleMonitor.SPServer.Performance;SPHARuleMonitor.SPServer.Configuration;SPHARuleMonitor.SPServer.Custom" Type="Monitor" Times="8" />
</WorkflowCycle>
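
After editing SharePointMP.Config it is worth confirming that the file is still well-formed XML before rerunning the admin task. Here is a minimal PowerShell sketch, assuming the default file location:

# Confirm the edited SharePointMP.Config parses as XML and list the workflow IDs it defines.
$configPath = Join-Path $env:ProgramFiles 'System Center Management Packs\SharePointMP.Config'
try {
    [xml]$config = Get-Content -Path $configPath
    $config.SelectNodes('//WorkflowCycle/Workflow') | ForEach-Object { $_.Id }
} catch {
    Write-Warning "SharePointMP.Config is not well-formed XML: $_"
}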

=====

For the most current version of this article please see the following:

2690744 : Configuring the SharePoint 2010 Management Pack for System Center Operations Manager

J.C. Hornbeck | System Center & Security Knowledge Engineer

Get the latest System Center news on Facebook and Twitter:


App-V Team blog: http://blogs.technet.com/appv/
ConfigMgr Support Team blog: http://blogs.technet.com/configurationmgr/
DPM Team blog: http://blogs.technet.com/dpm/
MED-V Team blog: http://blogs.technet.com/medv/
Orchestrator Support Team blog: http://blogs.technet.com/b/orchestrator/
Operations Manager Team blog: http://blogs.technet.com/momteam/
SCVMM Team blog: http://blogs.technet.com/scvmm
Server App-V Team blog: http://blogs.technet.com/b/serverappv
Service Manager Team blog: http://blogs.technet.com/b/servicemanager
System Center Essentials Team blog: http://blogs.technet.com/b/systemcenteressentials
WSUS Support Team blog: http://blogs.technet.com/sus/

The Forefront Server Protection blog: http://blogs.technet.com/b/fss/
The Forefront Endpoint Security blog : http://blogs.technet.com/b/clientsecurity/
The Forefront Identity Manager blog : http://blogs.msdn.com/b/ms-identity-support/
The Forefront TMG blog: http://blogs.technet.com/b/isablog/
The Forefront UAG blog: http://blogs.technet.com/b/edgeaccessblog/

KB: How to troubleshoot Event ID 2115 in Operations Manager


Here’s another new Knowledge Base article we published today. This one goes through some common troubleshooting tips for eliminating Event ID 2115 messages in Operations Manager:

=====

Symptoms

In Operations Manager, one of the main performance concerns involves Operations Manager Database and Data Warehouse insertion times. The following describes how to identify and troubleshoot problems concerning Database and Data Warehouse data insertion.

Examine the Operations Manager Event log for the presence of Event ID 2115 events. These events typically indicate that performance issues exist on the Management Server or the Microsoft SQL Server that is hosting the OperationsManager or OperationsManager Data Warehouse databases. Database and Data Warehouse write action workflows run on the Management Servers and these workflows first retain the data received from the Agents and Gateway Servers in an internal buffer. They then gather this data from the internal buffer and insert it into the Database and Data Warehouse. When the first data insertion has completed, the workflows will then create another batch.

The size of each batch of data depends on how much data is available in the buffer when the batch is created; however, there is a maximum batch size of 5000 data items. If the incoming data item rate increases, or the insertion throughput to the Operations Manager and Data Warehouse databases is reduced, the buffer will accumulate more data and the batch size will grow larger. There are several write action workflows that run on a Management Server. These workflows handle data insertion into the Operations Manager and Data Warehouse databases for different data types. For example:

  • Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
  • Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
  • Microsoft.SystemCenter.DataWarehouse.CollectEventData
  • Microsoft.SystemCenter.CollectAlerts
  • Microsoft.SystemCenter.CollectEntityState
  • Microsoft.SystemCenter.CollectPublishedEntityState
  • Microsoft.SystemCenter.CollectDiscoveryData
  • Microsoft.SystemCenter.CollectSignatureData
  • Microsoft.SystemCenter.CollectEventData


When a Database or Data Warehouse write action workflow on a Management Server experiences slow data batch insertion, for example times in excess of 60 seconds, it will begin logging Event ID 2115 to the Operations Manager Event log. This event is logged every one minute until the data batch is inserted into the Database or Data Warehouse, or the data is dropped by the write action workflow module. As a result, Event ID 2115 will be logged due to the latency inserting data into the Database or Data Warehouse. Below is an example Event logged due to data dropped by the write action workflow module:

Event Type: Error
Event Source: HealthService
Event Category: None
Event ID: 4506
Computer: <RMS NAME>
Description:
Data was dropped due to too much outstanding data in rule "Microsoft.SystemCenter.OperationalDataReporting.SubmitOperationalDataFailed.Alert" running for instance <RMS NAME> with id:"{F56EB161-4ABE-5BC7-610F-4365524F294E}" in management group <MANAGEMENT GROUP NAME>.

Event ID 2115 contains 2 significant pieces of information. First, the name of the Workflow that is experiencing the problem and second, the elapsed time since the workflow began inserting the last batch of data.

For example:

Log Name: Operations Manager
Source: HealthService
Event ID: 2115
Level: Warning
Computer: <RMS NAME>
Description:
A Bind Data Source in Management Group <MANGEMENT GROUP NAME> has posted items to the workflow, but has not received a response in 300 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.CollectPublishedEntityState
Instance : <RMS NAME>
Instance Id : {88676CDF-E284-7838-AC70-E898DA1720CB}

This particular Event ID 2115 message indicates that the workflow Microsoft.SystemCenter.CollectPublishedEntityState, which writes Entity State data to the Operations Manager database, is trying to insert a batch of Entity State data and it started 300 seconds ago. In this example the insertion of the Entity State data has not yet finished. Normally inserting a batch of data should complete within 60 seconds. If the Workflow Id contains Data Warehouse then the problem concerns the Operations Manager Data Warehouse. Otherwise, the problem would concern inserting data into the Operations Manager database.

Cause

As the description of Event ID 2115 states, this may indicate a database performance problem or too much data incoming from the agents. Event ID 2115 simply indicates there is a backlog inserting data into the Operations Manager or Operations Manager Data Warehouse database. These events can originate from a number of possible causes, for example a large amount of discovery data, a database connectivity issue or full database condition, or disk or network constraints.

In Operations Manager, Discovery data insertion is a relatively expensive process. We define a burst of data as a short period of time where a significant amount of data is received by the Management Server. These bursts of data can cause Event ID 2115 since the data insertion should occur infrequently. If Event ID 2115 consistently appears for Discovery data collection, this can indicate either a Database or Data Warehouse insertion problem or Discovery rules in a Management Pack collecting too much discovery data.

Operations Manager configuration updates caused by Instance Space changes or Management Pack imports have a direct effect on CPU utilization on the Database Server and this can impact Database insertion times. Following a Management Pack import or a large instance space change, it is expected to see Event ID 2115 messages. For more information on this topic please see the following:

2603913 - How to detect and troubleshoot frequent configuration changes in Operations Manager (http://support.microsoft.com/kb/2603913)

If the Operations Manager or Operations Manager Data Warehouse databases are out of space or offline, it is expected that the Management Server will continue to log Event ID 2115 messages to the Operations Manager Event log and the pending time will grow higher. 

If the write action workflows cannot connect to the Operations Manager or Operations Manager Data Warehouse databases, or they are using invalid credentials to establish their connection, the data insertion will be blocked and Event ID 2115 messages will be logged accordingly until this situation is resolved.  

In Operations Manager, expensive User Interface queries can impact resource utilization on the Database which can lead to latency in Database insertion times. When a user is performing an expensive User Interface operation it is possible to see Event ID 2115 messages logged. 

Event ID 2115 messages can also indicate a performance problem if the Operations Manager Database and Data Warehouse databases are not properly configured. Performance problems on the database servers can lead to Event ID 2115 messages. Some possible causes include the following:

  • The SQL Log or TempDB database is too small or out of space.
  • The network link from the Operations Manager and Data Warehouse database server to the Management Server is bandwidth constrained or has high latency. In this scenario we recommend that the Management Server be on the same LAN as the Operations Manager and Data Warehouse database server.
  • The data disk hosting the database, logs, or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a functional problem. In this scenario we recommend leveraging RAID 10, and we also recommend enabling battery-backed write cache on the array controller.
  • The Operations Manager Database or Data Warehouse server does not have sufficient memory or CPU resources.
  • The SQL Server instance hosting the Operations Manager Database or Data Warehouse is offline.

It is recommended that the Management Server reside on the same LAN as the Operations Manager and Data Warehouse database server. Event ID 2115 messages can also occur if the disk subsystem hosting the database, logs, or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a functional problem; in that scenario we recommend leveraging RAID 10 and enabling battery-backed write cache on the array controller.
Resolution

The first step in troubleshooting Event ID 2115 is to identify which data items are referenced within the event. The Workflow ID indicates both the type of data item (Discovery, Alert, Event, Performance) and the database in question: if the Workflow ID contains the term DataWarehouse, then the focus should be on the Operations Manager Data Warehouse; otherwise, the focus should be on the Operations Manager database.

The seconds indicator in the Event ID 2115 message returns how long the workflow in question has been waiting to insert the data items into the Database.

Scenario 1

In the example Event ID 2115 below, the problem concerns the workflow Microsoft.SystemCenter.CollectSignatureData.

Event Type: Warning
Event Source: HealthService
Event Category: None
Event ID: 2115
Computer: <RMS NAME>
Description:
A Bind Data Source in Management Group <MANGEMENT GROUP NAME> has posted items to the workflow, but has not received a response in 300 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.CollectSignatureData
Instance : <RMS NAME>
Instance Id : {F56EB161-4ABE-5BC7-610F-4365524F294E}

Resolution 1

We can identify the Performance Signature Data Collection Rules in this example by executing the following SQL Query. This query should be executed in SQL Management Studio against the Operations Manager database.

-- Return all Performance Signature Collection Rules
Use OperationsManager
select
managementpack.mpname,
rules.rulename
from performancesignature with (nolock)
inner join rules with (nolock)
on rules.ruleid = performancesignature.learningruleid
inner join managementpack with(nolock)
on rules.managementpackid = managementpack.managementpackid
group by managementpack.mpname, rules.rulename
order by managementpack.mpname, rules.rulename

This query will return all Performance Signature Collection Rules and their respective Management Pack name. A column is returned for Management Pack name and Rule name.

The following Performance Monitor Counters on a Management Server will provide information concerning Database and Data Warehouse write action insertion batch size and time. If the batch size is growing larger, for example the default batch size is 5000 items, this indicates either the Management Server is slow inserting the data to the Database or Data Warehouse, or is receiving a burst of Data Items from the Agents or Gateway Servers.

  • OpsMgr DB Write Action Modules(*)\Avg. Batch Size
  • OpsMgr DB Write Action Modules(*)\Avg. Processing Time
  • OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms
  • OpsMgr DW Writer Module(*)\Avg. Batch Size

From the Database and Data Warehouse write action Avg. Processing Time counters, we can understand how long it takes on average to write a batch of data to the Database and Data Warehouse. Depending upon the amount of time it takes to write a batch of data, this may present an opportunity for tuning.
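
Here is a minimal PowerShell sketch for sampling these counters on a Management Server (the counter paths are the ones listed above):

# Sample the write action batch size and processing time counters on a Management Server.
$counters = @(
    '\OpsMgr DB Write Action Modules(*)\Avg. Batch Size',
    '\OpsMgr DB Write Action Modules(*)\Avg. Processing Time',
    '\OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms',
    '\OpsMgr DW Writer Module(*)\Avg. Batch Size'
)
Get-Counter -Counter $counters -SampleInterval 15 -MaxSamples 4 |
    Select-Object -ExpandProperty CounterSamples |
    Format-Table Path, CookedValue -AutoSize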

Scenario 2

The SQL Server instance hosting the Operations Manager Database or Data Warehouse is offline. Event ID 2115 as well as Event ID 29200 appear within the Operations Manager Event log. For example:

Log Name: Operations Manager
Source: HealthService
Date:
Event ID: 2115
Level: Warning
Description:

A Bind Data Source in Management Group MSFT has posted items to the workflow, but has not received a response in 60 seconds. This indicates a performance or functional problem with the workflow.

Workflow Id : Microsoft.SystemCenter.CollectEventData
Instance : name.contoso.local
Instance Id : {88676CDF-E284-7838-AC70-E898DA1720CB}

Log Name: Operations Manager
Source: OpsMgr Config Service
Event ID: 29200
Level: Error
Description:

OpsMgr Config Service has lost connectivity to the OpsMgr database, therefore it can not get any updates from the database. This may be a temporary issue that may be recovered from automatically. If the problem persists, it usually indicates a problem with the database. Reason:

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

Resolution 2

To resolve the issue in this scenario, follow these steps:

  1. Connect to the Server hosting the Operations Manager database.
  2. Open the Services applet.
  3. Verify that the SQL Server (MSSQLSERVER) service is started and running.
  4. If the SQL Server (MSSQLSERVER) service is not started and running, start the service.
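
These steps can also be performed from an elevated PowerShell prompt on the database server; here is a minimal sketch, assuming a default (non-named) SQL Server instance:

# Check the SQL Server service for the default instance and start it if it is stopped.
$sql = Get-Service -Name 'MSSQLSERVER'
if ($sql.Status -ne 'Running') {
    Start-Service -Name 'MSSQLSERVER'
}
Get-Service -Name 'MSSQLSERVER' | Format-Table Name, Status -AutoSize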


Once Database connectivity is restored workflows should resume successfully storing data within the respective database. Event ID 31554 validates that the information has been written successfully.

For example:

Log Name: Operations Manager
Source: Health Service Modules
Event ID: 31554
Task Category: Data Warehouse
Level: Information

Description:
Workflow succeeded storing data in the Data Warehouse

One or more workflows were affected by this.

Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance name: name.contoso.local
Instance ID: {88676CDF-E284-7838-AC70-E898DA1720CB}

Scenario 3

Event ID 2115 caused by invalid RunAs Credentials.

Resolution 3

Examine the Operations Manager Event log for the following events. These events typically indicate that the Data Warehouse SQL Server Authentication Account may have incorrect credentials.

Log Name: Operations Manager
Source: HealthService
Event ID: 7000
Task Category: Health Service
Level: Error
Description:

The Health Service could not log on the RunAs account <ACCOUNT NAME> for management group <MANGEMENT GROUP NAME>. The error is Logon failure: unknown user name or bad password.(1326L). This will prevent the health service from monitoring or performing actions using this RunAs account

Log Name: Operations Manager
Source: HealthService
Event ID: 7015
Task Category: Health Service
Level: Error
Description:

The Health Service cannot verify the future validity of the RunAs account <ACCOUNT NAME> for management group <MANGEMENT GROUP NAME>. The error is Logon failure: unknown user name or bad password.(1326L).

To resolve the issue in this scenario, follow these steps:
1. Open the Operations Manager console.
2. Select Administration.
3. Select Run As Configuration\Accounts.
4. Configure the appropriate credentials for the Data Warehouse SQL Server Authentication Account.

Scenario 4

Event ID 2115 caused by disk performance issues. The data disk hosting the Database, logs or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a functional problem. In this scenario we recommend leveraging RAID 10 and we also recommend enabling battery backed Write Cache on the Array Controller.

Resolution 4

Counters to identify disk pressure

Capture these Physical Disk counters for all drives that contain SQL data or log files:

  • % Idle Time: How much disk idle time is being reported. Anything below 50 percent could indicate a disk bottleneck.
  • Avg. Disk Queue Length: This value should not exceed 2 times the number of spindles on a LUN. For example, if a LUN has 25 spindles, a value of 50 is acceptable. However, if a LUN has 10 spindles, a value of 25 is too high. You could use the following formulas based on the RAID level and number of disks in the RAID configuration:
    • RAID 0: All of the disks are doing work in a RAID 0 set
      Average Disk Queue Length <= # (Disks in the array) *2
    • RAID 1: half the disks are “doing work”; therefore, only half of them can be counted toward Disks Queue
      Average Disk Queue Length <= # (Disks in the array/2) *2
    • RAID 10: half the disks are “doing work”; therefore, only half of them can be counted toward Disks Queue
      Average Disk Queue Length <= # (Disks in the array/2) *2
    • RAID 5: All of the disks are doing work in a RAID 5 set
      Average Disk Queue Length <= # (Disks in the array/2) *2
  • Avg. Disk sec/Transfer: The number of seconds it takes to complete one disk I/O
  • Avg. Disk sec/Read: The average time, in seconds, of a read of data from the disk
  • Avg. Disk sec/Write: The average time, in seconds, of a write of data to the disk

    NOTE The three Avg. Disk sec counters above should consistently have values of approximately .020 (20 ms) or lower and should never exceed .050 (50 ms). The following are the thresholds that are documented in the SQL Server performance troubleshooting guide:

        • Less than 10 ms: very good
        • Between 10 - 20 ms: okay
        • Between 20 - 50 ms: slow, needs attention
        • Greater than 50 ms: serious I/O bottleneck

  • Disk Bytes/sec: The number of bytes being transferred to or from the disk per second
  • Disk Transfers/sec: The number of input and output operations per second (IOPS)

When % Idle Time is low (10 percent or less), this means that the disk is fully utilized. In this case, the last two counters in this list (“Disk Bytes/sec” and “Disk Transfers/sec”) provide a good indication of the maximum throughput of the drive in bytes and in IOPS, respectively. The throughput of a SAN drive is highly variable, depending on the number of spindles, the speed of the drives, and the speed of the channel. The best bet is to check with the SAN vendor to find out how many bytes and IOPS the drive should support. If % Idle Time is low, and the values for these two counters do not meet the expected throughput of the drive, engage the SAN vendor to troubleshoot.
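
Here is a minimal PowerShell sketch for sampling read/write latency on all physical disks and rating each value against the thresholds above:

# Sample disk latency counters and rate each value against the documented thresholds.
Get-Counter -Counter '\PhysicalDisk(*)\Avg. Disk sec/Read','\PhysicalDisk(*)\Avg. Disk sec/Write' -SampleInterval 5 -MaxSamples 6 |
    Select-Object -ExpandProperty CounterSamples |
    ForEach-Object {
        $ms = [math]::Round($_.CookedValue * 1000, 1)              # convert seconds to milliseconds
        $rating = if ($ms -lt 10) { 'very good' }
                  elseif ($ms -le 20) { 'okay' }
                  elseif ($ms -le 50) { 'slow, needs attention' }
                  else { 'serious I/O bottleneck' }
        '{0,-60} {1,8} ms  {2}' -f $_.Path, $ms, $rating
    }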
The following links provide deeper insight into troubleshooting SQL Server performance:

Scenario 5

Event ID 2115 is logged, and a management server generates an "unable to write data to the Data Warehouse" alert in System Center Operations Manager.
You experience the following symptoms on a management server computer that is running Microsoft System Center Operations Manager:

  • The management server generates one or more alerts that resemble the following:
    Performance data collection process unable to write data to the Data Warehouse

    Performance data collection process unable to write data to the Data Warehouse. Failed to store data in the Data Warehouse. Exception 'SqlException': Management Group with id '9069F7BD-55B8-C8E8-1CF9-4395F45527E2' is not allowed to access Data Warehouse under login 'DOMAIN\Action_Account' One or more workflows were affected by this. Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData Instance name: dataWarehouseServer.domain.com Instance ID: {48936EE3-4E3E-BEE1-8C09-AFDAB8ECF236} Management group: Management Group Name.
  • The following event is logged in the Operations Manager event log on the management server:
    Event Type: Warning
    Event Source: HealthService
    Event Category: None
    Event ID: 2115
    Date: date
    Time: time
    User: N/A
    Computer: ManagementServerName
    Description: A Bind Data Source in Management Group Management Group Name has posted items to the workflow, but has not received a response in 1712 seconds. This indicates a performance or functional problem with the workflow. Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData Instance : ManagementServerName.domain.com Instance Id : {C7FDDE2A-E0AA-4B80-70DE-1D50D9965221}


Resolution 5

This issue may occur if the management server does not have accounts that are specified for its data warehouse "Run As" profiles. This issue is more likely to affect a secondary management server.

To resolve this problem, follow these steps:

  1. On the computer that is running System Center Operations Manager, open the Operations Console.
  2. In the navigation pane, click Administration.
  3. Expand Security, and then click Run As Profiles.
  4. In the Run As Profiles view, double-click Data Warehouse Account.
  5. In the Run As Profile Properties - Data Warehouse Account properties dialog box, click the Run As Accounts tab, and then click New.
  6. In the Run As Account list, click Data Warehouse Action Account.
  7. In the Name list, click the management server that generated the alert.
  8. Click OK two times.
  9. Follow steps 4 through 8 to assign the appropriate Run As account to the following profiles:
    Data Warehouse Configuration Synchronization Reader Account
    Data Warehouse Report Deployment Account
    Data Warehouse SQL Server Authentication Account
  10. For each profile, select the Run As account that matches the name of the Run As profile. For example, make the following assignments:
    Assign the Data Warehouse Configuration Synchronization Reader Account to the Data Warehouse Configuration Synchronization Reader Account profile.
    Assign the Data Warehouse Report Deployment Account to the Data Warehouse Report Deployment Account profile.
    Assign the Data Warehouse SQL Server Authentication Account to the Data Warehouse SQL Server Authentication Account profile.
  11. On the management server that generated the alert, restart the OpsMgr Health Service.
  12. In the Operations Manager event log on the management server, verify that event ID 31554 events are logged.
    NOTE Event ID 31554 indicates that the monitor state has changed to Healthy. This change resolves the alert.
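
Steps 11 and 12 can also be done from an elevated PowerShell prompt on the management server; here is a minimal sketch, assuming the default HealthService service name:

# Restart the System Center Management service (HealthService), wait for the workflows to run,
# and then look for event ID 31554 confirming that data is being written to the Data Warehouse again.
Restart-Service -Name 'HealthService'
Start-Sleep -Seconds 120
Get-WinEvent -FilterHashtable @{ LogName = 'Operations Manager'; Id = 31554 } -MaxEvents 5 |
    Format-Table TimeCreated, Message -AutoSize -Wrap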


Scenario 6

Event ID 2115 occurs on a server running HP MPIO FF DSM XP v3.01 to which there are no LUNs presented. When the user opens Performance Monitor and attempts to add a counter, Performance Monitor will hang and the handle count for this application increases rapidly.

Resolution 6

There are two workarounds for this issue.

  1. Rename the HPPerfProv.dll file and reboot Windows. Performance Monitor will work without issue when the file is renamed and not loaded.
  2. Have at least 1 LUN present on the system.

For more information on this issue please see the following:

Event ID 2115 is caused by HPPerfProv.DLL
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01743552&prodTypeId=18964&prodSeriesId=421492

=====

For the most current version of this article please see the following:

2681388 : How to troubleshoot Event ID 2115 in Operations Manager

J.C. Hornbeck | System Center & Security Knowledge Engineer



How to get Knowledge Editing to work in Operations Manager 2012 with Office 2010


If you have tried editing knowledge in OpsMgr 2012 with Office 2010 you have likely run into the error “Could not load file or assembly ‘Microsoft.Office.Interop.Word’”, or the error shown in the image below. Working with our support team (Mitch L, Stephen R) we have a temporary workaround which works with both the x64 and x86 versions of Office 2010. I need to caveat that this is not something that is officially supported by the product team at this time, but we plan to support it in our service pack release.

The Workaround:
Step 1: Install Visual Studio 2005 Tools for Office Second Edition Runtime
Link to download VSTO 2005 SE: http://www.microsoft.com/en-us/download/details.aspx?id=24263

Step 2: Install Visual Studio 2010 Tools for Office Runtime
Link to download VSTO 2010 (64 or 32): http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=20479
Note: depending on the version of Office 2010 you have installed, you will need to install either the x64 or x86 version of VSTO 2010.

Step 3: Reboot the OpsMgr Console machine. Note that there is no prompt in the VSTO setup requiring you to reboot, but this is something you have to do, otherwise it will not work.

Step 4: Replace the Knowledge.DOT file found in <install path>\Program Files\System Center 2012\Operations Manager\Console with the Knowledge.DOT file attached to this blog.

Step 5: Replace the Microsoft.EnterpriseManagement.Monitoring.Console.exe configuration file found in <install path>\Program Files\System Center 2012\Operations Manager\Console with the Microsoft.EnterpriseManagement.Monitoring.Console.exe configuration file attached to this blog.

This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed here represent my own and not those of my employer. I accept no liability for the content of this blog post, or for the consequences of any actions taken on the basis of the information provided. Using this workaround is at your own risk.

Are you attending TechEd 2014? Interested in meeting with the OpsMgr Product team?


We want to gauge our OpsMgr customers’ interest in an informal round table discussion with the OpsMgr product team to influence the future of this product.

If you are planning to attend TechEd 2014 in Houston, please take this quick survey to let us know. If there is enough interest we will set up a meeting, and details on this event will be posted to this blog site.

Note: we do not collect any information about you (name, company) in the survey.

Link to the survey: https://www.surveymonkey.com/s/TXQWBP7

 

 


SC Advisor - Error 3000: Unable to register to the Advisor Service & Onboarding Troubleshooting Steps


We have had a few customers run into the “Error 3000: Unable to register to the Advisor Service” while trying to connect their OpsMgr 2012 Management group to System Center Advisor.

Error 3000: Unable to register to Advisor Service.

There are two reasons why a customer may run into this:

  1. The server clock is out of sync with the current time by more than 5 minutes. You can resolve this fairly easily by changing the clock on your server to match the current time: open a command prompt as an Administrator, type w32tm /tz to check the time zone, and then w32tm /resync to resync the clock.
  2. Internal proxy servers or firewalls are blocking communication to the Advisor service endpoints. We provide detailed instructions for this second case in this article. Read on.

PROXY REGISTRATION / CONFIGURATION STEPS

Step 1: Request exception for the service endpoints

The following domains and URLs need to be accessible through the firewall for the management server to access the Advisor Web Service.

Resource                                       Ports
service.systemcenteradvisor.com                80 and 443
scadvisor.accesscontrol.windows.net            80 and 443
scadvisorservice.accesscontrol.windows.net     80 and 443
scadvisorstore.blob.core.windows.net/*         80 and 443
data.systemcenteradvisor.com                   80 and 443
ods.systemcenteradvisor.com                    80 and 443
*.systemcenteradvisor.com                      80 and 443

 The following domains and URLs need to be accessible through the firewall to view the Advisor Web portal and OpsMgr Console (to perform ‘registration’ to Advisor).

Resource                        Ports
*.systemcenteradvisor.com       80 and 443
*.live.com                      80 and 443
*.microsoft.com                 80 and 443
*.microsoftonline.com           80 and 443
login.windows.net               80 and 443

Also ensure that the Internet Explorer proxy is set correctly on the computer you are trying to log in with. An especially valuable test is to connect to an SSL-enabled website, for example https://www.bing.com/. If the HTTPS connection doesn’t work from a browser, it probably also won’t work in the Operations console or in the server modules that talk to the web services in the cloud.

If none of the above steps resolves your issue, please log in to the Advisor Preview portal, click the ‘Feedback’ button on the bottom right of the web page, and file a ‘Feedback’ item; we will respond and help you troubleshoot further within 24 hours.

 

POST-REGISTRATION PROXY CONFIGURATION STEPS

Once you have completed registering your OpsMgr Environment to the Advisor Service you need to follow Steps 2, 3 and 4 to allow your Management servers to send data to the Advisor Web Service.

Step 2: Configure the proxy server in the OpsMgr Console

  • Open the OpsMgr Console

  • Go to the “Administration” view

  • Select “Advisor Connection” under the "System Center Advisor" Node

  • Click “Configure Proxy Server”

  • Check the checkbox to use a proxy server to access the Advisor Web Service
  • Specify the proxy address


Step 3: Specify credentials if the Proxy Server requires Authentication

If your proxy server requires authentication, you can specify credentials in the form of an OpsMgr Run As account and associate it with the ‘System Center Advisor Run As Profile Proxy’ profile:

  • In the OpsMgr Console, go to the “Administration” view

  • Select “Profiles” under the "RunAs Configuration" Node

  • Double click and open “System Center Advisor Run As Profile Proxy”


  • Click ‘Add’ to add a 'RunAs Account'. You can either create one or use an existing account. This account needs to have sufficient permissions to pass through the proxy
  • Set the Account to be targeted at the ‘Operations Manager Management Servers’ Group
  • Complete the wizard and save the changes

  

Note: not all code paths currently support authentication. It is still possible that you will need to set some of the exclusions mentioned in Step 1 to allow anonymous traffic to some of those destinations. We will keep this document up to date as this requirement evolves.

Step 4: Configure the proxy server on each Management Server for WinHTTP

NOTE: this step is no longer required if you updated your Management Servers to Update Rollup 3 for System Center 2012 R2, or Update Rollup 7 for System Center 2012 SP1.

  • Open Command Prompt as an Administrator on the Management Server

  • Type netsh winhttp set proxy myproxy:80

  • Restart the ‘System Center Management’ Service (HealthService)
  • Repeat these steps on each of the management servers in your management group
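
The same configuration can be done from an elevated PowerShell prompt on each management server; here is a minimal sketch, with myproxy:80 as a placeholder for your proxy server and port:

# Set and verify the machine-wide WinHTTP proxy, then restart the Health Service.
netsh winhttp set proxy myproxy:80      # replace myproxy:80 with your proxy server and port
netsh winhttp show proxy                # verify that the setting took effect
Restart-Service -Name 'HealthService'   # the 'System Center Management' service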

Step 5: Configure the proxy server on each Management Server for Managed code

There is another setting in Operations Manager that is intended for general error reporting, but we have noticed that, when it is set, this proxy setting also ends up affecting the Advisor connector’s functionality, because the same modules are used in multiple workflows.
The recommendation is therefore to also set it (to the same proxy you set in the other places) on each and every management server if you use a proxy.

  • In the OpsMgr Console, go to the “Administration” view

  • Select “Device Management” and then the "Management Servers" Node

  • Right-click and choose “Properties” for each management server (one at a time) and set the proxy in the “Proxy Settings” tab.

Proxy settings per MS

If none of the above steps resolves your issue, please log in to the Advisor Preview portal, click the ‘Feedback’ button on the bottom right of the web page, and file a ‘Feedback’ item; we will respond and help you troubleshoot further within 24 hours.

VERIFYING THAT THINGS ARE WORKING AFTER COMPLETING THE CONFIGURATION WIZARD

Step 1: Validate if the following MPs get downloaded to your OpsMgr Environment

Note: You will only see all of these MPs if you added the corresponding intelligence packs from the Advisor portal. Search for the keyword ‘Advisor’ in their names.

Advisor Management Packs 

You can additionally check for these MPs using OpsMgr PowerShell and typing these commands

get-scommanagementpack | where {$_.DisplayName -match 'Advisor'} | select Name,DisplayName,Version,KeyToken

get-scommanagementpack | where {$_.DisplayName -match 'Advisor'} | select Name,DisplayName,Version,KeyToken | Out-GridView

Step 2: Validate if data is being sent up to the Advisor service

  • Open ‘Performance Monitor’
  • Select ‘Health Service Management Groups’

  • Add all the counters that start with ‘HTTP’

  • If things are configured right you should see a lot of perfmon activity for these counters, as events and other data items (based on the intelligence packs onboarded in the portal, and the configured log collection policy) are uploaded
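
A PowerShell alternative to the Performance Monitor steps above; here is a minimal sketch that discovers and samples the HTTP counters of the Health Service Management Groups counter set:

# Find the 'Health Service Management Groups' counters that start with HTTP and sample them.
$httpCounters = Get-Counter -ListSet 'Health Service Management Groups' |
    Select-Object -ExpandProperty Counter |
    Where-Object { $_ -match 'HTTP' }
Get-Counter -Counter $httpCounters -SampleInterval 10 -MaxSamples 6 |
    Select-Object -ExpandProperty CounterSamples |
    Format-Table Path, CookedValue -AutoSize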

 

Step 3: Check for Errors on the Management Server Event Logs

As a final step, if all of the above fails, see if there are any errors in the Operations Manager event log, filtering by the event sources Advisor, Health Service Modules, and HealthService. You can copy these events and post them in the Advisor Preview portal ‘Feedback’ view so we in the product team can help you further.
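
Here is a minimal PowerShell sketch for pulling recent error events from those sources on a management server:

# List recent error events from the Advisor-related sources in the Operations Manager log.
Get-WinEvent -FilterHashtable @{
    LogName      = 'Operations Manager'
    ProviderName = 'Advisor', 'Health Service Modules', 'HealthService'
    Level        = 2                     # 2 = Error
} -MaxEvents 50 |
    Format-Table TimeCreated, Id, ProviderName, Message -AutoSize -Wrap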

  

A few of the ‘bad’ events you might see when things aren’t working are described below:

  • Event ID 2138 (Health Service Modules) – Proxy requires authentication. Resolution: follow step 3 and/or step 1 above.
  • Event ID 2137 (Health Service Modules) – Cannot read the authentication certificate. Resolution: re-running the Advisor registration wizard will fix certificates and RunAs accounts.
  • Event ID 2132 (Health Service Modules) – Not authorized. Resolution: could be an issue with the certificate and/or registration to the service; try re-running the Advisor registration wizard, which will fix certificates and RunAs accounts. Additionally, verify the proxy has been set to allow exclusions as in step 1 above, and/or verify authentication as in step 3 (and that the user indeed has access through the proxy).
  • Event ID 2129 (Health Service Modules) – Failed connection / failed SSL negotiation. Resolution: there could be some strange TCP settings on this server. Check this blog post from the community for such a case: http://jacobbenson.com/?p=511
  • Event ID 2127 (Health Service Modules) – Failure sending data, received error code. Resolution: if it only happens once in a while, this could be just a glitch. Keep an eye on how often it happens. If it happens very often (every 10 minutes or so throughout the day), then it is an issue – check your network configuration and the proxy settings described above, and re-run the registration wizard. If it only happens sporadically (a couple of times per day), then everything should be fine, as data will be queued and retransmitted.
  • Event ID 2128 (Health Service Modules) – DNS name resolution failed. Resolution: your server can’t resolve the internet address it is supposed to send data to. This might be the DNS resolver settings on your machine, incorrect proxy settings, or a (temporary) issue with DNS at your provider. Like the previous event, depending on whether it happens constantly or once in a while, it could be an issue – or not.
  • Event ID 2130 (Health Service Modules) – Time out. Resolution: like the previous event, depending on whether it happens constantly or once in a while, it could be an issue – or not.
  • Event ID 4511 (HealthService) – Cannot load module, file not found. Resolution: this is also described in the section below, ‘Other known issues and workarounds’. This error typically just indicates that you have old DLLs on your machine that don’t contain the required modules. The fix is to update your Management Servers to the latest Update Rollup.
  • Event ID 4502 (HealthService) – Module crashed. Resolution: if you see this for workflows with names such as CollectInstanceSpace or CollectTypeSpace, it might mean the server is having issues sending some data. Depending on how often it happens – constantly or once in a while – it could be an issue or not. If it happens more often than every hour it is definitely an issue; if it only fails once or twice per day, it will be fine and able to recover. Depending on how the module actually fails (the description will have more details), this could be an on-premises issue (writing to the database) or an issue sending to the cloud. Verify your network and proxy settings, and worst case try restarting the HealthService.

In addition to the above, the Advisor engineering team is committed to resolving all your onboarding issues, so please contact us if you run into any problems. We are here to help. You can use the in-portal feedback mechanism for this as well.

OTHER KNOWN ISSUES AND WORKAROUNDS

'Search' button in the 'Add a Computer/Group' dialogue is missing

We have had a couple of customers report that the Search button in the Computer Search dialog is invisible, and we are investigating why this happens. A temporary workaround is to click in the ‘Filter by (optional)’ edit box, press TAB to move focus to the invisible Search button, and then activate it with <Spacebar> or <Enter>.

 

Initialization of a module of type "System.PublishDataToEndPoint" (CLSID "{D407D659-65E4-4476-BF40-924E56841465}") failed with error code The system cannot find the file specified.

If you are not receiving any data in Advisor and you see events in the Operations Manager event log on your management servers that look like the one below, this simply means that you have not installed the latest update rollup, as per the instructions under “Update your Environment” on https://preview.systemcenteradvisor.com/instructions?LandingPP5. The solution is to install the latest update rollup, which contains the updated modules.

 

Log Name:      Operations Manager

Source:        HealthService

Event ID:      4511

Level:         Error

Computer:      SCOM01.contoso.com

Description: Initialization of a module of type "System.PublishDataToEndPoint" (CLSID "{D407D659-65E4-4476-BF40-924E56841465}") failed with error code The system cannot find the file specified. causing the rule "Microsoft.SystemCenter.CollectEventDataToCloud" running for instance "SCOM01.contoso.com" with id:"{B6881268-7E49-731C-C26C-DDA954F62679}" in management group "SCOMMG".

Note that you will see multiple instances of this event, for various workflows.

-Satya Vel and the Advisor team 

Advisor Preview Twitter Handle: @mscAdvisor




‘Feedback’– your Product Team and Community connection in Microsoft System Center Advisor– Limited Public Preview


For Advisor Limited Preview, we (@mscadvisor) have enabled a first class community experience that seamlessly blends into the Advisor Preview Portal’s look and feel. Once logged on to a Preview account, you’ll notice a prominently placed ‘Feedback’ link on the bottom right corner:

Advisor Preview - Overview

When you follow that link, we’ll sign you on to the feedback forum.

Advisor Preview Feedback

Once in the feedback portal, you can:

  • Search for existing feedback, ideas, issues others have encountered. There is also a KB section, which contains ‘quick publish’ articles that have not yet made it into the official documentation or provide additional pointers
  • Suggest product improvements, ideas, report bugs and otherwise communicate with us and with the community
  • See what other people are saying and vote on other people’s ideas/issues
  • Comment, interact with the community of early adopters, learn and share workarounds or tips and tricks and opinions on features
  • Go to ‘Settings’ in the top right corner, under your user name, and leave your email address – this allows you to be notified when your ideas get comments or when official updates from the product team are posted

Advisor Preview Feedback User Settings

 

We will respond to all feedback, and we will use it to influence the direction of the product.

Hope to see you all engaged!

Edge Show - Log Management and Other Intelligence Packs in System Center Advisor

Get Fast + Easy Support with OMS


by Lisa Guthrie - Microsoft Support Engineering

If you’ve ever called Microsoft Support to work on an issue on one of your servers, you probably had to send some sort of data to the support professional working your case. Perhaps some event logs, web server logs, a list of patches installed… the list goes on and on.
Gathering this data (especially if you have to do it across multiple machines in your enterprise), sending it to Microsoft, and getting it analyzed can take time. And if your server is down or experiencing a critical issue, time is not your friend.
Enter a cool new feature of Operations Management Suite: You can now choose to add one or more Microsoft Support users to any of your workspaces. The user(s) you add will immediately see the same dashboard and the same data that you see. No gathering. No uploading. No waiting!
To get started, simply go to the Accounts tab of the Settings page in OMS, and choose the Microsoft Support radio button. Then enter the @microsoft.com email address of the support professional who you wish to add to your workspace.


 

A pop-up will appear explaining more about the access provided to the Microsoft Support user, and asking you to confirm that you want to add this access. You must click Yes to add the user.


Once confirmed, the Microsoft Support user will appear on your Manage Users screen.


 

The Microsoft support user will immediately have read-only access to your OMS dashboard and all the data in your OMS workspace. If you upload new data, the support user will be able to see that data as soon as it’s available.
You can remove the Microsoft support user manually at any time by clicking the blue “X”. In addition, the support user’s access will expire automatically after 7 days.
We hope this feature helps you get back to business faster than ever. Try it out on your next support case!

-Lisa

Technical Preview : Management Packs for Windows Server 2016 Server Roles have been updated


announcement_5951A951

Just a quick note to let you know that we have updated the System Center Operations Manager management packs for Windows Server 2016 roles. The list below summarizes the updates made to each of the management packs. If you already installed the August 2015 version of a management pack for the same role, please uninstall it manually before installing the updated management pack.

We are committed to creating management packs that are performant and address your monitoring needs without overwhelming you with too many alerts. We release management packs early so that you can preview them and offer us feedback. If you have feature requests for any of the management packs, please let us know at our Feedback website. If you find your feature has already been requested by someone else, please up-vote it. We consider the most up-voted feature requests for the next release of the management pack.

We are actively working on creating management packs for additional Windows Server 2016 server roles such as Active Directory Certificate Services, Active Directory Lightweight Directory Services, Remote Desktop Services, Hyper-V, Message Queuing (MSMQ), Host Guardian Service, Routing and Remote Access Service, Active Directory Web Application Proxy, Windows Server Update Services, Windows Deployment Services, Network Policy and Access Services, and Active Directory Federation Services.

  • Windows Server 2016 Operating System – Works on Windows Server 2016; Nano Server monitoring support added

  • DNS – Works on the Windows Server 2016 Technical Preview 4 DNS role; has one folder view for DNS that rolls the DNS 2016 and DNS 2012 R2 management packs together; Nano Server monitoring support coming soon

  • DHCP – Works on the Windows Server 2016 Technical Preview 4 DHCP role

  • Failover Clustering – Works on the Windows Server 2016 Technical Preview 4 Failover Cluster role; Nano Server monitoring support added

  • Network Load Balancing – Works on the Windows Server 2016 Technical Preview 4 NLB role

  • Print Services – Works on the Windows Server 2016 Technical Preview 4 Print Services role

  • Web Server (IIS) – Works on the Windows Server 2016 Technical Preview 4 IIS role; Nano Server monitoring support coming soon

  • Active Directory Domain Services – Works on the Windows Server 2016 Technical Preview 4 ADDS role

  • DTC Transactions – Works on the Windows Server 2016 Technical Preview 4 DTC Transactions role

  • Windows Defender – Works on the Windows Server 2016 Technical Preview 4 Windows Defender role

  • Windows Server Essentials – Works on the Windows Server 2016 Technical Preview 4 Windows Server Essentials role

  • Active Directory Rights Management Services – No changes since the August 2015 Technical Preview

  • Branch Cache – No changes since the August 2015 Technical Preview

  • File and iSCSI Services – No changes since the August 2015 Technical Preview

 

 

Suraj Suresh Guptha | Program Manager | Microsoft

Get the latest System Center news on Facebook and Twitter:

clip_image001 clip_image002

System Center All Up: http://blogs.technet.com/b/systemcenter/

Configuration Manager Support Team blog: http://blogs.technet.com/configurationmgr/
Data Protection Manager Team blog: http://blogs.technet.com/dpm/
Orchestrator Support Team blog: http://blogs.technet.com/b/orchestrator/
Operations Manager Team blog: http://blogs.technet.com/momteam/
Service Manager Team blog: http://blogs.technet.com/b/servicemanager
Virtual Machine Manager Team blog: http://blogs.technet.com/scvmm

What new workloads would you like SCOM 2016 to monitor?


Many of you have already assisted in our efforts to learn about your monitoring problems, and we appreciate your insights tremendously! I’m back with another opportunity to provide feedback to Microsoft about new server workloads you would like to see SCOM monitoring in the future. SCOM offers management packs for almost all Microsoft server workloads and many non-Microsoft server workloads such as Linux, UNIX, Apache, MySQL, etc. What other workloads (including non-Microsoft) would you like to see SCOM monitor in the future? Are there any workloads you or your customers were puzzled that SCOM didn’t support?

Please let us know through the below survey link. The survey has just two questions and shouldn’t take you more than a minute to complete. 

http://www.instant.ly/s/yFAhr

 

Suraj Suresh Guptha | Program Manager | Microsoft

Get the latest System Center news on Facebook and Twitter:

clip_image001 clip_image002

System Center All Up: http://blogs.technet.com/b/systemcenter/

Configuration Manager Support Team blog: http://blogs.technet.com/configurationmgr/
Data Protection Manager Team blog: http://blogs.technet.com/dpm/
Orchestrator Support Team blog: http://blogs.technet.com/b/orchestrator/
Operations Manager Team blog: http://blogs.technet.com/momteam/
Service Manager Team blog: http://blogs.technet.com/b/servicemanager
Virtual Machine Manager Team blog: http://blogs.technet.com/scvmm

Update Rollup 9 for System Center 2012 R2 Operations Manager is now available


Update Rollup 9 for System Center 2012 R2 Operations Manager (OpsMgr 2012 R2 UR9) is now available to download. The KB article below describes the issues that are fixed and also contains the installation instructions for this update. For complete details including issues fixed, installation instructions and a download link, please see the following:

KB3129774 – Update Rollup 9 for System Center 2012 R2 Operations Manager (https://support.microsoft.com/en-us/kb/3129774)

J.C. Hornbeck | Solution Asset PM | Microsoft

fbTwitterPic

Our Blogs


Behind the Scenes : Active Directory Domain Services 2016 Management Pack


Hey all, I wanted to share with you some details regarding the new Active Directory-Domain Services Management Pack for System Center Operations Manager that we will be releasing in conjunction with Windows Server 2016 (WS2016).  But first, who am I?  My name is Eric Hunter and I have worked with the Active Directory (AD) product team for over 8 years now.  A large part of that time I was responsible for the operational support of a live production domain that we, the Active Directory team here at Microsoft, use to validate the next version of AD through live, production usage.  We validate it by promoting domain controllers with beta builds of the next version of Windows Server and having our employees and services use them daily.  The one difference between our domain and yours is that we are constantly rebuilding domain controllers; other than that, we are very similar.  It is a multi-domain forest with thousands of users and computers, and mission-critical systems rely on the domain to operate with high service quality.

Our approach for this new MP was to use it for our daily operations and to continually iterate on the MP functionality over time.  The previous design philosophy had been a waterfall model where we would try to add features and fix bugs in a small window of time to ship with a version of Windows Server.  But we found that was not conducive to creating an MP that we really wanted to use and that would work for us, let alone for you, our customers.  In the past we had relied on in-house monitoring tools to help fill in the monitoring gaps from the MP, thinking that our environment was different from what a customer would have.  What we found, though, was that what we wanted in the MP our customers wanted as well.  So we decided to decrease the reliance on the in-house tools and shift to relying solely on SCOM in our environment.  When we found a gap in the monitoring, we filled it.  If the monitors were too noisy, we changed them.  We have been working in this manner for a few years now and, slowly but surely, have been fixing many of the issues that were inherent in the old MP, continually updating it over time, based on real-world data, until it became what we rely on for domain health.

One of the biggest changes we made was to clean up the MP and remove anything that wasn’t working like we thought it should.  We cleaned up all rules that did not auto-resolve, removed the oomads dependency from all scripts, removed the reliance on down-level discovery MPs, and cleaned up some legacy monitors and OS-specific libraries.  We were removing so many of the legacy monitors, alerts, and libraries that we had to start over with a new library, requiring a whole new MP.  Thus the new MP that we are releasing with a shiny new name: the Active Directory-Domain Services MP.

Another large change is how we monitor replication now.  In the past, replication was monitored by injecting a change into AD and tracking how long it took to replicate that change.  The old way also required you to opt in to the monitor, configuring it for individual DCs or all at once.  The old method generated a lot of replication traffic and created objects in AD that could be confusing to admins.  We didn’t like this method, and neither did many customers, so we set out to find a way to monitor replication in a more passive manner.  Now we use the built-in tools in AD to track replication health.  We added a replication queue monitor that will alert you when a DC starts lagging, and a replication health check to make sure your DC has replicated within expected thresholds.  Note that there are many different replication environments out there and no one size fits all, so some adjustment may be necessary.  As is the case with almost all of our new monitors, both the replication queue and replication health check monitors are configurable: you can set the thresholds for when they alert you so that they fit your particular environment.
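The MP collects this passively on its own, but if you want to manually spot-check the same kinds of signals (the inbound replication queue and per-partner status), a rough sketch using the built-in repadmin tool from an elevated PowerShell prompt on a DC might look like this:

# Pending inbound replication requests on this DC; a queue that keeps growing suggests lag
repadmin /queue

# Per-partner, per-partition replication status, including failure counts and last success time
repadmin /showrepl /csv | ConvertFrom-Csv |
    Select-Object 'Source DSA', 'Naming Context', 'Number of Failures', 'Last Success Time', 'Last Failure Status'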

Lastly, we changed our philosophy on how we monitor AD, from an event-driven model to synthetic monitors.  In our experience, the event-based model is both noisy and not very helpful.  When an event rule fires, is there really a problem?  Is it still a problem a few hours later?  What’s the impact?  Therefore we decided to move exclusively to synthetic health monitors.  We put a number of changes into the MP we now call the Domain Member Perspective MP (this was previously named Client Perspective).  This MP verifies in a generic way that it can contact the domain, so you know that clients are able to connect to it.  It also tests the bind performance of your PDC and verifies that group policy is getting applied without errors on your member systems.  But the biggest monitor we added to the Domain Member Perspective is the Domain Controller Health Monitor.  This monitor has become my favorite for alerting me to an issue on a DC.  We have spent a lot of time verifying and updating the logic of this monitor so that it does not throw false alerts, but knows when a DC is not responding as it should and whether or not it is in a state that could impact users.  It is our go-to monitor in our environment, and hopefully it will help you as well.

You can find the new ADDS 2016 MP at the DLC page for Windows Server 2016 Technical Preview Management Packs.

I hope you enjoy the changes we made to the ADDS MP and we would love to have your feedback as we continue to iterate on this and make it better for you and for us. We would especially like to hear from you about how you chose to change the defaults in the new monitors.  It would really help us to know what the best default value is and if there are other monitors we can add or ways to adjust the MP to be more efficient out of the box. Please share your feedback about the MP at the SCOM Product Feedback forum. If your feedback has already been suggested by someone else, please up-vote it.  

Eric Hunter | Software Engineer | Microsoft

Get the latest System Center news on Facebook and Twitter:

clip_image001 clip_image002

System Center All Up: http://blogs.technet.com/b/systemcenter/

Configuration Manager Support Team blog: http://blogs.technet.com/configurationmgr/
Data Protection Manager Team blog: http://blogs.technet.com/dpm/
Orchestrator Support Team blog: http://blogs.technet.com/b/orchestrator/
Operations Manager Team blog: http://blogs.technet.com/momteam/
Service Manager Team blog: http://blogs.technet.com/b/servicemanager
Virtual Machine Manager Team blog: http://blogs.technet.com/scvmm

SCOM 2012 R2 now offers additional support for SQL Server 2012 SP3


We are pleased to announce that System Center 2012 R2 Operations Manager (SCOM 2012 R2) now supports SQL Server 2012 SP3 as its database. There is no minimum SCOM Update Rollup (UR) requirement for SCOM to work with SQL Server 2012 SP3, but we highly recommend installing the latest SCOM UR.

We are currently doing comprehensive testing to validate whether there are any issues using SCOM 2012 SP1 with SQL Server 2012 SP3. Please continue to watch the SCOM blog for news about SCOM 2012 SP1 support for SQL Server 2012 SP3.

 

Suraj Suresh Guptha | Program Manager | Microsoft

Get the latest System Center news on Facebook and Twitter:

Main System Center blog: http://blogs.technet.com/b/systemcenter/

Configuration Manager Support Team blog: http://blogs.technet.com/configurationmgr/
Data Protection Manager Team blog: http://blogs.technet.com/dpm/
Orchestrator Team blog: http://blogs.technet.com/b/orchestrator/
Operations Manager Team blog: http://blogs.technet.com/momteam/
Service Manager Team blog: http://blogs.technet.com/b/servicemanager
Virtual Machine Manager Team blog: http://blogs.technet.com/scvmm

Update Rollup 9 for Microsoft System Center 2012 R2 Operations Manager Management Packs for UNIX and Linux now available


Update Rollup 9 for System Center 2012 R2 Operations Manager Management Packs for UNIX and Linux is now available to download. The KB article below describes the issues that are fixed and also contains the installation instructions for this update. For complete details including issues fixed, installation instructions and a download link, please see the following:

KB3141435 – Update Rollup 9 for System Center 2012 R2 Operations Manager Management Packs for UNIX and Linux (https://support.microsoft.com/en-us/kb/3141435)

J.C. Hornbeck | Solution Asset PM | Microsoft

fbTwitterPic

Our Blogs

New ways to enable Log Analytics (OMS) on your Azure VMs


Operations Management Suite (OMS) is Microsoft’s simplified cloud-based IT management solution providing Log Analytics, Automation, VM Backup & Site Recovery, and Security & Compliance across any of your on-premises and public cloud environments. We are excited to announce new integration in the Azure portal with Log Analytics (OMS) allowing you to gain insights even faster.

Azure and Log Analytics (OMS) portal experience

OMS has a brand new management experience in the Azure portal. This new experience allows you to create a new OMS workspace, link an OMS workspace to an Azure subscription, and onboard Windows and Linux Azure VMs into the OMS service.

Onboard my pre-existing Windows and Linux Azure VMs into OMS

To onboard your pre-existing Windows and Linux Azure VMs into OMS, select the Log Analytics (OMS) resource.

Pro Tip: Click the star icon on the right to pin the Log Analytics (OMS) resource to your default menu

Azure Portal Log Analytics (OMS)

From here select the OMS workspace you wish to onboard your Azure VM into. On the OMS Workspace panel select the Virtual Machines button.

Azure Portal OMS Workspace

Lastly, select the Windows or Linux Virtual Machine and click Connect.

OMS List of VMs

That’s it! Within a couple of minutes your Azure VM is sending data to OMS.

Windows and Linux with OMS Quickstart ARM Template

If you are new to OMS and want an easy way to see the capabilities offered by the service, we highly recommend signing up for the free tier data plan that includes a 500 MB daily upload limit and seven-day data retention.

Once signed up with OMS, you can click the buttons below to provision a brand new Windows or Linux VM that comes pre-installed with the OMS Agent.

Note: The OMS Workspace ID and OMS Workspace Key are required to onboard to the OMS Service. These can be found by logging into the OMS Portal and selecting Settings from the large blue pane or the drop down by clicking the Workspace name.

OMS Portal Settings

In Settings, select CONNECTED SOURCES, and then you can copy the WORKSPACE ID and PRIMARY KEY.

Workspace ID and Workspace Key

 

Windows OMS Quickstart

Windows VM OMS Quickstart

Ubuntu VM OMS Quickstart

Ubuntu VM OMS Quickstart

Additionally, these templates can be deployed through Azure CLI or Azure PowerShell.
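For example, a minimal Azure PowerShell (AzureRM module) sketch for deploying such a template could look like the one below. The resource group name, location, template URI placeholder, and the workspaceId/workspaceKey parameter names are assumptions for illustration (the parameter names match the resource snippets in the next section); substitute values for your own environment.

# Hypothetical values - replace with your own resource group, location, and template URI
$rgName      = 'oms-quickstart-rg'
$location    = 'West US'
$templateUri = '<raw URI of the quickstart template JSON>'

# Create (or reuse) a resource group, then deploy the template into it
New-AzureRmResourceGroup -Name $rgName -Location $location -Force

New-AzureRmResourceGroupDeployment -Name 'omsQuickstartDeployment' `
    -ResourceGroupName $rgName `
    -TemplateUri $templateUri `
    -TemplateParameterObject @{
        workspaceId  = '<your OMS Workspace ID>'
        workspaceKey = '<your OMS Workspace Key>'
    }
# Note: the quickstart templates will also require VM-specific parameters (for example an
# admin user name and password); you will be prompted for any mandatory parameters not supplied here.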

Adding Log Analytics (OMS) to your pre-existing ARM Templates

If your enterprise already has an arsenal of ARM Templates, you can insert the resource snippets below to easily add the OMS Agent.

Note: The OMS Workspace ID and OMS Workspace Key are required to onboard the OMS Service and can be found in the OMS Portal under Settings > Connected Sources

Windows resource snippet

{
  "type": "extensions",
  "name": "Microsoft.EnterpriseCloud.Monitoring",
  "apiVersion": "[variables('apiVersion')]",
  "location": "[resourceGroup().location]",
  "dependsOn": [
    "[concat('Microsoft.Compute/virtualMachines/', variables('vmName'))]"
  ],
  "properties": {
    "publisher": "Microsoft.EnterpriseCloud.Monitoring",
    "type": "MicrosoftMonitoringAgent",
    "typeHandlerVersion": "1.0",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "workspaceId": "[parameters('workspaceId')]"
    },
    "protectedSettings": {
      "workspaceKey": "[parameters('workspaceKey')]"
    }
  }
}

Linux resource snippet

{
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "name": "<extension-deployment-name>",
  "apiVersion": "<api-version>",
  "location": "<location>",
  "dependsOn": [
    "[concat('Microsoft.Compute/virtualMachines/', <vm-name>)]"
  ],
  "properties": {
    "publisher": "Microsoft.EnterpriseCloud.Monitoring",
    "type": "OmsAgentForLinux",
    "typeHandlerVersion": "1.0",
    "settings": {
      "workspaceId": "<workspace id>"
    },
    "protectedSettings": {
      "workspaceKey": "<workspace key>"
    }
  }
}

Supported Operating Systems

  • Ubuntu Server 12.04 LTS, 14.04 LTS, 15.04, 15.10 (x86/x64)
  • SUSE Linux Enterprise Server 11 and 12 (x86/x64)
  • Red Hat Enterprise Linux Server 5, 6, and 7 (x86/x64)
  • Oracle Linux Server 5, 6, and 7 (x86/x64)
  • CentOS Linux Server 5, 6, and 7 (x86/x64)
  • Amazon Linux Server 2012.09 through 2015.09 (x86/x64)
  • Windows Server 2008 SP1 and above (x86/x64)

 

Overall, the new Azure integration streamlines creating and integrating Log Analytics into your Azure environment. Pair this ease of deployment with our OMS Free tier (500 MB daily upload and 7-day data retention) and powerful Log Analytics and insights are only a couple of clicks away. We hope to see you on the OMS service, and we are happy to take feedback for new features on our User Voice or to answer issues emailed to scdata@microsoft.com.

Monitor Active Directory Replication Status with OMS


Active Directory is a key component of an enterprise IT environment. To ensure high availability and high performance, each domain controller has its own copy of the Active Directory database. Domain controllers replicate with each other in order to propagate changes across the enterprise. Failures in this replication process can cause a host of problems across the enterprise, so staying on top of replication status is an important task for any Active Directory administrator.

To help with this task, we’ve recently released a new solution for Operations Management Suite: AD Replication Status. This solution gathers information about replication failures throughout your AD environment and surfaces it on your OMS dashboard.

Getting started with the AD Replication Status solution

If you don’t have an OMS workspace yet, you can create one here, for free.

Then you’ll need to connect at least one of your domain controllers to OMS. You can view detailed documentation on how to connect a machine to OMS.

If you’d prefer to run from an OMS-connected member server in your domain, rather than a domain controller, you’ll need to set the following registry key on the member server, then restart the HealthService:
Key: HKLM\SOFTWARE\Microsoft\AzureOperationalInsights\Assessments_Targets
Value: ADReplication
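As a convenience, here is a minimal PowerShell sketch of that change, assuming the intent is a string value named ADReplication under the Assessments_Targets key (run elevated on the member server):

# Create the key and the ADReplication value, then restart the agent so it picks up the change
$key = 'HKLM:\SOFTWARE\Microsoft\AzureOperationalInsights\Assessments_Targets'
New-Item -Path $key -Force | Out-Null
# Assumption: an empty string value named 'ADReplication' is sufficient to mark the target
New-ItemProperty -Path $key -Name 'ADReplication' -Value '' -PropertyType String -Force | Out-Null
Restart-Service -Name HealthService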

Once you have connected at least one domain controller (or member server with registry key set) to your OMS workspace, simply go to the Solutions Gallery from the main OMS dashboard, then click on the AD Replication Status solution.

adrepl1

Using the AD Replication Status solution

When you add this solution to your workspace, you’ll start to see statistics on replication errors in your Active Directory environment, right on your OMS dashboard:

adrepl2

(The “critical” replication error number refers to errors that are over 75% of tombstone lifetime, or TSL. If you’re not familiar with TSL, we’ll talk about it more in just a minute.)

This tile will update automatically every few days, so you’ll always be able to see the latest information on replication errors in your environment.

Clicking this tile will take you into the AD Replication Status dashboard screen, which has more detailed information about the errors that were detected:

adrepl3

Let’s take a closer look at the specific blades that show on this screen.

Destination Server Status and Source Server Status

adrepl4

These show destination servers and source servers, respectively, that are experiencing replication errors. The number after each domain controller name indicates the number of replication errors on that domain controller.

We show the errors by both source server and destination server because some problems are easier to troubleshoot from the source server perspective, and others from the destination server perspective. In this example, you can see that many destination servers have roughly the same number of errors, but there’s one source server (ADDC35) that has many more errors than all the others. It’s likely that there is some problem on ADDC35 that is causing it to fail to send data to its replication partners, and so fixing the problems on ADDC35 will likely resolve many of the errors that appear in the destination server blade.

If you click on a domain controller name, you will drop into the search screen, where you can see more detailed information on the errors on that specific domain controller.

adrepl5

Of course, all the great features of the OMS search screen are available to you to drill in to the root cause of the problem. Here, we’ve filtered down the results to just look at replication errors involving the Schema partition.

adrepl6

We can see that this source server is failing to replicate this same partition with 19 different destination servers, and at least the three shown here are failing with the same error (8451 – The replication operation encountered a database error). Again, this indicates that we can most likely focus our troubleshooting efforts on this single source server (ADDC35) and expect that a single fix will address multiple errors.

The search screen also displays a “HelpLink” for each error. Unfortunately, clicking on this link currently does not work properly, but you can copy/paste it into your browser window to view documentation on TechNet that has more information on the error and how to troubleshoot it. As an example, here’s a clickable link to the help on 8451 errors: http://go.microsoft.com/fwlink/?LinkId=228631

Replication Error Types

adrepl7

This blade gives you information on the types of errors detected throughout your enterprise. Each error has a unique numerical code, as well as a message that can help you determine the root cause of the error.

The donut at the top gives you an idea of which errors appear more/less frequently in your environment. In this example, we can see that the top occurring error codes were 8451 (152 occurrences), 1256 (93 occurrences), 1908 (22 occurrences), and 1722 (21 occurrences).

The list shows the error codes identified, along with the associated message. Again, you can click on an error in the list to drop into the search screen and see more detailed information on each occurrence of that particular error code, across all domain controllers in your enterprise. Here’s an example filtering down to just occurrences of error code 1908:

adrepl8

Tombstone Lifetime

adrepl9

The tombstone lifetime determines how long a deleted object (called a “tombstone”) is retained in the Active Directory database. Once a deleted object passes the tombstone lifetime, a garbage collection process automatically removes it from the Active Directory database.

The default tombstone lifetime is 180 days for most recent versions of Windows, but it was 60 days on older versions, and it can be changed explicitly by an Active Directory administrator.
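If you want to confirm the tombstone lifetime actually configured in your own forest, a small sketch using the ActiveDirectory PowerShell module can read it directly (an empty result means the attribute is not set and the version-dependent default applies):

# The tombstoneLifetime attribute lives on the Directory Service object in the Configuration partition
# (requires the ActiveDirectory / RSAT module)
$configNC = (Get-ADRootDSE).configurationNamingContext
Get-ADObject -Identity "CN=Directory Service,CN=Windows NT,CN=Services,$configNC" `
    -Properties tombstoneLifetime |
    Select-Object tombstoneLifetime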

It’s important to know if you’re having replication errors that are approaching or are past the tombstone lifetime. If two domain controllers experience a replication error that persists past the tombstone lifetime, then replication will be disabled between those two DCs, even if the underlying replication error is fixed.

The “Tombstone Lifetime” blade helps you identify places where this is in danger of happening. In the example shown above, you can see that there are 64 errors that are over 100% of tombstone lifetime (the orange arc in the donut). Each of these errors represents a partition that has not replicated between its source and destination server for at least the tombstone lifetime for the forest. Again, you can click on the “Over 100% TSL” text to drill into details of these errors. Here is one example:

adrepl10

In this case, the data was collected by OMS on December 29, 2015 (TimeGenerated field). The last successful synchronization (LastSuccessfulSync field) was on January 27, 2015 – 11 months earlier. Clearly, this is way past the tombstone lifetime!

In this situation, simply fixing the replication error will not be enough. At a minimum, you’ll need to do some manual investigation to identify and clean up lingering objects before you can restart replication. You may even need to decommission a domain controller.

In addition to identifying any replication errors that have persisted past the tombstone lifetime, you’ll also want to pay attention to any errors falling into the “50-75% TSL” or “75-100% TSL” buckets. These are errors that are clearly lingering, not transient, so they likely need your intervention to resolve. The good news is that they have not yet reached the tombstone lifetime. If you fix these problems promptly, before they reach the tombstone lifetime, replication can restart with minimal manual intervention.

As noted earlier, the dashboard tile for the AD Replication Status solution shows the number of “critical” replication errors in your environment, which is defined as errors that are over 75% of tombstone lifetime (including errors that are over 100% of TSL). Strive to keep this number at 0.

Note: All the tombstone lifetime percentage calculations are based on the actual tombstone lifetime for your Active Directory forest, so you can trust those percentages are accurate, even if you have a custom tombstone lifetime value set.

Replication problems are one of the top call generators for Microsoft’s Active Directory support team. We hope this new OMS solution helps you stay on top of your replication errors and fix them quickly when they occur. For more information on Active Directory replication, please see the Active Directory Replication Technologies topic on TechNet.
