It is not uncommon that we are called upon to assist a customer who is having asynchronous deadlock issues with their Microsoft Dynamics CRM solution. Often times the deadlocks are caused by missing indexes, out of date statics, insufficient hardware, server configuration, bad design decisions or excessive use of plugins and workflows. There are several good posts on how to resolve some of the more commonly known issues, but recently I found myself running through all these posts and still not able resolve the deadlocking issues that plagued my 2011 CRM environment.
The CRM solution I was working on had about 10 plug-ins running as async, 15 workflows and email blasts of roughly 50K two to three times a month. This load, although not small, was not (in database terms) enough traffic to cause the problems we were seeing:
- Async service would kick off 2,000 jobs (the default setting for the AsyncItemsInMemoryHigh)
- Run into deadlocks on nearly all the 2,000 jobs
- Put the started jobs in pending status until the next retry interval
- Load the next batch of jobs and start the process all over again
This process would continue until all the jobs were in a pending status and nothing was executing.
Looking for Clues
I ran through the following troubleshooting steps:
- Verified all async workflows and plugin designs
- Update the database isolation level to read snapshot committed
- All indexes were up to date and with a default fill factor of 80%
- Additional indexes were added based on the SQL Server Database Engine Tuning Advisor
- All statics were up to date
- A single dedicated Asynchronous server was set up with all other async services turned off
- The registry key “AsyncDBAppLock” was added to the async server to assist with reducing deadlocks
- Reduced the number of async jobs that were executed at once
- Reduced the interval time between jobs
- Reduced the number of retry attempts
None of the above steps resolve the deadlocking. After reviewing the SQL trace logs in great detail, we found the issue was the locking of the clustered indexes on key workflow and asynchronous tables.
Mystery Solved
By default, the Dynamics CRM table’s primary key has a clustered index on it. Each index has two properties set to “true” for locking:
- Allow row locks
- Allow page locks
These settings are critical ensuring that SQL is keeping the smallest lock possible when reading or writing from the table. In this CRM instance, these settings were set to false causing SQL to hold an exclusive level lock blocking all other write transactions.
If you find your CRM instance running deadlocking issues there are a lot of great resources listed below, but if those do not resolve the issue, check the lock settings of the indexes on the following tables:
- asyncoperationbase
- workflowwaitsubscriptionbase
- workflowbase
- workflowbaseids
- workflowdependencybase
- workflowlogbase
These verify/update these settings by:
- Opening up SQL Server Management Studio
- Expand the organization database
- Expand tables
- Expand the table in question
- Expand indexes
- Right-click the clustered index and select properties
- In the properties dialog, click on Options
I want to thank Microsoft Premier Support in helping resolve this issue. They were great to work with and provided that much needed second set of eyes.