by Logo Technical Blog – Future Processing
26.08.2013
Performance with SharePoint 2010 Large lists_

Each SharePoint software engineer should know how to avoid query throttling by writing efficient queries, and designing appropriate list’s structure. The sections below provide practices and tips ensuring your ability to access many items. Additionally, they help to keep site operating smoothly. All information in this article applies to SharePoint Server 2010.


SharePoint 2010 has introduced many new features. One of them is Resource Throttling – designed to avoid performance degradation that can occur while working on a list containing a great number of items. SharePoint is able to manage a very large list (60,000,000 items per list, according to MSDN and SharePoint 2010 boundaries [1]), however it requires knowledge of different techniques, described below, which can affect the performance of these large lists.

You probably wonder why the Resource Throttling feature has been introduced in SharePoint 2010. To answer your question, you should take a look at how SQL Server tries to minimize a database connection to improve performance. SQL Server often uses a low-level locking technique to ensure updates without directly affecting other users accessing alternative rows at the same time [2]. In some circumstances, i.e. while executing a query that reads more than 5,000 rows blocked at once, it is more efficient to temporally lock the entire table until the operation is completed. If the lock is escalated to the entire table too often, other users will experience a system slowdown.

The default value of List View Threshold property is 5,000 items, and can be configured independently for each Web Application. SharePoint’s reason for choosing 5,000 as the throttle has some basis in reality. It is the approximate value at which a SQL row lock is escalated to a table lock. This value specifies the maximum number of rows in the content database that can be accessed in order to execute query. Number of results returned by query does not matter, contrary to the number of items scanned at the row level in content database. As a result of this action, the operation may be forbidden, even if the expected number of results is of a lower value than a List View Threshold property.

The most common approach to improve the performance of various query operations is to introduce indexing columns. This method is rather simple, however the lack of knowledge of a list columns indices functioning under the hood, may even cause performance degradation! SharePoint indexing columns are somewhat similar to the ones in a database table. In addition, all items in any site list in SharePoint are stored in a single database named AllUserData. Indices are maintained in an additional database, called NameValuePair by SharePoint instead of the SQL Server. Why? Because each list is stored in the same table; what is more, would have different columns indexed and as a result we would have huge amounts of overhead. Each indexed column needs to be stored in a NameValuePair table, and has to be joined with the AllUsersData table while query is being executed. You can simply calculate that NameValuePair table stores additional rows in the amount of Items in the List multiplied by Indexed Columns in the List. Maintaining these values entails additional processing time when adding, updating, or deleting items from the list. The conclusion is that columns, which need to be indexed should be carefully selected to avoid unnecessary overhead.

SharePoint allows creating many filtered views for each list or library. To ensure a quick access to these views, it is required that the first column specified in the filter must be indexed. In addition, filtering by the first column of the filter cannot return more items than List View Threshold, even if the final number of items of filtered view is lower than List View Threshold. It is also worth to mention, that items in a recycle bin also matter when a user accesses filtered views. These items have not yet been deleted from the database, so the result set may be truncated, or unavailable at all.

Another common performance mistake is missing OrderBy clause while querying a list. A simple way to avoid query throttling is to ensure the use of one of three dedicated OrderBy clauses: ItemEnumerationOrderByID [3], ItemEnumerationOrderByPath [4], or ItemEnumerationOrderByNVPField [5] from Microsoft.Office.Server.ContentIterator [6] class which enables the usage of the index. Without one of OrderBy clauses, mentioned above query may be blocked. Why? SharePoint, behind the scenes adds OrderBy clause that orders by content type, just to be sure that folders are returned before items, but the content type column is not indexed by default. So in case a total number of items reaches the throttling limit, query will be blocked.

Another approach to execute resource-consuming query is to specify a daily time window when large queries can be executed. Let’s assume you have nightly job that processes a great number of items. In that case it would be perfect to create a time block during the night (when usage is low) and allow job’s queries to run without enforcing the list throttling limits. Any query executed during that period would run until completed, even if it does not finish within the specified time window.

The next way to improve the efficiency of data access is organizing items into folders. Whenever new folder is created, SharePoint behind the scenes creates an internal index. When you access items in a folder, you are effectively using an internal index to access the data.

To sum up, knowledge of techniques and tips, as well as following the recommendations are particularly important when users work on many items in a list or library. The performance of a SharePoint strongly depends on decisions made during setting up and planning SharePoint site, especially if it contains lists or libraries with many items.

And now…
What are your experiences of working with great number of items on SharePoint platform?

[1] SharePoint Server 2010 capacity management: Software boundaries and limits. [Online].
http://technet.microsoft.com/en-us/library/cc262787(v=office.14).aspx#WebApplication

[2] SQL Server Transaction Locking and Row Versioning Guide. [Online].
http://msdn.microsoft.com/en-us/library/jj856598.aspx#Lock_Engine

[3] ContentIterator.ItemEnumerationOrderByID property. [Online].
http://msdn.microsoft.com/en-us/library/microsoft.office.server.utilities.contentiterator.itemenumerationorderbyid(v=office.14).aspx

[4] ContentIterator.ItemEnumerationOrderByPath Property. [Online].
http://msdn.microsoft.com/en-us/library/microsoft.office.server.utilities.contentiterator.itemenumerationorderbypath(v=office.14).aspx

[5] ContentIterator.ItemEnumerationOrderByNVPField Property. [Online].
http://msdn.microsoft.com/en-us/library/microsoft.office.server.utilities.contentiterator.itemenumerationorderbynvpfield(v=office.14).aspx

[6] ContentIterator Class. [Online].
http://msdn.microsoft.com/en-us/library/microsoft.office.server.utilities.contentiterator(v=office.14).aspx

Related Posts

Comments

Cookies

This website stores cookies on your computer. These cookies are used to improve our website and provide more personalized services to you, both on this website and through other media. To find out more about the cookies we use, see our Cookies policy.