SharePoint 2010 pain - splitting a large content database
I know, I know. SharePoint 2010. Bad. But here I am.
Much of the pain I have been going through over the last week or so is frankly caused by the biggest mistake that a someone designing a new SharePoint environment can make: not planning for data volumes. It doesn’t matter how many times your client tells you “ah, it’ll never grow that big, we won’t use it much” - ignore them, they are lying.
I have learnt the hard way (so you don’t have to) - always design your environment so that it scales nicely.
So here’s the issue: what started out as a modest SharePoint environment - of around 50gb, that was supposed to be a temporary DMS / document repository until a permanent solution was found… has become the permanent solution. Yay for SharePoint! (It is actually rather good at storing large volumes of data.) But the issue is the design of the storage simply didn’t accommodate the future growth, which today stands at around 350gb. This is in a small number of lists, in a single site collection in a single content database. Eek. Microsoft published a list of guidelines in terms of best practices - the salient one here is that site collections shouldn’t go much more than a 100gb originally, and subsequently upped to 200gb after SP1. This is for ‘active’ environments anyway… ‘archive’ site collections can go bigger.
And in fact, they are really only guidelines as the database has sailed merrily through these limits and is still perfectly functional today. But it causes a huge administrative nightmare in terms of resilience/redundancy/backup and how you’d cope in the event of a recovery scenario.
Moving and restoring large databases is a pain in the backside. Touch wood we’ve not had an issue yet, but it would only take for a couple of disk failures in the SAN for it to cause a world of pain.
So I set about looking at ways to split out the data in to a series of smaller, more manageable, content databases. By virtue of this, it would require multiple site collections.
Scripting out to create the new required site collections with their own content databases is relatively trivial but what do you do when you then need to migrate the content from the ‘primary’ site collection to the ‘new’ site collections?
SharePoint provides a variety of methods for doing this, but, in SharePoint 2010, at least, none of them are right on spec.
The most obvious one is to use ‘
Export-SPWeb‘ and target a specific list. You can do this through Central Administration or through Powershell and it will spit out a bunch of sparse files, which can be used to restore a copy of the list to the new site collection.
There are a couple of key issues with this:
- It is SLOOOOW! I don’t think it was designed to be run on lists with 100k+ items which in my testing would take hours to export and the same again to import.
- Worse, you lose a lot of the history with the new list - particularly the date created metadata. So this was a no-no.
- Furthermore, (so… 3) if you do this via Central Administration you can monitor progress, but I don’t particularly like that it runs in the timer service. And on a long job like this, if it was interrupted for any reason, then you’d be in nomansland.
So, what else?
Do it up the backend
Well, I looked in to options of doing it all from the SQL side of things. This is much quicker, but I was not able to find a workable solution.
For instance, I could backup my primary database, then restore it to a new one and using an old and overlooked command in the old stsadm (stsadm -o addcontentdb -databasename -url) etc. you can add it and target it to a new site collection! Great!
Except it doesn’t work, because in this scenario you can’t add it since the database ID is the same as your primary. You can use
Mount-SPContentDatabase with the -assignnewdatabaseid switch which will mount it - but to the web application and not to a specific site collection- but it warns you of issues with orphaned site collections. Theoretically, you could then dismount it, and then use then use the previous stsadm command to add it to a site collection. This seems like it would work but feels cumbersome.
IF we were in SharePoint 2013 then we could use the new Powershell CMDLET ‘Copy-SPSite’ (in SP2010 we only have ‘Move-SPSite’) which may be useful. But we’re not in SP2013. Moving on…
So we come to the backup operations. Through Central Administration and Powershell you can
Restore-SPSite. So ultimately just restore copies of the current primary site collection. This will obviously restore all of it and then you need to delete the bits you don’t want after. This is not a fast operation either but so far seems to be the most workable.
I’ve not touched on the 3rd party tools that are available (Sharegate, Metalogix etc.) but as far as I know these all use the built-in operations anyway.
Long story short - design your SharePoint environments with scaling in mind from the beginning. Let me know in the comments if I’ve missed anything obvious!
Update (a few hours later): Ah, balls.
SharePoint site collection backup and restore is only supported for a maximum site collection size of 100 GB. For larger site collections, the complete content database must be backed up. If multiple site collections larger than 100 GB are contained in a single content database, backup and restore operations can take a long time and are at risk of failure.