Restructuring a Very Large Document Library

Marc D Anderson
  • Restructuring a Very Large Document Library Marc D Anderson

    Another question about the Very Large List I referenced in this question: http://sharepoint.stackexchange.com/questions/445/deleting-old-versions-in-a-document-library

    This Document Library has over 5000 items and has outgrown its britches. The Document Library is an artifact repository for the SDLC process at one of my clients. Given this, each document belongs to a specific project, which is specified in the item's metadata.

    All of the documents currently live in the root of the Document Library. What we'd like to do is move the documents into folders by project, both to improve performance and to reduce the likelihood of duplicate filenames causing destruction by overwrite.
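
The folder-per-project layout can be planned before anything is moved. What follows is a minimal sketch in Python, assuming each document has already been read into a record holding its file name and project value. The `Doc` type, its field names, and the library URL are hypothetical and not taken from the actual library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """One library item; both fields are hypothetical stand-ins for list columns."""
    file_name: str
    project: str


def plan_moves(docs, library_url):
    """Map each project folder to the destination URLs of its documents.

    Raises ValueError if two documents would land on the same URL, which is
    the overwrite case the restructuring is meant to avoid.
    """
    plan = defaultdict(list)
    seen = set()
    base = library_url.rstrip("/")
    for doc in docs:
        dest = f"{base}/{doc.project}/{doc.file_name}"
        key = dest.lower()  # compare case-insensitively to catch near-duplicates
        if key in seen:
            raise ValueError(f"duplicate destination: {dest}")
        seen.add(key)
        plan[doc.project].append(dest)
    return dict(plan)
```

The sketch does not sanitize project values, so a project name containing characters that are invalid in a folder name would need handling before the plan is used.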

    This is a "no managed code" environment (don't ask), so I'm trying to use SharePoint's Lists and Copy Web Services to move the documents. The Copy Web Service copies the docs just fine, but we want to preserve the Created and Created By column values if at all possible. I've tried to set the Created and Created By columns ReadOnly=False to update them using the Lists Web Service's UpdateListItems operation, but can't make that work.
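
For reference, the `updates` parameter of UpdateListItems is a CAML `<Batch>` of `<Method>` elements. The sketch below only builds that XML in Python; it sends nothing, and the helper name `build_update_batch` is hypothetical. `Created` and `Author` are the internal names of the Created and Created By columns, but whether the server accepts new values for them is exactly the part reported above as not working.

```python
import xml.etree.ElementTree as ET


def build_update_batch(updates):
    """Build the <Batch> XML passed as the `updates` parameter of UpdateListItems.

    `updates` is a list of (item_id, {field_internal_name: value}) pairs.
    """
    batch = ET.Element("Batch", OnError="Continue")
    for method_id, (item_id, fields) in enumerate(updates, start=1):
        method = ET.SubElement(batch, "Method", ID=str(method_id), Cmd="Update")
        # The ID field tells the service which list item to update.
        ET.SubElement(method, "Field", Name="ID").text = str(item_id)
        for name, value in fields.items():
            ET.SubElement(method, "Field", Name=name).text = str(value)
    return ET.tostring(batch, encoding="unicode")
```

Calling `build_update_batch([(42, {"Created": "2008-10-01T09:30:00Z"})])` yields a batch with a single `Update` method for a hypothetical item 42.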

    Any ideas?

    Update 2009-11-12 - The Copy Web Service seems to preserve the Created and Created By column values just fine. Turns out that the last hurdle is unlinking the copy from its source. I can't figure out a way to empty out the _CopySource column. No combination of setting things like ReadOnly="False" on the column, etc. seems to work. This is one capability that I would expect to be part of the Copy Web Service, as it's exposed in the UI on the page when you look at the newly created item. If I delete the source item, then the _CopySource is still there, but with an invalid link back.
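
The post does not say which Copy Web Service operation was used. As an illustration only, here is a Python sketch that builds the SOAP 1.1 envelope for `CopyIntoItemsLocal`, which copies within a single server. The helper name is hypothetical, and the envelope would be POSTed to the site's `/_vti_bin/Copy.asmx` with a matching `SOAPAction` header. It does nothing about `_CopySource`, so the link back to the source described above would remain.

```python
from xml.sax.saxutils import escape

SP_SOAP_NS = "http://schemas.microsoft.com/sharepoint/soap/"
SOAP_ACTION = SP_SOAP_NS + "CopyIntoItemsLocal"


def build_copy_local_envelope(source_url, destination_urls):
    """Return a SOAP 1.1 envelope for the Copy Web Service's CopyIntoItemsLocal."""
    # Each destination is one <string> element inside <DestinationUrls>.
    dests = "".join(f"<string>{escape(u)}</string>" for u in destination_urls)
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        f'<CopyIntoItemsLocal xmlns="{SP_SOAP_NS}">'
        f"<SourceUrl>{escape(source_url)}</SourceUrl>"
        f"<DestinationUrls>{dests}</DestinationUrls>"
        "</CopyIntoItemsLocal>"
        "</soap:Body>"
        "</soap:Envelope>"
    )
```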

  • Marc,

    I don't know whether these stsadm extensions would be off limits to you, but if you are able to use them, you may be able to create some scripts that move the data for you based on the metadata. I would start here: http://stsadm.blogspot.com/2007/08/stsadm-commands_09.html

    Lori

  • Marc, is it an option to split the document library into a library per project? This would do wonders performance-wise.

  • What about doing it manually using the explorer view?

    You might want to update the filenames through the Web Services to prefix the files with the project name (I hope you can perform something like a SystemUpdate() that preserves the user and datetime), so that you can easily select all files for a project (sort by filename) and move them in one go.

    When all files are moved, rename them back again using the Web Services.
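
The prefix-then-restore renaming has to be reversible, so the prefix needs to be unambiguous. This small Python sketch covers only the name handling; the `__` separator and both function names are assumptions, and the actual rename call through the Web Services is not shown.

```python
def add_project_prefix(file_name, project, sep="__"):
    """Return the temporary name used while files are sorted and moved."""
    return f"{project}{sep}{file_name}"


def strip_project_prefix(file_name, project, sep="__"):
    """Restore the original name; refuse files that were never prefixed."""
    prefix = f"{project}{sep}"
    if not file_name.startswith(prefix):
        raise ValueError(f"{file_name!r} does not start with {prefix!r}")
    return file_name[len(prefix):]
```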

  • After seeing my tweets about this, Peter Senescu at Metavis Technology suggested I look at their MetaVis Architect for SharePoint product. I've fired up the trial version, and it looks like it will get me where I want to go on this. I have no association with Metavis, and I've got to say that the product is really cool. It's going to let me move large groups of documents in one go based on their metadata values, which is exactly what I wanted to do. The one drawback is that we will lose the original Created and Created By column values, but we've decided that is an OK price to pay.

    One nice thing about this tool is that there's no server-side install, which was a must-have in this case. It looks like it uses the Web Services (my recent favorite domain!) to get things done. I was trying to build my own Web Service-based solution, but I was never going to get anywhere near this level of sophistication.

Tags
document architecture