Tag Archives: building

It Looks Like You’re Building a Large Library. Would You Like Help?

SharePoint 2010 is more than just SharePoint 2007 plus a bunch of new bullet points on the box. We didn’t just haphazardly build a bunch of new features, look back at the fertile seeds we planted, and muse about how “everything should work pretty well as libraries get large.” We built, and more importantly, tested all the features you’re reading about with scale in mind. We are setting new scale targets for 2010 that go above and beyond what we set in 2007. These numbers are not final yet, but we’re shooting for tens of millions of documents in a single library, depending on some specific parameters of your scenario.

When I throw out numbers like that, I’m not talking about just big, static libraries with content that just sits there. We want you to do crazy things with SharePoint 2010 like stuff a million document sets in a single document library with workflows running every which way, a hundred different retention policies firing off actions when you least expect them, and users uploading, tagging, and searching day in and day out. All the goodness of the SharePoint platform will be available to you whether you’re building a team site, a collaborative repository, a knowledge base, or a super large archive.

Like a plump, juicy sausage, much of the good stuff that gives SharePoint 2010 its delicious scalability is made of things that most people don’t need (or want) to know about. For the most part, scale just works. However, the chef (or information architect) is still a super important player. A well-planned repository will have your users coming back for seconds and writing rave reviews; a poorly-planned one will have them chugging Pepto-Bismol the next morning. Just because you can stuff a bunch of documents into a SharePoint 2010 library without your server igniting in flames the next day doesn’t mean that you should without first thinking through how best to use the tools available to deliver an excellent experience to your end users.

So, even though scale in SharePoint 2010 just works, you’re not going to install the bits on day 1 and have a massive, searchable, beautiful content storefront on day 2. Guidance still matters, and believe me, we know it; this blog entry is just the beginning of the content we’re planning on delivering to help you on this front. I wouldn’t even call this blog entry guidance; it’s just a primer on the features and capabilities of SharePoint 2010 that you will grow to love if you’re passionate about scale at the library level – if you want to shove a whole bunch of documents in one place and have it be a great experience for both IT and your end users.

So what are these features and capabilities? Here are a few of the most important ones that I’m going to blog about now and in the near future:

  1. We protect your database backend from dangerous queries. If you run a query against any database that requires it to scan through millions of items to find the ones you’re asking for, you’re going to balk at how long it ties up the server’s CPU. Quite frankly, SharePoint is not an exception. Even in SharePoint 2010, there is a class of user operations in certain scenarios that make unreasonable demands on the backend. For these operations, our strategy is to nip them in the bud before they’re executed, which keeps your high-demand servers healthy and responsive. Knowing when this throttling will kick in and planning for it is an important part of large list planning.
  2. We give end-users tools to find content. When you have a sea of documents, the specific one you’re looking for can seem like a needle in a haystack. Structured metadata, easy tagging, metadata navigation, and built-in search refinement make this a less daunting task in SharePoint 2010 out of the box. This is an area we are particularly passionate about; after all, what good is a hugely scalable library if your end users hate it and can’t find what they’re looking for?
  3. We help developers write excellent code. In SharePoint 2007, we didn’t give developers the right tools to write code that scaled well as the amount of content in your site grew. Even worse, it was pretty hard to tell why and when code was bad – and, if your site was running slowly, which one of your ten custom web parts was bringing things down. You had to “build around” SharePoint and do things “just so” to keep this from happening. In SharePoint 2010, we give you a bunch of tools to make this story better.

Dangerous Queries

One challenge we’ve consistently seen customers run into when building large repositories on SharePoint 2007 is trouble with large containers. As the number of documents in any single container grows – either at the root of a library, or in a folder – bad things start to happen. For one, as your document-to-container ratio increases, it becomes harder and harder to find exactly what you’re looking for. More serious are the performance implications of large containers. Any of the out-of-the-box ways of retrieving content from containers in SharePoint 2007 – like the All Documents view, the Explorer view, or a Content Query web part – would work, but they didn’t scale very well. Loading All Documents in a library with a million items at the root would take a long time to finish. The big problem here is that you wouldn’t be the only one affected; all your friends running SharePoint sites on that same database server would experience things slowing to a crawl as well, as the database server dutifully iterated over those million documents to find the right ones.

Why does this happen? Any time you ask for content from SharePoint, you have to specify how it’s sorted – for example, the All Documents view in SharePoint 2007 asks for the top 100 results, sorted by filename. But items aren’t sorted by filename in the SharePoint content database – so, to bring you this view, SharePoint has to gather up all one million items, sort them, and finally display the 100 items at the top of the sorted list. Imagine this as being like flipping through the residential section of a phonebook to find the first 100 addresses, sorted in alphabetical order. This would be a miserable task, because the telephone book isn’t sorted in this way – so in order to ensure your sorted list was accurate in the end, you’d have to look through the entire residential section, from start to finish, because after all, the last person listed in the phone book might live at 1000 Aardvark Lane.
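
The phonebook analogy can be sketched in a few lines of Python (purely illustrative, not SharePoint code): without an index, returning the first 100 names in sorted order requires touching every row, because the correct answer could be anywhere in the unsorted data.

```python
import random
import string

random.seed(42)

# 100,000 filenames stored in insertion order, the way rows sit in a
# content database table (not sorted by name).
rows = ["".join(random.choices(string.ascii_lowercase, k=8))
        for _ in range(100_000)]

# Without an index: every row must be examined before the first 100
# names can be guaranteed correct (the "1000 Aardvark Lane" problem).
touched = 0
def scan():
    global touched
    for name in rows:
        touched += 1
        yield name

top_100_full_scan = sorted(scan())[:100]
assert touched == len(rows)  # cost grows with the size of the list

# With an index (a copy kept sorted by filename), the same answer
# costs only as many reads as there are results requested.
index = sorted(rows)
top_100_indexed = index[:100]
assert top_100_indexed == top_100_full_scan
```

The punchline is the first assertion: the full scan touches all 100,000 rows to return just 100 of them, and a real database server pays that cost on every such query.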

Large Lists and Fallback Queries

The laws of physics are the same in SharePoint 2010 as they were in SharePoint 2007; if you run a query that needs to touch a very large number of items, you’re going to have to wait a long time, and so will everybody else. One prominent thing we did in SharePoint 2010 is to nip these queries in the bud before they get executed. To make a long story short (you can read the long story here), a farm administrator can set a threshold which defines the maximum number of items a single SharePoint query can touch. By default this threshold is 5,000. Any library with more items than this threshold is a large list.
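
A minimal sketch of that throttling decision (the constant name and function names here are my own; the 5,000 default comes from the post, not from quoting the actual API):

```python
LIST_VIEW_THRESHOLD = 5_000  # farm-administrator-configurable, per the post

def is_large_list(item_count: int) -> bool:
    """A list is "large" once it holds more items than the threshold."""
    return item_count > LIST_VIEW_THRESHOLD

def query_allowed(items_query_would_touch: int) -> bool:
    # A query that would scan more rows than the threshold is stopped
    # before it ever reaches the database backend.
    return items_query_would_touch <= LIST_VIEW_THRESHOLD

assert is_large_list(1_000_000)
assert not query_allowed(1_000_000)  # throttled: full scan of a huge list
assert query_allowed(4_999)          # small scans run normally
```

Note that the check happens up front, on the estimated number of items the query would touch – not after the database has already done the expensive work.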

Let’s go back to our example of the library with one million items at the root. Say you had that library in SharePoint 2007, and you upgraded to SharePoint Server 2010. The first thing you’ll see upon navigating to this library will look something like this:


See the yellow bar above the list view? That’s a sign you have the Metadata Navigation and Filtering site feature turned on and it’s causing something magical to happen! When you load this view, SharePoint 2010 knows that you’re being greedy and asking it to scan through those million items. Since this query exceeds the maximum number of items a single query is allowed to scan (5,000), SharePoint doesn’t run it as written. But who wants to stare at an empty list view? Instead of running this query as-is, SharePoint finagles it a bit and transforms it into a query that’s almost as good as the one you were asking for, but won’t make the database buckle under the pressure. In this case, we assume that it’s fairly likely that the document you’re looking for is one of the most recently created items in the library – so instead of scanning all one million items, we only scan the 1,000 or so most recently created documents, sort those by filename, and show them to you in the list view. This is what we call a simple fallback query: a query that doesn’t specify an index and asks for too many items in return, so instead of considering the entire list as eligible for the query, SharePoint considers only the thousand or so most recently added items.
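
The simple fallback can be sketched like this (illustrative Python only; the 5,000 threshold and the window of roughly 1,000 recent items come from the post, everything else – names, data shapes – is an assumption):

```python
THRESHOLD = 5_000        # maximum items a single query may scan
FALLBACK_WINDOW = 1_000  # "thousand or so most recently added items"

def query_by_filename(items, limit=100):
    """items: list of (created_seq, filename), in creation order."""
    if len(items) > THRESHOLD:
        # Fallback: restrict the candidate set to the newest items
        # before sorting, so the backend never scans the full list.
        candidates = items[-FALLBACK_WINDOW:]
        partial = True
    else:
        candidates = items
        partial = False
    result = sorted(candidates, key=lambda it: it[1])[:limit]
    return result, partial

docs = [(i, f"doc{i:07}.docx") for i in range(10_000)]
rows, partial = query_by_filename(docs)
assert partial and len(rows) == 100
# Every returned row comes from the newest 1,000 documents.
assert all(seq >= 9_000 for seq, _ in rows)
```

The `partial` flag stands in for the yellow bar: the user still gets a sorted view, plus an honest signal that it covers only the most recent slice of the library.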

“Wait a second. You’re telling me that SharePoint throttles queries without asking me first? How on earth am I supposed to find anything in this crazy world of fallback queries and partial results?”

Let me assure you; this throttling business is a good thing. It’s a core ingredient in what makes SharePoint 2010 a resource for addressing your scale challenges. Gone are the sleepless nights where you toss and turn and worry about page faults on your database cluster resulting from Mack in Accounting stuffing 6,000 beer pong tournament photos in the root of a library in a forgotten team site in the dusty corners of your SharePoint deployment. The SharePoint 2010 feature set replaces this overarching concern with a set of well-scoped challenges; instead of worrying about every library that might get big, you get to plan for and craft experiences for the set of libraries that need to get big for business reasons.

I should mention really quickly that throttling is about more than just list views. There is a whole class of operations that involve iterating through all the documents in a list, or all the documents in a folder, that will get throttled (in other words, they will not execute) when the list or container is large. These operations include things like:

  • Adding a column to a library
  • Creating an index on a library
  • Deleting a large folder in a library
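
The same guard applies to those bulk operations. A hedged sketch of the idea – the function and exception names here are hypothetical, not the real object model:

```python
THRESHOLD = 5_000

class ListThrottledError(Exception):
    """Raised when an operation would touch too many items."""

def add_column(item_count: int, column_name: str) -> str:
    # Adding a column touches every existing item in the list, so on a
    # large list the whole operation is refused rather than run slowly.
    if item_count > THRESHOLD:
        raise ListThrottledError(
            f"cannot add {column_name!r}: {item_count} items exceeds "
            f"the {THRESHOLD}-item threshold")
    return f"column {column_name!r} added"

assert add_column(300, "Region") == "column 'Region' added"
try:
    add_column(1_000_000, "Region")
except ListThrottledError:
    pass  # expected: the operation is blocked up front
```

The practical consequence for planners: create your columns and indices *before* the library grows past the threshold, while these operations are still allowed to run.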

Metadata Navigation – Finding and Working on Content in Large Lists


Above is another screenshot from my million-item library. This time, we’ve put a couple of SharePoint 2010 features to work. Notice that I have “demonstration scripts” selected in the tree view on the left-hand side, and that my list view is rendering without the yellow bar telling me I’m only seeing the newest results. The hierarchy of tags you see there represents a taxonomy, Item Type. I am browsing the documents in this library according to their Item Type; in the screenshot, I am filtering to show all documents with the value “demonstration scripts”. Here are the steps I took to make this happen:

  1. I created a taxonomy that describes my content. You can look forward to some posts from our very own Dan Kogan on this very topic in this very blog in the near future. There’s a lot to learn here. Not just any taxonomy will do here; it needs to be one that broadly divides my content up into evenly-sized buckets. For example, if I had 990,000 demonstration scripts, the query you see above would not get me anywhere. In that case, it wouldn’t make much sense to use Item Type as a piece of metadata and a navigation hierarchy for this library; I would need to find another, more divisive way to pivot the data.
  2. I bound that taxonomy to a field in my library called Item Type. Think of a taxonomy field as a choice field on steroids. Instead of picking values out of a flat list, you pick them out of a tree.
  3. I configured that field as a navigation hierarchy. Every library now has a Metadata Navigation and Filtering settings page where you can configure navigation hierarchies (the filters you see arranged in the tree view) and key filters (the additional filters that show up beneath the tree view).

In these three easy steps, I made “Item Type” a first class navigational pivot over the data. Instead of just staring at a partial list of content at the root, I can now browse with impunity by this virtual folder structure.
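
Conceptually, a navigation hierarchy is just a tree of terms where selecting a node filters to that term plus all of its descendants. A toy sketch of that behavior (my own data structures, not the SharePoint taxonomy API):

```python
# A tiny taxonomy bound to an "Item Type" field. Selecting a node in
# the tree view matches that term and everything beneath it.
taxonomy = {
    "Documents": {
        "Demonstration Scripts": {},
        "Sales Material": {"Slide Decks": {}, "Case Studies": {}},
    }
}

def terms_at_or_below(tree, term, collecting=False):
    """All taxonomy terms at or underneath `term`."""
    out = set()
    for name, children in tree.items():
        hit = collecting or name == term
        if hit:
            out.add(name)
        out |= terms_at_or_below(children, term, hit)
    return out

docs = [
    {"name": "demo1.docx", "Item Type": "Demonstration Scripts"},
    {"name": "deck.pptx",  "Item Type": "Slide Decks"},
    {"name": "study.docx", "Item Type": "Case Studies"},
]

def browse(docs, term):
    # The virtual-folder filter: keep documents tagged with the
    # selected term or any descendant term.
    wanted = terms_at_or_below(taxonomy, term)
    return [d for d in docs if d["Item Type"] in wanted]

assert [d["name"] for d in browse(docs, "Demonstration Scripts")] == ["demo1.docx"]
assert len(browse(docs, "Sales Material")) == 2  # both sales sub-types
```

This is why the “evenly-sized buckets” advice in step 1 matters: the filter is only as selective as the subtree you click on.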

Here are a couple of cool aspects of this feature that aren’t apparent from a single nifty screenshot:

  1. Metadata navigation lets you slice and dice multiple ways. I might have a bunch of taxonomies on my library that classify content in different ways; for example, I might have a Products field, a Region field and a Competitors field, all bound to domain-specific taxonomies that classify the content along those dimensions. Depending on my current task, it might make more sense to filter by the Region field (for example, if I’m looking for the latest sales figures for the North America region). I get more filters than just my virtual folder; I can combine this filter with any number of key filters or list view column filters to drill down to just the content I want (for example, I want to see all demonstration scripts by the ECM team created after 2007).
  2. Metadata navigation thinks about indices and large lists so you don’t have to. Hey, remember just a few minutes ago when we were talking about large lists, indices, and being throttled? Well, metadata navigation thinks a lot about indices and how to run queries the “right way” to make them perform well and prevent throttling from happening. For starters, all the fields you configure as navigation hierarchies and key filters get indexed, and the resulting queries are written in a way that ensures the best index is used to make the query succeed.

You aren’t immune to the laws of physics; if you ask for documents tagged with demonstration scripts and there are 10,000 demonstration scripts, we’re not going to be able to show you all of them. In this case, though, you get something better than a simple fallback; you get an indexed fallback, which means that instead of considering the entire list, the query considers only the items that match the indexed portion of your query.
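
The difference between the two fallbacks can be sketched as follows (illustrative Python; the threshold and window sizes mirror the post, and the index structure is an assumption):

```python
from collections import defaultdict

THRESHOLD = 5_000
FALLBACK_WINDOW = 1_000

def indexed_query(items, index, tag, limit=100):
    """index maps tag -> item positions, kept in creation order."""
    matches = index.get(tag, [])
    partial = False
    if len(matches) > THRESHOLD:
        # Indexed fallback: shrink the candidate set to the newest
        # items *that match the indexed filter*, not the newest items
        # of the entire list.
        matches = matches[-FALLBACK_WINDOW:]
        partial = True
    return [items[i] for i in matches[:limit]], partial

# 20,000 documents, half tagged "demonstration scripts".
items = [{"id": i, "tag": "demonstration scripts" if i % 2 else "other"}
         for i in range(20_000)]
index = defaultdict(list)
for pos, doc in enumerate(items):
    index[doc["tag"]].append(pos)

rows, partial = indexed_query(items, index, "demonstration scripts")
assert partial                        # 10,000 matches exceeds the threshold
assert all(r["tag"] == "demonstration scripts" for r in rows)
assert all(r["id"] >= 18_000 for r in rows)  # only recent *matching* items
```

With a simple fallback, the newest 1,000 items of the whole list might contain almost no demonstration scripts at all; the indexed fallback guarantees every partial result still matches your filter.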


This article was just the first in my series of posts about architecting and building large lists filled with discoverable content. Here’s what you can expect over the next few weeks:

  • A deep dive on metadata navigation, how it works, and some tips to getting the most out of it
  • A discussion on how other features, like Search and the Content Query Web Part, fit into the equation, and how to configure their metadata filtering capabilities
  • Some geeky developer tips on writing code that plays nicely with large lists

After that, I’ll be widening my scope a bit to talk about the overall knowledge management story in SharePoint 2010 – which is about more than just browsing for content in a library!

Lincoln DeMaris, Program Manager, ECM


Export SAP data to SharePoint with Winshuttle Query

When business users are automating processes by essentially building applications in SharePoint, there are often use cases where a solution needs to include data from SAP. These could be mash-up scenarios between SharePoint content and SAP data, or simple cases where having SAP data in SharePoint for easy access just makes sense.

There are many ways of achieving this through custom programming. However, what I wanted to bring to your attention here is a new feature of Winshuttle Query which effectively empowers a business user to create an SAP data query and export the extract to a SharePoint list without writing any code.

I have recorded a five-minute video that demonstrates how this works. The video shows the desktop approach which is handy for quick prototypes and ad-hoc requirements. There is also an enterprise version of this functionality where the data queries can be scheduled to run regularly on a server, automatically keeping the SharePoint lists in sync with the SAP tables. Apologies if my voice sounds a bit muffled on the video. I blame Camtasia’s voice optimisation.


SharePoint Sideshow – Building a SharePoint Development Machine Using the Easy Setup Script

  Watch now: http://bit.ly/SPSS-EasySetupScript SharePoint Easy Setup Script is a new tool released by Chris Johnson and Paul Stubbs to help you set up a SharePoint 2010 developer machine without requiring Hyper-V. The Easy Setup script will set…(read more)

Happy About My Resume: 50 Tips for Building a Better Document to Secure a Brighter Future

Many great job candidates have poor resumes that are merely a laundry list of job tasks that do little to distinguish them from their competition. The average recruiter or hiring manager spends less than 15 seconds reviewing a resume. Most people’s resumes fail to “wow” the reader and quickly end up in the “no” pile.

Writing a resume can feel like an overwhelming task. It can seem like a Herculean effort to consolidate so much important information about a career into a one- or two-page document. But it doesn’t have to be that way! In ‘Happy About My Resume’, Barbara Safani offers 50 tips for creating compelling copy and presenting it in a powerful way to grab the hiring authority’s attention and get them to pick up the phone and call you in for an interview. Safani provides practical and easy-to-follow advice as well as numerous samples that show each of her tips in action.

The book will help readers learn how to quickly create a resume that is professional, gets them noticed, minimizes the amount of time they spend in a job search, and maximizes their earning power. The book is for anyone who wants to proactively manage their career and improve the quality of their current resume or create a resume from scratch.

Buy Now!

List Price: $ 19.95
Price: $ 16.62

Building Integrated Business Intelligence Solutions with SQL Server 2008 R2 & Office 2010

Master Microsoft’s Business Intelligence Tools

Building Integrated Business Intelligence Solutions with SQL Server 2008 R2 & Office 2010 explains how to take full advantage of Microsoft’s collaborative business intelligence (BI) tools. A variety of powerful, flexible technologies are covered, including SQL Server Analysis Services (SSAS), Excel, Excel Services, PowerPivot, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SharePoint Server 2010, PerformancePoint Services, and Master Data Services. This practical guide focuses on developing end-to-end BI solutions that foster informed decision making.

  • Create a multidimensional store for aggregating business data with SSAS
  • Maximize the analysis capabilities of Excel and Excel Services
  • Combine data from different sources and connect data for analysis with PowerPivot
  • Move data into the system using SSIS, InfoPath, StreamInsight, and SharePoint 2010 External Lists
  • Build and publish reports with SSRS
  • Integrate data from disparate applications, using SharePoint 2010 BI features
  • Create scorecards and dashboards with PerformancePoint Services
  • Summarize large volumes of data in charts and graphs
  • Use the SSRS map feature for complex visualizations of spatial data
  • Uncover patterns and relationships in data using the SSAS data mining engine
  • Handle master data management with Master Data Services
  • Publish the components of your BI solution and perform administrative tasks

Buy Now!

List Price: $ 50.00
Price: $ 29.61

Building Business Intelligence Applications with .NET (Networking Series)

Data is the lifeblood of modern business. As computerized systems have spread throughout all facets of business, the amount of data collected has exploded. Data is captured in almost every consumer activity, and most of this data ends up in some kind of transaction processing system, but it only has value if it can be used to make better decisions. Business intelligence allows you to pull all of this data together to help form a unified view of your enterprise that executives and analysts can use to generate insights and make better decisions. Making this data accessible, however, is a challenge facing all businesses. The commercial tools available do not always address all of the issues of particular data, so most companies create their own solutions.

Building Business Intelligence Applications with .NET provides a practical guide to creating these tools, by teaching corporate software developers how to use .NET technologies to build applications for data mining, statistical analysis, or OLAP, and explains how to integrate them with existing transactional applications. By using the compelling technologies offered by .NET (such as Web Services, Smart Clients, and XML for Analysis), developers can construct BI applications that address their critical data, analytical, and flexibility issues.

Key Features
* Discusses how business intelligence plays a crucial role in managing risk by discovering trends in data before situations become critical
* Details the steps to take to create business intelligence in any organization
* Covers key topics such as data warehousing, OLAP Cubes, ADOMD.NET, Windows SharePoint Services, and Crystal Reports
* Helps developers make the transition from creating transactional applications to BI applications
* Includes a CD-ROM with all of the source code from the book, as well as URLs for relevant standards documents (XML, Web services, etc.)

On the CD!
* Code
Contains all of the ready-to-use source code from the book
* Projects
Contains all of the sample projects from the various examples in the book

PC: 450 MHz Pentium II-class processor; 600 MHz Pentium III-class or better processor recommended. Memory requirements vary by operating system with 160MB of RAM as the minimum requirement for Windows Server 2003 or XP Professional, 192MB of RAM for Windows 2000 Server and 96MB of RAM for Windows 2000 Professional. More RAM will yield better performance. In addition to the space required by the prerequisite applications, the sample applications require 25MB of free disk space. To use the sample applications, you will need to have access to the following software: Windows 2000 or better, Microsoft SQL Server 2000, Microsoft Analysis Services, Microsoft Visual Studio .NET 2002 or better

Buy Now!

List Price: $ 49.95
Price: $ 0.74