SQL Server 2005 Partitioning

One way to speed access to a large table is to partition the data. Oracle has partitioning and now so does SQL Server. I've worked with Oracle's version for a while, but this is my first foray into the SQL Server variation.

I'm going to try to keep this as brief as possible, so I'm going to leave out much theory and background and leave that as an exercise for the reader and his friend Google.

In a nutshell partitioning means subdividing your table so that a query only has to spin through the subset of records, not the entire table.

In practice both Oracle and SQL Server allow you to divide your table up into more than one physical file, with certain ranges of data assigned to each file. This has speed advantages when querying a subset, but it also has management advantages since you can perform tasks like backups and archives on individual files instead of having to test every record in the table to see if it meets your criteria. Being included in the file it is in means that it meets the criteria.

I'd say most people who look to partitioning are doing so because they want to speed up their queries. Fair enough. Will it work? If so, how much speed will be gained? There's really only one way to find out, right? Marketing materials! No, wait, I mean, Experimentation!

For my experiment I start with a real table to which I need to speed access. The table has about 32 million records and 33 fields.

The table has historical data that is broken into six month ranges. In other words, you will typically only need to access six months worth of the data at a time, but it could be any six month period in the last two years. Without partitioning, the dbms has to spin through all two years for every query. With partitioning it only has to spin through the relevant six month period.

The create script (with names changed to protect the innocent) is as follows:

CREATE TABLE [testTable](
    [a] [bigint] NOT NULL,
    [b] [int] NULL,    
    [c] [int] NULL,    
    [d] [varchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [e] [int] NULL,    
    [f] [int] NULL,    
    [g] [datetime] NULL,    
    [h] [int] NULL,    
    [i] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [j] [varchar](22) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [k] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [l] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [m] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [n] [varchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [o] [int] NULL,    
    [p] [int] NULL,    
    [q] [varchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [r] [varchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [s] [varchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [t] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [u] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [v] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [w] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [x] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [y] [varchar](5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [z] [datetime] NULL,    
    [aa] [int] NULL,    
    [bb] [int] NULL,    
    [cc] [int] NULL,    
    [dd] [int] NULL,    
    [ee] [varchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [ff] [varchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
    [gg] [varchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,    
CONSTRAINT [PK_testTable] PRIMARY KEY CLUSTERED     
(    
    [a] ASC    
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]    
) ON [PRIMARY]

The partitioned table is identical except for the last few lines. From the CONSTRAINT keyword on down it is as follows:

 CONSTRAINT [PK_testTable] PRIMARY KEY NONCLUSTERED    
(    
    [a] ASC    
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]    
) ON [testTablePS]([g])

The key is the very last line. testTablePS is the Partition Scheme to use. It takes the column name [g] as a parameter. It feeds this into the Partition Function to determine which partition a given record is in based on its value in the g column (in this case a datetime).

Each partition is in its own file in its own filegroup.

Now for the speed tests.

To begin with we'll try WITHOUT indexing. Indexing will most likely improve the results, but I'd like a baseline first. This pretty much narrows things down to just the difference between partitioning and not partitioning.

The query I'll use is this:

select *   
from testTable    
where g > '5/15/08'

This means we are looking at all the fields each time.

Here are the results (~2.4 million rows returned):

  Not Partitioned Partitioned
Query Run Time 1st run 3:45 1:22
Query Run Time 2nd run 2:42 1:16
Query Run Time 3rd run 2:40 1:02

Predictably the first run is the slowest. Subsequent runs are a bit more consistent.

As you can see the partitioned table returns about twice as fast as the non-partitioned table.

Now lets try returning a specific date.

The query will be:

select *   
from testTable    
where g = '5/22/08'

Those results are here (~470,000 rows returned):

  Not Partitioned Partitioned
Query Run Time 1st run 1:54 0:14
Query Run Time 2nd run 1:52 0:12
Query Run Time 3rd run 1:54 0:12

Much faster.

Now let's look at what happens if we take the return and display of the data out of the equation.

select count(1)   
from testTable    
where g = '5/22/08'

The results are here:

  Not Partitioned Partitioned
Query Run Time 1st run 1:41 0:06
Query Run Time 2nd run 1:43 0:00
Query Run Time 3rd run 1:43 0:00

It's hard to find fault with less than a second...

Now a HUGE caveat... You knew there had to be one, right?

What if you write a query that doesn't limit by the column on which the partition is based? Poor you. You will have the same OR WORSE results from the partitioned table that you have from the non-partitioned table. Why? Well now the DBMS has to look through several files to gather up the data instead of just one. This is especially noticable on big joins between tables.

Partitioning can be a big help, but it is definitely not a silver bullet. 

That wraps it up for this post.

Next post I'll add some indexing and rerun the queries.

Posted on 6/25/2008 9:21:00 AM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories:

Tags:

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Sidetracked

No news to report on the Brains-On-A-Stick front because I've been seriously sidetracked with a number of projects and events. Hopefully I'll get back to it before long, but I don't expect much chance in the next few weeks. Stay tuned. Or not.

I do have some unrelated posts coming, however. The first will be about SQL Server 2005 partitioning. You can stay tuned on that one, because I'm about to write it now...

Posted on 6/25/2008 8:10:09 AM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories: Pure Rambling

Tags:

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Windows Portable Devices

The Windows Portable Devices API is to talking to media devices what DirectX is to gaming. It is an attempt to make a secure, standard method of communicating with devices ranging from Media Players to cameras to temperature sensors...

There are things I like about it and things I really, really don't like. For an MP3 player manufacturer one big reason to use this API instead of making the device work like a USB drive is DRM.

But, it looks at this point like DRM (Digital Rights Management) for music files is a dying beast. The recording industry seems to be accepting the fact that their customers HATE DRM. There are now several ways to LEGALLY purchase songs in unencumbered MP3 format. Arguably the only people inconvenienced by DRM were the legal users. The pirates never had a problem circumventing the so called protection.

What does this mean to you (and me)? Well, when I work with my son's Transcend player (which could never play DRM music), it looks just like a USB drive. Easy, peasy. When I work with my Creative Zen or my daughter's Creative Zen V, I have to use the Windows Portable Devices API.

The lack of a .Net version of the API is a bummer, but not insurmountable. You get to take a fun and exciting trip through the world of ILDASM and ILASM, then you get to read the C++ examples and do a bit of mental gymnastics, but you can get there.

So far I have gotten as far as enumerating the connected devices, getting their friendly names, descriptions and manufacturer info. Next I'll be diving into how to get the file info and move files around. Not as straightforward as working with the file system, but always remember, "No matter where you go, there you are."

Here are some links that are helpful with working with the Windows Portable Devices API.

API documentation in MSDN

http://msdn.microsoft.com/en-us/library/ms740786(VS.85).aspx

Windows Portable Device Team Blog

http://blogs.msdn.com/wpdblog/

The team put up an example of using the API to talk to a temperature sensor using c#:

http://blogs.msdn.com/wpdblog/archive/2007/11/26/creating-a-temperature-sensor-gadget-for-windows-sidebar.aspx

Darene Lewis' Blog

http://blogs.msdn.com/dimeby8/default.aspx

I'm not really sure exactly who Drene is, but I'm guessing Microsoft Employee based on the blog address...

In any event there is much good info here regarding the Windows Portable Device API and how to use it with .Net.

Posted on 5/24/2008 2:25:21 PM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories: .Net

Tags:

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Brains-On-A-Stick Progress Report

Just wanted to post a quick update on my progress.

At this point I have the following working, or mostly working:

  1. Subscribe to a podcast
  2. Download the episodes to the hard drive
  3. Add devices, specifying their root directory for podcasts
  4. Walk the directories under the root and show a list of all files and their paths
  5. Persist library and device data to the hard drive using JSON serialization
  6. Query objects using Linq to Objects

Here's what's NOT finished:

  1. Synchronize podcasts to the device
  2. Connect to a device that doesn't appear as a drive

That last one is a big sticking point right now. I have two devices from Creative and one from Transcend. The Transcend shows up as a drive. That's nice. The Creative devices show up as Windows Media Devices. That's not nice.

To communicate with such devices (mp3 players, cameras, etc.) you have to use the Windows Media Device Manager APIs. Unfortunately they are NOT implemented in .Net. That means you got some pInvoking to do.

This is the bit I'm working on now. I've never worked with this particular API before (I didn't know there was such a thing before yesterday...), so it may take a bit of fiddling to get moving forward again.

It is a pain in the posterior, but once I get a handle on the API, I'll be back in business.

In the meantime here's a screenshot of the UI. Not pretty, but not entirely WinForms like. Not all super-de-duper WPFy, but functional.

image

Eventually I'll try to make it look a bit more like that famous fruit inspired music site...

Posted on 5/17/2008 11:32:00 PM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories: .Net

Tags: , , ,

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

WPF - Should I?

Windows Presentation Foundation (WPF) offers Windows developers a new style of development.

Instead of firing up Visual Studio and putting together a standard Windows Forms UI, you can split the development between a designer (or team of designers) using Expression Blend and a programmer (or team of programmers) using Visual Studio.

The design team can use a tool that is familiar to them to design the UI and the programming team can use a tool that is familiar to them. Their common currency is XAML. XAML (Extensible Application Markup Language) is an XML file that describes how the screen should look.

Here's a screenshot from Expression Blend:

image

Wow. That sure isn't Visual Studio.

Timelines? Storyboards? Animations? Oh, my!

So should I use WPF or not?

Here's my decision process:

1) Is this a consumer app or an internal business app?

In other words, will people pick my app because it looks good or will they use it no matter how it looks because it can help them make money (or they are mandated to use it by the people who hope to make money)?

If I begin a consumer oriented app that will live on the desktop, I will pick WPF and Expression Blend without a doubt. You simply cannot make a Windows Forms app looks as good as you can a WPF app.

If I am making an app that ISN'T a consumer app, then I have to ask another question.

2) Do I want to use tried and true Windows Forms or prepare for the future with WPF?

So I'm working on an internal app that will be used by people within my company whether they like it or not. Should it be WPF? That is a tough question. It is possible to make an app with WPF that looks remarkably like a Windows Forms app. Problem is that many of the standard controls available to a Windows Forms app are not available to WPF.

The not so simple answer is that if I use WPF it will be possible to redo the UI at a later date with lots of spiffy new shiny pieces, BUT it will take me longer to produce than a Windows Forms app. If I use Windows Forms, I will finish sooner, but when I'm done, I'm done. To move to WPF at that point will require re-writing the app.

For Brains-On-A-Stick (my podcast agregating app), I'm using WPF and initially creating a very pedestrian UI. Once all the plumbing works, I'll re-do the UI in a variety of ways using WPF and XAML.

If you want to see a WPF podcast player check out Podder. I haven't looked at it much since my app is fairly similar and I don't want to be influenced by their app. The screenshots look great. They took an approach similar to mine by creating a very basic looking app then handing it over to a designer. They are even having a skinning contest.

The big difference in their app and mine is that they are a player whereas I'm more concerned with maintaining libraries and synching them with multiple devices.

Once I get the first whack at mine complete, I plan to look closely at theirs and see how they implemented skinning, etc.

Posted on 5/12/2008 10:37:00 PM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories: .Net

Tags: , ,

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Persisting Data and Settings To Mesh Without The Developer API

This sounds like a big topic, but it really isn't.

I'll stall and BS a bit to pad the post a little, but really you can sum the topic up in one sentence...

Save your settings and data into a text file or files that live inside one of the folders you have set up to sync with Mesh.

Now wasn't that better than the local news who have a tendency to say things like "Which local neighborhood should be evacuating right now? Tune in at 11 to find out!"

Of course you have to be careful when you do this. You don't want to use large files or files that are updated at a high frequency. I haven't tried it, but common sense tells me that it would be a certifiably bad idea...

Using the example of Brains-On-A-Stick (my podcast organizer) I save my settings file and data files (with meta-data about the podcasts) into a local directory that is set to sync with Mesh. Automagically any changes are synced with Mesh and then my other computers.

This is NOT the smart way to do this, but it is the way that works now withOUT the developers APIs.

One of the things I like about Mesh is the architecture. Behind the scenes all the syncing is carried out using syndication ala RSS or Atom. More or less all these devices are subscribing to each others' feeds and the Mesh server is their conduit and referee. NO WAY! Microsoft building on open standards? Cool.

Quite a bit of info exists on the web already from MS about Mesh. Here are some links:

Mesh itself

Windows Live Dev Blog

Channel9 video: Ori Amiga: Programming The Mesh

Channel9 Video: Abolade Gbadegesin: Live Mesh Architecture

Generic Press Releases etc.

Scobleizer Take On Mesh

Good stuff out there. Lots of us waiting for a chance to use the actual dev APIs.

del.icio.us Tags: ,,

Posted on 5/1/2008 2:25:00 PM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories: .Net

Tags: , , ,

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Windows Live Mesh

OK, let me start by saying this isn't really an objective review of Mesh. That would be like asking your buddy about his new girlfriend. As far as he's concerned she's perfect and going to get better than perfect over time.

Caveat number two is that you won't find any amazing new revelations here about Mesh. Everything I'm going to say is shown in the demos on the Mesh.com page. You just have me saying it instead of Microsoft, and my only affiliation with MS is that I use their products. I run a user group dedicated to .Net development, but I am not beholden to them in any way.

Having given you those caveats, let me say that I'm very excited by the potential of Mesh.

I've been using it for a couple of days now and it has proven its worth to me. I have a project that I want to have available at home and at work. With Mesh I can simply go into the file explorer, right click on the folder and add it to Mesh.

When I get to the other computer, I go to File Explorer, My Computer and explore Live Mesh Folders. Once I see my folder, I can right click and tell it to download.

That's it. Now any changes I make on either computer are synced via the Mesh server and Live Desktop.

Kind of like a poor-man's source control but without versioning.

The coolest part is that it is transparent to me when I'm using it. I just fire up Visual Studio or Expression Blend and go nuts. When I go to the other computer the newest changes are there. Sweet.

You can also interact with the other computer via something akin to Remote Desktop. It is naturally slow via the net, but it works. You wouldn't want to rely heavily on this, but when you need something in a pinch, you can reach out and touch it. And copy it from remote to local.

What I'm waiting for is developer access. The ability to easily persist settings and data across devices is compelling.

To use my current project, Brains-On-A-Stick (podcast organizer) as an example; you could save your library data and preferences to Mesh and then have that info available on any machine you are using. Clearly you wouldn't want to put the podcasts themselves up there, but the information about the podcasts.

Coming in the next post: how to persist data and settings to Mesh without the developer APIs...

Posted on 5/1/2008 1:14:00 PM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories: .Net

Tags: ,

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

WPF Links

Just a quick blog entry to list some WPF related links. Why? So I can find them later...

Rob Relyea:

WPF Panels

Rob's main WPF page

Windows Client.net:

WindowsClient.net

Kirk Evans:

WPF Databinding

Outlook Clone using WPF

Tim Sneath:

Getting Started with WPF

I'll add more to this blog later.

Posted on 5/1/2008 9:16:00 AM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories:

Tags: ,

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

New Project, New BlogProcessor?

This weekend I've begun a new project to both scratch an itch I have and to learn some of the new bits of the 3.5 .Net Framework. More on that in a bit.

As a side note, this is my first post from Microsoft's Windows Live Writer. W00t! First post!

I think Live Writer is a step in the write direction (ba-dump-bump). It is a very small program (less than 3 MB) and it doesn't do very much. But what it does do, it does very well. It is a word processor for blogs. It is also free. That is free-as-in-beer for the FOSS crowd, and free-as-in-you-don't-pay-money-for-it for everyone else.

I could use Word 2007 to write my blog entries. I could also buy an RV to cruise around town and pick up my dry cleaning. The point is that 99.93% of the features of Word 2007 are unnecessary for writing this blog post. I don't plan to mail-merge and print envelopes nor do I need fancy macros. Nope, I just want to write, insert links and the occasional picture and have spell check watch my back for crap like spelling occasional as "occassional " or "ocassional " (not that that would ever happen to me).

The web is all around us and that is a good thing, but for some things HTML makes for a poor interface even with AJAX, Achilles, Odysseus or any of the other warriors of the Trojan war. For some things you just need a good old fashioned application running on your computer. The trick is for that application to interact with the mighty Internet and its hordes of servers.

Live Writer is a very good example of doing this and doing it well (he said confidently having never actually tried that big, beautiful "Publish" button).

Enough of that for now, on to the Project. Hmm. I guess I need to give it a secret Code Name to use to speak about it until I decide to release it to the world with a boring name that doesn't mean much (Revolution becomes Wii?). How about this: Brains-On-A-Stick. Since it's a temporary working name, who cares, right? No chance that name could stick. With brains. Man, that was bad.

Here's the annoying-problem-I-want-to-solve: my son, daughter and I all have MP3 players (neither iPods nor Zunes) and we all like podcasts. Keeping up with syncing all those devices with podcasts is a pain in the 'cast. Oh, and I want to try out all the neat new stuff in the 3.5 Framework...

My first thought was to use one of the various podcast feed-readers to pull down the shows and write an app to sync them up. After reading about Microsoft's new MESH and its RSS/Atom use, I decided that it was time to bone up on using .Net to work with syndication feeds. I found several good blogs that talked about syndication and started hacking around with it.

I now have  a rudimentary feed reader that uses the following new bits:

1) WPF (Windows Presentation Foundation). Uses XAML (Extensible Application Markup Language) to describe the presentation. This XML file is parsed and the UI is built on the fly both at design time and runtime. We've all talked about separating presentation from behavior for a long time, and this is one swell way to do it. Why do I find it exciting? You can FINALLY rely on DirectX and your 3D processing video card to handle the graphics (good-bye, GDI, well, mostly). This means your windows (and elements inside the windows) can be 3D and not 2D, even though they look 2D most of the time. This means you can spin a window around and stack them up. More about WPF and XAML in future posts.

2) Expression Blend. If you have the ability to separate presentation from code, then why not make a tool for designers so they can make really cool looking graphics? I'm not a designer, so I'm not going for cool looks yet, but as I get things working, I'll spend more time tinkering with the looks.

3) JSON. JavaScript Object Notation. I'm using this to serialize objects to disk. XML would work just as well, but JSON is all hip and trendy. All the cool kids are using it.

4) LINQ. Language Integrated Query. I'm using this to query the objects in memory. Much more on this later.

5) Syndication. RSS and Atom feeds are how you get your podcasts. The System.ServiceModel.Syndication namespace objects help you work with them.

I'll write about all the little bits as I work through the project.

Posted on 4/27/2008 9:50:00 PM by jeffa

Permalink | Comments (0) | Post RSSRSS comment feed |

Categories: .Net

Tags: , , , ,

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

ASP.NET and Godaddy, part 3

Sweet! Got everything working.

As I mentioned in the last post, I dropped back to square one and rebuilt the app one piece at a time to see where the problem was.

I got the login and register pages working first, then went to the data entry page.

Once I built my Linq objects, but before I referenced them, I uploaded everything one more time. I did a diff on the old and new web.config files to see what had changed. There were only two changes: 1) added new connection string and 2) added assembly info for Linq.

A quick check that everything worked well, then I added the code-behind to use the Linq objects to write the data to the database. 

Once everything was deployed, I hit the page and... 500 Internal Server Error.

This time I thought about the new App_Code directory where the Linq objects live. Since I did a website instead of a webapp, everything has to compile on the fly when it runs the first time... hmm. Think maybe that directory needs write permissions?

Added write permissions and, hey presto! Now the Linq data entry page works!

It was literally five more minutes to get the data view page working using a Grid View that uses a Linq object as its source.

One thing I did differently this time was change the name of the Linq datasource to something other than the default.

Now I move on to making things pretty. I'll also write up a more detailed set of instructions and post them. 

 

Posted on 4/22/2008 10:48:00 AM by jeffa

Permalink | Comments (1) | Post RSSRSS comment feed |

Categories: .Net

Tags: , , ,

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5