Distributed Application Case Study: Part I - The Problem

For the past year-and-a-half, I have been involved in the design and implementation of a system that pulls data out of corporate application databases, integrates it, and presents it as a series of charts. This has been implemented with SQL Server 2005 for the database & integration side of things, and a C#-based Windows Forms application with calls to Dundas Chart and Map for the charts. The cliché-de-jour for this type of application is a dashboard, and if you want to call it that, so be it.

To date, this application has been built using a monolithic approach – the database and application all reside on a single development machine. I’ve always assumed it would  wind up on a production server in a very similar configuration.  As the day approaches when the application goes into production, however, it has become increasingly evident that decoupling the user interface from the query engine is a mandatory requirement. The user interface is blocked for up to 30 seconds — or more, if the database call didn’t time out — during which the user simply has to sit and stare at a locked up application.

Externally, I visualize the user being able to set the parameters of their query, click the Search button, and have control of the application return to them more-or-less instantly. A progress bar, updates to a status line, or other amusement will keep the users informed of running query’s progress.

While the query is running, it seems obvious you should still be able to perform other, unrelated activities. Setting up a chart for printing would be a good example. Resizing the window or panels within the window would be another. Heck, even minimizing the application while waiting for the query to complete would be desirable option for many users. But alas, none of that is possible given the current state of affairs.

Internally, all database access is implemented through a series of stored procedures. The client application provides a series of parameters, passes them to the stored procedure, and then waits for the resulting dataset to be filled. The application is blocked at this point — the user cannot continue with any other use of the application until such time that particular line of code is finished executing. This is the beast that sometimes runs to 30 seconds, or worse yet, times out, in which case, the best you can hope for is a well-handled exception which is then handed back to the user.

There are basically three, distributed computing mechanisms that can be used to achieve the desired objective of decoupling the user and their interface from the slow-running query; web services, .NET Remoting, and Microsoft Message Queuing. In addition, multithreading within a single application domain could be used to achieve a similar result. But most important, regardless of the which approach is chosen, it will be necessary to implement it asynchronously – which is the ability for the client application to carry on doing it thing while the query is running on another machine.

Of the four choices, my initial inclination is to go with web services. The application is intended for users that reside inside the corporate firewall, but there is a pretty good chance a subsequent version will be made available to users on the outside. It’s my assumption that using web services is the better choice, given the relative ease with which XML-based web service traffic can be made to traverse the firewall, and other potential obstacles. On the down side, web services will not likely deliver the performance of the more closely coupled .NET Remoting and Message Queuing. But given the volume of data that is being manipulated by this application, this lower performance is not considered a factor.

Multithreading is not really being seriously considered given the relative complexity of the implementation, coupled with the fact that it does nothing to break the application into small, more manageable chunks — another important objective of this exercise.

This series of posts is intended to document my approach to solving this problem. This is not a theoretical exercise; the problem I’m trying to solve is real, and the solution, assuming the outcome of this exercise is successful, will go into a production application. I can’t make a specific commitment as to how often I will post, or when I’ll be finished, but I’ll do my best to keep this blog current with progress.

Thank you very much for reading, and please let me know what you think.

Using Windows Live Writer 2008 with WordPress

I admit to be fairly ignorant of the Microsoft’s Live strategy, but being preoccupied with other things at the moment, I haven’t spent much time trying to get educated about it.  But in the weekly email regarding new Microsoft downloads, there was specific mention of Windows Live Writer 2008, coupled with a  promise that I could edit WordPress posts with this client-based tool.   Well, so far, so good — this post was composed, edited and published using WLW

Subsequent to downloading, a wizard guided me through the installation and configuration for use with WordPress, and within about five to ten minutes, I was oriented in the new environment.  The WLW user interface is similar enough to the WordPress interface to make it easy to find the functionality found in the native web-based user interface.  So first impressions are pretty positive.  A few other random thoughts and observations, so far;

  1. When I attempted to publish my first post with WLW, I couldn’t get the new post to appear.  What I subsequently discovered, however, is that the default publication date for posts was way back in 1999.  The post was actually working, except the new post was being inserted before all the other posts, as opposed to after (at the top) as expected.  Once I had set the publication date manually at the bottom of the user interface, everything worked as expect.
  2. The main menu, just below the title bar, keeps getting corrupted, with portions of the menu appearing in duplicate.  Minimizing and maximizing the application makes the problem go away, but it re-appears periodically.  I have a sneaking suspicion this is related to the video driver for the laptop that I use — it has exhibited other, similar problems with other applications.
  3. It seems as though sometimes you have to click the Publish button a couple of times to actually get the post to appear.  This may be a question of not having saved the draft, or some such thing, but I think it’s also possible this is a bug.  Not a big deal — just keep clicking on Publish until the post appears.
  4. I attempted to install WLW on my work PC, and ran into problems that I think are sourced at the corporate firewall.  The installation hangs up about half-way through.  Attempting to install on a second PC — behind the same firewall — provided the same result.  The firewall is about the only thing the two machines have in common.   I have no idea why this would be a problem, but I’m assuming it’s a problem with the installed WLW software attempting to make contact with the Windows Live website, but that’s just a hunch on my part.
  5. As noted above, this blog is implemented with WordPress, and while I’m thoroughly impressed with the quality of the WordPress software (not to mention the price), I did find the relatively small scrolling region used to edit posts a little tedious.  One of the prime benefits of WLW is that get a lot more real estate for editing.  Common tasks such as inserting tables, pictures and other items is considerably streamlined.
  6. There is a Web Preview mode that integrates the post that is being edited — but isn’t published yet — with the content that is already published.  This helps visualize the new material in the context of the existing material.  Not essential, by any means, but useful in getting things just right before taking the new material public.
  7. I started this post with a comment about my admittedly limited knowledge of the Windows Live strategy, but using WLW is providing me with a little insight, I think.  For example, selecting Help from the main menu launches you directly into the Windows Live Help site.  So, I think it’s reasonable to guess that the Live-branded applications — even the ones downloaded and installed on your PC — are going be tightly integrated with the Live web site.  I may well be catching up to the rest of the world’s understanding of this subject, but if you didn’t know either, well, now we both do.

Based on a fairly positive experience so far, I will continue to use WLW, and comment on it in future posts.  Until then, thank you very much for reading, and please let me know what you think. 

‘Contact Us’ Hall of Shame

How many times have you clicked the Contact Us link with dewy-eyed innocence only to have your enquiry drop into a black hole, never to be heard from again.  For the life of me, I just don’t understand why any modern organization wouldn’t set themselves up to handle incoming email enquiries with the same rigour with which they set up their response to phone enquiries.  When was the last time you called any organization about anything, only to have the phone ring into infinity, and not have anyone pick up — not even voice mail?  Or how about a busy signal — heard one of those lately?  It never happens.  So why is it acceptable to maintain the email equivalent of same?

Partly to deal with my own frustration at this situation, and partly as a ‘public service’, I’m proud to launch the ‘Contact Us’ Hall of Shame.  These will be real emails, enquiring about real things for which I would like an answer, related to the my normal day-to-day routine.  This is not a scientific, or even a statistically significant effort, but rather anecdotes relating to my real experiences dealing with this frustrating problem.  I don’t intend to make up things just so I can test a given organization’s response — that’s the equivalent of spam, and there’s enough of that already.  The tricky part will be the complete non-response.  In these cases, it just may be possible that the original message, or the response, got caught in a spam filter — but I’m assuming, for now, this won’t be a major factor.

If this gets some traction, it might be interesting to let others contribute their experiences as well.  But at the moment, I’m don’t know how I can be sure about the facts of each individual case.  If I send the email, and I’m the one expecting the response, then I have control of the entire loop.  If there is interest, and if I figure out a sure-fire way of guaranteeing the facts submitted are accurate, I’ll post cases other than ones that I have experienced first hand, or cases of which I’m 100% sure of the specific facts.

On a positive note, I have had some spectacular responses in the past; the best one, by far, from Hummer.  Yes, the giant car people.  I emailed them to ask the name of the music used in one of their television ads,  In about 15 or 20 minutes — seriously — I got a response back from what I assume was a real human being that identified song title, artist and album, and even included a cheerful "please let us know if there is anything else we can do"  It was such a good response, I’m very thankful that they didn’t ask "So now, what about that new Hummer?"  I would have had to respond "Uh, sure…now which one is the plug-in hybrid model again?" which might have made them regret being so nice in the first place.  All joking aside, the Hummer web site set an all-time high for a great response to clicking Contact Us.  It’s my hope, over time, I’ll be able to launch a complementary Contact Us Hall of Fame — where I can put success stories such as these.

As always, thank you very much for reading, and let me know what you think.

SQL Server 2005 Object Naming Standards - IV: createUdt and modifyUdt

When attempting to track down problems in the database, a very useful thing to know is the date and time the data was first created, and when it was most recently modified.  As part of a standardized approach to database design, I have gotten into the habit of adding two columns to every table definition; createUdt and modifyUdt that provide this capability.

What’s In a Name?  If you’ve read the post on column naming standards, you will recognize the Base Name udt.  This means the column reflects UTC time, not local time.  This is a nod toward internationalization, which I think every application should take into consideration, particularly in the early stages of design and implementation.  The prefixes create and modify are self-explanatory, and are required, of course, to differentiate between the two types of dates in a given table definition.  Finally, the two dates have equal status and value, and therefore they both have prefixes.

NULL and DEFAULT  createUdt and modifyUdt should be specified as NOT NULL, because they always need to be populated.  Also, the DEFAULT for both should be specifed as GETUTCDATE, so that they get the right date and time when the record is created. 

Keeping Them Right  If these columns are going to be useful, they should never be directly manipulated by your application, or a tool like SQL Server Management Studio.  You have to have absolute faith that they really do reflect the date and time of creation or modification, and not somebody’s interpretation of these facts.  To ensure this, I use a trigger (shown below) that automatically corrects these values in the event that they are directly changed in some way.  This is slightly different in either case; createUdt gets set back to its original value, whereas modifyUdt, is always set to the current system date and time. 

CREATE TRIGGER DogTrg
ON Dog
FOR UPDATE
AS
BEGIN

	UPDATE
		Dog
	SET
		Dog.modifyUdt = GETUTCDATE()
	FROM
		Dog AS Dog INNER JOIN
		INSERTED AS INSERTED ON INSERTED.uid = Dog.uid

	UPDATE
		Dog
	SET
		Dog.createUdt = DELETED.createUdt
	FROM
		Dog AS Dog INNER JOIN
		DELETED AS DELETED ON DELETED.uid = Dog.uid
	WHERE
		Dog.createUdt != DELETED.createUdt

END

Of course, if you intend to use this trigger in your own application, you will have to change the table name.  Also, a really determined user  can always delete or disable the trigger, but let’s assume that it keeps the honest people honest, 99% of the time, and that you have set up rights in such a way that only person with a need to see the trigger code, can.   With a trigger such as this set up for each table, you can then entirely forget about keeping these columns updated — it’s entirely automatic.

It always surprised me that Microsoft did not have these capability bundled right out of the box — I can remember using databases in the dim and distant that had this data available as a standard feature of every table definition.  Heck, maybe they do, and I just haven’t found it yet. 

Also, In addition to date and time, it is also useful to keep track of who modified the data.  That will be covered off in a future post, where I’ll talk about mapping users to sessions, and then mapping the session to collections of updates.  In the interim, thank you very much for reading, and let me know what you think.

Enabling the RSS Feed Button in Internet Explorer

In a previous post, I provided a very basic ‘how to’ on setting up Internet Explorer as a simple RSS feed reader.  In that post, I referred to the fact that there is some code embedded into the web page that activates the RSS Feed button (the ‘chiclet’) in the IE Command Bar.  Here’s the  ultra simplified version that explains how this works.

All web pages consist of HTML of one form or another.  If the web page at which you’re looking has an RSS feed associated with it, the HTML header will have link elements embedded the header pointing to the RSS or Atom feeds.  I have illustrated this in  HelloRSSandAtomWorld.htm, shown below;

<html>
     <head> 
          <link
               rel="alternate"
               type="application/rss+xml"
               title="My RSS Feed"
               href="http://www.xyz.com/rssfeed.rss">
          <link 
               rel="alternate"
               type="application/atom+xml"
               title="My Atom Feed"
               href="http://www.xyz.com/atomfeed.xml">
     </head> 
     <body> 
     <p>Hello (RSS and Atom) World!</p>
     </body>
</html>

The first link element points to an RSS feed, and the second link element points to an Atom feed.  You can specify as many feeds as you like, but I’ve chosen to show just one of each.  The fancy indented formatting is not significant and certainly not mandatory — it’s just to make the code a little easier to read.  If you view it using IE, though, you’ll see that, yes, the RSS Feed button is enabled, and if you click the drop down to the immediate right of the enabled, orange button, you’ll see the two feeds specified by the link elements shown above.

For those wanting a lot more information, check out Windows RSS Publisher’s Guide which in turn, has additional resources specified.  That’s it — there’s really nothing to it!  Thank you very much for reading, and please let me know what you think.

Using Internet Explorer as an RSS Feed Reader

I’ve found Internet Explorer can be used a fairly serviceable RSS feed reader.  It has some limitations but it’s a convenient choice that will get your feet wet with RSS feeds without having to hunt down and install new software.  The following is a simple example of how to set IE up to read an RSS feed.

This example is not intended to be a comprehensive review of all of the RSS-related features of IE, and assumes that you already have some idea of the general nature of RSS.  

With that in mind, the first thing you need is a web page that has some sort of embedded RSS feed.  The RSS Feed button (the ‘chiclet’) indicates the presence (or absence) of this.  If it’s greyed out, at least superficially it means there is no RSS feed associated with the page.  This is illustrated in the example below.  In some cases, the button will be greyed out, but will appear somewhere else on the web page.

Let’s assume for this simple tutorial that the chiclet is enabled, as it is on the The New Yorker web page shown below.  When you visit their page or any page with an embedded RSS feed, the chiclet turns a jaunty orange colour, as show in the screen shot below.

If you click on the RSS Feed button, you will get the default RSS page, shown below.  There will be a link embedded on the page that invites you to subscribe to the feed, as well as showing you the current stories making up the feed.

Click on this link, as shown above, and a dialogue box will appear that will ask you to confirm that you want to subscribe.  There are a couple of options, such as the name you want to associate with the feed in your Favorites and the folder in which you want to place it.  For the time being, leave these at their default settings — you can go back and change them later — and simply click the Subscribe button as shown below.

The next screen simply confirms that you have successfully subscribed to the feed.   This seems a bit unnecessary to me, but I guess it doesn’t hurt to see this either.   That’s it!  If you got this far, then you have successfully subscribed to The New Yorker (or whatever) RSS feed.

However, that’s only half the story.  The next thing you will want to do is to open the pane in IE that allows you to view and check the status of your feeds.  To do this, click the little gold Favorites star, as shown in the screen shot below, and the Favorites pane will appear.  If necessary, click the Feeds button at the top of the Favorites pane.  You should see the newly established feed for The New Yorker.  What’s more it should be in bold.  This means that there are new items in the feed that are ready to be viewed.

If you click the feed, you will see new articles that have appeared in the feed since you last looked at it. Once you click away from this feed, it will go from bold to normal type. When the feed is refreshed — the schedule for which is established below — and there are new articles, the feed name will once again appear in bold.

I’ve found it handy to keep the Favorites pane open while browsing other web pages.  To pin it in place, you click the small icon at the top right-hand corner of the Favorites pane.

By default, the feed will have been establish with properties that are generally acceptable.  The most important setting is how often the feed gets refreshed automatically.  By default, it is once every 24 hours, and for feeds for magazines, this is more than adequate.  However, some feeds are updated so frequently that you may want to up the frequency with which you update the feeds.  To do this, right click on the feed as shown below.

The properties page for the feed will appear, and you can then change the frequency of updates by clicking on the settings button.
 
That’s it.  You’re now up and running with IE as RSS feed reader.  Thank you very much for reading, and let me know what you think.

SQL Server 2005 Object Naming Standards - III: Key Columns

Having covered the basic naming conventions for tables and columns, I’m now going to cover standards for naming primary and foreign key columns in a table.  For those of you who thought Parts I and II were, well — a little pedestrian — stay tuned for a real magic show in this post. Or at least that will be true if you lead a pretty sheltered life, otherwise.

By definition, a foreign key involves two tables; the table that is referring to information in another table, and the table that is being referenced.   I will simply call the former the referring table, and the latter the referenced table, in the rules below.

Primary Key Column Naming Rules

id or uid  Every table has a primary key, and it’s always the first column of the table.  This breaks the alphabetical sequence rule described in a previous post, but it seems like a reasonable exception given the importance of this column.  To make things even easier, there are only two possibilities when it comes to the naming the column; use either id or the uid.  The former is always INT type, and the latter always UNIQUEIDENTIFIER.  The assumption is columns are virtually zero-cost, so why not have a column that has no other job than to uniquely identity each record.  This is better than cluttering up the identification role with some other kind of business information.

Which One?  If you know — really know — that the table created will serve as an absolute and definitive reference for the rest of your application, regardless of time and space, then you use id.  However, use uid if you feel there is a possibility of two or more copies of the table structure will exist simultaneously, each populated independently of each other.  Furthermore, if the possibility exists these two (or more) copies will be merged in the future, that seals the deal, it’s uid.  If you set up replication, which may well be the tool you use to achieve this type of merge, it will add it’s own uid column if you don’t already have one.  So just use uid, and you’ll be covered in the event you want to merge tables in the future.  The choice between the two types of primary keys, therefore, is based on whether a given table is a likely candidate for merge replication, or not.   Oh, yes, and make sure that you set the column as RowGuid, the value of which will be covered when I talk about replication, in a future post.

Foreign Key Column Naming Rules

Primary Key of Referenced Table  The first part of the foreign key is taken from the name of the primary key in the referenced table.   So, for example, if Dog references Breed, and id is the primary key of Breed, then the foreign key starts with id. But there may already be one of those in the Dog table, and the ambiguity must be resolved, so….

Add The Referenced Table Name  The referenced table, in the example started in the previous point, is Breed.  That is going to be the second part of the foreign key name.  But it’s not quite complete yet, two components need to be put together in some way.  To do that…
 
Separate with Underscore  Previously, I lambasted the use of underscores or other characters when naming database objects.  Foreign keys are the big, whopping exception to that rule.   I arrived at this rule by process of elimination.  If both your primary and foreign keys in a given table are id, you’ve got a conflict right off the top.  One option was to prefix id with the name of the table, such as breedId.  Not bad, but it seems too much like the other, information-oriented columns in the table.  So, how about IdBreed?  Yes, but it seems to be in conflict with the idea that qualifiers are always prefixes, rather than suffixes.  So, it’s the underscore character, resulting in the finished name Dog.id_Breed.  I thought this particularly appropriate because the underscore really does connote the notion of separation – a visual metaphor for the data in the referenced table.  You’ll find that they really jump out of the definition at you when reviewing table structures in the future.

Place at End of Table  Foreign keys are always found at the end of the table definition.  They appear in alphabetical order, based on the name of referenced table.  So if you had three foreign keys id_Breed, id_Colour and id_Association, they would appear in the table definition as follows;

Dog.id_Association
Dog.id_Breed
Dog.id_Colour

As with the primary key, this breaks the overall alphabetical sequence rules spelled out in the basic table naming conventions, but I’ve found it’s valuable to have the foreign keys located in one region of the table definition, and away from the primary key, so the end it is.

Other Ideas

Creating the Index  These posts cover naming standards.  It’s not intended to be a general, ‘how to’ on creating indexes, but I think it’s worth pointing out that simply creating a compliant name doesn’t create the index.  You have to do that as a separate step.  The best way I’ve found to do that is to use the database diagramming tool of SQL Server Management Studio.  Without going into a lot of detail, here, it’s a simple process of clicking on the foreign key and dragging it to the referenced table, and then setting a couple of properties.  Very straightforward indeed.

Performance of uid Keys  I’ve heard the argument that uids as a primary key are a bad idea for ‘performance reasons’.  The admittedly limited testing I have done in this regard indicates there is not much of an impact.  I would speculate that if you’re manipulating millions of records that there may be a measureable impact, but in the record volumes that most deal with regularly, you’ll be hard pressed to see the difference, I’m sure.  This a subject I’ll delve into in more detail in a future post.

‘Pseudo’ Key Columns  I’ve tried to make the case that identification of records – which, after all, is the whole raison d’etre for keys – should not be diluted by adding any other meaning to them.   There are going to be columns, however, that are going to be more verbose in nature and will still have a role in uniquely identifying records.  The best example is the nm column which will feature in many tables.  The important point is that any column that should be unique in the context of your application should be enforced by putting a unique index on that column.  The same applies for compound keys.  If a combination of columns is unique in the context of the table and application, then a unique index should be put on those columns.

Naming of Indexes  Like any other SQL Server 2005 object, indexes have names, but in most respects, they aren’t really encountered on a regular basis.  The SQL Server 2005 default names for indexes can simply be accepted as is.  For example, the primary key index for the Dog table would be named PK_Dog.  If names conflict, SQL Server tacks on a 1, or 2 or whatever, which I’ve always felt was a cheap shortcut, but it works.  Recently I’ve taken to giving indexes more meaningful names, but the only place you see them is when you use SQL Server Management Studio to review index definitions. 

That’s it, for this post.  You can take your database design ideas a long ways with table, column and key naming rules, but there’s lots more coming up.  Thank you very much for reading, and let me know what you think.

SQL Server 2005 Object Naming Standards - II: Columns

Previously, I discussed standards as they relate to naming tables in SQL Server 2005. This time;

Column Naming Rules

Base Names  The concepts modeled with database tables tend to be described in similar terms. Many tables contain a column with descriptive information, a column with a name or a column with amounts, for example. For these, it’s useful to have a list of standardized column names used repeatedly. These are referred to as Column Base Names, or just Base Names. The most important concept is that the table in which the Base Name is found provides a context.  If you are using one of the Base Names by itself, as will often be the case, it is inferred that the thing being described with the dsc Base Name or named with the nm Base Name is the idea being modeled in the table in which these columns are found.  More to the point, it’s not necessary to repeat some portion of the table name itself within the column name. For example, Whale.nm (think ‘whale name‘) reads a lot better than Whale.whaleNm (think ‘whale whale name‘). Nothing against whales - in fact, some of my best friends are whales — but the second whale in the identification of this column adds nothing in terms of clarity.

Elimination of Redundancy  Of course, there are many situations where there is a requirement for multiple instances of a given Base Name within a table. In the table Dog, for example, it’s reasonable to expect a nm column; Dog.nm. However, let’s assume the application design evolves and the dog’s formal show name needs to be captured. A column with the qualifying prefix show would be added to the column name; hence showNm. As a result, the Dog table contains Dog.nm and Dog.showNm. It’s also tempting to rename the existing nm column to nickNm or familiarNm. However, the two names don’t have equal status. For example, if you take your champion Weimaraner out for a scrape, you are not likely to say; “Not there, Hampson Whiskeyman of Foucalt!“, but rather, the less pretentious and slightly less definitive “Not there, Spot!” The latter is the name you regularly use, therefore it makes sense to use the Base Name for the familiar name of the dog in question. It’s sort of like a ‘default’ name.  There are situations, however, where names have absolutely equal status; Person.firstNm and Person.lastNm is a classic example. Unless you are Oprah, Cher or Madonna, you need both names (at least) to fully identify a given instance of Person. In this case, it’s important to eliminate the Base Name from the table. This reflects the fact that in this kind of situation, all nm columns have equal status. The presence — or absence — of the Base Name implicitly tells you something about the structure of the data being modeled.

Capitalization The first letter in the column name is lowercase, which is the opposite of table names described previously. If tables are the database analogue of class, then columns are the database analogue of class properties. I come from the school of thinking that the first letter of a property name is lowercase, to distinguish them from class names. So, Person.lastNm, never Person.LastNm. I believe this is in conflict with Microsoft’s approach to naming of properties, but first-letter-lowercase is my story, and I’m sticking to it. Similar to tables names, use camel case to distinguish the various parts of the column name, never underscores, spaces or some other character.

Abbreviations As with all names for database objects, do not abbreviate for all but the most self-evident ideas. The Base Names described above are an obvious exception to this rule. Due to their very frequent use, however Column Base Names become part of the visual furniture. You will see dsc, but in your mind, you will hear the verbose ‘description‘. In all other cases, no abbrevations — the argument for which was spelled out in the rules around naming tables.

Alphabetical Order Columns always appear in alphabetical order. The trick comes when modifying an existing table structure to add a new column name; I use SQL Server Management Studio to make table modifications, and it permits insertion of the column where it needs to go in the alphabetical sequence. You can also use the ALTER TABLE statement, but I’m fairly sure that you get no choice as to the sequence of the columns.  Alphabetical order is useful if you have a particularly long collection of columns in a table. You can glance at the structure and determine if a particular column exists or not. The significant exception to this rule is both primary and foreign keys, to be dealt with in a future post.

Summary Example

The following continues the example of the Dog table that I started above, and it applies all of the above rules;

Dog.birthDt
Dog.dsc
Dog.firstShowDt
Dog.heightAmt
Dog.lbl
Dog.mostRecentShowDt
Dog.nm
Dog.showNm
Dog.weightAmt

Keep in mind that there are a few other rules that I have not yet covered that will impact the table structure, but I will add those items to this basic table structure as and when they are covered in future posts.

Other Thoughts

When JOINing two tables that use the standards described above, you will have to get used to using the fully-qualified column name. This is because there is a pretty good likelihood your JOIN will expose two column names that are the same. For example;

SELECT DISTINCT
     nm
FROM
     Job INNER JOIN
     Role ON Role.id_Person = Person.id

will likely fail, because there is a pretty good chance that both Job and Role will have a nm column associated with them. Therefore, the syntax must read;

SELECT DISTINCT
     Job.nm
     ,Role.nm
FROM
     Job INNER JOIN
     Role On Role.id_Person = Person.id

At first, this would appear to be a significant drawback, but I believe in using fully-qualified columns names, anyway, for the purposes of making the SQL statement more readable. The example above also provides a sneak peak at standards for primary and foreign keys, but these will be covered in more detail in a future post.

Conclusion

There are quite a few other rules that I apply to the naming of columns in database tables, but I will leave those idea for a future post. Thank you very much for reading, and let me know what you think!

Microsoft XML Notepad 2007

In pursuit of an easier way to edit the dtsConfig files that are generated when setting up configuration files for SSIS packages, I stumbled across a handy XML editor available for free download from Microsoft; XML Notepad 2007.  Up until now, I had been using a combination of Internet Explorer and Notepad to make direct changes to XML files.  That was OK, but I thought that in this day and age, it’s pretty silly to be doing it this way.  XML Notepad 2007 seems to be the answer.  At first, it’s a little disorienting because you don’t see the usual tree-like view presented by IE.  You get a tree, but it looks quite a bit different — it’s a stylized, icon-oriented thing that at first, I had trouble visualizing as XML.  However, once I was over that, I was able to modify the XML very easily and in a intuitive manner. 

I’ll provide additional commentary as I used it, but until then I would encourage you to download and give it try.  Thank you very much for reading and let me know what you think!

SQL Server 2005 Object Naming Standards - I: Tables

This is the first part in a series of posts on SQL Server 2005 object naming standards. In the previously posted Introduction, I laid out the case for the value of using these naming standards in general, and in this post, I talk specifically about table names.

Table Naming Rules

Singular  Table names are always singular;  Goose rather Geese, Crow as opposed to Crows, and Whale, not Whales.  Simple enough.  However, by saying singular, that does not preclude collective terms.  So Gaggle, Murder and Pod would all be perfectly good table names, whereas Gaggles, Murders and Pods would not.  To understand why, it helps to think about the columns that will eventually populate this table, and in particular, how the combination of table name and column name will sound when read or discussed.  I would argue that Crow.nm (think ‘crow name’) and Crow.dsc (’crow description’), make more sense than the plural equivalents (’crows name’ and ‘crows description’).  Or, stated another way, the name and description columns relate to a particular crow in one row of the table, not the entire population of crows in all the rows of the table.

One Noun   I strive for one noun that best describes the collection of data the table contains — the simpler the name, the more self-evident the table and its content becomes.  Car would be a good table name, whereas CarFender would not.  There can’t be much confusion as to what the Car table is likely to contain.  CarFender might contain information about cars, or fenders or maybe somebody who ends carfs (whatever they are), or possibly all three.  Just looking at the latter makes me want to split it into separate tables with some sort of relationship with each other.  If you’re really stuck for good, unique name, try plugging some possible names into a thesaurus, and see what synonyms exist.  You may be pleasantly surprised at what you find.  There is one major exception to the One Noun rule, described in Associative Tables, below.

Capitalize First Letter  The first letter of the table name is always capitalized.  I took a queue from object-oriented practices in this regard; tables seems like the database analogue of class, and the records within the table seem like instances of the class.  By convention, the first letter of a class name is capitalized, and hence, the database table name should follow that convention.  If it’s simply not possible to follow the One Noun rule noted immediately above, then use the so-called camel case to distinguish the words within the name.  Don’t resort to underscores, or worse yet, spaces in tables names to separate words — it’s just a waste of screen real estate, and does nothing that camel case doesn’t accomplish just as well.

Elimination of Redundancy   If you really work at applying the One Noun rule, it’s just a matter of time before there’s a situation where the same, simple noun describes at least two completely different ideas.  Point is a good example; it could represent a physical location on the surface of the earth, or a single bullet on a presentation slide, or maybe the sharp end of a pencil.  In some cases, the overlap may be eliminated by the use of schemas (more on that in a future post), but for the time being, this is the one situation where a single, clarifying adjective prefixed to the noun is in order. GeographicPoint and TalkingPoint will create the necessary distinction between the two ideas.  However, avoid preemptive use of this technique; “I had better name the table GeographicPoint just in case another type of point comes along in the future.“  Wait until you actually have an overlap, then deal with it, rather than trying to imagine all of the possible future situations — you’ll just drive yourself crazy and in the interim, degrade the simplicity and clarity of your table names.

Associative Tables  These tables resolve many-to-many relationships that exist in your data structure.  Recently, I have started breaking — er, evolving — the first, One Noun, rule because I came to the conclusion associative tables really are fundamentally different from other tables.  They describe a relationship, not a thing.  Hence in their case, I would use — horrors! — a transitive verb.  For example, the tables Presentation and Document are related in the sense that a given presentation could consist of many documents, and a given document could be used in many presentations.  This is a classic many-to-many relationship.  The associative table that resolves the relationship could be the transitive verb Contain.  Think ‘Document-Contain-Presentation’ and ‘Presentation-Contain-Document’.  I’m of mixed opinion about the use of the singular in this case.  I like the absolute nature of the Singular rule, but use of the plural does make it read better. So let’s go with singular, for the purposes of consistency, at least for now.

No Abbreviations For all but the most completely self-evident ideas, abbreviations should be avoided.  Just about any reason to use ultra-short tables names vaporized with the advent of near-zero-cost data storage and the various Intellisense-like development systems available today.  Let’s face it, you’re just not typing the entire table name all that often.  You will read it frequently, however, so what you lose in brevity is made up by the immense value of your code being understandable by yourself and others, now and years in the future.

Other Thoughts

Since early 2005, I have either led or participated in software development projects using the Agile methodology.  Primarily due to time constraints, only the tables that were actually needed during the then-current iteration were created.  Even if you’re not following Agile, it is strongly recommended that you create only the tables demanded by specific requirements you clearly understand.  To borrow from and butcher von Moltke a little; “A design never survives first contact with the user.“  What you learn in each iteration of development may well eliminate tables you were sure you needed.  At the same time, you will be required to create tables you previously never even imagined.  Let specific, current, feature-oriented requirements drive out structure  — always.

The Object Explorer pane of SQL Server Management Studio provides a pretty decent way of reviewing the tables you have created.  If you are successful in applying all of the rules above, you should be able to look at the list of tables and have a pretty good sense of the structure, without having to resort to system documentation that may (or more likely, may not) exist.  In fact, that’s the litmus test — if you look at the list of table names and you can’t more-or-less figure out what the structure is, then you’ve probably got some more work to do.

You may conclude that following all of the rules above very narrowly defines the possibilities; two people, separated by the time and space but looking at the same subject matter, might very well come up with very similar or even identical table names.  For example, if one of entities that is being described is a human being, exactly how many different words are there?  Eliminating profanity, pejoratives and the politically incorrect ones, not that many, really.  So why does everybody keep creating the very same Person table from scratch?  Good question, for which I can’t currently think of a good answer — the subject of a future post, possibly?

I will not attempt to address the question of logical vs. physical modelling, or the mapping of entities to physical objects — I may in a future post.  What’s certain, though, is that sooner or later the data winds up in a table structure within SQL Server 2005, and I assert that a little care and attention to its name will pay off over the long term.

Next up, naming standards for columns.  Thank you for reading, and please let me know what you think!