Semi-Technical – Intern to the Rescue Episode 93

New and Improved Bacon Flowchart

Occasionally I unearth a problem that I had no idea existed, but in doing so I save the world from something that they never even thought would be happening. Yesterday was one such event, and I saved Broadcom at least a week worth of debugging by spending a couple hours working with cursors and a temp database to check the integrity of our data after a major change to the contents. This is only semi-technical, so bear with me…

Overview

I am currently working on a project with high visibility within Broadcom. We have two disparate applications both tracking projects, their revision, a number of other overlapping fields and many other fields that are not shared. We defined one as the Master, because it has uniqueness and a number of data integrity checks, and the other as the Slave. The catch is that we have a many to one mapping between Slave and Master which means we have to look deeper.

Slave has a set of files that Master never cared to keep track of, ignoring the fact that it has never known about it, along with versioning, an audit log and a series of other fields that are still very important in the universe of problems that Slave solves. That information cannot be moved over to the other application just yet though because these two Applications are not mapped, and so Phase 1 hit me – build a mapping system for Slave to allow the users to map their projects to that single project in Master. We jumped through that hoop with minor issues and landed on Phase 2 – Email the project members, which evolved into just the owners and then their managers and so forth up the line, asking them to map their projects to the Master table and to close them out if they don’t map.

image

Think of this diagram as a fractal. You can replace the Master_1 project with each project, respectively, in T_Projects, the table on the left will become T_Files which is mapped via the T_ProjectFiles table. This filtering goes further by allowing each file to have multiple audit log entries, check out histories and so forth for all of its data.

Phase 2 is now complete and we are now using that mapping information to merge the projects together. Phase 3 involves a number of changes to the application, mostly database related for the purposes of this post (Sure does look like a lot of background, but I assure you its worth it). Phase 1 included a new table, T_MasterSlaveMapping. We are going to walk through all of these entries, grouped by the master project mapped (Slave_1, Slave_2, Slave_3 are mapped to Master_1) and start tying things together.

  1. Create a new project, Master_1, in Slave, echoing relevant data from Master if necessary
  2. Create new entries, merging where applicable, for all of the files, audit log and so forth, for Slave_1, Slave_2 and Slave_3, and associate them with the Master_1
  3. Create the related permissions entries for the Master_1 for all members of Slave_1, Slave_2 and Slave_3
  4. Deactivate Slave_1, Slave_2 and Slave_3

The script was written to do all of this and executed, and at first check everything looked good. A couple of errors in the script were found and corrected. Nothing is worse than if you write something and it works fine the first time, “its always better if you find errors.” the developer I was working with said. And so my PAR began.

Problem

Testing is something I consider an irritating and all around depressing part of my day. I subscribe to the idea of automating testing, and I’ve found that even when I am told that I will only be doing something one time, it will likely be done again some weeks from now, and even if it isn’t, I will have to at least explain the input, patterns and so forth that I used which caused the issue. The only thing I hate more than testing is documenting, so i automate everything, as best I can.

In stirring up the database and lots of data having the opportunity to be moved around we have executed a couple smoke tests, with nothing substantial being uncovered. While my colleague turned his eye to the rest of Slave I turned my eye to the database and began looking into the integrity of the data. Testing the system is quite complex, leading to a number of hairy situations:

  • Are the old projects being deactivated
  • Are the new projects of the right type
  • Do the users have access to the proper projects
  • Do the users have access to the proper files
  • Etc.

I decided to focus on the permissions, because in this situation I should not be losing access to any files, but I could easily gain access to more. The problem is – how to keep track of that.

Action

We are trying to compare two databases, one PreMerge and one PostMerge, where we have emulated a real world set of data, to confirm that I PostMerge have at least the same access that I had PreMerge. there are a number of ways to do this, most more complex than others, but you can take advantage of an MSSQL “feature” to lighten our load and have the ability to keep track of other information along the way.

The plan is to iterate through all of the projects that are mapped, grab their users, and build a table of users, the files they have access to and the projects that provide that access. We can add a flag for each side (Pre and Post) so we can run a query at the end and get a full list of the related files they lose access to, and respectively which ones they gain access to with a simple query.

CREATE TABLE #UserFileAccess
{
	UserID int,
	PreProjectID int,
	PostProjectID int,
	FileName varchar(500),
	FileRevision varchar(50),
	PreMergeAccess int,
	PostMergeAccess int
}

We have our temp database, now we play with some MSSQL Tricks and we will see something like this for our file structure:

  1. Create the temporary table
  2. Move to the first database you need to check and initialize variables
  3. Create a cursor for the mapped projects
  4. Iterate through them all
    1. grab the files that each user has access to
    2. Insert them into the temp table
  5. Close out and deallocate the cursor
  6. Move to the second database
  7. repeat steps 3 through 5 here with exceptions when iterating
    1. If the file, revision, user and project already, we just want to update the entry with the new project we are given access from and toggle the related bit
    2. Otherwise we will insert a new entry with the PreAccess entries as 0 or “”, respectively

[download id=”25″]

This process takes advantage of the way that a temp table works, by not existing in an actual database we can switch between databases and still use it. Once this is completed you can run any queries like the following to verify the integrity of the database.

SELECT *
       FROM #UserFileAccess
            WHERE PreMergeAccess = 1
                  AND
                  PostMergeAccess = 1;
SELECT *
       FROM #UserFileAccess
            WHERE PreMergeAccess = 1
                  AND
                  PostMergeAccess = 0;
SELECT *
       FROM #UserFileAccess
            WHERE PreMergeAccess = 0
                  AND
                  PostMergeAccess = 1;

Result

With this script running, it took a full 42 minutes because of the Cursor in a Cursor and probably some other non-optimized code, I took a break and read a book. When I returned I had 82k records inserted into my temp database, and in executing variations on the above SQL queries we found that our users all had access to everything that they had previously had access to, save a couple bugs with Project types being overwritten and dropped.

Thankfully this script allowed us to catch this before pushing to Production, where we would have had users screaming. This table also gave us a clear means of seeing who has gained access.