
Archive for the ‘MultiVue’ Category

I’ve just concluded a good week in the Mother City performing a Master Data Management Architecture Review for a potential client. I last visited Cape Town five years ago, and it was good to spend time in the beautiful city again. The grass is, however, not always greener on the other side, and Cape Town does come with its fair share of frustrations…

  1. Parking: there isn’t any, and when there is a spot, it’s taken
  2. Wind: man, when the South Easter gets going there’s no hiding. I was surprised at the number of skew trees in the parks…
  3. The drivers: quite snobbish actually. We’re a lot better here in Jozi

Other than that it’s an amazing place and the good things definitely make up for the bad. To be quite honest, the mountain cancels out all the bad things! Quite a pity I was there to work and not on holiday.

The first day involved extracting client profiles from an Oracle database. Easy enough, or at least I thought so. I had installed the Oracle 11g client tools before my trip, so all I really had to do was set up my tnsnames.ora file and perform an import into my SQL Server 2008 R2 database. Three hours later and still no closer to connecting to the Oracle database, I began feeling the heat. I’m not too sure exactly what the problem was, and not even my good mate Google could get me any closer.
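
For the record, there is nothing exotic about the tnsnames.ora entry itself. Something along these lines should have been all that was needed (the alias, host, port and service name below are made-up placeholders, not the client’s actual details):

    # Hypothetical tnsnames.ora entry; every value here is a placeholder
    CLIENTDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver.example.local)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = clientdb)
        )
      )

Why the connection still refused to work with a seemingly valid entry, I never did figure out.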

The on-site support staff struggled to get the connection working as well, and they eventually decided to install an application called Oracle SQL Developer. Not the best-looking tool, but it had an option to return rows in a table and then export them to CSV. What a relief. From CSV back to SQL Server and I was a happy camper again.
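
Getting the CSV back into SQL Server was the easy part. For a simple, unquoted file a plain BULK INSERT does the trick; the staging table and file path below are assumptions for illustration rather than the client’s actual schema:

    -- Hypothetical staging table; real column names and types will differ
    CREATE TABLE dbo.ClientStaging (
        ClientId     VARCHAR(20),
        FirstName    VARCHAR(100),
        LastName     VARCHAR(100),
        EmailAddress VARCHAR(255)
    );

    -- Load the file exported from Oracle SQL Developer (assumes no quoted
    -- fields; BULK INSERT on SQL Server 2008 R2 has no proper CSV mode)
    BULK INSERT dbo.ClientStaging
    FROM 'C:\extracts\client_profiles.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        FIRSTROW        = 2    -- skip the header row
    );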

That evening I got to spend some time at the very nice Protea Hotel Fire & Ice. The hotel is known for its legendary burgers and milkshakes, so a work colleague and I were keen to check them out. We settled for the Kudu fillet burger with Lindt chilli chocolate sauce (as the Cheesy, Cheese, Cheese was sadly not available!). I was a little disappointed as the meat was overcooked, but it looked good nonetheless.

The rest of the week involved performing a data quality check and then a MultiVue POC to determine the severity of the duplication. I must say that MultiVue is an excellent application for these architecture reviews; it very quickly demonstrated capabilities and results that the client had not been able to achieve before. What more and more clients are realising nowadays is that data quality is critical to the success of so many initiatives, and they are keen to explore new ways of improving the single view of the customer.

So, a good experience all in all, and I look forward to visiting (or possibly even moving to) Cape Town sometime in the future.


So I’m back from my short break and looking forward to 2011. 2010 was a great year from a personal perspective. After many years of slogging I finally finished my BSc IT degree. I now have a little more free time on my hands, but that is soon to disappear as my lovely wife is expecting a little girl in six weeks (I’m really looking forward to becoming a dad). I also had my first trip overseas and moved into a new position at work, which I’m enjoying immensely! Finally, presenting at Tech-Ed was a great experience and something I’d love to do again in 2011 if given the opportunity.

Right, down to business… In December I started a POC using a product called MultiVue. The product is an MDM (Master Data Management) solution developed by a company called VisionWare in Scotland. It uses probabilistic matching algorithms to match and merge clients, products, properties and any other entities which may be duplicated in the organisation. It is very powerful and I was fortunate enough to be part of the first production installation in Africa.
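
I obviously can’t reproduce VisionWare’s algorithms here, but the basic idea of field-weighted matching is easy to illustrate. The toy T-SQL below scores candidate pairs from a hypothetical staging table (dbo.ClientStaging, columns assumed) by summing a weight for each agreeing field. Real probabilistic matchers derive their weights statistically and cope with fuzzy comparisons, neither of which this sketch attempts:

    -- Toy field-weighted matching sketch, NOT MultiVue's actual algorithm
    SELECT client_a, client_b, match_score
    FROM (
        SELECT a.ClientId AS client_a,
               b.ClientId AS client_b,
               4.0   -- surname already agrees via the join below
             + CASE WHEN a.FirstName    = b.FirstName    THEN 2.0 ELSE 0 END
             + CASE WHEN a.EmailAddress = b.EmailAddress THEN 6.0 ELSE 0 END
               AS match_score
        FROM dbo.ClientStaging AS a
        JOIN dbo.ClientStaging AS b
          ON a.LastName = b.LastName    -- crude blocking key
         AND a.ClientId < b.ClientId    -- each pair once, no self-matches
    ) AS scored
    WHERE match_score >= 8.0;           -- arbitrary review threshold

The join doubles as a crude blocking key: only pairs that already agree on surname are compared, which keeps the candidate pair count from exploding.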

In short, I was trying to accomplish the following for the POC:

  • Import client and product data from three source systems
  • Set up some matching rules
  • Identify duplicate clients and present the results back to the client

Easy enough, or so I thought. After importing the source data, I created some matching rules and kicked off the process. I returned a couple of hours later only to find that the process had failed due to insufficient disk space. Strange… I then extended the virtual machine’s hard disk, altered a few of the matching rules and started the process again. To my dismay, I once again ran out of disk space! What could be going wrong? At this point I was getting extremely frustrated and decided I would be better off taking a break.
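
In hindsight, a quick look at which tables were eating the disk would have narrowed things down much faster. A standard DMV query along these lines does the job (nothing MultiVue-specific here):

    -- List the ten largest tables in the current database by reserved space
    SELECT TOP (10)
        t.name AS table_name,
        SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
        SUM(CASE WHEN ps.index_id IN (0, 1)   -- heap or clustered index only,
                 THEN ps.row_count            -- so rows aren't double-counted
                 ELSE 0 END) AS row_count
    FROM sys.dm_db_partition_stats AS ps
    JOIN sys.tables AS t ON t.object_id = ps.object_id
    GROUP BY t.name
    ORDER BY reserved_mb DESC;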

The next day I tried the matching process again, but with a smaller subset of the original source data. Closer investigation into the temporary tables being created in the background led to the discovery of the problem: 600,000+ matches were being created on a dataset that only had 100,000 records. The root cause: I had committed the cardinal sin of not properly profiling the source data before importing it. One of the fields I was matching on was the client’s email address, and the data capturers had thought it a good idea to specify a 0 for 10,000-odd clients. The matching algorithms were therefore matching thousands of clients with thousands of others who were in fact not the same individuals. I removed the invalid entries, re-ran the process on the full dataset and received the results I was expecting.
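
The maths behind the blow-up is simple: n records sharing the same value on a match field generate n(n-1)/2 candidate pairs between them, so 10,000 records with a dummy email of 0 account for tens of millions of potential pairs on their own, which goes a long way to explaining the disk filling up. A two-minute query up front would have caught it; grouping on the match field shows the over-represented values (table and column names assumed, as before):

    -- Spot over-represented dummy values before matching: n records sharing
    -- the same email contribute n*(n-1)/2 candidate pairs all by themselves
    SELECT EmailAddress,
           COUNT(*) AS record_count,
           COUNT(*) * (COUNT(*) - 1) / 2 AS candidate_pairs
    FROM dbo.ClientStaging
    GROUP BY EmailAddress
    HAVING COUNT(*) > 100    -- arbitrary cutoff for 'suspiciously common'
    ORDER BY record_count DESC;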

Lesson learnt: always have a good look at your source data and understand its peculiarities before starting the actual work. The Data Profiling Task in SSIS is a great way to quickly pick up issues in your source data, and I’ll definitely be using it more often in the future.
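
Even without firing up SSIS, a hand-rolled profile gives a rough first pass. The Data Profiling Task produces a far richer report (null ratios, patterns, candidate keys and so on), but even a crude query like this, with table and column names assumed as before, would have flagged the bogus email addresses straight away:

    -- Rough single-column profile; a crude stand-in for a proper profiling run
    SELECT COUNT(*)                                      AS total_rows,
           COUNT(DISTINCT EmailAddress)                  AS distinct_emails,
           SUM(CASE WHEN EmailAddress IS NULL
                      OR LTRIM(RTRIM(EmailAddress)) = ''
                    THEN 1 ELSE 0 END)                   AS null_or_blank,
           SUM(CASE WHEN EmailAddress NOT LIKE '%_@_%._%'  -- rough shape test;
                    THEN 1 ELSE 0 END)                     -- NULLs not counted
               AS malformed_emails
    FROM dbo.ClientStaging;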
