
Archive for the ‘Business Intelligence’ Category

I’ve been trawling the web trying to find a decent explanation of how to reuse the security model defined in Microsoft SQL Server Analysis Services (SSAS) cubes when publishing reports to Tableau Server 7. There are bits and pieces alluding to some of the concepts described below, but nothing comprehensive. After some proper R&D I can now share the end-to-end functionality with you. Happy reading.

SSAS Security

One way to define security on an SSAS cube is to add a Role to the SSAS database. The Role Properties dialog, shown below, allows one to add Active Directory (AD) users or groups on the Membership page. One can then restrict access to certain Cubes and Dimension Data.

[Screenshot: SSAS Role Properties dialog]

Users accessing the cube through Excel, SQL Server Reporting Services, Tableau Desktop Professional and other reporting applications will then only see the data they are permitted to see. All good.

Now the snag comes in when you introduce Tableau Server. The classic double-hop problem (passing the user’s login credentials from their machine to Tableau Server and then on to the SSAS server) comes into play, as Tableau Server and SSAS are unlikely to be installed on the same server. I thought good old Kerberos was going to be required but, as you’ll see a little later, it isn’t.

Create Tableau SSAS Connection

A user will be provided with the Analysis Services Connection dialog depicted below when creating a new SSAS connection in Tableau Desktop Professional 7.

[Screenshot: Tableau Analysis Services Connection dialog]

After you have specified the Server, you will need to choose between Windows NT Integrated Security and specifying a username and password. This is generally where the first mistake is made. If you select Windows NT Integrated Security, the user’s credentials will not be used by Tableau Server when a user accesses a report. Instead, the Tableau Server Run As User account will be used to access the cube and all the security roles defined for the SSAS database will be bypassed. Users will therefore all have the same access, which in my case was not good enough.

So, go ahead and specify an AD user in the Username text box along with the associated Password and go about creating your report.

Publish Tableau SSAS Connection

Once your report has been created it’s time to publish your Data Source to Tableau Server. To do this, right-click on the data source and select Publish to Server… In the Authentication drop down, select Prompt user and click on the Publish button.

An example of the Publish Data Source dialog is depicted below.

[Screenshot: Publish Data Source dialog]

Publish Workbook

Next up it’s time to publish your workbook to Tableau Server. This is done by selecting the Server menu item in Tableau Desktop Professional and choosing Publish Workbook… One will then be presented with the Publish Workbook dialog (example below).

[Screenshot: Publish Workbook dialog]

In the bottom left you will see an Authentication button. Clicking it opens a second dialog; select Prompt user from the Authentication drop down and click OK. Finally, click Publish and you’re all done.

[Screenshot: Publish Workbook Authentication dialog]

It’s important to know that if you selected the Windows NT Integrated security option when you defined the SSAS connection, the Authentication button will not be there!

Accessing the View through Tableau Server

Now for some testing. Log in to your Tableau Server and browse to the view that has just been published. After you have clicked to open it you will be presented with a login screen (example below).

[Screenshot: Tableau Server view login prompt]

Type in your AD credentials and the view will be displayed. You have the option of caching your credentials until you physically log out which saves you from having to re-enter the details for each view that is accessed.

Admin User Access

In my scenario I was accessing a Speed Fines cube. I had defined an Admin user that could access all vehicle makes and then another user that could only see statistics for Audis and BMWs. Below is the view after entering the Administrator’s login credentials.

[Screenshot: view as the Admin user, with all vehicle makes visible]

Tableau User Access

When I enter the second user’s credentials and access the same view, only the two vehicle makes are displayed. Like.

[Screenshot: view as the restricted user, with only Audi and BMW visible]

Administration: Clearing & Saving Passwords

Finally there are some useful settings in the Tableau Server Administration Maintenance section to make accessing the reports that little bit easier.

[Screenshot: Tableau Server Administration Maintenance settings]

In the snippet above, an Administrator can clear all saved passwords, and there is also a checkbox that allows users to save data source passwords across multiple visits and browsers. Not a good idea though if multiple users share the same workstation.


About a month ago I presented on the new Tabular model to one of our customers. Now, I have been using PowerPivot / Tabular for quite some time but by no means know all the ins and outs of the technology. I came across a great bunch of resources during my research and thought it would be a good idea to share this with the rest of the population. Here we go…

1.     Some of the most asked about questions regarding this technology go something like this:

a. What are the differences from, and limitations compared to, the classic Multi-dimensional model?

b. How do I know which model to use when?

Now, this has been well documented by many individuals around the world. Here are some links to the best resources I found:

In my personal opinion, I’d have to say that Tabular is going to be the way forward. I’m not saying that Multi-dimensional is dead, but if you look at where the investment for SQL Server 2012 was made, only minimal changes were introduced for Multi-dimensional. All the changes and R&D have gone into Tabular.

2.     Date Table Best Practices

  • Never use the date time column from the fact table in time functions.
  • Always create a separate Time table with contiguous dates (i.e. without missing day gaps in the date values).
  • Create relationships between fact tables and the Time table.
  • Check that relationships are based on a date time column (and not on another artificial key column). If a relationship does join on an artificial key (e.g. a scenario where a Tabular model is being wrapped around an existing data warehouse), mark the Time dimension table as a Date Table and select the actual date time column as the unique identifier.
  • Make sure you have full years’ worth of data in the Time table. For instance, even though your transaction table may only have sales data up to May 2012, the Time table should have dates up to December 2012 (same goes for fiscal years).
  • The date time column in the Time table should be at day granularity (without fractions of a day).

Here is a good blog post from Dan English covering retail and marketing calendars and how to deal with the relevant DAX time calculations: http://denglishbi.wordpress.com/2010/09/24/calculating-previous-year-marketing-calendar-values-with-dax/
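
To make these practices a little more concrete, below is a minimal DAX sketch of a previous-year comparison. It assumes a hypothetical Sales fact table related to a separate DimDate table that is contiguous, at day granularity and marked as a Date Table; the table, column and measure names are illustrative and not taken from a real model.

  -- Base measure (illustrative names)
  Total Sales := SUM ( Sales[SalesAmount] )

  -- Previous-year equivalent; relies on DimDate being a proper, marked Date Table
  Total Sales PY := CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( DimDate[Date] ) )

  -- Year-over-year growth, guarding against an empty prior year
  Sales YoY % := IF ( [Total Sales PY] = 0, BLANK (), ( [Total Sales] - [Total Sales PY] ) / [Total Sales PY] )

If the calendar has gaps or stops mid-year, SAMEPERIODLASTYEAR and friends can return misleading results, which is exactly why the contiguous-dates and full-year rules above matter.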

3.     Best practices around processing and query optimisation

Good practice dictates pushing as many calculations as possible as far back as possible, e.g. into the data warehouse. This is true for both Tabular and Multi-dimensional projects. It will also result in better compression in the Tabular model.

The extracts below were all taken from a session conducted by Ashvini Sharma and Allan Folting at TechEd North America 2012. I can really recommend watching the presentation, which is available at http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI414. They discuss all the ins and outs of the xVelocity engine. Here is a small summary with screen shots from the session.

The xVelocity engine is designed with 3 principles in mind:

  • Performance, Performance, Performance!
  • Query Performance is much more important than Processing Performance
  • Accommodate changes without forcing reload, if possible

When data is loaded, it goes through various stages:

  • Encoding: everything is converted to integers, even strings (integers being the fastest data type to work with)
    • Value encoding: one just has to do the math to get to the answer. E.g. 1.5, 2.5 and 3.5 will be multiplied by 10 to get whole numbers (15, 25 and 35); the engine could then subtract 15 so that smaller numbers (0, 10 and 20) are what actually get stored as the DATA IDs
    • Hash encoding: used for strings and useful for very sparse value distributions. The DATA ID is the index allocated to the string when it is inserted into the table for the first time. It requires decompression for computation; one can’t do math on hash-encoded values

Encoding is done automatically to conserve space. On the first load, a sample set of data will be checked to determine the best type of encoding applicable for that column.

  • Compression (per segment): Calculated columns are only addressed after the compression phase.

If you want to see the columns that are taking up the most space, go to the folder where the Tabular databases are stored, search on *.* and sort descending on size. Files with a .idf extension are the in-memory data files. Dictionary and hierarchy files are also present.

Query parallelism is by segment. E.g. if you have 16 cores and 4 segments, only 4 cores will be used. One ideally wants a balance between the number of segments and the number of cores.

Processing Phases are depicted below.

Here is a screen shot showing the VertiPaq storage.

There are a number of processing options available once data has been added to the model:

A note on Incremental Processing: Only do a Process Recalc right at the end after all the Process Data tasks as it goes across all partitions.

And finally, some Advanced Processing Options:

Kasper de Jonge wrote another excellent blog post on Tabular Analysis Server memory usage: http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance. He also made a PowerPivot file available that you can use to further interrogate your Tabular models.

4.     Controlling Segment Size and Compression

There are two important Tabular Analysis Server Properties which you can tune to increase compression and query speed. They are DefaultSegmentRowCount and ProcessingTimeboxSecPerMRow. You can access these settings by connecting to your Tabular instance using Microsoft SQL Server Management Studio. Right click on the Tabular instance and select Properties. Check the Show Advanced (All) Properties check box and scroll all the way to the bottom. See image below.

The screen shot below specifies the defaults and what can be expected if you change the values.

5.     Additional Resources

a. DAX Resource Centre: http://social.technet.microsoft.com/wiki/contents/articles/1088.dax-resource-center.aspx

Check the Websites and Blogs section for other top resources in the PowerPivot and Tabular space. Cathy Dumas, who I saw at SQL PASS 2011, is my favourite 🙂 Check out her old Microsoft blog for additional info: http://blogs.msdn.com/b/cathyk/

b. Microsoft SQL Server 2012: The BISM Tabular Model book: http://cwebbbi.wordpress.com/2012/08/15/microsoft-sql-server-2012-the-bism-tabular-model-is-now-available-to-buy/


tX2012 was released earlier in the month and we will be upgrading a lot of our clients to this new version. One would assume that you could export your timeXtender project created in 4.5.x and then just import it into tX2012, but unfortunately it’s not that straightforward. If you do try this, you’ll get the error message below:

In order to perform a successful upgrade, back up your 4.5.x projectRepository database. Uninstall timeXtender 4.5.x and install tX2012. After opening the new application, point it to the restored database. You will then be prompted with the following popup:

Click on the Upgrade button. You will then be presented with another pop-up asking whether you would like to use the Local files or the Web service method. Select Web service and wait while your project database is upgraded.

I did receive an error message after the upgrade process (see below) but my upgraded projects processed without any issues despite this. A call has been logged with the timeXtender support team and I’ll update the post as soon as I have more information.

In conclusion: upgrade your project in a Development or QA environment first and perform thorough testing before upgrading your Production environment.


Final day of PASS! Right, so not off to a great start. I was dead keen to see Marco Russo’s Vertipaq vs. OLAP session but it was held in the smallest room and was oversubscribed. I will have to catch that one on the PASS website once the session recordings are made available.

Enriching Your BI Semantic Models with DAX

– Some nifty DAX functions: DISTINCTCOUNT, RANKX, TOPN, SWITCH (use instead of nested IF statements), HASONEVALUE (returns true or false), USERELATIONSHIP (use an inactive relationship instead of the default active one), PATH (use for parent-child relationships), PATHITEM (gets a specific item from the defined PATH), LOOKUPVALUE (get an attribute based on another value), USERNAME (can be used for row-level security)
– Week time intelligence functions are not available. One will need to add the relevant attributes to the calendar dimension and then write a custom calculated measure (a rough sketch follows after this list)
– There will be a DAX query plan! Available as a new option in SQL Server Profiler
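
Since the week functions are missing, a common workaround is a sequential week index on the calendar table. The sketch below assumes a hypothetical DimDate table carrying a WeekIndex column that increases by one for every week across year boundaries, plus a simple Sales fact table; all the names are made up for illustration.

  -- Base measure (illustrative names)
  Total Sales := SUM ( Sales[SalesAmount] )

  -- Sales for the week before the latest week in the current filter context
  Sales Previous Week :=
      CALCULATE (
          [Total Sales],
          FILTER (
              ALL ( DimDate ),
              DimDate[WeekIndex] = MAX ( DimDate[WeekIndex] ) - 1
          )
      )

The same pattern extends to week-to-date or week-on-week growth once the calendar dimension carries the extra week attributes.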

SQLCAT: Configuring and Securing Complex BI Applications in a SharePoint 2010 Environment with SQL Server 2012

– When setting up your SPNs, use both the fully qualified and NetBIOS names
– Use the -S option when adding SPNs: it will ensure that duplicate SPNs are not added. Use -X to list duplicate SPNs
– Important to create a service account that will be running the Claims to Windows Token Service. This service will be on all servers that have SharePoint installed
– Constrained delegation will not cross domain boundaries, no matter what
– On the Delegation tab, ensure that the Use any authentication protocol option is selected
– Kerberos does a lot of caching. Some of the caches can only be cleared after rebooting the servers
– Kerberos only needs to be configured for PowerPivot if one of the data sources is a linked SQL server
– Power View does not support the prompting of credentials on an .rsds connection file

Using ColumnStore / Vertipaq Indexes in SQL Server 2012

– Huge, huge, huge performance gains!
– ColumnStore indexes will not always be the best solution e.g. for very restrictive queries
– All data for a single column is stored on a separate page; access only the columns you need
– At some stage the BISM model will struggle with huge data volumes. Because of things like Direct Query, one can use the ColumnStore index to increase performance for BISM models
– Some really smart stuff has been done from an encoding point of view. E.g. Instead of storing the true sales number, an offset is stored and all numbers are divided by this offset resulting in smaller numbers that need to be stored
– Run Length Encoding is also another smart way compression is being handled
– Data can be used in query processing without decompression
– On average, compression will be 1.8 times more effective than page compression
– Consider using ColumnStore indexes on tables that have 5 million + rows
– Create the index on ALL the columns. You can only have one ColumnStore index per table
– Outer joins and Unions will mess with batch processing

Wayne Snyder was an outstanding presenter; I loved his ‘old man with 100 years of experience’ presenting style.

And that concludes a great SQL PASS conference! There is some game-changing content on the way and I can’t wait for SQL Server 2012. Next year’s conference will be held in Seattle once again in early November.


Time for some feedback on Day 2…

Thursday, Opening Keynote

– The Fantastic 12 of SQL Server 2012:
+ Required 9’s and Protection
+ Blazing Fast Performance: Performance enhancements to the RDBMS, SSAS and SSIS. And then there is the new ColumnStore Index – really looking forward to using this one in our DW environments!
+ Rapid Data Exploration: Power View and PowerPivot 2.0. Great features and enhancements in this space. Also included is the new Reporting Alerts capability – ability to send out alerts based on data conditions
+ Managed Self-service BI through SharePoint
+ Credible, Consistent Data: BI Semantic Model, DQS and MDS
+ Organisational Compliance: Expanded Audit – User-defined, Filtering, User-defined Server Roles
+ Peace of Mind: Production-simulated Application Testing, System Center Advisor and Management Packs, Expanded Support – Premier Mission Critical
+ Scalable Data Warehousing
+ Fast Time to Solution: SQL Server Appliances – optimised and pre-tuned
+ Extend Any Data, Anywhere: ODBC drivers for Linux and Change Data Capture for SSIS and Oracle, Semantic Search
+ Optimised Productivity: SQL Server Data Tools, Unified Across Database and BI, Deployment and Targeting Freedom
+ Scale on Demand: AlwaysOn, Deployment across public and private, Elastic Scale (scaling in the cloud with SQL Azure Federations)

Data Quality Services – A Closer Look

– Set up your Knowledge Base and then perform a Knowledge Discovery exercise. This is similar to profiling your data and will populate the Knowledge Base
– Spelling mistakes are highlighted just like in Microsoft Word – very nice
– Next up, create a Data Quality Project. This will enable you to validate your data according to the rules that have been defined in the Knowledge Base. Corrections will be made and alternatives suggested
– Data that needs to be cleaned can also be validated against external data sources. Only Azure Data Market right now
– One can set up rules to override the corrections provided by the external data provider
– Term-Based Relations: find and replace
– Matching: one has the ability to create a composite domain (using a delimiter to split out multiple fields)
– New feature – Use Knowledge Base Parsing: use this along with the delimiter to use your Knowledge Base to automatically fix mistakes. E.g. If users had swopped around fields in certain instances causing the parsing to be invalid
– Create a Matching Policy and define weightings to the fields you want to match on
– Create a Data Quality Matching project to determine the duplicates in your data and you’re all done!
– The matching algorithm is a black box, not too much detail given here 🙂

DQS looks like quite an OK product, considering it is only the first release. The only downside is that South Africa will not be able to access most of the external data sources, and the ones we can access may not be that useful.

Developing and Managing a BISM in Analysis Services

– Microsoft continuum: Personal BI (PowerPivot), Team BI (PowerPivot for SharePoint) and Corporate BI (Analysis Services)
– Row and Dynamic security will be available!
– SQL Server Data Tools is the new name for BIDS
– Add calculated measures in the measure window. [Measure Name]:= [Measure Formula]
– Security: read permissions will give you access to the entire database. Use a DAX filter to further restrict the data
– The Tabular model, which contains a GUID in the name, talks to the Workspace Database – a place where all development changes take place
– When creating a new Tabular project, use the Preview and Filter option to restrict the number of rows that are imported into the model. This is particularly useful when dealing with large datasets. A partition will automatically be created for this subset of data. Once the data has been imported, create additional partitions for the rest of the data. The Process All option will bring in the new data
– One can copy data from Excel and paste it in as a new table in Visual Studio. Very nice for prototyping
– One can hide a Security table by applying a DAX filter of =FALSE(). Hiding the table from the client tools does not hide it altogether
– DAX security filter example …security[allowed region], security[login], username(),… (a fuller sketch follows after this list)
– One can process a database, a table or a partition
– The Process Data option only processes data, calculated columns will not be updated. One will need to Process Recalc afterwards before you can Evaluate (return rows from) the table
– Use Process Add to incrementally load data into new partitions. Use Process Defrag afterwards
– Automatic partitions can be added using AMO
– Use the Extension Manager in Visual Studio to include the DAX Editor – to be made available on CodePlex soon! Great add-in for formatting DAX
– There is also another sweet add-in to script out all the DAX measures into a separate window, all with auto-complete. Cheers MDX!
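
As a hedged illustration of the dynamic security notes above (the table, column and role names here are hypothetical, not taken from the session), a row filter on a Region dimension table could look something like this, with the Security mapping table itself locked down by a FALSE() filter:

  -- Row filter on a hypothetical Region table: a role member only sees regions mapped to their login in the Security table
  = CONTAINS ( Security, Security[Login], USERNAME (), Security[AllowedRegion], Region[RegionName] )

  -- Row filter on the Security table itself, so role members cannot browse the mappings
  = FALSE ()

The role still needs read permission on the database; the DAX filters are what actually trim the rows each member can see.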

Cathy Dumas did an excellent job with this session; I’ll definitely go and see her again in the future.

Building the Perfect BI Semantic Model for Power View

– Power View allows for the creation of different views in the same report, the same way you would add a new slide to a PowerPoint deck – very cool feature
– Power View reports can also be created from .rsds connections
– One can perform data mining in BISM by creating a connection to a SSAS source which contains the Mining Model
– In SharePoint, on the .rsds connection properties, check the Set execution context property if Kerberos is not setup. The logged in user’s credentials will then be passed through to the data source
– For numeric attributes, like year, set the aggregation property to Do not summarise. This attribute will then not be aggregated when creating a pivot table report
– Development tip: create a BISM connection to the Workspace database instead of the one that is deployed – speeds up trouble shooting etc.
– Descriptions defined against tables and measures will be shown as hover text in Power View
– Set additional Default Label and Default Image properties on attributes in SQL Server Data Tools. Power View will then know which fields to use when switching between card and chart views

The Power View demo by Carl Rabeler was not great as he had to look down at his script the whole time. Kasper de Jonge was a lot better and gave good insight into the technical bits of BISM.

I can’t wait to start using this technology – hopefully another CTP release will be made available early in the new year…


So I’m in Seattle attending my first SQL PASS event and thought I’d quickly share some of the highlights of Day 1.

Opening Keynote

– SQL Denali is officially renamed SQL 2012 and will be released during the first half of next year. I.e. 30 June 2012 🙂
– Project Crescent will be known as Power View. It will be available on most mobile devices and is loaded with quite a few new features compared to CTP3. Further down the line (post the SQL 2012 release) one will be able to take a picture of a movie cover (for example) and use the Bing imaging service to identify it and filter the report for that movie!

Amir Netz did an excellent job showing off Power View using a movie sales dataset. I’d love to get hold of it for client demos. My only concern is that Power View is still going to be tightly coupled to SharePoint Enterprise (aka the centre of the Microsoft universe). Small to medium size clients will then not be able to leverage this great reporting tool.

Visual Report Design: Bringing Sexy Back

– Principles of effective design include: form (visual design skills), fit (functional business skills) and function (technical skills)
– See http://www.colourlovers.com/ for palette colours and examples of which colours work well together
– Items to consider for your report design:
+ Can the user glean the information in 5 seconds or less?
+ Know your audience
+ Get a second opinion
+ Maintain balance
+ Use POCs and throw them away
+ Accept feedback and be willing to learn

Remember, the novelty of sexy graphs and gauges will disappear over time. Users will just want to get the relevant information to make the decision.

What does it take to add MDS to your DW?

– Create your MDS model in the web UI and then create the entities through the new MDS Excel add-in
– Use Attribute Properties to create a relationship between entities. E.g. Product and Category
– Constrained lists can also easily be created using the data in existing columns. I.e. Creating entities for lookups
– Create derived hierarchies back in the MDS UI. Users can then change the hierarchy setup by dragging and dropping items in the tree view
– Add in any required business rules and validate the data through Excel or the MDS UI
– Annotations can also be added to changed data before it is published back to MDS
– Subscription Views make available the data to other systems that need to consume it. Parent-child hierarchies can also be flattened out using the views
– One can install MDS 2012 and still use SQL 2008 R2 on the database engine side, you don’t need to upgrade the entire SQL environment
– Microsoft believe MDS can scale into the millions of records to easily manage customer master data, for example. I’d like to test this out 🙂

Introduction to Data Mining in SQL Server Analysis Services

– Great introduction by Brian Knight: stating all his credentials but then downplaying them at the same time using a split screen
– Data mining skills will get you a salary increase (maybe)
– Clustering is great for fraud detection. It will measure the distance between two points on a scatter chart and determine a likelihood
– See http://www.sqlserverdatamining.com
– The key is to flatten out your dataset. However, when creating a Mining Structure you can use nested tables if there is a relationship defined
– SAS has good data mining algorithms; Microsoft’s are about an 80% fit. The clincher: you already have the MS ones and no additional purchase is required

I’m going to give this a whirl on some client data when I’m back at the office. Data mining, although not that well used, can add a lot of insight to the existing data warehouse.

Brian was the best presenter of the day for me.


Two weeks ago I was lucky enough to join our general manager for a Business Architecture review in Dubai. This was my first trip to the UAE and I was in for a few surprises. First up was an “eye examination” at 01:00 in the morning. Apparently all the chaps with ZAF passports have really poor eyesight for some reason. Needless to say my eyes weren’t tested. My form got stamped and I was told to re-join the first queue I had stood in. Next up was the heat, a cool 35 degrees when we were picked up at the airport. Temperatures managed to reach 45 degrees at their peak, making it less than pleasant to go for a stroll outside. Mind you, the humidity was pretty insane as well. And then finally, the cost of food and booze. A Heineken will set you back around R60, and a 300g fillet anything between R400 and R650. Eish.

On the brighter side were the friendly people, amazing buildings and malls you could never even dream of – complete with ski slopes, aquariums and 1,200 stores!

Ok, so why were we out there? A potential client had asked us to review their infrastructure, business processes and technical implementation with the core focus relating to operational efficiencies. I was there more from a technical perspective and had the exciting job of building out a few POCs while we were on-site. This is where I have to give credit to the Microsoft stack, especially to PowerPivot. In just a few hours one is able to mash up multiple data sources and present back analysis that the client thought would take a very long time to complete. Our rapid BI development tool, timeXtender, also came in very handy when building out a finance cube on a subset of the client’s database.

These tools not only accelerate the ability to quickly create POCs, but also significantly reduce the development time for full-scale projects. The company I work for has recently had numerous success stories where ROI has been realised very quickly, sometimes within the first month of going live! The common denominator – timeXtender.

The review came to an end all too quickly but I was happy to be back in SA again. There is nothing quite like your own bed. Well, we’ve compiled an exciting roadmap for the client – let’s hope they want us to partner with them on the journey.

