About a month ago I presented on the new Tabular model at one of our customers. Now, I have been using PowerPivot / Tabular for quite some time but I by no means know all the ins and outs of the technology. I came across a great bunch of resources during my research and thought it would be a good idea to share this with the rest of the population. Here we go…
1. Some of the most asked about questions regarding this technology go something like this:
a. What are the differences / limitations to the classic Multi-dimensional model?
b. How do I know which model to use when?
Now, this has been well documented my many individuals around the world. Here are some links to the best resources I found:
- http://www.jamesserra.com/archive/2012/04/sql-server-2012-multidimensional-vs-tabular/ There are a lot of additional links at the bottom of the page.
- Session by Marco Russo at Tech-Ed 2012: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI319
In my personal opinion, I’d have to say that Tabular is going to be the way forward. I’m not saying that Multi-dimensional is dead, but if you look where all the investment for SQL Server 2012 was made, only minimal changes were introduced for Multi-dimensional. All the changes and R&D have gone into Tabular.
2. Date Table Best Practices
- Never use the date time column from the fact table in time functions.
- Always create a separate Time table with contiguous dates (i.e. without missing day gaps in the date values).
- Create relationships between fact tables and the Time table.
- Check that relationships are based on a date time column (and not on another artificial key column). If it is joining on an artificial key (e.g. scenario where a Tabular model is being wrapped around an existing data warehouse), mark the Time Dimension Table as a Date Table and select the actual date time column as the unique identifier
- Make sure you have full years’ worth of data in the Time table. For instance, even though your transaction table may only have sales data up to May 2012, the Time table should have dates up to December 2012 (same goes for fiscal years).
- The date time column in the Time table should be at day granularity (without fractions of a day).
Here is a good blog from Dan English talking to retail and marketing calendars and how to deal with the relevant DAX time calculations: http://denglishbi.wordpress.com/2010/09/24/calculating-previous-year-marketing-calendar-values-with-dax/
3. Best practices around processing and query optimisation
Good practice dictates pushing as many calculations as far back as possible e.g. the data warehouse. This is true for both Tabular and Multi-dimensional projects. This will also result in better compression in the Tabular model.
The extracts below were all taken from a session conducted by Ashvini Sharma and Allan Folting at TechEd North America 2012. I can really recommend watching the presentation which is available at: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI414 They discuss all the ins-and-outs of the xVelocity engine. Here is a small summary with screen shots from the session.
The xVelocity engine is designed with 3 principles in mind:
- Performance, Performance, Performance!
- Query Performance is much more important than Processing Performance
- Accommodate changes without forcing reload, if possible
When loading data, it goes through various stages:
- Encoding: everything is converted to integers, even strings (as it is the fastest data type)
- Value: one just has to do math to get to the answer. E.g. 1.5, 2.5 and 3.5 will be multiplied by 10 to get a whole number. You could then subtract 15 to get to a smaller number that has to be stored (DATA ID)
- Hash: Used for strings. Useful for very sparse value distribution. DATA ID is the index allocated to the string when it is inserted in the table for the first time. Requires decompression for computation. One can’t do math on hash encoding
Encoding is done automatically to conserve space. On the first load, a sample set of data will be checked to determine the best type of encoding applicable for that column.
- Compression (per segment): Calculated columns are only addressed after the compression phase.
If you want to see the columns that are taking up the most space, go to the folder where the Tabular databases are stored, search on *.* and sort descending on size. Files with a .idf extension are the in-memory data files. Dictionary and hierarchy files are also present.
Query parallelism is by segment. E.g. if you have 16 cores and 4 segments, only 4 cores will be used. One ideally wants a balance between the number of segments and the number of cores.
Processing Phases are depicted below.
Here is a screen shot showing the VertiPaq storage.
There are a number of processing options available once data has been added to the model:
A note on Incremental Processing: Only do a Process Recalc right at the end after all the Process Data tasks as it goes across all partitions.
And finally, some Advanced Processing Options:
Kasper de Jonge wrote another excellent blog on Tabular Analysis Server memory usage: http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance He also made available a PowerPivot file that you can use to further interrogate your Tabular models.
4. Controlling Segment Size and Compression
There are two important Tabular Analysis Server Properties which you can tune to increase compression and query speed. They are DefaultSegmentRowCount and ProcessingTimeboxSecPerMRow. You can access these settings by connecting to your Tabular instance using Microsoft SQL Server Management Studio. Right click on the Tabular instance and select Properties. Check the Show Advanced (All) Properties check box and scroll all the way to the bottom. See image below.
The screen shot below specifies the defaults and what can be expected if you change the values.
5. Additional Resources
a. DAX Resource Centre: http://social.technet.microsoft.com/wiki/contents/articles/1088.dax-resource-center.aspx
Check the Websites and Blogs section for other top resources in the PowerPivot and Tabular space. Cathy Dumas, who I saw at SQL PASS 2011, is my favourite :) Check out her old Microsoft blog for additional info: http://blogs.msdn.com/b/cathyk/
b. Microsoft SQL Server 2012: The BISM Tabular Model book: http://cwebbbi.wordpress.com/2012/08/15/microsoft-sql-server-2012-the-bism-tabular-model-is-now-available-to-buy/