I’ve presented and taught recently on topics around the cognitive shift we go through when moving from an OLTP data model to a OLAP data model. This shift isn’t easy for most people. As a consultant I ran into push back on almost every project where the customer questions the de-normalization or flattening out of the data. This is perfectly understandable and I place more fault on the consultant who can’t clearly explain why we do what we do than on the client who questions what we do.
With that being said, this blog is not necessarily about that. What this blog is about is my musing on whether this will even be a question in the future. Hints of it are everywhere: in-memory databases, NoSQL, and tabular model. My guess is if a business has a choice of going down the path of building out a large data warehouse infrastructure or using their existing OLTP systems and bypassing the ETL heavy lifting – they’ll choose the latter.
OLAP refers to a multi-dimensional cube. Dimensional models facilitate cube building. When I talk about the end of dimensional modeling, I am really talking about the end of cubes. This may seem far-fetched but I wouldn’t put it past the realm of possibility. Could OLAP analytics someday go the route of Poloroid cameras, floppy disk, or the Diskman?
A more interesting dynamic is the rising of Hadoop as the primary location for data storage. From there a choice can be made to completely circumvent traditional RDBMS (Teradata, PDW, Oracle, SQL Server) systems altogether or use them as a drop box for new data sources. The refine approach pulls semi-structured data into Hadoop for staging. The data is massaged and managed and then sent to your traditional RDBMS systems for processing by traditional analytic applications.
You do have the choice to remove relational systems from the equation. This allows for a lower cost, exploratory type of analysis using traditional products like Tableau or Excel (the compatible list grows almost daily) which pull data directly from Hadoop. Your traditional ETL is performed on the lower cost Hadoop framework and modeling can be performed within an analytic tool such as PowerPivot.
Another approach is to eliminate the relational system by including a NoSQL solution side-by-side with Hadoop (Hbase, for example) and using Hadoop as the batch analytics platform and NoSQL for more real-time analytics over massive data sets. Other access methods include Tez which move us closer to true real-time analytics running directly against Hadoop using optimized MR processing. This is the enrich approach because it allows a company to combine disparate data to derive new meanings. The platform is highly scalable horizontally at relatively low cost compared to more expensive vertical scaling. It also completely eliminates cubes.
This is not to say companies will suddenly abandon the significant investments they’ve already placed into their OLAP infrastructure but, as time and opportunity go on, there will be a transition as companies explore new lower cost, faster methods to the same goal. The goal is speed to analytics. The first company to the analytics is the winner. Building out a complex ETL and OLAP framework is a bottleneck. The death of OLAP is a long way away but companies might be already exploring ways to kill it.
Most database administrators of relational systems have never heard of the CAP Theorem but this theorem is a core foundation for defining the type of database system you’re using. The theorem began at the University of California, Berkeley in 2000 by Eric Brewer and was later formalized in 2002 by Seth-Gilbert and Nancy Lynch of MIT http://en.wikipedia.org/wiki/CAP_theorem. The theorem classifies database systems and demonstrates implementation trade-offs based on the technology used to implement these systems.
The concept is fundamental but powerful. You view the CAP Theorem as a Venn diagram in which the circles are defined as Consistency, Availability, and Partition Tolerance. The core of the theorem is that when you take into account these three solutions you can only have two. There is always a trade-off. Any database architecture must sacrifice one of the solutions in order to gain the other two. Let’s first break down what each one means.
When looking at traditional relational databases, consistency is implemented through transactions. All relational systems are ACID compliant – this is how E.F. Codd defined a relational database. When we enter data into a relational system the data is guaranteed to be consistent. Consistency is solved through transactions and also primary and foreign key constraints.
Not all database systems are consistent. This shouldn’t be a surprise because non-consistent architectures are all around us. Take the case of the Internet. If I make a change to my webpage in the United States, will someone in China immediately see the change? Of course not. It will take time for the change to propagate to the various geographical cached locations. Also think of Active Directory. When I make a security change in Active Directory it takes time for the change to become effective elsewhere in the system. This is referred to as Eventual Consistency and is the bread-and-butter of many non-relational databases. It simply means that a data change may not immediately be visible to other users of the system.
When looking at traditional relational systems availability is a key component. RDBMS’s solve the availability problem through transaction log backups as well as database backups. We look to most RDBM’s to be highly available. Other systems may not be bothered with such restrictions and may not implement any kind of transaction logging whatsoever. Those systems may not consider themselves to be available although most do allow for some form of recovery – just not point in time.
This states that if any part of the system fails that, as long as a single part still remains, the system is operational. That is a lot of words for what is essentially distributed computing across nodes. Partition Tolerance is distributed computing. If you can run your system across multiple nodes and, if anyone of those nodes can fail without a total system failure, you are partition tolerant. When thinking about partition tolerance think of a a database running on a Hadoop-like filesystem. Riak is a good example.
Why is this Important?
So what does all this mean? It means that if you require data consistency then you must choose between either availability or partition tolerance. If you choose partition tolerance you must select either availability or consistency. Systems are classified based off the CAP Theorem. So, for example, a relational database like SQL Server is considered AC (available and consistent). All relational systems are AC. This is obvious because you can’t distribute a SQL Server (or Oracle) database across multiple nodes. A system like Riak is considered AP. It is highly available but can also be distributed across nodes. The data is not promised to be consistent, but does allow for eventual consistency. Neo4j, a graph database, is much like a relational system and is defined as AC.
Keep in mind there are ways to game the system. Solutions like Riak can let you customize CAP restrictions for each file write. Still, the restrictions remain. The CAP Theorem is a powerful way to codify true technical trade-offs when it comes to selecting a particular data management solution. The reality is when outside of relational systems there is a reliance on BASE architecture – Basically Available, Soft State, Eventual Consistency. Both RDMS and BASE architectures have their uses and neither one is meant to replace the other.
The data landscape is changing. A recent article showed that 90% of the world’s data was created within the last 2 years http://www.sciencedaily.com/releases/2013/05/130522085217.htm. There won’t be one database solution to handle this kind of curve. You’ll need to get used to making difficult decisions. I view database solutions as tools in a toolbox. You need to know what tool to use for your data.
This is a really well done analogy explaining the CAP Theorem
Another great primer with quotes from the source. Also brings in the concepts of yield and harvest.
Brewer’s CAP Theorem slide deck
Security is always a trade-off between allowing those who need access to do their jobs and preventing those who want access from having too much. There is a fine line and many times the line is crossed not because of some sinister effort to provide unwanted access, but instead; the line is crossed because of vague security requirements.
It is the DBA’s responsible to control, maintain, and monitor good security practices throughout the SQL Server environment. This includes creating SQL accounts with strong passwords, reducing the number of administrators on each system, and reducing the SQL Server surface area configuration.
The primary object to secure in any SQL environment is the data. Especially if the data contains sensitive information. The first step to securing data is a secure data center infrastructure. The second step is to contain who has access to the data through SQL Server. This is accomplished by locking down both SQL and the application. Application security is outside the realm of most DBA’s, so it falls to the DBA to secure data access in SQL Server. Following is a list of suggestions:
- Always remove BUILTIN\ADMINSTRATORS from the server. This should be the first thing you do but make sure you have an alternate administrator assigned and have the sa password available
- Mix up your SQL Server service accounts. Don’t use the same account for all your servers. In the past I’ve created a pool for about 5 or 6 accounts and equally distributed these across your servers. This can complicate management (especially security for an NFS backup location) but the purpose is so if one account is comprised it won’t affect all your servers.
- If a vendor requires sa rights for application support, provide only temporary access. Remove the access once the job is done. I would also suggest running a trace while they are troubleshooting so you can capture any SQL they execute
- Application accounts should never need sa rights. Provide them dbo or less. If a vendor requests that the application needs sa, push back and get documentation as to what specifically the application does that requires the elevated rights. I usually tell them that if you aren’t going to restart the services then you don’t need sa.
- Provide the least rights needed. This goes for any system.
- Create a DBA Active Diretory group. Only this group (and the security group) should have access to the sa and service passwords. Only this group should have sa.
- Keep passwords in a secure, encrypted location.
- Develop a means to generate complex SQL passwords. Ideally, you should incorporate some key to create a random password. Keep this key secret. You should also be able to reproduce the password generation in order to retrieve it if necessary.
- Every time a DBA leaves or someone leaves who knows the sa password, you will need to change the password.
- Make sure your service accounts on not in an OU that has a password expiration policy. You will still want to periodically change the account passwords but you want to control when this happens.
- If you do not need SQL Server accounts then setup SQL to use only Windows Logins.
- Configure the max error log count different from the default
This list is by far not comprehensive. Security is a complex subject and can involve any number of solutions – encryption only being one of many. There is little doubt that increased security will increase maintenance and may involve an elevated skill for managing and implementing good, consistent security practices. Don’t fall into the trap off throwing away good security because of an increase in maintenance. The best approach is a sustainable approach and one that meets the needs of the business. It may take a little more work and you may also be portrayed as the bad guy, but the risk of doing less may prove much worse.
My last post (oh, it was too long ago) Dynamic Security Part 2 explained how to implement dynamic security in a dimensional model using MDX. I only touched the surface of this topic and there are other much more capable folks you can follow-up with to find out more. Keep in mind too there is also an excellent PASS Analytics Conference on the horizon. If you can’t find your answers there then there are no answers to be found.
As a reminder, the overall strategy was to have a security model built to handle all models: dimensional, relational, and tabular. For this post we’ll talk about the tabular model. What you’ll find is the differences between setting up the tabular security is not that much different than the dimensional model. Let’s take a look.
Setting up Roles in a Tabular Model
Roles in the tabular world are similar to those in SSAS except tabular uses the DAX language instead of MDX. Roles are defined within the SSDT tool and DAX filters are defined, much like SSAS, at the dimension level. All measures must be defined beforehand in the tabular project. Once the role is created and defined you then deploy the role to the tabular instance in order for the role to take effect. You do have the option to define roles in SSMS but these roles would get overwritten once a new project is deployed.
We are still interested in creating two roles. One role will be specific to the CustomData value used by Performance Point and the other will be used exclusively for the EffectiveUserName and used by every other tool. The first step is to create a role in your BISM model. To do this you click on the role icon in the toolbar. Once you open the role tab you should see a role called Administrator. You will need to create two new roles. As in previous examples, we’ve created a role called CustomData and a role called EffectiveUserName. Both roles only need Read permission.
After you create the role you now need to define the filter for the dimension to which you want to apply the security. Starting with the CustomData role we will add the following DAX filter to the DIM_CLIENT dataset:
=DIM_CLIENT[CLIENT_DIM_KEY]=LOOKUPVALUE(FACT_USER_CLIENT[CLIENT_DIM_KEY],DIM_USER[USER_NAME], CUSTOMDATA(), FACT_USER_CLIENT[CLIENT_DIM_KEY], DIM_CLIENT[CLIENT_DIM_KEY])
Luckily we find that both MDX and DAX use the same CUSTOMDATA() filter for CustomData. You will also notice that the filter is applied on the DIM_CLIENT dimension in the model. This is defined in SSDT and then processed up to the server.
Next we want to apply the filter for the EffectiveUserName role. The process is exactly the same only we use the USERNAME() function in the DAX filter:
=DIM_CLIENT[CLIENT_DIM_KEY]=LOOKUPVALUE(FACT_USER_CLIENT[CLIENT_DIM_KEY],DIM_USER[USER_NAME], USERNAME(), FACT_USER_CLIENT[CLIENT_DIM_KEY], DIM_CLIENT[CLIENT_DIM_KEY])
Remember that no members are needed for the CustomData role because the role is defined in the Performance Point connection. You will need to add the account specified
in the data connection to the EffectiveUserName role in order to capture that account as they enter the system. You will also want to add individual user accounts to the role because this would capture any users connecting through an Excel client.
Configuring the Data Source for Power View
Power View uses the EffectiveUserName model which means any users using Power View will need to be a member of a role in the tabular model. This also means a connection account must be defined in the connection string this account needs to be given administrator rights to the tabular instance.
Provider=MSOLAP.5;Integrated Security=SSPI;Persist Security Info=True;Data Source=server07\sql2012astabular;Initial Catalog=TestTabular
The tablular instance is server07\sql2012astabular and the tabular cube is called TestTabular. The access account is listed in the connection string properties as a stored credential. You will also need to check both the Use as Windows Connection and the Set execution context to this account options. Unfortunately the account listed in the connection string must be an administrator on the tabular instance. Still, the password is hidden the connection and, as administrator, you can create data connections ahead of time.
Though not necessarily the simplest thing to do – there is a means to have dynamic security model that serves all BI models and tools. If you want to dig even deeper and stretch this model out beyond the current limitations then I suggest reading the paper Analysis and Reporting Services, Design and Implementation: A SQL Server 2008 Technical Case Study. One of the current limitations is that the security is only updated when the cube or tabular model is processed. This can be a significant restriction, especially if changes are frequent. I would also recommend looking into developing a means to manage the security tables by using a custom administrative GUI of some nature. Still, for quick and dirty dynamic security this will do the job.
In my previous blog post http://dbaexperience.com/2013/01/24/sharepoint-2010-sql-server-2012-bi-dynamic-security-part-1/ I walked through how to create a simple security model. Granted, the model is basic and simplistic but with a little ingenuity you should be able to model out some fairly complex hierarchies.
The next step is to create the roles. Remember that any access to a BISM or OLAP system is through role security. The trick is to have roles capture the correct login account and then leverage the security model to filter the data. I’ll create two roles. For illustrative purposes I’ll call one CustomData and the other EffectiveUserName. CustomData and EffectiveUserName are key concepts for dynamic security. Both are metadata in the SharePoint data connection for BISM, relational and OLAP.
For example, when you connect to Performance Point using an unattended service account the connection you would see in a trace would be the service account. Though the connection logs in as the service account there is additional connection detail showing the <CustomData> value as the account logged into SharePoint. So if UserA logs into SharePoint and runs PerformancePoint, a connection will be made with ServiceAccountA but the CustomData value will be <CustomData>UserA<CustomData>. CustomData is simply a string value. If the user is a domain user than the CustomData value will contain the domain name\username, otherwise; the value would only contain a name. As of SharePoint 2010, only Performance Point uses the CustomData value. Everything else uses EffectiveUserName.
Like CustomData, EffectiveUserName is metadata contained in the connection. The difference is whereas CustomData is a string value, EffectiveUserName is an Active Directory Account. It will always have a domain\username value.
The reason we need two roles is our filters must use one function to access CustomData and another function to access EffectiveUserName. This means we can’t have two filters in the same attribute for the same role. This restriction will change in SharePoint 2013 when everything will use EffectiveUserName. We can avoid PPS from using the EffectiveUserRole because we specify the role to use in the data connection. The CustomData role will have the accounts as members of the role as well as the service account. Why do you need the individual users included if everything is connecting via the service account? You need the individual users in case anyone is accessing the instance through an Excel client.
Now let’s get to the good stuff. Each role in SSAS and BISM will have a filter added to the dimension you are trying to restrict access to. SSAS will have a MDX filter while BISM will have a DAX filter. Luckily they both use the same functions to access either CustomData or EffectiveUserName. For my example using MDX for CustomData I used the following filter:
exists([Client].[Client Name].members,strToMember(“[User].[User Name].&[“ + CUSTOMDATA() + “]”), Fact_User_Client”)
I don’t claim to know much about MDX and, in this case, I definitely stepped on the shoulders of those who came before. The two things to notice is the CUSTOMDATA() function and the [User].[User Name] syntax. The first is the function to access the connection string while the second is the hierarchy representing the user name in the DIM_USER table. This filter will grab the value of CustomData in the connection string, search the DIM_USER table for a value, lookup this value in the FACT_USER_CLIENT bridge table and then bring back only the clients associated with that user. If the user is not in the table then the user sees nothing. Keep in mind this in the filter used only for SSAS accessed via PPS. Keep in mind that everything else will use EffectiveUserRole.
The MDX for the EffectiveUserName will be eerily similar. Here is the syntax:
exists([Client].[Client Name].members,strToMember(“[User].[User Name].&[" + USERNAME() + "]“),”Fact User Client”)
The astute reader will notice that the only difference is the USERNAME() function. This function is what grabs the EffectiveUserName value from the connection metadata.
Everything else is the same which makes our job much easier. Both filters are placed in the Data Dimension tab. For additional information concerning SSAS role security see the following link http://msdn.microsoft.com/en-us/library/ms174840(v=sql.110).aspx.
So now you have your SSAS roles created and your MDX filter included in each role. As long as you have an Active Directory user logging into SharePoint and that user account is added to the DIM_USER dimension and the bridge table as <domainname\username> format then the data they see will be filtered. This will work for any tool accessing the multi-dimensional model. If you are not using PPS or if you are using PPS in SharePoint 2013, then you do not need the CustomData role.
In the next part I will discuss how to implement the same security model in the BISM model. This will allow you to use the security tables to filter on PowerView and PowerPivot using DAX filters.
I recently installed SQL Server 2012 with SSIS onto my Windows 8 laptop. When trying to connect with an administrative account using SSMS I received the following Access Denied error:
This threw me back because I had full admin rights and I’ve never had this error before. I could successfully connect to the database engine, SSAS, and SSRS. There were some postings about changing rights in the COM objects for MSDTC but this didn’t work for me. Digging deeper I found this in the Event Log:
This lead me to think there was some validity to the COM solution. Within component services (you can find the component services in the Administrative Tools section) you should see a component called Microsoft SQL Server Integration Services 11.0.
Right click on the service and select Properties. Click on the Security tab and you should see the following options:
Click the Edit button on the Launch and Activation Permission section. Add your account, or whichever account is accessing SSIS, and make sure it has both the Local Launch and Local Activation selected.
That’s it. You should be able to connect.
Almost every BI project I’ve worked on encounters the same roadblock. The customer demands per user row-level security but Kerberos is a dysfunctional beast that no one can get working. Recently I was given the unique opportunity of addressing this security question head on. The project wasn’t just concerned with accessing SSAS through SSRS or Excel but instead, the project required that all tools (Power Pivot, Power View, SSRS, Excel clients, and Performance Point) be able to connect to their respective relational, multi-dimensional, and\or tabular model via SharePoint. Every tool had to provide row-level security based on a user login without using Kerberos. This meant that even data connections using an unattended service account still had to provide row-level security based off the user account connected to SharePoint.
We all know that SSAS and tabular models use role-based security. Normal procedure is to add the Active Directory user account or AD group to the role and then filter dimension attributes in the role properties. That’s all fine and dandy if the account logging into the instance is the account that is in the role or being impersonated by another account with that ability. But what if all access to the cube is with a service account without Kerberos? This means that all connections are made using a single account. How do you filter data per account when all connections are using the same account? This blog will walk you through how to do it. The information in this blog is a compilation of a few very good blogs which I will link to, but this is also an attempt to consolidate the solution into a single blog which will cover all the scenarios.
It’s too much information for one entry so I’ll break it up into multiple blog postings. The ultimate purpose is to show that you don’t need Kerberos in order to implement dynamic security in your BI\SharePoint projects. I’ll show the basic framework of the solution. How you expand it or what functionality you ultimately add is up to you.
First thing first. We need to create the security model. This part is well-documented. A primary source is a wonderful overview by Nick Barclay http://nickbarclay.blogspot.com/2008/01/pps-data-connection-security-with.html. He also provides links to other blogs which are very helpful in getting started. In addition to Nick there is a blog by Devin Knight which also illustrates these points http://www.bidn.com/articles/ssas-administration/134/analysis-services-dynamic-security.
At this stage we’re just going to create some tables. I’ll discuss later how these tables will be used. Just know for now that you have to have a means to capture the user name and associate the user name with what the user is allowed to see. Since we’re database folks it seems only fitting that we use a database to capture this relationship. In my model I will filter what the user sees based off a company name.
You only need to create two tables. The first table I’ll call DIM_USER. This table contains all the user names along with a unique surrogate key and any other user attribute information you want. I’ll keep it simple and just have two columns: user_key and user_name. The next table is actually a factless fact table. You can also refer to it as a bridge table. This table will be the intersection between the DIM_USER table and any other dimension tables you want to filter. In this example we will filter on a table called DIM_CLIENT. You could have as easily filtered on a products table or employees table. It only contains two columns. One column is the unique user key from the DIM_USER table and the other column is the unique client key from the DIM_CLIENT table. I’ll call the bridge table FACT_USER_CLIENT. You may want to call it something more generic if it will contain multiple dimension filter keys.
Finally, you will want to create these tables in your SQL Server data warehouse. This would be the same data warehouse which you use for your SSAS DSV and your tabular model. Eventually you will process these tables along with your cube or tabular instance. There is a more complicated, yet robust, method which stores these tables in the relational model which I’ll show later, but we’re getting ahead of ourselves.
The above model, though simple, will be the basis for all of your filtering. Keep in mind that each user accessing your BI environment will need to have an entry in the DIM_USER table. In addition, the accounts must be Active Directory accounts. Luckily, you only need to make these changes in one place and the filter will apply to all the tools. In the next blog post I’ll explain how to setup the roles in SSAS and BISM which will query the tables and apply the filtering through DAX and MDX.