09/03/2023
Using Microsoft olapR to Bring Analysis Services OLAP Data Into R
Online Anaytical Processing (OLAP) and the R statistical language are both children of the 1990s. However, for much of the intervening time they inhabited different worlds. OLAP, and its query language MDX, lived in the world of databases and queries while R was the domain of statisticians. Using R to analyze data from an OLAP database tended to be awkward at best.
As a part of its relatively newfound commitment to R and machine learning, Microsoft created an R package expressly designed to obtain data from MS Analysis Services.
The olapR package is part of Microsoft's free R Client distribution of R. (Of course, it is also included in the licensed Machine Learning Server.) As a result, there is no need to install olapR from a repository. When you load the package using the library(), you should notice that there is a warning message that the olapR package contains a function axis() that will mask the axis() function in the default graphics package. After loading olapR, you must type "graphics::axis" is you wish to use the graphics function.
> library(olapR)
Attaching package: 'olapR'
The following object is masked from 'package:graphics'::axis
Once the olapR is loaded, its use is just as you might expect, with one additional twist. You define a connection string and use the connection string to create a connection object. The connection is not immediately opened, however. When you execute your R query code, a connection is opened, the query executed on the server, and the connection immediately closed. There is no need to explicitly close the connection when you have your results.
The twist, of course, is that in contrast with SQL queries, MDX queries can potentially create resultsets (often called cellsets in the world of OLAP) that consist of three or more dimensions. For this reason, olapR provides two distinct "execute" functions. execute2D() obtains the results for MDX queries that return a two-dimensional set of rows and columns. The executeMD() function returns resultsets from queries of two, three, or more dimensions. Not surprisingly, execute2D() returns results as standard R dataframe. Because dataframes are two-dimensional structures, executeMD() must return its results as a matrix object. Even if you execute a two-dimensional query using executeMD, you will obtain a two-dimensional array (i.e. "matrix) as a result.
Executing OLAP Queries
Now let's take a look at some code.
cnnstr <- "Data Source=localhost\\datamining; Provider=MSOLAP;"
ocs <- OlapConnection(cnnstr)
mdx <- "SELECT {[Measures].[Internet Order Count],
[Measures].[Internet Sales Amount]} ON COLUMNS,
{[Product].[Product Line].MEMBERS} ON ROWS
FROM [Adventure works]
WHERE [Sales Territory].[Sales Territory Country].&[Australia];"
result01 <- execute2D(ocs, mdx)
Note that a named instance of Analysis Services is specified in the connection string; we must remember to "escape" the backslash when creating character strings in R.
This query yields the following dataframe:
> result01
[Product].[Product Line].[Product Line].[MEMBER_CAPTION] [Measures].[Internet Order Count] [Measures].[Internet Sales Amount]
1 Accessory 3591 127128.61
2 Components NA <NA>
3 Mountain 2333 2906994.4486
4 Road 3348 5029120.4058
5 Touring 703 997757.12
Automatic MDX
Microsoft seems to be concerned that many analysts and data scientists will be put off by the intricacies of the MDX query language. They have therefore included in olapR capabilities to let the R analyst specify the data he or she wants and generate the necessary MDX query automatically. To be sure, there are many needs for which the MDX query must be crafted manually, but for most "plain vanilla" MDX queries automatic MDX generation works perfectly well. Here is the R code which will automatically generate the same MDX query that we executed above:
> cnnstr <- "Data Source=localhost\\datamining; Provider=MSOLAP;"
> ocs <- OlapConnection(cnnstr)
>
> qry <- Query()
> cube(qry) <- "[Adventure Works]"
> columns(qry) <- c("[Measures].[Internet Order Count]", "[Measures].[Internet Sales Amount]")
> rows(qry) <- c("[Product].[Product Line].[Product Line].MEMBERS")
> slicers(qry) <- c("[Sales Territory].[Sales Territory Country].[Australia]")
>
> result1 <- execute2D(ocs, qry)
In this code, a query object is instantiated and the reference to this object is used to create columns, rows, slicers, and so forth. If you wish, you can use the SQL Server Profiler tool to examine the MDX that was submitted to the Analysis Services multidimensional engine.
Conclusion
The olapR library is simple and easy-to-use and provides a useful way to bring Analysis Services OLAP data into R data structures.
Have you explored our collection of SQL Server Training? Get the results you're looking for here!
This piece was originally posted Jul 25, 2018 and has been refreshed with updated links and formatting.