Overview of SQL Server 2008 Databases and analysis services

Practical - 1
Overview of SQL Server 2008 Databases and analysis services.

Download Practical

Data Mining
Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks and more.
Importance
Data mining allows you to:
·         Sift through all the chaotic and repetitive noise in your data.
·         Understand what is relevant and then make good use of that information to assess likely outcomes.
·         Accelerate the pace of making informed decisions.
SQL Server 2008 Analysis Services
·         SQL Server Analysis Services (SSAS) is the technology from the Microsoft Business Intelligence stack, to develop Online Analytical Processing (OLAP) solutions. SSAS is used to create cubes using data from data marts / data warehouse for deeper and faster data analysis. 

Microsoft SQL Server Services
·        Analysis Services is an analytical data engine used in decision support and business analytics, providing the analytical data for business reports and client applications.
·         SSAS is used as a tool by organizations to analyse and make sense of information possibly spread out across multiple databases, or in disparate tables or files. 
·         Microsoft has included a number of services in SQL Server related to business intelligence and data warehousing. These services include Integration Services, Reporting Services and Analysis Services.
·         Analysis Services includes a group of OLAP and data mining capabilities and comes in two flavours:
    • Tabular Mode - Implement in-memory relational data modeling constructs (model, tables, columns, measures, hierarchies).
    • Multidimensional and Data Mining Mode - Implement OLAP modeling constructs (cubes, dimensions, measures).
·         Analysis Services has sophisticated tools to simplify the process of designing, creating, maintaining, and querying aggregate tables, which it then stores in its extremely efficient proprietary structures. Managing aggregations has always been an extremely strong feature of Analysis Services.
·         Because of its ability to create strategic aggregations and thus avoid data-explosion issues, Analysis Services can handle extremely large, multi-terabyte data warehouses.
·         Speed comes from storing pre-calculated values. Querying a 100-million-row table for a grand total is going to take much more time than querying a 100-row summary table.
·         It can store the data in a special format that is particularly efficient for storage and retrieval.
·         Analysis Services stores much more metadata than a relational data warehouse.
Analysis Services Tools
For the developer
·         Analysis Services provides Business Intelligence Development Studio (BIDS). This is actually a version of Visual Studio 2008 with business intelligence designers installed instead of designers for C# or VB.NET.
·         Within BIDS, you can have multiple developers working on different parts of a single project, deploying the Analysis Services application to development, test, or production servers as appropriate.
For the administrator
·         Analysis Services provides SQL Server Management Studio (SSMS). This
is another version of the Visual Studio environment configured to enable you to administer SQL Server, Analysis Services, Reporting Services, and Integration Services.

·         In SSMS you can modify a limited set of database properties, but you can't make design changes.
Microsoft SQL Server 2005 Analysis Services (SSAS) includes the following algorithms:
  • Microsoft Decision Trees Algorithm 
The algorithm makes predictions based on the relationships between input columns in a dataset. It uses the values, or states, of those columns to predict the states of a column that you designate as predictable.
  • Microsoft Clustering Algorithm
The algorithm uses iterative techniques to group cases in a dataset into clusters that contain similar characteristics. These groupings are useful for exploring data, identifying anomalies in the data, and creating predictions.
  • Microsoft Naive Bayes Algorithm 
The algorithm calculates the conditional probability between input and predictable columns, and assumes that the columns are independent. This algorithm is less computationally intense than other Microsoft algorithms, and therefore is useful for quickly generating mining models to discover relationships between input columns and predictable columns.
  • Microsoft Association Algorithm
This algorithm is useful for recommendation engines. A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest. The rules that the algorithm identifies can be used to predict a customer's likely future purchases, based on the items that already exist in the customer's shopping cart.
  • Microsoft Sequence Clustering Algorithm
This algorithm is used to explore data that contains events that can be linked by following paths, or sequences. The algorithm finds the most common sequences by grouping, or clustering, identical sequences together. These sequences can take many forms, including:
·         Data that describes the click paths that users follow through a Web site. 
·         Data that describes the order in which a customer adds items to a shopping cart at an online retailer.

  • Microsoft Time Series Algorithm
The Microsoft Time Series algorithm is a regression algorithm provided by Microsoft SQL Server 2005 Analysis Services (SSAS) for use in creating data mining models to predict continuous columns, such as product sales, in a forecasting scenario. 
  • Microsoft Neural Network Algorithm (SSAS)
The Microsoft Neural Network algorithm creates classification and regression mining models by constructing a Multilayer Perceptron network of neurons. The Microsoft Neural Network algorithm is useful for analysing complex input data, such as from a manufacturing or commercial process, or business problems for which a significant quantity of training data is available but for which rules cannot be easily derived by using other algorithms.
  • Microsoft Logistic Regression Algorithm
The Microsoft Logistic Regression algorithm is a variation of the Microsoft Neural Network algorithm, where its setting will create a neural network model that does not contain a hidden layer, and that therefore is equivalent to logistic regression.
  • Microsoft Linear Regression Algorithm 
The Microsoft Linear Regression algorithm is a variation of the Microsoft Decision Trees algorithm, where the algorithm will never create a split, and therefore performs a linear regression. You can use linear regression to determine a relationship between two continuous columns.
Multidimensional Storage modes
Microsoft Analysis Services takes a neutral position in the MOLAP vs. ROLAP arguments among OLAP products. It allows all the flavours of MOLAPROLAP and HOLAP to be used within the same model.
Partition storage modes
·         MOLAP - Multidimensional OLAP - Both fact data and aggregations are processed, stored, and indexed using a special format optimized for multidimensional data.
·         ROLAP - Relational OLAP - Both fact data and aggregations remain in the relational data source, eliminating the need for special processing.
·         HOLAP - Hybrid OLAP - This mode uses the relational data source to store the fact data, but pre-processes aggregations and indexes, storing these in a special format, optimized for multidimensional data.
Dimension storage modes
·         MOLAP - dimension attributes and hierarchies are processed and stored in the special format
·         ROLAP - dimension attributes are not processed and remain in the relational data source.

Query languages
Data definition language (DDL)
·         DDL in Analysis Services is XML based and supports commands such as <Create>, <Alter>, <Delete>, and <Process>. For data mining models import and export, it also supports PMML(Predictive Model Mark-up Language).
Data manipulation language (DML)
·         MDX - for querying OLAP cubes
·         LINQ - for querying OLAP cubes from .NET using ADO.NET Entity Framework and Language Integrated Query
·         SQL - small subset of SQL for querying OLAP cubes and dimensions as if they were tables
·         DMX - for querying Data Mining models
·         DAX - for querying Tabular models

Conclusion
Analysis Services is simple enough for small, uncomplicated organizations and
powerful enough for large or complex organizations, allowing all types of organizations to add analytical power to their BI solutions. It provides tools for both developers and administrator, algorithms for easy analysis of data and much features.

Comments

Popular posts from this blog

Study of DB Miner Tool

Study of WEKA tool

Create calculated member using arithmetic operators and member property of dimension member