Design and Create cube by identifying measures and dimensions for Star Schema, Snowflake schema and fact Constellation Schema

Practical - 2
Design and Create cube by identifying measures and dimensions for Star Schema, Snowflake schema and fact Constellation Schema.

Download Practical

Software Required: Analysis services- SQL Server-2008.

1) Creating a New Analysis Services Project
To create a new Analysis Services project, follow these steps:
1.       Select Microsoft SQL Server 2008 SQL Server Business Intelligence. Development Studio from the Programs menu to launch Business Intelligence Development Studio.
2.       Select File  New Project. 
3.       In the New Project dialog box, select the Business Intelligence Projects project type.
4.       Select the Analysis Services Project template.
5.       Name the new project and select a convenient location to save it.
6.       Click OK to create the new project.


Figure 1- New Analysis Service Project

2) Defining a Data Source
To define a data source, use the Data Source Wizard. Launch this wizard by right-clicking on the Data Sources folder in new Analysis Services project
To define a data source for the new cube, follow these steps:

1.      Right-click on the Data Sources folder in Solution Explorer and select New Data Source.
2.      Read the first page of the Data Source Wizard and click Next.
3.      You can base a data source on a new or an existing connection. Because you don’t have any existing connections, click New.
4.      In the Connection Manager Dialog box, select the server containing your analysis services sample database from the Server Name combo box.
5.      Fill in your authentication information.
6.      Select the Native OLE DB\SQL Native Client provider (this is the default provider).
7.      Select the AdventureWorksDW database. Figure shows the filled-in Connection Manager Dialog box.
8.      Click OK to dismiss the Connection Manager Dialog box.
9.      Click Next.
10.  Select Default impersonation information to use the credentials you just supplied for the connection and click Next.
11.   Accept the default data source name and click Finish.


Figure 2- Setting up a connection
3) Defining a Data Source View
A data source view is a persistent set of tables from a data source that supply the data for a particular cube. BIDS also includes a wizard for creating data source views, which can be invoke by right-clicking on the Data Source Views folder in Solution Explorer.
To create a new data source view, follow these steps:
1.      Right-click on the Data Source Views folder in Solution Explorer and select New Data Source View.
2.      Read the first page of the Data Source View Wizard and click Next.
3.      Select the Adventure Works DW data source and click Next.
4.      Select the dbo.FactInternetSales table in the Available Objects list and click the  button to move it to the Included Object list. This will be the fact table in the new cube.
5.      Click the Add Related Tables button to automatically add all of the tables that are directly related to the dbo.FactInternetSales table. These will be the dimension tables for the new cube. Figure 3 shows the wizard with all of the tables selected.
6.      Click Next.
7.      Name the new view and click Finish. BIDS will automatically display the schema of the new data source view, as shown in Figure.


Figure 3- Selecting tables for the data source view 
Analysis Services

Figure 4- The data source view

4) Invoking the Cube Wizard
To invoke the Cube Wizard, right click on the Cubes folder in Solution Explorer. The Cube Wizard interactively explores the structure of data source view to identify the dimensions, levels, and measures in your cube.
To create the new cube, follow these steps:
1.      Right-click on the Cubes folder in Solution Explorer and select New Cube.
2.      Read the first page of the Cube Wizard and click Next.
3.      Select the option to build the cube using a data source.
4.      Check the Auto Build checkbox.
5.      Select the option to create attributes and hierarchies.
6.      Click Next.
7.      Select the data source view and click Next.
8.      Wait for the Cube Wizard to analyze the data and then click Next.
9.      Accept the default measures and click Next.
10.  Wait for the Cube Wizard to detect hierarchies and then click Next.
11.  Accept the default dimension structure and click Next.
12.  Name the new cube and click Finish.


Figure 5 – Invoking Cube wizard

5) Deploying and Processing a Cube
To deploy the cube, select Build  Deploy AdventureWorksCube. This will deploy the cube to local Analysis Server, and also process the cube, building the aggregates. BIDS will open the Deployment Progress window, as shown in Figure.

Figure 5- Deploying a cube

6) Exploring a Data Cube
To see the data in the, follow these steps:

1.      Right-click on the cube in Solution Explorer and select Browse.
2.      Expand the Measures node in the metadata panel (the area at the left of the user interface).
3.      Expand the FactInternetSales node.
4.      Drag the different amount measures and drop it on the Totals/Detail area.
5.      Expand the Dim currency node in the metadata panel.
6.      Drag the Currencykey property and drop it on the Row Fields area.
7.      Expand the Order Date node in the metadata panel.
8.      Drag the Order Date. Time key hierarchy and drop it on the Column Fields area.
9.      Expand the Dim Sales Territory node in the metadata panel.
10.  Drag the Sales Territory key property and drop it on the Filter Fields area.
11.  Click the dropdown arrow next to Sales Territory key; uncheck all of the checkboxes except for the one next to 1.


Figure 6- Exploring cube data in the cube browser

7) Star Schema:
Ø  The star schema architecture is the simplest data warehouse schema.
Ø  It is called a star schema because the diagram resembles a star, with points radiating from a centre. The centre of the star consists of fact table and the points of the star are the dimension tables. Usually the fact tables in a star schema are in third normal form (3NF) whereas dimensional tables are de-normalized.
Ø  Despite the fact that the star schema is the simplest architecture, it is most commonly used nowadays and is recommended by Oracle.


Fact Tables
Ø  A fact table typically has two types of columns: foreign keys to dimension tables and measures those that contain numeric facts.
Ø  A fact table can contain fact's data on detail or aggregated level.
Dimension Tables
Ø  A dimension is a structure usually composed of one or more hierarchies that categorizes data. If a dimension hasn't got a hierarchies and levels it is called flat dimension or list.
Ø  The primary keys of each of the dimension tables are part of the composite primary key of the fact table. Dimensional attributes help to describe the dimensional value.
Ø  They are normally descriptive, textual values. Dimension tables are generally small in size then fact table.
Ø  Typical fact tables store data about sales while dimension tables data about geographic region (markets, cities), clients, products, times, channels.

The main characteristics of star schema:
Ø  Simple structure -> easy to understand schema
Ø  Great query effectives -> small number of tables to join
Ø  Relatively long time of loading data into dimension tables -> de-normalization, redundancy data caused that size of the table could be large.
Ø  The most commonly used in the data warehouse implementations -> widely supported by a large number of business intelligence tools.

Figure 7- Star Schema
8) Snowflake Schema:
Ø  A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape.
Ø  The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.
Ø  Star and snowflake schemas are most commonly found in dimensional data warehouses and data marts where speed of data retrieval is more important than the efficiency of data manipulations. As such, the tables in these schemas are not normalized much, and are frequently designed at a level of normalization short of third normal form.
Ø  Normalization splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into new tables. Normalization therefore tends to increase the number of tables that need to be joined in order to perform a given query, but reduces the space required to hold the data and the number of places where it needs to be updated if the data changes.


Figure 8- Snowflake Schema

9) Fact constellation Schema:
Ø  Fact constellation is a measure of online analytical processing, which is a collection of multiple fact tables sharing dimension tables, viewed as a collection of stars.
Ø  This is an improvement over Star schema.
Ø  For each star schema it is possible to construct fact constellation schema(for example by splitting the original star schema into more star schemes each of them describes facts on another level of dimension hierarchies).
Ø  The fact constellation architecture contains multiple fact tables that share many dimension tables.
Ø  The main shortcoming of the fact constellation schema is a more complicated design because many variants for particular kinds of aggregation must be considered and selected.
Ø  Moreover, dimension tables are still large.



Figure 9- Fact constellation Schema

Comments

Popular posts from this blog

Study of DB Miner Tool

Study of WEKA tool

Create calculated member using arithmetic operators and member property of dimension member