May 20, 2010

Data Warehouse - datawarehousing project cycle

Requirement Gathering

· A list of reports / cubes to be delivered to the end users by the end of this current phase.

· Updated project plan that clearly identifies resource loads and milestone delivery dates.

Environment Setup

Set up a development environment and a production environment including different processes (such as ETL, OLAP Cube, and reporting).

Outcome: Hardware / Software setup document for all of the environments, including hardware specifications, and scripts / settings for the software

Data Modeling

A good data model will allow the data warehousing system to grow easily, as well as allowing for good performance. In data warehousing project, the logical data model is built based on user requirements, and then it is translated into the physical data model.

ü Identification of data sources.

ü Logical data model.

ü Physical data model

ETL Design Phase

This can easily take up to 50% of the data warehouse implementation cycle or longer. The reason for this is that it takes time to get the source data, understand the necessary columns, understand the business rules, and understand the logical and physical data models.

Ø Data Mapping Document

Ø ETL Script / ETL Package in the ETL tool


OLAP Cube/Report Design

Usually the design of the OLAP cube can be derived from the Requirement Gathering phase.

Ø Documentation specifying the OLAP cube dimensions and measures.

Ø Actual OLAP cube / report.


Front End Development (Web based) and Deployment Documentation


Report Development

ü User customization

ü Report delivery

ü Access privileges


Ø Report Specification Documentation.

Ø Reports set up in the front end / reports delivered to user's preferred channel.


Performance Tuning and Query Optimization:

1. ETL - Tune the ETL process as much as possible

2. Query Processing

Ø Understand how your database is executing your query

Ø Retrieve as little data as possible

Ø Store intermediate results - store the intermediate results in a temporary table

Query optimization strategies.

· Use Index

· Aggregate Table - Pre-populating tables at higher levels so less amount of data need to be parsed.

· Vertical Partitioning - Partition the table by columns. This strategy decreases the amount of data a SQL query needs to process.

· Horizontal Partitioning - Partition the table by data value, most often time. This strategy decreases the amount of data a SQL query needs to process.

· De-normalization - The process of de-normalization combines multiple tables into a single table. This speeds up query performance because fewer table joins are needed.

· Server Tuning - Each server has its own parameters, and often tuning server parameters so that it can fully take advantage of the hardware resources can significantly speed up query performance.

3. Report Delivery - Network traffic, Server setup and front-end code

Quality Assurance

Ø QA Test Plan

Ø QA verification that the data warehousing system is ready to go to production


Backup strategy


Rollout To Production- Go Live.

Take care to address the user education needs.


Production Maintenance

Consistent availability of the data warehousing system to the end users

Incremental Enhancements

Ø Change management documentation

Ø Actual change to the data warehousing system

Additional Observations

Ø Overall Implementation Time and effort

Ø How To Measure Success

Each sub section discussed above needs to address following this:

Ø Task Description -What needs to be accomplished?

Ø Time Requirement - Rough estimate of time.

Ø Deliverables - One or more documents which describe the steps and results of the task.

Ø Possible Pitfalls - Things to watch out for.

Recipes for Data Warehousing Project Failure

Ø Focusing On Ideology Rather Than Practicality

Ø Making The Process Unnecessarily Complicated

Ø Lack of Clear Ownership

Ø Not Understanding Proper Protocol - understand the organization protocol

Ø Not Fully Understand Project Impact Before The Project Starts

Ø Blindly Sticking To Certain Standards

Ø Bad Project Management


No comments: