A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.
Ø Subject-Oriented: It can be used to analyze a particular subject area. For example, "sales" can be a particular subject.
Ø Integrated: It integrates data from multiple data sources.
Ø Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.
Ø Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
Functional view of a data warehouse:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
What it takes to implement a successful data warehouse project?
o How can the system grow as your data storage needs grow?
o Which RDBMS and hardware platform can handle large sets of data most efficiently?
ü Parallel Processing Support:
ü RDBMS/Hardware Combination: Because the RDBMS physically sits on the hardware platform, there are going to be certain parts of the code that is hardware platform-dependent. As a result, bugs and bug fixes are often hardware dependent.
ETL (Extraction, Transformation, and Loading) - Tool Selection
Ø Functional capability: This includes both the 'transformation' piece and the 'cleansing' piece.
Ø Ability to read directly from your data source:
Ø Metadata support: The ETL tool plays a key role in your metadata because it maps the source data to the destination, which is an important piece of the metadata.
OLAP - Online analytical processing:
(Multidimensional OLAP - make it easy for users to look at the data from multiple dimensions. ) and ROLAP (Relational OLAP):
Ø Ability to leverage parallelism supplied by RDBMS and hardware: This would greatly increase the tool's performance, and help loading the data into the cubes as quickly as possible.
Ø Performance: In addition to leveraging parallelism, the tool itself should be quick both in terms of loading the data into the cube and reading the data from the cube.
Ø Customization efforts: More and more, OLAP tools are used as an advanced reporting tool. In such cases, the ease of front-end customization becomes an important factor in the tool selection process.
Ø Security Features: Because OLAP tools are geared towards a number of users, making sure people see only what they are supposed to see is important.
Ø Metadata support: Because OLAP tools aggregates the data into the cube and sometimes serves as the front-end tool, it is essential that it works with the metadata strategy/tool you have selected.
Reporting Tool Selection
Ø Number of reports
Ø Desired Report Distribution Mode (email or over the browser )
Ø Ad Hoc Report Creation (Will the users be able to create their own ad hoc reports?)
ü Front End: Data is useless if all it does is sit in the data warehouse. As a result, the presentation layer is of very high importance.
ü Data source connection capabilities - Two types of data sources:
I. The relationship database
II. The OLAP multidimensional data source
ü Scheduling and distribution capabilities
ü Security Features:
ü Customization: Provide easy way to pre-set the reports to look exactly the way that adheres to the corporate standard
ü Export capabilities: Excel/CSV, PDF, XML etc.
Buy vs. Build decision
ü Cost
ü Implementation Time
ü Documentation
ü Functionality /Features
ü Tailored for the exact needs
ü Reliance on third-party
Ø User technical skills
Ø Requirements
Ø Available budget
Ø Time
Ø Support
Data Warehouse Team
Ø Project Manager
Ø DBA
Ø Technical Architect from the backend hardware/software to the client desktop configurations.
Ø ETL Developer
Ø Front End Developer
Ø OLAP Developer
Ø Trainer
Ø Data Modeler
Ø QA Group
No comments:
Post a Comment