Data Warehousing
Abstract
Data warehousing is the technology trend most often associated with enterprise
computing in today’s business environment. The data warehouse, in fact, is a
culmination of new developments in database technology, including entity-relationship
modeling, heuristic searches, mass data storage, neural networks, multiprocessing, and
natural-language interfaces. The data warehouse is a centralized, integrated repository of
information, one which can provide a vital competitive edge for corporate decision-making
and product development. Types of data warehouses include the operational data
store (ODS), the data mart, which is of great value in analyzing sales information, and the
enterprise data warehouse, which can take either a centralized or distributed approach
(PC Week, Feb 8, 1999).
Data Warehouses
The type of data warehouse an organization adopts should depend on the way the
business operates and the types of decision support it needs. An operational data store
(ODS), one of the simplest types of data warehouse, is a replicated production database that
has been corrected for errors. An ODS is used primarily to generate standard operations
reports and to provide transaction detail for summary-level analysis. Since an ODS
replicates an online transaction processing (OLTP) system, some experts do not consider it
a true data warehouse type. It is usually counted as one, however, because it fits the broad
definition and many data warehouses contain an ODS. Depending on an organization’s
reporting needs, an ODS may be updated monthly, weekly, or more frequently, sometimes
almost in real time (PC Week, Feb 8, 1999). The main advantage of an ODS is that it
improves production system performance, since reporting and query functions are
off-loaded from the OLTP system to the ODS (PC Week, Feb 8, 1999).
Another type of data warehouse is the data mart. Data marts are limited in scope,
usually taking their information from a single department or business process. Data marts
may be used for analyzing sales information in a specific region or for a particular
product line, for example. Data marts usually contain only summary data, but they can be
linked to operational data stores for drilling down to transaction details if necessary. Data
marts can be managed by IT departments, but just as often they are managed directly
by users in a department or work group (PC Week, Feb 8, 1999).
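The summary-plus-drill-down arrangement described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not a vendor design: the table names ods_sales and mart_sales_summary, their columns, the helper functions, and the sample rows are all hypothetical, and a single in-memory database stands in for what would normally be separate systems. The mart keeps one summary row per region and product line, and drill_down follows a summary row back to the transaction detail held in the ODS.

```python
import sqlite3


def build_demo():
    """Create a hypothetical ODS table and a summary-only data mart table."""
    conn = sqlite3.connect(":memory:")  # one in-memory DB stands in for both systems
    conn.execute(
        "CREATE TABLE ods_sales ("
        "txn_id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
    )
    # Hypothetical transaction detail, as it might be replicated into an ODS.
    conn.executemany(
        "INSERT INTO ods_sales (region, product, amount) VALUES (?, ?, ?)",
        [
            ("East", "widgets", 120.0),
            ("East", "widgets", 80.0),
            ("East", "gadgets", 50.0),
            ("West", "widgets", 200.0),
        ],
    )
    # The data mart holds only summary rows: one per region and product line.
    conn.execute(
        "CREATE TABLE mart_sales_summary AS "
        "SELECT region, product, COUNT(*) AS txn_count, SUM(amount) AS total "
        "FROM ods_sales GROUP BY region, product"
    )
    return conn


def drill_down(conn, region, product):
    """Follow a summary row back to its transaction detail in the ODS."""
    return conn.execute(
        "SELECT txn_id, amount FROM ods_sales "
        "WHERE region = ? AND product = ? ORDER BY txn_id",
        (region, product),
    ).fetchall()
```

An analyst would normally query mart_sales_summary, for example to see that East widget sales total 200.0 across two transactions, and call drill_down(conn, "East", "widgets") only when the individual transactions behind that figure are needed.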
While many online analytical processing (OLAP) applications can be performed on data marts,
cross-departmental analysis, executive information systems, and data-mining applications need
information gathered from the entire enterprise to be most effective. The enterprise data
warehouse is used for this type of extensive data collection and analysis. Because of its
scope and complexity, the enterprise data warehouse is usually managed by the central IT
group.
As its name implies, an enterprise data warehouse contains information taken
from throughout an organization. This is the most complex type of warehouse to build
and maintain, since data must be merged from multiple systems into common subject
areas (PC Week, Feb 8, 1999).
Data-mining tools use various statistical techniques to model data and to estimate and
predict outcomes based on what they have learned from that data. Data-mining tools work
best with large data sets (PC Week, Feb 8, 1999).
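As one simple instance of the statistical modeling mentioned above, the sketch below fits an ordinary least-squares line to a series of observations and uses it to predict the next value. Least squares is chosen here only as an illustration; the essay does not say which techniques particular data-mining tools use, and the functions fit_line and predict and the sample sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and joint spread of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate y for a new x using a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical monthly sales for months 1 to 4.
model = fit_line([1, 2, 3, 4], [10, 12, 14, 16])
forecast = predict(model, 5)  # estimate for month 5
```

For these figures the fitted line has slope 2 and intercept 8, so the month-5 forecast is 18.0. Real data-mining tools apply far richer models, and, consistent with the point above, their estimates tend to be more reliable when they are fitted to large data sets rather than a handful of points.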
Data Warehouse Components
Although a data warehouse sounds like a single entity, it is really a multi-tiered,
multi-application conglomerate that comprises several components. Each component may
be handled by one or more pieces of hardware or software. No vendor offers a complete
data warehouse package (PC Week, Feb 8, 1999).
Functionally, a data warehouse extracts data from operational systems and loads
it into a holding area where it is “scrubbed,” which means it is made to conform to
warehouse standards. The data is then merged, time-stamped, put in chronological
order, and loaded into databases for use by data access tools. Since the data goes through
a number of transformations and is ultimately placed in data structures different from
the ones it came from, those changes are mapped in catalogs or dictionaries. Data that
defines or describes data in the warehouse is called metadata, and such catalogs are
managed with metadata tools. There are typically two kinds of metadata. Information that
users need to know, such as table and column names and definitions, is called front-end
metadata. Everything else, such as how a particular data element maps to its original
database, is back-end metadata.
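The flow described above (extract, scrub, merge, time-stamp, order, load, and record the mappings as metadata) can be sketched in a few lines of Python. This is a hypothetical illustration rather than any product's pipeline: the two source systems, their field names, the "warehouse standards" applied by the scrub functions, and the Catalog class are all assumptions. The catalog's frontend entries hold the user-facing column definitions, while its backend entries record which source fields each warehouse column was mapped from.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical extracts from two operational systems that use different conventions.
BILLING_ROWS = [{"cust": " acme ", "amt": "100.50", "dt": "1999-02-03"}]
ORDERS_ROWS = [{"customer_name": "Globex", "total_cents": 2500, "order_date": "1999-02-01"}]


@dataclass
class Catalog:
    """Hypothetical metadata catalog for the warehouse."""
    frontend: dict = field(default_factory=dict)  # user-facing column definitions
    backend: dict = field(default_factory=dict)   # column -> source fields it came from


def scrub_billing(row):
    # Assumed warehouse standard: trimmed upper-case names, dollar amounts as floats.
    return {"customer": row["cust"].strip().upper(),
            "amount": float(row["amt"]),
            "sale_date": row["dt"]}


def scrub_orders(row):
    # Same assumed standard applied to a source that stores amounts in cents.
    return {"customer": row["customer_name"].strip().upper(),
            "amount": row["total_cents"] / 100,
            "sale_date": row["order_date"]}


def run_etl(load_time):
    """Scrub, merge, time-stamp and order the extracts; return (rows, catalog)."""
    catalog = Catalog(
        frontend={
            "customer": "Customer name, trimmed and upper-cased",
            "amount": "Sale amount in US dollars",
            "sale_date": "Date of sale, YYYY-MM-DD",
            "loaded_at": "Time the row was loaded into the warehouse",
        },
        backend={
            "customer": ["billing.cust", "orders.customer_name"],
            "amount": ["billing.amt", "orders.total_cents / 100"],
            "sale_date": ["billing.dt", "orders.order_date"],
        },
    )
    # Merge the scrubbed rows from both sources into one list.
    rows = [scrub_billing(r) for r in BILLING_ROWS] + [scrub_orders(r) for r in ORDERS_ROWS]
    for row in rows:
        row["loaded_at"] = load_time.isoformat()  # time-stamp each row
    rows.sort(key=lambda r: r["sale_date"])  # ISO dates sort chronologically as strings
    return rows, catalog  # the returned rows stand in for the load step
```

Calling run_etl(datetime(1999, 2, 8, tzinfo=timezone.utc)) returns the Globex row ahead of the Acme row because its sale date is earlier, and catalog.backend["customer"] shows that the warehouse column was mapped from both billing.cust and orders.customer_name.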
Security
Security considerations for data warehouses are different from those for OLTP
systems. For a data warehouse to pay for itself, many users must be able to benefit
from it, so more users will need access to data than OLTP security traditionally
authorizes (Computerworld, Feb 15, 1999).
Conclusion
As more corporations come to appreciate that the information they gather each day is an
asset, they will rely more and more on data warehousing. But while a data warehouse can
provide managers with the means to ask questions of their data and get back meaningful
answers, it cannot automatically make a company more profitable.
“Good technology can not substitute for good management.”
References
1. Stewart Deck, “Human side key to data warehousing,” Computerworld, Feb 15, 1999, p. 14.
2. John Taschek, “Data Warehouses in Need of Renovation,” PC Week, Feb 8, 1999, v16 i6, p. 56.
3. Jeff Moad, “100 Top Data Warehousing Leaders Step Up,” PC Week, Feb 8, 1999, v16 i6, p. 71.