Information Management Part 2: Myths and Facts
By Lalitha Chikkatur, business intelligence consultant
As information management gains acceptance, good governance is becoming even more critical to the whole process of acquiring, retrieving, organizing and maintaining information. The crucial factor in analyzing information and decision processes is an improved design-thinking attitude. People’s information management needs and desires can be made technologically feasible only when decision-makers follow a sound process and methodology for making decisions under constraints. All information is ultimately managed by individuals, so conflicts between facts and myths arise wherever there is human intervention. I have come across eight myths in information management. Here are the last four:
Myth #5: Disaster recovery plans are only for financial firms dealing with critical data.
The fact is that disaster recovery (DR) is for every business that runs mission-critical applications and expects to keep operating if a disaster occurs. When I proposed a DR solution to one client, a mid-sized organization, the response was, “We don’t need a DR solution or strategy, because we don’t run any critical information.” I found that very interesting, and my next question was, “Do you care if your systems are on fire today, or doesn’t it matter to the business?” Of course it mattered to them. Their thinking, though, was: why waste money on something that may or may not happen? That reasoning holds only if the information they stand to lose does not matter in running their business.
I always keep a backup of my personal data. That data is very important to me, and I always take the necessary preventive steps, just in case. I am sure most of us do the same. If we can’t afford to lose our personal data, I don’t think any company or business can ever afford to lose its data. In short, everyone needs some kind of disaster recovery plan. It can be as simple as a backup on an external drive or as elaborate as a hot site. The plan’s scale depends on how long each business can afford to wait, and how much data it can afford to lose, if a disaster occurs.
Disaster recovery has been recognized as a data warehouse capability only over the last couple of years. To be precise, that recognition came when businesses started capturing transactions and making decisions in real time. Today, many implementation scenarios and architectures can be tailored to individual preferences and solution requirements. Designing a DR solution always involves a trade-off between cost and benefit. The options range from hot backups at a remote site to simply shipping tapes overnight to the DR center. For mature DW systems, careful capacity planning and design can let the DR systems be used every day and still act as a DR center when a disaster happens. With this kind of architecture, the DR investment improves day-to-day performance instead of sitting idle until a disaster occurs. A careful strategy can be planned once criticality is defined, that is, once a business can distinguish “must” information from “routine” information and identify the service levels each kind requires.
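The cost-benefit trade-off can be made concrete with a small Python sketch. It matches each class of information to the cheapest DR option that still meets that class’s service levels. The option names echo the ones mentioned above. Every number (hours of tolerable data loss and downtime, relative cost) and every identifier (`DrOption`, `ServiceLevel`, `cheapest_option`) is a hypothetical placeholder, not a benchmark. Real values would come from the business’s own service-level agreements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    """One hypothetical DR option and its worst-case characteristics."""
    name: str
    max_data_loss_hours: float  # data that could be lost in a disaster (assumed)
    max_downtime_hours: float   # time needed to resume service (assumed)
    relative_cost: int          # hypothetical ranking; higher means pricier


@dataclass(frozen=True)
class ServiceLevel:
    """What the business says it can tolerate for one class of information."""
    info_class: str                   # e.g. "must" or "routine"
    tolerable_data_loss_hours: float
    tolerable_downtime_hours: float


# Hypothetical catalogue loosely following the options named in the article.
OPTIONS = (
    DrOption("external drive backup", 24.0, 72.0, 1),
    DrOption("tapes shipped overnight to DR center", 24.0, 48.0, 2),
    DrOption("hot backup at remote site", 0.25, 1.0, 10),
)


def cheapest_option(level, options=OPTIONS):
    """Return the least expensive option that meets both tolerances."""
    feasible = [
        o for o in options
        if o.max_data_loss_hours <= level.tolerable_data_loss_hours
        and o.max_downtime_hours <= level.tolerable_downtime_hours
    ]
    if not feasible:
        raise ValueError(f"no DR option meets the {level.info_class!r} service level")
    return min(feasible, key=lambda o: o.relative_cost)


if __name__ == "__main__":
    for level in (ServiceLevel("must", 1.0, 4.0), ServiceLevel("routine", 24.0, 72.0)):
        # Prints "must -> hot backup at remote site", then
        # "routine -> external drive backup".
        print(level.info_class, "->", cheapest_option(level).name)
```

The rule is deliberately simple: keep only the options that meet both stated tolerances, then pick the cheapest. Information classed as “must”, with a one-hour data-loss tolerance, ends up on the expensive hot backup, while “routine” data can use a much cheaper option.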
Myth #6: A relational database model is the best data model for all decision support systems, and the dimensional data model works only with specific subject areas and domains.
A relational model, that is, an entity-relationship (ER) model, is a design technique that captures the relationships between data elements at the most granular level. It is an ideal technique for transaction processing systems. Dimensional modeling, by contrast, came into existence to support end-user queries and data retrieval. The two models serve quite different purposes, and each should be used carefully in its own place and scenario. The highest-level connection I can think of between the two techniques is that an ER model can perhaps be broken down into several dimensional models. BI/DW systems exist to support end-user analysis and queries over both recent and historical data, so dimensional modeling is the only suitable design technique for them.
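The contrast is easiest to see in a schema. Below is a minimal sketch of a dimensional (star) model built with Python’s standard `sqlite3` module. A central fact table holds numeric measures at one declared grain, and descriptive dimension tables surround it, so a typical end-user question needs only one join per dimension. All table names, columns and rows (`dim_date`, `dim_product`, `fact_sales`) are hypothetical and are not a recommended production design.

```python
import sqlite3


def build_star(conn):
    """Create and populate a tiny, hypothetical star schema."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key  INTEGER PRIMARY KEY,
            full_date TEXT,
            month     TEXT
        );
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        -- Fact table at the lowest grain: one row per product per day.
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            units       INTEGER,
            revenue     REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (20080101, "2008-01-01", "2008-01"),
        (20080201, "2008-02-01", "2008-02"),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Shampoo", "Hair care"),
        (2, "Toothpaste", "Oral care"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20080101, 1, 10, 50.0),
        (20080101, 2, 5, 15.0),
        (20080201, 1, 8, 40.0),
    ])


def revenue_by_month_and_category(conn):
    """Typical end-user question: one join from the fact to each dimension."""
    return conn.execute("""
        SELECT d.month, p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date    AS d ON d.date_key    = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.month, p.category
        ORDER BY d.month, p.category
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in revenue_by_month_and_category(conn):
        print(row)  # e.g. ('2008-01', 'Hair care', 50.0)
```

An ER model of the same business would usually split products, categories, orders, order lines and the calendar into separate normalized entities. The same question would then traverse more joins.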
Many consultants claim that the relational model is the best way to handle data coming into the data warehouse. They press this point hardest for data warehouses being built for the first time. The argument I have heard for this approach is that “because you don’t know what you want to do with the data warehouse, it’s a good idea to build it with the relational model, because it eliminates redundancy, rather than to keep adding layers, which denormalize the data when required.” My take on this kind of argument is simple. If you don’t know what you want to do with your data warehouse, it’s high time you went back and figured it out with the business sponsors. Then decide on the modeling. On the technical side, ETL loads account for the biggest share of data warehouse effort and time. With a relational model, loading even a few gigabytes might take a couple of hours, and the scalability problems that follow are easy to foresee. Some degree of normalization is to be expected, depending on the complexity of the data model, but claiming that the relational data model is best for a DW is not appropriate. Dimensional modeling is the only powerful technique for an overall enterprise data warehouse (EDW). There are classic industry examples of companies, such as P&G, that used dimensional modeling to build their EDWs back in the early ’80s.
Myth #7: Data warehouse performance is generally low if data is not summarized.
The general recommendation is to keep operational data at the lowest practical level of detail in a data warehouse’s dimensional model. It’s a myth that DW performance is low simply because there is no summarized data. Adding summarized or pre-aggregated data to a warehouse without a specific goal or requirement can be dangerous. One has to understand when summarized data is appropriate and when it is necessary. Pre-aggregated data can certainly be much faster to access in some well-defined scenarios. Still, a summarized or pre-aggregated view is only a performance-tuning method, not a replacement for the dimensional model’s architecture. Summarized tables might be a good idea for standard reports whose requirements are predefined and that leave users no scope for ad hoc reporting. If users do ad hoc reporting, it’s always better to keep the lowest level of detail. That way they avoid reaching dead ends when addressing new business requirements.
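A minimal Python sketch, using the standard `sqlite3` module, makes the same point. A summary table answers the predefined report it was built for, but only the lowest-grain detail can answer a question nobody anticipated. The table names, columns and figures are hypothetical.

```python
import sqlite3


def build_warehouse(conn):
    """Load hypothetical detail rows and one summary table derived from them."""
    conn.executescript("""
        -- Detail fact table kept at the lowest grain.
        CREATE TABLE fact_sales (month TEXT, region TEXT, product TEXT, revenue REAL);
        -- Pre-aggregated table built for one predefined report only.
        CREATE TABLE agg_sales_by_month (month TEXT PRIMARY KEY, revenue REAL);
    """)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        ("2008-01", "East", "A", 100.0),
        ("2008-01", "West", "A", 60.0),
        ("2008-01", "West", "B", 40.0),
        ("2008-02", "East", "B", 70.0),
    ])
    # The summary is derived from the detail: a tuning layer, not a replacement.
    conn.execute("""
        INSERT INTO agg_sales_by_month
        SELECT month, SUM(revenue) FROM fact_sales GROUP BY month
    """)


def monthly_report(conn):
    """Predefined standard report, served from the small summary table."""
    return conn.execute(
        "SELECT month, revenue FROM agg_sales_by_month ORDER BY month"
    ).fetchall()


def revenue_by_region(conn, month):
    """New ad hoc question: the summary has no region column, so use the detail."""
    return conn.execute(
        "SELECT region, SUM(revenue) FROM fact_sales "
        "WHERE month = ? GROUP BY region ORDER BY region",
        (month,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    print(monthly_report(conn))               # [('2008-01', 200.0), ('2008-02', 70.0)]
    print(revenue_by_region(conn, "2008-01"))  # [('East', 100.0), ('West', 100.0)]
```

If the warehouse had kept only `agg_sales_by_month`, the regional question would have been a dead end. Keeping the detail and deriving the summary from it leaves both paths open.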
Myth #8: Data models for data warehouses should be built as “generic.”
We have come across many instances where products and their vendors promise to deliver a generic data model with generic business rules. I wonder how this could add any real value to data warehouse delivery. Providing data at the most granular level often helps answer business questions that were not predefined. Even so, having the lowest grain does not make a data model generic, in the sense that any application could derive its results from that one model. You can scale the data model to use conformed data and build each BI application around its own specific business rules. Data models should always be customized to their applications.
An application-oriented data model is important for deriving the metrics related to that application. Metrics such as profit-loss and cost-benefit cannot be queried and/or calculated on the fly. That is why the ETL process prepares the data in the back end and presents it to the reporting tools. Anyone thinking of creating generic data models or business rules has to understand clearly what that means: all the calculations and processing load then fall on the reporting front end. The reporting tool will never be able to handle that kind of load, because it is not designed for it. It’s always beneficial to assess the application and its delivery requirements and make the data model application-specific, while taking care to leave room for scalability when required.
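This division of labor can be pictured with a minimal Python sketch using the standard `sqlite3` module. The ETL step applies an application-specific rule to derive profit once, in the back end, and the reporting query only sums a stored column. The overhead-allocation rule, the `OVERHEAD_RATE` value and all table and function names are hypothetical. A real profit-loss rule would be defined with the business.

```python
import sqlite3

# Hypothetical application-specific business rule: overhead is allocated as a
# fixed share of revenue. A real rule would be agreed with the business.
OVERHEAD_RATE = 0.10


def etl_load_profit(conn, staged_orders):
    """Back-end ETL step: derive profit once and store it with the fact."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS fact_order_profit (
            order_id    INTEGER PRIMARY KEY,
            revenue     REAL,
            direct_cost REAL,
            overhead    REAL,
            profit      REAL
        )
    """)
    rows = []
    for order_id, revenue, direct_cost in staged_orders:
        overhead = round(revenue * OVERHEAD_RATE, 2)
        profit = round(revenue - direct_cost - overhead, 2)
        rows.append((order_id, revenue, direct_cost, overhead, profit))
    conn.executemany(
        "INSERT INTO fact_order_profit VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def report_total_profit(conn):
    """Reporting layer: a simple sum over a precomputed measure."""
    return conn.execute(
        "SELECT ROUND(SUM(profit), 2) FROM fact_order_profit"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical staged rows: (order_id, revenue, direct_cost).
    etl_load_profit(conn, [(1, 200.0, 120.0), (2, 50.0, 45.0)])
    print(report_total_profit(conn))  # 60.0
```

With a “generic” model, every report would have to re-implement the overhead rule itself. That is exactly the reporting-side load the paragraph above warns about.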
Information is power. Information management is much like a behavioral science theory of management, in which the critical factor in decision-making lies in the individual’s limited ability to process information and to make decisions under constraints. New information management techniques and technologies will keep emerging, and existing ones will keep transforming. That is normal. But one crucial factor separates “intelligent” decisions from “just” decisions: the capability of the information management leaders. Demand for information management professionals will continue to increase for the foreseeable future. Leaders who can successfully control the power of information will be able to harness the right value from the virtually unlimited information that already exists, as well as from newly generated data. These leaders should be able to clearly differentiate between objective realities (facts) and fabulous statements (myths) at every single data point in the information management domain.
If you missed the first four myths and facts, they were presented in our January issue.
Source: DM Review Special Report, December 30, 2008. Lalitha Chikkatur is a business intelligence consultant. She can be reached at lchikkatur@gmail.com.