
 Information Management Part 2: Myths and Facts
    By Lalitha Chikkatur, business intelligence consultant

As information management gains acceptance, good governance is becoming even more critical to the whole process of retrieving, acquiring, organizing and maintaining information. The crucial factor in analyzing information and decision processes is an improved design-thinking attitude. Only when decision-makers use a good process and methodology for making decisions under constraints can people’s information management needs and desires be made technologically feasible. Because all information is ultimately managed by individuals, conflicts between facts and myths arise wherever there is human intervention. I have come across eight myths in information management. Here are the last four:

Myth #5: Disaster recovery plans are only for financial firms dealing with critical data.
The fact is that disaster recovery is for every business that runs mission-critical applications and expects the business to continue should a disaster occur. When I proposed a DR solution to a client at a mid-sized organization, the client said, “We don’t need a DR solution or strategy, because we don’t run any critical information.” I thought that was very interesting, and my next question was, “Do you care if your systems are on fire today, or doesn’t it matter to the business?” Obviously it did matter to them, but their thinking was: why waste money on something that may or may not happen? This line of thinking holds only if the information they stand to lose does not matter to running their business.

I always keep a backup of my personal data. For me, that data is very critical, and I always take the necessary preventive steps just in case. I am sure most of us do this. So if we can’t afford to lose our personal data, I don’t think any company or business can ever afford to lose its data. In short, everyone requires some kind of disaster recovery plan. It can be as simple as a backup to an external drive or as elaborate as implementing hot sites. The magnitude depends on how long each business can afford to wait, and how much data it can afford to lose, should a disaster occur.

Disaster recovery has been recognized as a capability for a data warehouse only over the last couple of years; to be precise, since businesses started capturing transactions and making decisions in real time. Today, multiple implementation scenarios and architectures can be tailored to individual choice and solution requirements. There is always a trade-off between cost and benefit when designing a DR solution. It can range from having hot backups in a remote site to just shipping tapes overnight to the DR center. For mature, progressive DW systems, careful capacity planning and design can ensure that the DR systems are used every day and can still act as a DR center when a disaster happens. By architecting it carefully, you can make the DR investment pay off in day-to-day performance instead of leaving the DR systems simply waiting for a disaster to occur. Once the criticality is defined, i.e., once a business is able to distinguish “must” information from “routine” information, a careful strategy can be planned by identifying the service levels around it.
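The cost-versus-benefit trade-off described above can be sketched as a simple rule that maps each data set's tolerance for downtime and data loss to one of the DR options mentioned in this article. This is only an illustration: the `DataSet` class, the `choose_dr_strategy` function and the hour thresholds are all hypothetical, and real service levels would have to come from the business.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """A hypothetical description of how critical a data set is."""
    name: str
    max_downtime_hours: float   # how long the business can afford to wait
    max_data_loss_hours: float  # how much recent data it can afford to lose


def choose_dr_strategy(ds: DataSet) -> str:
    """Pick a DR option; the thresholds are illustrative assumptions only."""
    if ds.max_downtime_hours <= 1 and ds.max_data_loss_hours <= 1:
        # "Must" information: keep a hot copy ready at a remote site.
        return "hot backup at remote site"
    if ds.max_downtime_hours <= 24:
        # Important, but the business can wait about a day.
        return "overnight tape shipment to DR center"
    # "Routine" information: a simple periodic backup is enough.
    return "periodic backup to external drive"


orders = DataSet("orders", max_downtime_hours=0.5, max_data_loss_hours=0.25)
reports = DataSet("monthly_reports", max_downtime_hours=12, max_data_loss_hours=24)
archive = DataSet("archive", max_downtime_hours=72, max_data_loss_hours=168)

for ds in (orders, reports, archive):
    print(ds.name, "->", choose_dr_strategy(ds))
```

The point of the sketch is that the classification comes first: once “must” and “routine” information are separated, the choice of DR option follows from the service level attached to each.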

Myth #6: A relational database model is the best data model for all decision support systems, and the dimensional data model works only with specific subject areas and domains.
A relational model, which is an ER model, is a design technique that addresses the relationships between data elements at the most granular level. This is a perfect technique for transaction processing systems, whereas dimensional modeling came into existence to address end-user queries and the retrieval of data. The purposes of the two models are quite different, and each should be used carefully in its respective place and scenario. The highest-level connection I can think of between the two design techniques is that an ER model can perhaps be broken down into various dimensional models. Because the nature of BI/DW systems is to address end-user analysis and queries using both recent and historical data, the dimensional model design technique is the only suitable technique to achieve this.
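To make the idea of a dimensional model concrete, here is a minimal sketch of a star schema, assuming an in-memory SQLite database. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create one fact table surrounded by two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            product_name TEXT,
            category TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            calendar_date TEXT,
            month TEXT
        );
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            quantity INTEGER,
            amount REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Soap", "Home"), (2, "Shampoo", "Beauty")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20081201, "2008-12-01", "2008-12"), (20090105, "2009-01-05", "2009-01")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 20081201, 10, 25.0), (2, 20081201, 4, 20.0), (1, 20090105, 6, 15.0)],
    )


def sales_by_category_and_month(conn: sqlite3.Connection) -> list:
    """A typical end-user query: slice the facts by dimension attributes."""
    return conn.execute("""
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_category_and_month(conn)
for category, month, total in rows:
    print(category, month, total)
```

Every end-user question in this layout follows the same pattern: join the fact table to the dimensions of interest, then group by their attributes.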

Lots of consultants claim that the relational model is the best way to deal with data getting into the data warehouse. The claim is made most often for data warehouses that are being built for the first time. The argument I have heard for this approach is that “because you don’t know what you want to do with the data warehouse, it’s a good idea to build it with the relational model because it eliminates redundancy rather than to keep adding layers, which denormalize the data when required.” My take on this kind of argument is that if you don’t know what you want to do with your data warehouse, it’s high time you went back and figured it out with the business sponsors. Then decide about modeling. On the technical side, ETL loads take the biggest chunk of the effort and time in a data warehouse. With a relational model, loading even a few GBs might take a couple of hours. The scalability and associated issues that follow are easy to foresee. Some amount of normalization is seen depending on the complexity of the data model, but claiming that the relational data model is best for a DW is not appropriate. Dimensional modeling is the only powerful technique for an overall enterprise data warehouse. There are classic examples in the industry, with companies like P&G implementing dimensional modeling in building their EDW back in the early ’80s.

Myth #7: Data warehouse performance is generally low if data is not summarized.
The general recommendation is to keep operational data at the lowest practical level of detail in the dimensional model for a data warehouse. It’s a myth to think that DW performance is low because there is no summarized data. Summarized or pre-aggregated data in a warehouse without a specific goal or requirement can be dangerous. One has to really understand when summarized data is appropriate and necessary. There is no doubt that accessing pre-aggregated data might be much faster in certain defined scenarios, but one should not forget that a summarized or pre-aggregated view is only a performance-tuning method and not an architecture replacement in the dimensional model. For standard reports where the report requirements are predefined and there is no scope for ad hoc reporting from users, it might be a good idea to have summarized tables. If the user is performing ad hoc reporting, it’s always better to have the lowest level of detail to avoid reaching dead ends while trying to address new business requirements.
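The “dead end” risk can be illustrated with a small sketch. It assumes a handful of hypothetical detail rows and a hypothetical `summarize` helper: a summary built for a predefined report by store answers that report quickly, but only the detail rows can answer a new question by product.

```python
from collections import defaultdict

# Hypothetical lowest-level detail rows (one row per sale).
detail_rows = [
    {"day": "2008-12-01", "store": "North", "product": "Soap", "amount": 25.0},
    {"day": "2008-12-01", "store": "South", "product": "Soap", "amount": 10.0},
    {"day": "2008-12-02", "store": "North", "product": "Shampoo", "amount": 20.0},
]


def summarize(rows, keys):
    """Total the amount for each combination of the given key columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)


# Pre-aggregated table for the predefined "sales by store" report.
store_summary = summarize(detail_rows, ["store"])

# The summary no longer carries the product column, so an ad hoc
# "sales by product" question has to go back to the detail rows.
product_totals = summarize(detail_rows, ["product"])

print(store_summary)
print(product_totals)
```

Here the summary is a tuning aid layered on top of the detail, not a replacement for it: dropping `detail_rows` after building `store_summary` would make the product question unanswerable.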

Myth #8: Data models for the data warehouses should be built as “generic.”
We have come across many instances where products and their vendors promise to deliver a generic data model with generic business rules. I wonder how this could add any real value to the data warehouse delivery. Providing the most granular level of data in data models often helps to handle business questions that are not predefined. Still, having the lowest grain does not make a data model generic, that is, one from which any application can derive its results. You can scale the data model to use the conformed data and define the BI application with specific business rules around it. Data models should always be customized to applications. Having an application-oriented data model is important for deriving metrics related to that application. These metrics, like profit-loss and cost-benefit, cannot be queried or calculated on the fly, and that’s the reason the ETL prepares the data in the back end and presents it to the reporting tools. When thinking of creating generic data models or business rules, one has to clearly understand that the calculations and operating load would then fall entirely on the reporting front. The reporting tool will never be able to handle that kind of load, because it is not designed for it. It’s always beneficial to assess the application and its delivery requirements and make the data model application-specific, taking enough care to leave room for scalability when required.
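As a rough sketch of that division of labor, the example below applies an application-specific business rule (profit = revenue - cost) in an ETL step, so the reporting side only reads the prepared values. The function names, field names and figures are hypothetical.

```python
def etl_prepare_profit(order_lines):
    """ETL step: apply the application's business rule once, in the back end."""
    prepared = []
    for line in order_lines:
        revenue = line["quantity"] * line["unit_price"]
        cost = line["quantity"] * line["unit_cost"]
        prepared.append({**line, "revenue": revenue, "cost": cost,
                         "profit": revenue - cost})
    return prepared


def report_total_profit(prepared_rows):
    """Reporting step: only reads the precomputed metric, no business logic."""
    return sum(row["profit"] for row in prepared_rows)


# Hypothetical source rows as they might arrive from an order system.
order_lines = [
    {"order_id": 1, "quantity": 10, "unit_price": 5, "unit_cost": 3},
    {"order_id": 2, "quantity": 4, "unit_price": 8, "unit_cost": 6},
]

prepared_rows = etl_prepare_profit(order_lines)
total_profit = report_total_profit(prepared_rows)
print(total_profit)
```

With a generic model, the rule inside `etl_prepare_profit` would have to run in the reporting tool for every query, which is the load the article argues the tool is not designed to carry.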

Information is power, and information management is more like a behavioral science theory of management, where the critical factor in making decisions lies with the individual’s limited ability to process information and to make decisions under limitations. New information management techniques and technologies will keep emerging, and existing ones will keep undergoing transformation, which is normal. But the one crucial factor that differentiates “intelligent” decisions from “just” decisions is the capability of the information management leaders. The demand for information management professionals will continue to increase in the foreseeable future. The leaders who can successfully control the power of information will be able to harness the right value from the unlimited information that already exists as well as from newly generated data. These leaders should be able to clearly differentiate between objective realities (facts) and fabulous statements (myths) in every single data point in the information management domain.

The first four myths and facts were presented in our January issue.

---Source: DM Review Special Report, December 30, 2008. Lalitha Chikkatur is a business intelligence consultant. She can be reached at lchikkatur@gmail.com.
 

 

 

 

 

 


 


