Everyone knows that reliable decisions are impossible without reliable data. That’s nothing more than the old adage, “Garbage In, Garbage Out.” Yet somehow we seem to think that big data is the modern-day silver bullet that will solve a multitude of problems without the same real effort that went into accumulating the data we already have (our “little data,” as I sometimes like to call it).
Big data is more than these three questions: how do we go get more data (look on the Internet, of course), where do we put it all once we have it (in that new Hadoop cluster we just bought), and what do we do with it all (I don’t know — that’s why we hired those data scientists!)? They are questions that bespeak the underlying casual attitude toward big data initiatives: get it in, put it somewhere, do something with it. This all-too-prevalent attitude sadly has little use for one of the most valuable tools in the BI architect’s toolbox, data governance. Oftentimes data architects face development teams or project sponsors in their organisations who insist on circumventing data governance because of the velocity of the data involved, as though applying sound data practices will automatically fritter away the business value of high-velocity data streams. Nothing could be further from the truth.
Richard Neale summarizes the situation well in a recent article of his. He reminds us that “it’s difficult to justify why big data should be exempt from the normal rules of data governance. If you short cut this important process, you’ll end up with a very large mess.”1
The risks presented by unrestricted big data (including high-velocity or real-time data sources), the “very large mess” that Neale warns us about, are numerous and substantial. Data governance is the practice that mitigates those very real business risks, which span regulatory compliance (including reporting, entity knowledge such as KYC/AML, and privacy), data integrity, accuracy, ownership, and consumer protection (including participation/permission, sharing and access, and opt-in/opt-out), among numerous others. Seen this way, removing the guardrails from around big data integration efforts by exempting them from data governance oversight is nothing more than trading convenience for risk. And while some parts of the organisation might be comfortable with those trade-offs, it’s presumptuous in the extreme to give big data a free pass on the data governance process. At the very least, big data initiatives need to be evaluated on a case-by-case basis to determine which need detailed governance review and which don’t. And even that decision is the province of the company’s data governance practice!
The last note I want to make on this topic is that data governance can’t be shelfware, especially with regard to big data integration initiatives. That is, all the effort to design and empower effective data governance practices in an organisation is useless without commitment and follow-through. If you create practices but no review board, you can’t be surprised when compliance is sporadic. If you push for data governance as an institutional priority but fail to measure its impact through performance-based KPIs tied to someone’s evaluation, you can’t be surprised if everyone agrees it’s important, but no one actually does it.
Mr. Briggs has been active in the fields of Data Warehousing and Business Intelligence for the entirety of his 17-year career. He was responsible for the early adoption and promulgation of BI at one of the world’s largest consumer product companies and developed their initial BI competency centre. He has consulted with numerous other companies about effective BI practices. He holds a Master of Science degree in Computer Science from the University of Illinois at Urbana-Champaign and a Bachelor of Arts degree from Williams College (Mass.).