Friday, March 16, 2012

What Is Data?

So, what is data, exactly?

From my perspective, data is any piece of information produced by a company, purchased from a vendor, or shared by a partner that can be used by anyone in the organization to make a decision of any kind. This is a very broad definition. For our purposes, we will only consider data recorded in electronic form, which represents the vast majority of data available to a business today.

Your company runs an e-commerce website? Then your lists of products, customers, orders, and promotions, your web traffic, your affiliate websites: all of that is data. If those orders are fulfilled from a warehouse, the inventory of that warehouse is data. If they are shipped by mail or some other carrier, bam, more data.

Cash registers in a brick-and-mortar store produce a lot of detailed data about the contents of each shopping cart. As far as data goes, the point-of-sale software is essentially doing the same thing as an e-commerce website. The manufacturing process of a plant produces data about materials, quality, duration, and so on. Some products come with warranties; the issuance of those warranties, and the claims made on them, are another source of data. The accounting and finance functions of just about any company are major sources of data.

Some data is less clearly defined. For example, the content of a phone call to a customer service center is not usually captured in electronic form, but the customer service agent typically uses a software application to record information during the call, complete with categories, duration, contact information, and more. This becomes another source of data.

Invoices from service providers such as phone companies are another source of data, which typically flow into the accounting process.

Knowing what data is available to an organization is an important success factor for data warehousing and business intelligence professionals. Making a piece of data from a different department available in the data warehouse can make a big difference in how useful the data warehouse will be.

For example, consider a data warehouse being created for a mid-size winery that wants to analyze its store sales. Winemaking is complex; it combines many specialized skills. It would be normal to start the analysis of sales by looking at the financials, the marketing data, and maybe the demographics. That's what I would do. The manufacturing process (the actual winemaking) also plays a big role in the sales of wine, because it determines fundamental attributes of the product, in this case the quality and the price point. But winemaking covers so many areas: which pieces of information are important? Does the type of barrel used to age the wine have an impact on sales? Probably. What about the type of ink used to print the label? Maybe; it is hard to tell without spending some time looking into it.

My approach has always been that you can only analyze the data if you have it. In the case of the winery, I did include information about the ink in the data warehouse. It turns out that the type of ink used to print the label has an impact on the consumer's perception of quality, and that impact varies by region and price point. Also, in the US, there are laws controlling what goes on the labels, and they differ from state to state. Each piece of data on its own may or may not be useful, but putting this data together makes every data element more powerful.

And THAT is what a data warehouse is all about.
