A Synopsis of “Big Data in Practice”

Ali Hirji

Ali Hirji, ORION Community Manager

A synopsis of, and perspective on, a breakout session led by Dirk deRoos, IBM InfoSphere BigInsights’ Worldwide Technical Sales Leader, at the 2013 THINK Conference.

Data is the new oil. But it could also be silicon snake oil. Dirk deRoos walked us through how to gather data, analyze it, and act on that analysis in a meaningful way. Poor data management results in garbage in, garbage out, and Dirk offered some ways to help prevent it.

Before I offer my summary, I will group the session’s essential nuggets into three sections.

The data available to an organization is not the same as the data it can use. Data has a social dimension, and it needs to be analyzed in terms of its impact. To manage and use data effectively, organizations need to ask themselves:

1) What are our data requirements?

Bring data to scale! Whether as an individual or an institution, you need a defined framework for your data. Knowing your data requirements shapes and justifies every subsequent decision about that data. An abundance of data does not mean you must keep it all: beware of enterprise amnesia.

2) When are we planning to use this data?

Recognize the growing velocity of data: data has a lifetime. If data will not be used immediately, how it is stored and how it is tied to daily work determine how much of its value it retains.

3) Where is the infrastructure to support our data plans and requirements?

Consider the architectures you use to manage data: organizations need to blend traditional data warehouse approaches with contemporary ones (see Dirk’s slides for more).

Data, as a theoretical concept, is fairly clear. But what does it translate into in practice? If data is the new oil, what does it help us produce? What does data enable? And, given that this oil is a non-perishable resource, how can all the data being generated be converted into meaningful outputs?

Data, if you will, is information waiting to be converted into knowledge. Knowledge, in turn, must be used effectively to have a positive effect. Understanding data in practice compels us to uncover new means of data production.

Today, data delivered via the cloud, for example, reveals how data permeates every corner of our daily activities. We are drowning in data: the 12-plus terabytes of data on Twitter and the 25-plus terabytes on Facebook pose a significant challenge to data utilization. How do we sift through the garbage and analyze this data? Furthermore, how do we accommodate the storage of such data, and what sets the bar for a relevant analytic tool?

During the morning talks, we heard about sentiment analysis. Yes, perhaps a tweet can be scored by its valence, but data comes in more than just 140 characters.

Heck, a tweet is not the pool of data we are drowning in; it is just a drop.

During the conversation, a representative of Durham College suggested that the college is an example of what many institutions have become: data hoarders. If you have all of this data, what are you doing with it? A representative from Queen’s University explained that data keeps getting piled up for future reference.

But there is a time-sensitivity here.

Dirk explained how companies like Nike are using Twitter data to predict which ads will be most compelling to people in specific geographic areas. However, this data is useful only if it is planned for and managed properly.

At ORION, we recognize the expanding volume of data we transfer. Perhaps it is now time to engage the Research, Education, and Innovation world further on how this fast-moving, sizeable data can evolve into more innovative ideas.

Dirk closed his session with a call to change human behaviour and the way we manage data. We need to be responsible with our data.