Big Data: What, Where, Why and How

Posted in Operations & IT Articles, published on 07 August 2013

Emergence of Big data

Imagine visiting your favorite retail store and finding every item on your shopping list displayed within easy reach, while a proactive customer service representative walks up to you with a newly launched product that perfectly suits your tastes. Or imagine your telecom operator launching a special scheme that lets you call your most frequently contacted friends at reduced rates. Big data analytics, fueled by the increasing digitization of information shared across multiple platforms, makes all of these scenarios possible. Companies are going all out to harness these huge volumes of data to steer their businesses towards growth and profits and to gain a competitive advantage.

Credit-card purchase data, geographic data such as residential and office addresses, psychographic data gathered from registrations and subscriptions to a variety of services, and information from blogging sites and social networking platforms are analyzed together to derive insights from this seemingly disparate data. These insights help companies serve their target customer groups better, guide product development by creating and enhancing products suited to consumers' wants and preferences, and thereby improve customer relations and service. A few companies even use these analyses to run controlled experiments aimed at boosting employee productivity and improving operating margins.

4V’s of big data

What, then, differentiates big data from traditional data? The answer lies in the 4V's: volume, velocity, variety and variability. Put simply, big data is high-volume, high-velocity, high-variety and high-variability poly-structured data.

1. High Volume- Big data sets, which are largely unstructured or semi-structured, are enormous. With the information explosion, there is a vast volume of such data to be collected, processed and put to use.

2. High Velocity- With the proliferation of internet technologies and smart devices, data is generated and transmitted at enormous rates. For instance, YouTube reports 4 billion video views per day and 60 hours of video uploaded every minute.

3. High Variety- Data is mined from a variety of sources. Equipment sensor data, machine-generated data such as web logs, call-center records, GPS transmissions, and data generated from social media streams together make up the phenomenon called big data.

4. High Variability- Because the data is collected from so many different sources, it largely lacks a common structure. Data in the form of text, images, email messages, and audio and video files, shared across a multitude of platforms, is extracted and analyzed to understand patterns and uncover interesting trends.
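To put the YouTube velocity figures from point 2 in more tangible terms, here is a small back-of-the-envelope conversion into per-second and per-day rates. It uses only the two numbers quoted in the article and assumes, for simplicity, that both rates stay constant over a full 24-hour day.

```python
# Back-of-the-envelope conversion of the YouTube figures quoted under "High Velocity".
# Assumption: both rates are treated as constant over a full 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
MINUTES_PER_DAY = 24 * 60        # 1,440 minutes

views_per_day = 4_000_000_000    # 4 billion video views per day
upload_hours_per_minute = 60     # 60 hours of video uploaded per minute

views_per_second = views_per_day / SECONDS_PER_DAY
upload_hours_per_day = upload_hours_per_minute * MINUTES_PER_DAY
upload_years_per_day = upload_hours_per_day / (24 * 365)  # hours of footage expressed in years

print(f"~{views_per_second:,.0f} views per second")
# prints "~46,296 views per second"
print(f"{upload_hours_per_day:,} hours of video uploaded per day (~{upload_years_per_day:.1f} years of footage)")
# prints "86,400 hours of video uploaded per day (~9.9 years of footage)"
```

In other words, every single day brings close to ten years' worth of new footage, which is the kind of arrival rate that a batch-oriented, single-server system struggles to absorb.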

How traditional RDBMS differs from big data

However, big data is not quite as unprecedented a phenomenon as it is made out to be; it had long been in the offing. As the volume of data produced every day expanded suddenly, and data became an important factor of production in business alongside labor and capital, improved data management systems became a necessity to handle the enormity and complexity of the data abounding in the digital world.

With so much transactional data generated over the Web and a surge in communications across networking platforms, conventional relational databases running on client-server architectures are not equipped to handle this amount of data, and the sheer variety of the data makes it hard for relational database management systems to process. The need of the hour, therefore, is big data management systems that can capture, curate and process these bulky and diverse data sets. Because web applications are accessible to consumers 24x7, keeping up with that level of activity requires massively parallel processing systems, analytical databases and advanced business intelligence technologies.

Big data architecture

Two technologies make big data management systems possible: Hadoop and NoSQL. Apache Hadoop, an open-source framework, offers an altogether new way of storing and processing data. It distributes the storage and parallel processing of huge amounts of data across inexpensive, industry-standard commodity servers. The set of machines that store and process the data is known as a Hadoop cluster, and it scales readily: capacity grows by adding more servers. Despite the incompatibilities among data sources, Hadoop can accommodate all types of data because it stores data redundantly and does not require a specialized, predefined schema. The cost efficiency of building on industry-standard servers gives Hadoop an edge over legacy systems that are not equipped to handle large, generic data sets.
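Hadoop's classic processing model, MapReduce, is what turns "distributed parallel processing" into something concrete: input is split across nodes, each split is mapped to key/value pairs, the pairs are grouped (shuffled) by key, and each group is reduced to a result. The sketch below is a single-machine simulation of that flow for a word count, written in plain Python. It is not the Hadoop API; the thread-pool workers merely stand in for cluster nodes, and the function names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Single-machine simulation of the MapReduce model used by Hadoop.
# NOT the Hadoop API: thread workers stand in for the nodes of a cluster.

def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: group the values emitted by all mappers under their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)

def word_count(splits: list[str], workers: int = 4) -> dict[str, int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))  # mappers run in parallel
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())  # reducers in parallel
        return dict(reduced)

splits = [
    "big data needs big clusters",
    "hadoop clusters use commodity servers",
    "big servers are not required",
]
counts = word_count(splits)
print(counts["big"], counts["clusters"], counts["servers"])  # prints "3 2 2"
```

In a real Hadoop cluster the shuffle moves data over the network between machines, and map tasks are scheduled close to where each block of input is stored; the in-memory grouping here only mimics the logical sequence of steps.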

The unstructured nature of big data has spawned the NoSQL paradigm. NoSQL stands for "Not Only SQL". It is best suited to storing and retrieving data sets whose records are not necessarily related to one another. NoSQL systems generally do not fully guarantee the ACID (atomicity, consistency, isolation, durability) properties of relational databases. Data is stored redundantly across several NoSQL nodes, so the failure of one or two servers does not bring the system down. For querying and analytical purposes, however, a NoSQL store may still need to be paired with a relational database management system.
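The claim that losing one or two servers does not affect a NoSQL system rests on replication. The following is a hypothetical toy key-value store, not the design of any particular NoSQL product: it assumes a simple placement rule in which each key is copied to three consecutive nodes starting from a stable hash of the key, and any surviving copy can answer a read. The class name, the replication factor of three and the placement rule are all illustrative assumptions.

```python
from __future__ import annotations

import hashlib
from typing import Optional

# Hypothetical toy key-value store illustrating replication across NoSQL nodes.
# Assumed placement rule: each key lives on REPLICAS consecutive nodes,
# starting at (stable hash of key) % number_of_nodes.
REPLICAS = 3

class ReplicatedStore:
    def __init__(self, num_nodes: int, replicas: int = REPLICAS) -> None:
        self.nodes: list[dict[str, str]] = [{} for _ in range(num_nodes)]
        self.alive = [True] * num_nodes
        self.replicas = replicas

    def _owners(self, key: str) -> list[int]:
        # Stable hash: Python's built-in hash() of a str changes between processes.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key: str, value: str) -> None:
        # Write a copy to every live replica that owns this key.
        for node in self._owners(key):
            if self.alive[node]:
                self.nodes[node][key] = value

    def get(self, key: str) -> Optional[str]:
        # Any surviving replica can serve the read.
        for node in self._owners(key):
            if self.alive[node] and key in self.nodes[node]:
                return self.nodes[node][key]
        return None

    def fail(self, node: int) -> None:
        self.alive[node] = False

store = ReplicatedStore(num_nodes=5)
store.put("customer:42", "prefers electronics")

# Take down two of the three nodes holding this key; the third still answers.
first, second, _ = store._owners("customer:42")
store.fail(first)
store.fail(second)
print(store.get("customer:42"))  # prints "prefers electronics"
```

Production NoSQL stores add much more on top of this (rebalancing when nodes join or leave, repairing stale copies, tunable read and write quorums), which is also where the relaxed ACID guarantees mentioned above come into play.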

Figure: Hadoop Cluster

Business value generation

Now, let’s understand how big data unleashes value for businesses.

Reveal startling insights

Because big data is inherently complex, studying it in detail can reveal insights that were previously unknown. For example, weather and soil data can be analyzed to predict crop patterns, enabling farmers to plan their produce in advance, boosting agricultural productivity and preventing wastage.

Make more data-driven, less risky decisions

With more insightful data backing business decisions, the downside risks are greatly reduced. Walmart, for instance, developed Polaris, a search engine with semantic search capability. This machine-learning-based semantic search let users find products quickly, saving them time, and resulted in a 10-15 percent increase in sales. By capturing and analyzing data about its online shoppers and their purchase behavior, Walmart made a decision to invest in an intelligent search engine that reaped huge dividends.


Automation of business processes

Organizations adept at managing big data use advanced technologies to process data in real time and build analytics directly into their workflows. This allows more of their business processes to be automated and streamlined.

Is Big Data for everyone?

All said and done, it is what an organization does with its data that determines its success. Note that not every company needs big data analytics. Companies whose workflows use data sets that conform to a specific structure, stay consistent over time and are not prone to sudden increases in volume should stick with their traditional relational database systems. Even mergers, an extreme case, are usually planned months in advance, giving the company sufficient time to combine the relational data sets.

However, companies in high-growth, technology-intensive sectors that deal with unstructured data in the form of web logs, email messages, email archives, sensor data and metadata need big data management systems to bring about breakthrough innovation in their products and services. To fully realize the potential of big data systems, companies need to gather data from a variety of sources, including third parties. Third-party data can raise issues of privacy breaches, security and copyright, and organizations need to weigh these factors to leverage big data effectively and appropriately.

As a concluding comment on the future of big data: with Twitter's programmers actively working on Hadoop development, and big data analytics credited with helping Barack Obama win a second presidential term, big data is certainly the way forward.

The article has been authored by Sabornee Jana, IIM Indore