Posted in Operations & IT Articles, published on 02 June 2014
We are drowning in data. The volume of data generated every second is enormous, and the rate of growth keeps accelerating. We have come a long way into a digitised world in which almost every human action generates data. Social media websites such as Twitter and Facebook generate data on the order of terabytes to petabytes every day. It is widely believed that in future the largest share of data will be created not by humans but by the ‘Internet of Things’: a scenario in which objects equipped with unique identifiers communicate automatically, transferring data without human intervention.
Traditional computing cannot process data of this magnitude. Relational databases were not designed to handle such huge and largely unstructured data sets. Capturing, storing, and generating value from Big Data therefore poses a real challenge. The characteristics of Big Data, and the challenges it poses, are widely described by the three Vs: volume, velocity and variety.
Volume refers to the sheer amount of data: Big Data implies enormous data. Enterprises are amassing terabytes and even petabytes of data daily. Earlier, data was created mainly by employees; now it is created by machines, networks and human interaction on platforms such as social media sites. Every vertical in an enterprise is collecting and deriving value from huge chunks of data. Marketing analytics has evolved over the years, and consumer purchasing behaviour is now better understood by analysing the data that consumers have previously generated. Financial firms collect and analyse data to evaluate the creditworthiness of a customer. The number of sectors using digital information is on the rise, which points to a continuing data explosion.
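To give a feel for the magnitudes involved, the short Python sketch below converts between decimal (SI) byte units, in which each unit is 1,000 times the previous one. The helper names `to_bytes` and `humanize` are made up for this illustration and do not come from any particular library.

```python
# Decimal (SI) byte units, each 1000 times larger than the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a quantity such as (5, "PB") to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)

def humanize(num_bytes):
    """Render a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

# One zettabyte is a million petabytes.
petabytes_per_zettabyte = to_bytes(1, "ZB") // to_bytes(1, "PB")
```

Seen this way, a petabyte is already a thousand terabytes, and a zettabyte is a million petabytes, which is why the choice of unit matters when describing data volumes.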
Velocity is the pace at which Big Data flows in: the stream from business processes, networks and other sources is massive and continuous. This rate of flow is growing beyond what the IT systems in many organizations can store and process. Many organizations are therefore moving from the traditional approach of analysing data in batches to processing Big Data in real time. Real-time data helps researchers and businesses make timely decisions that can provide a competitive advantage.
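The difference between batch and real-time processing can be sketched in a few lines of Python. This is only a toy illustration under simple assumptions: the readings are plain numbers, the metric is an average, and the names `batch_mean` and `StreamingMean` are hypothetical.

```python
def batch_mean(readings):
    """Batch style: wait for the full data set, then compute once."""
    return sum(readings) / len(readings)

class StreamingMean:
    """Streaming style: update the answer as each reading arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental mean: earlier readings need not be kept in memory.
        self.mean += (value - self.mean) / self.count
        return self.mean

readings = [10, 20, 30, 40]          # hypothetical sensor values
stream = StreamingMean()
running = [stream.update(r) for r in readings]
```

The batch function needs the complete data set before it can answer, while the streaming version has an up-to-date answer after every reading and stores only two numbers, which is the property that makes real-time processing attractive when data never stops arriving.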
Variety refers to the many types of data involved, both structured and unstructured. Traditionally, the data monitored and analysed was structured data, such as the data held in records, files, spreadsheets and relational databases. The fastest-growing categories now are semi-structured data, such as XML, RSS feeds, RFID data from supply chains and geospatial data from logistics, and unstructured data, such as photos, audio, video, PDF files, wikis and much more, thanks to the social media revolution.
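The following Python sketch contrasts the three kinds of data using only the standard library. All sample values (order numbers, RFID tags, the review text) are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields.
csv_text = "order_id,amount\n1001,250\n1002,75\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: tagged, but elements may be optional or nested.
xml_text = """
<shipments>
  <shipment id="A1"><rfid>0xAF01</rfid><city>Delhi</city></shipment>
  <shipment id="B2"><rfid>0xAF02</rfid></shipment>
</shipments>
"""
root = ET.fromstring(xml_text)
cities = [s.findtext("city", default="unknown") for s in root.iter("shipment")]

# Unstructured: free text with no schema; here only a crude keyword count.
review = "Great phone, great battery, poor camera."
great_count = review.lower().count("great")
```

The structured rows can be summed directly, the XML needs a fallback for the missing `city` element, and the free text yields nothing until some interpretation is imposed on it. That growing effort is what makes variety a challenge.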
Big Data technologies:
Hadoop, an open source framework from Apache, is the leading Big Data technology in use globally. The framework is a set of tools that allows large data sets to be processed by distributing the work across clusters of machines. Unlike traditional computing, the processing capability of Hadoop can be scaled from a single server to thousands of machines. The two main components of Hadoop are the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. HDFS is a fault-tolerant distributed file system designed to run on low-cost machines known as commodity hardware. MapReduce, which is based on a programming model published by Google, splits a large data set across multiple servers, each of which processes its allocated portion before the partial results are combined. LinkedIn is one of the companies that use Hadoop to compute personalized recommendations.
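The MapReduce idea can be illustrated with the classic word-count example. The sketch below is plain Python running in a single process, not the Hadoop API: each string in `chunks` stands in for a portion of the data that would live on a separate server, and all function names are chosen for this illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different machine.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
```

Because each call to `map_phase` looks only at its own chunk, and each call to `reduce_phase` looks only at one key, both phases can run in parallel on many machines. This independence is what lets the approach scale from one server to thousands.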
Analysts believe that 90% of all data was created in the last few years, and the pace is accelerating. It will not be a surprise if companies start to deal with yottabytes of data in the coming years. Businesses have already realized the importance of Big Data and have started investing a substantial amount of their capital in it. Big Data is the future: it has been a trending technology recently, and we may see many more advocates for it in the coming years.
This article has been authored by Yasoteja Balabhadra from IMT Ghaziabad