Unstructured data is data that has no predefined data model or schema, and therefore cannot easily be stored in relational database tables. It includes text files, website content, emails and messages, video, audio, images, spreadsheet data, etc.

There are many sources of unstructured data. With the rise of social media, the volume of unstructured data has grown tremendously, and it continues to grow at a fast pace. Apart from social media, customer-care voice recordings, YouTube videos, satellite images, GPS data, email and phone messages, PowerPoint slides, spreadsheet fields and many other things are sources of unstructured data. It is commonly estimated that around 80% of the data in today's world is unstructured and cannot be stored in a relational database.


Together, structured and unstructured data make up big data, which is characterized by high volume, wide variety and high velocity. Analysing such data has become a pressing need for many organizations. Several emerging technologies analyse unstructured data and extract insights from it. Analysing unstructured data supports not only descriptive analytics (understanding users' past behaviour) but also predictive analytics (forecasting their future behaviour).

Hadoop is one example of a technology that stores and analyses large volumes of unstructured data. Because such data arrives in high volume and at high velocity, it is stored in distributed file systems, such as Hadoop's own HDFS (Hadoop Distributed File System).
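To make the analysis side more concrete: Hadoop's processing model, MapReduce, splits a job into a map step, a shuffle that groups intermediate results by key, and a reduce step. The sketch below simulates the classic word-count job over a few free-text messages in plain Python on a single machine. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample messages are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict
from itertools import chain


# Hypothetical helper names; this is a single-process simulation, not Hadoop's API.
def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one unstructured document."""
    return [(word, 1) for word in re.findall(r"[a-z']+", document.lower())]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Illustrative customer-care messages (made up for this example).
    docs = [
        "Customer called about a late delivery.",
        "Delivery was late again, customer unhappy.",
    ]
    counts = word_count(docs)
    # Print the three most frequent words, ties broken alphabetically.
    print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

In a real Hadoop cluster the map tasks are typically scheduled on the nodes that hold each block of the input in HDFS, and the framework performs the shuffle across the network; here all three steps run in one process so the data flow is easy to follow.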


