As its name suggests, big data refers to a huge amount of data that cannot be managed (stored, processed, and so on) by any of the traditional data management tools. In other words, big data can be defined as a complex and voluminous set of information containing structured, unstructured, and semi-structured data sets that are difficult to manage using traditional data processing tools. It requires additional infrastructure for management, analysis, and translation into statistics.
Although the concept of big data itself is relatively new, its origins can be traced back to the 1960s and 1970s, when the world of data was just taking off with the first data centers and the development of the relational database.

Around 2005, people began to realize how much data users were generating through Facebook, YouTube, and other online services. Hadoop (an open-source framework created specifically for storing and analyzing large data sets) was developed in the same year, and NoSQL also began to gain popularity during this time. The development of open-source frameworks such as Hadoop (and, more recently, Spark) has been essential to the growth of big data because they make it easier to work with big data and cheaper to store it.

In the years since then, the volume of big data has skyrocketed. Users are still generating massive amounts of data, but it's not just humans: with the advent of the Internet of Things (IoT), more objects and devices are connected to the internet, collecting data about customer usage patterns and product performance. The emergence of machine learning has brought even more data. While big data has come a long way, its usefulness is only just beginning. Cloud computing has expanded the possibilities of big data even further, since the cloud offers truly elastic scalability where developers can simply spin up ad hoc clusters to test a subset of data. Graph databases are also becoming increasingly important because of their ability to display vast amounts of data in a way that makes analysis fast and comprehensive.
One of the best examples of big data is social media, with platforms such as Facebook, Instagram, Twitter, and LinkedIn.
The statistics below (compiled by Bernard Marr) will give you an idea of the amount of data we produce every single day.
Internet
- We now conduct more than half of our web searches from a mobile phone.
- More than 3.7 billion humans use the internet (a growth rate of 7.5 percent over 2016).
- On average, Google now processes more than 40,000 searches EVERY second (3.5 billion searches per day)!
- While 77% of searches are conducted on Google, it would be remiss not to remember that other search engines are also contributing to our daily data generation. Worldwide there are 5 billion searches a day.
Social Media (every minute)
- Snapchat users share 527,760 photos
- More than 120 professionals join LinkedIn
- Users watch 4,146,600 YouTube videos
- 456,000 tweets are sent on Twitter
- Instagram users post 46,740 photos
Communication (every minute)
- We send 16 million text messages
- There are 990,000 Tinder swipes
- 156 million emails are sent; worldwide it is expected that there will be nearly 3 billion email users by 2019
- 103,447,520 spam emails are sent
- There are 154,200 calls on Skype
Hopefully these statistics give you an idea of the amounts of data produced every single day.
When talking about big data, it is important to know the 5 V concept:
1. Volume: The massive amount of data being generated.
2. Velocity: The speed at which the data is being generated.
3. Variety: The different types of data, i.e. structured, unstructured, and semi-structured (see the short sketch after this list).
4. Value: The ability to turn data into useful insights.
5. Veracity: The quality and accuracy of the data.
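To make the Variety point concrete, below is a minimal Python sketch of what the three kinds of data can look like side by side. The records and field names are hypothetical, invented purely for illustration:

```python
import json

# Structured: a fixed schema where every record carries the same typed
# fields (like a row in a relational database table).
structured_row = {"user_id": 42, "name": "Alice", "signup_date": "2017-03-15"}

# Semi-structured: self-describing and nested, but with no rigid schema;
# different records may carry different fields (typical of JSON / NoSQL).
semi_structured = json.loads(
    '{"user_id": 42, "likes": ["photos", "videos"],'
    ' "device": {"os": "Android", "version": 7}}'
)

# Unstructured: free-form content such as text, images, audio, or video,
# with no predefined fields at all.
unstructured = "Just watched a great video about big data! #analytics"

print(structured_row["name"])           # schema is known up front
print(semi_structured["device"]["os"])  # structure is discovered by parsing
print(len(unstructured.split()))        # needs processing to extract meaning
```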
Big data is not a fad. We are just at the beginning of a revolution that will touch every business and every life on this planet. However, many people still treat the concept of big data as something they can choose to ignore, when in reality they will soon be run over by the steamroller that is big data. Don't believe me? Below are some mind-blowing facts about big data.
1. Data volumes are exploding, with more data being created in the last two years than in the entire previous history of the human race.
2. Data is growing faster than ever, and by 2020, about 1.7 megabytes of new information will be created every second for every person on the planet.
3. By then, our accumulated digital universe of data will have grown from today's 4.4 zettabytes to approximately 44 zettabytes, or 44 trillion gigabytes.
4. We create new data every second. For example, we perform 40,000 search queries every second (on Google alone), which works out to 3.5 billion searches per day and about 1.2 trillion searches per year (a quick arithmetic check follows this list).
5. Distributed computing (performing computing tasks using a network of computers in the cloud) is very real. Google uses it every day, engaging about 1,000 computers to answer a single search query, which takes no more than 0.2 seconds to complete.
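As a quick sanity check on fact #4, the per-day and per-year figures follow directly from the per-second rate. A few lines of Python are enough to verify the arithmetic:

```python
# Scale Google's per-second search rate (fact #4) up to a day and a year.
searches_per_second = 40_000

searches_per_day = searches_per_second * 60 * 60 * 24  # seconds in a day
searches_per_year = searches_per_day * 365             # days in a year

print(f"{searches_per_day:,} searches per day")    # 3,456,000,000 (~3.5 billion)
print(f"{searches_per_year:,} searches per year")  # 1,261,440,000,000 (~1.2 trillion)
```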