Sunday, May 29, 2022

Rise of Big Data

Big Data, as its name suggests, refers to an amount of data so large that it cannot be managed (stored, processed, and so on) by any of the traditional data management tools.

In other words, big data can be defined as a complex and voluminous set of information containing structured, unstructured, and semi-structured data sets that are difficult to manage using traditional data processing tools. It requires additional infrastructure for management, analysis, and translation into statistics.

Although the concept of big data itself is relatively new, the origins of big data can be traced back to the 1960s and 1970s, when the world of data was just taking off with the first data centers and the development of the relational database.

Around 2005, people began to realize how much data users were generating through Facebook, YouTube, and other online services. Hadoop (an open-source framework created specifically for storing and analyzing large data sets) was developed in the same year. NoSQL also began to gain popularity during this time. The development of open-source frameworks such as Hadoop (and more recently Spark) has been essential to the growth of big data because they make it easier to work with big data and cheaper to store it.

In the years since then, the volume of big data has skyrocketed. Users are still generating massive amounts of data, but it's not just humans. With the advent of the Internet of Things (IoT), more objects and devices are connected to the Internet, collecting data about customer usage patterns and product performance. The emergence of machine learning has brought even more data.

While big data has come a long way, its usefulness is just beginning. Cloud computing has expanded the possibilities of big data even further. The cloud offers truly elastic scalability, where developers can simply spin up ad hoc clusters and test a subset of data. Graph databases are also increasingly important because of their ability to display vast amounts of data in a way that makes analysis fast and comprehensive.

Social media is one of the best examples of big data: platforms such as Facebook, Instagram, Twitter, and LinkedIn.

The statistics below (written by Bernard Marr) give a sense of the amount of data we produce every single day.

Internet

  • We conduct more than half of our web searches from a mobile phone now.
  • More than 3.7 billion humans use the internet (that’s a growth rate of 7.5 percent over 2016).
  • On average, Google now processes more than 40,000 searches EVERY second (3.5 billion searches per day)!
  • While 77% of searches are conducted on Google, it would be remiss not to remember other search engines are also contributing to our daily data generation. Worldwide there are 5 billion searches a day.


Social Media (every minute)

  • Snapchat users share 527,760 photos
  • More than 120 professionals join LinkedIn
  • Users watch 4,146,600 YouTube videos
  • 456,000 tweets are sent on Twitter
  • Instagram users post 46,740 photos

 





Communication (every minute)

  • We send 16 million text messages
  • There are 990,000 Tinder swipes
  • 156 million emails are sent; worldwide it is expected that there will be 2.9 billion email users by 2019
  • Every minute there are 103,447,520 spam emails sent
  • There are 154,200 calls on Skype

 

Hopefully these figures give you an idea of the amount of data produced every single day.

 

When talking about big data, it is important to know the 5 V's concept.

1.       Volume: Massive amount of data being generated.

2.       Velocity: The speed at which data is being generated.

3.       Variety: Different types of Data (Structured, Unstructured, and semi-structured)

4.       Value: Ability to turn data into useful insights.

5.       Veracity: Quality and Accuracy of Data.

Big data is not a fad. We are just at the beginning of a revolution that will touch every business and every life on this planet. However, many people still treat the concept of big data as something they can choose to ignore, when in reality they will soon be run over by the steamroller that is big data. Don't believe me? Below are some mind-blowing facts about big data.

1. Data volumes are exploding, with more data being created in the last two years than in the entire previous history of the human race.

2. Data is growing faster than ever, and by 2020, about 1.7 megabytes of new information will be created every second for every person on the planet.

3. By then, our accumulated digital universe of data will have grown from today's 4.4 zettabytes to approximately 44 zettabytes, or 44 trillion gigabytes.

4. We create new data every second. For example, we perform 40,000 search queries every second (on Google alone), which means 3.5 billion searches per day and 1.2 trillion searches per year.

5. Distributed computing (performing computing tasks using a network of computers in the cloud) is very real. Google uses it every day to engage about 1,000 computers to answer a single search query that takes no more than 0.2 seconds to complete.
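The search and storage figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming a 365-day year and decimal units (1 zettabyte = 10^21 bytes, 1 gigabyte = 10^9 bytes); the variable names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds in a day

searches_per_second = 40_000            # figure quoted in the list above
searches_per_day = searches_per_second * SECONDS_PER_DAY
searches_per_year = searches_per_day * 365

# Assumption: decimal units, 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes
zettabytes = 44
gigabytes = zettabytes * 10**21 // 10**9

print(f"{searches_per_day:,} searches per day")
print(f"{searches_per_year:,} searches per year")
print(f"{gigabytes:,} gigabytes in {zettabytes} zettabytes")
```

The results come out at roughly 3.5 billion searches per day, a little over 1.2 trillion per year, and 44 trillion gigabytes, which matches the rounded numbers quoted in the list.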




 

Monday, March 1, 2021

Common Job Roles in Data Science

 



Below is a general description of a few main job roles in the Data science field.

1. Data Scientist

Being a data scientist may be intellectually demanding, and analytically fulfilling, and it can put you at the cutting edge of new technological developments. As big data continues to be more crucial to how businesses make choices, data scientists are becoming more prevalent and in demand.

Data scientists decide what questions their team should be asking and then work out how to use data to respond to those questions. For forecasting and reasoning, they frequently create predictive models.

Daily duties for a data scientist could include the following:

  • Analyze databases for patterns and trends to gain new insights.
  • To predict outcomes, create algorithms and data models.
  • Utilize machine learning methods to raise the caliber of data or product offerings.
  • Inform senior staff and other teams of your recommendations.
  • Use data analysis software like Python, R, SAS, or SQL.
  • Keep up with advancements in the field of data science.

***According to Glassdoor, the average compensation for a data scientist in the United States is $122,499 as of April 2022.


 2. Data Analyst

To find the solution to a problem or provide an answer to a question, a data analyst gathers, cleans, and analyzes data sets. They work in a variety of fields, including government, business, finance, law enforcement, and science.

The practice of extracting information from data to guide better business decisions is known as data analysis. Five iterative phases typically comprise the data analysis process:

  • Choose the data you want to examine
  • Collect the data
  • Clean the data in preparation for analysis
  • Analyze the data
  • Interpret the results of the analysis

As for what data analysts actually do, here’s what many of them do on a day-to-day basis:

  1. Gather data: Analysts frequently do their own data collection. This can entail completing surveys, monitoring website visitor demographics, or purchasing datasets from data collection experts.
  2. Clean data: Raw data may include outliers, duplicates, or errors. Cleaning the data means maintaining its quality, in a spreadsheet or through a programming language, so that interpretations are not inaccurate or distorted.
  3. Model data: This requires developing and planning a database's structural elements. You may decide which data kinds to save and gather, how to tie different data categories to one another, and how the data will actually look.
  4. Interpret data: Finding patterns or trends in the data will enable you to interpret it and use it to support your interpretation of the question at hand.
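The "clean data" step above can be sketched in a few lines of code. This is a minimal illustration, assuming a made-up list of orders and the common 1.5 × IQR rule for spotting outliers; the record layout and the helper names `clean` and `flag_outliers` are hypothetical, not part of any particular tool.

```python
from statistics import quantiles

# Hypothetical raw records: (order_id, amount). Values are invented for illustration.
raw_orders = [
    (1, 20.0), (2, 22.5), (2, 22.5),    # order 2 appears twice
    (3, 19.0), (4, 21.0), (5, 500.0),   # 500.0 looks suspicious
    (6, 23.0), (7, None),               # order 7 has a missing amount
    (8, 20.5), (9, 22.0), (10, 21.5),
]

def clean(records):
    """Drop rows with missing amounts and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for record in records:
        if record[1] is None or record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

def flag_outliers(records):
    """Return records whose amount lies more than 1.5 * IQR outside the quartiles."""
    amounts = [amount for _, amount in records]
    q1, _, q3 = quantiles(amounts, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in records if not low <= r[1] <= high]

orders = clean(raw_orders)
outliers = flag_outliers(orders)
print(len(orders), "clean records; flagged as outliers:", outliers)
```

Flagged records are only candidates for review: whether an outlier is an error or a genuine large order is a judgment the analyst still has to make.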

***According to job listing site Glassdoor, the average base pay for a data analyst in the United States was $62,382 as of December 2022.


3. Data Engineer

Data engineering is the practice of developing large-scale data collection, storage, and analysis systems. It covers a wide range of topics and has uses in almost every business. Massive volumes of data can be gathered by organizations, but to make sure that it is in a highly useable shape by the time it reaches data scientists and analysts, they need the right personnel and the right technology.

The following are some of the most typical duties of a data engineer:

  • Develop, construct, test, and maintain architectures.
  • Align the architecture with the needs of the business.
  • Gather data.
  • Create data set procedures.
  • Use tools and programming languages.
  • Determine how to increase data quality, efficiency, and reliability.
  • Research questions about your industry and business.
  • Use large data sets to solve business problems.
  • Use high-end analytics software, machine learning, and statistical techniques.
  • Gather information for predictive and prescriptive modelling.
  • Use data to find hidden patterns.
  • Find tasks that can be automated using data.
  • Provide stakeholders with updates based on analytics.


4. Machine Learning Engineer

Machine learning engineers are in high demand today. However, the job profile has some difficulties. Machine learning engineers are expected to perform A/B testing, design data pipelines, and implement popular machine learning algorithms for tasks such as classification and clustering, in addition to having a deep understanding of widely used technologies such as SQL and REST APIs.

A few important roles and responsibilities of a machine learning engineer include:

  • Design and development of machine learning systems
  • Exploring machine learning algorithms
  • Testing machine learning systems
  • Application/product development based on client requirements
  • Extending existing machine learning frameworks and libraries
  • Exploring and visualizing data for a better understanding
  • Training and retraining systems
  • Know the importance of statistics in machine learning

 

***According to Glassdoor, the average salary for a Machine Learning Engineer is $107,270 per year in the US.



 5. Data Architect

A data architect creates data management plans so that databases can be easily integrated, centralized and protected with the best security measures. They also ensure that data engineers have the best tools and systems to work with.

A few important roles and responsibilities of a data architect include:

  • Development and implementation of an overall data strategy aligned with the business/organization
  • Identification of data collection sources in accordance with the data strategy
  • Collaborate with cross-functional teams and stakeholders for smooth operation of database systems
  • End-to-end data architecture planning and management
  • Maintaining database systems/architecture with efficiency and security in mind
  • Regularly audit the performance of the data management system and make changes to improve the systems accordingly.

 

Data Scientist Roles and responsibilities.

Data scientists work closely with business stakeholders to understand their goals and determine how data can be used to achieve those goals. They design data modelling processes, create algorithms and predictive models to extract the data the business needs, and help analyze the data and share insights with peers. While each project is different, the process for gathering and analyzing data generally follows the path below:

1. Ask the right questions to begin the discovery process.

2. Acquire data.

3. Process and clean the data.

4. Integrate and store data.

5. Initial data investigation and exploratory data analysis

6. Choose one or more potential models and algorithms.

7. Apply data science techniques, such as machine learning, statistical modelling, and artificial intelligence.

8. Measure and improve results.

9. Present final result to stakeholders

10. Make adjustments based on feedback.

11. Repeat the process to solve a new problem.
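To make the middle of this path more concrete, here is a minimal sketch of steps 2 to 8 on a tiny, invented data set. The numbers, the choice of a straight-line model fitted by ordinary least squares, and the `predict` helper are all assumptions made for illustration; a real project would pick data and models to suit the question being asked.

```python
from statistics import mean

# Step 2 (acquire): hypothetical (ad_spend, sales) pairs, invented for illustration.
raw = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, None), (5.0, 11.0), (6.0, 12.8)]

# Step 3 (process and clean): drop rows with missing values.
data = [(x, y) for x, y in raw if x is not None and y is not None]

# Step 5 (explore): simple summary statistics.
xs, ys = zip(*data)
x_mean, y_mean = mean(xs), mean(ys)

# Steps 6-7 (choose and apply a model): least-squares line y = a + b * x.
b = sum((x - x_mean) * (y - y_mean) for x, y in data) / sum((x - x_mean) ** 2 for x in xs)
a = y_mean - b * x_mean

def predict(x):
    """Predict y for a given x using the fitted line."""
    return a + b * x

# Step 8 (measure): mean absolute error on the same data used for fitting.
mae = mean(abs(predict(x) - y) for x, y in data)
print(f"slope={b:.3f} intercept={a:.3f} mae={mae:.3f}")
```

Measuring error on the data used for fitting gives an optimistic picture; in practice the "measure and improve" step would normally use data the model has not seen.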




 

 

 

 


Data Scientist: The Sexiest Job of the 21st Century

As we are living in the Big Data Era, Data Science is becoming a very promising field to harness and process huge volumes of data generated from various sources. Data Science is a vast discipline in itself, consisting of specialized skill sets such as statistics, mathematics, programming, computer science and so on. Data science consists of several elements, techniques and theories including math, statistics, predictive analysis, data modelling, data engineering, data mining, and visualization.

In this modern era, data scientists are the super powered heroes who lead the digital world.

Who actually is a data scientist? What do they actually do? Are they struggling with data all day and night, or experimenting in their laboratory with complex mathematics?

Let’s explore!

There are several definitions of a data scientist. In simple words, a data scientist is a professional responsible for collecting, analyzing, and interpreting extremely large amounts of data.

They’re part mathematician, part computer scientist and part trend-spotter. And, because they straddle both the business and IT worlds, they’re highly sought-after and well-paid.


They’re also a sign of the times. Data scientists weren’t on many radars a decade ago, but their sudden popularity reflects how businesses now think about big data. That unwieldy mass of unstructured information can no longer be ignored and forgotten. It’s a virtual gold mine that helps boost revenue – as long as there’s someone who digs in and unearths business insights that no one thought to look for before.

To get a better idea, let’s look at some definitions of a data scientist from different popular websites.


  • Data scientists are big data wranglers, gathering and analyzing large sets of structured and unstructured data. A data scientist’s role combines computer science, statistics, and mathematics. They analyze, process, and model data then interpret the results to create actionable plans for companies and other organizations. (Masters in Data Science)

  • Data Scientist practices the art of Data Science. (Edureka)

  • A data scientist is a professional responsible for collecting, analyzing and interpreting extremely large amounts of data. The data scientist role is an offshoot of several traditional technical roles, including mathematician, scientist, statistician and computer professional. This job requires the use of advanced analytics technologies, including machine learning and predictive modelling. (TechTarget)


Inspiring Facts: Top 5 Data Scientists (from AnalyticsInsights)

 

1. Geoffrey Hinton

Geoffrey Hinton is called the Godfather of Deep Learning in the field of data science. Mr Hinton is best known for his work on neural networks and artificial intelligence. Holding a PhD in artificial intelligence, he is credited for his exemplary work on neural nets.

Twitter- @geoffreyhinton

 

Awards– A.M. Turing Award (2018), BBVA Foundation Frontiers of Knowledge Award in Information and Communication Technologies (2016), IEEE Frank Rosenblatt Award (2014), IJCAI Award for Research Excellence (2005), Rumelhart Prize (2001).

 

2. Jeff Hammerbacher

 

Credited with co-coining the term “Data Science”, Jeff Hammerbacher developed methods and techniques for capturing, storing, and analysing large amounts of data. Credited with starting Facebook’s data science team, he threw his weight behind adopting Hadoop, enabling the social media giant’s data team to process tons of data in real time at lightning-fast speed. Mr Hammerbacher is a co-founder of Cloudera and has also been an instructor at the Icahn School of Medicine.

Twitter- @hackingdata

Book- Beautiful Data

 

3. Dhanurjay Patil

Dhanurjay Patil is a former US Chief Data Scientist, and along with Jeff Hammerbacher he coined the term “data science”. Holding a doctorate in Applied Mathematics from the University of Maryland College Park, the distinguished Dhanurjay Patil has been a principal consultant to many blue-chip companies, including LinkedIn, Skype, Salesforce, PayPal, eBay, and Greylock Partners.

Twitter- @dpatil

Awards– Medal for Distinguished Public Service

 

4. Alex “Sandy” Pentland

 

Alex “Sandy” Pentland was named one of the world’s seven most powerful data scientists, along with Larry Page, by Tim O’Reilly in 2011. Mr Pentland also founded and leads an MIT-wide program that works actively in pioneering computational social science using Big Data and AI. A serial entrepreneur, he co-leads the World Economic Forum Big Data and Personal Data initiatives and is a founding member of the Advisory Boards for Motorola Mobility, Telefonica, Nissan, and a variety of start-up firms.

Mr Pentland leads the Media Lab Entrepreneurship Program promoting companies using cutting edge technologies to solve real-world problems. Mr Pentland is also an advisor to the Enigma Project & Endor.

Twitter- @alex_pentland

Awards– McKinsey Award from Harvard Business Review, Brandeis Award, The 40th Anniversary of the Internet (from DARPA)

 

5. Dean Abbott

Founder and president of Abbott Analytics, Dean Abbott is a seasoned data science professional. With over 21 years of enriching experience, he is adept at deploying advanced and complex data mining techniques into data preparation and data visualization.

Mr Abbott is credited for his outstanding expertise in fraud detection mechanics, data and modelling, missile guidance, survey analysis, predictive toxicology, and signal processing.

Twitter- @deanabb

Books – IBM SPSS Modeler Cookbook and Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst.

 

 

Sunday, February 28, 2021

Inspiring Facts: 5 Famous Companies and Brands That Use Data Science to Improve Their Performance


This is a little effort to show the power behind Data Science. Let’s take a closer look at some companies and brands using such platforms to improve performance and efficiency and deliver better customer experiences.

 

#01 Walmart



  • Walmart uses data mining to discover patterns in point of sales data. Data mining helps Walmart find patterns that can be used to provide product recommendations to users based on which products were bought together or which products were bought before the purchase of a particular product.
  • A familiar example of effective data mining through the association rule learning technique at Walmart is the finding that Strawberry Pop-Tarts sales increased 7 times before a hurricane. After identifying this association between hurricanes and Strawberry Pop-Tarts through data mining, Walmart places all the Strawberry Pop-Tarts at the checkouts before a hurricane.
  • Another noted example comes from Halloween, when sales analysts at Walmart, looking at the data in real time, found that although a specific cookie was popular across all Walmart stores, there were 2 stores where it was not selling at all. The situation was immediately investigated, and it was found that a simple stocking oversight had kept the cookies from being put on the shelves. The issue was rectified immediately, which prevented further loss of sales.
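Association rule learning, mentioned in the examples above, scores how often items appear together. The sketch below is a simplified illustration and not Walmart's actual method: it uses a handful of invented shopping baskets and computes the usual support, confidence, and lift measures for item pairs. The basket contents, the `pair_rules` helper, and the `min_support` threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; items and counts are invented for illustration only.
baskets = [
    {"pop-tarts", "water", "batteries"},
    {"pop-tarts", "water"},
    {"pop-tarts", "batteries"},
    {"bread", "milk"},
    {"water", "batteries"},
    {"bread", "pop-tarts", "water"},
]

def pair_rules(transactions, min_support=0.3):
    """Yield (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):       # each pair gives two directed rules
            confidence = count / item_counts[x]       # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # > 1 means bought together more than chance
            yield x, y, support, confidence, lift

rules = list(pair_rules(baskets))
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than their individual popularity would predict, which is the kind of signal that can inform product placement or recommendations.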

 #02 McDonald's



McDonald’s is another famous company that uses data science to improve its performance. Its updated mobile app allows customers to order and pay almost entirely via their mobile devices. To make the experience that much more enjoyable, customers gain access to exclusive deals, too. In return for the convenience, McDonald’s collects essential information about its audience. It can see what foods and services customers order, how often they order, and whether they visit the drive-thru or go inside. All this data allows for more targeted promotions and offers. In fact, Japanese customers using the company’s mobile app spend an average of 35 percent more because of spot-on recommendations just before they are ready to order food.


#03 Spotify



Spotify is another brand that uses big data for a better user experience. It uses AI and big data to deliver better playlists and streaming content recommendations to its users. The Discover Weekly feature is an excellent example of this in action. Each week, Spotify offers every user a personalized playlist with music recommendations based on their listening and browsing history. It’s kind of like a curated mixtape from the platform, offering new tracks and artists, showing you new genres you might enjoy or even updating you on your favorite music.

This feature is possible thanks to a vast trove of information and data they collect from their user base. When you have millions of people listening to music every day, you gain some pretty deep insights into user habits and preferences.

The company has also launched a “Spotify for Artists” app that lets bands and music artists see analytics related to their content.

 

#04 Amazon



The online retail giant has access to a massive amount of data on its customers; names, addresses, payments and search histories are all filed away in its data bank.

While this information is obviously put to use in advertising algorithms, Amazon also uses the information to improve customer relations, an area that many big data users overlook.

The next time you contact the Amazon help desk with a query, don't be surprised when the employee on the other end already has most of the pertinent information about you on hand. This allows for a faster, more efficient customer service experience that doesn't include having to spell out your name three times.


#05 Coca-Cola



The company collects data on its customers to boost current consumption and upsell new products, which has led to a more efficient operation that cuts costs and boosts profits. As consumers share their opinions of the product through social media, phone or email, it allows the company to adjust its approach and better align with consumer interests and demands. The data the company collects is aimed at improving the brand experience and developing greater customer loyalty.


Saturday, February 27, 2021

Why Data Science, and Why Is It So Important?



Now we have a clear idea of what data science is and of its history. We will now explore why we need something like data science and why it is important to the world.

Before that, we will look at why data matters so much in the current world.

Data is the electricity of today's world, the fuel that runs it. As mentioned in the first article, we are living in the age of the fourth industrial revolution, which is the era of Artificial Intelligence and Big Data. There is a massive data explosion that has driven the development of new technologies and smarter products. Around 2.5 exabytes of data is created each day. The need for data has risen tremendously in the last decade.

Just think of the amount of data produced every millisecond throughout the world! Then imagine the world without data science! All that data would be just another raw material, mere rubbish to be gathered and disposed of without any use.

Before data scientists, there were statisticians who worked with data. These statisticians were experienced in qualitative analysis of data, and companies employed them to analyze their overall performance and sales. With the advent of modern computing, cloud storage, and analytical tools, the field of computer science merged with statistics.

 This gave birth to Data Science!

Data is magic, and data scientists are the wizards who know how to use it insightfully. A data scientist knows how to dig meaningful information out of whatever data they come across, and helps steer the company in the right direction.

In summary, data science, or data-driven science, enables better decision making, predictive analysis, and pattern discovery. It lets you:

·         Find the leading cause of a problem by asking the right questions.

·         Perform exploratory study on the data.

·         Model the data using various algorithms. 

·         Communicate and visualize the results via graphs, dashboards, etc.

In practice, data science is already helping the airline industry predict disruptions in travel to alleviate the pain for both airlines and passengers. With the help of data science, airlines can optimize operations in many ways, including:

·         Plan routes and decide whether to schedule direct or connecting flights.

·         Build predictive analytics models to forecast flight delays.

·         Offer personalized promotional offers based on customers booking patterns. 

·         Decide which class of planes to purchase for better overall performance.

