Quick Intro to "Data Science Life Cycle"

Hi 👋, I'm Srishti, a Machine Learning Enthusiast

🔭 I’m a self-motivated coding enthusiast seeking an opportunity to work in a challenging environment to enhance my skills and knowledge. I aim to use my problem-solving skills to bring advancements to the technical picture of the day.

🌱 I’m currently learning Machine Learning.

👯 I’m looking to collaborate on any problem related to data science and computer vision.

💬 Ask me about anything.

📫 How to reach me: msrishtimahajan@gmail.com

Data Acquisition

Today, much of the data we work with is unstructured. It comes in multiple formats (numbers, images, videos, text, etc.) and is spread across multiple sources.

  • Data is collected from multiple sources.
  • It is then stored in a central repository, which can be called a 'Data Warehouse'. The collected data has many data points and different structures, which are integrated into a single data structure before being stored in the data warehouse.
  • From the data warehouse, we extract the target data, i.e., only the essential data points required to solve the problem.
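
The three steps above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the two sources (`CSV_SOURCE`, `JSON_SOURCE`), the field names, and the helper functions are all hypothetical, and a plain list stands in for a real data warehouse.

```python
import csv
import io
import json

# Two hypothetical sources holding similar records in different formats.
CSV_SOURCE = "name,age,salary,city\nAsha,34,900000,Pune\nRavi,28,600000,Delhi\n"
JSON_SOURCE = '[{"name": "Meera", "age": 41, "salary": 1200000, "team": "Ops"}]'


def load_csv(text):
    """Parse CSV text into a list of dicts, converting numeric fields to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["age"] = int(row["age"])
        row["salary"] = int(row["salary"])
        rows.append(row)
    return rows


def load_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


# Stand-in "warehouse": records from every source gathered in one list.
warehouse = load_csv(CSV_SOURCE) + load_json(JSON_SOURCE)

# Target data: keep only the fields needed for the problem at hand.
TARGET_FIELDS = ("name", "age", "salary")
target_data = [{field: rec[field] for field in TARGET_FIELDS} for rec in warehouse]

print(target_data)
```

Extra fields such as `city` or `team` stay in the warehouse but are left out of the target data, which mirrors the idea of taking only the essential data points.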

Data Pre Processing

This stage is also known as Exploratory Data Analysis (EDA). It usually takes up more time than any other stage in the data science life cycle.

  • The acquired data is raw and must be converted into tidy data before ML algorithms can be applied to it.
  • EDA, or pre-processing, involves two tasks:
      • Data Manipulation: Consider a huge spreadsheet with 10,000 columns and 1 million rows, in which you need to find all employees whose salary is above 8 lakh rupees (₹800,000) and whose age is above 30. Instead of checking manually, we use data manipulation techniques in languages such as Python, R, or SQL. A single line of code is often enough to find all the matching names.
      • Data Visualization: Plotting the data as graphs gives a better view of it and makes insights easier to spot than in a raw spreadsheet.
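
As a minimal sketch of the salary-and-age filter described above, here is one way to write it in plain Python. The employee records are made up for illustration, and a real dataset of that size would more likely live in a database or a dataframe than in a list of dicts.

```python
# Hypothetical employee records; salaries are in rupees (8 lakh = 800,000).
employees = [
    {"name": "Asha", "age": 34, "salary": 900_000},
    {"name": "Ravi", "age": 28, "salary": 1_000_000},
    {"name": "Meera", "age": 41, "salary": 750_000},
    {"name": "Karan", "age": 45, "salary": 1_500_000},
]

SALARY_THRESHOLD = 800_000
AGE_THRESHOLD = 30

# The "one line" filter: keep the names where both conditions hold.
matches = [e["name"] for e in employees
           if e["salary"] > SALARY_THRESHOLD and e["age"] > AGE_THRESHOLD]

print(matches)  # ['Asha', 'Karan']
```

Ravi is excluded on age and Meera on salary, so only the rows that satisfy both conditions survive.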

Machine Learning

Once you have tidy data, you can build ML models, choosing the algorithm according to the problem statement.

Common types of ML algorithms include:

  1. Classification: Assigns each data point to one of a set of predefined groups (classes).
  2. Regression: Predicts a continuous numeric value.
  3. Clustering: Groups similar data points into segments without predefined labels.
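
To make the difference between the three tasks concrete, here is a toy sketch in pure Python on one-dimensional data. These hand-written functions (`fit_line`, `nearest_mean_classifier`, `two_means`) are simplified stand-ins chosen for illustration, not the algorithms a real project would necessarily use, and the sample values are invented.

```python
from statistics import mean


# Regression: fit y = slope * x + intercept by ordinary least squares.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


# Classification: give a new value the label of the nearest class mean.
def nearest_mean_classifier(labelled):
    """labelled maps each label to a list of 1-D training values."""
    centres = {label: mean(values) for label, values in labelled.items()}
    return lambda x: min(centres, key=lambda label: abs(x - centres[label]))


# Clustering: 1-D k-means with two clusters; no labels are provided.
# Assumes the values are not all identical.
def two_means(values, iterations=10):
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0

classify = nearest_mean_classifier({"small": [1, 2, 3], "large": [10, 11, 12]})
print(classify(4), classify(9))  # small large

print(two_means([1, 2, 3, 10, 11, 12]))  # ([1, 2, 3], [10, 11, 12])
```

Note that classification needs labelled examples up front, while clustering discovers the two groups from the values alone.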

Pattern Evaluation

Here, you evaluate the results produced by the ML algorithms, i.e., you check their accuracy and usefulness. If the accuracy is too low, you tweak the model and evaluate again until it improves.
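
A minimal sketch of such an accuracy check is shown below. The labels, the `accuracy` helper, and the `ACCEPTABLE` threshold are all assumptions made for illustration; what counts as acceptable depends on the problem.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical true labels and model predictions for five held-out examples.
actual = ["spam", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "ham"]

score = accuracy(predicted, actual)
ACCEPTABLE = 0.9  # assumed threshold; the right value depends on the problem

if score < ACCEPTABLE:
    print(f"accuracy {score:.2f} is below {ACCEPTABLE}: tweak the model and re-evaluate")
else:
    print(f"accuracy {score:.2f} is acceptable")
```

Here four of the five predictions are correct, so the score is 0.80 and the model would go back for another round of tuning.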

Knowledge Representation

Once the accuracy is good enough, you can treat the result as final and present it to stakeholders or clients in the form of graphs. This stage is known as knowledge representation.

That's all about the data science life cycle!