Life Cycle Of Data Science Projects!

From Collecting Data Till Model Deployment (End to End)

SagarDhandare
3 min read · Jun 20, 2021

1. Data Collection

Data collection is the first step in the data science life cycle and one of the most important. The data can come from various places like the internet, company records, databases, and many more…
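As a rough illustration, here is a minimal pandas sketch of loading data from a local CSV file (the file name is a placeholder, not something from this article):

```python
import pandas as pd

# Load a local CSV file into a DataFrame (file name is a placeholder)
df = pd.read_csv("customer_data.csv")

# Data could also come from a database or a web API, e.g. with
# pd.read_sql(...) or pd.read_json(...), depending on the source.
print(df.shape)
```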

2. Exploratory Data Analysis

After collecting the data, we need to do exploratory data analysis. It is the process of visualizing, summarizing, and interpreting the information hidden in the rows and columns of the data.
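A minimal EDA sketch with pandas and matplotlib might look like this (the file name is again a placeholder):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customer_data.csv")  # placeholder file name

print(df.info())          # column types and non-null counts
print(df.describe())      # summary statistics for numeric columns
print(df.isnull().sum())  # missing values per column

# Quick visual check of the distributions of the numeric features
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()
```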

3. Feature Engineering

a. Handling Missing Values: Missing values are one of the most common problems you see when doing feature engineering/data preparation. The main reasons for missing values are human error, data privacy constraints, and so on…

b. Handling Duplicate Data: We usually have to remove duplicate records from our dataset because they may lead to an overfitting problem.

c. Handling Outliers: As most machine learning and deep learning algorithms are sensitive to outliers, keeping them can make training take longer and give us a less accurate model and poor results.

d. Handling Categorical Features: As most algorithms can't work directly with categorical data, we need to convert it into numeric values.

e. Handling Imbalanced Data: An imbalance occurs when one or more classes have a very low proportion in the training data compared to the other classes. If we don't handle the imbalance, our model gives poor predictive performance, specifically for the minority class. A short code sketch covering steps a–e follows this list.
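Here is a minimal pandas sketch of steps a–e, assuming a tabular dataset loaded from a placeholder CSV file (the column handling is illustrative only; the target column, if categorical, should be treated separately):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("customer_data.csv")  # placeholder file name

# a. Missing values: fill numeric columns with the median,
#    categorical columns with the most frequent value
num_cols = df.select_dtypes(include=np.number).columns
cat_cols = df.select_dtypes(exclude=np.number).columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

# b. Duplicates: drop exact duplicate rows
df = df.drop_duplicates()

# c. Outliers: clip numeric values to the 1.5 * IQR range
for col in num_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# d. Categorical features: one-hot encode the remaining text columns
df = pd.get_dummies(df, columns=list(cat_cols))

# e. Imbalanced data: one option is oversampling the minority class in the
#    training split only, e.g. with SMOTE from the imbalanced-learn package,
#    or passing class_weight="balanced" to models that support it.
```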

4. Feature Scaling

After performing the feature engineering part, we need to do feature scaling. The goal of feature scaling is to bring all the features onto the same scale, as each feature may vary over a different range.
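A small sketch using scikit-learn's StandardScaler and MinMaxScaler on toy data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # toy data

# Standardization: zero mean, unit variance per feature
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: squeeze every feature into the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std)
print(X_minmax)
```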

5. Feature Selection

Feature selection is used for removing irrelevant and unnecessary features, so that the model is trained only on the features that actually matter for prediction.
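One possible approach is univariate selection with scikit-learn's SelectKBest (the built-in breast cancer dataset is used here purely as an illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)

# Keep only the 10 features with the strongest univariate
# relationship (ANOVA F-test) to the target
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)
```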

6. Train-Test Split

We split our data into training and testing sets to keep the model from overfitting and to see how it performs on unseen data.
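A typical split with scikit-learn's train_test_split (again using a built-in dataset for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the data for testing; stratify keeps the class
# proportions similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```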

7. Model Creation

We will train a model on the training data by giving a machine-learning algorithm the chance to learn from that data and make predictions on future, unseen data.
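As one possible example (not necessarily the algorithm you would pick for a real project), a random forest classifier trained and evaluated with scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit a model on the training data, then check it on the held-out test set
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```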

8. Hyperparameter Tuning

Hyperparameter tuning is the process of choosing the optimal settings of the learning algorithm so that our model can solve the data science problem effectively.
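One common way to do this is a grid search with cross-validation, sketched here with scikit-learn's GridSearchCV (the parameter grid is only illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Try a small grid of hyperparameter values with 5-fold cross-validation
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```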

9. Model Deployment

Model deployment is the last stage in the data science project life cycle. The main goal of building a data science model is to solve a problem, and a model can only do that once it is in production and actively used by consumers.
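As one possible sketch (the article doesn't prescribe a specific tool), a trained model could be saved with joblib and served behind a small Flask API; the file name and endpoint here are placeholders:

```python
import joblib
from flask import Flask, jsonify, request

# Load the trained model saved earlier, e.g. with
# joblib.dump(model, "model.pkl") (file name is a placeholder)
model = joblib.load("model.pkl")

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[...], [...]]}
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```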

Please feel free to drop your comments, advice, or point out any mistakes.😊

Connect me on: LinkedIn | GitHub | Email

Happy Learning!!! ^_^
