Hi, I am a Data Scientist with four years of experience in AI, particularly in Machine Learning and Deep Learning. During these four years, I have interviewed with almost 30 data science companies, including Wipro, Tata Technologies, Oracle, Cognizant, Tech Mahindra, Mphasis, Capgemini, Robert Bosch, and more.
Prepare for your upcoming interviews with these essential Data Science Interview Questions to boost your confidence and readiness.

Must-Know Data Science Interview Questions
Interviews usually open with an introduction, so be ready to talk about yourself and your projects before the technical questions begin.
Q1. Tell me about yourself. (all companies)
Ans. Introduce yourself based on your own experience and background.
Q2. How many projects have you done? Explain any one, or the latest. (all companies)
Ans. It depends on the opening: if the role is ML-focused, they will ask about an ML project; if it is DL-focused, they will ask about a deep learning project.
Tailor your explanation to the role you are interviewing for.
Q2. What is the difference between feature engineering and exploratory data analysis? (Capgemini)
Ans. Exploratory data analysis (EDA) focuses on understanding the overall characteristics and patterns within a dataset by visualizing and summarizing the data,
while feature engineering actively transforms raw data into new, potentially more informative features specifically designed to improve the performance of a machine learning model.
📌 Techniques Used in Feature Engineering
✅ Feature Creation → Generating new features from existing ones
✅ Feature Transformation → Scaling, encoding categorical data, log transformations
✅ Feature Selection → Removing irrelevant or redundant features
✅ Dimensionality Reduction → PCA, LDA to reduce feature size
✅ Handling Missing Values → Imputation, removing null values
📌 Techniques Used in EDA
✅ Descriptive Statistics → Mean, median, mode, standard deviation
✅ Data Visualization → Histograms, scatter plots, box plots, correlation heatmaps
✅ Outlier Detection → Using IQR, Z-score, boxplots
✅ Checking Missing Values → Finding null values
✅ Feature Relationships → Correlation analysis
Q4. Explain the performance metrics in your ML project. (asked at Bosch)
Ans. Explain the precision and recall scores you achieved in your ML project.
Q5. Which model did you use in your project, and why? Why did you not use another relevant model? (all companies)
Ans. Explain according to your ML project; typically, you chose the model because it gave the best results on your data.
Q6. What is precision? (most companies)
Ans. Precision is a metric that measures how many positive predictions are correct.
It’s calculated by dividing the number of true positives by the total number of positive predictions.
Precision = True Positives / (True Positives + False Positives)
Q7. What is the precision score in your ML project? (Robert Bosch)
Ans. State the precision score you achieved in your project.
Q8. What is the difference between precision and recall? (many companies)
Ans. Precision
The ratio of true positives to the total number of predictions
Measures how often predictions for the positive class are correct
Focuses on the accuracy of the retrieved information
Recall
The ratio of true positives to the total number of actual positive samples
Measures how well the model finds all positive instances in the dataset
Measures the completeness or comprehensiveness of the query result
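As a quick illustration with made-up confusion-matrix counts (plain Python, not a library implementation):

```python
def precision(tp, fp):
    # Precision = TP / (TP + FP): how many positive predictions are correct
    return tp / (tp + fp)

def recall(tp, fn):
    # Recall = TP / (TP + FN): how many actual positives were found
    return tp / (tp + fn)

# Hypothetical counts: 80 true positives, 20 false positives, 10 false negatives
print(precision(80, 20))  # 0.8
print(recall(80, 10))     # 0.888...
```

A model can score high on one and low on the other, which is why interviewers ask for both.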
Q8. What is PCA? If a dataset has 100 columns, how many columns will remain after PCA? (scenario-based, startup company)
Ans. Principal component analysis (PCA) reduces the number of dimensions (columns) in large datasets to principal components that retain most of the original information.
It does this by transforming potentially correlated variables into a smaller set of variables, called principal components.
There is no fixed number: the 100 columns are reduced to however many components retain the desired amount of variance (for example, enough components to preserve about 95% of the variance, which in practice might be around 10 or 15).
Q9. What is multicollinearity, and why does it matter? (Genpact)
Ans. Multicollinearity occurs when two or more independent variables (features) in a dataset are highly correlated,
meaning they provide redundant or similar information to a machine learning or statistical model.
Why is it Important?
It makes it difficult to determine which feature has the most impact on the dependent variable.
It affects the stability of regression models, leading to unreliable coefficient estimates.
It increases variance, making model predictions less accurate.
Q10. What is an outlier? (Altimetrik)
Ans. An outlier is a data point that is significantly different from other observations in a dataset.
It can be much higher or lower than the rest of the values, which may indicate an error, a rare event, or valuable hidden information
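A small sketch of IQR-based outlier detection using only the standard library (made-up data; assumes Python 3.8+ for statistics.quantiles):

```python
import statistics

def iqr_outliers(data):
    # Quartiles via the statistics module
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Anything outside [lower, upper] is flagged as an outlier
    return [x for x in data if x < lower or x > upper]

data = [10, 12, 11, 13, 12, 11, 95]  # 95 is far from the rest
print(iqr_outliers(data))  # [95]
```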
Q11. What are the assumptions of linear regression? (Indegene)
Ans. Linear regression models are based on several assumptions, including linearity, independence, normality, and homoscedasticity.
Q12. Did you use any database? Why are you using a database? (startup)
Ans. Yes, we are using a database: when the data is large, we use a database to store it.
Q13. What is imbalanced data? (Capgemini)
Ans. A dataset is imbalanced when two or more classes in the dataset are not represented in roughly equal proportions.
Q14. What is sampling? (Tech Mahindra)
Ans. Sampling means selecting the group that you will actually collect data from in your research. For example,
if you are researching the opinions of students in your university, you could survey a sample of 100 students.
In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
Q15. What are upsampling and downsampling? (Tech Mahindra)
Ans. Upsampling and downsampling are techniques for changing the sampling rate of data, images, or signals.
Up sampling
Increases the number of samples in a dataset
Used to increase the spatial resolution of an image
Used to zoom in on an image or eliminate pixelation
Used to correct imbalanced data in a dataset
Also known as oversampling
Down sampling
Decreases the number of samples in a dataset
Used to reduce the storage and transmission requirements of images
Used to decrease the bit rate when transmitting over a limited bandwidth
Also known as decimation
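For the imbalanced-data use case above, here is a minimal sketch with Python's random module (toy class labels, made-up sizes):

```python
import random

random.seed(0)  # for reproducibility

majority = ['A'] * 10   # 10 samples of class A
minority = ['B'] * 3    # 3 samples of class B

# Upsampling (oversampling): draw minority samples with replacement
upsampled_minority = random.choices(minority, k=len(majority))

# Downsampling (undersampling): draw a subset of the majority without replacement
downsampled_majority = random.sample(majority, k=len(minority))

print(len(upsampled_minority))   # 10 -> classes balanced at 10 vs 10
print(len(downsampled_majority)) # 3  -> or balanced at 3 vs 3
```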
Q15. Which tools do you use when sending your code to your manager? (asked by almost all companies)
Ans. It depends on what your company uses, e.g., Jira.
Q16. What is a p-value? (Capgemini)
Ans. The p-value (probability value) is a statistical measure that helps
determine whether the results of an experiment or hypothesis test are significant or occurred by chance.
Why is the P-Value Important?
The p-value tells us if we should reject or fail to reject a null hypothesis (H₀) in statistical testing.
A small p-value (≤ 0.05) → Strong evidence against the null hypothesis → Reject H₀ (significant result)
A large p-value (> 0.05) → Weak evidence against the null hypothesis → Fail to reject H₀ (not significant)
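As a small worked example (a hypothetical coin-flip experiment, computed exactly with math.comb rather than a statistics library):

```python
from math import comb

def p_value_heads(n, k):
    # One-sided p-value: probability of seeing k or more heads
    # in n fair coin flips (exact binomial computation under H0: fair coin)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 9 heads out of 10 flips: is the coin fair?
p = p_value_heads(10, 9)
print(round(p, 4))  # 0.0107 -> p <= 0.05, reject H0 (coin likely biased)
```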
Q17. What is hypothesis testing? (Oracle)
Ans. Hypothesis testing is a statistical method used to determine whether there is enough evidence in a dataset to support a certain claim (hypothesis) about a population.
It helps in decision-making by testing an assumption using sample data and drawing conclusions about the entire population.
Q18. What is a random forest? (most companies)
Ans. Random forest is a machine learning algorithm that combines multiple decision trees to make predictions. It is a popular and flexible method for classification and regression tasks.
How it works
Each decision tree in the forest is trained on a random subset of the data.
Each tree considers a random subset of features when making splits.
For classification tasks, the forest predicts by majority voting among trees.
For regression tasks, it averages the predictions.
Why it’s used
It’s easy to use and flexible.
It can handle both classification and regression problems.
It’s robust to overfitting.
It performs well with large datasets.
It can handle both numerical and categorical features.
Q19. What is a decision tree? (more than 10 companies)
Ans. A decision tree is a flowchart-like diagram that shows the possible outcomes of a series of decisions.
It’s a popular tool for making decisions, solving problems, and building predictive models.
How it works
Start with a main idea, or root node
Branch out based on the consequences of each decision
Each branch represents an alternative course of action
Each branch ends in a node representing a chance event, a further decision, or a final outcome
Q20. What is bias? (most companies)
Ans. Bias means that the model favors one result more than others; it is the simplifying assumption made by a model to make the target function easier to learn.
A model with high bias pays very little attention to the training data and oversimplifies the problem. It leads to high error on both training and test data.
Q21. What is variance? (some ML companies)
Ans. Variance is the amount by which the estimate of the target function would change if different training data were used.
A model with high variance pays a lot of attention to the training data and does not generalize to data it has not seen before.
As a result, such a model performs very well on training data but has high error on test data.
Q22. What is overfitting? (almost all companies)
Ans. Overfitting is a modeling error that occurs when a model fits too closely to a given set of data. This can make the model less accurate at predicting new data.
How it happens
A model can overfit when it learns too much from the training data, including irrelevant details like noise or outliers
A model can overfit when it’s too complex or trains for too long on sample data
How to identify overfitting
Check validation metrics like accuracy and loss
Compare the model’s performance on training data to its performance on evaluation data
Q22. What is underfitting? (almost all companies)
Ans. Underfitting occurs when a machine learning model is too simple to capture the patterns in the training data. This results in poor performance on both the training data and new data.
Causes
Insufficient data: There may not be enough data or the right type of data for the task.
Inadequate training time: The model may not have had enough time to train.
Model complexity: The model may not be complex enough to represent the relationships in the data.
Signs of underfitting
High error rate on both the training set and unseen data
High bias and low variance
Q22. What is the bagging technique? (some companies)
Ans. Bagging (bootstrap aggregating) is a machine learning technique that combines multiple models to improve the accuracy and stability of a predictive model.
It’s a popular ensemble learning method that’s used for classification and regression tasks.
How bagging works
Randomly select subsets of the training data with replacement
Train a model on each subset
Combine the predictions from each model to get a more accurate prediction
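The three steps above can be sketched in toy form (purely illustrative: each "weak model" here just memorizes the majority class of its bootstrap sample, standing in for a real decision tree):

```python
import random
from collections import Counter

random.seed(42)

# Toy "training set" of class labels
labels = ['spam', 'ham', 'spam', 'spam', 'ham', 'spam']

def train_weak_model(data):
    bootstrap = random.choices(data, k=len(data))       # sample with replacement
    return Counter(bootstrap).most_common(1)[0][0]      # majority class of the sample

# Train several models on different bootstrap samples, then vote
predictions = [train_weak_model(labels) for _ in range(11)]
final = Counter(predictions).most_common(1)[0][0]
print(final)  # most bootstrap samples are 'spam'-heavy
```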
Q23. What is the boosting technique? (more than 10 companies)
Ans. Boosting combines many weak learners so that together they form a strong learner; each new learner focuses on the mistakes of the previous ones. Unlike bagging, it is not a bootstrap technique: models are trained sequentially rather than independently.
Q24. What is entropy? (ML companies)
Ans. Entropy is a measure of disorder or impurity in a dataset, and it is used in decision trees to split data into more homogeneous subsets.
Q25. What is the Gini index? (ML companies)
Ans. The Gini index is a measure of how impure or random a dataset is. It is used in decision trees to evaluate splits and reduce impurities.
How it works
The Gini index ranges from 0 up to a maximum below 1 (0.5 for a two-class problem).
A Gini index of 0 indicates a pure dataset, where all elements belong to the same class.
Higher values indicate an impure dataset, where elements are spread across different classes.
The Gini index aims to reduce impurities from the root nodes to the leaf nodes of a decision tree.
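Both measures can be computed directly from class probabilities; here is a small illustrative sketch in plain Python:

```python
from math import log2

def entropy(probs):
    # Entropy = -sum(p * log2(p)); 0 for a pure node, 1.0 for a 50/50 binary split
    return -sum(p * log2(p) for p in probs if p > 0)

def gini(probs):
    # Gini impurity = 1 - sum(p^2); 0 for a pure node, 0.5 for a 50/50 binary split
    return 1 - sum(p * p for p in probs)

print(entropy([0.5, 0.5]))  # 1.0 (maximum disorder for two classes)
print(gini([0.5, 0.5]))     # 0.5
print(gini([1.0]))          # 0.0 (pure node)
```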
Q26. What is clustering? (Mphasis)
Ans. Clustering in machine learning is a technique that groups similar data points into clusters. It is an unsupervised learning method, which means it does not require prior knowledge or labeled data.
How does clustering work?
A clustering algorithm analyzes the data points
It groups data points that are similar to each other into clusters
It identifies natural groupings or patterns in the data
Why use clustering?
Clustering can help identify patterns and connections in data
It can be used to group data into separate groups based on similarities
It can help identify groups of similar records and label them
Q27. What is k-means clustering, and what is "k" in k-means clustering? (Wipro)
Ans. K-Means clustering is an unsupervised machine learning algorithm used for grouping similar data points into "K" clusters.
It is commonly used in pattern recognition, customer segmentation, and anomaly detection.
What is “K” in K-Means Clustering?
“K” is the number of clusters you want to divide your data into.
If K = 3, the algorithm will group data into 3 clusters.
Choosing the right K is crucial; it is often determined using the Elbow Method or Silhouette Score.
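A bare-bones 1-D version of the algorithm (a toy sketch of Lloyd's iteration on made-up points, not production code):

```python
import statistics

def kmeans_1d(points, centroids, iterations=10):
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [statistics.mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# K = 2: the algorithm groups the data into 2 clusters
centroids, clusters = kmeans_1d([1, 2, 10, 11], centroids=[1, 11])
print(centroids)  # [1.5, 10.5]
print(clusters)   # [[1, 2], [10, 11]]
```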
Q28. What is a neural network? (some DL companies)
Ans. A neural network is a machine learning model that makes decisions in a way inspired by the human brain.
It uses processes that mimic how biological neurons work together.
Q29. What is backpropagation? (startup company)
Ans. Backpropagation is a machine learning algorithm that trains artificial neural networks by adjusting weights and biases.
It’s a key part of neural network learning.
How it works
Forward pass: Input data is passed through the network’s layers to generate an output.
Error calculation: The difference between the predicted output and the actual (target) output is calculated.
Backward pass: The error is propagated backward through the hidden layers.
Weights update: The weights are adjusted to reduce the error.
Q30. What is an activation function? (almost all companies)
Ans. An activation function is a mathematical function that determines the output of a neuron in an artificial neural network.
It is a key part of neural networks that enables them to learn complex patterns in data.
How activation functions work
The activation function takes the weighted sum of inputs and a bias term
It decides whether to activate a neuron based on this sum
It transforms the input signal into an output signal
The output signal is then passed on to the next layer
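Two common activation functions can be sketched with the math module (illustrative only):

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def relu(x):
    # Passes positive values through, zeroes out negatives
    return max(0.0, x)

print(sigmoid(0))  # 0.5
print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5
```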
Q31. What is VGG16? What is the difference between ResNet-50 and VGG16? (Nityo Infotech)
Ans. VGG16 (Visual Geometry Group 16) is a deep convolutional neural network (CNN) architecture developed by the Visual Geometry Group at Oxford.
It was introduced in the 2014 ILSVRC (ImageNet Large Scale Visual Recognition Challenge) and is widely used for image classification, feature extraction, and transfer learning.
Key Differences: VGG16 vs. ResNet-50
| Feature | VGG16 | ResNet-50 |
| --- | --- | --- |
| Developed By | Oxford (VGG) | Microsoft |
| Year | 2014 | 2015 |
| Number of Layers | 16 | 50 |
| Architecture Type | Sequential CNN | Residual Learning (Skip Connections) |
| Vanishing Gradient Problem | Yes (deeper versions struggle) | No (solved using skip connections) |
| Number of Parameters | ~138M | ~25.6M |
| Computational Cost | High | Lower than VGG16 |
| Accuracy on ImageNet | Good | Better than VGG16 |
| Training Time | Slower | Faster |
| Best For | Simple tasks, transfer learning | Complex tasks, deep learning applications |
Q32. What is a pretrained model? (some DL companies)
Ans. In AI, pre-trained models are neural network models that have already been trained on a large dataset to perform a specific task.
Q33. How many images did you use in your deep learning project? (Wipro)
Ans. State the number of images you used, based on your project.
Q34. Who provided these images? (Wipro)
Ans. Explain the sources your images were taken from.
Q35. What challenges did you face in your project, and how did you overcome them? (all companies)
Ans. Describe where you got stuck in your project and how you resolved it.
Q35. Which model did you use in your deep learning project? Explain it. (almost all companies)
Ans. Explain whichever model you worked with in your project, such as SSD, YOLO, or another.
Q36. What is an early stopping criterion? (DL startup company)
Ans. Early stopping is a technique where training is halted when the model's performance on a validation dataset starts to degrade, indicating that the model is overfitting the training data and should not be trained further.
This prevents excessive complexity and improves generalization: training stops when the model begins to perform worse on unseen data despite continued improvement on the training set.
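A minimal sketch of the idea using a hypothetical list of per-epoch validation losses (no real training loop; the loss values are made up):

```python
def train_with_early_stopping(val_losses, patience=2):
    # Stop once validation loss has not improved for `patience` consecutive epochs
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # stop training here
    return len(val_losses) - 1

# Validation loss improves, then degrades -> training halts early
losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8]
print(train_with_early_stopping(losses))  # 4
```

Frameworks offer this as a built-in callback, but the underlying logic is this simple patience counter.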
Q37. What is gradient descent? (almost all companies)
Ans. Gradient descent is a mathematical algorithm that finds the lowest value of a function.
It’s often used to train machine learning models and neural networks.
How it works
Start from a random initial point
Find the direction in which the function decreases the most (the negative gradient)
Take a small step in that direction
Repeat until the function value stops decreasing, i.e., you are close to a minimum
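These steps can be shown numerically on a hypothetical function, f(x) = (x - 3)^2, whose gradient is 2(x - 3):

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    # Repeatedly step against the gradient to approach a minimum
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; the true minimum is at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # 3.0
```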
Q38. What is the vanishing gradient problem? (more than 10 companies)
Ans. The vanishing gradient problem is a challenge that occurs when gradients become very small during training of deep neural networks.
This can slow down or stop the learning process.
How it happens
The problem occurs during backpropagation, when the algorithm updates weights by computing gradients
The gradients become smaller as they are multiplied during backpropagation through the layers
This is more likely to happen in networks with many layers
Why it’s a problem
When gradients are close to zero, the model’s parameters don’t update much
This can lead to poor learning performance
It can limit the network’s ability to learn complex features
Q39. What is the exploding gradient problem? (more than 10 companies)
Ans. Exploding gradients occur when gradients in a neural network become too large, which can destabilize training and lead to poor model performance.
Causes
Initial weights: The initial values of the network’s weights can cause exploding gradients.
Network architecture: Deep neural networks with many layers can cause exploding gradients.
Activation functions: The activation functions used in the network can cause gradients to amplify.
Numerical instability: The chain rule in calculus can cause gradients to amplify as they move through the network.
Q40. What is batch gradient descent?
Ans. Batch gradient descent is an optimization algorithm used to minimize a loss function in machine learning by updating model parameters based on the entire dataset in each iteration.
It is commonly used in training models like linear regression, logistic regression, and deep learning neural networks.
Q41. What is stochastic gradient descent?
Ans. Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model parameters based on the gradient of the loss function with respect to a single training example at a time. It is commonly used in deep learning, linear regression, logistic regression, and SVMs.
Q42. Can you use an ML model for object detection? Which model would you use? (startup company)
Ans. Yes, you can, but it will be slower and less accurate than deep learning detectors.
Q43. Can you use an ML model in a deep learning project? If yes, why? If not, why not? (startup company)
Ans. Yes, you can, but it will be slower and the accuracy will be lower.
Q44. What is segmentation? (Tata Technologies)
Ans. Segmentation in deep learning is the process of dividing an image into segments or regions.
It’s a crucial step in computer vision and is used in many applications, including self-driving cars and medical imaging.
Q45. What is the k-NN model?
Ans. k-Nearest Neighbors (k-NN) is a supervised machine learning algorithm used for classification and regression. It is a non-parametric,
instance-based learning algorithm, meaning it makes predictions based on the similarity of new data points to existing ones.
How k-NN Works
1️⃣ Choose a value of k (the number of nearest neighbors).
2️⃣ Calculate the distance between the new data point and all points in the dataset (e.g., using Euclidean distance).
3️⃣ Find the k closest points (neighbors).
4️⃣ For classification: Assign the most common class among the k neighbors.
5️⃣ For regression: Compute the average of the k nearest values.
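The five steps can be sketched in plain Python for a 1-D toy dataset (made-up values and labels; distance is simple absolute difference instead of Euclidean):

```python
from collections import Counter

def knn_predict(train, new_point, k=3):
    # train: list of (value, label) pairs
    nearest = sorted(train, key=lambda pair: abs(pair[0] - new_point))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]  # majority vote

train = [(1, 'small'), (2, 'small'), (3, 'small'),
         (10, 'big'), (11, 'big'), (12, 'big')]
print(knn_predict(train, 2.5))   # 'small'
print(knn_predict(train, 10.5))  # 'big'
```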
Q46. What is the SVM model?
Ans. Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks.
It is particularly effective for high-dimensional datasets and works well for both linear and non-linear classification.
How SVM Works
1️⃣ Finds the best decision boundary (hyperplane) that maximizes the margin between different classes.
2️⃣ Support vectors are the closest data points to the hyperplane; they influence its position.
3️⃣ Objective: Maximize the margin between the nearest points from different classes.
For linearly separable data → SVM finds the best straight line (2D) or plane (3D+).
For non-linearly separable data → SVM uses kernels to transform data into higher dimensions.
Q47. What is the linear regression model? (some companies)
Ans. Linear regression is a supervised learning algorithm used for predicting continuous values by finding the best-fitting line
that minimizes the difference between actual and predicted values.
Q48. What is logistic regression?
Ans. Logistic regression involves a dependent variable that takes binary values (0/1, true/false, yes/no),
which means the outcome can only be one of two forms; for example, it can be used when we need to find the probability of a success or failure event.
Q49. What is an RNN?
Ans. A recurrent neural network (RNN) is a type of artificial neural network that can process sequential data. It can be used to solve problems like speech recognition, language translation, and time series prediction.
Q50. What is the difference between a shallow copy and a deep copy? (Cognizant)
Ans. Key differences between shallow copy and deep copy:
| Feature | Shallow Copy (copy.copy()) | Deep Copy (copy.deepcopy()) |
| --- | --- | --- |
| Creates a new object | Yes | Yes |
| Copies nested objects? | No (only references) | Yes (full copy) |
| Changes in nested objects affect the original? | Yes | No |
| Memory Usage | Lower (references only) | Higher (full copy) |
| Performance | Faster | Slower |
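A quick demonstration of the table above using Python's copy module:

```python
import copy

original = [[1, 2], [3, 4]]  # a list containing nested lists

shallow = copy.copy(original)     # new outer list, same inner lists
deep = copy.deepcopy(original)    # new outer list AND new inner lists

original[0].append(99)  # mutate a nested object

print(shallow[0])  # [1, 2, 99] -> shallow copy sees the change
print(deep[0])     # [1, 2]     -> deep copy is unaffected
```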
Q52. What is a module? (Tech Mahindra)
Ans. In Python, a module is a file containing Python code that defines functions, classes, and variables.
Modules allow you to organize your code into reusable components,
making it easier to maintain and share code across different projects.
Q53. What is a package? (Tech Mahindra)
Ans. A package in Python is a collection of modules that helps organize and reuse code efficiently.
Packages allow you to import and use pre-written functions instead of coding from scratch.
Q54. Create a table using pandas; compute the sum of individual columns, delete some rows, and compute the mean of a column. What is the groupby function? (startup)
Ans.
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
print(df)  # Display the table
Sum of Individual Columns
sum_age = df['Age'].sum()
sum_salary = df['Salary'].sum()
print("Total Age:", sum_age)
print("Total Salary:", sum_salary)
Delete Some Rows
df = df.drop(2)  # Drops the row with index 2
print(df)
Mean of a Column
mean_salary = df['Salary'].mean()
print("Mean Salary:", mean_salary)
What is groupby() in Pandas?
groupby() is used to group data based on a specific column and perform operations like sum, mean, count, etc.
grouped = df.groupby('Department')['Salary'].mean()
print(grouped)
Q59. Sort numbers without using the sort function. (more than 10 companies)
Ans.
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:  # Swap if the element is greater
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

# Example usage
numbers = [64, 25, 12, 22, 11]
sorted_numbers = bubble_sort(numbers)
print("Sorted Numbers:", sorted_numbers)
Q59. What is the difference between a list and a set? (more than 5 companies)
Ans. Difference between a list and a set in Python:
| Feature | List (list) | Set (set) |
| --- | --- | --- |
| Definition | Ordered collection of elements | Unordered collection of unique elements |
| Duplicates | Allows duplicates | Does not allow duplicates |
| Indexing | Supports indexing (list[0]) | Does not support indexing |
| Order | Maintains insertion order | Does not maintain order |
| Mutability | Mutable (can change elements) | Mutable (can add/remove elements) |
| Performance | Slower for search operations (O(n)) | Faster for search (O(1)) due to hashing |
| Use Case | Storing and accessing ordered data | Storing unique elements |
Q60. Remove duplicates without using set.
Ans.
def remove_duplicates(lst):
    unique_list = []  # Empty list to store unique elements
    for item in lst:
        if item not in unique_list:
            unique_list.append(item)
    return unique_list

# Example usage
numbers = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7]
print(remove_duplicates(numbers))
Q61. What is a pixel? (Wipro)
Ans. A pixel, short for "picture element," is the smallest unit of a digital image or display,
representing a single point of color and brightness that, when combined with others,
forms the images we see on screens.
Q62. Explain the size of the RGB color in each pixel of an image.
Ans. Each pixel in a digital image is made up of three color components: Red (R), Green (G), and Blue (B). This is known as the RGB color model.
Each color component has a value between 0 and 255 (for 8-bit images), meaning a pixel can have 256 × 256 × 256 = 16.7 million possible colors!
Q62. Can the number of colors in each pixel be more than 256? If yes, why?
Ans. Yes! The number of colors per pixel can be more than 256, depending on the bit depth of the image.
Understanding Bit Depth & Color Depth
Bit depth determines how many shades each R, G, and B channel can have.
| Bit Depth | Colors per Channel (R/G/B) | Total Colors per Pixel |
| --- | --- | --- |
| 8-bit | 256 (0–255) | 16.7 million (256³) |
| 10-bit | 1,024 (0–1023) | 1.07 billion (1024³) |
| 12-bit | 4,096 (0–4095) | 68 billion (4096³) |
| 16-bit | 65,536 (0–65535) | 281 trillion (65536³) |
| 32-bit (HDR) | Over 4.2 billion | More than trillions |
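The per-pixel totals in the table follow directly from the bit depth; a small helper (illustrative only) reproduces them:

```python
def total_colors(bits_per_channel):
    # Each of the R, G, B channels has 2**bits shades; a pixel combines all three
    shades = 2 ** bits_per_channel
    return shades ** 3

print(total_colors(8))   # 16777216 (~16.7 million, the familiar 24-bit color)
print(total_colors(10))  # 1073741824 (~1.07 billion)
```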
Q61. Check which words in a given sentence are palindromes.
Ans.
def is_palindrome(word):
    return word == word[::-1]  # Check if the word is the same when reversed

def check_palindromes(sentence):
    words = sentence.split()  # Split sentence into words
    result = {word: is_palindrome(word) for word in words}  # Check each word
    return result

# Example usage
sentence = "madam level racecar hello world noon"
palindrome_results = check_palindromes(sentence)

# Print results
for word, is_pal in palindrome_results.items():
    print(f"'{word}' is a palindrome: {is_pal}")
Note: some companies asked easy LeetCode questions; some asked medium LeetCode and easy HackerEarth questions (Tiger Analytics).