Data Science - Enterprise Edition
Multiple added modules for advanced concepts
Machine Learning and Deep Learning with Python programming, demonstrated through real-time projects across multiple business domains
Module 1 – Data Engineering Architecture
This understanding is a must for professionals who want to showcase experience in Data Engineering. Data Science is just one part of the Data Engineering ecosystem. You will be trained on high-level designs created for MNCs.
Syllabus:
-
Data Architecture: principles of building a Data Platform
-
Project methodologies, deliverables and designs
-
Data Integration concepts. ER & Dimensional modelling.
-
OLTP & OLAP: Reports, Dimensions, Facts, Star & Snowflake Schemas
-
Slowly Changing Dimensions
Module 2 – Python programming
Data Science is one of the application streams of software development. There is no alternative to scaling up as a full-stack programmer.
Syllabus
-
Program involving VM architecture
-
Popular IDEs for Python
-
Conditions, Loops, Functions, Comprehensions & Lambda functions
-
OOPs: Objects, Classes, Constructors, Methods, Modifiers, Polymorphism, Overloading, Overriding, Abstract classes, Interfaces, Multi-Threading, Multi-Processing, GPUs, Packages, Exception handling, File I/O
-
OS Module & Regular Expressions
-
Data Structures: Lists, Dictionaries, Tuples & Sets
-
NumPy Library: Data Handling
-
All important Libraries and their significance
-
Pandas Library: Data handling operations
-
Python to database connectivity with SQLite
Module 3: Data Science in a nutshell
A broad view of Data Science for strategic business decisions
-
Definition of AI, DS, ML, DL and their Applications
-
Supervised, Semi-Supervised, Unsupervised and Reinforcement Learning
-
Different Roles & Responsibilities in Data Science
-
Programming languages for Data Science and their significance
-
Pros, Cons and Challenges of Data Science
-
Data Properties and processing steps
Module 4: Descriptive statistics & Exploratory Data Analytics
Before predicting or building models, this is the common process for understanding data
-
Descriptive vs Inferential statistics
-
Mean, Median, Mode, Standard Deviation, Variance & Correlation
-
Pearson’s correlation coefficient
-
Outliers & IQR
-
Distribution: Uniform, Normal, Standard Normal distributions
-
Central Tendency
-
Measures of Variability
-
Modality
-
Chebyshev’s & Markov’s theorems
-
Skewness
-
Kurtosis
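As a small illustration of the Outliers & IQR topic listed above, the sketch below flags outliers with the common 1.5 × IQR rule. It uses only Python's standard library. The function name `iqr_outliers`, the multiplier `k` and the sample values are assumptions made for this example, not course material.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule.

    k=1.5 is the conventional multiplier; it is an assumption here and
    can be changed to make the rule stricter or looser.
    """
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical sample with one obviously extreme value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
lower, upper, outliers = iqr_outliers(sample)
print("Fences:", lower, upper)
print("Outliers:", outliers)
```

Values outside the two fences are only candidates for review. Whether they are errors or genuine extreme observations depends on the business context.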
Module 5 - Data Visualization
Visualization is the key to interpret data.
-
Libraries: Matplotlib & Seaborn
-
Line chart, bar chart, heatmap, scatter plot, swarm plot, regression charts, and outliers
-
Use cases and projects
-
There are many more, which will be dealt with in each module
Other visualizations will be covered with the individual models
Module 6 - Inferential statistics
Inference of strategic outcomes before making business decisions. ML models build on the concepts below for implementation.
-
Binomial theorem
-
Probability Distributions
-
Bayesian Statistics
-
Hypothesis testing
-
Z & t tests and P-value
-
Libraries: statsmodels & SciPy
Module 8 - Linear & Polynomial Regression
One of the most widely used supervised learning models, applied to labelled data to understand the linear relationship between a dependent variable and the parameters that influence it.
-
Machine Learning Data Types
-
Univariate analysis
-
Linear Relationships
-
Approaches: OLS (SSE), RMSE, MAPE, MAE
-
R-square & Adjusted R-Square
-
Data Sampling – Population & Samples
-
Train & Test split
-
Linear Regression Model
-
Understanding LR Coefficients
-
Approaches to build regression models
-
Interpreting results
-
Selecting influencing Independent Variables
-
Assumptions of Linear Regression
-
Overfit and Underfit models
-
Polynomial Regression – Extension of Linear Regression
-
Projects
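To make the OLS, RMSE, MAE and R-square items above concrete, here is a minimal sketch of a one-variable linear regression in plain Python. The helper names `fit_simple_ols` and `evaluate`, and the five data points, are hypothetical and chosen only for illustration. A real project would more likely use a library such as statsmodels or scikit-learn, together with a train and test split.

```python
import math

def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (minimises SSE)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def evaluate(x, y, slope, intercept):
    """Return RMSE, MAE and R-square of the fitted line on (x, y)."""
    n = len(y)
    mean_y = sum(y) / n
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)        # sum of squared errors
    sst = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1 - sse / sst,
    }

# Hypothetical data: y is roughly 2 * x with a little noise.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_ols(x, y)
metrics = evaluate(x, y, slope, intercept)
print("slope:", slope, "intercept:", intercept)
print(metrics)
```

The slope is read as the expected change in the dependent variable for a one-unit change in the independent variable. The metrics here are computed on the same data used for fitting, so they say nothing about overfitting.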
Module 9 - Logistic Regression
An essential supervised learning model for predicting a binomial outcome from independent parameters
-
Linear vs Logistic Regression
-
Cheat code
-
MLE
-
Plotly & graph_objs Libraries
-
Exploratory Data Analysis
-
Logit summary from Model
-
ROC curves
-
Inference
-
Projects
Module 10 - Decision Tree & Random Forest Models
Tree models can be applied to both supervised and unsupervised learning. They are also significant in feature engineering.
-
Introduction
-
Tree structure
-
Gini Index & Entropy
-
Tuning – CART regression
-
Graphviz library
-
Overfit & Underfit Models
-
Bias & Variance
-
Random Forest Concepts
-
Feature Engineering projects
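The Gini Index & Entropy item above refers to the two impurity measures a decision tree typically uses to choose splits. The following sketch computes both for a list of class labels. The function names and the example labels are assumptions for illustration only.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Return the proportion of each distinct class in labels."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def gini_index(labels):
    """Gini impurity: 1 - sum(p_i^2). 0 means the node is pure."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)). 0 means the node is pure."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

# Hypothetical nodes: one pure, one evenly mixed between two classes.
pure = ["yes", "yes", "yes", "yes"]
mixed = ["yes", "yes", "no", "no"]

print("pure :", gini_index(pure), entropy(pure))
print("mixed:", gini_index(mixed), entropy(mixed))
```

For two classes, both measures are lowest for a pure node and highest for an even 50/50 mix, which is why a tree prefers splits that reduce them the most.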
Module 11 - Clustering Models, Hierarchical & K-Means
Unsupervised learning models to group unlabelled data into clusters
-
Unsupervised Learning
-
Content-based & Collaborative Filtering
-
Recommendation Engine
-
Ward's method & Dendrogram
-
K-Means versus Hierarchical clustering
-
Visualizing Clusters
-
Limitations of K-Means
-
Tuning: Silhouette Analysis & Elbow Method
-
Projects
Module 12 - SVM (Support Vector Machines)
A widely used model, often chosen for its accuracy on smaller datasets
-
Introduction to SVM
-
Hyper-plane
-
C & Gamma parameters
-
Applying to multiple dimensions
-
Kernel Trick
-
Tuning SVM parameters
-
Projects
Module 13 - Ensemble ML Algorithms: Bagging, Boosting, Voting
These models combine the estimates of other models and play an important supporting role
-
Demonstration with Project
-
Ensemble concept
-
Combining Model Predictions into an Ensemble Prediction
-
Bagging Algorithms – Decision Tree & Random Forest
-
Extra Trees Classifier
-
AdaBoost
-
Stochastic Gradient Boosting
-
Voting Ensemble
-
Model Comparison
-
Summary
-
Projects
Module 14 - Miscellaneous Concepts
-
Cross Validation, Lasso & Ridge Regression concepts
-
Chi-square test & PCA - Principal Component Analysis
-
Targeting Multicollinearity with Python
-
VIF
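As a rough illustration of the multicollinearity and VIF items above: the variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With exactly two predictors, R² equals the squared Pearson correlation between them, which is the special case sketched below in plain Python. The function names and data are hypothetical. For more than two predictors, a library routine such as the one in statsmodels would be the usual choice.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # almost a multiple of x1 (collinear)
x3 = [2, -1, -2, -1, 2]           # uncorrelated with x1

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
print("VIF (x1, x2):", vif_collinear)
print("VIF (x1, x3):", vif_independent)
```

A VIF of 1 means no inflation. Values above roughly 5 to 10 are a common rule-of-thumb signal that a predictor is largely explained by the others.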