


Understand how to leverage your data to predict future events and behaviors and uncover deeper business insights
Gain the practical skills required to run the predictive analytics pipeline
Learn how to successfully architect, develop, and manage data science projects



This tailored course is intended for software engineers, data analysts, data engineers, and anyone who plans to build and use predictive models in research and production.


1. Machine Learning


  • An introduction to machine learning tasks and definitions
  • Core principles of building machine learning algorithms
  • A variety of machine learning algorithms: from linear regression to random forests
  • Core Python packages for machine learning


  • Linear and logistic regressions
  • k-nearest neighbors and k-means
  • Decision trees and random forests
  • Handling classification, regression, and clustering tasks

*Packages of choice are Pandas/NumPy/scikit-learn
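
As a from-scratch illustration of the k-nearest neighbors idea listed above, here is a minimal sketch in plain Python rather than scikit-learn. The function name `knn_predict` and the toy points are hypothetical, chosen only to show the "measure distances, then vote" mechanism.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2D points
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (1.2, 1.4)))  # a
print(knn_predict(points, labels, (6.8, 6.1)))  # b
```

In the course itself this would normally be done with scikit-learn's `KNeighborsClassifier`, which adds options such as distance weighting and faster neighbor search.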


  • LASSO/Ridge (regularization)
  • PCA/SVD (dimensionality reduction)
  • Advanced clustering algorithms, such as DBSCAN and expectation-maximization (different approaches to similarity in data)
  • Naive Bayes (Bayes’ theorem)
  • Complex ensembling schemes: gradient boosting and stacking (iterative refinement)
  • Algorithmic hyperparameter tuning


  • PCA
  • DBSCAN, expectation-maximization, agglomerative clustering, mean shift
  • Naive Bayes
  • Gradient boosting machines, stacking
  • Tree-structured Parzen estimator

*Packages of choice are Pandas/NumPy/scikit-learn/HyperOpt/XGBoost
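
The effect of ridge regularization is easiest to see with a single feature. The sketch below assumes one feature and no intercept, a case where ridge regression has a closed-form solution; `ridge_slope` is a hypothetical helper, and `alpha` names the penalty strength the same way scikit-learn's `Ridge` does.

```python
def ridge_slope(xs, ys, alpha):
    """Ridge estimate for the model y ~ w * x (one feature, no intercept).

    Minimizes sum((y - w * x) ** 2) + alpha * w ** 2. Setting the derivative
    to zero gives w = sum(x * y) / (sum(x * x) + alpha).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


# Hypothetical data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
for alpha in (0.0, 1.0, 10.0, 100.0):
    print(alpha, round(ridge_slope(xs, ys, alpha), 3))
# 0.0 2.0
# 1.0 1.935
# 10.0 1.5
# 100.0 0.462
```

Even though the data fit y = 2x perfectly, a larger `alpha` pulls the slope toward zero: the penalty accepts some bias in exchange for lower variance on noisy data.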


  • Feature engineering
  • Dealing with missing data and outliers
  • Dealing with imbalanced classification
  • Advanced validation schemes
  • Handling of model versioning
  • CRISP-DM as a major machine learning development methodology


  • Feature engineering: polynomial and logarithmic features, combinations of features; periodic feature encoding; target encodings
  • Imbalanced classification: advanced metrics for classification, threshold tuning, over- and undersampling (SMOTE)
  • Missing data handling: imputation of missing values using k-nearest neighbors or decision trees
  • Advanced validation: cross-validation for time series

*Packages of choice are Pandas/NumPy/scikit-learn
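
Cross-validation for time series must never train on data from the future. Below is a minimal sketch of an expanding-window splitter, similar in spirit to scikit-learn's `TimeSeriesSplit`; it assumes the samples are already sorted by time, and the name `expanding_window_splits` is hypothetical.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) for samples already sorted by time.

    The last n_splits * test_size samples are cut into equal test folds; each
    split trains on everything before its test fold, so the model never sees
    the future. Any remainder goes to the first training set.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in expanding_window_splits(n_samples=13, n_splits=3):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
# train 0..3  test 4..6
# train 0..6  test 7..9
# train 0..9  test 10..12
```

Unlike shuffled k-fold, the training window only grows forward in time, which mirrors how the model would be retrained and evaluated in production.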

2. Data Science Applications

Algorithmic text processing is a vast area for neural network applications. From text classification to text understanding, machine learning has been applied successfully across the field. We’ll look at basic NLP techniques as well as state-of-the-art NLP applications:

  • Bag-of-words approach to text-related tasks
  • Sequential approach using RNN architectures
  • Embeddings as richer, dense representations of words
  • State of the art: contextual embeddings and attention mechanisms
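
The bag-of-words approach turns each text into a vector of word counts and ignores word order. A minimal sketch, assuming a naive lowercase tokenizer; `build_vocabulary` and `bag_of_words` are hypothetical helpers (scikit-learn's `CountVectorizer` provides a full-featured version).

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")  # naive tokenizer: runs of lowercase letters/apostrophes


def build_vocabulary(documents):
    """Assign a column index to every distinct word (sorted for reproducibility)."""
    words = {w for doc in documents for w in TOKEN.findall(doc.lower())}
    return {word: i for i, word in enumerate(sorted(words))}


def bag_of_words(document, vocabulary):
    """Represent a document as word counts over the vocabulary; word order is discarded."""
    counts = Counter(TOKEN.findall(document.lower()))
    return [counts[word] for word in vocabulary]


docs = ["The cat sat", "The cat saw the dog"]
vocab = build_vocabulary(docs)
print(list(vocab))  # ['cat', 'dog', 'sat', 'saw', 'the']
for doc in docs:
    print(bag_of_words(doc, vocab))
# [1, 0, 1, 0, 1]
# [1, 1, 0, 1, 2]
```

Words outside the vocabulary are silently dropped, and "cat saw dog" and "dog saw cat" get identical vectors: exactly the limitations that the sequential and embedding-based approaches address.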

Computer vision is a huge field that has produced many of deep learning’s biggest successes, starting with a neural network’s win in the ImageNet competition in 2012. We’ll dive into some of its most useful applications, which come up constantly in practice:

  • Image-specific data transformations
  • Object detection using YOLO/SSD models
  • Image segmentation using U-Net/LinkNet/R-CNN algorithms
  • Architectures for real-time image processing
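
Detectors such as YOLO and SSD emit many overlapping candidate boxes, which are commonly filtered using intersection over union (IoU) and greedy non-maximum suppression (NMS). The sketch below assumes boxes given as `(x1, y1, x2, y2)` corner coordinates; the helper names are hypothetical, and real frameworks ship optimized versions of both steps.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857... (overlap 1, union 7)
boxes = [(0, 0, 2, 2), (0.1, 0.1, 2, 2), (5, 5, 7, 7)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box almost duplicates the first (IoU of about 0.9), so only the higher-scoring one survives; the distant third box is kept.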

Transaction data is a highly prevalent type of dataset, especially in telecom and banking. The purpose of this module is to show an approach to extracting useful insights from this data.

  • Preparation of transactional data
  • Time-series-specific families of algorithms
  • Statistical and neural network approaches to this task

Reinforcement learning generalizes the whole concept of machine learning and makes it possible to solve some intricate problems. In this module we’ll explain the concept of reinforcement learning and guide you from the basic algorithms behind it to the methods that underpin the latest state-of-the-art results. We’ll go through the following algorithms:

  • Markov Decision Process
  • Multi-Armed Bandit Algorithms
  • Q-learning
  • Policy algorithms
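
Tabular Q-learning fits in a few lines. The sketch below assumes a hypothetical five-cell corridor where the agent starts at cell 0 and receives a reward of 1 only on reaching cell 4; the environment and all hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are illustrative choices, not values from the course.

```python
import random

N_STATES = 5       # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random, otherwise pick a best action (ties at random)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target bootstraps from the best action in the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: move right in every non-terminal cell
```

Because the update uses the maximum over next-state actions rather than the action actually taken, Q-learning is off-policy: it learns the greedy policy while still exploring.
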

3. Deep Learning

We’ll look at a surprisingly powerful family of machine learning techniques that has become very popular in recent years, covering the following topics:

  • Structure of neural networks, feedforward neural networks
  • The mechanism by which neural networks are trained
  • Ways to control the neural network training process


  • Neural networks for supervised learning with Keras

*Packages of choice are Pandas/NumPy/scikit-learn/Keras/TensorFlow
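
At its core, training a neural network is gradient descent on a loss function. As a minimal sketch, assuming a single sigmoid neuron (no hidden layers, so backpropagation reduces to one application of the chain rule) and plain Python instead of Keras, the hypothetical `train_neuron` below learns the logical AND function; the learning rate and epoch count are illustrative, not tuned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Forward pass of one neuron: weighted sum plus bias, squashed by a sigmoid."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_neuron(samples, targets, lr=1.0, epochs=5000):
    """Full-batch gradient descent on the mean log loss of a single sigmoid neuron."""
    n, d = len(samples), len(samples[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(samples, targets):
            # For sigmoid + log loss, dLoss/dz simplifies to (prediction - target)
            err = predict(w, b, x) - t
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        # Step against the gradient
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Logical AND is linearly separable, so a single neuron can represent it
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 0, 0, 1]
w, b = train_neuron(X, T)
print([int(predict(w, b, x) > 0.5) for x in X])  # [0, 0, 0, 1]
```

Keras automates the same loop (forward pass, gradient, update) for many layers at once; the learning rate and number of epochs are the first of the training-control knobs listed above.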

Convolution as the core operation of neural network layers for spatial data processing. Topics for the day:

  • Image features and representation learning
  • A convolution layer and a deep convolutional network
  • Supporting layers for convolutional neural networks
  • State-of-the-art architectures for image processing
  • Transfer learning and fine-tuning

We will:

  • Build a convolutional neural network from scratch to perform image classification
  • Fine-tune existing networks to perform image-related tasks on different datasets

*Packages of choice are Pandas/NumPy/scikit-learn/Keras/TensorFlow
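
To make the convolution layer concrete, here is a minimal sketch of a single-channel "valid" convolution with stride 1 and no padding, in plain Python. The function name `conv2d` and the edge-detecting kernel are illustrative assumptions; in a real CNN the kernel values are learned rather than hand-picked.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution with stride 1 and no padding.

    Like most deep learning libraries, this computes cross-correlation: the
    kernel slides over the image without being flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A dark-to-bright vertical edge and a kernel that responds to left-to-right increases
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The output is non-zero only where the kernel straddles the edge, which is how early convolutional layers come to act as local feature detectors.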

Neural network architectures for sequential data modelling. Topics for the day:

  • Examples of sequential data and related machine learning tasks
  • The vanilla recurrent neural network architecture and its limitations
  • Architectures of advanced recurrent neural network layers

We will implement:

  • Character- and word-level natural language models

*Packages of choice are Pandas/NumPy/scikit-learn/Keras/TensorFlow
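
The vanilla recurrent network applies the same update at every time step, h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). A minimal scalar sketch, assuming one-dimensional inputs and hand-picked weights (`rnn_forward` and the weight values are hypothetical); real layers use weight matrices learned by training.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    Returns the hidden state after every time step.
    """
    h = h0
    states = []
    for x in inputs:
        # The same weights are reused at every step; h carries the "memory"
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# A single impulse at t = 0 followed by zeros
states = rnn_forward([1.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print([round(s, 3) for s in states])
```

With |w_h| < 1 the trace of the first input shrinks at every step and is nearly gone after three steps, a small-scale hint of why vanilla RNNs struggle with long-range dependencies.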

Neural network architectures developed to solve non-standard tasks such as representation learning and data generation. Topics include:

  • Autoencoder architecture blueprint
  • Properties of autoencoder representations
  • Generative Adversarial Networks

4. Big Data

In this module you’ll learn:

  • Basic knowledge and principles of Hadoop (YARN, HDFS)
  • Main concepts of Spark, such as RDDs, shared variables, persistence, Spark architecture, and Spark under the hood
  • Core principles of Spark SQL
  • Basic principles of how Spark MLlib works
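
Spark’s RDD operations generalize the MapReduce pattern from Hadoop. The sketch below emulates the classic word count in plain Python so that it runs without a cluster; each comment names the RDD transformation (`flatMap`, `map`, `reduceByKey`) that a PySpark version would use. The input lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain

lines = ["spark makes big data simple", "big data needs spark"]

# flatMap: split every line into words
words = chain.from_iterable(line.split() for line in lines)
# map: emit a (word, 1) pair for every word
pairs = ((word, 1) for word in words)
# reduceByKey: sum the values per key
# (on a cluster, Spark shuffles pairs so each key is reduced on one executor)
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

print(dict(counts))
```

As in Spark, the generator steps are lazy: nothing is computed until the final loop consumes them, much as RDD transformations only run when an action is called.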

This module covers real-time data processing based on the following processing engines and platforms:

  • Spark Streaming for processing data in real time
  • Spark Structured Streaming, a stream processing engine built on the Spark SQL engine
  • Apache Kafka, a stream processing platform

Additionally, you will learn about:

  • Integrating Kafka with Spark Streaming
  • Integrating Kafka with Spark Structured Streaming
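
A typical streaming job groups events into fixed, non-overlapping time windows and counts them per key. The sketch below shows that tumbling-window logic on a small finite list in plain Python; it is only a conceptual stand-in for a windowed aggregation in Spark Structured Streaming, which computes the result incrementally and also handles late data. The `tumbling_window_counts` name and the event data are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs; each event
    falls into the window starting at floor(timestamp / window) * window.
    """
    counts = Counter()
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return counts


events = [(0, "click"), (4, "click"), (9, "view"), (12, "click"), (19, "click")]
print(sorted(tumbling_window_counts(events, 10).items()))
# [((0, 'click'), 2), ((0, 'view'), 1), ((10, 'click'), 2)]
```

Windows are assigned from each event’s own timestamp (event time), not from when the event happens to arrive, which is what makes the counts reproducible when Kafka delivers messages late or out of order.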

The following topics will be covered in this module:

  • Introduction to, foundations of, and operations with Apache Cassandra
  • The concept of DSE Analytics (Spark + Cassandra) and DSE Solo Analytics
  • Main principles of SparkConnector by DSE