Quick Overview
Data Sources
Mining Data
Recommender systems
Target Marketing
Datatypes
Structured vs unstructured
Static vs streamed
Attitudinal, behavioural and demographic data
Data-driven vs user-driven analytics
Data validity
Volume, velocity and variety of data
Models
Building models
Statistical Models
Machine learning
Data Classification
Clustering
kGroups, k-means, nearest neighbours
Ant colonies, birds flocking
Predictive Models
Decision trees
Support vector machine
Naive Bayes classification
Neural networks
Markov Model
Regression
Ensemble methods
ROI
Benefit/Cost ratio
Cost of software
Cost of development
Potential benefits
Building Models
Data Preparation (MapReduce)
Data cleansing
Choosing methods
Developing the model
Testing the model
Model evaluation
Model deployment and integration
Overview of Open Source and commercial software
Selection of R-project package
Python libraries
Hadoop and Mahout
Selected Apache projects related to Big Data and Analytics
Selected commercial solutions
Integration with existing software and data sources