About the Course:
This course covers concepts such as HDFS, Hadoop clusters, and Hadoop architecture.
Who Should Take This?
Systems administrators, Linux administrators, Windows administrators, infrastructure engineers, Big Data architects, database administrators, IT managers, and mainframe professionals.
Pre-requisites for this training
A fundamental knowledge of Java and J2EE is required.
After completing the course, you should be able to understand:
Data Import Techniques
Exploratory Data Analysis
Session 1: Hadoop Distributed File System – Importing and Exporting Data
Learning Objectives – This module covers Hadoop and its architecture in detail. It then turns to the process of importing data into HDFS and exporting data from it. The sources and destinations include the local filesystem, relational databases, NoSQL databases, distributed databases, and other Hadoop clusters.
Session 2: Big Data Analysis
Learning Objectives – This module is designed to highlight many of the more powerful features of the various tools. You will find many of these features and operators useful as you begin solving your own problems.
Session 3: Introduction to Data Analytics
Learning Objectives – This module explains what Business Analytics is and how R can play an important role in solving complex analytical problems. It also explains what R is and how it is used by large organizations such as Google, Facebook, and Bank of America.
Session 4: Introduction to R Programming
Learning Objectives – This module starts from the very basics of R programming, such as data types and functions. We present a scenario and let you think about the options for resolving it, for example, which data type you would use to store a variable, or which R function can help in that scenario.
Session 5: Data Manipulation in R
Learning Objectives – In this module, we start with a sample of a dirty data set and perform data cleaning on it, resulting in a data set that is ready for analysis. Along the way, we use and explore the popular functions required to clean data in R.
Session 6: Data Import Techniques in R
Learning Objectives – This module shows the versatility and robustness of R, which can take in data in a variety of formats, from a CSV file to data scraped from a website. It teaches you various data importing techniques in R.
Session 7: Exploratory Data Analysis
Learning Objectives – In this module, you will learn why exploratory data analysis (EDA) is an important step in any analysis. EDA is for seeing what the data can tell us beyond formal modeling or hypothesis testing. You will also learn about the various tasks involved in a typical EDA process.
Session 8: Data Visualization in R
Learning Objectives – In this module, you will learn that visualization is the USP of R. You will learn the concepts behind creating simple as well as complex visualizations in R.
Session 9: Data Mining: Clustering Techniques
Learning Objectives – This module introduces the various Machine Learning algorithms. It covers two types of Machine Learning, Supervised Learning and Unsupervised Learning, and the difference between them. We will also discuss ‘K-means Clustering’ and implement it in this module.
Session 10: Data Mining: Association Rule Mining and Sentiment Analysis
Learning Objectives – This module discusses the very popular ‘Association Rule Mining’ technique, including the algorithm and its various aspects. We will also discuss what ‘Sentiment Analysis’ is and how we can fetch, extract, and mine live data from Twitter to find out the sentiment of the tweets.
Session 11: Linear and Logistic Regression
Learning Objectives – This module covers the basics of ‘Regression Techniques’. Linear and logistic regression are explained from the very basics with examples, and they are implemented in R using two case studies dedicated to each type of regression discussed.
Session 12: ANOVA and Predictive Analysis
Learning Objectives – This module covers the Analysis of Variance (ANOVA) technique. It also discusses Predictive Analysis.
Session 13: Data Mining: Decision Trees and Random Forest
Learning Objectives – This module covers the concepts of Decision Trees and Random Forest. The algorithm for creating trees and forests is discussed step by step and explained with examples. At the end of the class, these concepts are implemented on a real-life data set. The case studies are available in the LMS.
Sessions 14 & 15: Project
Learning Objectives – This module reviews the concepts taught throughout the course and their implementation in a project.