About the Course:
This course covers concepts such as HDFS, Hadoop clusters, and Hadoop architecture.
Who Should Take This Course?
Systems administrators, Linux administrators, Windows administrators, infrastructure engineers, Big Data architects, database administrators, IT managers, and mainframe professionals.
Prerequisites for this training
This course requires no prior knowledge of Java, Hadoop cluster administration, or Apache Hadoop. A basic knowledge of Linux is necessary, as Hadoop runs on Linux.
After completing the course, you should understand:
- The core technologies of Hadoop
- How to populate HDFS from external sources
- How to plan your Hadoop cluster hardware and software
- How to deploy a Hadoop cluster
- What issues to consider when installing Pig, Hive, and Impala
- What issues to consider when deploying Hadoop clients
- How Cloudera Manager can simplify Hadoop administration
- How to configure HDFS for high availability
- What issues to consider when implementing Hadoop security
- How to schedule jobs on the cluster
- How to maintain your cluster
- How to monitor, troubleshoot, and optimize the cluster
- Management and monitoring tools
1. Hadoop Cluster Administration
- Introduction to Big Data
- Hadoop Architecture
- MapReduce Framework
- A typical Hadoop Cluster
- Data Loading into HDFS
2. Hadoop Architecture and Cluster setup
- Hadoop server roles
- Rack Awareness
- Anatomy of Write and Read
- Replication Pipeline
- Data Processing
- Hadoop Installation and Initial Configuration
- Deploying Hadoop in pseudo-distributed mode
- Deploying a multi-node Hadoop cluster and installing Hadoop clients
3. Hadoop Cluster: Planning and Managing
- Planning the Hadoop Cluster
- Cluster Size
- Hardware and Software considerations
- Managing and Scheduling Jobs
- Types of schedulers in Hadoop
- Configuring the schedulers and running MapReduce jobs
- Cluster Monitoring and Troubleshooting
4. Backup, Recovery, and Maintenance
- Configuring rack awareness
- Setting up Hadoop backup
- Whitelisting and blacklisting DataNodes in a cluster
- Setting up quotas
- Upgrading a Hadoop cluster
- Copying data across clusters using distcp
- Diagnostics and Recovery
- Cluster Maintenance
5. Hadoop 2.0 and High Availability
- Configuring Secondary NameNode
- Hadoop 2.0
- YARN framework
- Hadoop 2.0 Cluster setup
- Deploying Hadoop 2.0 in pseudo-distributed mode
- Deploying a multi-node Hadoop 2.0 cluster
6. Advanced Topics: QJM, HDFS Federation, and Security
- Configuring HDFS Federation
- Basics of Hadoop Platform Security
- Securing the Platform
- Configuring Kerberos
7. Oozie, HCatalog/Hive, and HBase Administration
- HCatalog/Hive Administration
- HBase Architecture
- HBase setup
- HBase and Hive Integration
- HBase performance optimization
8. Project: Hadoop Implementation
- Create a Hadoop Cluster for a Real World Use Case