Welcome to Comnet Group Inc.

Courses

Big Data Hadoop Administrator

Course number: CGIBDA40

Hadoop Administration certification training will help you gain expertise in maintaining large, complex Hadoop clusters. You will learn core Hadoop admin activities such as planning, installation, configuration, monitoring, and tuning. Furthermore, you will learn about Cloudera Hadoop 2.0, and you will master security implementation with Kerberos on Hadoop v2 through industry-level case studies.

Prerequisites

There are no formal prerequisites for Hadoop Administration training, but basic knowledge of the Linux command-line interface is beneficial.

Target Audience

Hadoop administrators, Linux systems administrators, database administrators, network administrators, and developers who need to know how to install and manage their Hadoop development clusters will benefit from this course.

Certification

Big Data Hadoop Administrator by Cloudera

Exam

Cloudera’s CCA Administrator Exam (CCA131)

• Number of Questions: 8–12 performance-based (hands-on) tasks on a pre-configured Cloudera Enterprise cluster.
• Time Limit: 120 minutes
• Passing Score: 70%
• Language: English

Accreditation

After completing the class, students can appear for Cloudera’s CCA Administrator Exam (CCA131).
Students will also receive a “Certificate of Completion” from COMNet Group Inc.

Course Content
Lesson 1: Understanding Big Data and Hadoop

Learning Objectives: Understand Big Data and the limitations of traditional solutions. You will learn about Hadoop and its core components, as well as the differences between Hadoop 1.0 and Hadoop 2.x.

Topics covered are:

  • Introduction to big data
  • Common big data domain scenarios
  • Limitations of traditional solutions
  • What is Hadoop?
  • Hadoop 1.0 ecosystem and its Core Components
  • Hadoop 2.x ecosystem and its Core Components
  • Application submission in YARN
Lesson 2: Hadoop Cluster and its Architecture

Learning Objectives: In this module, you will learn about Hadoop Distributed File System, Hadoop Configuration Files and Hadoop Cluster Architecture. You will also learn the roles and responsibilities of a Hadoop administrator.

Topics covered are:

  • Distributed File System
  • Hadoop Cluster Architecture
  • Replication rules
  • Hadoop Cluster Modes
  • Rack awareness theory
  • Hadoop cluster administrator responsibilities
  • How HDFS works
  • NTP server
  • Initial configuration required before installing Hadoop
  • Deploying Hadoop in pseudo-distributed mode
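
The pseudo-distributed deployment covered above boils down to two small configuration files. A minimal sketch, following the standard Apache Hadoop single-node setup (the localhost URI and replication factor of 1 are for a single machine, not a production cluster):

```xml
<!-- core-site.xml: point all clients at a single local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: one node means no meaningful replication -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

With these in place, `hdfs namenode -format` followed by `start-dfs.sh` brings up HDFS on a single machine.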
Lesson 3: Hadoop Cluster Setup & Working with Hadoop Cluster

Learning Objectives: Learn how to build a multi-node Hadoop cluster and understand the various properties of the Namenode, Datanode, and Secondary Namenode.

Topics covered are:

  • OS Tuning for Hadoop Performance
  • Prerequisites for installing Hadoop
  • Hadoop Configuration Files
  • Stale Configuration
  • RPC and HTTP Server Properties
  • Properties of Namenode, Datanode and Secondary Namenode
  • Log Files in Hadoop
  • Deploying a multi-node Hadoop cluster
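
As an illustration of the Namenode and Datanode properties discussed above, a multi-node hdfs-site.xml typically pins the metadata and block-storage directories explicitly. A sketch with hypothetical paths and host names (the property names are the standard Hadoop 2.x ones):

```xml
<!-- hdfs-site.xml fragment: storage locations for a multi-node cluster -->
<configuration>
  <property>
    <!-- NameNode metadata (fsimage, edit logs); comma list = redundant copies -->
    <name>dfs.namenode.name.dir</name>
    <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
  </property>
  <property>
    <!-- DataNode block storage; comma list = one entry per physical disk -->
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>
  <property>
    <!-- Secondary NameNode HTTP endpoint (50090 is the Hadoop 2.x default port) -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>snn.example.com:50090</value>
  </property>
</configuration>
```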
Lesson 4: Hadoop Cluster Administration and Maintenance

Learning Objectives: In this module, you will learn how to add or remove cluster nodes, both ad hoc and in the recommended way. You will also understand day-to-day cluster administration tasks such as balancing data in the cluster, protecting data by enabling trash, attempting a manual failover, and creating backups within or across clusters.

Topics covered are:

  • Commissioning and Decommissioning of Node
  • HDFS Balancer
  • Namenode Federation in Hadoop
  • High Availability in Hadoop
  • Trash Functionality
  • Checkpointing in Hadoop
  • Distcp
  • Disk balancer
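
The maintenance tasks above map to a handful of CLI commands. A sketch with hypothetical host names and paths (these assume a running cluster and appropriate HDFS permissions):

```shell
# Rebalance blocks across DataNodes; the threshold is the allowed
# percentage deviation from average utilization before data is moved
hdfs balancer -threshold 10

# Commission/decommission: update the include/exclude host files
# referenced by dfs.hosts / dfs.hosts.exclude, then re-read them
hdfs dfsadmin -refreshNodes

# Back up data across clusters; DistCp runs as a MapReduce job
hadoop distcp hdfs://nn1.example.com:8020/data hdfs://nn2.example.com:8020/backup/data

# With trash enabled (fs.trash.interval > 0 in core-site.xml, in minutes),
# deletes move files into the user's .Trash instead of destroying them
hdfs dfs -rm /user/etl/old-report.csv
```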
Lesson 5: Computational Frameworks, Managing Resources and Scheduling

Learning Objectives: Get to know the various processing frameworks in Hadoop and understand the YARN job execution flow. You will also learn about the MapReduce programming model and the YARN schedulers from a Hadoop administrator's perspective.

Topics covered are:

  • Different Processing Frameworks
  • Different Phases in MapReduce
  • Spark and its Features
  • Application Workflow in YARN
  • YARN Metrics
  • YARN Capacity Scheduler and Fair Scheduler
  • Service Level Authorization (SLA)
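
To make the Capacity Scheduler discussion concrete, here is a sketch of a capacity-scheduler.xml that splits the cluster between two queues (the queue names and percentages are made up for illustration; the property names are the standard YARN ones):

```xml
<!-- capacity-scheduler.xml fragment: two queues under root -->
<configuration>
  <property>
    <!-- child queues of the root queue -->
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,dev</value>
  </property>
  <property>
    <!-- percentage of cluster resources guaranteed to each queue -->
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>70</value>
  </property>
  <property>
    <!-- sibling capacities must sum to 100 -->
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
</configuration>
```

After editing, `yarn rmadmin -refreshQueues` applies the change without restarting the ResourceManager.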
Lesson 6: Hadoop 2.x Cluster: Planning and Management

Learning Objectives: In this module, you will gain insight into cluster planning and management, and the aspects to consider when planning the setup of a new cluster.

Topics covered are:

  • Planning a Hadoop 2.x cluster
  • Cluster sizing
  • Hardware, Network and Software considerations
  • Popular Hadoop distributions
  • Workload and usage patterns
  • Industry recommendations
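
Cluster sizing is largely arithmetic: raw data times the replication factor, plus headroom for intermediate data. A back-of-the-envelope sketch in Python (every number below is an assumed example, not an industry recommendation):

```python
import math

# Assumed workload: 100 TB of raw data, standard 3x HDFS replication,
# and 25% extra scratch space for intermediate/temporary data.
raw_data_tb = 100
replication = 3
temp_overhead = 1.25

required_tb = raw_data_tb * replication * temp_overhead  # total HDFS capacity needed

# Assumed worker-node hardware: 12 disks of 4 TB each.
node_capacity_tb = 12 * 4

nodes = math.ceil(required_tb / node_capacity_tb)
print(required_tb, nodes)  # 375.0 (TB of capacity) and 8 (worker nodes)
```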
Lesson 7: Pig, Hive Installation and Working (Self-paced)

Learning Objectives: Get to know the working and installation of Hadoop ecosystem components such as Pig and Hive.

Topics covered are:

  • What is Hive?
  • Hive Setup
  • Hive Configuration
  • Working with Hive
  • Setting Hive in local and remote metastore mode
  • Pig setup
  • Working with Pig
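
The local-vs-remote metastore distinction above comes down to one property: in remote mode, clients talk to a metastore Thrift service instead of opening the backing database directly. A hive-site.xml sketch (the host name is a placeholder; 9083 is the usual metastore port):

```xml
<!-- hive-site.xml fragment: remote metastore mode -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>
```

Leaving `hive.metastore.uris` unset and configuring a JDBC connection (`javax.jdo.option.ConnectionURL`) instead gives you local metastore mode.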
Lesson 8: HBase, Zookeeper Installation and Working (Self-paced)

Learning Objectives: In this module, you will learn about the working and installation of HBase and Zookeeper.

Topics covered are:

  • HBase Architecture
  • MemStore, WAL, BlockCache
  • HBase Hfile
  • Compactions
  • HBase Read and Write
  • HBase balancer and hbck
  • HBase setup
  • Working with HBase
  • Installing Zookeeper
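
Since HBase depends on a ZooKeeper ensemble, its configuration is worth a glance. A minimal three-server zoo.cfg sketch (host names and the data directory are placeholders):

```
# zoo.cfg: basic three-server ensemble
# tickTime is the base time unit in ms; initLimit/syncLimit are in ticks
tickTime=2000
initLimit=10
syncLimit=5
# where snapshots and the myid file live
dataDir=/var/lib/zookeeper
clientPort=2181
# server.N=host:peerPort:leaderElectionPort
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```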
Lesson 9: Understanding Oozie (Self-paced)

Learning Objectives: In this module, you will learn about Apache Oozie, a server-based workflow scheduling system for managing Hadoop jobs.

Topics covered are:

  • Oozie overview
  • Oozie Features
  • Oozie workflow, coordinator and bundle
  • Start, End and Error Node
  • Action Node
  • Join and Fork
  • Decision Node
  • Oozie CLI
  • Install Oozie
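
The node types listed above fit together in a workflow definition. A minimal workflow.xml sketch with start, action, kill (error), and end nodes (the shell action and its argument are illustrative):

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="shell-node"/>                 <!-- start node: entry point -->
  <action name="shell-node">               <!-- action node: runs a task -->
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>echo</exec>
      <argument>hello</argument>
    </shell>
    <ok to="end"/>                         <!-- transition on success -->
    <error to="fail"/>                     <!-- transition on failure -->
  </action>
  <kill name="fail">                       <!-- error node -->
    <message>Shell action failed</message>
  </kill>
  <end name="end"/>                        <!-- end node -->
</workflow-app>
```

Fork/join and decision nodes slot in the same way, as additional control-flow elements between start and end.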
Lesson 10: Data Ingestion using Sqoop and Flume (Self-paced)

Learning Objectives: Learn about the different data ingestion tools such as Sqoop and Flume.

Topics covered are:

  • Types of Data Ingestion
  • HDFS data loading commands
  • Purpose and features of Sqoop
  • Sqoop import, export, and Hive import operations
  • Sqoop 2
  • Install Sqoop
  • Import data from RDBMS into HDFS
  • Flume features and architecture
  • Types of flow
  • Install Flume
  • Ingest Data From External Sources With Flume
  • Best Practices for Importing Data
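
A typical Sqoop import from the topics above, as a single command. The JDBC URL, credentials, table, and target directory are all hypothetical:

```shell
# -P prompts interactively for the database password
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4

# Adding --hive-import would load the data into a Hive table
# instead of a bare HDFS directory
```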
Lesson 11: Hadoop Security and Cluster Monitoring

Learning Objectives: Learn about Hadoop cluster monitoring and security concepts. You will also learn how to secure a Hadoop cluster with Kerberos.

Topics covered are:

  • Monitoring Hadoop Clusters
  • Hadoop Security System Concepts
  • Securing a Hadoop Cluster With Kerberos
  • Common Misconfigurations
  • Overview of Kerberos
  • Checking log files to understand Hadoop clusters for troubleshooting
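
Enabling Kerberos ultimately flips two switches in core-site.xml; the per-daemon keytab and principal configuration is omitted here for brevity:

```xml
<!-- core-site.xml fragment: turn on Kerberos authentication -->
<configuration>
  <property>
    <!-- default is "simple", i.e. no authentication -->
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <!-- enforce service-level authorization checks -->
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
```

Each daemon then additionally needs a principal and keytab entry (e.g. `dfs.namenode.kerberos.principal`) before the cluster will start securely.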
Lesson 12: Cloudera Hadoop 2.x and its Features

Learning Objectives: In this module, you will learn about Cloudera Hadoop 2.x and its various features.

Topics covered are:

  • Visualize Cloudera Manager
  • Features of Cloudera Manager
  • Build Cloudera Hadoop cluster using CDH
  • Installation choices in Cloudera
  • Cloudera Manager vocabulary and terminology
  • Different tabs in Cloudera Manager
  • What is HUE?
  • Hue Architecture
  • Hue Interface
  • Hue Features

Available Formats

Live Online
Register