IT Training Courses
 

Do you need IT and Project Management training?

COMNet Group can help! Please call one of our support numbers now:

 

Illinois:  (847) 458-8281

 

North Carolina: 

 

Charlotte Area: (704) 909-2792

 

RTP Area: (919) 827-4364


or send us an email at info@comnetgroup.com.

LOCATIONS:

ILLINOIS:

 

Schaumburg/Hoffman Estates Area:

 

COMNet Group Inc.

2815 Forbs Avenue, Suite 107

Hoffman Estates, IL 60192

 

Oak Brook and Naperville Area:

 

COMNet Group Inc.

4320 Winfield Road, Suite 200

Cornerstone @Cantera

Warrenville, IL 60555

 

Gurnee/Waukegan/Grayslake/Lake Forest Area:

 

COMNet Group Inc.

100 Saunders Road, Suite 150

Lake Forest, IL 60045

 

NORTH CAROLINA:

 

University Executive Park Area

CHARLOTTE:

 

COMNet Group Inc.

301 McCullough Drive, Suite 400

Charlotte, NC 28262

 

Phone: (704) 909-2792

 

DURHAM:

 

2530 Meridian Pkwy, Suite 200

Durham, NC 27713

 

Phone: (919) 827-4364

 

RALEIGH:

 

4208 Six Forks Road, Suite 1000

Raleigh, NC 27609

 

Phone: (919) 827-4364

 

Big Data – Apache Hadoop

 

Overview

 

Big data usually refers to data sets too large for commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. The size that counts as big data is a constantly moving target; as of 2012 it ranged from a few dozen terabytes to many petabytes in a single data set. The target moves because traditional DBMS technology keeps improving and because newer databases, such as NoSQL systems, can handle larger amounts of data. To address this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.

 

Outline

 

Implement Hadoop jobs to extract business value from large and varied data sets
Write, customize and deploy MapReduce jobs to summarize data
Load and retrieve unstructured data from HDFS and HBase
Develop Hive and Pig queries to simplify data analysis
Test and debug jobs using MRUnit
Monitor task execution and cluster health
Big Data overview

 

Structure of a Hadoop cluster

 

  • Name nodes
  • Data nodes
  • Job trackers
  • Task trackers
  • Cluster modes
      • Standalone
      • Distributed
      • Pseudo-distributed
  • Basic operations through the Hadoop CLI
  • Read and write operations - behind the scenes
  • Configurations and setup
      • Compression
      • Persistence
      • Ganglia
      • XML files
      • Permissions
  • MR job
      • Word count example
      • Input splits
      • Input formats
      • Output formats
      • Mappers
      • Reducers
      • Partitioners/combiners
      • Counters
      • Optimization
      • More complicated MapReduce program
  • Ecosystem
      • Streaming
      • ZooKeeper
      • Oozie
      • HBase
      • Pig
      • Sqoop
      • Hive
  • Monitoring
      • Ganglia
      • Job tracker
  • Troubleshooting
      • fsck
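
To give a feel for the word count example and the mapper, combiner, partitioner and reducer topics listed above, here is a minimal sketch in plain Python. It is an illustration only, not course material and not the Hadoop API: all function names are hypothetical, the whole job runs in memory on a single machine, and a CRC32 hash stands in for a hash-style partitioner.

```python
from collections import Counter, defaultdict
import zlib


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine phase: pre-sum counts on the map side to reduce shuffle traffic."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return local.items()


def partitioner(key, num_reducers):
    """Choose a reducer for a key (assumption: CRC32 as a stable stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(word, counts):
    """Reduce phase: sum every count that arrived for one word."""
    return word, sum(counts)


def word_count(splits, num_reducers=2):
    """Run all phases over a list of input splits (each split is a list of lines)."""
    # Shuffle: one bucket per reducer, grouping values by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = (pair for line in split for pair in mapper(line))
        for word, count in combiner(mapped):
            buckets[partitioner(word, num_reducers)][word].append(count)
    # Each simulated reducer walks its own keys in sorted order.
    result = {}
    for bucket in buckets:
        for word in sorted(bucket):
            key, total = reducer(word, bucket[word])
            result[key] = total
    return result


if __name__ == "__main__":
    # Two hypothetical input splits, as if one file had been divided in two.
    splits = [["the quick brown fox", "the lazy dog"], ["the fox"]]
    print(word_count(splits))
```

Each split is mapped and combined on its own, which mirrors how map tasks work independently on their input splits. Because the partitioner sends every occurrence of a word to the same reducer, each reducer can compute a final total for its words without consulting the others.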

 

© COMNet GROUP INC. 2005-2017 All Rights Reserved.