IT Training Courses

Do you need IT and Project Management training?

COMNet Group can help! Please call our support numbers now:


North Carolina: 

Charlotte Area: (704) 909-2792

RTP Area: (919) 827-4364



Chicagoland Area: (847) 458-8281


or send us an email at




Schaumburg/Hoffman Estates Area:


COMNet Group Inc.

2815 Forbs Avenue, Suite 107

Hoffman Estates, IL 60192



Gurnee/Waukegan/Grayslake/Lake Forest Area:


COMNet Group Inc.

100 Saunders Road, Suite 150

Lake Forest, IL 60045




University Executive Park Area



COMNet Group Inc.

301 McCullough Drive, Suite 400

Charlotte, NC 28262


Phone: (704) 909-2792



Cary/Raleigh Area - Weston Parkway



COMNet Group Inc.

1000 Centregreen Way, Suite 200

Cary, NC 27513




COMNet Group Inc.

2530 Meridian Parkway, Suite 300

Durham, NC 27713


Phone: (919) 827-4364



Big Data – Apache Hadoop




Big data usually refers to data sets too large for commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. The target moves because of constant improvement in traditional DBMS technology, as well as newer NoSQL databases and their ability to handle larger amounts of data. Because of this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.




Implement Hadoop jobs to extract business value from large and varied data sets
Write, customize and deploy MapReduce jobs to summarize data
Load and retrieve unstructured data from HDFS and HBase
Develop Hive and Pig queries to simplify data analysis
Test and debug jobs using MRUnit
Monitor task execution and cluster health
Big Data overview


Structure of a Hadoop cluster


  • NameNodes
  • DataNodes
  • JobTrackers
  • TaskTrackers
  • Cluster modes
  • Standalone
  • Distributed
  • Pseudo-distributed
  • Basic operations through the Hadoop CLI
  • Read and Write operations - behind the scenes
  • Configurations and setup
  • Compression
  • Persistence
  • Ganglia
  • XML files
  • Permissions
  • MR job
  • Word count example
  • Input Splits
  • Input formats
  • Output formats
  • Mappers
  • Reducers
  • Partitioners/combiners
  • Counters
  • Optimization
  • More complicated MapReduce program
  • Ecosystem
  • Streaming
  • ZooKeeper
  • Oozie
  • HBase
  • Pig
  • Sqoop
  • Hive
  • Monitoring
  • Ganglia
  • JobTracker
  • Troubleshooting
  • fsck
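
To give a feel for the word count example listed in the outline, here is a minimal sketch of the map, shuffle, and reduce phases written in plain Python. It is an illustration only and does not use the Hadoop API: the function names (mapper, shuffle, reducer, word_count) and the sample input are assumptions made for this sketch, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in-process over an iterable of text lines."""
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    # Keys are sorted before reducing, mirroring the sorted order in which
    # a reducer receives its keys.
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from HDFS.
    sample = ["big data big cluster", "data node"]
    print(word_count(sample))
```

In a real Hadoop job the mapper and reducer would run as separate tasks on different nodes, and the framework would handle the shuffle between them. The sketch only shows how the data flows from one phase to the next.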


© COMNet GROUP INC. 2005-2017 All Rights Reserved.