IT Training Courses

Do you need IT and Project Management training?

COMNet Group can help!  Please call our support numbers now:


Illinois:  (847) 458-8281


North Carolina: 


Charlotte Area: (704) 909-2792


RTP Area: (919) 827-4364

or send us an email at




Schaumburg/Hoffman Estates Area:


COMNet Group Inc.

2815 Forbs Avenue, Suite 107

Hoffman Estates, IL 60192


Oak Brook and Naperville Area:


COMNet Group Inc.

4320 Winfield Road, Suite 200

Cornerstone @Cantera

Warrenville, IL 60555


Gurnee/Waukegan/Grayslake/Lake Forest Area:


COMNet Group Inc.

100 Saunders Road, Suite 150

Lake Forest, IL 60045




University Executive Park Area:



COMNet Group Inc.

301 McCullough Drive, Suite 400

Charlotte, NC 28262


Phone: (704) 909-2792




2530 Meridian Pkwy, Suite 200

Durham, NC 27713


Phone: (919) 827-4364




4208 Six Forks Road, Suite 1000

Raleigh, NC 27609


Phone: (919) 827-4364


Big Data – Apache Hadoop




Big data usually refers to data sets too large for commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. The target moves because traditional DBMS technology keeps improving and because newer databases, such as NoSQL systems, can handle larger amounts of data. To address this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.




Implement Hadoop jobs to extract business value from large and varied data sets
Write, customize and deploy MapReduce jobs to summarize data
Load and retrieve unstructured data from HDFS and HBase
Develop Hive and Pig queries to simplify data analysis
Test and debug jobs using MRUnit
Monitor task execution and cluster health
Big Data overview


Structure of a Hadoop cluster


  • Name Nodes
  • Data nodes
  • Job trackers
  • Task trackers
  • Cluster modes
  • Stand alone
  • Distributed
  • Pseudo-distributed
  • Basic operations through the Hadoop CLI
  • Read and Write operations - behind the scenes
  • Configurations and setup
  • Compression
  • Persistence
  • Ganglia
  • XML files
  • Permissions
  • MR job
  • Word count example
  • Input Splits
  • Input formats
  • Output formats
  • Mappers
  • Reducers
  • Partitioners/combiners
  • Counters
  • Optimization
  • More complicated MapReduce program
  • Ecosystem
  • Streaming
  • ZooKeeper
  • Oozie
  • HBase
  • Pig
  • Sqoop
  • Hive
  • Monitoring
  • Ganglia
  • Job tracker
  • Troubleshooting
  • fsck
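
To give a feel for the word count example listed in the outline, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It does not use Hadoop or any Hadoop API; it only simulates the data flow in memory, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum all the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    counts = word_count(["big data", "Big Hadoop data"])
    for word in sorted(counts):
        print(word, counts[word])
```

On a real cluster the mapper and reducer would run as separate tasks on different nodes, and the grouping step would be handled by the framework rather than by a local dictionary.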


© COMNet GROUP INC. 2005-2017 All Rights Reserved.