IT Training Courses

Do you need IT and Project Management training?

COMNet Group can help with your IT Training Needs!  Please call our support numbers now:


North Carolina: 

Charlotte Area: (704) 323-7762

RTP Area: (919) 827-4364


South Carolina: 

Rock Hill: (803) 403-1970



Chicagoland Area: (847) 458-8281


or send us an email at




Schaumburg/Hoffman Estates Area:

COMNet Group Inc.

2815 Forbs Avenue, Suite 107

Hoffman Estates, IL 60192



Gurnee/Waukegan/Grayslake/Lake Forest Area:

COMNet Group Inc.

100 Saunders Road, Suite 150

Lake Forest, IL 60045


Naperville/Oakbrook Area:

COMNet Group Inc.

4320 Winfield Road, Suite 200

Warrenville, IL 60555





University Executive Park Area


COMNet Group Inc.

301 McCullough Drive, Suite 400

Charlotte, NC 28262

Phone: (704) 323-7762



Cary/Raleigh Area - Weston Parkway


COMNet Group Inc.

5000 Centregreen Way, Suite 500

Cary, NC 27513

Phone: (919) 827-4364





Rock Hill:

COMNet Group Inc.

331 East Main Street, Suite 200

Rock Hill, SC 29730

Phone: (803) 403-1970





Santa Clara / Silicon Valley:

COMNet Group Inc.

5201 Great America Pkwy, Suite 320
Santa Clara, CA 95054

Phone: (408) 916-4937


Big Data – Apache Hadoop




Big data usually refers to data sets too large for commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. What counts as big data is a constantly moving target: as of 2012, sizes ranged from a few dozen terabytes to many petabytes of data in a single data set. The target moves because traditional DBMS technology keeps improving and because newer databases, such as NoSQL systems, can handle larger amounts of data. To address this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.




Implement Hadoop jobs to extract business value from large and varied data sets
Write, customize and deploy MapReduce jobs to summarize data
Load and retrieve unstructured data from HDFS and HBase
Develop Hive and Pig queries to simplify data analysis
Test and debug jobs using MRUnit
Monitor task execution and cluster health
Big Data overview


Structure of a Hadoop cluster


  • NameNodes
  • DataNodes
  • JobTrackers
  • TaskTrackers
  • Cluster modes
  • Stand alone
  • Distributed
  • Pseudo-distributed
  • Basic operations through the Hadoop CLI
  • Read and Write operations - behind the scenes
  • Configurations and setup
  • Compression
  • Persistence
  • Ganglia
  • XML files
  • Permissions
  • MR job
  • Word count example
  • Input Splits
  • Input formats
  • Output formats
  • Mappers
  • Reducers
  • Partitioners/combiners
  • Counters
  • Optimization
  • More complicated MapReduce program
  • Ecosystem
  • Streaming
  • ZooKeeper
  • Oozie
  • HBase
  • Pig
  • Sqoop
  • Hive
  • Monitoring
  • Ganglia
  • Job tracker
  • Troubleshooting
  • Fsck
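
The word count example listed in the outline can be illustrated with a small local simulation. The sketch below is plain Python and does not use the Hadoop API; it only mimics the map, partition, shuffle, and reduce steps in a single process. All function names (map_phase, partition, reduce_phase, word_count) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Partitioner: choose a reducer index for a key.

    A byte sum is used as a simple deterministic stand-in for a hash
    partitioner (an assumption made for this sketch).
    """
    return sum(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: add up all counts collected for one word."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    """Run map, shuffle, and reduce locally over an iterable of lines."""
    # Shuffle: route each mapper output to a reducer bucket, grouped by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for word, count in map_phase(line):
            buckets[partition(word, num_reducers)][word].append(count)

    # Reduce: each bucket processes its keys in sorted order.
    result = {}
    for bucket in buckets:
        for key in sorted(bucket):
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    print(word_count(["big data big", "data hadoop"]))
```

On a real cluster the same roles are played by mapper and reducer classes running on separate nodes, with input splits deciding which lines each mapper sees; this sketch keeps everything in memory purely to show the data flow.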


© COMNet GROUP INC. 2020 All Rights Reserved.