IT Training Courses
COMNet Group is a proud sponsor of the PMI-Chicagoland 2018 Symposium

Do you need IT and Project Management Training?

COMNet Group can help with your IT training needs! Please call one of our support numbers now:

 

North Carolina: 

Charlotte Area: (704) 909-2792

RTP Area: (919) 827-4364

 

South Carolina: 

Rock Hill: (803) 403-1970

 

Illinois:

Chicagoland Area: (847) 458-8281

 

or send us an email at info@comnetgroup.com.

LOCATIONS:

ILLINOIS:

 

Schaumburg/Hoffman Estates Area:

COMNet Group Inc.

2815 Forbs Avenue, Suite 107

Hoffman Estates, IL 60192

 

 

Gurnee/Waukegan/Grayslake/Lake Forest Area:

COMNet Group Inc.

100 Saunders Road, Suite 150

Lake Forest, IL 60045

 

Naperville/Oakbrook Area:

COMNet Group Inc.

4320 Winfield Road, Suite 200

Warrenville, IL 60555

 

 

NORTH CAROLINA:

 

University Executive Park Area

CHARLOTTE:

COMNet Group Inc.

301 McCullough Drive, Suite 400

Charlotte, NC 28262

Phone: (704) 909-2792

 

 

Cary/Raleigh Area - Weston Parkway

Raleigh/Cary/Durham:

COMNet Group Inc.

1000 Centregreen Way, Suite 200

Cary, NC 27513

Phone: (919) 827-4364

 

Durham:

COMNet Group Inc.

2530 Meridian Parkway, Suite 300

Durham, NC 27713

Phone: (919) 827-4364

 

 

SOUTH CAROLINA:

 

Rock Hill:

COMNet Group Inc.

331 East Main Street, Suite 200

Rock Hill, SC 29730

Phone: (803) 403-1970

 

Big Data – Apache Hadoop

 

Overview

 

Big data usually refers to data sets too large for commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. The target moves because traditional DBMS technology keeps improving and because newer databases, such as NoSQL systems, can handle larger amounts of data. To address this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.

 

Outline

 

Implement Hadoop jobs to extract business value from large and varied data sets
Write, customize and deploy MapReduce jobs to summarize data
Load and retrieve unstructured data from HDFS and HBase
Develop Hive and Pig queries to simplify data analysis
Test and debug jobs using MRUnit
Monitor task execution and cluster health
Big Data overview

 

Structure of a Hadoop cluster

 

  • NameNodes
  • DataNodes
  • JobTrackers
  • TaskTrackers
  • Cluster modes
  • Standalone
  • Distributed
  • Pseudo-distributed
  • Basic operations through Hadoop CLI
  • Read and Write operations - behind the scenes
  • Configurations and setup
  • Compression
  • Persistence
  • Ganglia
  • XML files
  • Permissions
  • MR job
  • Word count example
  • Input Splits
  • Input formats
  • Output formats
  • Mappers
  • Reducers
  • Partitioners/combiners
  • Counters
  • Optimization
  • More complicated MapReduce program
  • Ecosystem
  • Streaming
  • Zookeeper
  • Oozie
  • HBase
  • Pig
  • Sqoop
  • Hive
  • Monitoring
  • Ganglia
  • Job tracker
  • Troubleshooting
  • fsck
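
As a rough illustration of the word count example listed in the outline, the sketch below imitates the mapper, shuffle, and reducer phases of a MapReduce job in plain Python. It runs in a single process and does not use Hadoop itself; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical and chosen only for this illustration, not taken from the course material.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence (single process, illustration only)."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the input lines would come from input splits of files in HDFS, and the grouping step would be carried out by the framework across many machines rather than by an in-memory dictionary.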

 

© COMNet GROUP INC. 2005-2017 All Rights Reserved.