Every day, at least 2.5 quintillion bytes of data are created, and unstructured data is estimated to make up a much larger share of it than structured data. Big data technologies offer methods and platforms for processing huge amounts of both structured and unstructured data. It is estimated that by 2015, 4.4 million IT jobs will be created globally to support big data.

Objectives:

At the end of this course, you will be able to

  • Understand big data related problems
  • Understand the importance of big data
  • Learn about the Hadoop ecosystem
  • Design & implement solutions for big data related problems using Hadoop and related technologies

 

What you will learn:

  • Understand what big data is
  • Distributed computation & storage
  • HDFS
  • Hadoop ecosystem (HDFS, Map-Reduce, Pig, Hive, HBase, Oozie, Sqoop, ZooKeeper)
  • Learn data processing and extraction in Hadoop
  • What kinds of problems can be solved using Hadoop
  • Big data related databases (HBase)

 

Prerequisites:

  • Freshers/developers who have experience developing software in Java and want to build a career in big data

 

Course Outline

What is big data?

  • Big data problems
  • Limitations of big data
  • Solving big data problems

Hadoop cluster

  • Deployment
  • Components
  • Configuration
  • Regular file system vs. HDFS
  • HDFS I/O operation
  • Adding & removing nodes
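
As a rough illustration of the "Regular file system vs. HDFS" topic, the sketch below shows how a large file could be cut into fixed-size blocks and how each block could be copied to several nodes. It is plain Python, not the HDFS API. The function names and node names are hypothetical, and the round-robin placement is a simplifying assumption (real HDFS placement is rack-aware). The 128 MB block size and replication factor of 3 match the defaults of recent Hadoop releases; older releases used 64 MB blocks.

```python
def split_into_blocks(file_size, block_size=128):
    """Return the size of each block for a file of file_size (same unit as block_size)."""
    if file_size <= 0:
        return []
    full, rest = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to distinct nodes, round-robin (assumption, not HDFS's real policy)."""
    copies = min(replication, len(nodes))
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(copies)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # a 300 MB file
    print(blocks)
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

Because every block lives on several nodes, losing one node does not lose data, which is the main contrast with a single-machine file system.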

 

Lab work

Map-Reduce

  • What is Map-Reduce
  • How it works
  • Map-Reduce related problems
  • Map-Reduce & Java
  • Input format
  • Output format
  • Combiners and Partitioners
  • Error handling and testing
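
To make the Map-Reduce topics above more concrete, here is a minimal word-count sketch that simulates the map, combine, partition, and reduce steps in a single process. It is plain Python and does not use the Hadoop Java API covered in the course. All function names are hypothetical, and treating each input line as its own map task is a simplifying assumption.

```python
import zlib
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts on the mapper side to shrink shuffle traffic."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all partial counts for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Shuffle: group combined map output by reducer, then by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # one simulated map task per line
        for key, value in combine(map_phase(line)):
            buckets[partition(key, num_reducers)][key].append(value)
    # Each simulated reducer handles its own bucket, keys in sorted order.
    result = {}
    for bucket in buckets:
        for key in sorted(bucket):
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    print(run_job(["big data big ideas", "data beats ideas"]))
```

The combiner does the same summing as the reducer, but only on one mapper's output. The partitioner guarantees that all partial counts for a given word reach the same reducer.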

 

Lab work

PIG (analytics using Pig)

  • What Is Pig?
  • Why Is It Important?
  • How does it work?
  • Pig vs. MR
  • What is Pig Latin?
  • Where should I use Pig?
  • Programming with Pig

 

Lab work

HIVE

  • What is Hive?
  • How does it work?
  • Pig vs. MR vs. Hive
  • Abilities of Hive Query Language
  • Data model
  • Where should I use Hive?
  • Programming with Hive
  • Hive file formats

 

Lab work

ZooKeeper

  • What is ZooKeeper?
  • Why is ZooKeeper required?
  • ZooKeeper features
  • Coordination service
  • ZooKeeper data model
  • ZooKeeper service
  • ZooKeeper API
  • How ZooKeeper works
  • Problems & solutions using ZooKeeper

 

Lab work

Oozie

  • What is Oozie?
  • Oozie features
  • How to use Oozie
  • Write an Oozie workflow
  • Deploy & run an Oozie workflow

 

Sqoop

  • What is Sqoop?
  • Sqoop features
  • How to use Sqoop
  • Sqoop connectors
  • Importing and exporting data using Sqoop
  • Example Sqoop usage

Lab work

Hadoop Administration and Monitoring

  • Performance Monitoring
  • Performance Tuning
  • Troubleshooting & logs

 

HBase

  • A few real-life problems
  • Problems with RDBMS
  • What is HBase?
  • HBase architecture
  • Practice creating and updating HBase tables in the shell
  • Loading data into HBase
  • Interacting with HBase through programs

 
