At least 2.5 quintillion bytes of data are created every day, and unstructured data is estimated to make up a much larger share of it than structured data. Big data technologies offer methods and platforms for processing huge amounts of both structured and unstructured data. It is estimated that by 2015, 4.4 million IT jobs will be created globally to support big data.

 Big Data Hadoop Training: Objectives

At the end of this course, you will be able to:


  • Understand big data related problems
  • Understand the importance of big data
  • Describe the Hadoop ecosystem
  • Design & implement solutions for big data related problems using Hadoop and related technologies


    What you will learn:


  • What big data is
  • Distributed computation & storage
  • HDFS
  • Hadoop ecosystem (HDFS, Map-Reduce, Pig, Hive, HBase, Oozie, Sqoop, ZooKeeper)
  • Data processing and extraction in Hadoop
  • What kinds of problems can be solved using Hadoop
  • Big data related databases (HBase)

    Who should attend:




    • Freshers/developers who have experience developing software in Java and want to build a career in big data



    Course Outline

    What is big data?


  • Big data problems
  • Limitations of big data
  • Solving big data problems
    Hadoop cluster


  • Deployment
  • Components
  • Configuration
  • Regular file system vs. HDFS
  • HDFS I/O operation
  • Adding & removing nodes
    Lab work

    Map-Reduce



  • What is Map-Reduce
  • How it works
  • Map-Reduce related problems
  • Map-Reduce & Java
  • Input format
  • Output format
  • Combiners and Partitioners
  • Error handling and testing

    Lab work

    Pig (analytics using Pig)


  • What Is Pig?
  • Why Is It Important?
  • How does it work?
  • Pig vs. MR
  • What is Pig Latin?
  • Where should I use Pig?
  • Programming with Pig
  • Lab work

    Hive



  • What is Hive?
  • How does it work?
  • Pig vs. MR vs. Hive
  • Capabilities of the Hive query language
  • Data model
  • Where should I use Hive?
  • Programming with Hive
  • Hive file formats
  • Lab work

    Flume


  • What is Flume?
  • Flume features
  • How to use Flume
  • Develop Flume dataflow
  • Deploy & run Flume dataflow configurations
  • Lab work

    Oozie


  • What is Oozie?
  • Oozie features
  • How to use Oozie
  • Write an Oozie workflow
  • Deploy & run an Oozie workflow
    Sqoop


  • What is Sqoop?
  • Sqoop features
  • How to use Sqoop
  • Sqoop connectors
  • Importing and exporting data using Sqoop
  • Example Sqoop usage
  • Lab work

    Hadoop Administration and Monitoring


  • Performance Monitoring
  • Performance Tuning
  • Troubleshooting & logs

    HBase



  • A few real-life problems
  • Problems with RDBMS
  • What is HBase?
  • HBase architecture
  • Practice creating and updating HBase tables in the HBase shell
  • Loading data into HBase
  • Interacting with HBase through programs