Data Warehousing Training

Our Data Warehousing training institute maintains a placement record that compares favourably with any other institute in Chennai. Our students are well prepared to clear their first interview after finishing the classes here, because they already know what Data Warehousing is and how companies use it in their projects. Before completing the course with us, every student must complete a real-time project; it is mandatory.

STUDENTS

  •  After your degree, learn this course and get a job
  •  Enter the e-commerce or IT/software industry with a good salary
  •  While doing a job, earn extra money working as an SEO freelancer
  •  Digital Marketing has good scope in the future

Power Center Components

  • Designer
  • Repository Manager
  • Workflow Manager
  • Workflow Monitor
  • Power Center Admin Console

Informatica Concepts and Overview

  • Informatica Architecture.

Sources

  • Working with relational Sources
  • Working with Flat Files

Targets

  • Working with Relational Targets
  • Working with Flat file Targets

Transformations – Active and Passive Transformations

  • Expression
  • Lookup – Different types of Lookup caches
  • Sequence Generator
  • Filter
  • Joiner
  • Sorter
  • Rank
  • Router
  • Aggregator
  • Source Qualifier
  • Update Strategy
  • Normalizer
  • Union
  • Stored Procedure
  • Slowly Changing Dimension
    1. SCD Type 1
    2. SCD Type 2 – Date, Flag and Version
    3. SCD Type 3

Workflow Manager

  • Creating Reusable tasks
  • Workflows, Worklets & Sessions
  • Tasks
    1. Session
    2. Decision task
    3. Control Task
    4. Event wait task
    5. Timer task
  • Monitoring workflows and debugging errors

Indirect Loading

Constraint-based load ordering

Target Load plan

Worklet, Mapplet, Reusable transformation

Migration – XML migration and Folder Copy

Scheduling Workflow

Parameters and Variables

XML Source, Target and Transformations

Performance Tuning

  • Pipeline Partition
  • Dynamic Partition
  • Pushdown optimization

We hold weekly demo classes for Informatica; feel free to give us a call and drop in.

Introduction to Big Data & Hadoop Fundamentals

Goal: In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves it, the common Hadoop ecosystem components, Hadoop architecture, HDFS, the anatomy of a file write and read, and how the MapReduce framework works. A short HDFS read/write sketch in Java follows the topic list below.

Objectives – Upon completing this module, you should be able to understand that Big Data is a term applied to data sets that cannot be captured, managed, and processed by commonly used software tools within a tolerable, specified time frame.

  • Big Data is characterized by three dimensions: volume, velocity, and variety.
  • Data can be divided into three types—unstructured data, semi-structured data, and structured data.
  • Big Data technology understands and navigates big data sources, analyzes unstructured data, and ingests data at a high speed.
  • Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.

Hadoop Training in Chennai – Syllabus

Topics:

Apache Hadoop

  • Introduction to Big Data & Hadoop Fundamentals
  • Dimensions of Big data
  • Type of Data generation
  • Apache ecosystem & its projects
  • Hadoop distributors
  • HDFS core concepts
  • Modes of Hadoop deployment
  • HDFS Flow architecture
  • HDFS MRv1 vs. MRv2 architecture
  • Types of Data compression techniques
  • Rack topology
  • HDFS utility commands
  • Minimum hardware requirements for a cluster & property file changes
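
To make the anatomy of an HDFS file write and read concrete, here is a minimal Java sketch using the Hadoop FileSystem API. It is only an illustration: the path /tmp/hdfs-demo.txt and the file contents are invented, and the cluster it talks to is whichever one the core-site.xml/hdfs-site.xml files on the classpath point at (it falls back to the local filesystem if none are found).

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWriteDemo {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath;
            // fs.defaultFS decides which cluster (or local FS) is used.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/tmp/hdfs-demo.txt"); // placeholder path

            // Write: the client asks the NameNode where to place blocks,
            // then streams the data through the DataNode pipeline.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read: the client fetches block locations from the NameNode
            // and reads the blocks directly from the DataNodes.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }

            fs.delete(file, false); // clean up the demo file
            fs.close();
        }
    }

Compile it against the hadoop-client libraries and launch it with the hadoop jar command so that the cluster configuration is picked up automatically.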

Module 2 (Duration: 03:00:00)

MapReduce Framework

Goal: In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will learn concepts such as input splits, the Combiner, and the Partitioner, and see MapReduce demos on different data sets. A minimal word-count sketch appears after the topic list below.

Objectives – Upon completing this module, you should be able to understand that MapReduce processes jobs using the batch processing technique.

  • MapReduce programs can be written in Java.
  • Hadoop ships with a hadoop-examples JAR file that administrators and programmers normally use to test MapReduce applications.
  • MapReduce contains steps like splitting, mapping, combining, reducing, and output.

Topics:

Introduction to MapReduce

  • MapReduce Design flow
  • MapReduce Program (Job) execution
  • Types of Input formats & Output Formats
  • MapReduce Datatypes
  • Performance tuning of MapReduce jobs
  • Counters techniques
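
The splitting, mapping, combining, and reducing steps listed above can all be seen in the classic word-count job. The sketch below uses the standard org.apache.hadoop.mapreduce API; the input and output paths come from the command line and are placeholders, and the reducer is reused as the map-side combiner.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map step: each input split is fed to a mapper, one record at a time.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer it = new StringTokenizer(value.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    context.write(word, ONE); // emit (word, 1)
                }
            }
        }

        // Reduce step: values for the same key arrive grouped and sorted.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // combiner runs on the map side
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Package it as a JAR and submit it with, for example, hadoop jar wordcount.jar WordCount input output; the output directory must not already exist.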

Module 3 (Duration: 03:00:00)

Apache Hive

Goal: This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs. A small JDBC query sketch appears after the topic list below.

Objectives – Upon completing this module, you should be able to understand that Hive is a system for managing and querying data in Hadoop by projecting a structured, table-like format onto it.

  • The various components of Hive architecture are metastore, driver, execution engine, and so on.
  • Metastore is a component that stores the system catalog and metadata about tables, columns, partitions, and so on.
  • Hive installation starts with locating the latest version of the tar file and downloading it on an Ubuntu system using the wget command.
  • While working in Hive, use the SHOW TABLES command to list the tables in the current database.

Topics:

Introduction to Hive & features

  • Hive architecture flow
  • Types of Hive tables
  • DML/DDL commands explanation
  • Partitioning logic
  • Bucketing logic
  • Hive script execution in shell & HUE
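
As a rough sketch of loading and querying data in Hive from a program, the example below talks to HiveServer2 over JDBC. The host and port (localhost:10000), the credentials, and the employees table are assumptions made purely for illustration, and the Hive JDBC driver JAR must be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcDemo {
        public static void main(String[] args) throws Exception {
            // Older Hive JDBC drivers may need explicit registration:
            // Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://localhost:10000/default"; // placeholder HiveServer2 URL

            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {

                // DDL: create a simple delimited table if it does not exist yet.
                stmt.execute("CREATE TABLE IF NOT EXISTS employees "
                        + "(id INT, name STRING, dept STRING) "
                        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

                // Metadata: list the tables in the current database.
                try (ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                    while (rs.next()) {
                        System.out.println("table: " + rs.getString(1));
                    }
                }

                // DML: a simple aggregate query, executed by Hive on the cluster.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT dept, COUNT(*) FROM employees GROUP BY dept")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                    }
                }
            }
        }
    }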

Module 4 (Duration: 03:00:00)

Apache Pig

Goal: In this module, you will learn Pig, the types of use cases where Pig can be applied, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig streaming, and testing Pig scripts, with a demo on a healthcare dataset. An embedded-Pig sketch in Java follows the topic list below.

Objectives – Upon completing this module, you should be able to understand that Pig is a high-level data-flow scripting language with two major components: the runtime engine and the Pig Latin language.

  • Pig runs in two execution modes: local mode and MapReduce mode. Pig scripts can be written in two modes: interactive mode and batch mode.
  • The Pig engine can be installed by downloading it from a mirror linked from the website pig.apache.org.

Topics:

  • Introduction to Pig concepts
  • Pig modes of execution/storage concepts
  • Pig program logics explanation
  • Pig basic commands
  • Pig script execution in shell/HUE
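
A hedged sketch of the coupling between Pig and Java mentioned above: the embedded PigServer API lets a Java program register Pig Latin statements and run them in local or MapReduce mode. The patients.csv file, its schema, and the output directory are invented placeholders echoing the healthcare demo.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class EmbeddedPigDemo {
        public static void main(String[] args) throws Exception {
            // Local mode runs against the local filesystem; use ExecType.MAPREDUCE
            // to run the same statements on a Hadoop cluster.
            PigServer pig = new PigServer(ExecType.LOCAL);

            // Pig Latin statements are registered one by one; nothing runs yet.
            pig.registerQuery("patients = LOAD 'patients.csv' USING PigStorage(',') "
                    + "AS (id:int, name:chararray, age:int);");
            pig.registerQuery("adults = FILTER patients BY age >= 18;");

            // STORE triggers execution of the whole data flow.
            pig.store("adults", "adult_patients_out");

            pig.shutdown();
        }
    }

The same statements could just as well live in a .pig script and be run from the Grunt shell or HUE, as covered in the topics above.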

Module 5 (Duration: 03:00:00)

Goal: This module will cover advanced HBase concepts, with demos on bulk loading and filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster, and why HBase uses ZooKeeper. A small Java client sketch follows the topic list below.

Objectives – Upon completing this module, you should be able to understand that HBase has two types of nodes: Master and RegionServer. Only one Master node runs at a time, but there can be multiple RegionServers at a time.

  • The HBase data model comprises tables whose rows are kept sorted by row key. Column families must be defined at the time of table creation.
  • There are eight steps to follow when installing HBase.
  • Some of the commands available in the HBase shell are create, drop, list, count, get, and scan.

Topics:

Apache HBase

  • Introduction to HBase concepts
  • Introduction to NoSQL/CAP theorem concepts
  • HBase design/architecture flow
  • HBase table commands
  • Hive + HBase integration module/jars deployment
  • HBase execution in shell/HUE
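
To ground the HBase data model (row keys, column families, cells), here is a minimal Java client sketch that writes and reads back a single cell. It assumes a reachable cluster whose hbase-site.xml is on the classpath and a pre-created table named users with a column family info; both names are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientDemo {
        public static void main(String[] args) throws Exception {
            // Reads hbase-site.xml from the classpath; its ZooKeeper quorum
            // settings tell the client how to locate the cluster.
            Configuration conf = HBaseConfiguration.create();

            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {

                // Write one cell: row key -> column family:qualifier -> value.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read it back with a Get on the same row key.
                Get get = new Get(Bytes.toBytes("row1"));
                Result result = table.get(get);
                byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println("name = " + Bytes.toString(value));
            }
        }
    }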

Module 6 (Duration: 02:00:00)

Goal: Sqoop is an Apache Hadoop ecosystem project whose responsibility is to perform import and export operations between Hadoop and relational databases. Some reasons to use Sqoop are as follows:

  • SQL servers are deployed worldwide
  • Nightly processing is done on SQL servers
  • It allows you to move selected parts of the data from a traditional SQL database to Hadoop
  • Transferring data using hand-written scripts is inefficient and time-consuming
  • It helps to handle large volumes of data through the Hadoop ecosystem
  • To bring processed data from Hadoop to the applications

Objectives – Upon completing this module, you should be able to understand that Sqoop is a tool designed to transfer data between Hadoop and relational databases such as MySQL, Microsoft SQL Server, and PostgreSQL. A small command-launcher sketch in Java appears after the topic list below.

  • Sqoop allows you to import data from a relational database, such as SQL Server, MySQL, or Oracle, into HDFS.

Topics:

Apache Sqoop

  • Introduction to Sqoop concepts
  • Sqoop internal design/architecture
  • Sqoop Import statements concepts
  • Sqoop Export Statements concepts
  • Quest Data connectors flow
  • Incremental updating concepts
  • Creating a database in MySQL for importing to HDFS
  • Sqoop commands execution in shell/HUE
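
Sqoop is normally driven from the command line rather than through a Java API, so the sketch below simply assembles and launches a typical sqoop import command from Java. Every connection detail (JDBC URL, user, password file, table, target directory) is a placeholder, and the sqoop binary is assumed to be on the PATH of a configured Hadoop client machine.

    import java.util.Arrays;
    import java.util.List;

    public class SqoopImportLauncher {
        public static void main(String[] args) throws Exception {
            // A typical 'sqoop import' invocation from MySQL into HDFS.
            List<String> command = Arrays.asList(
                    "sqoop", "import",
                    "--connect", "jdbc:mysql://dbhost:3306/sales",   // placeholder database
                    "--username", "etl_user",
                    "--password-file", "/user/etl/.db_password",     // avoids a plain-text password
                    "--table", "orders",                             // source table
                    "--target-dir", "/data/raw/orders",              // HDFS destination
                    "--num-mappers", "4");                           // parallel map tasks

            Process p = new ProcessBuilder(command)
                    .inheritIO()   // stream Sqoop's console output to this process
                    .start();
            int exitCode = p.waitFor();
            System.out.println("sqoop exited with code " + exitCode);
        }
    }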

Module 7 (Duration: 02:00:00)

Goal: Apache Flume is a distributed data collection service that gathers flows of data from their sources and aggregates them to where they need to be processed.

Objectives – Upon completing this module, you should be able to understand that Apache Flume is a distributed data collection service that collects data from its sources and delivers it to a sink.

  • Flume provides a reliable and scalable agent mode to ingest data into HDFS.

Topics:

Apache Flume

  • Introduction to Flume & features
  • Flume topology & core concepts
  • Property file parameters logic

Module 8 (Duration: 02:00:00)

Goal: Hue is a web front end to Apache Hadoop, offered in the Cloudera VM.

Objectives – Upon completing this module, you should be able to understand how to use Hue for Hive, Pig, and Oozie.

Topics:

Apache HUE

  • Introduction to Hue design
  • Hue architecture flow/UI interface

Module 9 (Duration: 02:00:00)

Goal: The following are the goals of ZooKeeper:

  • Serialization ensures the avoidance of delays in read and write operations.
  • Reliability persists when an update is applied by a user in the cluster.
  • Atomicity does not allow partial results; any user update either succeeds or fails.
  • Simple Application Programming Interface or API provides an interface for development and implementation.

Objectives – Upon completing this module, you should be able to understand that ZooKeeper provides a simple and high-performance kernel for building more complex clients. A minimal Java client sketch appears after the topic list below.

  • ZooKeeper has three basic entities—Leader, Follower, and Observer.
  • A watch is used to receive notifications of changes; followers and observers forward write requests to the leader.

Topics:

Apache Zookeeper

  • Introduction to ZooKeeper concepts
  • ZooKeeper principles & usage in the Hadoop framework
  • Basics of ZooKeeper
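
A minimal sketch of the ZooKeeper client API discussed above: it connects to an ensemble, creates a persistent znode, and reads it back while leaving a watch so that a later change triggers the watcher callback. The connect string localhost:2181 and the znode path are placeholders.

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZooKeeperDemo {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);

            // Connect to a ZooKeeper ensemble; the session watcher also
            // receives znode events such as NodeDataChanged.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, (WatchedEvent event) -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
                System.out.println("event: " + event);
            });
            connected.await();

            String path = "/demo-config";
            // Create a persistent znode if it does not already exist.
            if (zk.exists(path, false) == null) {
                zk.create(path, "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }

            // Read the data and leave a watch for the next change.
            byte[] data = zk.getData(path, true, null);
            System.out.println("znode data: " + new String(data));

            zk.close();
        }
    }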

Module 10 (Duration: 05:00:00)

Goal:

  • Explain different configurations of the Hadoop cluster
  • Identify different parameters for performance monitoring and performance tuning
  • Explain configuration of security parameters in Hadoop

Objectives – Upon completing this module, you should be able to understand that Hadoop can be optimized based on the infrastructure and available resources. A short configuration-reading sketch follows the topic list below.

  • Hadoop is open-source software, and the support available for complicated optimization is limited.
  • Optimization is performed through XML configuration files.
  • Logs are the best medium through which an administrator can understand a problem and troubleshoot it accordingly.
  • Hadoop relies on a Kerberos-based security mechanism.

Topics:

Administration concepts

  • Principles of Hadoop administration & its importance
  • Hadoop admin commands explanation
  • Balancer concepts
  • Rolling upgrade mechanism explanation
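
Because the module stresses that optimization is performed through XML files, here is a small sketch showing how those files surface to client code through Hadoop's Configuration class. The property names are real Hadoop keys, but the fallback values printed here are only illustrative; the actual values depend entirely on the *-site.xml files found on the classpath.

    import org.apache.hadoop.conf.Configuration;

    public class HadoopConfigInspector {
        public static void main(String[] args) {
            // Hadoop reads its tuning parameters from XML files such as
            // core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
            Configuration conf = new Configuration(); // loads core-default.xml + core-site.xml
            conf.addResource("hdfs-site.xml");        // assumed to be on the classpath

            System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS", "file:///"));
            System.out.println("dfs.replication = " + conf.get("dfs.replication", "3"));
            System.out.println("dfs.blocksize   = " + conf.get("dfs.blocksize", "134217728"));

            // Programmatic overrides affect only this client process;
            // cluster-wide changes still belong in the XML files.
            conf.set("dfs.replication", "2");
            System.out.println("override        = " + conf.get("dfs.replication"));
        }
    }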

Big Data Analytics Training Syllabus

Big Data Analytics introduction

  • Big Data overview
  • What is a data scientist?
  • What are the roles of a data scientist?
  • Big Data Analytics in industry

Data analytics lifecycle

  • Data Discovery
  • Data Preparation
  • Data Model Planning
  • Data Model Building
  • Data Insights

Data Analytic Methods Using R

  • Introduction to R
  • Analyzing and Exploring the Data
  • Model Building and Evaluation
  • Machine Learning – Theory and Methods
  • Introduction to analytics for unstructured data – MapReduce and Hadoop
  • Sample analytics project
  • Creating final deliverables

Datastage Introduction

  • DataStage Architecture
  • DataStage Clients
    • Designer
    • Director
    • Administrator
  • DataStage Workflow

Types of DataStage Jobs

  • Parallel Jobs
  • Server Jobs
  • Job Sequences

Setting up DataStage Environment

  • DataStage Administrator Properties
  • Defining Environment Variables
  • Importing Table Definitions

Creating Parallel Jobs

  • Design a simple Parallel job in Designer
  • Compile your job
  • Run your job in Director
  • View the job log
  • Command Line Interface (dsjob)

Accessing Sequential Data

  • Sequential File stage
  • Data Set stage
  • Complex Flat File stage
  • Create jobs that read from and write to sequential files
  • Read from multiple files using file patterns
  • Use multiple readers
  • Null handling in Sequential File Stage

Platform Architecture

  • Describe parallel processing architecture
  • Describe pipeline & partition parallelism
  • List and describe partitioning and collecting algorithms
  • Describe configuration files
  • Explain OSH & Score

Combining Data

  • Combine data using the Lookup stage
  • Combine data using merge stage
  • Combine data using the Join stage
  • Combine data using the Funnel stage

Sorting and Aggregating Data

  • Sort data using in-stage sorts and Sort stage
  • Combine data using Aggregator stage
  • Remove Duplicates stage

Transforming Data

  • Understand ways DataStage allows you to transform data
  • Create column derivations using user-defined code and system functions
  • Filter records based on business criteria
  • Control data flow based on data conditions

Repository Functions

  • Perform a simple Find
  • Perform an Advanced Find
  • Perform an impact analysis
  • Compare the differences between two Table Definitions and Jobs.

Working with Relational Data

  • Import Table Definitions for relational tables.
  • Create Data Connections.
  • Use Connector stages in a job.
  • Use SQL Builder to define SQL Select statements.
  • Use SQL Builder to define SQL Insert and Update statements.
  • Use the DB2 Enterprise stage.

Metadata in Parallel Framework:

  • Explain schemas.
  • Create schemas.
  • Explain Runtime Column Propagation (RCP).
  • Build a job that reads data from a sequential file using a schema.
  • Build a shared container.

Job Control

  • Use the DataStage Job Sequencer to build a job that controls a sequence of jobs.
  • Use Sequencer links and stages to control the sequence a set of jobs run in.
  • Use Sequencer triggers and stages to control the conditions under which jobs run.
  • Pass information in job parameters from the master controlling job to the controlled jobs.
  • Define user variables.
  • Enable restart.
  • Handle errors and exceptions.

An introduction to BI

  • Business Intelligence
  • OLAP
  • Introduction to BI tools
  • Database Overview

Introduction to Microstrategy

  • Microstrategy Architecture
  • Microstrategy Desktop
  • Microstrategy Web
  • Microstrategy Servers
  • Administration
  • Folder Structure
  • My Personal Objects
  • Public Object
  • Schema Object
  • Metadata
  • Report View
  • Data – Export
  • AutoStyles
  • Custom Groups
  • Facts
  • Tables
  • Update Schema

Advance Features

  • Project Configuration
  • Attribute Creation
  • Metric Creation
  • Drill Map
  • Templates
  • Prompt
  • Filter
  • Administration Facts
  • Creation Of Reports
  • Grid Report
  • Analyzing Data
  • Transformations
  • Hierarchies
  • Data Explorer
  • Ad hoc Report
  • Report Creation on Web
  • Searches
  • Documents
  • Joins

Experts Features And Administration

  • Project
  • Installation
  • Intelligence Server
  • User Creation
  • User Privilege
  • Security Implementation
  • Object Manager
  • Command Manager
  • Formatting Report
  • Understanding Requirement
  • Performance Improvement
  • SQL Creation
  • Challenges in Report
  • Administrative Configurations