Hadoop Developer / Data Engineer Resume
Toronto, ON
SUMMARY
- Over 7 years of IT experience with extensive experience in Big Data, Hadoop, Data Warehousing, Data Analysis, Solution Design, Reporting, ETL, Data Modeling, Development, Testing and Documentation.
- Extensive ETL development experience using Informatica Power Center 8.x, 9.x and 10.x.
- 2.5+ years of Hadoop experience in design, development and deployment of Big Data applications involving the Apache Hadoop ecosystem: MapReduce, HDFS, Hive, Cassandra, HBase, Pig, Sqoop, Kafka, Spark, YARN, Zookeeper, Oozie, Hortonworks and Talend.
- Hands-on experience in the Hadoop framework, developing Hadoop jobs using Hive, Pig, MapReduce, Sqoop, Kafka, HBase and Cassandra.
- Excellent understanding and knowledge of Hadoop architecture and its various components: HDFS, Name Node, Node Manager, Resource Manager, Application Master, Job History Server, Data Node and MapReduce.
- Hands-on experience in importing and exporting data between RDBMS and HDFS using Sqoop.
- Sound knowledge of Spark and its components: SparkCore, SparkSQL, Spark Streaming.
- Experience designing Scala applications to work with Spark and optimizing Hive query performance.
- Excellent hands-on experience in analyzing data using Pig Latin, HQL, HBase and MapReduce programs in Scala.
- Good working knowledge of Flume and Kafka for ingesting data from various streaming sources.
- Sound knowledge of SQL, JDBC, Stored procedures and packages. Exposure to relational databases (Oracle, MySQL, DB2), and NoSQL databases (Cassandra and Hbase)
- Experience working with Agile as well as Waterfall Software Development Life Cycle (SDLC) methodologies.
- Expertise in creating databases, users, tables, views, stored procedures, functions, joins and indexes in Oracle DB.
- Experience in importing data in different formats into HDFS and HBase from different RDBMS databases, and vice versa.
- Implemented Oozie for writing workflows and scheduling jobs. Wrote Hive queries for data analysis and to process the data for visualization.
- Experienced in implementing advanced procedures such as text analytics and processing in Apache Spark, written in Scala and using its in-memory computing capabilities.
- Exposure to using Apache Kafka to develop data pipelines that handle logs as a stream of messages using producers and consumers.
- Excellent ability to quickly master new concepts, along with the capability to work in a group as well as independently.
- Exceptional skills in communication, time management, organization and resource management
TECHNICAL SKILLS:
- Functional: Business Requirements Analysis and Process Mapping
- DW/ETL Tools: Informatica Power Center 8.6-9.6 (Repository Manager, Designer, Server Manager, Workflow Monitor, Workflow Manager), PowerMart, Datamart, ETL, OLTP, Star Schema, Snowflake Schema, Oracle Warehouse Builder 9.2/10g, SSIS
- Big Data/Hadoop Tools: HDFS, MapReduce, Hive, Sqoop, HBase, Oozie, Kafka, Spark, Flume
- Reporting & BI/OLAP: Siebel Analytics 7.8.4/7.8.2/7.7, OBIEE 10.1.3.x, Business Objects, Crystal Reports, Web Intelligence, Desktop Intelligence
- Data Modeling: Physical Modeling, Logical Modeling, Relational Modeling, Dimensional Modeling (Star Schema, Snowflake, Fact, Dimensions), Entities, Attributes, Cardinality, ER Diagrams, Salesforce.com, Erwin 4.0/3.5.2/2.x
- Databases & Scheduling Tools: Oracle 11g/10g/9i, MS SQL Server 2012, DB2, Web services, PL/SQL, SQL*Loader, Autosys, Control-M, Netezza, Tidal
- Environment: Windows 2000/XP/7, UNIX, Linux, Cloud
- Others: Java, C++, Python, JavaScript, HTML, SAP BW
PROFESSIONAL EXPERIENCE
Confidential, Toronto, ON
Hadoop Developer / Data Engineer
Responsibilities:
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Created Hive tables, loaded them with data and aggregated data by writing Hive queries.
- Developed Spark Applications by using Scala/Python, and Implemented Apache Spark data processing project to handle data from various RDBMS and Streaming sources.
- Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
- Designed and developed numerous supporting SQL queries for data analytics in Hive.
- Worked with Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Parsed JSON and XML files with Pig Loader functions and extracted information from Pig relations by applying regular expressions through Pig's built-in functions.
- Wrote Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts.
- Resolved performance issues in Hive and Pig scripts by applying an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Developed projects with Spark/Scala, Hive and Oozie on the Cloudera Hadoop distribution.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Ingested data from different data sources (RDBMS, NoSQL, streaming data, HDFS); cleansed, transformed and validated the collected records using ingestion tools such as Sqoop and Talend; stored the results in the intended destination; and automated the jobs using Oozie.
- Created Hive tables, partitions and buckets for analyzing large volumes of data.
- Created Hive queries as per business requirements. Scheduled the Hive, Sqoop and Pig jobs using Oozie.
- Performed schema design for Hive and optimized Hive performance and configuration.
- Used Pig to perform transformations, joins and aggregations before storing the data in HDFS.
- Migrated existing processes (multiple Hive queries run manually) to an automated platform (Oozie) in the client's environment by translating Hive queries to Pig queries and then setting file and time dependencies.
- Developed and built Hive tables with partitions for better performance, and tuned Hive queries for the best performance. Implemented Hive UDFs.
- Processed structured, semi structured and unstructured data sets
- Analyzed the data to extract information about customers, i.e. positive/negative reviews, page views, visit duration and most popular products on the website, using Spark in Scala.
Environment: Hive, Hue, Sqoop, Pig, HBase, Kafka, Oracle, MySQL, UNIX, SQL Server, Oozie, Spark, Talend, Cassandra, Scala, Python, Java, Jenkins
Confidential, Toronto, ON
ETL Developer/Designer
Responsibilities:
- Worked as an ETL developer with BAs and DBAs on requirements gathering, business analysis and design of the data warehouse.
- The project was a migration project in which data from legacy sources was migrated into and integrated with the existing DW.
- Used Informatica Power Center to load data into the data warehouse for reporting purposes.
- Performed database development, writing complex queries in T-SQL.
- Used Informatica Power Center 9.6 to develop ETL mappings from the mapping specification documents.
- Developed, unit tested, documented and peer reviewed the mappings.
- Created Logical and Physical models for Staging, Transition and Production Warehouses using Erwin 4.0. Used Repository manager to create user groups and users, and managed users by setting up their privileges and profile.
- Performance Tuning of Informatica sessions for large data loads by increasing block size, data cache size, sequence buffer length and target based commit interval.
- Created complex mappings using unconnected Lookup, Aggregator and Router transformations to populate target tables efficiently.
- Created Mapplets and reused them in different Mappings.
- Created events and tasks in the workflows using Workflow Manager. Developed Informatica mappings and tuned them for better performance, with PL/SQL procedures/functions to build business rules for loading data.
- Created Schema objects like Indexes, Views, and Sequences.
- Designed and Developed Oracle PL/SQL and UNIX Shell Scripts, Data Import/Export.
- Developed mappings for policy, claims dimension tables.
- Worked with database connections, SQL joins, cardinalities, loops, aliases, views, aggregate conditions, parsing of objects and hierarchies.
- Developed shell scripts for running batch jobs and scheduling them.
Environment: Informatica Power Center 9.6, Erwin 4.0, SQL Server 2012, SSIS, SSRS, Oracle 11g, SQL, PL/SQL, TOAD, SQL*Loader, Sun Solaris 2.6, UNIX Shell Scripting.
Confidential, Toronto, ON
ETL Developer/Designer
Responsibilities:
- Coordinated with source system owners, monitored day-to-day ETL progress, and designed and maintained the data warehouse target schema (star schema).
- Worked as an ETL Developer on a performance tuning project for the existing EDW to increase the performance of the overall ETL batch.
- Designed Informatica mappings by translating the business requirements.
- Developed mappings for customers, Investments and Risk analysis.
- Developed reusable Transformations.
- Performed database development, including writing complex queries and business logic in PL/SQL.
- Made extensive use of Informatica client tools: Source Analyzer, Warehouse Designer, Mapping Designer, Transformation Developer and Workflow Manager.
- Used Lookup, Router, Filter, Joiner, Stored Procedure, Source Qualifier, Aggregator and Update Strategy transformations extensively.
- Assisted in adding physical and conceptual data models using Erwin 4.0.
- Analyzed business process workflows and assisted in the development of ETL procedures for moving data from source to target systems.
- Performed extensive bulk loading into the target using Oracle SQL*Loader.
- Used workflow manager for session management, database connection management and scheduling of jobs.
- Assisted the team in the development of design standards and codes for effective ETL procedure development and implementation.
- Extensive performance tuning by determining bottlenecks at various points like targets, sources, mappings and sessions.
- Involved in the design, development and testing of the PL/SQL stored procedures and packages for the ETL processes.
- Developed UNIX Shell scripts to automate repetitive database processes and maintained shell scripts for data conversion.
- Involved in the process design documentation of the DW dimensional upgrades. Installed and documented the Informatica Power Center setup on multiple environments.
Environment: Informatica Power Center 9.6, Oracle 11g, TOAD, Erwin 4.0, PL/SQL, UNIX (Sun Solaris)
Confidential
ETL Developer/SQL Developer
Responsibilities:
- Worked as an Informatica ETL Developer for the consulting firm.
- Involved in requirement analysis, documentation of business rules, mapping development, unit testing and peer review.
- Participated in system analysis and data modeling, which included creating tables, views, indexes, synonyms, triggers, functions, procedures, cursors and packages.
- Wrote queries and stored procedures in PL/SQL to fetch data from the OLTP system and executed them at regular intervals.
- Modified existing forms, reports and graphs as per the enhancement requests.
- Developed PL/SQL scripts to validate and load data into interface tables. The backend was in Oracle, and database operations were handled using stored procedures.
- Worked on all phases of multiple projects from initial concept through research and development, implementation, QA, to live production, by strict adherence to project timelines.
Environment: Informatica 8.6-9.0, Oracle 9i, 10g, SQL Server 12, SSIS, SSRS, PL/SQL, SQL, TOAD, UNIX Shell Scripting.
