Hadoop Developer Resume

Phoenix, AZ

SUMMARY

  • Overall 7+ years of IT experience in the analysis, design, development and implementation of applications running on various platforms.
  • 3+ years of experience with the Hadoop ecosystem and good knowledge of MapReduce, YARN, HDFS, Hive, Scala and Spark.
  • Good hands-on experience developing Big Data projects using Hadoop, Hive, Spark, Kafka and MapReduce open source tools/technologies.
  • Good hands-on experience in writing MapReduce jobs.
  • Effectively used big data loading tools like StreamSets, SAP Big Data Hub and SAP Vora.
  • Hands-on experience with HiveQL.
  • Good experience with NoSQL databases such as HBase, Cassandra and MongoDB.
  • Developed Python scripts for automating and monitoring jobs.
  • Hands-on experience using Spark with Scala for large-scale streaming data processing.
  • Good understanding of machine learning libraries.
  • Built Spark Streaming applications to receive real-time data from Kafka and store it in HDFS (see the sketch at the end of this summary).
  • Managed scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring and maintenance, using different Hadoop distributions: Cloudera CDH and Hortonworks HDP.
  • Excellent knowledge of Java/J2EE development for n-tier applications.
  • Working knowledge of Object-Oriented Programming (OOP) principles, design and development, and a good understanding of programming concepts like data abstraction, concurrency, synchronization, multi-threading and thread communication, networking and security.
  • Extensive experience in applying best practices wherever possible in the overall application development process, such as using the Model-View-Controller (MVC) approach for better control over application components.
  • Developed Hive queries and automated them to run on an hourly, daily and weekly basis.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Deep understanding of data warehouse approaches, industry standards and best practices. Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
  • Hands-on experience in development of Big Data projects using Hadoop, Hive, Oozie, Spark, Kafka, MapReduce, HDFS, Pig, Zookeeper, Flume, Sqoop and Impala open source tools/technologies.
  • Strong development experience in Apache Spark using Scala.
  • Experience using Spark with Scala for handling large-scale data processing in streaming pipelines.
  • Skilled in creating workflows using Oozie for cron jobs.
  • Experienced in writing custom UDFs and UDAFs for extending Hive and Pig core functionalities.
  • Ability to develop Pig UDFs to pre-process data for analysis.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS) such as Teradata.
  • Extensive knowledge of relational and dimensional data modeling, star and snowflake schemas, fact and dimension tables, and process mapping using top-down and bottom-up approaches.
  • Experience in using Informatica Client Tools - Designer, Source Analyzer, Target Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Workflow Manager and Workflow Monitor, Analyst, Developer tools.
  • Experience in creating High Level Design and Detailed Design in the Design phase.
  • Experience in integration of various data sources like Oracle, DB2, MS SQL Server and Flat Files.
  • Experience in identifying Bottlenecks in ETL Processes and Performance tuning of the production applications using Database Tuning, Partitioning, Index Usage, Aggregate Tables, Session partitioning, Load strategies, commit intervals and transformation tuning.
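
As an illustration of the Spark Streaming work mentioned above, the following is a minimal sketch of a Kafka-to-HDFS pipeline using Spark Structured Streaming in Scala. The broker address, topic name and HDFS paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("KafkaToHdfs")
          .getOrCreate()

        // Subscribe to a Kafka topic (broker and topic names are placeholders).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "transactions")
          .load()

        // Kafka delivers key/value as bytes; keep the value as a string payload.
        val events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

        // Append each micro-batch to HDFS as Parquet, checkpointing for fault tolerance.
        val query = events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/landing/transactions")
          .option("checkpointLocation", "hdfs:///checkpoints/transactions")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }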

TECHNICAL SKILLS

ETL/Data Modeling Tools: Informatica PowerCenter, PowerExchange 10.1/9.x/8.x/7.1, MSBI (Repository Manager, Designer, Server Manager, Workflow Monitor, Workflow Manager), Erwin, Fact and Dimension tables, Physical and Logical Data Modeling, Star join schema modeling

Databases: Oracle 12c/11g/10g/9i, MS SQL Server, MS Access, SQL, PL/SQL

Tools: Toad, SQL Developer, Visio

Big Data Ecosystem: HDFS, Oozie, Hive, Pig, Sqoop, Zookeeper, HBase, Spark, Scala

Languages: SQL, PL/SQL, T-SQL, UNIX Shell Scripting, Batch Scripting

Operating Systems: UNIX, Windows Server 2008/2003, Linux

Job Scheduling: Informatica Scheduler, Tidal Enterprise Scheduler, Control M, CA Autosys

PROFESSIONAL EXPERIENCE

Confidential, Phoenix, AZ

Hadoop Developer

Responsibilities:

  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Developed Sqoop scripts to import/export data from relational sources and handled incremental loads of customer and transaction data by date.
  • Migrated an existing Java application into microservices using Spring Boot and Spring Cloud.
  • Working knowledge of different IDEs like Eclipse and Spring Tool Suite.
  • Working knowledge of using Git and Ant/Maven for project dependency management, builds and deployment.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked as part of the AWS build team.
  • Created, configured and managed S3 buckets (storage).
  • Experience with AWS EC2, EMR and CloudWatch.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions to carry out business analysis.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Experience in developing Spark applications using Spark RDD, Spark SQL and DataFrame APIs (see the sketch after this list).
  • Used Apache Oozie to schedule workflows that run Spark jobs to transform data on a recurring schedule.
  • Migrated HiveQL queries on structured data into Spark SQL to improve performance.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce their run-time.
  • Worked on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Involved in administering, installing, upgrading and managing distributions of Hadoop, Hive and HBase.
  • Involved in performance troubleshooting and tuning of Hadoop clusters.
  • Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
  • Implemented business logic by writing Hive UDFs in Java.
  • Developed shell scripts and some Perl scripts based on user requirements.
  • Wrote XML scripts to build Oozie functionality.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Worked on creating end-to-end data pipeline orchestration using Oozie.
  • Built datasets, lenses and visualization charts/graphs in the Platfora environment.
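
As referenced above, here is a minimal sketch in Scala of the Spark RDD and Spark SQL work described in this role: loading raw records from HDFS, applying RDD transformations and an action, and re-expressing a Hive-style aggregation in Spark SQL. The file layout, column positions and the transactions Hive table are assumptions for illustration only.

    import org.apache.spark.sql.SparkSession

    object TransactionAnalysis {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("TransactionAnalysis")
          .enableHiveSupport() // lets Spark SQL query existing Hive tables
          .getOrCreate()

        // RDD route: read raw CSV lines from HDFS, apply transformations, then an action.
        val lines = spark.sparkContext.textFile("hdfs:///data/landing/transactions/*.csv")
        val totalsByCustomer = lines
          .map(_.split(","))               // transformation: parse each record
          .filter(_.length >= 3)           // transformation: drop malformed rows
          .map(f => (f(0), f(2).toDouble)) // transformation: (customerId, amount)
          .reduceByKey(_ + _)              // transformation: sum amounts per customer
        println(s"Customers processed: ${totalsByCustomer.count()}") // action

        // Spark SQL route: the same aggregation expressed against a Hive table,
        // the kind of query that a migrated HiveQL statement becomes.
        spark.sql(
          """SELECT customer_id, SUM(amount) AS total_amount
            |FROM transactions
            |GROUP BY customer_id""".stripMargin)
          .write.mode("overwrite")
          .parquet("hdfs:///data/marts/customer_totals")

        spark.stop()
      }
    }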

Environment: HDFS, AWS, Hive, Pig, SQL, Sqoop, Oozie, Shell scripting, Cron Jobs.

Confidential, San Antonio, Texas

Hadoop Developer

Responsibilities:

  • Responsible for installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Involved in designing the Cassandra data model and used CQL (Cassandra Query Language) to perform CRUD operations on the Cassandra file system.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Developed Sqoop scripts to import/export data from relational sources and handled incremental loads of customer and transaction data by date.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developed Spark scripts by using Scala Shell commands as per the requirement.
  • Developed and implemented core API services using Scala and Spark.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce their run-time.
  • Worked on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Installed, upgraded and managed Hadoop clusters.
  • Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
  • Advanced knowledge of performance troubleshooting and tuning of Hadoop clusters.
  • Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Processed source data into structured data and stored it in the NoSQL database Cassandra.
  • Created alter, insert and delete queries involving lists, sets and maps in Cassandra (see the sketch after this list).
  • Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
  • Responsible for continuous monitoring and management of the Elastic MapReduce cluster through the AWS console.
  • Responsible for the development, support and maintenance of ETL (Extract, Transform and Load) processes using Informatica PowerCenter.
  • Built the dimension and fact table load processes and the reporting process using Informatica.
  • Involved in data analysis for source and target systems, with a good understanding of data warehousing concepts: staging tables, dimensions, facts, and star and snowflake schemas.
  • Extracted data from various data sources such as Oracle, SQL Server and flat files, then transformed and loaded the data into targets using Informatica.
  • Created mappings and used transformations like Source Qualifier, Filter, Update Strategy, Lookup, Expression, Router, Joiner, Normalizer, Aggregator, Sequence Generator and Address Validator.
  • Developed mappings to load fact and dimension tables, SCD Type 1 and SCD Type 2 dimensions and incremental loads, and unit tested the mappings.
  • Evaluated the suitability of Hadoop and its ecosystem for the above project, implementing and validating various proof-of-concept (POC) applications to eventually adopt them as part of the Big Data Hadoop initiative.
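
As referenced above, this is a minimal sketch of CQL CRUD operations on collection columns, written in Scala against the DataStax Java driver (version 4.x assumed). The keyspace, table and column names are hypothetical, and a Cassandra node is assumed to be reachable at the driver's default of localhost:9042.

    import com.datastax.oss.driver.api.core.CqlSession

    object CassandraCrudSketch {
      def main(args: Array[String]): Unit = {
        // With no explicit contact points, the driver connects to localhost:9042.
        val session = CqlSession.builder().build()

        session.execute(
          """CREATE KEYSPACE IF NOT EXISTS retail
            |WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""".stripMargin)

        // A table with set and map columns, matching the collection types mentioned above.
        session.execute(
          """CREATE TABLE IF NOT EXISTS retail.customer_profile (
            |  customer_id text PRIMARY KEY,
            |  emails set<text>,
            |  preferences map<text, text>)""".stripMargin)

        // Insert a row, add an element to the set, then delete the row.
        session.execute(
          "INSERT INTO retail.customer_profile (customer_id, emails) " +
            "VALUES ('c-100', {'a@example.com'})")
        session.execute(
          "UPDATE retail.customer_profile SET emails = emails + {'b@example.com'} " +
            "WHERE customer_id = 'c-100'")
        session.execute(
          "DELETE FROM retail.customer_profile WHERE customer_id = 'c-100'")

        session.close()
      }
    }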

Environment: Map Reduce, HDFS, Hive, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, Zookeeper, J2EE, Eclipse, Informatica PowerCenter.

Confidential, Malvern, PA

ETL Developer

Responsibilities:

  • Provide debugging and troubleshooting support for SQL stored procedures, PowerShell scripts, triggers and Visualforce components.
  • Create validation queries to support quality control procedures that ensure data integrity in the Data Warehouse.
  • Design, build and support methods to manage numerous data integration and reporting tasks. Tools include Informatica and MSSQL.
  • Drive the organization toward metadata-driven ETL procedures using automated, self-auditing scripts and the referenced tools, replacing human-driven data duplication using Explorer, Access and Excel.
  • Developed various SQL queries using joins, sub-queries and analytic functions to pull data from various relational DBs, i.e. Oracle and SQL Server.
  • Created complex DataMart views for the corresponding products.
  • Created various complex PL/SQL stored procedures to manipulate/reconcile the data and generate the dashboard reports.
  • Performed unit testing and prepared the deployment plan for the various objects by analyzing their interdependencies.
  • Developed several UNIX shell scripts for file archival and compression.
  • Created various AutoSys jobs for the scheduling of the underlying ETL flows.
  • Coordinated with various team members across the globe, i.e. application teams, business analysts, users, DBAs and the infrastructure team, to resolve technical and functional issues in UAT and PROD.
  • Created various technical documents required for the knowledge transition of the application, including re-usable objects (Informatica and UNIX).
  • Handling all Hadoop environment builds, including design, capacity planning, cluster setup, performance tuning and ongoing monitoring.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Loaded data from large data files into Hive tables (see the sketch after this list).
  • Importing and exporting data into HDFS and Hive using Sqoop.
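
A minimal sketch (referenced above) of creating a partitioned Hive table and loading a large data file into it through Spark SQL with Hive support. The database, table, column and path names are placeholders, and the incoming file is assumed to be comma-delimited to match the table definition.

    import org.apache.spark.sql.SparkSession

    object LoadIntoHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("LoadIntoHive")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("CREATE DATABASE IF NOT EXISTS staging")

        // A delimited, date-partitioned Hive table whose layout matches the incoming files.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS staging.daily_events (
            |  event_id STRING,
            |  payload  STRING)
            |PARTITIONED BY (event_date STRING)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','""".stripMargin)

        // Move a large data file from HDFS into the table's partition directory.
        spark.sql(
          """LOAD DATA INPATH 'hdfs:///data/incoming/events/2019-01-01.csv'
            |INTO TABLE staging.daily_events
            |PARTITION (event_date = '2019-01-01')""".stripMargin)

        // The loaded data is immediately queryable through Hive/Spark SQL.
        spark.sql("SELECT COUNT(*) FROM staging.daily_events WHERE event_date = '2019-01-01'").show()

        spark.stop()
      }
    }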

Environment: Informatica Power Center 10/9.6.1, IDQ, Oracle 11g, SQL Server 2012, MS Access 2010, SQL*Loader, UNIX, WinSCP, Putty, Erwin 7.2, SQL, PL/SQL, Hadoop, Hive, Sqoop.

Confidential

DW Engineer

Responsibilities:

  • Implement procedures to maintain, monitor, backup and recovery operations for ETL environment.
  • Conduct ETL optimization, troubleshooting and debugging.
  • Extensively used Informatica Designer to create and manipulate source and target definitions, mappings, mapplets, transformations, re-usable transformations.
  • Wrote complex SQL overrides for Source Qualifiers and Lookups in mappings.
  • Planned, defined and designed data flow processes for data migration to the Data Warehouse using SSIS.
  • Designed and developed validation scripts based on business rules to check the Quality of data loaded into EBS.
  • Implemented best practices in ETL Design and development and ability to load data into highly normalized tables and star schemas.
  • Designed and developed mappings making use of transformations like Source Qualifier, Joiner, Update Strategy, Connected Lookup and unconnected Lookup, Rank, Expression, Router, Filter, Aggregator and Sequence Generator, Web services Consumer, XML Generator Transformations.
  • Wrote UNIX shell scripts for Informatica ETL tool to run the Sessions.
  • Stored reformatted data from relational, flat-file and XML sources using Informatica (ETL).
  • Developed mappings to load data into slowly changing dimensions.
  • Involved in Design Review, code review, test review, and gave valuable suggestions.
  • Worked with different Caches such as Index cache, Data cache, Lookup cache (Static, Dynamic and Persistence) and Join cache while developing the Mappings.
  • Worked on CDC (Change Data Capture) to implement SCD (Slowly Changing Dimensions) Type 1 and Type 2 (see the sketch after this list).
  • Responsible for the offshore code delivery and review process.
  • Used Informatica to extract data from DB2, XML and Flat files to load the data into the Tread
  • Prepared SQL Queries to validate the data in both source and target databases.
  • Extracted data from various data sources transformed and loaded into targets using Informatica.
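
As referenced above, the CDC-driven SCD Type 2 pattern built here in Informatica (Lookup, Expression and Update Strategy transformations) can be summarized with a small Spark/Scala sketch: rows arriving on a CDC feed expire the current version in the dimension and insert a new current version. The table and column names (customer_id, current_flag, start_date, end_date) are hypothetical, the CDC feed is assumed to carry the same business columns as the dimension, and surrogate-key generation is omitted for brevity.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ScdType2Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ScdType2Sketch")
          .enableHiveSupport()
          .getOrCreate()

        val dim     = spark.table("dw.customer_dim")      // existing dimension with history
        val changes = spark.table("stg.customer_changes") // CDC feed of new/changed rows

        // Tag dimension rows whose business key appears in the CDC feed.
        val changedKeys = changes.select("customer_id").distinct().withColumn("changed", lit(true))
        val flagged     = dim.join(changedKeys, Seq("customer_id"), "left")

        // Current versions of changed keys are expired (Type 2 update) ...
        val expired = flagged
          .filter(col("changed") === true && col("current_flag") === "Y")
          .drop("changed")
          .withColumn("current_flag", lit("N"))
          .withColumn("end_date", current_date())

        // ... unchanged keys and already-expired history are carried forward unmodified ...
        val carriedForward = flagged
          .filter(col("changed").isNull || col("current_flag") === "N")
          .drop("changed")

        // ... and every CDC row becomes the new current version (Type 2 insert).
        val newVersions = changes
          .withColumn("current_flag", lit("Y"))
          .withColumn("start_date", current_date())
          .withColumn("end_date", lit(null).cast("date"))

        carriedForward
          .unionByName(expired)
          .unionByName(newVersions)
          .write.mode("overwrite")
          .saveAsTable("dw.customer_dim_next")

        spark.stop()
      }
    }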

Environment: Informatica Power Center 8.6/9.1, Oracle 9i, DB2, Sybase, Rapid Sql Server, SSIS, Erwin, UNIX.
