
Sr. Big Data Engineer Resume


Atlanta

OBJECTIVE:

Seeking challenging opportunities in enterprise-level Big Data engineering, leveraging extensive experience with Hadoop, MPP, and RDBMS platforms.

SUMMARY:

  • Over 8 years of Software Industry experience (services and product development)
  • Around 3 years of experience in providing Big Data solutions using Hadoop 2.x, HDFS, MR2, YARN, Pig, Hive, Impala, Tez, Sqoop, HBase, Spark, ZooKeeper, Oozie, UC4, Hue, CDH5 & HDP 2.x.
  • Experienced with Big Data, Hadoop, and NoSQL components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce 2 / YARN programming paradigm.
  • Experience in ETL, data warehousing, and business intelligence.
  • Implemented Big Data batch processes using Hadoop MapReduce 2, YARN, Tez, Pig, and Hive.
  • Experience in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
  • Experience in importing and exporting data using Sqoop between HDFS/Hive/HBase and relational database systems (a minimal sketch appears after this summary).
  • Automation of workflows and scheduling of jobs using Oozie and UC4.
  • Used Cloudera Manager and Ambari for installation, management, and monitoring of Hadoop clusters.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Extensively worked on the ETL mappings, analysis and documentation of OLAP reports requirements.
  • Solid understanding of OLAP concepts and challenges, especially with large data sets.
  • Experience in integration of various data sources like Oracle, DB2, Sybase, Teradata, Netezza, SQL Server and MS Access, and non-relational sources like flat files, into a staging area.
  • Implemented custom Hive and Pig UDFs to transform large volumes of data per business requirements and achieve comprehensive data analysis.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
  • Good knowledge in Object Oriented Analysis and Design and solid understanding of Unified Modeling Language (UML).
  • Experience in writing PIG scripts and Hive Queries for processing and analyzing large volumes of data.
  • Experience in optimization of MapReduce algorithms using Combiners and Partitioners to deliver best results.
  • Used Talend big data platform to move data from various source systems to HDFS.
  • Highly proficient in extracting, transforming, and loading data into target systems using Informatica.
  • Experienced in writing complex shell scripts and scheduling them with cron to run on a recurring basis.
  • Hands on experience in application development using JAVA, RDBMS and Linux shell scripting.
  • Good knowledge of Amazon AWS services like EMR, EC2 & S3, which provide fast and efficient processing of Big Data.
  • Strong knowledge of data warehousing, including Extract, Transform and Load Processes.
  • Strong knowledge of Massively Parallel Processing (MPP) databases, in which data is partitioned across multiple servers or nodes, each with its own memory and processors to process data locally.
  • Worked with MPP databases such as Teradata and Netezza alongside HDFS storage.
  • Ability to work independently as well as in a team and able to effectively communicate with customers, peers and management at all levels in and outside the organization.
  • Good working experience with Agile/Scrum methodologies, including technical discussions with clients and daily scrum calls covering project analysis, specifications, and development.
  • Expertise in sprint planning, story pointing, daily scrums, sprint retrospectives, and sprint reviews.
  • Expertise in development support activities including installation, configuration and successful deployment of changes across all environments.
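
As a minimal illustration of the Sqoop import/export and cron scheduling experience above, the sketch below wraps an incremental Sqoop import in a small Python script; the JDBC URL, credentials, table, and check column are hypothetical placeholders, not details from any engagement.

    #!/usr/bin/env python
    # Sketch: incremental Sqoop import driven from Python and scheduled with cron.
    # The JDBC URL, credentials, table, and column names are hypothetical placeholders.
    import subprocess
    import sys

    SQOOP_CMD = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",  # placeholder connection
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop_pwd",             # keep credentials off the command line
        "--table", "CUSTOMER_TXN",                             # hypothetical source table
        "--target-dir", "/data/raw/customer_txn",
        "--incremental", "append",                             # pull only rows newer than --last-value
        "--check-column", "TXN_ID",
        "--last-value", sys.argv[1] if len(sys.argv) > 1 else "0",
        "--num-mappers", "4",
    ]

    if __name__ == "__main__":
        # Fail loudly so the cron entry's exit status reflects the Sqoop result.
        subprocess.check_call(SQOOP_CMD)

A crontab entry such as 0 2 * * * /opt/etl/sqoop_incremental.py would run the load nightly; in practice the last value would be persisted by the job itself or handled with a Sqoop saved job.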

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, HDFS, MR2, YARN, Pig, Hive, Tez, ZooKeeper, Sqoop, Kafka, Spark, Storm, HCatalog.

NoSQL Databases: HBase, MongoDB, Cassandra

Schedulers: Oozie, UC4, Autosys, Control M, ESP

Hadoop Distributions: Cloudera CDH4/5, Hortonworks HDP 2.x.

Programming Languages: Core JAVA, J2EE, Scala, Python

Scripting Languages: JavaScript, Unix shell scripting, Python

Web Services: SOAP, Restful API

Databases: Oracle 9i/10g/11g/12c, MySQL, Teradata, Netezza

ETL and Reporting Tools: Informatica BDE, BDM, IDQ, SAP BODS, BO Webi, Tableau

Tools: PuTTY, WinSCP, TOAD, GIT, FileZilla, SVN

Cloud: AWS EMR, EC2, S3.

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta

Sr. Big Data Engineer

Responsibilities:

  • Handled data imports and exports from various operational sources, performed transformations using Sqoop, Hive, Pig and MapReduce.
  • Implemented partitioning and bucketing in Hive for better organization of the data (see the sketch after this list).
  • Deployed code into GIT version control and supported code validation after check-in.
  • Implemented mappings, sessions, and workflows to achieve extract, transform, and load using Informatica.
  • Designed ETL control table to perform the incremental and delta loads.
  • Migrated objects from lower environments to higher environments using deployment groups in Repository Manager.
  • Performed various optimizations, such as using the distributed cache for small datasets and partitioning, bucketing, and map-side joins in Hive.
  • Designed and implemented batch jobs using Sqoop, MR2, Pig, Hive, and Impala.
  • Resolved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
  • Generated extracts in HDFS synchronized with existing system reports.
  • Implemented ETL jobs and applied suitable data modelling techniques.
  • Used reporting tools such as Tableau over the Hive ODBC connector to generate daily data reports.
  • Implemented custom Hive UDFs to transform large volumes of data per business requirements and achieve comprehensive data analysis.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Data ingestion from Netezza to HDFS using automated Sqoop scripts.
  • Worked on data serialization formats, converting complex objects into sequences of bits using Avro, Parquet, JSON, and CSV formats.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed Sqoop scripts to import and export data from RDBMS and handled incremental loading on the customer and transaction information data dynamically.
  • Developed Map Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
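
A minimal sketch of the Hive partitioning, bucketing, and map-side join tuning referenced above, issued as HiveQL from Python through the Hive CLI; the table name, columns, and bucket count are hypothetical placeholders.

    #!/usr/bin/env python
    # Sketch: create a partitioned, bucketed Hive table and enable map-side joins.
    # Table and column names are hypothetical placeholders.
    import subprocess

    HQL = """
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.auto.convert.join=true;   -- convert joins against small tables to map-side joins
    SET hive.enforce.bucketing=true;

    -- One partition per day keeps scans narrow; bucketing by customer_id enables bucket map joins.
    CREATE TABLE IF NOT EXISTS sales_curated (
      order_id BIGINT,
      customer_id BIGINT,
      amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC;

    INSERT OVERWRITE TABLE sales_curated PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM sales_raw;
    """

    subprocess.check_call(["hive", "-e", HQL])  # beeline -e works equally well against HiveServer2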

Environment: Hadoop 2.x, HDFS, MR2, YARN, PIG, HIVE, Impala, Cloudera 5.x, Tez, Ambari, Control M, Informatica BDE, Java, GIT, Eclipse IDE, Netezza, Python, TOAD, AWS (EMR, EC2, S3)

Confidential, Deerfield IL

Hadoop Consultant

Responsibilities:

  • Handled data imports and exports from various operational sources, performed transformations using Sqoop, Hive, Pig and MapReduce.
  • Deployed code into GIT version control and supported code validation after check-in.
  • Designed and implemented batch jobs using Sqoop, MR2, Pig, Hive, and Tez.
  • Generated extracts in HDFS synchronized with existing system reports.
  • Implemented ETL jobs and applied suitable data modelling techniques.
  • Implemented custom Hive UDFs to transform large volumes of data per business requirements and achieve comprehensive data analysis.
  • Migrated the data from DataStage and Ab Initio to Hadoop.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from various RDBMSs and APIs into HDFS using Sqoop.
  • Wrote Python applications that interact with a MySQL database using Spark SQLContext and accessed Hive tables using HiveContext (see the sketch after this list).
  • Developed Spark scripts in Python (PySpark) as per requirements.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used the Talend Big Data platform to move data from various source systems to HDFS.
  • Performed data cleansing and processing using Pig and Hive.
  • Data ingestion from Teradata to HDFS using automated Sqoop scripts.
  • Designed and implemented Map Reduce for distributed and parallel programming.
  • Created and managed the Hive warehouse to store MapReduce results; wrote Pig scripts for data cleaning and the ETL process.
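
A minimal PySpark sketch of the Python, Spark SQLContext, and HiveContext usage described above (Spark 1.x API); the JDBC URL, credentials, and table names are hypothetical placeholders.

    # Sketch: read MySQL over JDBC with SQLContext and query Hive with HiveContext (Spark 1.x API).
    # The JDBC URL, credentials, and table names are hypothetical placeholders.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, HiveContext

    sc = SparkContext(appName="mysql-hive-bridge")
    sql_ctx = SQLContext(sc)
    hive_ctx = HiveContext(sc)

    # Read a MySQL table through the JDBC data source (MySQL connector JAR must be on the classpath).
    orders = sql_ctx.read.format("jdbc").options(
        url="jdbc:mysql://db-host:3306/sales",   # placeholder
        driver="com.mysql.jdbc.Driver",
        dbtable="orders",
        user="etl_user",
        password="********",                     # placeholder
    ).load()

    # Query an existing Hive table and join it with the JDBC data.
    customers = hive_ctx.sql("SELECT customer_id, segment FROM curated.customers")
    enriched = orders.join(customers, orders["customer_id"] == customers["customer_id"])

    # Persist the result as Parquet for downstream Hive/Impala access.
    enriched.write.mode("overwrite").parquet("/data/curated/orders_enriched")

    sc.stop()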

Environment: Hadoop 2.x, HDFS, MR2, YARN, PIG, HIVE, HDP 2.x, Tez, Ambari, ESP, Java, GIT, Eclipse, DataStage, Ab Initio, Talend 6.2, Teradata, TOAD. Cluster Configuration: 84 Node cluster, 2.2 PB of disk storage, 5 TB of RAM.

Confidential NA., Schaumburg IL

Hadoop Developer / Data Engineer

Responsibilities:

  • Handled data imports from various operational sources, performed transformations using Hive, Pig and MapReduce.
  • Created Pig Latin scripts to support multiple data flows involving various data transformations on input data.
  • Deployed code into SVN version control and supported code validation after check-in.
  • Provided pre- and post-production deployment support for the code developed for each release, and worked with the support and configuration teams to fix production deployment issues.
  • Provided support to fix open production issues on a regular basis.
  • Designed and implemented batch jobs using MR2, Pig, Hive, and Tez.
  • Experience working with job schedulers such as Autosys and the WFM tool.
  • Supported and created code for historical and incremental data ingestion, flattening, and curation logic covering complex business scenarios; developed code and performed validation to support data movement from HDFS curation files to DB2 and Netezza databases for end-to-end business logic validation.
  • Created code and supported unit testing for standardization of raw data from XML, Salesforce, and JSON files with Pig.
  • Wrote a Python script to automate the entire job flow and integrate the execution steps into one script (see the sketch after this list).
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented ad-hoc queries using HiveQL and created partitions to load data.
  • Verified Hive incremental updates, using a four-step strategy to load incremental data from RDBMS systems.
  • Performed various data warehousing operations like de-normalization and aggregation on Hive using DML statements.
  • Executed workflows in Autosys to validate automated tasks of pre-processing data with Pig, loading the data into HDFS, and scheduling Hadoop tasks.
  • Supported development of code for end-to-end creation of complex curation models and generated business reports from curated data, helping business users analyze the day-to-day business.
  • Performed unit testing support as per the standard development framework.
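
A minimal sketch of the Python job-flow automation mentioned above, chaining Pig, Hive, and HDFS steps and stopping on the first failure; the script paths, parameters, and flag file are hypothetical placeholders.

    #!/usr/bin/env python
    # Sketch: orchestrate a Pig -> Hive -> HDFS job flow from one Python script.
    # All paths, script names, and parameters are hypothetical placeholders.
    import logging
    import subprocess
    import sys

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    STEPS = [
        ["pig", "-param", "run_date=2016-01-01", "-f", "/opt/etl/standardize_raw.pig"],
        ["hive", "-hivevar", "run_date=2016-01-01", "-f", "/opt/etl/load_curated.hql"],
        ["hdfs", "dfs", "-touchz", "/data/flags/curated_2016-01-01._SUCCESS"],  # marker for downstream jobs
    ]

    def main():
        for step in STEPS:
            logging.info("Running: %s", " ".join(step))
            rc = subprocess.call(step)
            if rc != 0:
                logging.error("Step failed with exit code %s; aborting flow", rc)
                sys.exit(rc)  # non-zero exit lets the scheduler (e.g. Autosys) mark the job failed
        logging.info("Job flow completed successfully")

    if __name__ == "__main__":
        main()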

Environment: Hadoop 2.x, HDFS, MR2, YARN, PIG, HIVE, HDP2.x, Zookeeper, Tez, Ambari, Autosys, Java, GIT, Eclipse, Informatica, TOAD. Cluster Configuration: 93 Node cluster, 2.2 PB of disk storage, 6 TB of RAM.

Confidential, Denver, CO

Sr. Hadoop Consultant

Responsibilities:

  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioural data and financial histories into HDFS for analysis.
  • Involved in writing MapReduce jobs.
  • Used Sqoop and HDFS put/copyFromLocal to ingest data.
  • Used Pig for transformations, event joins, filtering bot traffic, and pre-aggregations before storing the data in HDFS.
  • Hands-on experience in developing Pig UDFs for functionality that is not available out of the box in Apache Pig.
  • Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting.
  • Good experience in developing Hive DDL to create, alter, and drop Hive tables.
  • Developed Hive UDFs for functionality that is not available out of the box in Apache Hive.
  • Used HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Computed various metrics that define user experience, revenue, etc. using Java MapReduce.
  • Involved in implementing and integrating NoSQL databases such as HBase.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS; designed and implemented various metrics that can statistically signify the success of the experiment.
  • Used Eclipse and Ant to build the application.
  • Used Sqoop for importing and exporting data into HDFS and Hive.
  • Responsible for processing ingested raw data using MapReduce, Apache Pig and Hive.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
  • Pivoted HDFS data from rows to columns and columns to rows (see the sketch after this list).
  • Emitted processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get/copyToLocal.
  • Developed shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move data files into and out of HDFS.
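
A minimal sketch of the rows-to-columns pivot noted above, expressed as conditional aggregation in HiveQL and submitted from Python; the table and metric names are hypothetical placeholders.

    #!/usr/bin/env python
    # Sketch: pivot key/value rows into columns with conditional aggregation in HiveQL.
    # Table and metric names are hypothetical placeholders.
    import subprocess

    PIVOT_HQL = """
    -- Input (user_metrics): user_id, metric_name, metric_value -- one row per metric
    -- Output: one row per user with a column per metric
    CREATE TABLE IF NOT EXISTS user_metrics_wide STORED AS ORC AS
    SELECT
      user_id,
      MAX(CASE WHEN metric_name = 'page_views' THEN metric_value END) AS page_views,
      MAX(CASE WHEN metric_name = 'revenue'    THEN metric_value END) AS revenue,
      MAX(CASE WHEN metric_name = 'sessions'   THEN metric_value END) AS sessions
    FROM user_metrics
    GROUP BY user_id;
    """

    subprocess.check_call(["hive", "-e", PIVOT_HQL])

The reverse pivot (columns back to rows) follows the same pattern with a LATERAL VIEW explode over a map of the metric columns.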

Environment: Hadoop, HDFS, Map Reduce, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Spark, Storm, AWS EMR, HDP, Java, JUnit, Python, JavaScript, Oracle, MySQL, NoSQL, Teradata, MongoDB, Cassandra, Tableau, Linux and Windows.

Confidential

Hadoop Consultant

Responsibilities:

  • Coordinated between multiple cluster teams for business queries and migration.
  • Evaluated the Hadoop platform and its ecosystem tools for the batch process.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Designed the system workflow from data extraction to reaching customers.
  • Data ingestion from Teradata to HDFS using automated Sqoop scripts.
  • Designed and implemented Map Reduce for distributed and parallel programming.
  • Designed and implemented a rules engine with regular expressions to identify the partner with high confidence (see the sketch after this list).
  • Created and managed the Hive warehouse to store MapReduce results; wrote Pig scripts for data cleaning and the ETL process.
  • Used UC4 and Oozie Scheduler to automate the workflows based on time and data availability.
  • Involved in moving the final results into a Cassandra database for transactional and activation needs.
  • Email marketing using SendGrid with the required partner activation document.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked on installing the cluster, commissioning and decommissioning of data nodes, NameNode recovery, capacity planning, and slot configuration.
  • Used Hortonworks Data Platform and the eBay crawler.
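
A minimal sketch of the regular-expression rules engine described above; the patterns, partner names, confidence weights, and threshold are hypothetical placeholders.

    #!/usr/bin/env python
    # Sketch: regex-based rules engine mapping a free-text merchant/partner string
    # to a partner identifier with a confidence score. All rules below are hypothetical placeholders.
    import re
    from collections import namedtuple

    Rule = namedtuple("Rule", ["partner", "pattern", "confidence"])

    RULES = [
        Rule("ACME_RETAIL", re.compile(r"\bacme(\s+store)?\b", re.IGNORECASE), 0.95),
        Rule("ACME_RETAIL", re.compile(r"\bacm\s*#\d+\b", re.IGNORECASE), 0.70),
        Rule("GLOBO_FUEL",  re.compile(r"\bglobo\s+(gas|fuel)\b", re.IGNORECASE), 0.90),
    ]

    def identify_partner(description, threshold=0.80):
        """Return (partner, confidence) for the best rule at or above the threshold, else (None, 0.0)."""
        best = (None, 0.0)
        for rule in RULES:
            if rule.pattern.search(description) and rule.confidence > best[1]:
                best = (rule.partner, rule.confidence)
        return best if best[1] >= threshold else (None, 0.0)

    if __name__ == "__main__":
        print(identify_partner("ACME Store #1042 Atlanta GA"))  # -> ('ACME_RETAIL', 0.95)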

Environment: Hadoop 2.x - HDP 2.1, HDFS, MR, PIG, Hive, YARN, Apache Sqoop, Oozie, UC4, Cassandra, eBay Crawler, Java, JavaMail, REST API, Teradata, Shell Script, GIT, Rally.

Confidential

Sr Engineer / SAP BODS Developer

Responsibilities:

  • Installed and configured BusinessObjects Data Services 4.1 with SAP BI and ECC, handling SAP DS admin activities and server configuration.
  • Wrote validation rules and generated scorecards, data insights, Metapedia content, and cleansing packages (Cleansing Package Builder) using Information Steward 4.x.
  • Configured different repositories (Local, Central, and Profiler) and job server.
  • Involved in meetings with functional users to determine flat file and Excel layouts, data types, and naming conventions for column and table names.
  • Prepared mapping documents capturing all the rules from the business.
  • Created SAP BW connection to interact with SAP BODS using RFC connection.
  • Created Info Object, Info Source, Info Area for SAP BW.
  • Created multiple datastore configurations in the Data Services local object library with different databases to create a unified datastore.
  • Created batch and incremental (change data capture) loads using Data Services and wrote initialization scripts that control workflows and data flows.
  • Created Data Integrator mappings to load the data warehouse; the mappings involved extensive use of simple and complex transformations such as Key Generator, Table Comparison, Case, Validation, Merge, and Lookup in data flows.
  • Created data flows for dimension and fact tables and loaded the data into targets in SAP BW.
  • Tuned Data Integrator mappings and transformations to improve job performance, for example by indexing the source tables and using the Data Transfer transformation.
  • Scheduled jobs to run daily.

Environment: SAP BODS 4.1, Oracle11g, SAP ECC, SAP BW 7.3, SAP BO 4.0, Windows.

Confidential

Associate IT Consultant / BODS Consultant

Responsibilities:

  • Mainly involved in ETL Development/Enhancement.
  • Analyzed the sources and targets, transformed and mapped the data, and loaded it into targets using BODS.
  • Developed various Jobs, Workflows, Data Flows and Transformations for migration of data using BODS Designer.
  • Designed various mappings for extracting data from various legacy Systems.
  • Used the Designer to create source definitions, design targets, create jobs, and develop transformations.
  • Created different transformations for loading the data into oracle database e.g. Merge, SQL, Query, Table Comparison, Map Operation and History Preserving transformations.
  • Created Scripts using BODS Designer.
  • Involved in unit testing and prepared test cases.
  • Resolved issues in transformations and mappings with the help of technical specifications.

Environment: SAP BODS 4.0, 3.2, Oracle 10g, Windows

Confidential

ETL Developer

Responsibilities:

  • Involved in the development and implementation of the Enterprise Data Warehousing (EDW) process and Data Warehouse.
  • Built BODS jobs, workflows, and data flows as per the mapping specification.
  • Worked extensively on different types of transformations, such as Query, Merge, Case, Validation, Map Operation, History Preserving, and Table Comparison transformations.
  • Extensively used ETL to load data from flat files as well as from relational databases.
  • Extracted data from different sources such as flat files and Oracle to load into the Sybase IQ database.
  • Ensured proper dependencies and proper running of loads (incremental and complete loads).
  • Maintained warehouse metadata, naming standards and warehouse standards for future application development.

Environment: SAP BODS 3.2, Oracle10g, Sybase IQ, Windows.

Confidential

Java Developer

Responsibilities:

  • Gathered requirements, designed and implemented the application using Java/J2EE technologies.
  • Involved in designing the front-end screens and prepared low-level design documents for my module.
  • Wrote complex SQL and PL/SQL queries and stored procedures.
  • Used JavaScript functionality for development.
  • Designed the projects using MVC architecture providing multiple views using the same model and thereby providing efficient modularity and scalability.
  • Performed business validations at the back-end using Java modules and at the client side using JavaScript.
  • Developed many web based features such as survey editors, search utilities and secure application forms using J2EE technologies.
  • Developed test suites for performing unit testing to support test-driven development.
  • Used Singleton, DAO, DTO, Session Façade, MVC design Patterns.
  • Involved in resolving Production Issues, Analysis, Troubleshooting and Problem Resolution.
  • Involved in development and deployment of application on Linux environments.
  • Developed Client applications to consume the Web services based on SOAP.
  • Involved in Designing and creating database tables.
  • Prepared project metrics for time, cost, schedule, and customer satisfaction (project health).

Environment: Java, JSP/Servlets, JDBC, Java Bean, Struts 1.x, AJAX, Oracle 9i, WSAD 5.1, WebSphere, TOAD, VSS.
