
Sr. Data / Hadoop Engineer Resume


San Antonio, TX

SUMMARY

  • Around 9 years of work experience in Information Technology, with skills in the analysis, design, development, testing and deployment of various software applications, including 3+ years of experience implementing Big Data applications.
  • Strong skills in developing applications involving Big Data warehouse systems using Hadoop ecosystem tools such as HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Zookeeper, Oozie and Falcon.
  • Knowledge of Storm, Scala and Spark.
  • Experience in managing Big Data ingestion and data transformation using Pig and Hive.
  • Expertise in writing UDFs to extend the functionality of Pig and Hive (see the sketch after this list).
  • Excellent understanding of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
  • Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Solved performance issues in Hive and Pig scripts through a solid understanding of joins and grouping.
  • Good knowledge in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Experience in using Text, Sequence, RC and ORC file formats and different compression techniques such as ZLIB and Snappy.
  • Experience in Apache Falcon for creating workflows.
  • Knowledge of HBase for storing data that needs faster access.
  • Knowledge of developing Apache Spark programs using Scala for large-scale data processing and using the in-memory computing capabilities for faster data processing with Spark core and Spark SQL.
  • Experience in Agile and Scrum methodologies.
  • Experienced in SQL, PL/SQL, Procedures/Functions and Triggers.
  • Strong hands-on experience in Bash scripting on Linux.
  • Exposure to basic Python programming.
  • Familiar with data warehousing and ETL tools like Informatica PowerCenter.
  • Hands on experience in designing and coding web applications using Core Java and J2EE.
  • Experience in Web Services using XML, HTML and SOAP.
  • Involved in developing distributed Enterprise and Web applications using UML, Java/J2EE and Web technologies.
  • Exceptional analysis skills with an ability to transform Business requirements into functional and technical specifications.
  • Extensive experience in working with Business Analysts, Users, Architects, Infrastructure and support groups for system design, documentation and implementation.
  • Experience in Code reviews, fixing defects and enhancing application performance.
  • Strong management skills in the offshore and onshore delivery model.
  • A team player with strong communication, analytical, relationship management and problem solving skills.
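
A minimal sketch of the kind of Hive UDF referenced in the bullets above; the class name and logic are hypothetical and shown only for illustration, not code from these projects.

```java
// Hypothetical Hive UDF sketch: normalizes a string column.
// Uses the classic org.apache.hadoop.hive.ql.exec.UDF API, where Hive
// resolves evaluate() by reflection and calls it once per row.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        // Null-safe handling keeps bad rows from failing the whole query.
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Such a class would typically be packaged into a JAR, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then invoked like a built-in Hive function; Pig UDFs follow a similar pattern by extending EvalFunc.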

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Falcon, HBase, Storm, Spark

Java & J2EE Technologies: Core Java, Servlets, JSP, JSF, JMS, JDBC

Frameworks: Struts, Hibernate, Spring

Programming languages: C, C++, SQL, PL/SQL, Shell Scripting, Scala, Java, basic Python

Databases: MySQL, MS-SQL Server

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP

ETL Tools: Informatica, Teradata, SQL Server

Methodology: Agile and Waterfall

Tools: Microsoft Word, Excel, PowerPoint, Visio, Box, SharePoint, RallyDev, Jira, WinSCP, Git, SVN, Putty, Log4j

Business Domain: Healthcare, Banking, Telecom and Energy

PROFESSIONAL EXPERIENCE

Confidential, San Antonio, TX

Sr. Data / Hadoop Engineer

Responsibilities:

  • Actively participated with the development team to meet specific customer requirements and proposed effective Hadoop solutions.
  • Processed HDFS data, created external tables using Hive, and developed reusable scripts to ingest and repair tables across the project.
  • Developed Linux Scripts for data cleansing & preprocessing on huge volumes of data.
  • Wrote a custom framework in Java to generate Hive DDLs by reading Excel files.
  • Developed Pig Scripts and Hive Scripts to load data files.
  • Optimized Hive queries to handle different data sets.
  • Identified heavily used data sets and created ORC-formatted Hive tables for them.
  • Used Hive schemas to create relations in Pig via HCatalog.
  • Created indexes and buckets for Hive tables to improve query performance.
  • Imported data into the cluster from various sources such as MySQL using Sqoop, and exported data from Hive external tables to relational databases for report generation.
  • Wrote Pig UDFs as per requirements.
  • Developed Hive queries for mining delimited text or Excel files to verify data transfer success and to support internal and external customer needs.
  • Leveraged Falcon workflows.
  • Designed and developed Oozie workflows for sequence flow of job execution.
  • Integrated with the Hadoop ecosystem to retrieve data using Spark core components, making use of in-memory processing.
  • Wrote SQL queries to process the data using Spark SQL (see the sketch after this list).
  • Managed Agile Software Practice using Rally by creating Product Backlog, Iterations and Sprints in collaboration with the Product Team.
  • Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.
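
A minimal sketch of the Spark SQL usage described above, written against the SparkSession Java API for brevity; the table and column names are hypothetical.

```java
// Hypothetical Spark SQL sketch: query an existing Hive table through Spark
// with Hive support enabled, keeping the intermediate result in memory.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ClaimsSummary {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ClaimsSummary")
                .enableHiveSupport()      // read tables from the existing Hive metastore
                .getOrCreate();

        // Aggregate an external Hive table (names assumed for illustration).
        Dataset<Row> summary = spark.sql(
                "SELECT member_state, COUNT(*) AS claim_count "
              + "FROM claims_ext GROUP BY member_state");

        summary.cache();   // keep the result in memory for further processing
        summary.show();

        spark.stop();
    }
}
```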

Environment: HDP 2.2 & 2.3, HDFS, Hive, Pig, Spark, Scala, HCatalog, MapReduce, YARN, Falcon, Linux, Java/JDK 1.7, Git, RallyDev, MySQL.

Confidential, Houston, TX

Data Engineer

Responsibilities:

  • Worked on writing various Linux scripts to ingest data from the landing zone into the data lake.
  • Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
  • Worked with the Hadoop administration team to debug slow-running MR jobs and apply the necessary optimizations.
  • Loaded and transformed large sets of structured data.
  • Created different Hive tables to load data and wrote Hive queries, which run internally as MapReduce jobs. Designed a production process for extracting the final data and forwarding it to end users on an as-needed basis.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Managed and reviewed Hadoop log files.
  • Developed a utility framework to generate Excel reports by exporting data from HDFS (see the sketch after this list).
  • Exported data out of HDFS to the Teradata client environment.
  • Developed MapReduce jobs using Hive and Pig to extract and analyze data.
  • Extensively used Tivoli Workload Scheduler (TWS) to schedule periodic runs of various scripts for initial and delta loads across various data sets.
  • Actively involved in building a generic Hadoop framework to enable various teams to reuse best practices.
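
A minimal sketch of the kind of HDFS-to-Excel export utility mentioned above; the input path, delimiter and output file are assumptions for illustration.

```java
// Hypothetical export utility: read a tab-delimited file from HDFS with the
// Hadoop FileSystem API and write it out as a spreadsheet using Apache POI.
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class HdfsExcelExporter {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Workbook workbook = new XSSFWorkbook();
        Sheet sheet = workbook.createSheet("report");

        // Stream the HDFS file line by line and copy each field into a cell.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/data/reports/daily.tsv"))))) {
            String line;
            int rowNum = 0;
            while ((line = reader.readLine()) != null) {
                Row row = sheet.createRow(rowNum++);
                String[] fields = line.split("\t");
                for (int col = 0; col < fields.length; col++) {
                    row.createCell(col).setCellValue(fields[col]);
                }
            }
        }

        try (FileOutputStream out = new FileOutputStream("daily_report.xlsx")) {
            workbook.write(out);
        }
        workbook.close();
    }
}
```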

Environment: Apache Hadoop, MapReduce, Hive, Pig, Sqoop, TWS, Java (JDK 1.6), XML, Teradata client, Linux.

Confidential, Houston, TX

Big Data Hadoop Consultant

Responsibilities:

  • Gathered data requirements and identified sources for acquisition.
  • Loaded data from the edge node to HDFS using shell scripting.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Created external, partitioned Hive tables and corresponding HDFS locations to load data.
  • Developed scripts to ingest data into Hive tables that can be reused across the project.
  • Implemented optimized Hive joins to gather data from different data sources.
  • Developed Pig scripts for transforming data into standardized data structures for consumers.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed scripts for monitoring HDFS data usage and cleaning up stale data (see the sketch after this list).
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Documented coding best practices.
  • Worked on different file formats like Text files, Sequence Files, Record columnar files (RC).
  • Monitored the application process and suggested improvements.
  • Conducted and participated in project team meetings to gather status and discuss issues and action items.
  • Created deployment plan, runbook and implementation checklist.
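
A minimal sketch of the HDFS usage-monitoring and cleanup logic referenced above; the directory and 30-day retention window are assumed values.

```java
// Hypothetical cleanup job: report space used under a directory and delete
// files older than an assumed retention window, via the Hadoop FileSystem API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCleanup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path landingZone = new Path("/data/landing");   // assumed directory

        // Report the current space consumed under the landing zone.
        ContentSummary summary = fs.getContentSummary(landingZone);
        System.out.println("Bytes used under " + landingZone + ": " + summary.getLength());

        // Delete files older than 30 days (assumed retention policy).
        long cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000;
        for (FileStatus status : fs.listStatus(landingZone)) {
            if (status.isFile() && status.getModificationTime() < cutoff) {
                fs.delete(status.getPath(), false);
            }
        }
    }
}
```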

Environment: Hadoop, MapReduce, HBase, Hive, Pig, Sqoop, Java (JDK 1.6), UNIX.

Confidential, Houston, TX

Java Developer

Responsibilities:

  • Involved in detailed design documentation, implementation and coding.
  • Established schedule and resource requirements by planning, analyzing and documenting the development effort, including timelines, risks, test requirements and performance targets.
  • Developed the UI and client-side validations using HTML, CSS, JavaScript and JSP.
  • Developed User Interface module using Struts Framework, JSP and Servlets.
  • Designed and implemented the database interaction using JDBC (see the sketch after this list).
  • Used JDBC to establish connections between the database and the application.
  • Wrote SQL queries and stored procedures using PL/SQL.
  • Designed and developed enterprise web applications for the internal production support group using Java (J2EE), design patterns and the Struts framework.
  • Developed server-side utilities using J2EE technologies such as Servlets and JSP.
  • Created Functional Design Specification for the technical team.
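
A minimal sketch of the JDBC interaction pattern referenced above; the connection URL, credentials and table are placeholders, not details from the actual application.

```java
// Hypothetical DAO method: open a connection, run a parameterized query,
// and map the single result, with try-with-resources handling cleanup.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AccountDao {
    private static final String URL = "jdbc:mysql://localhost:3306/appdb"; // placeholder

    public String findAccountName(int accountId) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, "appuser", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT account_name FROM accounts WHERE account_id = ?")) {
            stmt.setInt(1, accountId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("account_name") : null;
            }
        }
    }
}
```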

Environment: Core Java, J2EE, Servlets, JSP, Struts, Hibernate, XML, SQL, PL/SQL, Eclipse IDE, JUnit, JavaScript, HTML and CSS.

Confidential

Java Developer

Responsibilities:

  • Involved in requirements gathering and creating functional specifications by interacting with business users.
  • Responsible for analysis, design, development and unit testing.
  • Performed unit testing before checking in code for the QA builds.
  • Created web pages using XML, HTML and JavaScript.
  • Used AJAX for client-to-server communication.
  • Involved in Production Support.
  • Used JavaScript for client-side validations.
  • Used the Log4j and commons-logging frameworks for logging the application flow (see the sketch after this list).
  • Used SVN for version control.
  • Developed JavaScript functions for front-end validations.
  • Resolved system defects and performed bug fixes during the testing phase.
  • Committed updated files to the repository using SVN.
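
A minimal sketch of the Log4j usage referenced above; the class and messages are hypothetical.

```java
// Hypothetical service class showing Log4j 1.x logging of application flow
// at info, debug and error levels.
import org.apache.log4j.Logger;

public class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class);

    public void processOrder(String orderId) {
        LOG.info("Processing order " + orderId);
        try {
            // ... business logic would go here ...
            LOG.debug("Order " + orderId + " processed successfully");
        } catch (RuntimeException e) {
            LOG.error("Failed to process order " + orderId, e);
            throw e;
        }
    }
}
```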

Environment: Apache Tomcat, Eclipse IDE, PL/SQL, HTML, AJAX, JavaScript, UML, Windows XP, SVN.

Confidential

Java Developer

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) of the application, from requirement analysis to testing.
  • Developed the UI using JavaScript, HTML and CSS for interactive cross-browser functionality and a complex user interface.
  • Created complex SQL queries, PL/SQL stored procedures and functions for the back end.
  • Prepared the Functional, Design and Test case specifications.
  • Involved in writing Stored Procedures to do some database side validations.
  • Performed unit testing, system testing and integration testing.
  • Developed unit test cases and used JUnit for unit testing of the application (see the sketch after this list).
  • Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions. Resolved high-priority defects as per the schedule.
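
A minimal sketch of the kind of JUnit test referenced above; the method under test is a stand-in defined inside the test class, since the real application code is not shown here.

```java
// Hypothetical JUnit 4 test verifying a simple-interest calculation.
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class InterestCalculatorTest {

    // Simple helper standing in for the real class under test (hypothetical logic).
    static double simpleInterest(double principal, double rate, int years) {
        return principal * rate * years;
    }

    @Test
    public void computesSimpleInterest() {
        // 1000 at 5% for 2 years should yield 100 of interest.
        assertEquals(100.0, simpleInterest(1000.0, 0.05, 2), 0.001);
    }
}
```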

Environment: Java/J2EE, HTML, JavaScript, CSS, PL/SQL, MySQL, JDBC 3.0, JUnit, Log4j, Eclipse.
