Sr. Data / Hadoop Engineer Resume
San Antonio, TX
SUMMARY
- Around 9 years of work experience in Information Technology, with skills in the analysis, design, development, testing, and deployment of various software applications, including 3+ years of experience implementing Big Data applications.
- Strong skills in developing applications involving Big Data warehouse systems using Hadoop ecosystem tools such as HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Zookeeper, Oozie and Falcon.
- Knowledge of Storm, Scala and Spark.
- Experience in managing Big Data ingestion and data transformation using Pig and Hive.
- Expertise in writing UDFs to extend the functionality of Pig and Hive.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
- Experience in importing and exporting data between HDFS and relational database systems, and vice versa, using Sqoop.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Solved performance issues in Hive and Pig scripts through an understanding of join and group-by behavior.
- Good knowledge in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Experience with Text, Sequence, RC and ORC file formats and compression codecs such as Zlib and Snappy.
- Experience in Apache Falcon for creating workflows.
- Knowledge of HBase for storing data that requires faster access.
- Knowledge of developing Apache Spark programs in Scala for large-scale data processing, using the in-memory computing capabilities of Spark Core and Spark SQL for faster processing.
- Experience in Agile and Scrum methodologies.
- Experienced in SQL, PL/SQL, Procedures/Functions and Triggers.
- Strong hands-on experience in Bash scripting on Linux.
- Exposure to basic Python programming.
- Familiar with data warehousing and ETL tools like Informatica PowerCenter.
- Hands on experience in designing and coding web applications using Core Java and J2EE.
- Experience in Web Services using XML, HTML and SOAP.
- Involved in developing distributed Enterprise and Web applications using UML, Java/J2EE, Web technologies.
- Exceptional analysis skills with an ability to transform Business requirements into functional and technical specifications.
- Extensive experience in working with Business Analysts, Users, Architects, Infrastructure and support groups for system design, documentation and implementation.
- Experience in Code reviews, fixing defects and enhancing application performance.
- Strong management skills in offshore/onshore delivery models.
- A team player with strong communication, analytical, relationship-management and problem-solving skills.
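The Hive partitioning and bucketing concepts noted above can be illustrated with a short HiveQL sketch; the table, columns and path are hypothetical, not drawn from any actual project:

```sql
-- Hypothetical external table: partitioned by load date, bucketed by customer id,
-- stored as ORC with Snappy compression. Because the table is EXTERNAL,
-- dropping it removes only the metadata and leaves the HDFS files intact.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_txn (
  txn_id      BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (load_dt STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/warehouse/sales_txn'
TBLPROPERTIES ('orc.compress'='SNAPPY');
```

Partition pruning on `load_dt` and bucketing on `customer_id` limit the data scanned per query, which is the performance benefit the managed-vs-external table design above is aimed at.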
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Falcon, HBase, Storm, Spark
Java & J2EE Technologies: Core Java, Servlets, JSP, JSF, JMS, JDBC
Frameworks: Struts, Hibernate, Spring
Programming languages: C, C++, SQL, PL/SQL, Shell Scripting, Scala, Java, basic Python
Databases: MySQL, MS-SQL Server
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP
ETL Tools: Informatica, Teradata, SQL Server
Methodology: Agile and Waterfall
Tools: Microsoft Word, Excel, PowerPoint, Visio, Box, SharePoint, RallyDev, Jira, WinSCP, Git, SVN, PuTTY, Log4j
Business Domain: Healthcare, Banking, Telecom and Energy
PROFESSIONAL EXPERIENCE
Confidential, San Antonio, TX
Sr. Data / Hadoop Engineer
Responsibilities:
- Actively participated with the development team to meet the specific customer requirements and proposed effective Hadoop solutions.
- Processed HDFS data, created external tables using Hive, and developed reusable scripts to ingest data and repair tables across the project.
- Developed Linux Scripts for data cleansing & preprocessing on huge volumes of data.
- Wrote a custom Java framework that generates Hive DDLs by reading Excel files.
- Developed Pig scripts and Hive scripts to load data files.
- Optimized Hive queries to handle different data sets.
- Identified high-usage tables and converted them to ORC-formatted Hive tables.
- Used Hive schemas to create relations in Pig via HCatalog.
- Created Indexes and buckets for Hive tables to improve the performance of hive queries.
- Imported data into clusters from various sources, such as MySQL, using Sqoop, and exported data from Hive external tables to relational databases for generating reports.
- Wrote Pig UDFs as per requirements.
- Developed Hive queries for mining delimited text and Excel files to verify data transfer success and to support internal and external customer needs.
- Leveraged Falcon workflows.
- Designed and developed Oozie workflows for sequence flow of job execution.
- Integrated with the Hadoop ecosystem to retrieve data using Spark core components, making use of in-memory processing.
- Wrote SQL queries to process data using Spark SQL.
- Managed Agile Software Practice using Rally by creating Product Backlog, Iterations and Sprints in collaboration with the Product Team.
- Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.
Environment: HDP 2.2 & 2.3, HDFS, Hive, Pig, Spark, Scala, HCatalog, MapReduce, YARN, Falcon, Linux, Java/JDK 1.7, Git, RallyDev, MySQL.
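The ingest-and-repair and ORC-conversion patterns in the responsibilities above can be sketched in HiveQL; the table names are illustrative only:

```sql
-- After new partition directories land in HDFS, sync the metastore
-- so Hive can see the added partitions.
MSCK REPAIR TABLE sales_txn;

-- Convert a high-usage text-format table to ORC via CTAS for faster reads.
CREATE TABLE sales_txn_orc
STORED AS ORC
AS SELECT * FROM sales_txn;
```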
Confidential, Houston, TX
Data Engineer
Responsibilities:
- Worked on writing various Linux scripts to ingest data from landing zone onto the data lake.
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set configuration parameters.
- Worked closely with the Hadoop administration team to debug slow-running MapReduce jobs and apply the necessary optimizations.
- Loaded and transformed large sets of structured data.
- Created Hive tables to load data and wrote Hive queries, which run internally as MapReduce jobs. Designed a production process for extracting the final data and forwarding it to end users on an as-needed basis.
- Extracted the data from Teradata into HDFS using Sqoop.
- Managed and reviewed Hadoop log files.
- Developed a Utility framework to generate Excel Reports by exporting data from HDFS.
- Exported data from HDFS to the Teradata client environment.
- Developed MapReduce jobs using Hive, and Pig to extract and analyze data.
- Extensively used Tivoli Workload Scheduler (TWS) to schedule periodical run of various scripts for initial and delta loads for various datasets.
- Actively involved in building a generic Hadoop framework that enables various teams to reuse best practices.
Environment: Apache Hadoop, MapReduce, Hive, Pig, Sqoop, TWS, Java (JDK 1.6), XML, Teradata client, Linux.
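Exporting final data out of HDFS for downstream consumers, as described above, can be sketched as a Hive directory export; the path, table and columns are hypothetical:

```sql
-- Write query results to an HDFS directory as tab-delimited text,
-- ready to be pulled into the Teradata client environment.
INSERT OVERWRITE DIRECTORY '/data/export/daily_summary'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT load_dt, COUNT(*) AS txn_count, SUM(amount) AS total_amount
FROM sales_txn
GROUP BY load_dt;
```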
Confidential, Houston, TX
Big Data Hadoop Consultant
Responsibilities:
- Gathering data requirements and identifying sources for acquisition.
- Involved in loading data from edge node to HDFS using shell scripting.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Created external, partitioned Hive tables and corresponding HDFS locations to load data.
- Developed reusable scripts to ingest data into Hive tables across the project.
- Implemented optimized Hive joins to gather data from different data sources.
- Developed Pig scripts to transform data into standardized structures for consumers.
- Wrote Hive queries for data analysis to meet business requirements.
- Developed scripts for monitoring HDFS data usage and cleanup.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Documented coding best practices.
- Worked on different file formats like Text files, Sequence Files, Record columnar files (RC).
- Experienced in monitoring the application process and suggesting improvements.
- Conducted and participated in project team meetings to gather status and discuss issues and action items.
- Created deployment plan, runbook and implementation checklist.
Environment: Hadoop, MapReduce, HBase, Hive, Pig, Sqoop, Java (JDK 1.6), UNIX.
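The optimized Hive joins mentioned above typically rely on map-side joins for small dimension tables; a minimal sketch, with hypothetical table names:

```sql
-- Let Hive broadcast small tables to mappers instead of shuffling
-- both sides to reducers (a reduce-side join).
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000; -- ~25 MB small-table threshold

SELECT t.txn_id, d.region
FROM sales_txn t
JOIN dim_customer d
  ON t.customer_id = d.customer_id;
```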
Confidential, Houston, TX
Java Developer
Responsibilities:
- Involved in detailed design documentation, implementation and coding.
- Established schedules and resource requirements by planning, analyzing and documenting the development effort, including timelines, risks, test requirements and performance targets.
- Developed the UI and client-side validations using HTML, CSS, JavaScript and JSP.
- Developed User Interface module using Struts Framework, JSP and Servlets.
- Designed and implemented the database interaction using JDBC.
- Used JDBC to establish connection between the database and the application.
- Wrote SQL queries and stored procedures using PL/SQL.
- Designed and developed enterprise web applications for the internal production support group using Java (J2EE), design patterns and the Struts framework.
- Developed server-side utilities using J2EE technologies (Servlets, JSP).
- Created Functional Design Specification for the technical team.
Environment: Core Java, J2EE, Servlets, JSP, Struts, Hibernate, XML, SQL, PL/SQL, Eclipse IDE, JUnit, JavaScript, HTML and CSS.
Confidential
Java Developer
Responsibilities:
- Involved in requirements gathering and creating functional specifications by interacting with business users.
- Responsible for analysis, design, development and unit testing.
- Performed unit testing before checking in code for QA builds.
- Created web pages using XML, HTML and JavaScript.
- Used AJAX for client-to-server communication.
- Involved in production support.
- Used Log4j and commons-logging frameworks for logging the application flow.
- Developed JavaScript functions for client-side and front-end validations.
- Resolved system defects and performed bug fixes during the testing phase.
- Used SVN for version control, committing updated files to the repository.
Environment: Apache Tomcat, Eclipse IDE, PL/SQL, HTML, AJAX, JavaScript, UML, Windows XP, SVN.
Confidential
Java Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) of the application, from requirement analysis to testing.
- Developed the UI using JavaScript, HTML and CSS for interactive cross-browser functionality and complex user interfaces.
- Created complex SQL queries and PL/SQL stored procedures and functions for the back end.
- Prepared the Functional, Design and Test case specifications.
- Involved in writing Stored Procedures to do some database side validations.
- Performed unit testing, system testing and integration testing.
- Developed Unit Test Cases. Used JUNIT for unit testing of the application.
- Provided technical support for production environments: resolved issues, analyzed defects, and designed and implemented fixes. Resolved high-priority defects per schedule.
Environment: Java/J2EE, HTML, JavaScript, CSS, PL/SQL, MySQL, JDBC 3.0, JUnit, Log4j, Eclipse.
