
Hadoop Engineer Resume

Tampa, FL

SUMMARY

  • 8+ years of work experience in Information Technology with skills in analysis, design, development, testing, and deployment of various software applications, including 4+ years of experience implementing Big Data applications.
  • Strong skills in developing Big Data warehouse applications using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Hive, Pig, Sqoop, ZooKeeper, Oozie, HBase, Kafka, and Spark SQL.
  • Experience in managing Big Data ingestion and data transformation using Pig and Hive.
  • Expertise in writing UDFs to extend the functionality of Pig and Hive.
  • Excellent understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
  • Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, and XML files.
  • Experience using Text, SequenceFile, RC, and ORC file formats and compression techniques such as ZLIB and Snappy.
  • Knowledge of Impala, a parallel-processing query engine on top of Hadoop, for low-latency queries.
  • Experience in Apache Falcon for creating workflows.
  • Experience working with HBase for storing data that needs faster access.
  • Developed Apache Spark programs in Scala for large-scale data processing, using the in-memory computing capabilities of Spark Core and Spark SQL for faster processing.
  • Experience in Agile and Scrum methodologies.
  • Good Knowledge of NoSQL databases such as Cassandra and MongoDB.
  • Good knowledge of cloud technologies such as AWS and Microsoft Azure.
  • Experienced in SQL, PL/SQL, Procedures/Functions and Triggers.
  • Strong hands on experience in Bash scripting in Linux.
  • Exposure to basic Python programming.
  • Familiar with data warehousing and ETL tools like Informatica PowerCenter.
  • Knowledge on preparing reports using data visualization tools like Tableau.
  • Experience in Base SAS (MACROS, Proc SQL, ODS), SAS/STAT and SAS/GRAPH.
  • Thorough knowledge in SAS Programming, Merging SAS Data Sets, Macro Facility, Preparing data, SAS Procedures, Producing Reports, SAS Formats, SAS Functions, Storing and Managing data in SAS Files.
  • Experience in using SAS to read, import, and export to other data file formats, like XML files, Excel files, and flat files.
  • Hands on experience in designing and coding web applications using Core Java and J2EE.
  • Experience in Web Services using XML, HTML, CSS and SOAP.
  • Involved in developing distributed Enterprise and Web applications using UML, Java/J2EE, Web technologies.
  • Exceptional analysis skills with an ability to transform Business requirements into functional and technical specifications.
  • Extensive experience in working with Business Analysts, Users, Architects, Infrastructure and support groups for system design, documentation and implementation.
  • Experience in Code reviews, fixing defects and enhancing application performance.
  • Strong management skills in offshore/onshore delivery models.
  • A team player with strong communication, analytical, relationship management and problem solving skills.
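
The following is a minimal, hypothetical sketch of the Hive table design described above (a partitioned external table stored as Snappy-compressed ORC), expressed in Scala through Spark SQL with Hive support; the database, table, column, and path names are illustrative only.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("HiveTableDesignSketch")
      .enableHiveSupport()           // lets spark.sql() run Hive DDL against the metastore
      .getOrCreate()

    // External, partitioned table stored as Snappy-compressed ORC (all names hypothetical).
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
      )
      PARTITIONED BY (load_date STRING)
      STORED AS ORC
      LOCATION 'hdfs:///data/warehouse/sales/transactions'
      TBLPROPERTIES ('orc.compress' = 'SNAPPY')
    """)

    // Register a partition that an ingestion job has already written to HDFS.
    spark.sql("""
      ALTER TABLE sales.transactions
      ADD IF NOT EXISTS PARTITION (load_date = '2016-01-01')
      LOCATION 'hdfs:///data/warehouse/sales/transactions/load_date=2016-01-01'
    """)

    // Partition pruning keeps queries on a single load date cheap.
    spark.sql("""
      SELECT customer_id, SUM(amount) AS total
      FROM sales.transactions
      WHERE load_date = '2016-01-01'
      GROUP BY customer_id
    """).show()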

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Falcon, HBase, Kafka, Storm, Spark, Spark SQL, Impala

Java & J2EE Technologies: Core Java, JSP, JSF, JMS, JDBC

Frameworks: Struts, Hibernate, Spring

Programming languages: C, C++, SQL, PL/SQL, Shell Scripting, Scala, Java, basic Python

Databases: MySQL, MS-SQL Server

SAS Tools: SAS 8/9, Base SAS (MACROS, Proc SQL, ODS), SAS/ACCESS, SAS/GRAPH, SAS/STAT, SAS/CONNECT, SAS EG.

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP

ETL Tools: Informatica, Teradata

Methodology: Agile and Waterfall

Tools: Microsoft Word, Excel, PowerPoint, Visio, Box, SharePoint, RallyDev, Jira, WinSCP, Git, SVN, PuTTY, Log4j, GIS, Tableau, Bedrock, DBeaver

Business Domain: Healthcare, Banking, Telecom and Energy

PROFESSIONAL EXPERIENCE

Confidential, Tampa FL

Hadoop Engineer

Responsibilities:

  • Actively participated with the development team to meet the specific customer requirements and proposed effective Hadoop solutions.
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
  • Developed Linux Scripts for data cleansing & preprocessing on huge volumes of data.
  • Wrote a custom Java framework to generate Hive DDLs by reading Excel files.
  • Developed Pig Scripts and Hive Scripts to load data files.
  • Optimized Hive queries to handle different data sets.
  • Identified high-usage tables and created ORC-formatted Hive tables for them.
  • Imported data into the cluster from various sources such as MySQL using Sqoop, and exported data from Hive external tables to relational databases for report generation.
  • Wrote Pig UDFs as per requirements.
  • Developed Hive queries for data mining of delimited text or Excel files to verify data-transfer success and to support internal and external customer needs.
  • Designed and developed Oozie workflows for sequence flow of job execution.
  • Experienced in working with the Spark ecosystem, using Spark SQL with Scala to query different data formats such as text and CSV files.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Wrote SQL queries to process data using Spark SQL (see the sketch after this list).
  • Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.
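
A minimal sketch of the Hive-to-Spark conversion pattern referenced above, in Scala; it uses the DataFrame/Spark SQL API rather than raw RDDs, and the file path, columns, and query are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    val spark = SparkSession.builder().appName("HiveToSparkSketch").getOrCreate()

    // Load a delimited data set into a DataFrame (path and columns are hypothetical).
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/landing/orders.csv")

    // Option 1: run the original Hive/SQL query text through Spark SQL.
    orders.createOrReplaceTempView("orders")
    val bySql = spark.sql("SELECT region, SUM(amount) AS total FROM orders GROUP BY region")

    // Option 2: the same logic expressed as Spark transformations.
    val byApi = orders.groupBy("region").agg(sum("amount").alias("total"))

    bySql.show()
    byApi.show()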

Environment: HDP 2.2 & 2.3, HDFS, Hive, Pig, Spark, Scala, Kafka, HCatalog, MapReduce, YARN, Falcon, Linux, Java/JDK 1.7, Git, RallyDev, MySQL.

Confidential, Mayfield OH

Data Engineer

Responsibilities:

  • Worked on writing various Linux scripts to ingest data from landing zone onto the data lake.
  • Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
  • Actively worked with the Hadoop administration team to debug slow-running MapReduce jobs and apply the necessary optimizations.
  • Loaded and transformed large sets of structured data.
  • Used Hive schemas to create relations in Pig via HCatalog.
  • Involved in creating Hive tables to load data and writing Hive queries, which internally run as MapReduce jobs. Involved in designing a production process for extracting the final data and forwarding it to end users on an as-needed basis.
  • Created indexes and buckets for Hive tables to improve the performance of Hive queries.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Experience in managing and reviewing Hadoop log files.
  • Migrated SAS Data Loaders to Hadoop.
  • Used SAS to read, write, and integrate data to and from Hadoop.
  • Implemented a POC to ingest data with Kafka and Spark into HDFS and HBase (see the sketch after this list).
  • Developed a Utility framework to generate Excel Reports by exporting data from HDFS.
  • Exported data out of HDFS to the Teradata client environment.
  • Developed MapReduce jobs using Hive and Pig to extract and analyze data.
  • Leveraged Falcon jobs.
  • Was actively involved in building the Hadoop generic framework to enable various teams to reuse some of the best practices.
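
A minimal sketch of the Kafka-to-HDFS ingestion POC mentioned above, written in Scala against Spark Structured Streaming's Kafka source (a newer API than the original POC may have used); the broker, topic, and paths are hypothetical, and the HBase sink is omitted.

    import org.apache.spark.sql.SparkSession

    // Requires the spark-sql-kafka-0-10 connector on the classpath.
    val spark = SparkSession.builder().appName("KafkaToHdfsPoc").getOrCreate()

    // Subscribe to a Kafka topic (broker and topic names are hypothetical).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Continuously land the messages on HDFS as Parquet files.
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/events")
      .option("checkpointLocation", "hdfs:///tmp/checkpoints/events")
      .start()
      .awaitTermination()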

Environment: Apache Hadoop, MapReduce, Hive, Pig, Sqoop, Java (JDK 1.6), Kafka, Spark, Oozie, Falcon, XML, SAS V9.3, Teradata client, Linux.

Confidential, Columbus OH

Hadoop Developer

Responsibilities:

  • Involved in architecture design, development and implementation of Hadoop deployment.
  • Performed root-cause analysis of system failures and recommended courses of action.
  • Used MapReduce to load, aggregate, store and analyze data from different data sources.
  • Involved in end-to-end data processing, including ingestion, transformation, and quality checks.
  • Created Hive ORC and External tables.
  • Involved in loading data from LINUX file system to HDFS.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Processed HDFS data and created external tables using Hive in order to analyze the data.
  • Extracted data from Cassandra through Sqoop, loaded it into HDFS, and processed it.
  • Worked on tuning the performance of Hive and Pig queries.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables (see the sketch after this list).
  • Extensively used Tivoli Workload Scheduler (TWS) to schedule periodical run of various scripts for initial and delta loads for various datasets.
  • Documented system processes and procedures for future reference.
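
A minimal sketch of the HBase insert/fetch pattern noted above, using the HBase client API from Scala (the CDH4-era API differed slightly); the table, column family, and row key are hypothetical.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    val conf = HBaseConfiguration.create()        // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("customer_profile"))  // hypothetical table

    // Insert one row keyed by customer id.
    val put = new Put(Bytes.toBytes("cust-1001"))
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("state"), Bytes.toBytes("OH"))
    table.put(put)

    // Fetch the same row and read a single cell back.
    val result = table.get(new Get(Bytes.toBytes("cust-1001")))
    val state = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("state")))
    println(s"state = $state")

    table.close()
    connection.close()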

Environment: CDH4, Linux, Hive, Pig, Sqoop, Cloudera Manager, TWS, Java (JDK 1.6), Cassandra, Eclipse, SVN, Maven.

Confidential

Hadoop Developer / Data Analyst

Responsibilities:

  • Gathered data requirements and identified sources for acquisition.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Created external, partitioned hive tables and corresponding HDFS locations to load data.
  • Developed scripts to ingest data into hive tables that can be reused across the project.
  • Implemented optimized joins to gather data from different data sources using hive joins.
  • Developed pig scripts for transforming data into standardized data structures for consumers.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed scripts for monitoring HDFS data usage and cleanup (see the sketch after this list).
  • Worked on different file formats like Text files, Sequence Files, Record columnar files (RC).
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Developed reports as per business requirements and created various reports like summary reports, tabular reports, excel reports.
  • Designed and created SAS datasets from various sources like Excel datasheets, flat files and created reports and files from existing SAS datasets.
  • Used HBase to maintain file metadata and keep it in memory for fast retrieval across the different file formats, and maintained Hive-over-HBase tables for real-time access to the data.
  • Used Apache Kafka to maintain central logs for all message types, such as error, info, and debug.
  • Used procedures such as PROC FREQ, PROC FORMAT, PROC MEANS, PROC SORT, PROC PRINT, PROC TABULATE, and PROC REPORT.
  • Worked on various SAS products such as SAS/BASE, SAS/SQL, and SAS macros to develop solutions.
  • Developed programs in SAS to generate reports, creating HTML listings, tables and reports using SAS ODS for ad-hoc and weekly report generation.
  • Documented coding best practices.
  • Experienced in monitoring the application process and suggesting improvements.
  • Conducted and participated in project team meetings to gather status and discuss issues and action items.
  • Created deployment plan, runbook and implementation checklist.
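
The HDFS usage monitoring and cleanup mentioned above can be sketched, assuming hypothetical paths and a hypothetical 30-day retention period, with the Hadoop FileSystem API from Scala (the original scripts may well have been shell-based).

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Report space used under a staging directory and remove sub-directories older than a cutoff.
    val fs = FileSystem.get(new Configuration())
    val root = new Path("/data/staging")            // hypothetical path

    val usedBytes = fs.getContentSummary(root).getLength
    println(f"Space used under $root: ${usedBytes / math.pow(1024, 3)}%.2f GB")

    val cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000   // 30-day retention
    fs.listStatus(root)
      .filter(s => s.isDirectory && s.getModificationTime < cutoff)
      .foreach { s =>
        println(s"Removing stale directory ${s.getPath}")
        fs.delete(s.getPath, true)                  // recursive delete
      }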

Environment: Hadoop, MapReduce, HBase, Hive, Pig, Sqoop, Java (JDK 1.6), UNIX, Base SAS, SAS Macros, SAS Enterprise Guide, SAS/GRAPH, SAS V9.2, Excel, Windows XP, shell scripting.

Confidential

Java Developer

Responsibilities:

  • Involved in implementing the design through the key phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance support.
  • Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
  • Used Spring Framework for developing business objects.
  • Used Eclipse for the Development, Testing and Debugging of the application.
  • Used a DOM parser to parse XML files.
  • Used the Log4j framework for logging debug, info, and error data.
  • Used the Oracle 10g database for data persistence.
  • SQL Developer was used as a database client.
  • Used WinSCP to transfer file from local system to other system.
  • Performed Test Driven Development (TDD) using JUnit.
  • Used Ant script for build automation.

Environment: Windows XP, UNIX, Java, Design Patterns, Apache Ant, J2EE (Servlets, JSP), HTML, JSON, JavaScript, CSS, Spring, Eclipse, Oracle 10g, SQL Developer, WinSCP, Log4j, and JUnit.
