Hadoop Engineer Resume

San Francisco, CA

PROFESSIONAL SUMMARY:

  • 7+ years of experience in software development, including 3+ years with big data technologies across domains such as retail, insurance, and telecom.
  • Strong expertise in Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, Oozie, Spark, and Kafka.
  • Experience working with major Hadoop distributions such as Cloudera 5.x and Hortonworks HDP 2.2 and above.
  • Good experience with Cloudera Impala and Apache Spark for real-time analytical processing.
  • Experience optimizing MapReduce programs using combiners, partitioners, and custom counters.
  • Experience writing Pig and Hive scripts and extending their core functionality with custom UDFs.
  • Good knowledge of file formats such as SequenceFile, RCFile, ORC, and Parquet, and of compression codecs such as gzip, Snappy, and LZO.
  • Experience integrating various Hadoop ecosystem tools:
    • Integrated Hive and HBase for better performance.
    • Integrated Impala and HBase for real-time analytics.
    • Integrated Hive and Spark SQL for high performance.
    • Working knowledge of Spark and HBase integration for OLTP.
  • Good experience integrating Kafka with Spark Streaming for high-throughput, reliable processing.
  • Hands-on experience with Apache Flume for collecting and aggregating large volumes of log data and storing it on HDFS for further analysis.
  • Experience importing RDBMS data into HDFS using Sqoop and exporting data from HDFS to RDBMS to generate reports.
  • Experience writing both time-driven and data-driven workflows using Oozie.
  • Knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Good experience with shell scripting.
  • Diverse experience using Java tools on business, web, and client-server platforms, including Core Java, JSP, Servlets, Spring, Struts, Hibernate, Java Database Connectivity (JDBC), and application servers such as Apache Tomcat.
  • Experience designing and developing web services (SOAP and RESTful).
  • Hands-on experience optimizing existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.
  • Knowledge of SQL queries for backend database analysis.

TECHNICAL SKILLS:

Big Data/Hadoop: HDFS, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Oozie, Flume, ZooKeeper, HBase, Kafka

Apache Spark: Spark Core, Spark SQL, Spark Streaming, MemSQL

Hadoop Distributions: Cloudera and Hortonworks

Java/J2EE Technologies: Java, J2EE, Servlets, JDBC, XML, AJAX, REST

Frameworks: Struts 2/1, Hibernate, Spring

Methodologies: Agile, UML, Design Patterns (Core Java, J2EE)

Programming Languages: Java, Scala, C, Linux shell scripts, Ansible

NoSQL DB Technologies: HBase, Cassandra, MongoDB

Database: Oracle, MySQL, Hive

Web Servers: Tomcat

Web Technologies: HTML5, CSS, XML, JavaScript

Operating Systems: Ubuntu (Linux), Win 95/98/2000/XP, Mac OS, CentOS

Other Tools: Eclipse, IntelliJ, gEdit, Maven, SBT, Git, SVN, Jira, Confluence, MRUnit, JUnit, Hue, MobaXterm

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Hadoop Engineer

Responsibilities:

  • Performance-tuned the Hadoop cluster along with Hive and Spark queries. Monitored Hive and Spark queries and helped business analysts write and tune them.
  • Helped the delivery team migrate from one Hadoop cluster to another and from traditional databases to the Hadoop data lake, including incremental loading using Sqoop. Provided production support and fixes for ETL batch and real-time jobs.
  • Wrote shell and Scala scripts for day-to-day rolling processes, automated through Control-M. Developed Bash scripts to pull data from an FTP server and process it for loading into Hive tables. Used UDFs to implement business logic in Hadoop.
  • Partnered with systems administrators on Hadoop infrastructure growth, and worked with the systems engineering team to propose and deploy the new hardware and software environments required for Hadoop and to expand existing environments. Worked with data delivery teams to set up new users and manage security. Hands-on experience monitoring and managing the Hadoop cluster using Ambari. Managed and reviewed Hadoop log files, and handled file system management and monitoring.
  • Built a custom application that gave users real-time access to digital analytics data using Kafka, MemSQL, and Spark connectors. Hands-on experience developing Kafka streaming integration: developed Kafka Java producers and consumers for various backend platforms, and created and configured Kafka integration connector prototypes.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS, using Scala.
  • Developed Spark scripts in Scala, writing custom RDDs for data transformations and performing actions on RDDs.
  • Used Hive to form an abstraction on top of structured data residing in HDFS, and implemented partitions, dynamic partitions, and buckets in Hive tables.
  • Implemented Flume to ingest streaming log data and aggregate it into HDFS.
  • Worked on a configuration-based database pipeline project (code written in Scala). Implemented and worked on REST API integration.
  • Took on-call responsibilities, responding whenever something went wrong with Hadoop jobs or clusters.

Environment: Hortonworks, Hadoop, Hive, Sqoop, HBase, MapReduce, HDFS, Pig, Tez, Cassandra, Java, Oracle 11g/10g, FileZilla, Unix Shell Scripting, Spark, Scala, MemSQL, Kafka.

Confidential, Middletown, NJ

Hadoop Consultant

Responsibilities:

  • Migrated data using Sqoop with JDBC drivers and connectors for Oracle and IBM DB2.
  • Performed full and incremental imports and created Sqoop jobs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Loaded data from the local (Linux) file system to HDFS. Ran Hadoop jobs to process terabytes of XML data.
  • Validated data using Pig scripts to eliminate bad records.
  • Created a data model for structuring and storing the data efficiently. Implemented partitioning of tables in HBase.
  • Created Hive tables, loaded data, and wrote Hive queries, which execute internally as MapReduce jobs.
  • Worked with various Hadoop file formats, including ORC and Parquet. Involved in integrating Hive and HBase.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Hive jobs.
  • Wrote a Java API for transactions on HBase tables and helped build Oozie workflows.

Environment: Hortonworks, Hadoop, Hive, Sqoop, HBase, MapReduce, HDFS, Pig, Tez, Cassandra, Java, Oracle 11g/10g, FileZilla, Unix Shell Scripting, Spark, Scala.

Confidential, Boston, MA

Hadoop Consultant

Responsibilities:

  • Copied data generated by various telematics devices to HDFS using Flume for further processing.
  • Loaded data from the Linux file system to HDFS and created a separate directory for each four-hour window.
  • Used Pig extensively for data cleansing and other validations. Used Oozie and ZooKeeper operational services for scheduling workflows and coordinating the cluster.
  • Modeled Impala partitions extensively to separate data for faster processing, following best practices for tuning.
  • Wrote complex Impala queries using aggregate and windowing functions.
  • Loaded data from different data sources (Teradata, DB2) into HDFS using Sqoop, then into partitioned Impala tables.
  • Integrated Impala and HBase. Stored customer data in HBase for further transactions and historical trip data in Impala.
  • Exported the results to relational databases using Sqoop for visualization and to generate reports for the BI team using MSTR.
  • Reviewed and managed Hadoop log files.
  • Wrote REST web services in Java.

Environment: Hadoop (CDH5), Linux, HDFS, MapReduce, Sqoop, Impala, Pig, Oozie, HBase, MSTR, SVN, Teradata, IBM DB2, Eclipse.

Confidential, Chicago, IL

Hadoop Consultant

Responsibilities:

  • Loaded data from the Linux file system to HDFS using shell scripts.
  • Used Pig extensively for validations.
  • Wrote MapReduce code to convert unstructured data into structured data and to insert data from HDFS into HBase.
  • Optimized existing MapReduce programs using custom partitioners and combiners.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Used HDFS CLI commands. Added Log4j to log errors.
  • Used Eclipse for writing code and Git for version control.
  • Created Impala tables, loaded them with data, and wrote Impala queries for real-time analytical processing.
  • Worked with the Oozie workflow engine for job scheduling. Monitored the health of MapReduce programs running on the cluster.
  • Developed Impala queries to process the data and generate data cubes for visualization and reports.

Environment: Hadoop, Linux, CDH4, CDH5, MapReduce, HDFS, Impala, Pig, Shell Scripting, Java, NoSQL, Eclipse, Oracle, Git.

Confidential

Java Developer

Responsibilities:

  • Involved in various phases of software development, such as modeling, system analysis and design, code generation, and testing, using Agile methodology.
  • Participated in daily stand-up meetings.
  • Designed and developed the web interface on J2EE using the Struts framework (MVC controller) and HTML, per the use case specification.
  • Developed JavaScript for client data presentation and client-side data validation within forms.
  • Created connections through JDBC and used JDBC statements to call stored procedures.
  • Produced visual models of the system by generating UML use-case diagrams from the requirements.
  • Designed, developed, and deployed the application using Eclipse and the Tomcat application server.
  • Designed classes using object-oriented design (OOD) concepts such as encapsulation and inheritance.
  • Created custom tags to reuse common functionality.
  • Participated in reviews of the module against the user requirement documents.
  • Involved in testing the module as per user requirements.

Environment: Java, Eclipse 2.0, Struts 1.2, JDBC, JSP, Servlets, HTML, JavaScript, Hibernate.

Confidential

Java Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
  • Developed business domain concepts into use cases, sequence diagrams, class diagrams, component diagrams, and implementation diagrams.
  • Implemented various J2EE design patterns such as Model-View-Controller.
  • Used CSS and JavaScript to build rich internet pages.
  • Involved in developing code as per requirements.
  • Used JDBC to connect the web applications to databases.
  • Developed PL/SQL stored procedures for database operations.
  • Developed and deployed UI-layer logic using JSP, JavaScript, and HTML.
  • Maintained and developed applications and fixed bugs.

Environment: Java, J2EE, Struts, Eclipse, Oracle, HTML, JDBC, AJAX, JavaScript.
