
Sr. Hadoop Developer Resume

Milwaukee, WI

SUMMARY:

  • 7 years of overall IT experience, including 4 years of comprehensive experience as an Apache Hadoop Developer.
  • Expertise in writing Hadoop jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka, Scala, Oozie and Talend ETL.
  • Good knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and MapReduce concepts.
  • Experience in developing different kinds of MapReduce programs in Hadoop for Big Data analysis.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Experience in importing/exporting data between HDFS and Relational Database Systems using Sqoop.
  • Extensive knowledge and experience on real time data streaming techniques like Kafka, Storm and Spark Streaming.
  • Working experience in designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.
  • Good knowledge in supporting data analysts in running Pig and Hive queries.
  • Experience in writing shell scripts to dump the shared data from MySQL servers to HDFS.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Knowledge in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop Distributions.
  • Extensively worked on Hive and Sqoop for sourcing and transformations.
  • Experience in automating the Hadoop Installation, configuration and maintaining the cluster by using the tools like Puppet.
  • Hands-on experience with NoSQL databases like HBase, Cassandra and MongoDB.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Strong debugging and problem-solving skills with an excellent understanding of system development methodologies, techniques and tools.
  • Wrote Flume configuration files for importing streaming log data into HBase, and processed this data using the Spark Streaming API with Scala.
  • Worked in the complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) in different application domains, involving technologies ranging from object-oriented technology to Internet programming on Windows NT, Linux and UNIX/Solaris platforms, and RUP methodologies.
  • Familiar with RDBMS concepts; worked on Oracle 8i/9i, SQL Server 7.0 and DB2 8.x/7.x.
  • Involved in writing shell scripts, Ant scripts for Unix OS for application deployments to production region.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Experience with Hadoop distributions like Cloudera, Hortonworks, BigInsights, MapR and Windows Azure, as well as Impala.
  • Hands on experience in developing the applications with Java, J2EE, J2EE - Servlets, JSP, EJB, SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle10g and MS-SQL Server RDBMS.
  • Very good POC and development experience with Apache Flume, Kafka, Spark, Storm and Scala.
  • Good understanding of data ingestion tools such as Kafka, Sqoop and Flume.
  • Good working knowledge of Hue on the Hadoop ecosystem.
  • Good knowledge in evaluating big data analytics libraries and using Spark SQL for data exploration.
  • Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.
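The custom MapReduce programs mentioned above follow the classic key-value map/reduce pattern. A minimal, self-contained Python sketch of that pattern (word counting over sample text; all names here are illustrative, not drawn from an actual project):

```python
import itertools

def mapper(lines):
    # Map step: emit (word, 1) pairs, the key-value shape MapReduce works on.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Reduce step: Hadoop delivers pairs grouped by key; here we sort and
    # group explicitly, then sum the counts per word.
    for key, group in itertools.groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

counts = dict(reducer(mapper(["Big Data", "big data pipelines"])))
```

On a real cluster the mapper and reducer run as separate tasks and the framework performs the shuffle/sort between them; this sketch only shows the data flow.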

PROFESSIONAL EXPERIENCE

Sr. Hadoop Developer

Confidential - Milwaukee, WI

Responsibilities:

  • Worked on Hadoop, MapReduce and YARN/MRv2; developed multiple MapReduce jobs in Java for structured, semi-structured and unstructured data.
  • Involved in Configuring Hadoop cluster and load balancing across the nodes.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
  • Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
  • Experienced in developing custom input formats and data types to parse and process unstructured and semi structured input data and mapped them into key value pairs to implement business logic in Map-Reduce.
  • Experience in implementing custom serializers, interceptors, sources and sinks in Flume, as required, to ingest data from multiple sources.
  • Experience in setting up fan-in (V-shaped) flows in Flume to take data from many sources and ingest it into a single sink.
  • Analyzing the requirement to setup a cluster.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experienced in analyzing data with Hive and Pig.
  • Experienced in designing RESTful services using Java-based APIs like Jersey.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Integrating bulk data into Cassandra file system using MapReduce programs.
  • Gained good experience with the NoSQL databases HBase and Cassandra.
  • Involved in HBase setup and storing data into HBase, which will be used for further analysis.
  • Expertise in design and data modelling for the Cassandra NoSQL database.
  • Experienced in managing and reviewing Hadoop log files.
  • Experienced in defining job flows using Oozie workflows.
  • Involved in working with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues
  • Expertise in writing Scala code using higher-order functions for iterative algorithms in Spark, with performance in mind.
  • Experienced in analyzing and optimizing RDDs by controlling partitioning for the given data.
  • Good understanding of the DAG for the entire Spark application flow via the Spark application Web UI.
  • Experienced in writing real-time processing jobs using Spark Streaming with Kafka.
  • Developed custom mappers in Python, as well as Hive UDFs and UDAFs, based on the given requirements.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Experienced in querying data using Spark SQL on top of the Spark engine.
  • Experience in managing and monitoring Hadoop cluster using Cloudera Manager
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
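Custom Python mappers like those mentioned above are typically plugged into Hive via the TRANSFORM clause, which streams rows to the script as tab-separated text. A hedged sketch of that shape (the user_id/page/dwell layout is an invented example, not the project's real schema):

```python
import sys

def parse_record(line):
    # Hive streams each row to the script as tab-separated text:
    # user_id \t page \t dwell_seconds
    user_id, page, dwell = line.rstrip("\n").split("\t")
    return user_id, page, int(dwell)

def transform(stream, out):
    # Drop malformed rows and write cleaned rows back, tab-separated,
    # so Hive can read them as the query's output columns.
    for line in stream:
        try:
            user_id, page, dwell = parse_record(line)
        except ValueError:
            continue  # skip records that do not match the expected layout
        out.write(f"{user_id}\t{page}\t{dwell}\n")

if __name__ == "__main__":
    transform(sys.stdin, sys.stdout)
```

Hive would invoke this with something like `SELECT TRANSFORM(...) USING 'python mapper.py'`; the exact query depends on the table in question.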

Environment: CDH, Java (JDK 1.7), Hadoop, MapReduce, HDFS, Hive, Sqoop, Flume, HBase, Cassandra, Pig, Oozie, Kerberos, Scala, Spark, Spark SQL, Spark Streaming, Kafka, Linux, AWS, Shell Scripting, MySQL, Oracle 11g, PL/SQL, SQL*Plus

Hadoop Developer

Confidential - Kansas City, MO

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Developed MapReduce jobs for the users; maintained, updated and scheduled periodic jobs, ranging from updates to periodic MapReduce jobs to creating ad-hoc jobs for the business users.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows.
  • Experienced in managing and reviewing Hadoop log files.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Responsible to manage data coming from different sources.
  • Gained good experience with NoSQL databases.
  • Developed a custom file system plug-in for Hadoop so it can access files on the Data Platform; this plug-in allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
  • Involved in review of functional and non-functional requirements.
  • Facilitated knowledge transfer sessions.
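Hadoop Streaming jobs over XML-format data, as mentioned above, commonly parse one record per input line. A minimal Python sketch of the extraction step (the `<claim>` tag, `id` attribute and `<amount>` element are invented for illustration, loosely echoing the health-insurance claim domain):

```python
import xml.etree.ElementTree as ET

def extract_claim(xml_line):
    # Parse a single-record XML fragment and pull out the fields of
    # interest; Hadoop Streaming would feed one such record per stdin line.
    root = ET.fromstring(xml_line)
    return root.get("id"), float(root.findtext("amount"))

record = '<claim id="C-42"><amount>125.50</amount></claim>'
claim_id, amount = extract_claim(record)
```

In the real job the extracted fields would be written back to stdout as tab-separated key-value pairs for the reducer.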

Environment: Java (JDK), Eclipse, Subversion, Hadoop (Hortonworks and Cloudera distributions), MapReduce, HDFS, Hive, HBase, DataStax, IBM DataStage, Oracle, PL/SQL, SQL*Plus, Linux, UNIX Shell Scripting.

Hadoop Developer

Confidential - Utica, NY

Responsibilities:

  • Worked on Spark and Cassandra for user behavior analysis and low-latency execution.
  • Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Used UDFs to implement business logic in Hadoop.
  • Extracted files from Oracle and DB2 through Sqoop, placed them in HDFS and processed them.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Responsible to manage data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Developed mapping parameters and variables to support SQL override.
  • Used existing ETL standards to develop these mappings.
  • Extracted the data from the flat files and other RDBMS databases into staging area and populated onto Data warehouse.
  • Worked on JVM performance tuning to improve MapReduce job performance.
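The flat-file-to-staging extraction described above can be sketched as a small normalization step. A hedged pure-Python illustration (column names and cleanup rules are placeholders, not the actual ETL mappings):

```python
import csv
import io

def stage_rows(flat_file_text):
    # Parse a delimited extract and normalize each row before it is
    # loaded into the staging area: trim keys, round monetary amounts.
    reader = csv.DictReader(io.StringIO(flat_file_text))
    return [
        {"cust_id": row["cust_id"].strip(),
         "amount": round(float(row["amount"]), 2)}
        for row in reader
    ]

extract = "cust_id,amount\n 1001 ,19.999\n1002,5.0\n"
rows = stage_rows(extract)
```

In practice this step would be expressed as ETL tool mappings rather than hand-written code; the sketch only shows the kind of row-level cleanup involved.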

Environment: Hadoop, MapReduce, HDFS, Hive, Oracle, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat.
