Sr. Hadoop Developer Resume

Newton, MA

SUMMARY:

  • A proficient innovator with skills in data engineering, data analysis and application development.
  • Having 10 years of overall technology experience, including 7 years of continuous work with the Big Data / Hadoop ecosystem and development in Java, Scala, and PySpark.
  • Well versed in building data pipelines for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka, Scala, Oozie, and Talend ETL.
  • Hands-on experience with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.
  • Experience importing and exporting very large volumes of data between HDFS and RDBMS such as Teradata and Oracle using Sqoop, and performing transformations on it using Hive, Pig, and Spark.
  • Strong understanding of Distributed systems design, HDFS architecture, internal working details of MapReduce and Spark processing frameworks.
  • Solid knowledge of and experience with the Spark framework for efficient data processing (a brief Spark/Scala sketch follows this list).
  • Involved in troubleshooting and performance tuning Spark Applications.
  • Worked extensively with Hive for performing advanced data preparation and data analysis.
  • Worked with NoSQL Databases like HBase and Cassandra.
  • Exposure to various AWS services such as EMR, S3, Redshift, and Athena.
  • Experience in writing shell scripts to perform routine automation tasks.
  • Experienced in implementing job scheduling using Oozie, Airflow, crontab, and shell scripts.
  • Familiar with tools such as NiFi and Hive for running versatile analytics on data being merged into a data lake.
  • Experience importing and exporting streaming data into HDFS using stream-processing platforms such as Flume, and knowledge of the Kafka messaging system.
  • Used Pig for transformations, event joins, traffic filtering, and pre-aggregations before storing the data in HDFS.
  • Experience in installing, configuring, supporting, and managing Cloudera and Hortonworks clusters.
  • Good familiarity with the data migration process on Azure, integrating with GitHub repositories and Jenkins. Expertise in managing and reviewing Hadoop log files.
  • Good experience with Java for building REST services using Spring Boot.
  • Strong experience working with Git for code repository management and Jenkins for continuous build and deployment automation.
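
Below is a minimal Spark/Scala sketch of the kind of HDFS-to-Hive pipeline summarized above. The paths, column names, and table name are hypothetical placeholders for illustration, not details taken from the projects that follow.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamPipeline {
  def main(args: Array[String]): Unit = {
    // Hive support so the result can be queried downstream
    val spark = SparkSession.builder()
      .appName("clickstream-pipeline")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical HDFS landing path for raw JSON events
    val raw = spark.read.json("hdfs:///data/raw/clickstream/")

    // Basic cleansing plus a daily partition column
    val cleaned = raw
      .filter(col("user_id").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))

    // Write as Parquet into a partitioned Hive table (hypothetical name)
    cleaned.write
      .mode("append")
      .format("parquet")
      .partitionBy("event_date")
      .saveAsTable("analytics.clickstream_events")

    spark.stop()
  }
}
```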

TECHNICAL SKILLS:

Programming Languages: Core Java, Scala and Python

Big Data Ecosystem: Spark, Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Kafka, Tez, Zookeeper, and Oozie.

NoSQL Databases: Cassandra, MongoDB, HBase.

Hadoop Distributions: Cloudera, Hortonworks, AWS EMR.

Scripting Languages: SQL, Unix Shell Scripting, HiveQL, Pig Latin.

Version Control: Git, VSS (Visual Source Safe), CVS, SVN.

Databases: Oracle, SQL Server, PostgreSQL. Database Tools: TOAD.

ETL / BI Tools: Talend, Tableau.

Development Methodology: Agile and Waterfall.

PROFESSIONAL EXPERIENCE:

Sr. Hadoop Developer

Confidential - Newton, MA

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using the Hadoop ecosystem.
  • Loaded datasets from two different sources, Oracle and MySQL, into HDFS and Hive respectively on a daily basis.
  • Installed and configured Hive on the Hadoop cluster.
  • Worked with the HBase Java API to populate operational HBase tables with key-value data.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
  • Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
  • Developed multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, and CSV.
  • Imported data from MySQL into HDFS using Sqoop on a regular basis.
  • Integrated Apache Storm with Kafka to perform web analytics, loading clickstream data from Kafka into HDFS, HBase, and Hive through Storm.
  • Designed and developed Pig Latin scripts to process data in batches for trend analysis.
  • Developed Hive scripts to meet analysts' requirements for analysis.
  • Developed Java code to generate, compare, and merge Avro schema files.
  • Developed complex MapReduce streaming jobs in Java, with portions also implemented using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data with Hive queries (HiveQL) and Pig Latin scripts to study customer behavior (a Spark SQL sketch of this kind of aggregation follows this list).
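
A minimal Spark SQL (Scala) sketch of the kind of customer-behavior aggregation described above, since Spark and Spark-SQL are part of this environment. The table and column names (analytics.clickstream_events, customer_id, session_id) are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object CustomerBehavior {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customer-behavior")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table holding the cleansed clickstream data;
    // the aggregation mirrors the HiveQL analysis described above.
    val behavior = spark.sql(
      """SELECT customer_id,
        |       count(*)                   AS events,
        |       count(DISTINCT session_id) AS sessions
        |FROM analytics.clickstream_events
        |GROUP BY customer_id
        |""".stripMargin)

    // Persist the summary for downstream analysts (hypothetical target table)
    behavior.write.mode("overwrite").saveAsTable("analytics.customer_behavior")

    spark.stop()
  }
}
```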

Environment: HDFS, Sqoop, Hive, SerDes, HBase, Sentry, Spark, Spark-SQL, Kafka, Flume, Oozie, JSON, Avro, Talend, EC2, S3, EMR, Zookeeper, Cloudera.

Confidential, Atlanta

Big Data Engineer / Hadoop Developer

Responsibilities:

  • Ingested gigabytes of clickstream data daily from external sources such as FTP servers and S3 buckets using customized, home-grown input adapters.
  • Created Sqoop scripts to import/export data from RDBMS to the S3 data store.
  • Developed various Spark applications using Scala to enrich the clickstream data by merging it with user profile data.
  • Involved in data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for machine learning and reporting.
  • Automated the data flow between the systems and managed the flow of information.
  • Utilized the Spark Scala API to implement batch processing jobs, and troubleshot Spark applications for improved fault tolerance.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Fine-tuned Spark applications/jobs to improve efficiency and overall processing time for the pipelines.
  • Understanding of the Kafka producer API for sending live-stream data into various Kafka topics.
  • Developed Spark Streaming applications to consume the data from Kafka topics and insert the processed streams into HBase.
  • Utilized Spark's in-memory capabilities to handle large datasets.
  • Used broadcast variables in Spark for effective and efficient joins, along with transformations and other capabilities for data processing (a broadcast-join sketch follows this list). Involved in migrating HiveQL queries into Spark SQL to improve performance.
  • Experienced in working with EMR clusters and S3 in the AWS cloud.
  • Understanding of Apache NiFi workflow design for connecting to AWS and storing the final output in HDFS.
  • Created Hive tables and loaded and analyzed data using Hive scripts. Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Involved in continuous integration of the application using Jenkins.
  • Interacted with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
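
A minimal Spark/Scala sketch of the broadcast-join enrichment pattern mentioned above: the small user-profile dataset is broadcast so the large clickstream side avoids a shuffle. The S3 bucket name, paths, and join key are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ClickstreamEnrichment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-enrichment")
      .getOrCreate()

    // Hypothetical S3 locations for the ingested clickstream and the profile dimension
    val clicks   = spark.read.parquet("s3://example-bucket/landing/clickstream/")
    val profiles = spark.read.parquet("s3://example-bucket/reference/user_profiles/")

    // Broadcasting the small profile dataset keeps the join map-side
    val enriched = clicks.join(broadcast(profiles), Seq("user_id"), "left")

    enriched.write
      .mode("overwrite")
      .parquet("s3://example-bucket/enriched/clickstream/")

    spark.stop()
  }
}
```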

Environment: AWS EMR, S3, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, MapReduce.

Confidential, Wilmington, DE

Big Data Engineer

Responsibilities:

  • Extensively worked on migrating data from traditional RDBMS to HDFS using Sqoop.
  • Involved in developing Spark applications to perform ETL-style operations on the data.
  • Converted existing MapReduce jobs to Spark transformations and actions using Spark RDDs, DataFrames, and the Spark SQL API.
  • Utilized Hive partitioning and bucketing, and performed various kinds of joins on Hive tables.
  • Involved in creating Hive external tables to perform ETL on data produced on a daily basis.
  • Validated the data being ingested into Hive for further filtering and cleansing.
  • Used Talend to load the data into our warehouse systems.
  • Worked with the Spark ecosystem using Spark SQL and Scala queries on different data file formats such as .txt and .csv.
  • Developed Sqoop jobs to perform incremental loads from RDBMS into HDFS, then applied Spark transformations.
  • Loaded the data into Hive tables from Spark using the Parquet columnar format (see the sketch after this list).
  • Created Oozie workflows to automate and productionize the data pipelines.
  • Migrated MapReduce code to Spark transformations using Spark and Scala.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with Tableau connected to Impala to develop interactive dashboards.
  • Designed and documented operational problems following standards and procedures using JIRA.
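
A minimal Spark/Scala sketch of the warehouse load described above: reading data staged in HDFS by a Sqoop incremental import, applying light transformations, and writing a Parquet-backed, partitioned Hive table. The staging path, column names, and table name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WarehouseLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("warehouse-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical HDFS staging directory populated by the Sqoop incremental import
    val staged = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///staging/orders/")

    // Light transformation plus a partition column before the warehouse write
    val orders = staged
      .withColumn("order_date", to_date(col("order_ts")))
      .dropDuplicates("order_id")

    // Parquet columnar format in a partitioned Hive table, as described above
    orders.write
      .mode("append")
      .format("parquet")
      .partitionBy("order_date")
      .saveAsTable("warehouse.orders")

    spark.stop()
  }
}
```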

Environment: Cloudera Hadoop, HDFS, RDBMS, Spark, Scala, Sqoop, Oozie, Hive, CentOS, MySQL, Oracle DB, Flume, Talend ETL.

Confidential, Reston, VA

Big Data / Hadoop Developer

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop suitable programs.
  • Installed and configured Apache Hadoop (all the distributed services) on a 10-node cluster.
  • Developed Java MapReduce programs for the analysis and cleaning of sample log files stored in the cluster.
  • Utilized Talend as an ETL tool to load data into our warehouse system.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Responsible for creating external tables in Hive, utilizing an external MySQL database to store metadata.
  • Loaded historical log data into Hive tables to analyze and explore useful data for the business using Hive query language.
  • Used Sqoop to import/export data into HDFS and Hive from other data systems.
  • Migrated ETL processes from MySQL to Hive to test easier data manipulation.
  • Involved in development, support, and maintenance of ETL (Extract, Transform, and Load) processes using the Talend integration suite.
  • Performed sampling and dataset prototyping using Pig scripts.
  • Created appropriate partitions in Hive tables to enhance query speed, and developed Hive queries to process the data for visualization.
  • Monitored system status and log files, and provided solutions for failure conditions.

Environment: Java, MapReduce, Apache Hadoop, HDFS, Cloudera CDH4, Azure, MySQL, Spring, Tableau, CentOS 6.4, Eclipse Indigo, Hive, Pig, Sqoop, Oozie, Cassandra, Talend ETL.

Confidential

Application Developer

Responsibilities:

  • Designed and developed the UI using JSP, HTML, and JavaScript.
  • Used JUnit to implement test cases for Unit testing of modules.
  • Developed user interface using JSP, JSP Tag libraries (JSTL) to simplify the complexities of the application.
  • Used JMS messages to communicate results and interact with the system asynchronously.
  • Monitored the error logs using Log4J.
  • Involved in Design documentation and implementation.
  • Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
  • Responsible for System Testing in report generation and assigning roles to users at different levels of hierarchy.
  • Used SVN version control for software configuration management.
  • Used the MyEclipse IDE and deployed the application on the BEA WebLogic Application Server using Ant scripts.

Environment: Java 5.0/J2EE, Servlets, JSP, Struts, Oracle 9i, Ant, Log4j, CSS, JavaScript, BEA WebLogic Application Server, Eclipse IDE, JMS.

Confidential

Java Developer

Responsibilities:

  • Involved in developing the application using the Java/J2EE platform. Implemented the Model View Controller (MVC) structure using Struts.
  • Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per the requirements, and for providing client-side JavaScript validations and server-side Bean Validation Framework (JSR 303) validations.
  • Used Spring Core Annotations for Dependency Injection.
  • Used Hibernate as persistence framework mapping the ORM objects to table using Hibernate annotations.
  • Responsible for writing the various service classes and utility APIs used across the framework.
  • Used Axis to implement web services for the integration of different systems.
  • Developed Web services component using XML, WSDL and SOAP with DOM parser to transfer and transform data between applications.
  • Exposed various capabilities as Web Services using SOAP/WSDL.
  • Used SoapUI to test the web services by sending SOAP requests.
  • Used AJAX framework for server communication and seamless user experience.
  • Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla Firefox through WebDriver.
  • Used client-side JavaScript and jQuery for designing tabs and dialog boxes.
  • Created UNIX shell scripts to automate the build process and to perform regular jobs such as file transfers between different hosts.
  • Used Log4j for logging the output to files.
  • Used JUnit and Eclipse for unit testing of various modules.
  • Involved in production support, monitoring server and error logs, foreseeing potential issues, and escalating them to higher levels.

Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML.
