Senior Hadoop Developer Resume

Union, NJ

SUMMARY:

  • About 8+ years of experience in the IT industry, including technical proficiency in Big Data environments, with extensive expertise in development on the Hadoop ecosystem and Java.
  • Extensive experience with Hadoop platform components: MapReduce (MRv1, YARN), Hive, Pig, Sqoop, Oozie, HBase, Spark, Spark Streaming, Spark SQL, Elasticsearch and Scala.
  • Experience working with Amazon Web Services (AWS) cloud infrastructure.
  • Extensive knowledge of MongoDB concepts and good knowledge of its administration.
  • Good experience in developing and implementing Spark and its streaming functionality using Scala and Python to work with real-time data.
  • Proficient in writing MapReduce programs and using the Apache Hadoop Java API to analyze structured and unstructured data.
  • Extensive experience in fine-tuning and optimizing the performance of Spark and Spark Streaming jobs.
  • Worked on replacing MR jobs and Hive scripts with Spark SQL and Spark data transformations for efficient data processing.
  • Hands-on experience converting complex MapReduce programs into Spark RDD operations such as transformations and actions.
  • Loaded Parquet/TXT files into the Spark framework using Java/Scala, created Spark DataFrames and RDDs to process the data, and saved the output in Parquet format in HDFS to load into the fact table using ORC Reader.
  • Knowledge of using Apache Spark with Cassandra.
  • Monitored data streaming (DS) between web sources and the Hadoop Distributed File System (HDFS).
  • Installed, configured, managed, supported and monitored Hadoop clusters using various distributions and tools such as Apache Spark, Cloudera and the AWS service console.
  • Developed a Spark Streaming consumer application integrated with Kafka.
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as NameNode and DataNode, MapReduce concepts, Spark execution concepts and the HDFS framework.
  • Familiar with MongoDB clusters and Java scripting to load unstructured data into a sharded environment.
  • Used Apache Kafka to aggregate log data from multiple servers and make it available to downstream systems for analysis using Spark Streaming.
  • Involved in designing the various stages of migrating data from RDBMS to Cassandra.
  • Experience in launching EMR clusters, Redshift clusters, EC2 instances, Amazon Data Pipeline and Simple Workflow Service.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience in writing Pig Latin scripts to sort, group, join and filter the data.
  • Experience in writing UDFs in Java for Hive and Pig.
  • Successfully generated consumer group lag metrics from Kafka using its API.
  • Hands-on experience in setting up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
  • Strong knowledge of NoSQL databases such as Cassandra (column-oriented) and MongoDB (document-oriented), and their integration with Hadoop clusters. Working experience with HBase and Elasticsearch.
  • Good knowledge of Object-Oriented Analysis and Design (OOAD) and Java design patterns, and a good level of expertise in Core Java.
  • Comprehensive knowledge of the Software Development Life Cycle and Agile methodology, coupled with excellent communication skills.
  • Strong analytical and problem-solving skills.
  • Implemented microservices in Scala along with Apache Kafka.
  • Experience working in both team and individual environments. Always eager to learn new technologies and implement them in challenging environments.
  • Team player with good interpersonal, communication and presentation skills. Exceptional ability to learn and master new technologies and to deliver on short deadlines.

TECHNICAL SKILLS:

Hadoop Platform: MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Impala, Spark Streaming, Spark SQL

Programming: Core Java, SQL, Shell scripting, C, C++

AWS Hadoop Services: S3, EMR, Simple Workflow, Data Pipeline, Redshift Database

Operating Systems: Linux (RedHat, CentOS), Windows XP/7/8, Mac OS

NoSQL Databases: Cassandra, MongoDB, HBase, Bigtable, Elasticsearch

ETL: Pentaho Report Designer, Logstash

BI Tools: Tableau, Kibana

Hadoop Platform Distributions: Hadoop, HDP, Cloudera (CDH3, CDH4, CDH5), Pivotal HD (2.0), AWS, GCP

PROFESSIONAL EXPERIENCE:

Confidential, Union, NJ

Senior Hadoop Developer

Responsibilities:

  • Involved in building the data engineering platform on AWS for ingesting, aggregating and visualizing real-time streaming data from multiple sources.
  • Developed Spark Streaming jobs that stream data from Kafka topics and perform transformations on the data.
  • Worked extensively on the Spark framework using Scala to perform ETL operations.
  • Involved in end-to-end development, testing, deployment and performance tuning of the Spark jobs.
  • Developed parsers using the Scala API to parse data from different sources and formats such as byte code, JSON and CSV.
  • Designed and configured topics in the new Kafka cluster across all environments.
  • Worked extensively on optimizing and tuning the Spark Streaming application to provide real-time access to data.
  • Managed Amazon Web Services (AWS) resources: ELB, EC2, S3, EMR and CloudWatch.
  • Worked with both the receiver-based approach and the direct stream approach for streaming real-time data from Kafka using Spark Streaming.
  • Deployed EMR clusters on AWS.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; it was also used to add topics, partitions, etc.
  • Made multiple code improvements that significantly reduced the processing time of a single streaming batch, optimizing the performance of the pipeline.
  • Hands-on experience with the Amazon EMR framework, transferring data to EC2 servers.
  • Developed a parser using the Scala API to convert network data from byte code format to JSON format.
  • Developed automated scripts to provision Kafka, ZooKeeper and Elasticsearch clusters.

Environment: Scala, Spark, Spark Streaming, Kafka, Elasticsearch, ZooKeeper, Python, Java, Shell Scripting, AWS EMR.

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Worked on Spark SQL code as an alternative approach for faster data processing and better performance.
  • Proposed an automated system using shell scripts for the Hadoop job deployment process.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts; wrote Hive and Pig scripts for complex transformations.
  • Used Kafka features such as its distributed, partitioned, replicated commit log service to maintain message feeds, and created applications that monitor consumer lag within Apache Kafka clusters.
  • Implemented custom Hive UDFs to achieve comprehensive data analysis.
  • Wrote Oozie workflows to run multiple Hive, shell script and Pig jobs that run independently based on time and data availability.
  • Prepared Pig scripts and Spark SQL to handle all the transformations specified in the S2TMs and to handle SCD2 and SCD1 scenarios.
  • Used Apache NiFi to implement a system to store, send and ingest data from hundreds of devices.
  • Loaded data into Spark RDDs and cached them to avoid repeated shuffling; experienced with batch processing of data sources using Apache Spark.
  • Experience developing APIs and frameworks for YARN applications using Apache Tez.
  • Developed a system to monitor Agile teams and performed log analysis on the ELK Stack.
  • Experience in managing large-scale, geographically distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) systems.
  • Involved in ingesting data into IDW staging directly through Spark and Sqoop, pushing the data into HDFS.
  • Handled installation, administration and configuration of the ELK Stack on AWS and performed log analysis.
  • Experience in developing custom processors in Apache NiFi.
  • Designed a messaging system using Apache Kafka to send messages across teams.
  • Used shell scripting for automation.
  • Worked on QA support activities, test data creation and unit testing activities.
  • Worked in an Agile development environment.

Environment: Hortonworks Data Platform (HDP), Apache Tez, HDFS, Kafka, Spark RDD, HBase, Hive, Java, Sqoop, Oracle, MySQL, Spark, Storm, NoSQL, Apache NiFi, ELK Stack.

Confidential, Omaha, NE

Hadoop Developer

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Automated delta feeds from Teradata using Sqoop, and from FTP servers to Hive.
  • Involved in loading one of the largest tables (SCAN table) from Teradata to Hadoop using the TPT utility.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Responsible for loading, aggregating and moving large amounts of log data using Flume.
  • Worked extensively on the Spark Core and Spark SQL modules.
  • Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
  • Strong skills in SQL, Hive and Impala to extract data from SQL Server, Oracle and Hadoop databases.
  • Involved in analyzing the existing BTEQ scripts on mainframes and implementing the same logic in Hadoop.
  • Created complex queries aggregating large datasets in Impala to perform data quality checks for the project.
  • Involved in exporting data from Hadoop to Greenplum using the gpload utility.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
  • Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Participated in requirements gathering from experts and business partners and converted the requirements into technical specifications.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
  • Involved in loading data from the Linux file system to HDFS.

Environment: Hadoop (Cloudera, Pivotal HD), Teradata 13.0, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, Linux, Oozie, Spark, Impala.

Confidential, Boston, MA

Java-Hadoop Developer

Responsibilities:

  • Analyzed the Hadoop cluster using different big data analytics tools including Pig, Hive and MapReduce.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Worked on tuning the performance of Pig queries.
  • Processed unstructured data using Pig and Hive.
  • Worked on evaluating complex business metrics in Pig and MapReduce.
  • Created Hive scripts to process the data for analysis.
  • Focused on programming different Java modules and integration.
  • Implemented Java mail services for email notifications.
  • Actively involved in the design and development of Java/JEE components.

Environment: Kafka, Data Pipeline, MapReduce (Java), Hive, Pig

Confidential

Java Developer

Responsibilities:

  • Used Java, JSP and JSTL to enhance functionality; responsible for creating database tables on DB2.
  • Wrote JavaScript code for front-end validation.
  • Involved in various phases of the software development life cycle (SDLC) such as requirements gathering, data modeling analysis, architecture design and development for the project.
  • Worked on Java Message Service (JMS) for developing messaging services.
  • Developed server-side services using core Java technologies, including multithreading and exception handling.
  • Involved in developing front-end applications that interact with the mainframe applications using J2C connectors.
  • Used JDBC for object-relational mapping and persistence.
  • Designed and implemented a scalable, RESTful, microservices-based back end, written in Java using Spring Boot for simplicity and scalability.
  • Used JUnit to develop test cases for unit testing.
  • Used JIRA as the bug-tracking tool for reporting and updating bugs.
  • Developed new functionality and maintained existing functionality using Spring MVC and Hibernate.

Environment: HTML, JavaScript, CSS, Servlets, JSP, XML, ANT, SOAP, JIRA, JUnit, Ajax, Git

Confidential

Junior Java Developer

Responsibilities:

  • Involved in gathering business requirements, analyzing the project and creating UML diagrams such as use case diagrams, class diagrams and flow charts.
  • Developed front end using JSTL, JSP, HTML and JavaScript.
  • Created new web pages and maintained existing ones built with JSP and Servlets.
  • Worked extensively on views, stored procedures, triggers and SQL queries for loading data (staging) to enhance and maintain existing functionality.
  • Coded and developed a multi-tiered architecture in Java, J2EE and Servlets.
  • Consumed Web Services (WSDL, SOAP, UDDI) from third parties for authorized payments to/from customers.
  • Developed Hibernate mapping files (.hbm.xml) for mapping declarations.
  • Actively involved from the start of the project, from requirements gathering through quality assurance testing.
  • Wrote and modified database queries and stored procedures for Oracle 9i.

Environment: Java JDK 1.5, Oracle, Java/J2EE, JSP, WebLogic Application Server, HTML, Servlets, UML, XML, WSDL, SOAP, UDDI.
