
Hadoop Developer Resume


Austin, TX

SUMMARY:

  • Hadoop Developer with 8+ years of IT experience, including 4 years in the Big Data and Analytics field spanning storage, querying, processing and analysis for developing end-to-end (E2E) data pipelines. Expertise in designing scalable Big Data solutions and data warehouse models on large-scale distributed data and in performing a wide range of analytics.
  • Expertise in the components of the Hadoop/Spark ecosystems: Spark, Hive, Pig, Flume, Sqoop, HBase, Kafka, Oozie, Impala, StreamSets, Apache NiFi, Hue and AWS.
  • 3+ years of experience programming in Scala and Python.
  • Extensive knowledge of data serialization and storage formats such as Avro, SequenceFile, Parquet, JSON and ORC.
  • Sound knowledge of Spark architecture and real-time streaming using Spark.
  • Hands-on experience with Spark Core, Spark SQL and the DataFrame/Dataset/RDD APIs.
  • Good knowledge of Amazon Web Services (AWS) cloud services such as EC2, S3, EMR and VPC.
  • Experienced in data ingestion, processing, aggregation and visualization in a Spark environment.
  • Hands-on experience working with large volumes of structured and unstructured data.
  • Expert in migrating code components from SVN repositories to Bitbucket repositories.
  • Experienced in building Jenkins pipelines for continuous code integration from GitHub onto Linux machines. Experience in Object-Oriented Analysis and Design (OOAD) and development.
  • Good understanding of end-to-end web applications and design patterns.
  • Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
  • Well versed in software development methodologies such as Agile and Waterfall.
  • Experienced in handling databases: Netezza, Oracle and Teradata.
  • Strong team player with good communication, analytical, presentation and interpersonal skills.

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Scala, Spark, Kafka, Flume, Ambari, Hue

Hadoop Distributions: Cloudera CDH, Hortonworks HDP, MapR

Databases: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012, DB2

Languages: C, C++, Java, Scala, Python

AWS Components: IAM, S3, EMR, EC2, Lambda, Route 53, CloudWatch, SNS

Methodologies: Agile, Waterfall

Build Tools: Maven, Gradle, Jenkins.

NOSQL Databases: HBase, Cassandra, MongoDB, DynamoDB

IDE Tools: Eclipse, NetBeans, IntelliJ

Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML

Architecture: Relational DBMS, Client-Server Architecture

Cloud Platforms: AWS Cloud

BI Tools: Tableau

Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X

PROFESSIONAL EXPERIENCE:

Confidential, Austin, TX

Hadoop Developer

Responsibilities:

  • Worked on the Hortonworks HDP 2.5 distribution.
  • Involved in review of functional and non-functional requirements.
  • Responsible for designing and implementing the data pipeline using Big Data tools including Hive, Spark, Scala and StreamSets.
  • Experience in using Apache Storm, Spark Streaming, Apache Spark, Apache NiFi, Kafka and Flume to create data streaming solutions.
  • Developed and implemented Apache NiFi flows across various environments and wrote QA scripts in Python for tracking files.
  • Involved in importing data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop.
  • Good knowledge of using Apache NiFi to automate data movement.
  • Developed Sqoop scripts to import data from relational sources and handled incremental loading.
  • Extensively used StreamSets Data Collector to create ETL pipelines for pulling data from RDBMS systems into HDFS.
  • Implemented the data processing framework using Scala and Spark SQL.
  • Worked on implementing performance optimization methods to improve data processing times.
  • Created shell scripts to automate jobs.
  • Extensively worked on DataFrames and Datasets using Spark and Spark SQL.
  • Responsible for defining the data flow within the Hadoop ecosystem, directing the team in implementing it, and exporting result sets from Hive to MySQL using shell scripts.
  • Worked on Kafka streaming with StreamSets to continuously ingest data from Oracle systems into Hive tables.
  • Developed a generic utility in Spark for pulling data from RDBMS systems using multiple parallel JDBC connections (see the sketch after this list).
  • Integrated existing HiveQL code logic into the Spark application for data processing.
  • Extensively used Hive/Spark optimization techniques such as partitioning, bucketing, map joins, parallel execution, broadcast joins and repartitioning.
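
A minimal sketch of what such a generic Spark pull utility could look like in Scala; the JDBC URL, table name, partition column and credentials below are hypothetical placeholders, not details from the actual project.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcPullUtility {
  // Reads a relational table over JDBC with multiple parallel connections by
  // partitioning the read on a numeric column. All names here are illustrative.
  def pullTable(spark: SparkSession, jdbcUrl: String, table: String,
                partitionColumn: String, lowerBound: Long, upperBound: Long,
                numPartitions: Int, user: String, password: String): DataFrame = {
    spark.read
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", table)
      .option("user", user)
      .option("password", password)
      // These four options make Spark open numPartitions parallel connections,
      // each reading one slice of the partitionColumn range.
      .option("partitionColumn", partitionColumn)
      .option("lowerBound", lowerBound.toString)
      .option("upperBound", upperBound.toString)
      .option("numPartitions", numPartitions.toString)
      .load()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JdbcPullUtility").getOrCreate()
    // Hypothetical usage: pull an Oracle table with 8 parallel connections
    // and land it in HDFS as Parquet.
    val orders = pullTable(spark, "jdbc:oracle:thin:@//dbhost:1521/ORCL",
      "SALES.ORDERS", "ORDER_ID", 1L, 10000000L, 8, "etl_user", "etl_pass")
    orders.write.mode("overwrite").parquet("/data/raw/orders")
    spark.stop()
  }
}
```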

Environment: Spark, Python, Scala, Hive, Hue, UNIX Scripting, Spark SQL, StreamSets, Kafka, Impala, Beeline, Git, Tidal.

Confidential, Washington, D.C.

Hadoop Developer

Responsibilities:

  • Worked on the Hortonworks HDP 2.5 distribution.
  • Implemented Scala framework code in IntelliJ and UNIX scripts to implement the workflow for the jobs.
  • Involved in gathering business requirements, analyzing use cases and implementing them end to end.
  • Worked closely with the architect; enhanced and optimized product Spark and Scala code to aggregate, group and run data mining tasks using the Spark framework.
  • Loaded raw data into RDDs and validated the data.
  • Converted the validated RDDs into DataFrames for further processing.
  • Implemented Spark SQL code logic to join multiple DataFrames and generate application-specific aggregated results (see the sketch after this list).
  • Fine-tuned jobs for better performance on the production cluster.
  • Worked entirely in Agile methodologies; used the Rally scrum tool to track user stories and team performance.
  • Worked extensively with Impala in Hue to analyze the processed data and generate the end reports.
  • Experienced working with Hive databases through Beeline.
  • Worked on analyzing and resolving production job failures in several scenarios.
  • Implemented UNIX scripts to define the use case workflow, process the data files and automate the jobs.
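
A minimal sketch of how the Spark SQL join-and-aggregate logic could look in Scala; the input paths, view names and columns are hypothetical placeholders, not details from the actual application.

```scala
import org.apache.spark.sql.SparkSession

object AggregationJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AggregationJob").getOrCreate()

    // Hypothetical inputs: two validated DataFrames registered as temp views.
    val orders    = spark.read.parquet("/data/validated/orders")
    val customers = spark.read.parquet("/data/validated/customers")
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    // Join the DataFrames and compute application-specific aggregates.
    val aggregated = spark.sql(
      """
        |SELECT c.region,
        |       COUNT(o.order_id)  AS order_count,
        |       SUM(o.order_total) AS total_revenue
        |FROM orders o
        |JOIN customers c ON o.customer_id = c.customer_id
        |GROUP BY c.region
      """.stripMargin)

    // Persist the aggregated results for downstream analysis (e.g. Impala/Hue).
    aggregated.write.mode("overwrite").parquet("/data/aggregated/revenue_by_region")
    spark.stop()
  }
}
```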

Environment: Spark, Scala, Hive, Sqoop, UNIX Scripting, Spark SQL, IntelliJ, HBase, Kafka, Impala, Hue, Beeline, Git.

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Worked on the Cloudera CDH distribution.
  • Hands-on experience with Amazon Web Services (AWS) cloud services.
  • Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
  • Involved in complete SDLC - Requirement Analysis, Development, Testing and Deployment into Cluster.
  • Worked hand-in-hand with the architect; enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Extracted data from various SQL database sources into HDFS using Sqoop and ran Hive scripts on the large volumes of data.
  • Implemented a prototype for the complete requirements using Splunk, Python and machine learning concepts.
  • Designed and implemented MapReduce code logic for natural language processing of free-form text.
  • Deployed the project on Amazon EMR with S3 connectivity.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Loaded the data into Simple Storage Service (S3) in the AWS Cloud.
  • Good knowledge of using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Implemented Spark scripts to migrate MapReduce jobs into Spark RDD transformations and streamed data using Apache Kafka (see the sketch after this list).
  • Implemented Spark SQL queries that intermix Hive queries with the programmatic data manipulations supported by RDDs and DataFrames in Scala and Python.
  • Involved in deployment of code logic and UDFs across the cluster.
  • Communicated deliverable status to users, stakeholders and the client, and drove periodic review meetings.
  • Worked on data processing using Hive queries in HDFS and shell scripts to wrap the HQL scripts.
  • Developed and deployed Oozie workflows for recurring operations on clusters.
  • Experienced in performance tuning of Hadoop jobs: setting the right batch interval time, the correct level of parallelism and memory tuning.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Used the Tableau reporting tool to generate reports from the outputs stored in HDFS.
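
A minimal sketch of how a MapReduce-style aggregation could be re-expressed as Spark RDD transformations reading from S3; the bucket name, paths and record layout are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object MapReduceToSparkMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MapReduceToSparkMigration").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: comma-delimited event records landed in S3.
    val events = sc.textFile("s3://example-bucket/raw/events/")

    // The old MapReduce map phase becomes map/filter over the RDD,
    // and the reduce phase becomes reduceByKey.
    val countsByType = events
      .map(_.split(","))
      .filter(_.length > 1)
      .map(fields => (fields(1), 1L)) // key on the event-type column (illustrative)
      .reduceByKey(_ + _)

    // Write the aggregated counts back to S3 for downstream Hive/Tableau use.
    countsByType.saveAsTextFile("s3://example-bucket/aggregated/event_counts/")
    spark.stop()
  }
}
```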

Environment: Hadoop, Spark, HDFS, Hive, MapReduce, Sqoop, Oozie, Tableau.

Confidential

Hadoop Developer

Responsibilities:

  • Worked on the Cloudera CDH distribution.
  • Designed and implemented historical and incremental data ingestion from multiple external systems using the Hive, Pig and Sqoop ingestion tools.
  • Designed physical data models for structured and semi-structured data to validate the raw data loaded into HDFS.
  • Designed MapReduce logic and Hive queries for generating aggregated metrics.
  • Involved in the design, implementation, development and testing phases of the project.
  • Responsible for monitoring jobs on the production cluster and tracing error logs when jobs fail.
  • Designed and developed data migration logic for exporting data from MySQL to Hive.
  • Designed and developed complex workflows in Oozie for recurring job execution.
  • Used the SSRS reporting tool for the generation of data analysis reports.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Eclipse, Cloudera, Sqoop, SSRS

Confidential

Software Developer

Responsibilities:

  • Involved in complete SDLC - Requirement Analysis, Development, Testing and Deployments.
  • Involved in resolving critical errors.
  • Responsible for successfully deploying sprint deliverables.
  • Involved in capturing the client's requirements and enhancements to the application, documenting the requirements and circulating them to the associated teams.
  • Designed and implemented RESTful services and WSDL in Vordel.
  • Implemented complex SQL queries to produce analysis reports.
  • Created desktop applications using J2EE and Swing.
  • Involved in developing applications using Java, JSP, Servlets and Swing.
  • Developed UI using HTML, CSS, Ajax and jQuery, and developed business logic and interfacing components using business objects, XML and JDBC.
  • Created applications and connection pools and deployed JSPs and Servlets.
  • Used Oracle and MySQL databases for storing user information.
  • Developed the back end for web applications using PHP.
  • Experienced with the Agile Methodologies.

Environment: SOAP, REST, HTML, WSDL, Vordel, SQL Developer
