
Hadoop/Big Data Developer Resume


New York City, NY

SUMMARY

  • Around 10 years of professional IT experience in the Analysis, Design, Administration, Development, Deployment, and Maintenance of critical software and big data applications.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Management Systems and vice versa.
  • Using Apache Flume, collected and stored streaming data (log data) in HDFS.
  • Created producers, consumers, and the ZooKeeper setup for Oracle-to-Kafka replication.
  • Developed Kafka producers that compress and combine many small files into larger Avro and SequenceFiles before writing to HDFS, making the best use of the Hadoop block size.
  • Expertise in application development using Scala, RDBMS, and UNIX shell scripting.
  • Extensive experience with SQL, PL/SQL and database concepts.
  • Experience developing Scala applications for loading/streaming data into NoSQL databases (HBase) and into HDFS.
  • Worked on ingesting log data into Hadoop using Flume.
  • Experience in optimizing queries by creating various clustered and non-clustered indexes and indexed views, applying data modeling concepts.
  • Experience with scripting languages (Scala, Pig, Python and Shell) to manipulate data.
  • Worked with relational database systems (RDBMS) such as MySQL and NoSQL database systems like HBase; basic knowledge of MongoDB and Cassandra.
  • Hands-on experience in identifying and resolving performance bottlenecks at various levels such as sources, mappings, and sessions.
  • Highly Motivated, Adaptive and Quick learner.
  • Ability to adapt to evolving Technology, Strong Sense of Responsibility and Accomplishment.
  • Proven organizational, time-management, and multi-tasking skills, with the ability to work independently, quickly learn new technologies, and adapt to new environments.
  • Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop.
  • Worked on NoSQL databases including HBase and MarkLogic.
  • Experience in implementing Continuous Delivery pipelines with Maven and Ant.
  • Installing, deploying, configuring & troubleshooting Hadoop/Big Data ecosystem components.
  • Experience working with Cloudera Distributions of Hadoop
  • Converted MapReduce applications to Spark.
  • Excellent understanding/knowledge of Hadoop architecture and the various components of the Hadoop ecosystem such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
  • Hands-on experience using Hadoop ecosystem components like MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, ZooKeeper, Hue, Kafka, Storm, and Impala.
  • Experience with Agile and Scrum methodologies.
  • Experienced in improving performance and optimizing existing Hadoop algorithms with Spark using SparkContext, Spark SQL, DataFrames, pair RDDs, and Datasets (a minimal sketch of such a MapReduce-to-Spark conversion follows this list).
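As an illustration of the MapReduce-to-Spark conversions and DataFrame-based optimizations described above, here is a minimal Scala sketch. The input path, column handling, and aggregation are hypothetical placeholders, not details taken from the actual projects.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LogLevelCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LogLevelCounts")
      .getOrCreate()

    // What used to be a two-stage MapReduce job (map: parse a line and emit (level, 1);
    // reduce: sum the counts) becomes one declarative DataFrame pipeline.
    val logs = spark.read.textFile("hdfs:///data/app/logs/")   // hypothetical input path

    val counts = logs
      .select(regexp_extract(col("value"), "\\b(INFO|WARN|ERROR)\\b", 1).as("level"))
      .where(col("level") =!= "")
      .groupBy("level")
      .count()

    counts.write.mode("overwrite").parquet("hdfs:///data/app/log_level_counts")
    spark.stop()
  }
}
```

Expressing the job as a DataFrame pipeline lets Spark's Catalyst optimizer plan the whole read-filter-aggregate flow, which is the kind of optimization referred to in the bullets above.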

PROFESSIONAL EXPERIENCE

Confidential, New York City, NY

Hadoop/Big Data Developer

Responsibilities:

  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables.
  • Followed a Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
  • Involved in developing generic Spark/Scala functions for transformations and aggregations, and in designing row schemas.
  • Experienced in working with Spark APIs like RDDs, Datasets, Data Frames and DStreams to perform transformations on the data.
  • Used Spark SQL to perform interactive analysis on the data using SQL and HiveQL.
  • Scheduled several time-based Oozie workflows by developing Python scripts.
  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Worked on S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
  • Designed the ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, the Sqoop package, and MySQL.
  • Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster and created hive tables to store the processed results in a tabular format.
  • Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the streamed data in HDFS (see the sketch after this list).
  • Developed Sqoop scripts to handle the interaction between Hive and the Vertica database.
  • Developed solutions to process data into HDFS and analyzed the data using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.
  • Built servers on AWS: imported volumes, launched EC2 instances, and created security groups, auto-scaling groups, load balancers, Route 53, SES, and SNS within the defined virtual private cloud.
  • Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration.
  • Streamed an AWS log group into a Lambda function to create ServiceNow incidents.
  • Handled end-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript, and related technologies on Linux.
  • Optimized Hive tables using techniques like partitioning and bucketing to provide better query performance.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Implemented Hadoop on AWS EC2 using a few instances for gathering and analyzing log files.
  • Involved in Spark and Spark Streaming work: creating RDDs and applying transformations and actions.
  • Created partitioned tables and loaded data using both static partition and dynamic partition method.
  • Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
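The Spark Streaming work described in the list above (Kafka to HDFS in Scala) is illustrated by the hedged sketch below. The broker addresses, topic name, consumer group, batch interval, and output path are hypothetical placeholders rather than values from the actual project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches (illustrative)

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",  // hypothetical brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-loader",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty micro-batch of message values to HDFS.
    stream.map(_.value()).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///data/events/batch_${System.currentTimeMillis()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The direct-stream approach lets Spark manage Kafka offsets per micro-batch; the batch interval shown here is only illustrative.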

Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, Kafka, Spark, Scala, ETL, Python

Confidential, Houston, Texas

Hadoop/Big Data Developer

Responsibilities:

  • Involved in implementing a real-time framework to capture streaming data and store it in HDFS using Kafka and Spark.
  • Developed Kafka consumer components for near-real-time and real-time data processing in Java and Scala (a minimal sketch follows this list).
  • Defined multiple Kafka Topics with several Partitions and replication factors across data centers.
  • Part of designing and developing a custom Java daemon to pull data from source systems and publish the resultant to a specific Kafka Topic.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data archival and purge processes in HDFS using shell and Python scripts, with UC4 workflows to automate the process.
  • Defined UC4 workflows for running sequential job flows in production.
  • Responsible for creating deployment SMOP and LLD documents.
  • Created a Wiki page and migrated code to Git repository.
  • Performance tuned and optimized various SQL queries.
  • Hands-on experience with Informatica, making various connections to load and extract data to and from Teradata efficiently.
  • Analyzed various databases such as Oracle, SQL, Teradata, and MongoDB to understand the source systems and developed applications to migrate the data to the Hadoop environment.
  • Implemented several Batch Ingestion jobs for Historical data migration from various relational databases and files using Sqoop.
  • Hands-on development of a near-real-time framework using Kafka and Storm to ingest data from several source systems like Oracle, SQL, Teradata, and MongoDB into the Hadoop environment.
  • Expertise in integrating Kafka with Storm and Spark streaming for near real-time and real-time Frameworks.
  • Prepared data for analytics processing and handled data egress so that analytics results are available to visualization systems, applications, and external data stores.
  • Built large-scale data processing systems for data warehousing solutions and worked on unstructured data mining with NoSQL.
  • Responsible for design and development of Spark SQL Scripts based on Functional Specifications.
  • Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Provisioned Cloudera Director AWS instances and added the Cloudera Manager repository to scale up the Hadoop cluster in AWS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Specified the cluster size, resource pool allocation, and Hadoop distribution by writing the specifications in JSON file format.
  • Successfully migrated complex Pentaho SQL transformations belonging to Xfinity Home Security to Hadoop using Spark and Scala.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Responsible for creating Hive tables and views using Partitions, HQL Scripts in landing layer for analytics.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
  • Performance-tuned Hive queries and MapReduce programs for different applications.
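As an illustration of the Kafka consumer components mentioned in the list above, here is a minimal Scala sketch against the standard kafka-clients consumer API. The broker address, group id, and topic name are hypothetical, and the real components handed records to downstream processing rather than printing them.

```scala
import java.time.Duration
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object EventConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")          // hypothetical broker
    props.put("group.id", "nrt-processor")                  // hypothetical consumer group
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("source-events").asJava)        // hypothetical topic

    try {
      while (true) {
        // Poll the topic and hand each record to downstream processing
        // (stubbed here as a println).
        val records = consumer.poll(Duration.ofMillis(500))
        records.asScala.foreach(r => println(s"${r.key()} -> ${r.value()}"))
      }
    } finally {
      consumer.close()
    }
  }
}
```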

Environment: Hadoop, HDFS, Hive, Presto, Spark, Scala, Kafka, Sqoop, UC4, Rest API, Java (1.7 & 1.8), Shell Scripting, Python Scripting, MySQL, Oracle 11g, SQL

Confidential, San Jose, California

Hadoop/Big Data Developer

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Performed Spark Streaming and micro-batch processing using Scala as the programming language.
  • Used Hive scripts in Spark for data cleaning and transformation. Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team. Created a data pipeline process for structuring, processing, and transforming data using Kafka and Scala, and built Kafka/Spark Streaming pipelines that consume data from external sources and perform the transformations in Scala.
  • Contributed towards developing a Data Pipeline to load data from different sources like Web, RDBMS, NoSQL to Apache Kafka or Spark cluster.
  • Extensively used Pig for data cleansing and created partitioned tables in Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Created custom Python/shell scripts to import data from Oracle databases via Sqoop.
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager, and EMR job logs using Genie and Kibana.
  • Streamed data in real time using Spark with Kafka for faster processing.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
  • Channeled log data collected from the web servers into HDFS using Flume and Spark Streaming.
  • Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying (a minimal sketch follows this list).
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Developed Spark Streaming script which consumes topics from distributed messaging source Kafka and periodically pushes batch of data to Spark for real time processing.
  • Designed efficient load-and-transform Spark code using Python and Spark SQL, which can be forward-engineered by our code generation developers.
  • Utilized large sets of structured, semi structured and unstructured data.
  • Created big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive, Sqoop, and Python scripts.
  • Used GitHub for project version management.
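As a hedged illustration of the Spark SQL querying over partitioned Hive data mentioned in the list above, the sketch below assumes a hypothetical web_logs table partitioned by event_date; table, column, and path names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object DailyMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailyMetrics")
      .enableHiveSupport()                 // query Hive tables directly from Spark
      .getOrCreate()

    // Filtering on the (hypothetical) partition column event_date lets the engine
    // prune partitions instead of scanning the whole table.
    val metrics = spark.sql(
      """SELECT event_date, status_code, COUNT(*) AS hits
        |FROM web_logs
        |WHERE event_date = '2018-06-01'
        |GROUP BY event_date, status_code""".stripMargin)

    metrics.write.mode("overwrite")
      .parquet("hdfs:///reports/daily_status_counts")   // hypothetical output path

    spark.stop()
  }
}
```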

Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Oozie, Java, Kafka, Eclipse, Zookeeper, Cassandra, HBase, Talend, GitHub.

Confidential, New York, New York

Software Developer

Responsibilities:

  • Responsible for the implementation and ongoing administration of the Hadoop infrastructure, including the initial infrastructure setup.
  • Implemented JSPs corresponding to the controller, into which data was propagated from the model and view objects returned by the controller.
  • Designed and Implemented MVC architecture using Spring Framework.
  • Provided technical guidance to business analysts, gathered requirements, and converted them into technical specifications/artifacts.
  • Built microservices for the delivery of software products across the enterprise.
  • Analyzed the low-level design (LLD) document and high-level design document (HLD) as per the business requirements.
  • Administered and engineered Jenkins to manage the weekly build, test, and deploy chain, and Git with a Dev/Test/Prod branching model for weekly releases.
  • Actively participated in Story Card Reviews during the transition from Waterfall to Agile Scrum.
  • Built a multi-node Apache Kafka cluster to monitor multiple clusters.
  • Involved in writing microservices in a Spring Boot application with Spring annotations.
  • Used Quality Center for change management and defect tracking.
  • Used the Spring Framework as the middle-tier application framework, with a persistence strategy based on Spring's Hibernate support for database integration.
  • Involved in Developing and performing Unit Testing and creating mock objects using JUnit.
  • Implemented the business tier using Spring dependency injection to establish relationships between application components (a minimal sketch follows this list).
  • Developed Hibernate Mapping (hbm.xml) file for mapping declaration.
  • Involved in writing simple and complex SELECT queries in SQL for back-end testing.
  • Created database tables and implemented SQL Stored Procedures and complex queries in Oracle SQL Developer.
  • Worked on CI/CD tools like Jenkins and Docker in the DevOps team.
  • Coded the UI using Struts Forms and Tags.
  • Used Hibernate ORM Tool for the backend persistence and developed DAO interfaces for interaction with the database.
  • Analyzed and performed fixes on logged application defects by comparing the system behavior to the design specifications.

Environment: Java, JSP, JSTL, Struts, Spring, Hibernate, Oracle, Eclipse, Jenkins, JUnit
