Spark & Hadoop Developer Resume

Cincinnati, OH

SUMMARY

  • Around 6 years of professional IT experience, including 4+ years of Big Data ecosystem experience in the ingestion, querying, processing, and analysis of big data.
  • Experience using Hadoop ecosystem components such as MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, and Spark, with hands-on work on Hadoop clusters running the Cloudera and Hortonworks distributions.
  • Experience spanning requirements gathering, design, development, integration, documentation, testing, and builds.
  • Knowledge and experience in Spark using Python and Scala.
  • Knowledge of the big data store HBase and the NoSQL databases MongoDB and Cassandra.
  • Experience building Spark applications in Scala to ease migration of existing Hadoop workloads.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Thorough knowledge of Hadoop architecture and core components (NameNode, DataNodes, JobTracker, TaskTrackers) as well as Oozie, Scribe, Hue, Flume, HBase, etc.
  • Extensively worked on developing and optimizing MapReduce programs, Pig scripts, and Hive queries to create structured data for data mining.
  • Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
  • Provisioned and managed multi-tenant Hadoop clusters on a public cloud environment, Amazon Web Services (AWS), and on private cloud infrastructure, the OpenStack cloud platform.
  • Worked with both Scala and Java; created frameworks for processing data pipelines through Spark.
  • Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Strong experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for data analysis.
  • Experience in database design, data analysis, and SQL programming.
  • Working knowledge of Oozie, a workflow scheduler system for managing Pig, Hive, and Sqoop jobs.
  • Excellent team player.
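
As an illustration of the Hive table design mentioned in the list above, here is a minimal PySpark sketch that creates a partitioned, bucketed external table alongside a managed staging table. The SparkSession usage, table names, columns, and HDFS location are illustrative assumptions rather than details from the original projects.

```python
# Sketch: managed vs. external Hive tables with partitioning/bucketing.
# All table, column, and path names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-table-design-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table: Hive tracks only metadata; the files stay at the HDFS
# location, so dropping the table does not delete the raw data.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS clicks_ext (
        user_id BIGINT,
        url     STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/raw/clicks'
""")

# Managed table: Hive owns both data and metadata, which suits
# intermediate results that can safely be dropped and rebuilt.
spark.sql("""
    CREATE TABLE IF NOT EXISTS clicks_staging STORED AS ORC
    AS SELECT user_id, url, ts FROM clicks_ext
    WHERE event_date = '2016-01-01'
""")
```

Partitioning prunes whole directories at query time, while bucketing fixes the file layout within a partition so that joins and sampling on user_id can avoid a full shuffle.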

TECHNICAL SKILLS

Languages: Python (2.7), Scala, Java

Databases: Oracle 11g

Operating Systems: Windows XP, Windows 7, Mac OS X

Deployment Tool: Jenkins, Quick Build

Web Browsers: IE, Google Chrome, Firefox, Safari

Web Technology: HTML, XML, CSS, JavaScript, AngularJS, Django

Others: MS Office, WebLogic, Eclipse (PyDev), Tableau, SQL, Pentaho BI

PROFESSIONAL EXPERIENCE

Confidential - Charlotte, NC

Hadoop / Spark Developer

Responsibilities:

  • Worked with Hadoop ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
  • Wrote Pig scripts for sorting, joining, filtering, and grouping the data.
  • Developed Spark programs for the application to process data faster than standard MapReduce programs.
  • Developed Spark programs in Scala, created Spark SQL queries, and built Oozie workflows for Spark jobs.
  • Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
  • Used Hadoop FS actions to move data from upstream locations to local data locations.
  • Wrote extensive Hive queries to transform the data for use by downstream models.
  • Developed MapReduce programs as part of predictive analytical model development.
  • Developed Hive queries to analyze the data and generate end reports for business users.
  • Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm.
  • Ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
  • Gained hands-on experience with NoSQL databases such as MongoDB.
  • Extensively used SVN as a code repository and VersionOne to manage the day-to-day agile development process and to track issues and blockers.
  • Wrote Spark Python code for the model integration layer.
  • Implemented Spark jobs in Scala and Java, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
  • Developed a data pipeline using Kafka, HBase, Spark on Mesos, and Hive to ingest, transform, and analyze customer behavioral data (see the sketch after this list).
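
A minimal PySpark sketch of the Kafka ingestion pattern used in the pipeline described above, based on the Spark Streaming direct-stream API. Broker addresses, the topic name, and the parsing logic are hypothetical placeholders; the production code (in Scala, with HBase/Hive sinks) would follow the same shape.

```python
# Sketch: near-real-time ingestion from Kafka with Spark Streaming.
# Brokers, topic, and payload fields are hypothetical.
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="behavior-pipeline-sketch")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["customer-events"],
    kafkaParams={"metadata.broker.list": "broker1:9092,broker2:9092"},
)

# Records arrive as (key, value) pairs; parse the JSON payload and keep
# only well-formed events before handing them to the transform stage.
events = (stream
          .map(lambda kv: kv[1])
          .map(json.loads)
          .filter(lambda e: "customer_id" in e))

# Placeholder sink: the real pipeline would write each batch to
# HBase/Hive instead of printing it.
events.pprint()

ssc.start()
ssc.awaitTermination()
```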

Environment: Hadoop, Hive, Impala, Oracle, Spark, Scala, Python, Pig, Sqoop, Oozie, MongoDB, MapReduce, SVN.

Confidential, Cincinnati, OH

Spark & Hadoop Developer

Responsibilities:

  • Ingested data received from various relational database providers onto HDFS for analysis and other big data operations.
  • Wrote Spark jobs in Scala to analyze the engineering data.
  • Used Spark to perform the necessary transformations and actions on the fly to build the common learner data model, which receives its data from Kafka in near real time.
  • Used the Spark API over Cloudera to perform analytics on data in Hive.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Created Hive external tables to perform ETL on data generated on a daily basis.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations and actions (see the sketch after this list).
  • Developed job flows in Oozie to automate workflows for Pig and Hive jobs.
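
A minimal sketch of the incremental-load-then-transform pattern from the Sqoop bullet above: the Sqoop job itself is a CLI invocation, shown here as a comment, and the PySpark portion applies transformations and a terminal action over the landed files. The connection string, paths, and column names are hypothetical.

```python
# Sketch: Sqoop incremental import followed by Spark transformations.
#
# The Sqoop job (run outside Spark) might look like:
#   sqoop job --create orders_incr -- import \
#       --connect jdbc:mysql://db-host/sales --table orders \
#       --target-dir /data/landing/orders \
#       --incremental append --check-column order_id --last-value 0
#
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sqoop-landing-transform-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Sqoop lands comma-delimited text by default; the column names here
# are assumed for illustration.
orders = (spark.read
          .option("inferSchema", "true")
          .csv("/data/landing/orders")
          .toDF("order_id", "customer_id", "amount", "order_ts"))

# Transformations: derive a date column and aggregate revenue per day.
daily = (orders
         .withColumn("order_date", F.to_date("order_ts"))
         .groupBy("order_date")
         .agg(F.sum("amount").alias("daily_revenue")))

# Action: persist the result to a (hypothetical) Hive reporting table.
daily.write.mode("append").saveAsTable("reporting.daily_revenue")
```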

Environment: Apache Hadoop, Hive, HDFS, Scala, Spark, Linux, MySQL, Eclipse, Oozie, Sqoop, Kafka, Cloudera Distribution, Oracle.

Confidential, Providence RI

Software Engineer/Data Analyst

Responsibilities:

  • Practiced the Agile methodology/Scrum process by participating in daily scrum meetings with developers.
  • Involved in all phases of the SDLC; responsible for object-oriented analysis and object-oriented design.
  • Worked closely with the business analysts in gathering, understanding, and implementing the requirements.
  • Elaborated use cases, interface definitions, and service specifications in collaboration with business and system analysts.
  • Implemented REST APIs using Python and the Django framework (see the sketch after this list).
  • Developed a web-based application using Python, Django, XML, CSS, and HTML.
  • Worked on the server-side application with Django using Python.
  • Developed the application front end using Bootstrap and the AngularJS (MVC) framework.
  • Used Jira for bug tracking and issue tracking.
  • Developed complex SQL queries to generate daily enrollment reports.
  • Integrated Tableau with an Oracle database to visualize the daily generated reports.
  • As part of stabilization/support for major releases, led a team of three providing root cause analysis (RCA) for billing and enrollment issues reported by customers.
  • Provided code and data fixes based on the RCA.
  • Held daily calls with offshore resources to keep progress in sync with the on-site team.
  • Helped the conversion (ETL) team map data extracted from the legacy system before loading it into the new system.
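
A minimal sketch of a Django JSON report endpoint of the kind described above, using the Django 1.x/Python 2.7-era URL routing that matches the skills listed earlier. The Enrollment model, its fields, and the route are hypothetical.

```python
# Sketch: a read-only enrollment report endpoint in plain Django.
# The reports app, Enrollment model, and field names are hypothetical.
from django.conf.urls import url
from django.db.models import Count
from django.http import JsonResponse
from django.views.decorators.http import require_GET

from reports.models import Enrollment  # hypothetical app/model


@require_GET
def daily_enrollments(request):
    """Return enrollment counts grouped by plan for a given date."""
    day = request.GET.get("date")  # e.g. ?date=2015-06-01
    rows = (Enrollment.objects
            .filter(enrolled_on=day)
            .values("plan")
            .annotate(n=Count("id")))
    return JsonResponse({
        "date": day,
        "enrollments": {r["plan"]: r["n"] for r in rows},
    })


urlpatterns = [
    url(r"^api/enrollments/daily/$", daily_enrollments),
]
```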

Environment: Jira (test management), AMPM (Application Management and Process Manager)
