Spark & Hadoop Developer Resume
Cincinnati, OH
SUMMARY
- Around 6 years of professional IT experience, including 4+ years of Big Data ecosystem experience in the ingestion, querying, processing, and analysis of big data.
- Experience using Hadoop ecosystem components such as MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, and Spark, and in working with Hadoop clusters on the Cloudera and Hortonworks distributions.
- Experience spans requirements gathering, design, development, integration, documentation, testing, and build.
- Knowledge and experience in Spark using Python and Scala.
- Knowledge of the big data database HBase and the NoSQL databases MongoDB and Cassandra.
- Experience building Spark applications in Scala to ease the transition from Hadoop MapReduce.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Thorough knowledge of Hadoop architecture and core components: NameNode, DataNodes, JobTracker, TaskTrackers, Oozie, Scribe, Hue, Flume, HBase, etc.
- Extensively worked on the development and optimization of MapReduce programs, Pig scripts, and Hive queries to create structured data for data mining.
- Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
- Provisioned and managed multi-tenant Hadoop clusters on a public cloud environment, Amazon Web Services (AWS), and on private cloud infrastructure, the OpenStack platform.
- Worked with both Scala and Java, and created frameworks for processing data pipelines through Spark.
- Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Strong experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for data analysis.
- Experience in database design, data analysis, and SQL programming.
- Experience in extending Hive and Pig core functionality with custom user-defined functions (UDFs).
- Working knowledge of Oozie, a workflow scheduler system used to manage Pig, Hive, and Sqoop jobs.
- Excellent team player
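The MapReduce pattern referenced above can be sketched in a few lines of plain Python; this is an illustrative word count only, with invented sample records, not a Hadoop implementation:

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (word, 1) pairs, as a MapReduce mapper would."""
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle step: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical input split
lines = ["big data big ecosystem", "data pipeline"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

In a real cluster the shuffle is performed by the framework across nodes; the three functions above only make the data flow of the pattern explicit.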
TECHNICAL SKILLS
Languages: Python (2.7), Java
Databases: Oracle 11g
Operating Systems: Windows XP, Windows 7, Macintosh
Deployment Tools: Jenkins, QuickBuild
Web Browsers: IE, Google Chrome, Firefox, Safari
Web Technologies: HTML, XML, CSS, JavaScript, Angular.js, Django
Others: MS Office, WebLogic, Eclipse (PyDev), Tableau, SQL, Pentaho BI
PROFESSIONAL EXPERIENCE
Confidential - Charlotte, NC
Hadoop / Spark Developer
Responsibilities:
- Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Wrote Pig scripts for sorting, joining, filtering, and grouping the data.
- Developed programs in Spark based on the application for faster data processing than standard MapReduce programs.
- Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
- Used Hadoop FS actions to move the data from upstream location to local data locations.
- Wrote extensive Hive queries to transform data consumed by downstream models.
- Developed MapReduce programs as part of predictive analytical model development.
- Developed Hive queries to do analysis of the data and to generate the end reports to be used by business users.
- Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm.
- Ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Gained hands-on experience with NoSQL databases such as MongoDB.
- Extensively used SVN as a code repository and VersionOne to manage the day-to-day agile development process and to track issues and blockers.
- Wrote Spark Python code for the model integration layer.
- Implemented Spark using Scala and Java, utilizing DataFrames and the Spark SQL API for faster data processing.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Developed a data pipeline using Kafka, HBase, Mesos, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
Environment: Hadoop, Hive, Impala, Oracle, Spark, Scala, Python, Pig, Sqoop, Oozie, MongoDB, MapReduce, SVN.
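The filter/map/reduceByKey chain typical of a pair-RDD pipeline like the one described above can be sketched in plain Python; the event records, field layout, and function names here are hypothetical stand-ins, not Spark API calls:

```python
# Hypothetical customer behavioral events: (customer_id, action, amount)
events = [
    ("c1", "purchase", 40.0),
    ("c2", "view", 0.0),
    ("c1", "purchase", 10.0),
    ("c2", "purchase", 25.0),
]

# Mirrors rdd.filter(...) then rdd.map(...) in a Spark pipeline
purchases = [e for e in events if e[1] == "purchase"]
pairs = [(cid, amount) for cid, _, amount in purchases]

def reduce_by_key(pairs, fn):
    """Plain-Python stand-in for Spark's reduceByKey: combine all
    values sharing a key with the given associative function."""
    out = {}
    for key, value in pairs:
        out[key] = fn(out[key], value) if key in out else value
    return out

# Total purchase amount per customer
totals = reduce_by_key(pairs, lambda a, b: a + b)
```

In Spark the same shape distributes across partitions because the combining function is associative; the sketch only shows the logical data flow.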
Confidential, Cincinnati, OH
Spark & Hadoop Developer
Responsibilities:
- Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
- Wrote Spark jobs in Scala to analyze engineering data.
- Used Spark to perform necessary transformations and actions on the fly for building the common learner data model which gets the data from Kafka in near real time.
- Used Spark API over Cloudera to perform analytics on data in Hive.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Created Sqoop jobs to handle incremental loads from RDBMS sources into HDFS and applied Spark transformations and actions to the loaded data.
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
Environment: Apache Hadoop, Hive, HDFS, Scala, Spark, Linux, MySQL, Eclipse, Oozie, Sqoop, Kafka, Cloudera Distribution, Oracle.
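The watermark logic behind a Sqoop incremental load (`--incremental append` with `--check-column` and `--last-value`) can be sketched in plain Python against an in-memory SQLite table; the table name and schema are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

def incremental_load(conn, last_value):
    """Pull only rows past the stored watermark, the way a Sqoop job
    with --check-column id --last-value N skips already-imported rows."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id",
        (last_value,)).fetchall()
    # Advance the watermark so the next run starts where this one ended
    new_watermark = rows[-1][0] if rows else last_value
    return rows, new_watermark

rows, watermark = incremental_load(conn, last_value=1)
```

A saved Sqoop job persists the updated `--last-value` automatically; the sketch makes that bookkeeping explicit.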
Confidential, Providence RI
Software Engineer/Data Analyst
Responsibilities:
- Practiced the Agile/Scrum methodology by participating in daily scrum meetings with developers.
- Involved in all phases of the SDLC and responsible for object-oriented analysis and design.
- Worked closely with the business analysts in gathering, understanding and implementing the requirements
- Elaborated use cases, interface definitions and services specifications in collaboration with Business and System Analysts
- Implemented REST APIs using Python and the Django framework.
- Developed a web-based application using Python, Django, XML, CSS, and HTML.
- Worked on the server side of the application with Django using Python.
- Developed the application frontend using Bootstrap and the AngularJS (Model-View-Controller) framework.
- Used Jira for bug tracking and issue tracking.
- Developed complex SQL queries to generate daily enrollment reports.
- Integrated Tableau with ORACLE database to visualize the daily report generated.
- As part of the stabilization/support of major releases, led a team of 3 to provide root cause analysis (RCA) for billing and enrollment issues reported by customers.
- Based on the RCA, provided code and data fixes.
- Held daily calls with offshore resources to keep progress in sync with the onsite team.
- Helped the conversion (ETL) team map the extracted data from the legacy system before loading it into the new system.
Environment: Jira (test management tool), AMPM (Application Management and Process Manager)
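A daily enrollment report of the kind described above reduces to a GROUP BY over date and plan. A minimal sketch using SQLite, with an invented schema and sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (member_id TEXT, plan TEXT, enrolled_on TEXT)")
conn.executemany("INSERT INTO enrollments VALUES (?, ?, ?)", [
    ("m1", "gold",   "2015-03-01"),
    ("m2", "gold",   "2015-03-01"),
    ("m3", "silver", "2015-03-02"),
])

# Daily enrollment counts per plan, newest day first
report = conn.execute("""
    SELECT enrolled_on, plan, COUNT(*) AS enrollments
    FROM enrollments
    GROUP BY enrolled_on, plan
    ORDER BY enrolled_on DESC, plan
""").fetchall()
```

The same query shape carries over to Oracle; a reporting tool such as Tableau can then visualize the aggregated rows directly.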