- Around 6 years of professional IT experience, including 4+ years of Big Data ecosystem experience in ingestion, querying, processing and analysis of big data.
- Experience in using Hadoop ecosystem components like MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume and Spark.
- Experience working with Hadoop clusters using Cloudera and Hortonworks.
- Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
- Knowledge and experience in Spark using Python and Scala.
- Knowledge of the big data database HBase and the NoSQL databases MongoDB and Cassandra.
- Experience in developing Spark applications using Scala for easy Hadoop transitions.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Thorough knowledge of Hadoop architecture and core components such as NameNode, DataNode, JobTracker and TaskTracker, as well as Oozie, Scribe, Hue, Flume, HBase, etc.
- Extensively worked on development and optimization of MapReduce programs, Pig scripts and Hive queries to create structured data for data mining.
- Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirements.
- Worked on provisioning and managing multi-tenant Hadoop clusters on a public cloud environment (Amazon Web Services, AWS) and on private cloud infrastructure (OpenStack cloud platform).
- Worked with both Scala and Java; created frameworks for processing data pipelines through Spark.
- Implemented a batch-processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
- Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for data analysis.
- Experience in database design, data analysis and SQL programming.
- Experience in extending Hive and Pig core functionality by using custom user-defined functions (UDFs).
- Working knowledge of Oozie, a workflow scheduler system, to manage the jobs that run on Pig, Hive and Sqoop.
- Excellent team player.
Languages: Python, Java
Databases: Oracle 11g
Operating Systems: Windows XP, Windows 7, Macintosh
Deployment Tools: Jenkins, QuickBuild
Web Browsers: IE, Google Chrome, Firefox, Safari
Web Technology: HTML, XML, CSS, JavaScript, Angular.js, Django
Others: MS Office, WebLogic, Eclipse (PyDev), Tableau, SQL, Pentaho BI
Confidential - Charlotte, NC
Hadoop / Spark Developer
- Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Wrote Pig scripts for sorting, joining, filtering and grouping the data.
- Developed Spark programs for the application to achieve faster data processing than standard MapReduce programs.
- Developed Spark programs using Scala, created Spark SQL queries and developed Oozie workflows for Spark jobs.
- Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
- Used Hadoop FS actions to move data from upstream locations to local data locations.
- Wrote extensive Hive queries to transform the data for use by downstream models.
- Developed MapReduce programs as part of predictive analytical model development.
- Developed Hive queries to analyze the data and to generate the end reports used by business users.
- Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark and Apache Storm; ingested streaming data into Hadoop using Spark, the Storm framework and Scala.
- Gained experience with NoSQL databases like MongoDB.
- Extensively used SVN as a code repository and VersionOne to manage the day-to-day agile project development process and to keep track of issues and blockers.
- Wrote Spark code in Python for the model integration layer.
- Implemented Spark applications using Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Developed Spark code using Spark SQL and Spark Streaming for faster testing and processing of data.
- Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
- Developed a data pipeline using Kafka, HBase, Mesos, Spark and Hive to ingest, transform and analyze customer behavioral data.
Environment: Hadoop, Hive, Impala, Oracle, Spark, Scala, Python, Pig, Sqoop, Oozie, MongoDB, MapReduce, SVN.
Confidential, Cincinnati, OH
Spark & Hadoop Developer
- Involved in ingesting data received from various relational database providers into HDFS for analysis and other big data operations.
- Wrote Spark jobs in Scala to analyze the engineering data.
- Used Spark to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time.
- Used Spark API over Cloudera to perform analytics on data in Hive.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Created Hive external tables to perform ETL on data that is generated on a daily basis.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS so that Spark transformations and actions could be applied.
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
Environment: Apache Hadoop, Hive, HDFS, Scala, Spark, Linux, MySQL, Eclipse, Oozie, Sqoop, Kafka, Cloudera Distribution, Oracle.
Confidential, Providence, RI
Software Engineer/Data Analyst
- Practiced the Agile/Scrum methodology by participating in daily Scrum meetings with developers.
- Involved in all phases of the SDLC and responsible for object-oriented analysis and design; also worked closely with the business analysts in gathering, understanding and implementing the requirements.
- Elaborated use cases, interface definitions and service specifications in collaboration with Business and System Analysts.
- Implemented REST APIs using Python and the Django framework.
- Developed a web-based application using Python, Django, XML, CSS and HTML.
- Worked on the server-side application with Django using Python.
- Developed the frontend of the application using Bootstrap and the Angular.js (Model, View, Controller) framework.
- Used Jira for bug tracking and issue tracking.
- Developed complex SQL queries to generate daily enrollment reports.
- Integrated Tableau with the Oracle database to visualize the daily reports generated.
- As part of stabilization and support of major releases, led a team of 3 people to provide Root Cause Analysis (RCA) for billing and enrollment issues reported by customers.
- Provided code and data fixes based on the RCA.
- Held daily on-call meetings with offshore resources to make sure progress was in sync with the on-site team.
- Helped the conversion (ETL) team map the data extracted from the legacy system before loading it into the new system.
Environment: Test Management Tool: Jira, AMPM (Application management and Process Manager)