Sr. Hadoop/Spark Developer Resume
Kaiser Permanente, CA
SUMMARY:
- 10.5 years of experience in the IT industry, including 4+ years in Big Data analytics and development.
- Good experience with Big Data technologies such as the Hadoop framework, Spark Core, Spark Streaming, Hive, Sqoop, Kafka, Flume, and Oozie
- Excellent knowledge of the Hadoop ecosystem, including HDFS, YARN, NameNode, DataNode, utility and edge/gateway nodes, and the MapReduce programming paradigm
- Experience writing queries to move data from HDFS to Hive and analyzing data using HiveQL
- Excellent knowledge of partitioning, bucketing, join optimization, and query optimization concepts in Hive
- Experience importing and exporting data with Sqoop between relational databases (RDBMS), Hive, and HDFS
- Experience optimizing and tuning Hive, Spark, and MapReduce jobs to meet performance requirements
- Experience with Spark Streaming on live data streams, using Flume and Kafka to ingest data into Spark Streaming
- Experience with Spark Core, Spark SQL, Spark Streaming, DataFrames, RDDs, and Scala for Spark
- Experience using streams, accumulator variables, broadcast variables, and RDD caching in Spark Streaming (see the sketch after this summary)
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase
- Experience with the Oozie workflow scheduler to manage Hadoop jobs
- Familiarity with Hadoop architecture, data ingestion pipeline design, data mining and modeling, and machine learning
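A minimal Scala sketch, with hypothetical data and names, of the broadcast-variable, accumulator, and RDD-caching pattern noted above:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastCacheSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-cache-sketch").setMaster("local[*]"))

    // Hypothetical lookup table, broadcast once so every executor reuses a read-only copy
    val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

    // Accumulator counting records with an unknown country code
    // (updates inside transformations are best-effort, which is fine for monitoring)
    val unknownCodes = sc.longAccumulator("unknownCountryCodes")

    // Hypothetical event data: (countryCode, amount); cached because it is reused twice below
    val events = sc.parallelize(Seq(("US", 10.0), ("IN", 4.5), ("BR", 7.2))).cache()

    val revenueByCountry = events.map { case (code, amount) =>
      val name = countryNames.value.getOrElse(code, { unknownCodes.add(1); "Unknown" })
      (name, amount)
    }.reduceByKey(_ + _)

    revenueByCountry.collect().foreach(println)
    println(s"records with unknown country codes: ${unknownCodes.value}")
    println(s"total events (second pass over the cached RDD): ${events.count()}")

    sc.stop()
  }
}
```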
TECHNICAL SKILLS:
Big Data/Hadoop: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Kafka, Spark, Spark Streaming, Oozie, HBase
Languages: Scala, Python, Java, SQL, PL/SQL, HiveQL, Pig Latin, Shell Scripting
Databases: MySQL, Oracle
BI Tools: Tableau
Development Tools: Eclipse, PyCharm, Toad, SQL Developer
PROFESSIONAL EXPERIENCE:
Confidential, CA
Sr. Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build a common learner data model that consumes data from Kafka in near real time and persists it into HBase (a sketch of this flow follows this list).
- Involved in performance tuning and porting the system from Hive to Spark
- Developed Scala scripts using both DataFrames/Spark SQL and RDDs for data aggregation and queries, writing data back to the OLTP system through Sqoop.
- Tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Handled large datasets using partitioning, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
- Scheduled jobs using Oozie workflows.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems and with RDBMS and NoSQL data stores for data access and analysis
- Worked extensively with Sqoop for importing/exporting data from/to Oracle
- Involved in creating Hive tables and loading and analyzing data using Hive queries
- Performance tuning of Hive queries and jobs.
- Implemented partitioning, dynamic partitions, and bucketing in Hive
- Developed Hive queries to process the data and generate data cubes for visualization
- Implemented schema extraction for Avro file formats in Hive.
- Used Talend Open Studio to design ETL jobs for data processing
- Used reporting tools such as Tableau, connected to Hive, to generate daily reports of the data
- Collaborated with the infrastructure, network, database, application and BI teams to ensure the data quality and availability
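A condensed Scala sketch of the Kafka-to-Spark-Streaming-to-HBase flow described earlier in this list; it is illustrative only, and the broker address, topic, table, and column family names are hypothetical:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHBaseSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("learner-stream"), Seconds(10)) // 10s batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-model",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a hypothetical "learner-events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // On-the-fly transformation: keep non-empty (key, value) pairs to persist
    val records = stream.map(r => (r.key, r.value))
      .filter { case (k, v) => k != null && v != null }

    // Persist each micro-batch into HBase, one connection per partition
    records.foreachRDD { rdd =>
      rdd.foreachPartition { part =>
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("learner_data_model"))
        part.foreach { case (key, value) =>
          val put = new Put(Bytes.toBytes(key))
          put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("payload"), Bytes.toBytes(value))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```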
Hadoop Developer
Responsibilities:
- Prepared technical design documents based on business requirements and prepared data flow diagrams.
- Integrated Hadoop with Oracle to load and then cleanse raw structured data in the Hadoop ecosystem, making it suitable for processing in Oracle using stored procedures and functions.
- Used Sqoop to import data into HDFS and to export data from HDFS to the Oracle database
- Developed Oozie workflows for daily incremental loads that pull data from Oracle and import it into Hive tables.
- Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive UDFs for functionality not available out of the box in Apache Hive.
- Optimized Hive tables using techniques such as partitioning, dynamic partitions, and bucketing to improve HiveQL query performance (see the sketch after this list).
- Developed Pig scripts for ETL-style operations on captured data and for delta processing between newly arrived data and data already in HDFS.
- Used Hive to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Executed Hive queries on Parquet tables to perform data analysis that met the business requirements.
- Supported and troubleshot Hive programs running on the cluster and fixed issues arising from duration testing
- Handled data manipulation using Python scripts.
- Worked extensively on performance optimization, arriving at appropriate design patterns for MapReduce and Hive jobs by analyzing I/O latency, map time, combiner time, reduce time, etc.
- Actively involved in code review and bug fixing for improving the performance.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs
- Created HBase tables to store various data formats of incoming data from different portfolios.
- Troubleshooting: used Hadoop logs to debug job execution
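A minimal Scala/Spark SQL sketch (assuming Hive metastore access; table and column names are hypothetical) of the partitioned-table and dynamic-partition load pattern referenced in this list; a bucketed table would additionally declare a CLUSTERED BY ... INTO n BUCKETS clause in the DDL:

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport() // requires a Hive metastore on the cluster
      .getOrCreate()

    // Allow non-strict dynamic partitioning for the insert below
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical daily-partitioned ORC table
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales_part (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
      )
      PARTITIONED BY (load_date STRING)
      STORED AS ORC
    """)

    // Dynamic-partition insert from a hypothetical staging table
    spark.sql("""
      INSERT OVERWRITE TABLE sales_part PARTITION (load_date)
      SELECT order_id, customer_id, amount, load_date FROM sales_staging
    """)

    // Reporting metric computed per partition
    spark.sql("""
      SELECT load_date, COUNT(*) AS orders, SUM(amount) AS revenue
      FROM sales_part
      GROUP BY load_date
    """).show()

    spark.stop()
  }
}
```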
Sr. Database Engineer
Responsibilities:
- Coordinated the offshore team
- Analyzed business requirements
- Prepared technical design documents based on business requirements.
- Implemented new designs per technical specifications.
- Wrote advanced PL/SQL ETL scripts for a one-time migration of multiple billions of records, i.e., wrote complex SQL queries and PL/SQL procedures to extract data from various source tables and created database objects such as tables, indexes, views, sequences, synonyms, partitions, global temporary tables, and external tables
- Played a key role in optimizing the migration scripts, which included creating indexes and providing hints using DBMS_STATS, EXPLAIN PLAN, trace, and the TKPROF utility so that the data migration tasks could complete within the deployment window
- Wrote audit scripts for auditing the migrated data.
- Development: created stored procedures, functions, packages, database triggers, constraints, indexes, grants, and sequences based on business requirements
- Developed back-end interfaces using PL/SQL packages, stored procedures, functions, collections, object types, and Oracle queues.
- Created PL/SQL scripts to extract data from the operational database into simple flat text files using the UTL_FILE package
Sr. Database Engineer
Responsibilities:
- Analyzed requirements
- Prepared technical design documents
- Development: wrote procedures, functions, packages, and triggers using SQL and PL/SQL
- Performance tuning, including creating indexes and providing hints using EXPLAIN PLAN, trace, and the TKPROF utility
- Front-end development using Oracle Forms 10g and Reports 10g
- Interacted with the client to gather change requests and analyzed the effect of new changes.
- Performed unit testing (UT) and system integration testing (SIT)
- Analyzed and fixed defects at various stages of the testing cycle.
- Provided production support for the deployed project until it stabilized
- Received client appreciation for creating a multilevel approval workflow applied across the overall application.
Oracle Developer
Responsibilities:
- Analyzed requirements
- Prepared technical design documents
- Development: wrote procedures, functions, packages, and triggers using SQL and PL/SQL
- Performance tuning, including creating indexes and providing hints using EXPLAIN PLAN, trace, and the TKPROF utility
- Developed many new Oracle Forms and Reports and enhanced and customized existing ones