Senior Hadoop Developer Resume
OBJECTIVE:
To be instrumental in achieving challenging goals for the organization by putting my technical knowledge in Big Data Analytics and Data Warehousing, and my functional know-how in the Retail and Manufacturing domains, to relevant use.
SUMMARY:
- 7 years of progressive IT experience, including 4 years as a Senior Big Data Developer on a consulting assignment with Wal-Mart; expertise in Hadoop ecosystem technologies (Apache Spark, Hive, Oozie, Sqoop, Java MapReduce) and supporting technologies (Oracle PL/SQL, Greenplum DB, Pig, Unix shell scripting), with exposure to Impala, ActiveMQ, Cassandra, etc.
- Experience working on Hadoop distributions from Cloudera (5.4, 5.5, 5.7), Pivotal HD (2.0 and 3.0) and Hortonworks (2.0).
- Hadoop Developer and Big Data Analyst with experience in design, development, deployment and supporting large scale distributed systems.
- Worked primarily in the Retail and Manufacturing domains.
- Participated in hackathon events (Hackathon Charlotte MMXVI, Walmart Datathon 2015); imported data with Pentaho and presented analytics with Tableau.
- Proficient in working with Hive, Oozie, Sqoop and many other Hadoop Eco System components for data storage and analysis.
- Hands-on benchmarking and performance tuning of Hive queries using partitions, bucketing and map-side joins; enhanced performance with Tez (see the sketch after this summary).
- Expertise in handling file formats (SequenceFile, RC, ORC, Text/CSV, Avro, Parquet) and analyzing them with HiveQL.
- Optimized Hive queries by rewriting them as Spark-Scala programs, thereby reducing the run time of capabilities.
- Experience in troubleshooting errors in Shell, Hive and MapReduce.
- Imported and exported data between HDFS and relational systems such as MySQL, Oracle, DB2 and Teradata using Sqoop; handled ad-hoc data loads with SSIS and Pentaho.
- Designed and implemented a generic data export to Greenplum using the GPLOAD utility, via both local-file and named-pipe transfer.
- Experienced in building Oozie workflows that automate parallel job execution.
- Experience in writing Pig Latin scripts to group, join and filter the data.
- Led the team's agile transformation using Kanban and redefined the support model for IT operations, resulting in more effective data delivery to the customer. Strong skills in agile environments using Kanban and Scrum.
- Maintained code using GitHub, Tortoise SVN, MS VSS and Team Forge.
- Monitored and Followed-Up tasks using JIRA, Confluence and SharePoint.
- Good experience in generating statistics, extracts and reports from Hadoop.
- Used Kanban, Waterfall and Scaled Agile (Scrum) software development methodologies in several projects.
- Experienced in developing custom UDFs for Hive to incorporate Java methods and functionality into HiveQL.
- Also worked as a Release Engineer supporting code releases to Production.
- For the initial 2 years, worked on a Confidential internal project, Confidential, gaining in-depth hands-on Oracle PL/SQL experience along with software design, development, testing, deployment and maintenance of a web application framework on SABA using Core Java, WDK, SQL Server and Oracle.
- Primary technologies used here were Oracle PL/SQL and Java.
- Proficient in working with UNIX servers, WebLogic Server system administration and WDK page development (an XML-based framework).
- Extensive experience in developing applications using Core Java.
- Experience working with RDBMS ORACLE Database.
- Analyzed performance of database objects and recommended indexes, schema statistics gathering, partitioning, explain plans and TKPROF analysis to the DBA.
- Implemented PL/SQL for application security and batch job scheduling; wrote UNIX shell scripts for data file handling, FTP and running SQL*Loader.
- Created email and file I/O utilities using stored procedures. Performed thorough unit, system and user acceptance testing in the given environments, delivering quality work to functional users.
- Experienced in identifying improvement areas for system stability and providing end-to-end high-availability architectural solutions.
- Good in negotiation, bug fixing and developing complex algorithms.
- Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
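The Hive tuning summarized above can be illustrated with a minimal sketch. Table and column names (sales_part, store_dim, sale_dt, store_nbr, region) are hypothetical, and the queries are shown through a Hive-enabled Spark session rather than the original Hive scripts; bucketing for sort-merge bucket joins is noted only in a comment, since that DDL would be run on the Hive side.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object HiveTuningSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled Spark session; assumes a configured Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-tuning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned ORC fact table: partition pruning on sale_dt.
    // On the Hive side the table can additionally be bucketed, e.g.
    // CLUSTERED BY (store_nbr) INTO 32 BUCKETS, to enable sort-merge bucket joins.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_part (
        |  store_nbr INT,
        |  item_nbr  INT,
        |  sales_amt DOUBLE)
        |PARTITIONED BY (sale_dt STRING)
        |STORED AS ORC""".stripMargin)

    // Map-side join equivalent: broadcast the small dimension table so the
    // join avoids shuffling the large fact table.
    val sales  = spark.table("sales_part").where("sale_dt = '2015-11-01'")
    val stores = spark.table("store_dim")
    sales.join(broadcast(stores), Seq("store_nbr"))
      .groupBy("region")
      .sum("sales_amt")
      .show()

    spark.stop()
  }
}
```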
TECHNICAL SKILLS:
Hadoop Platforms: Cloudera, Pivotal HD, Hortonworks
Big Data Ecosystem: Apache SPARK: Scala, Hadoop, MapReduce, YARN, HDFS, Hive, Pig, Sqoop, Oozie.
File formats: Sequence Files, RC Files, ORC Files, Text/CSV, Avro, Parquet
Databases: Hive, Impala, Greenplum, Oracle9i, Oracle10g, SQL Server 2005, MySQL, NoSQL (Cassandra)
Languages: Hadoop Technologies, Scala, Python, Java, J2EE, PL/SQL, Unix Shell Scripting.
Open Frameworks: Hadoop, WDK
IDE: Eclipse Kepler, PL/SQL Developer, pgAdmin III (for Greenplum), TOAD
Version control: GIT, MS VSS, Tortoise SVN, Team Forge
Project Tracking: JIRA, Confluence, SharePoint
Server Access: WinSCP, PuTTY, Reflection FTP
Build Tools: Maven, Jenkins
Web Technologies: XML, XSLT, Java Script, HTML
Operating Systems: Unix, Linux, Windows, Mac OS X
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Platform: Pivotal HD, Hortonworks
Senior Hadoop Developer
Responsibilities:
- Provide business continuity if one of the Hadoop clusters goes down.
- Seamlessly handle migrations and other cluster downtimes.
- Load balance based on resource availability (memory, CPU) in the future.
- Load balance based on data availability in different clusters as an enhancement to the policy.
- First team to be on the Hortonworks distribution, working out unexplored issues in 2 months.
- Participated in Hackathon Charlotte MMXVI and suggested a new analytical model to raise funds for an NGO through donations of money and items.
Technologies: Spark Scala, Hive QL, Oozie, Sqoop, Pentaho, Oracle SQL, Unix Shell Scripting, Tableau
Confidential, Bentonville
Hadoop Platform: Pivotal HD, Hortonworks (supported from Pivotal HD 3.0)
Senior Hadoop Developer
Responsibilities:
- Interacted with business analysts and prospective application managers to gather requirements and guide implementations and production rollouts for ETL batch and real-time applications.
- Created the Base Data Layer module, a set of derived common tables that can be reused across capabilities in the Assortment Discipline tool.
- Developed the Store Clustering module across 1,000 demographic variables, later fed into an R program to produce reclassified store clusters.
- Developed a substitutability model for determining the best substitute item from the distance calculated using the two-point distance formula.
- Developed and implemented a Yule's Q model by deriving household counts on visits and other aggregated metric values (see the sketch after this list).
- Analyzed and developed item loyalty with household based on the visits and items purchased.
- Performed data processing using HIVE.
- Built customer analytical attributes using HIVE.
- Enhanced Hive query performance using Tez.
- Involved in loading data from UNIX file system to HDFS.
- Performance management and monitoring of ETL applications to track environment health and proactively address potential issues.
- Analyzed customer patterns based on the attributes.
- Understood the business needs and led the team accordingly toward deliverables.
- Appreciated for delivering the Assortment Discipline tool in a short time without compromising on quality.
- Participated in the Datathon and suggested a new analytical model with the available data, and was recognized for it.
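A minimal sketch of the Yule's Q computation referenced above, assuming a pre-aggregated Hive table item_pair_hh_counts with hypothetical columns item_x, item_y, both_hh (households buying both items), x_hh and y_hh (households buying each item individually); the actual model and table layout may differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object YulesQSketch {
  // Yule's Q from a 2x2 household contingency table per item pair:
  //   a = both items, b = item X only, c = item Y only, d = neither.
  //   Q = (a*d - b*c) / (a*d + b*c), ranging from -1 to 1.
  def withYulesQ(pairs: DataFrame, totalHouseholds: Long): DataFrame = {
    val a = col("both_hh")
    val b = col("x_hh") - col("both_hh")
    val c = col("y_hh") - col("both_hh")
    val d = lit(totalHouseholds) - col("x_hh") - col("y_hh") + col("both_hh")
    pairs.withColumn("yules_q", (a * d - b * c) / (a * d + b * c))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("yules-q-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Pre-aggregated household counts per item pair (hypothetical Hive tables).
    val pairs = spark.table("item_pair_hh_counts")
    val totalHouseholds = spark.table("household_dim").count()

    withYulesQ(pairs, totalHouseholds)
      .orderBy(desc("yules_q"))
      .show(20)

    spark.stop()
  }
}
```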
Technologies: Spark-Scala, Java-ActiveMQ, Hive QL, Oracle SQL, Oozie, Sqoop, Pentaho, Unix Shell Scripting, Tableau
Confidential, Bentonville
Delivery Model: Scaled Agile
Hadoop Platform: Pivotal HD
Senior Hadoop Developer
Responsibilities:
- Transferred and loaded datasets from Hadoop tables to Greenplum.
- Developed and delivered the Demand Transference module to identify items performing poorly in stores and analyze the most suitable substitutes along with the prospective dollars at risk.
- Built and Optimized HIVE queries for Customer Attribution datasets.
- Orchestrated Hive queries and shell scripts using Oozie workflows.
- Developed Hive queries to process the data for visualizing and reporting.
- Managed Hadoop cluster using Pivotal HD
- Appreciated for developing a generic GPLOAD module from Hadoop to Greenplum, which was adopted by all teams as a horizontal tool and commended for the versatility and flexibility of the code (see the sketch below).
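The Hadoop-to-Greenplum export can be sketched as follows; only the extract side is shown, and the table name, delimiter and landing directory are assumptions. In the actual module the resulting files (or a named pipe fed from them) are consumed by a separately configured GPLOAD job.

```scala
import org.apache.spark.sql.SparkSession

object GreenplumExportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gp-export-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical arguments: source Hive table and the landing directory
    // that the gpload YAML job (configured separately) points its INPUT SOURCE at.
    val Array(sourceTable, landingDir) = args

    // Extract the Hive table and write pipe-delimited text files.
    // gpload then streams these files (or a named pipe fed by them) into Greenplum.
    spark.table(sourceTable)
      .coalesce(8)                       // keep the number of output files manageable
      .write
      .option("sep", "|")
      .option("nullValue", "")
      .mode("overwrite")
      .csv(landingDir)

    spark.stop()
  }
}
```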
Technologies: HiveQL, Greenplum, Oozie, Sqoop, Shell scripting
Confidential
Delivery Model: Scaled Agile
Hadoop Platform: Cloudera, Pivotal HD
Developer
Responsibilities:
- Implemented partitioning, dynamic partitions and sort-merge bucket joins (SMBJ) in Hive for efficient data access.
- Optimized Hive queries and modified the Oozie workflow design, reducing overall run time from several days to hours.
- Automated the Flat CTM module script to refresh data on the required time frame (full, delta, partial).
- Developed a generic Drop Partitions module that prevents duplication of data in any run (see the sketch after this list).
- Processed Market Basket transaction data for Walmart customers
- Orchestrated Hive queries and shell scripts using Oozie workflows.
- Managed Hadoop cluster using Cloudera
- Appreciated by all levels of management from Confidential and Walmart for handling a huge volume of data without compromising on data quality.
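A minimal sketch of the drop-partition-then-reload pattern behind the generic Drop Partitions module; the target table, staging table and partition column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DropPartitionSketch {
  /** Drop a date partition if it exists, then reload it, so a re-run of the
    * same time frame never duplicates data. Table and column names here
    * (sales_part, stg_daily_sales, sale_dt) are hypothetical.
    */
  def reloadPartition(spark: SparkSession, table: String, saleDt: String): Unit = {
    spark.sql(s"ALTER TABLE $table DROP IF EXISTS PARTITION (sale_dt = '$saleDt')")
    spark.sql(
      s"""INSERT INTO TABLE $table PARTITION (sale_dt = '$saleDt')
         |SELECT store_nbr, item_nbr, sales_amt
         |FROM   stg_daily_sales
         |WHERE  sale_dt = '$saleDt'""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("drop-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()
    reloadPartition(spark, "sales_part", args(0))
    spark.stop()
  }
}
```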
Technologies: Java MR, HiveQL, Oozie, Sqoop, Unix Shell Scripting
Confidential
Delivery Model:Scaled Agile
Hadoop Platform: Cloudera
Release Engineer
Responsibilities:
- Delivered CKP Version 1.0, the first of its kind in Hadoop Big Data from both the Confidential and Walmart accounts.
- Worked on automating import and export jobs into HDFS and Hive using Sqoop from relational databases such as Oracle and Teradata, across all channels (Walmart BM, Sams BM, Walmart.com, Sams.com, Layaway, TLE, etc.).
- Designed and developed automated scripts for the SVN project structure, build and distribution as part of the release management process.
- Analyzed and developed automated, reusable scripts to resolve the critical issue of validating results.
- Created UDFs and UDAFs (see the sketch after this list), and worked on automating import and export jobs into HDFS and Hive using Sqoop from relational databases such as Oracle and Teradata.
- Created MapReduce programs for handling large datasets.
- Knowledge in performance troubleshooting and tuning Hadoop clusters in Cloudera.
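A minimal sketch of a custom Hive UDF as referenced above; the original UDFs would typically be written in Java, but the same pattern is shown in Scala here to keep one language across these sketches, and the function, class and table names are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

/** Hypothetical Hive UDF: normalizes a free-text channel name
  * (e.g. " walmart.COM " -> "WALMART.COM"). Hive discovers the
  * evaluate() method by reflection, exactly as with a Java UDF.
  *
  * Usage from HiveQL (assumed jar and table names):
  *   ADD JAR hive-udfs.jar;
  *   CREATE TEMPORARY FUNCTION normalize_channel AS 'NormalizeChannel';
  *   SELECT normalize_channel(channel_nm) FROM sales_by_channel;
  */
class NormalizeChannel extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```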
Technologies: Impala, Java, HiveQL, Oozie, Sqoop, Shell scripting
Confidential
Delivery Model: Waterfall
System Administrator
Responsibilities:
- Designed and developed an interface with the Career Management System, guiding unallocated associates to maintain their learning curve and stay competent for a new project.
- Suggested, developed and implemented a process improvement on batch logs for better management of associates' issues, contributing to reduced tickets against the application.
- Designed and developed automated solution of batch monitoring with front end process killing.
- Generated various reports in PL/SQL with output in Excel/PDF/HTML formats.
- Created procedures to support complex data transformations for the data warehouse.
Technologies: PL/SQL, Java, Shell scripting, SABA, WDK, XML
Special Software:
- Oracle 11g database
- Weblogic 8.1.4 & 8.1.6
- Java, J2EE
