Sr. Hadoop Developer Resume
Malvern, PA
SUMMARY
- IT professional with more than thirteen (13) years of experience in Hadoop/Big Data ecosystems and related technologies
- Excellent experience in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm
- Expertise in Spark using the Scala and Python (PySpark) programming interfaces
- Extensive experience in ETL processes and data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining and advanced data processing
- Experienced in optimizing ETL workflows
- Proficient in designing and coding Oozie workflows in the ETL and conformance layers in Hadoop ecosystems
- Expertise in Hive, Impala and Hive UDFs
- Exported and imported large volumes of data from the Hadoop ecosystem to various relational databases using Sqoop
- Proficient in AWS EMR/EC2, AWS S3, CloudFormation, CloudWatch, AWS Lambda functions, etc.
- Well versed in Atlassian tools like Bamboo, Bitbucket and JIRA, as well as GitHub
- Expertise in IBM mainframes, with deep knowledge of mainframe-based applications and mainframe tools
- Expertise in troubleshooting and in leading teams to fix production issues
- Proficient in project management, production support, application development, programming, system analysis, software quality assurance and change management processes with various clients
- Conversant with all phases of the project life cycle, including requirement gathering, analysis, design, development, testing, implementation, software quality standards, configuration management, change management and quality procedures
- Expertise in handling support and maintenance projects, with hands-on experience in ticket tracking tools like HP SMPO, ITSM, Remedy and JIRA
- Hands-on experience in migrating mainframe applications to other technologies like SAP and UNIX, and in re-hosting and decommissioning mainframes to Micro Focus Enterprise Server
TECHNICAL SKILLS
Platforms/frameworks: Hadoop 2.7, IBM S/390, IBM PC Compatibles
Operating Systems: Linux, OS/390, Windows 10/7/XP/2000/Server, MS-DOS
API: Spark 1.6/2.x, MapReduce
Programming Languages: Python, Scala, Java, VS COBOL, JCL, Easytrieve, SAS
Scripting Languages: Korn shell/UNIX shell scripting, XML, SQL
Workflow: Oozie
Databases: Hive, Impala, DB2, Oracle, IMS DB
ETL Tools: Sqoop, Flume, Kafka
Web Interface: Hue
File Systems/Formats: HDFS, VSAM, Avro files, Parquet files
OLTP: CICS, IMS DC/TM
Middleware: MQ Series
Tools/Technologies: Spring Tool Suite, Eclipse, Crucible, ChangeMan, Endevor, Panvalet, Xpediter, DB2/VSAM, File-AID, Platinum StarTool, SAR, Jobtrac, SPUFI, QMF, Tape Management System (TMS), OPC scheduler, Abend-Aid, DADS, IBM Debugger, Mainframe Express
Tracking Tools: Atlassian tools, Bitbucket, Bamboo, JIRA, Remedy, ITSM, HP SMPO, VersionOne
PROFESSIONAL EXPERIENCE
Sr. Hadoop Developer
Confidential, Malvern, PA
Responsibilities:
- Migrated various client score models developed on on-premises Hadoop to AWS EMR
- Refactored the existing score model logic from warehouse tables to enterprise tables and mapped the warehouse logic to enterprise logic
- Coded the new score model programs in PySpark and Spark-Scala
- Converted the existing PySpark and Spark-Scala programs from Spark 1.6 to Spark 2.2 (see the sketch after this list)
- Migrated the existing Sqoop tables from on-premises storage to S3 buckets and built a new Sqoop pipeline to land the newly added enterprise tables in S3
- Converted the existing integration suites running on Impala to Hive/S3
- Created and customized CloudFormation templates using troposphere to spin up AWS EMR clusters (see the template sketch after the Environment line below)
- Integrated, built and deployed the CloudFormation create/delete templates (e.g., S3 copy, create stack and delete stack) using Bamboo
- Created CloudWatch events for the AWS EMR logs and integrated the CloudWatch logs with Splunk
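A minimal sketch of the Spark 1.6 to 2.2 conversion described in this list, assuming a hypothetical score-model table; the main change is replacing the old SQLContext/HiveContext entry points with the unified SparkSession and pointing output at S3 instead of on-premises HDFS.

```python
# Sketch of a Spark 1.6 -> 2.2 migration; table, column and bucket names are
# hypothetical stand-ins for the actual score-model objects.

# Spark 1.6 style (before):
#   from pyspark import SparkContext
#   from pyspark.sql import HiveContext
#   sc = SparkContext(appName="client_score_model")
#   sqlContext = HiveContext(sc)
#   scores = sqlContext.sql("SELECT client_id, score FROM warehouse.client_scores")

# Spark 2.2 style (after): SparkSession replaces SQLContext/HiveContext
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("client_score_model")
         .enableHiveSupport()          # keeps access to the existing Hive metastore
         .getOrCreate())

scores = spark.sql("SELECT client_id, score FROM enterprise.client_scores")
scores.write.mode("overwrite").parquet("s3://example-bucket/scores/")  # illustrative bucket
```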
Environment: AWS EMR/EC2, AWS S3, Splunk, CloudWatch, AWS Lambda, Hadoop 2.7, Spark 2.2, Scala, Python 3.7, Oozie, Sqoop, Hive, Presto, UNIX shell scripting, Bamboo, Bitbucket, JIRA, Control-M
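A minimal troposphere sketch of the kind of CloudFormation template used to spin up an EMR cluster, as described above; the cluster name, EMR release, instance types, counts, roles and log bucket are illustrative assumptions rather than the actual production configuration.

```python
# Sketch of generating an EMR cluster CloudFormation template with troposphere.
# All names, versions, counts and roles below are illustrative assumptions.
from troposphere import Template
from troposphere.emr import (Application, Cluster, InstanceGroupConfigProperty,
                             JobFlowInstancesConfig)

template = Template()

cluster = Cluster(
    "ScoreModelEmrCluster",
    Name="score-model-emr",
    ReleaseLabel="emr-5.11.0",                 # assumed release shipping Spark 2.2
    Applications=[Application(Name="Spark"), Application(Name="Hive")],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://example-bucket/emr-logs/",    # illustrative log bucket
    Instances=JobFlowInstancesConfig(
        MasterInstanceGroup=InstanceGroupConfigProperty(
            InstanceCount=1, InstanceType="m4.xlarge"),
        CoreInstanceGroup=InstanceGroupConfigProperty(
            InstanceCount=4, InstanceType="m4.xlarge"),
    ),
)
template.add_resource(cluster)

print(template.to_json())   # rendered template is handed to the Bamboo deploy plan
```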
Sr. Hadoop Developer
Confidential, Pennington, NJ
Responsibilities:
- Designed the data lake to pull HMDA loan details of various clients from upstream systems like Peaks and nCino
- Designed and implemented the Sqoop process to pull client data from various Oracle databases into Confidential's Hadoop environment
- Implemented the ETL process and conformance codes for the HMDA data lake
- Designed and implemented the Oozie workflow to import and export the client’s loan information to various loan processing and data analytical systems in Confidential
- Created and worked with Hive tables in the Hadoop data hub region and stored the Sqoop data in Parquet format
- Designed and coded the conformance logic in Spark-Scala for use by target/consuming systems
- Optimized the Spark-Scala and Spark-SQL codes in the conformance layers for process improvement
- Implemented the Oozie coordinator and scheduled the daily/weekly/monthly jobs
- Created the test suites using JUnit and performed unit, integration and end-to-end testing with JUnit in the QA and SIT regions
- Optimized the Hive queries using partitioning and bucketing techniques to control data distribution (see the sketch after this list)
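A minimal PySpark sketch of the partitioning and bucketing approach from the last bullet; the table, column names and bucket count are hypothetical placeholders rather than the actual HMDA schema.

```python
# Sketch of writing a conformed dataset as a partitioned, bucketed Hive table.
# Table names, columns and the bucket count are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hmda_conformance")
         .enableHiveSupport()
         .getOrCreate())

loans = spark.table("datahub.hmda_loans_raw")        # raw Sqoop-landed data

(loans.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("load_year", "load_month")        # enables partition pruning on date filters
      .bucketBy(32, "client_id")                     # co-locates each client's rows
      .sortBy("client_id")
      .saveAsTable("conformance.hmda_loans"))
```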
Environment: Hadoop 2.7, Spark, Scala, Oozie, Sqoop, Hive, Impala, Oracle, Hue, UNIX shell scripting
Sr. Hadoop Developer
Confidential, Malvern, PA
Responsibilities:
- Designed the ETL process to bring client score details from Tealeaf, the data warehouse and the enterprise system into the Confidential Hadoop ecosystem
- Worked with the business users to understand and clarify the business requirements and prepared the design documents
- Designed, coded and implemented the Sqoop process and imported the score details into the Hadoop data hub
- Cleansed and validated the imported data and converted it to the Avro file format, making it accessible to the Hadoop data mart environments
- Made the necessary changes to the cleanse and validate programs using Spark-Scala
- Designed and coded the score calculation logic for the Confidential clients using PySpark and executed the PySpark programs in the Hadoop data mart environment
- Designed and implemented the Oozie workflow for the daily/weekly/monthly client score calculation and Web interaction reports
- Implemented the Oozie coordinator and scheduled the daily/weekly/monthly jobs
- Created the test suites for the PySpark code and performed unit, integration and end-to-end testing using PyUnit (see the test sketch after this list)
- Converted the Avro files in the Hadoop data hub to Parquet format using Hive scripts
- Imported the data from Oracle and DB2 database to Hadoop ecosystem using Sqoop
- Created the Hive tables in Hadoop data mart environment and validated the performance of Hive and Impala queries against the master tables
- Optimized the Hive queries using partitioning and bucketing techniques to control data distribution
- Fine-tuned the PySpark code to optimize Hadoop resource utilization for production runs
- Executed comparison tests in the production region and fine-tuned the end results to ensure accuracy
- Troubleshot daily Oozie workflow failures and implemented permanent fixes
- Analyzed the Java MapReduce programs, prepared the analysis documents and performed a feasibility study for converting the Java MapReduce programs to Spark-Python (PySpark)
- Prepared the high-level/low-level design documents for converting the Java MapReduce code to PySpark
- Re-coded the Java programs in PySpark and performed unit, integration, regression and comparison testing to ensure that the converted code matched the functionality and performance of the original Java code (see the conversion sketch after the Environment line below)
- Mentored team members and provided the application training for the new hires
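A minimal PyUnit (unittest) sketch of the kind of test suite mentioned in the list above for the PySpark score jobs; the scoring function and expected values are hypothetical placeholders for the real logic.

```python
# Sketch of a PyUnit test for a PySpark transformation; the scoring function
# and its expected output are hypothetical placeholders.
import unittest

from pyspark.sql import SparkSession, functions as F


def calculate_scores(df):
    """Stand-in for the real scoring logic: doubles the raw score."""
    return df.withColumn("score", F.col("raw_score") * 2)


class CalculateScoresTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Local Spark session keeps the test self-contained
        cls.spark = (SparkSession.builder
                     .master("local[2]")
                     .appName("score-model-tests")
                     .getOrCreate())

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_scores_are_doubled(self):
        source = self.spark.createDataFrame(
            [("c1", 10), ("c2", 25)], ["client_id", "raw_score"])
        result = {r.client_id: r.score for r in calculate_scores(source).collect()}
        self.assertEqual(result, {"c1": 20, "c2": 50})


if __name__ == "__main__":
    unittest.main()
```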
Environment: Hadoop 2.7, Spark, Python, Scala, Oozie, Sqoop, Hive, Impala, Oracle, DB2, Hue, UNIX shell scripting, SAS
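A minimal sketch of the kind of Java MapReduce to PySpark conversion described above, using a simple count aggregation as a stand-in for the actual business logic; the input path, table and column names are illustrative assumptions.

```python
# Sketch of converting a Java MapReduce aggregation to PySpark.
# Assumes the original job's mapper emitted (client_id, date) pairs and the
# reducer counted them; the equivalent PySpark pipeline is a few lines.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("web_interaction_report")
         .enableHiveSupport()
         .getOrCreate())

# Input path and column names are hypothetical placeholders.
interactions = spark.read.parquet("/data/datahub/web_interactions/")

daily_counts = (interactions
                .groupBy("client_id", "event_date")             # replaces the map/shuffle phase
                .agg(F.count("*").alias("interaction_count")))  # replaces the reduce phase

daily_counts.write.mode("overwrite").saveAsTable("datamart.web_interaction_counts")
```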
