Sr. Hadoop Developer Resume
Malvern, PA
SUMMARY
- IT professional with more than thirteen (13) years of experience in Hadoop/Big Data ecosystems and related technologies
- Excellent experience in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm
- Expertise in Spark using the Scala and Python (PySpark) programming interfaces
- Extensive experience in ETL processes and data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining and advanced data processing
- Experienced in optimizing ETL workflows
- Proficient in designing and coding Oozie workflows in the ETL and conformance layers in Hadoop ecosystems
- Expertise in Hive, Impala and Hive UDFs
- Exported and imported large volumes of data from the Hadoop ecosystem to various relational databases using Sqoop
- Proficient in AWS EMR/EC2, AWS S3, CloudFormation, CloudWatch, AWS Lambda functions, etc.
- Well versed in Atlassian tools like Bamboo, Bitbucket and JIRA, as well as GitHub
- Expertise in IBM mainframes, with deep knowledge of mainframe-based applications and mainframe tools
- Expertise in troubleshooting and in leading teams to fix production issues
- Proficient in project management, production support, application development, programming, system analysis, software quality assurance and change management processes with various clients
- Conversant with all phases of the project life cycle, including requirement gathering, analysis, design, development, testing, implementation, software quality standards, configuration management, change management and quality procedures
- Expertise in handling support and maintenance projects, with hands-on experience in ticket tracking tools like HP SMPO, ITSM, Remedy and JIRA
- Hands-on experience in migrating mainframe applications to other technologies like SAP and UNIX, and in re-hosting and decommissioning mainframes to Micro Focus Enterprise Server
TECHNICAL SKILLS
Platforms/frameworks: Hadoop 2.7, IBM S/390, IBM PC Compatibles
Operating Systems: Linux, OS/390, Windows 10/7/XP/2000/Server, MS-DOS
API: Spark 1.6/2.x, MapReduce
Programming Languages: Python, Scala, Java, VS COBOL, JCL, Easytrieve, SAS
Scripting Languages: Korn shell/UNIX shell scripting, XML, SQL
Workflow: Oozie
Databases: Hive, Impala, DB2, Oracle, IMS DB
ETL Tools: Sqoop, Flume, Kafka
Web Interface: Hue
File Systems/Formats: HDFS, VSAM, Avro files, Parquet files
OLTP: CICS, IMS DC/TM
Middleware: MQ Series
Tools/Technologies: Spring Tool Suite, Eclipse, Crucible, ChangeMan, Endevor, Panvalet, Xpediter, DB2/VSAM, File-AID, Platinum StarTool, SAR, Jobtrac, SPUFI, QMF, Tape Management System (TMS), OPC scheduler, Abend-Aid, DADS, IBM Debugger, Mainframe Express
Tracking Tools: Atlassian tools, Bitbucket, Bamboo, JIRA, Remedy, ITSM, HP SMPO, VersionOne
PROFESSIONAL EXPERIENCE
Sr. Hadoop Developer
Confidential, Malvern, PA
Responsibilities:
- Migrated various client score models developed on on-premises Hadoop to AWS EMR
- Refactored the existing score model logic from warehouse tables to enterprise tables and mapped the warehouse logic to enterprise logic
- Coded the new score model programs in PySpark and Spark-Scala
- Converted the existing PySpark and Spark-Scala programs from Spark 1.6 to Spark 2.2 (see the sketch after this list)
- Migrated the existing Sqoop tables from on-premises storage to S3 buckets and built a new Sqoop pipeline to land the newly added enterprise tables in S3
- Converted the existing integration suites running on Impala to Hive/S3
- Created and customized CloudFormation templates using troposphere to spin up AWS EMR clusters (see the template sketch after the Environment line below)
- Integrated, built and deployed the CloudFormation create/delete templates (e.g., S3 copy, create stack and delete stack) using Bamboo
- Created CloudWatch events for the AWS EMR logs and integrated the CloudWatch logs with Splunk
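A minimal sketch of the Spark 1.6 to 2.2 conversion described in this list, assuming a hypothetical score-model table; the main change is replacing the old SQLContext/HiveContext entry points with the unified SparkSession and pointing output at S3 instead of on-premises HDFS.

```python
# Sketch of a Spark 1.6 -> 2.2 migration; table, column and bucket names are
# hypothetical stand-ins for the actual score-model objects.

# Spark 1.6 style (before):
#   from pyspark import SparkContext
#   from pyspark.sql import HiveContext
#   sc = SparkContext(appName="client_score_model")
#   sqlContext = HiveContext(sc)
#   scores = sqlContext.sql("SELECT client_id, score FROM warehouse.client_scores")

# Spark 2.2 style (after): SparkSession replaces SQLContext/HiveContext
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("client_score_model")
         .enableHiveSupport()          # keeps access to the existing Hive metastore
         .getOrCreate())

scores = spark.sql("SELECT client_id, score FROM enterprise.client_scores")
scores.write.mode("overwrite").parquet("s3://example-bucket/scores/")  # illustrative bucket
```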
Environment: AWS EMR/EC2, AWS S3, Splunk, CloudWatch, AWS Lambda, Hadoop 2.7, Spark 2.2, Scala, Python 3.7, Oozie, Sqoop, Hive, Presto, UNIX shell scripting, Bamboo, Bitbucket, JIRA, Control-M
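A minimal troposphere sketch of the kind of CloudFormation template used to spin up an EMR cluster, as described above; the cluster name, EMR release, instance types, counts, roles and log bucket are illustrative assumptions rather than the actual production configuration.

```python
# Sketch of generating an EMR cluster CloudFormation template with troposphere.
# All names, versions, counts and roles below are illustrative assumptions.
from troposphere import Template
from troposphere.emr import (Application, Cluster, InstanceGroupConfigProperty,
                             JobFlowInstancesConfig)

template = Template()

cluster = Cluster(
    "ScoreModelEmrCluster",
    Name="score-model-emr",
    ReleaseLabel="emr-5.11.0",                 # assumed release shipping Spark 2.2
    Applications=[Application(Name="Spark"), Application(Name="Hive")],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://example-bucket/emr-logs/",    # illustrative log bucket
    Instances=JobFlowInstancesConfig(
        MasterInstanceGroup=InstanceGroupConfigProperty(
            InstanceCount=1, InstanceType="m4.xlarge"),
        CoreInstanceGroup=InstanceGroupConfigProperty(
            InstanceCount=4, InstanceType="m4.xlarge"),
    ),
)
template.add_resource(cluster)

print(template.to_json())   # rendered template is handed to the Bamboo deploy plan
```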
Sr. Hadoop Developer
Confidential, Pennington, NJ
Responsibilities:
- Designed the data lake to pull HMDA loan details of various clients from upstream systems like Peaks and nCino
- Designed and implemented the Sqoop process to pull client data from various Oracle databases into Confidential's Hadoop environment
- Implemented the ETL process and conformance codes for the HMDA data lake
- Designed and implemented the Oozie workflow to import and export the client’s loan information to various loan processing and data analytical systems in Confidential
- Created and worked with Hive tables in the Hadoop data hub region and stored the Sqoop data in Parquet format
- Designed and coded the conformance logic in Spark-Scala for use by target/consuming systems
- Optimized the Spark-Scala and Spark-SQL codes in the conformance layers for process improvement
- Implemented the Oozie coordinator and scheduled the daily/weekly/monthly jobs
- Created the test suites using JUnit and performed unit, integration and end-to-end testing with JUnit in the QA and SIT regions
- Optimized the Hive queries using partitioning and bucketing techniques to control data distribution (see the sketch after this list)
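A minimal PySpark sketch of the partitioning and bucketing approach from the last bullet; the table, column names and bucket count are hypothetical placeholders rather than the actual HMDA schema.

```python
# Sketch of writing a conformed dataset as a partitioned, bucketed Hive table.
# Table names, columns and the bucket count are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hmda_conformance")
         .enableHiveSupport()
         .getOrCreate())

loans = spark.table("datahub.hmda_loans_raw")        # raw Sqoop-landed data

(loans.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("load_year", "load_month")        # enables partition pruning on date filters
      .bucketBy(32, "client_id")                     # co-locates each client's rows
      .sortBy("client_id")
      .saveAsTable("conformance.hmda_loans"))
```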
Environment: Hadoop 2.7, Spark, Scala, Oozie, Sqoop, Hive, Impala, Oracle, Hue, UNIX shell scripting
Sr. Hadoop Developer
Confidential, Malvern, PA
Responsibilities:
- Designed the ETL process to bring client score details from Tealeaf, the data warehouse and the enterprise system into the Confidential Hadoop ecosystem
- Worked with the business users to understand and clarify the business requirements and prepared the design documents
- Designed, coded and implemented the Sqoop process and imported the score details into the Hadoop data hub
- Cleansed and validated the imported data and converted it to the Avro file format, making it accessible to the Hadoop data mart environments
- Made the necessary changes to the cleanse and validate programs using Spark-Scala
- Designed and coded the score calculation logic for the Confidential clients using PySpark and executed the PySpark programs in the Hadoop data mart environment
- Designed and implemented the Oozie workflow for the daily/weekly/monthly client score calculation and Web interaction reports
- Implemented the Oozie coordinator and scheduled the daily/weekly/monthly jobs
- Created the test suites for the PySpark code and performed unit, integration and end-to-end testing using PyUnit (see the test sketch after this list)
- Converted the Avro files in the Hadoop data hub to Parquet format using Hive scripts
- Imported the data from Oracle and DB2 database to Hadoop ecosystem using Sqoop
- Created the Hive tables in Hadoop data mart environment and validated the performance of Hive and Impala queries against the master tables
- Optimized the Hive queries using partitioning and bucketing techniques to control data distribution
- Fine-tuned the PySpark code to optimize Hadoop resource utilization for production runs
- Executed comparison tests in the production region and fine-tuned the end results to ensure accuracy
- Troubleshot daily Oozie workflow failures and implemented permanent fixes
- Analyzed the Java MapReduce programs, prepared the analysis documents and performed a feasibility study for converting the Java MapReduce programs to Spark-Python (PySpark)
- Prepared the high-level/low-level design documents for converting the Java MapReduce code to PySpark
- Re-coded the Java programs in PySpark and performed unit, integration, regression and comparison testing to ensure that the converted code matched the functionality and performance of the original Java code (see the conversion sketch after the Environment line below)
- Mentored team members and provided the application training for the new hires
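A minimal PyUnit (unittest) sketch of the kind of test suite mentioned in the list above for the PySpark score jobs; the scoring function and expected values are hypothetical placeholders for the real logic.

```python
# Sketch of a PyUnit test for a PySpark transformation; the scoring function
# and its expected output are hypothetical placeholders.
import unittest

from pyspark.sql import SparkSession, functions as F


def calculate_scores(df):
    """Stand-in for the real scoring logic: doubles the raw score."""
    return df.withColumn("score", F.col("raw_score") * 2)


class CalculateScoresTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Local Spark session keeps the test self-contained
        cls.spark = (SparkSession.builder
                     .master("local[2]")
                     .appName("score-model-tests")
                     .getOrCreate())

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_scores_are_doubled(self):
        source = self.spark.createDataFrame(
            [("c1", 10), ("c2", 25)], ["client_id", "raw_score"])
        result = {r.client_id: r.score for r in calculate_scores(source).collect()}
        self.assertEqual(result, {"c1": 20, "c2": 50})


if __name__ == "__main__":
    unittest.main()
```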
Environment: Hadoop 2.7, Spark, Python, Scala, Oozie, Sqoop, Hive, Impala, Oracle, DB2, Hue, UNIX shell scripting, SAS
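A minimal sketch of the kind of Java MapReduce to PySpark conversion described above, using a simple count aggregation as a stand-in for the actual business logic; the input path, table and column names are illustrative assumptions.

```python
# Sketch of converting a Java MapReduce aggregation to PySpark.
# Assumes the original job's mapper emitted (client_id, date) pairs and the
# reducer counted them; the equivalent PySpark pipeline is a few lines.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("web_interaction_report")
         .enableHiveSupport()
         .getOrCreate())

# Input path and column names are hypothetical placeholders.
interactions = spark.read.parquet("/data/datahub/web_interactions/")

daily_counts = (interactions
                .groupBy("client_id", "event_date")             # replaces the map/shuffle phase
                .agg(F.count("*").alias("interaction_count")))  # replaces the reduce phase

daily_counts.write.mode("overwrite").saveAsTable("datamart.web_interaction_counts")
```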
