Hadoop Developer Resume

Charlotte, NC

SUMMARY

  • IT professional with 8+ years of experience in software development, design and validation, specializing in Mainframes, Java, databases and Hadoop technologies. Experience spans several industries, including Engineering, Service Delivery and Customer Relationship Management.
  • Strong experience with Big Data projects and tools across multiple domains, in all phases of the SDLC: requirements gathering, system design, development, enhancement, maintenance, testing, deployment and production support.
  • Strong experience with Big Data and Hadoop ecosystem tools such as MapReduce, Pig, Hive, Sqoop, Oozie, Flume, HBase and Spark.
  • Worked with file formats such as JSON, XML, CSV and XLS.
  • Used AWS EMR and EC2 for cloud-based big data processing.
  • Good understanding of HDFS design, daemons, NameNode Federation and HDFS high availability (HA).
  • Good understanding of Spark Core, Spark SQL, Spark Streaming and Kafka.
  • Knowledge of NoSQL databases such as HBase and DynamoDB.
  • Experience with UNIX and shell scripting.
  • Processed hundreds of terabytes of data loaded into the corporate data warehouse to build data visualizations for the business analytics team.
  • Solid understanding of high-volume, high-performance systems.
  • Worked on Integration Manager of SQOOP import and export.
  • Hands-on experience scheduling jobs with Autosys, Oozie and CA7.
  • Very good understanding of performance tuning and query optimization techniques.
  • Excellent knowledge of YARN architecture.
  • Good knowledge of data warehousing concepts.
  • Experience in Agile development environments.
  • Hardworking professional with a strong ability to work well in a team environment. Exceptional time management skills with a strong work ethic.
  • Good knowledge of creating AWS S3 buckets for storing input and output files.
  • Wrote Sqoop scripts for data exchange between databases.
  • Possess superior design and debugging capabilities, innovative problem solving and excellent analytical skills.
  • Focused on quality and process. Excellent written and verbal communication skills; strong team player.
  • Self-starter who adapts quickly to new software applications and products, with excellent communication skills and a good understanding of business workflow.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, YARN, Hive, HBase, Flume, Sqoop, Impala, Oozie, ZooKeeper, Cloudera Spark, Scala, Kafka, Hue, DMExpress-h

Programming Languages: C, C++, Data Structures, Java, SQL, Pig Latin, HiveQL, JCL, COBOL, Easytrieve, REXX, VSAM

DB Languages: Teradata SQL Assistant, SQL Server, MySQL, Oracle, DB2, MongoDB

Operating Systems: Windows, MS-DOS, UNIX/Linux, z/OS

IDE: Eclipse, TOAD, Microsoft Visio, Atom

Methodologies: Waterfall, Agile, UML, Design Patterns

Version Control Systems: SVN Tortoise, Endevor, Changeman

Build Tools: Maven

Planning: Effort Estimation, Project Planning

Issue Tracker: Atlassian Jira, Remedy

Scheduler: Autosys, Oozie, CA7

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Developed a Sqoop job to pull PRDS (party reference data) from Teradata into HDFS.
  • Prepared XML definitions for each source system (ATM, Loans, Teller, etc.) to validate each record in the HDFS source files; the XMLs themselves are validated against an XSD.
  • Loaded delimited, position-based and binary files through SparkContext and validated them against the XML.
  • Applied repartitioning, caching and broadcast variables to RDDs and DataFrames to improve performance on the cluster.
  • Created separate Parquet files for valid and invalid records for all systems.
  • Stored the Parquet data in a Hive database with daily date partitions for further querying.
  • Combined the validated Parquet files of two or more systems in the curation module to obtain the common transaction data.
  • Created DataFrames by reading the validated Parquet files and ran SQL queries through SQLContext to get the common transaction data from all systems.
  • Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
  • Worked with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Analyzed the Business Requirements Document and Functional Specification document to develop a detailed test plan and test cases.
  • Analyzed the Hadoop cluster using big data analytic tools including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark and Kafka.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Explored Spark for improving the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Wrote Sqoop scripts for data exchange between databases.
  • Converted MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Developed Spark scripts using Scala shell commands as required.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Loaded data into HBase using bulk and non-bulk loads.
  • Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
  • Created S3 buckets, copied data into them and retrieved data stored in AWS S3.
  • Created RDS instances (MySQL, Oracle), loaded data into them and pulled the data stored in them.
  • Loaded data into a Redshift instance in AWS.
  • Created an EMR cluster in AWS.

Environment: Flume, HBase, Spark, Scala, IntelliJ, Maven, Kafka, Sqoop, Hive, Oozie, Autosys, Unix, Teradata, Pig, SVN Tortoise

Confidential

Hadoop Developer

Responsibilities:

  • Designed and migrated the existing Teradata system to Hadoop.
  • Worked extensively with Hive and Pig to analyze network data.
  • Automated data pulls from SQL Server to the Hadoop ecosystem via Sqoop.
  • Wrote Hive queries for data analysis and query optimization.
  • Delivered the Hadoop migration strategy, roadmap and technology fitment.
  • Generated ad-hoc reports using Hive to validate customer viewing history and debug issues in production.
  • Responsible for Hadoop ETL design, development, testing and review of code.
  • Developed MapReduce code to process the input SOR files.
  • Developed Hive scripts for data transformation and aggregation.
  • Set up standards and processes for Hadoop-based application design and implementation.
  • Developed compression scripts using various compression techniques and codecs (Gzip, Bzip2, etc.).
  • Wrote Sqoop commands for importing data into and exporting data from HDFS.
  • Developed Oozie workflows for launching jobs.
  • Developed complex ETL mappings and their corresponding sessions and workflows.
  • Developed various ETL transformations using the DMX-h tool.
  • Scheduled jobs using Autosys based on various success conditions and file watchers.
  • Developed Unix shell scripts to prepare the environment for the application and to delete all the

Environment: Sqoop, Hive, Oozie, Autosys, Unix, Teradata, Pig, Core Java, Eclipse, DMX-h tool, SVN Tortoise

Confidential, Jacksonville, FL

Mainframe/Teradata Developer and Module Lead

Responsibilities:

  • Ensured excellent quality of deliverables and efficiency in meeting deadlines for all assignments.
  • Performed impact analysis for new and changed requirements and prepared the LLD (Low Level Design).
  • Compared client-supplied documents such as the BRD and HLD with the LLD to identify any gaps.
  • Developed code according to the LLD using TCS tools such as SAS-RAW DATA LOAD (RDL).
  • Performed peer reviews and code walkthroughs.
  • Performed analysis and provided data to the onshore counterpart for submission to the user community.
  • Assisted in deployment and provided technical and operational support during install.
  • Involved in post-implementation support.
  • Successfully managed a team of four as Module Lead and trained them in Teradata and Mainframes.
  • Provided solutions to Teradata and Mainframe-related issues and errors.
  • Tuned queries in existing applications to improve system performance and minimize application run time.
  • Built tables and corresponding views based on user requirements and tested them for compliance with business requirements.
  • Analyzed the Business Requirements Document and Functional Specification document to develop a detailed test plan and test cases.
  • Extracted and analyzed performance metrics for all Teradata queries being added or changed to minimize performance impact and reduce system resource consumption.
  • Performed regression testing to capture compatibility issues and identify defects caused by new business changes in data.
  • Handled analysis, design, testing and control for entire projects converting legacy Mainframe-Teradata applications to Hadoop-Teradata applications involving complex CDC logic.
  • Developed, tested and implemented new jobs and scripts involving extensive business transformation logic using all Teradata utilities, including TPT.
  • Enhanced and changed existing applications based on business requirements, including impact analysis and code changes to existing scripts.

Environment: z/OS, TERADATA, COBOL, VSAM, EASYTRIEVE, REXX, CA7, JCL, CHANGEMAN, ENDEVOR, FILEMANAGER, FILEAID, DEBUG TOOL, EXPEDITOR

Confidential

Mainframe Developer

Responsibilities:

  • Understood the business needs and objectives of the system, interacted with end clients and users, and gathered requirements for the integrated system.
  • Designed nearly 12 applications, including AMF, AMW, AM-Transit (Transit Check Fraud), KDM (Kite Detection Monitoring), V12-Signature Verification System and SNS-Extract.
  • Worked on the major integrated release ADDP (ATM Debit Detection Platform).
  • Prepared the HLD, LLD and BRD (Business Requirements Document).
  • Developed code using COBOL, JCL, VSAM and Easytrieve.
  • Developed COBOL code as a midrange application, extracting and parsing data from various upstream applications such as a Unix server.
  • Developed various batch jobs using JCL to submit instructions on z/OS.
  • Sent processed files to downstream applications using the NDM/FTP process.
  • Developed complex COBOL-DB2 programs using cursors with minimal use of system resources.
  • Designed batch job flows and complete application flows in Visio.
  • Coordinated with the scheduling team on implementing PODS requests and verified the scheduled jobs.
  • Developed COBOL code using VSAM (Virtual Storage Access Method) for key-sequenced data sets.
  • Optimized existing COBOL code through effective use of sort cards.
  • Coded MQs for interacting with upstream applications.
  • Designed REXX tools to create job setups quickly.
  • Prepared test cases and test plans and performed unit and regression testing of the complete application flow, which includes 300 jobs.
  • Monitored regular job flows, fixed abends and handled Severity 1 emergency fixes.
  • Developed Easytrieve code to reduce coding effort in areas where less data is handled.
  • Migrated code to production using version control tools such as ChangeMan and Endevor.
  • Scheduled batch jobs using the CA7 scheduler.
  • Provided post-production support during the warranty period.
  • Planned and provided permanent fixes for job flows with recurring abends.

Environment: z/OS, COBOL, VSAM, EASYTRIEVE, REXX, CA7, JCL, CHANGEMAN, ENDEVOR, FILEMANAGER, FILEAID, DB2, DEBUG TOOL, EXPEDITOR
