Hadoop Spark Programmer Resume
SUMMARY
- 9.5 years of experience in IT, of which 3 years are in Big Data analytics
- Worked with tools in the Hadoop ecosystem including Spark, Sqoop, Scala, Hive, HDFS, MapReduce, YARN, and Oozie
- Migrated ETL projects to Hadoop
- Able to move data in and out of Hadoop from various RDBMS and UNIX sources using Sqoop and other traditional data movement technologies
- Experienced in installing and configuring Hadoop clusters on the major Hadoop distributions
- Hands-on experience writing Scala programs on the Spark framework
- Hands-on experience working with ecosystem tools such as Hive and Sqoop
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming
- Created RDDs and applied transformation and action functions (see the sketch after this list)
- Loaded files into Hive and HDFS from MySQL
- Loaded datasets into Hive for ETL operations
- Good knowledge of Hadoop cluster architecture and cluster monitoring
- Good understanding of cloud configuration on Amazon Web Services (AWS)
- Worked extensively on IBM InfoSphere DataStage (ETL) 8.5, 8.7, and 11.3, using components such as DataStage Designer, DataStage Director, and DataStage Administrator
- Excellent experience working with large data feeds and heterogeneous systems such as Oracle, Netezza, SQL Server, DB2, Teradata, XML files, COBOL/mainframe files, and flat files
- Broad knowledge of IT industry domains such as banking and healthcare
- Involved in code analysis, coding, testing, bug fixing, and production support
- Developed and tested UNIX Korn shell (ksh) scripts for project requirements
- Participated in requirement-gathering meetings with the business; prepared test strategy matrices, test plans, and test result reports; performed baseline, forward, and regression testing
- Shared key test controls with the business to obtain approval and project sign-off
- Logged defects and participated in defect prevention analysis meetings
- Excellent knowledge of studying ER diagrams and data dependencies using metadata stored in the DataStage repository
- Played multiple roles: Programmer Analyst, Application Developer, Software Tester, Quality Analyst, Module Leader, and Front Line Manager
- Participated in Agile/Scrum stand-ups and sprint planning meetings
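
A minimal Scala sketch of the RDD pattern described above (illustrative only: the SparkContext setup, input path, and record layout are assumptions, not taken from an actual project):

    import org.apache.spark.{SparkConf, SparkContext}

    object RddSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-sketch"))

        // Create an RDD from a hypothetical delimited file on HDFS
        val lines = sc.textFile("hdfs:///data/input/sample.csv")

        // Transformations are lazy: nothing runs until an action is called
        val amounts = lines.map(_.split(","))
                           .filter(_.length > 2)
                           .map(fields => fields(2).toDouble)

        // Actions trigger the computation and return results to the driver
        println(s"rows = ${amounts.count()}, total = ${amounts.sum()}")

        sc.stop()
      }
    }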
TECHNICAL SKILLS
Frameworks: Hadoop, Spark, IBM InfoSphere DataStage, Mainframes
Schedulers: Control-M, Autosys, CA-7, Oozie
Operating Systems: Windows 95/98/NT/2000/XP, UNIX, z/OS
Databases: Oracle, SQL Server, DB2, MySQL, Netezza, Teradata
Tools: Toad, Spark, Hive, Sqoop, Flume, YARN, JIRA, Kanban, ISPF, SPUFI, Endevor, File-AID, Xpeditor, Aginity
Languages: COBOL, CICS, JCL, C, C++, Java, PHP, HTML, JavaScript, Scala, UNIX shell scripting (ksh)
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Spark Programmer
Responsibilities
- Participated in project requirement discussions with business and development teams
- Used Hive on Spark
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Built a proof of concept (POC) for using Spark and Scala in the existing Hadoop ecosystem
- Created a Hadoop design that replicates the current system design
- Developed Scala scripts and UDFs, using both DataFrames/SQL and RDDs in Spark 1.6, for data aggregation and queries, and wrote data back into the OLTP system through Sqoop (see the sketch after this list)
- Created Sqoop jobs to move data from Oracle into Hive temporary tables
- Performance-tuned Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response
- Created Sqoop jobs to move static lookup data from Oracle into Hive tables
- Developed Hive queries to pre-process the data required by the business process
- Created the main upload files from the Hive temporary tables
- Created Oozie workflows for Hive scripts and scheduled the Oozie workflows and DMX-h scripts in Autosys
- Created UDFs for Hive queries
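
A sketch of the Spark 1.6 DataFrame/SQL and UDF pattern from the bullets above; the Hive table, column names, and output path are purely illustrative assumptions:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object UdfSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("udf-sketch"))
        val hc = new HiveContext(sc)

        // Register a UDF usable from both the DataFrame API and Spark SQL
        hc.udf.register("normalize", (s: String) => s.trim.toUpperCase)

        // Aggregate a hypothetical Hive temporary table
        val totals = hc.sql(
          """SELECT normalize(region) AS region, SUM(amount) AS total
            |FROM stg_orders
            |GROUP BY normalize(region)""".stripMargin)

        // Land the result on HDFS; a separate Sqoop export would push it to the OLTP system
        totals.write.mode("overwrite").parquet("hdfs:///data/out/region_totals")

        sc.stop()
      }
    }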
Environment: Hadoop, Spark, Scala, Sqoop, Hive, InfoSphere DataStage 11.3, Teradata, Oracle, Autosys, UNIX Shell scripting
Confidential, NJ
Spark Programmer
Responsibilities
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows
- Developed Spark scripts using Scala shell commands as required
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Developed Scala scripts and UDFs, using both DataFrames/SQL and RDDs in Spark 1.6, for data aggregation and queries, and wrote data back into the OLTP system through Sqoop
- Performance-tuned Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory (see the tuning sketch after this list)
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark in Scala
- Handled large datasets using partitions, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself
- Worked extensively with Sqoop to import metadata from Oracle
- Created Hive tables and loaded and analyzed data using Hive queries
- Developed Hive queries to process the data and generate data cubes for visualization
- Implemented schema extraction for Parquet and Avro file formats in Hive
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the partitioning sketch after this list)
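
A sketch of the parallelism, memory, and broadcast-join tuning described above; the configuration values, lookup data, paths, and field layout are illustrative assumptions rather than production settings:

    import org.apache.spark.{SparkConf, SparkContext}

    object TuningSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("tuning-sketch")
          .set("spark.default.parallelism", "200") // level of parallelism (example value)
          .set("spark.executor.memory", "4g")      // memory tuning (example value)
        val sc = new SparkContext(conf)

        // Broadcast a small static lookup so the join below avoids a shuffle
        val regionByCode = sc.broadcast(Map("01" -> "EAST", "02" -> "WEST"))

        val facts = sc.textFile("hdfs:///data/facts")
          .map(_.split('|'))
          .map(f => (f(0), f(1).toDouble))

        // Prefer reduceByKey over groupByKey: it combines map-side and shrinks the shuffle
        val totals = facts
          .reduceByKey(_ + _)
          .map { case (code, total) => (regionByCode.value.getOrElse(code, "UNKNOWN"), total) }

        totals.saveAsTextFile("hdfs:///data/out/region_totals")
        sc.stop()
      }
    }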
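
The Hive partitioning and bucketing mentioned above, sketched as HiveQL issued through a HiveContext; the table and column names are hypothetical, and the SET statements show the usual Hive-side switches for dynamic-partition and bucketed inserts:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-partition-sketch"))
        val hc = new HiveContext(sc)

        // Allow dynamic-partition inserts and bucketed writes
        hc.sql("SET hive.exec.dynamic.partition = true")
        hc.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
        hc.sql("SET hive.enforce.bucketing = true")

        // Partitioned, bucketed target table (hypothetical schema)
        hc.sql(
          """CREATE TABLE IF NOT EXISTS orders_part (
            |  order_id BIGINT,
            |  amount   DOUBLE
            |)
            |PARTITIONED BY (order_date STRING)
            |CLUSTERED BY (order_id) INTO 16 BUCKETS
            |STORED AS PARQUET""".stripMargin)

        // Dynamic partitioning routes each row to its order_date partition
        hc.sql(
          """INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
            |SELECT order_id, amount, order_date FROM orders_stage""".stripMargin)

        sc.stop()
      }
    }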
Environment: Hadoop, Spark, Scala, Hive, Pig, UNIX Shell scripting
Confidential
ETL Developer
Responsibilities
- Interpreted the technical specifications and developed DataStage jobs for the extraction, transformation, cleansing, and loading processes of the data warehouse
- Profiled and analyzed data from different sources, addressed data quality issues, and transformed, processed, and loaded the data into Teradata
- Implemented the Survive and Match stages for data patterns and data definitions
- Captured data from a variety of sources including DB2, flat files, mainframes, and other formats
- Worked extensively with DataStage shared containers to reuse business functionality
- Used reject links, job parameters, and stage variables extensively in developing jobs
- Drafted technical documents, such as overview, migration, and deployment documents, for every code release
- Automated processing with batch logic, scheduling jobs in Autosys on daily, weekly, and yearly cycles depending on the requirement
- Involved in various reviews and meetings, including internal and external code reviews, weekly status calls, issue resolution meetings, and code acceptance meetings
- Assisted the SIT, UAT, and production teams during code releases with code walkthroughs, presentations, and defect identification, reporting, and tracking
Environment: InfoSphere DataStage 8.5, Oracle 11g, Fixed width files, COBOL files, Sequential files, DB2, XML files
Confidential
Delivery Module Lead
Responsibilities
- Developed unit test cases and unit-tested jobs before migrating the code to QA and production environments
- Responsible for managing the scope, planning, tracking, and change control aspects of the project
- Analyzed business requirements and specifications
- Coded online and batch programs using COBOL, VME COBOL, CICS, DB2, VSAM, JCL, PWB, ALTADATA, and ITS
- Prepared test cases and performed unit and regression testing
- Responsible for effective communication between the project team and the customer
- Translated customer requirements into formal requirements and design documents
- Established quality procedures for the team and monitored and audited work to ensure the team met quality goals
- Performed the role of team lead: managed work allocation, mentored team members, ensured coordination within the team, and earned the confidence of onsite SMEs, leads, and the business
- Participated actively in team and customer meetings and ensured coordination between the onsite and offshore teams
- Ensured that commitments made to the client and the quality of deliverables were met
- Mentored the team in technical and business areas and helped them resolve related issues
- Reviewed work status and assisted the team in all phases of the software engineering cycle as required
Environment: IBM - Mainframes - COBOL, VME COBOL, CICS, DB2, VSAM, JCL, PWB, ALTADATA, ITS.
Confidential
Trainee Programmer
Responsibilities
- Analyzed existing modules to be developed or enhanced
- Coded new modules
- Modified code in existing programs
- Prepared unit test reports for the unit testing to be performed
- Performed unit testing and logged unit test reports for various conditions
- Supported system integration testing and user acceptance testing
- Prepared quality-related documents throughout the SDLC process
Environment: PHP, MySQL, HTML, JavaScript.