Hadoop Spark Programmer Resume
SUMMARY
- 9.5 years of experience in IT, of which 3 years are in Big Data analytics
- Worked with tools in the Hadoop ecosystem including Spark, Sqoop, Scala, Hive, HDFS, MapReduce, YARN, and Oozie
- Migrated ETL projects to Hadoop
- Able to move data in and out of Hadoop from various RDBMS and UNIX sources using Sqoop and other traditional data movement technologies
- Experienced in installing and configuring Hadoop clusters on the major Hadoop distributions
- Hands-on experience writing Scala programs on the Spark framework
- Hands-on experience working with ecosystem tools such as Hive and Sqoop
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming
- Created RDDs and applied transformation and action functions (see the sketch after this list)
- Loaded files into Hive and HDFS from MySQL
- Loaded datasets into Hive for ETL operations
- Good knowledge of Hadoop cluster architecture and cluster monitoring
- Good understanding of cloud configuration on Amazon Web Services (AWS)
- Worked extensively on IBM InfoSphere DataStage (ETL) 8.5, 8.7, and 11.3, using components such as DataStage Designer, DataStage Director, and DataStage Administrator
- Excellent experience working with large data feeds and heterogeneous systems such as Oracle, Netezza, SQL Server, DB2, Teradata, XML files, COBOL/mainframe files, and flat files
- Broad knowledge of IT industry domains such as banking and healthcare
- Involved in code analysis, coding, testing, bug fixing, and production support
- Developed and tested UNIX Korn shell (ksh) scripts for project requirements
- Participated in requirement-gathering meetings with the business; prepared test strategy matrices, test plans, and test result reports; performed baseline, forward, and regression testing
- Shared key test controls with the business to obtain approval and project sign-off
- Logged defects and participated in defect prevention analysis meetings
- Excellent knowledge of studying ER diagrams and data dependencies using metadata stored in the DataStage repository
- Played multiple roles: Programmer Analyst, Application Developer, Software Tester, Quality Analyst, Module Leader, and Front Line Manager
- Participated in Agile/Scrum stand-ups and sprint planning meetings
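
A minimal Scala sketch of the RDD pattern described above (illustrative only: the SparkContext setup, input path, and record layout are assumptions, not taken from an actual project):

    import org.apache.spark.{SparkConf, SparkContext}

    object RddSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-sketch"))

        // Create an RDD from a hypothetical delimited file on HDFS
        val lines = sc.textFile("hdfs:///data/input/sample.csv")

        // Transformations are lazy: nothing runs until an action is called
        val amounts = lines.map(_.split(","))
                           .filter(_.length > 2)
                           .map(fields => fields(2).toDouble)

        // Actions trigger the computation and return results to the driver
        println(s"rows = ${amounts.count()}, total = ${amounts.sum()}")

        sc.stop()
      }
    }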
TECHNICAL SKILLS
Frameworks: Hadoop, Spark, IBM InfoSphere DataStage, Mainframes
Schedulers: Control-M, Autosys, CA-7, Oozie
Operating Systems: Windows 95/98/NT/2000/XP, UNIX, z/OS
Databases: Oracle, SQL Server, DB2, MySQL, Netezza, Teradata
Tools: Toad, Spark, Hive, Sqoop, Flume, YARN, JIRA, Kanban, ISPF, SPUFI, Endevor, File-AID, Xpeditor, Aginity
Languages: COBOL, CICS, JCL, C, C++, Java, PHP, HTML, JavaScript, Scala, UNIX shell scripting (ksh)
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Spark Programmer
Responsibilities
- Participated in project requirement discussions with business and development teams
- Used Hive on Spark
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Built a proof of concept (POC) for using Spark and Scala in the existing Hadoop ecosystem
- Created a Hadoop design that replicates the current system design
- Developed Scala scripts and UDFs, using both DataFrames/SQL and RDDs in Spark 1.6, for data aggregation and queries, and wrote data back into the OLTP system through Sqoop (see the sketch after this list)
- Created Sqoop jobs to move data from Oracle into Hive temporary tables
- Performance-tuned Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response
- Created Sqoop jobs to move static lookup data from Oracle into Hive tables
- Developed Hive queries to pre-process the data required by the business process
- Created the main upload files from the Hive temporary tables
- Created Oozie workflows for Hive scripts and scheduled the Oozie workflows and DMX-h scripts in Autosys
- Created UDFs for Hive queries
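
A sketch of the Spark 1.6 DataFrame/SQL and UDF pattern from the bullets above; the Hive table, column names, and output path are purely illustrative assumptions:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object UdfSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("udf-sketch"))
        val hc = new HiveContext(sc)

        // Register a UDF usable from both the DataFrame API and Spark SQL
        hc.udf.register("normalize", (s: String) => s.trim.toUpperCase)

        // Aggregate a hypothetical Hive temporary table
        val totals = hc.sql(
          """SELECT normalize(region) AS region, SUM(amount) AS total
            |FROM stg_orders
            |GROUP BY normalize(region)""".stripMargin)

        // Land the result on HDFS; a separate Sqoop export would push it to the OLTP system
        totals.write.mode("overwrite").parquet("hdfs:///data/out/region_totals")

        sc.stop()
      }
    }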
Environment: Hadoop, Spark, Scala, Sqoop, Hive, InfoSphere DataStage 11.3, Teradata, Oracle, Autosys, UNIX Shell scripting
Confidential, NJ
Spark Programmer
Responsibilities
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows
- Developed Spark scripts using Scala shell commands as required
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Developed Scala scripts and UDFs, using both DataFrames/SQL and RDDs in Spark 1.6, for data aggregation and queries, and wrote data back into the OLTP system through Sqoop
- Performance-tuned Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory (see the tuning sketch after this list)
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark in Scala
- Handled large datasets using partitions, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself
- Worked extensively with Sqoop to import metadata from Oracle
- Created Hive tables and loaded and analyzed data using Hive queries
- Developed Hive queries to process the data and generate data cubes for visualization
- Implemented schema extraction for Parquet and Avro file formats in Hive
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the partitioning sketch after this list)
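
A sketch of the parallelism, memory, and broadcast-join tuning described above; the configuration values, lookup data, paths, and field layout are illustrative assumptions rather than production settings:

    import org.apache.spark.{SparkConf, SparkContext}

    object TuningSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("tuning-sketch")
          .set("spark.default.parallelism", "200") // level of parallelism (example value)
          .set("spark.executor.memory", "4g")      // memory tuning (example value)
        val sc = new SparkContext(conf)

        // Broadcast a small static lookup so the join below avoids a shuffle
        val regionByCode = sc.broadcast(Map("01" -> "EAST", "02" -> "WEST"))

        val facts = sc.textFile("hdfs:///data/facts")
          .map(_.split('|'))
          .map(f => (f(0), f(1).toDouble))

        // Prefer reduceByKey over groupByKey: it combines map-side and shrinks the shuffle
        val totals = facts
          .reduceByKey(_ + _)
          .map { case (code, total) => (regionByCode.value.getOrElse(code, "UNKNOWN"), total) }

        totals.saveAsTextFile("hdfs:///data/out/region_totals")
        sc.stop()
      }
    }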
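
The Hive partitioning and bucketing mentioned above, sketched as HiveQL issued through a HiveContext; the table and column names are hypothetical, and the SET statements show the usual Hive-side switches for dynamic-partition and bucketed inserts:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-partition-sketch"))
        val hc = new HiveContext(sc)

        // Allow dynamic-partition inserts and bucketed writes
        hc.sql("SET hive.exec.dynamic.partition = true")
        hc.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
        hc.sql("SET hive.enforce.bucketing = true")

        // Partitioned, bucketed target table (hypothetical schema)
        hc.sql(
          """CREATE TABLE IF NOT EXISTS orders_part (
            |  order_id BIGINT,
            |  amount   DOUBLE
            |)
            |PARTITIONED BY (order_date STRING)
            |CLUSTERED BY (order_id) INTO 16 BUCKETS
            |STORED AS PARQUET""".stripMargin)

        // Dynamic partitioning routes each row to its order_date partition
        hc.sql(
          """INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
            |SELECT order_id, amount, order_date FROM orders_stage""".stripMargin)

        sc.stop()
      }
    }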
Environment: Hadoop, Spark, Scala, Hive, Pig, UNIX Shell scripting
Confidential
ETL Developer
Responsibilities
- Interpreted the technical specifications and developed DataStage jobs for the extraction, transformation, cleansing, and loading processes of the data warehouse
- Profiled and analyzed data from different sources, addressed data quality issues, and transformed, processed, and loaded the data into Teradata
- Implemented the Survive and Match stages for data patterns and data definitions
- Captured data from a variety of sources including DB2, flat files, mainframes, and other formats
- Worked extensively with DataStage shared containers to reuse business functionality
- Used reject links, job parameters, and stage variables extensively in developing jobs
- Drafted technical documents, such as overview, migration, and deployment documents, for every code release
- Automated processing with batch logic, scheduling jobs in Autosys on daily, weekly, and yearly cycles depending on the requirement
- Involved in various reviews and meetings, including internal and external code reviews, weekly status calls, issue resolution meetings, and code acceptance meetings
- Assisted the SIT, UAT, and production teams during code releases with code walkthroughs, presentations, and defect identification, reporting, and tracking
Environment: InfoSphere DataStage 8.5, Oracle 11g, Fixed width files, COBOL files, Sequential files, DB2, XML files
Confidential
Delivery Module Lead
Responsibilities
- Developed unit test cases and unit-tested jobs before migrating the code to QA and production environments
- Responsible for managing the scope, planning, tracking, and change control aspects of the project
- Analyzed business requirements and specifications
- Coded online and batch programs using COBOL, VME COBOL, CICS, DB2, VSAM, JCL, PWB, ALTADATA, and ITS
- Prepared test cases and performed unit and regression testing
- Responsible for effective communication between the project team and the customer
- Translated customer requirements into formal requirements and design documents
- Established quality procedures for the team and monitored and audited work to ensure the team met quality goals
- Performed the role of team lead: managed work allocation, mentored team members, ensured coordination within the team, and earned the confidence of onsite SMEs, leads, and the business
- Participated actively in team and customer meetings and ensured coordination between the onsite and offshore teams
- Ensured that commitments made to the client and the quality of deliverables were met
- Mentored the team in technical and business areas and helped them resolve related issues
- Reviewed work status and assisted the team in all phases of the software engineering cycle as required
Environment: IBM - Mainframes - COBOL, VME COBOL, CICS, DB2, VSAM, JCL, PWB, ALTADATA, ITS.
Confidential
Trainee Programmer
Responsibilities
- Analyzed existing modules to be developed or enhanced
- Coded new modules
- Modified code in existing programs
- Prepared unit test reports for the unit testing to be performed
- Performed unit testing and logged unit test reports for various conditions
- Supported system integration testing and user acceptance testing
- Prepared quality-related documents throughout the SDLC process
Environment: PHP, MySQL, HTML, JavaScript.