Spark & Scala Developer Resume
WA
SUMMARY:
- 11+ years of professional experience in Information Technology covering analysis, design, development, implementation, integration, and testing of client/server applications for the banking and insurance industries using object-oriented methodologies, along with production support activities.
- 4 years of experience with Scala, PySpark, and Spark.
- 4 years of experience in Hadoop, Pig, Hive, Oozie, and Sqoop.
- Experience working with Avro, Parquet, and JSON file formats.
- Experience with bzip2, gzip, and Snappy compression techniques.
- Experience with Azure Blob Storage and Azure Data Lake.
- Experience with the Kusto database and Cosmos big data storage.
- Ability to move data in and out of Hadoop from various RDBMS and UNIX sources using Sqoop and other traditional data movement technologies.
- Working knowledge of Spark Streaming.
- 9 years of experience in development and implementation using DataStage 8.1/8.5/11.3.
- 10 years of experience with SQL and RDBMS.
- 8 years of experience in UNIX and shell scripting.
- Proficient in developing strategies for Extraction, Transformation, and Loading (ETL) mechanisms.
- Experience migrating ETL projects to Hadoop.
- Experienced and trained in developing Data Warehousing / Business Intelligence applications using IIS 8.5 and various databases (primarily DB2 and Oracle).
- Hands-on experience in designing and developing ETL jobs.
- Hands-on experience in identifying performance issues and fine-tuning ETL jobs.
- Worked on UNIX/AIX operating systems and the Autosys scheduler.
- Experience in all phases of the SDLC/project life cycle, from requirements gathering through implementation support.
- Experience in system analysis, design and application development, flowcharting, unit and system testing, test plan preparation, test validation, and program debugging.
- Possess excellent analytical, debugging, and problem-solving skills.
- 10+ years of experience in Capital Market, Investments, Banking & Financial verticals.
TECHNOLOGY:
- Data warehousing concepts and dimensional modeling.
- ETL methodologies for designing programs that extract, integrate, aggregate, transform, manage, and reuse metadata.
- Experience in Hadoop, Pig, Hive, Oozie, Sqoop, and MapReduce.
- Experience with Scala and Spark.
- Preparing high-level design documents from business requirements and creating detailed functional specification documents.
- Scheduling ETL jobs through UNIX shell scripts for automated runs and monitoring them to ensure they run smoothly.
- Scheduling production runs and monitoring jobs using Autosys.
- Delivered DW/ETL solutions on a variety of platforms and heterogeneous environments such as UNIX, Linux, and Windows.
- Expert in databases such as DB2 v9, Oracle 10g, SQL Server, and Hive.
TECHNICAL SKILLS:
ETL Tool: IIS v9.1, IIS v8.5, Ascential DataStage 7.5
Operating Systems: MVS, z/OS, UNIX, Windows
Databases: DB2, SQL Server, Oracle, IMS, Teradata, Hive, Kusto, Cosmos
Schedulers: Autosys
Languages: SQL, PL/SQL, Spark, Scala, Python
Tools: File-AID for DB2, Rapid SQL, QMF, SQL*Loader, Pig, DMX, Sqoop
Access Method Services: VSAM
PROFESSIONAL EXPERIENCE:
Confidential, WA
Spark & Scala Developer
Responsibilities:
- Worked with the Hortonworks distribution of Hadoop.
- Implemented a proof of concept to analyze streaming data using Apache Spark with Scala; used Maven/SBT to build and deploy the Spark programs.
- Responsible for building the Confidential data cube using the Spark framework by writing Spark SQL queries in Scala to improve data-processing efficiency and reporting query response time (see the sketch after this list).
- Developed Spark code in Scala in the IntelliJ IDE using SBT.
- Wrote Azure PowerShell scripts to copy or move data from the local file system to the Azure Blob storage backing HDFS.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Responsible for programming code independently for intermediate to complex modules following development standards.
- Planned and conducted code reviews for changes and enhancements that ensure standards compliance and systems interoperability.
- Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster.
- Loaded all processed data into the Kusto database.
- Sourced input data from Cosmos big data storage.
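A minimal sketch of the kind of Spark SQL aggregation behind the data-cube build described above; it assumes a Spark 2.x session with Hive support, and the table and column names (sales_raw, region, product, revenue, sales_cube) are illustrative placeholders rather than the actual Confidential schema.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataCubeBuilder {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so existing Hive tables are visible.
    val spark = SparkSession.builder()
      .appName("DataCubeBuilder")
      .enableHiveSupport()
      .getOrCreate()

    // Read the raw source table (placeholder name) registered in Hive.
    val sales = spark.table("sales_raw")

    // Pre-aggregate across the reporting dimensions so downstream reporting
    // queries hit a small cube instead of scanning the raw data.
    val cube = sales
      .cube(col("region"), col("product"))
      .agg(sum(col("revenue")).as("total_revenue"),
           count(lit(1)).as("row_count"))

    // Persist the cube back to Hive for the reporting layer.
    cube.write.mode("overwrite").saveAsTable("sales_cube")

    spark.stop()
  }
}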
Environment: Hadoop Hive, Confidential SQL Server, Ubuntu, YARN, Hortonworks, UNIX shell scripting, Azure PowerShell, Scala, Spark, Maven, SBT, IntelliJ, Confidential Azure HDInsight, SSMS, Azure Data Factory, Azure Data Warehouse, Kusto, Cosmos
Confidential, OH
Big Data Developer
Responsibilities:
- Interacted with clients for requirement gathering and mentored subordinates.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Created a Hadoop design that replicates the current system design.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the sketch after this list).
- Created Sqoop jobs to move data from Oracle to Hive temporary tables.
- Performed performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Created Sqoop jobs to move static lookup data from Oracle to Hive tables.
- Developed Hive queries to pre-process the data required for running the business process.
- Created the main upload files from the Hive temporary tables.
- Created Oozie workflows for Hive scripts and scheduled them.
- Implemented the process using Python pandas and PySpark.
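A minimal sketch of the DataFrame/UDF style of aggregation described in the Scala bullet above, written against the Spark 1.6 API noted there; the table, column, and UDF names (txn_staging, raw_status, normaliseStatus, txn_daily_agg) are illustrative assumptions, and the hand-off to the OLTP system via Sqoop export happened outside the Spark job.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    // Spark 1.6-style entry points (SparkContext + HiveContext).
    val sc = new SparkContext(new SparkConf().setAppName("DailyAggregation"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Illustrative UDF: normalise a free-text status code to a fixed set of values.
    val normaliseStatus = udf((status: String) =>
      Option(status).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    // Aggregate a (placeholder) staging table by account, day, and status.
    val daily = hiveContext.table("txn_staging")
      .withColumn("status", normaliseStatus($"raw_status"))
      .groupBy($"account_id", $"txn_date", $"status")
      .agg(sum($"amount").as("daily_amount"), count(lit(1)).as("txn_count"))

    // Land the result in a Hive table; Sqoop export then pushes it to the OLTP system.
    daily.write.mode("overwrite").saveAsTable("txn_daily_agg")

    sc.stop()
  }
}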
Environment: Python, Anaconda, HDFS, UNIX shell scripting, MapReduce, Hive, Oozie with Hue, Apache Spark
Confidential, NC
Big Data Developer
Responsibilities:
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Created a Hadoop design that replicates the current system design.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Created Sqoop jobs to move data from Oracle to Hive temporary tables.
- Performed performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings (see the sketch after this list).
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Created Sqoop jobs to move static lookup data from Oracle to Hive tables.
- Developed Hive queries to pre-process the data required for running the business process.
- Created the main upload files from the Hive temporary tables.
- Created Oozie workflows for Pig and Hive scripts and scheduled the Oozie workflows and DMX-h scripts in Autosys.
- Created UDFs for Hive queries.
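A minimal sketch of the configuration side of the Spark tuning mentioned above (batch interval, level of parallelism, and memory); the specific values and the socket source are illustrative placeholders, not the production settings.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TunedStreamingApp {
  def main(args: Array[String]): Unit = {
    // Illustrative settings for the three tuning levers mentioned above.
    val conf = new SparkConf()
      .setAppName("TunedStreamingApp")
      .set("spark.executor.memory", "4g")              // memory tuning
      .set("spark.executor.cores", "4")
      .set("spark.default.parallelism", "200")         // level of parallelism for RDD operations
      .set("spark.sql.shuffle.partitions", "200")      // level of parallelism for SQL shuffles
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    // Batch interval: how much data each streaming micro-batch collects before processing.
    val ssc = new StreamingContext(conf, Seconds(30))

    // Trivial source/sink pair so the sketch runs end to end; the real job
    // read from the project's actual ingest source.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}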
Environment: Scala, Hadoop, HDFS, UNIX shell scripting, MapReduce, Hive, Oozie with Hue, Apache Spark, MySQL, DMX-h, Autosys, DataStage 11.3
Confidential, NC
Big Data Developer
Responsibilities:
- Copied data from the edge node to the HDFS file system and loaded it into Hive tables.
- Compared the load-ready files against the DataStage datasets to validate the data generated in Hadoop.
- Studied the existing DataStage requirements and developed a parallel design in Hadoop.
- Used NDM scripts to pull data from the source system and place it on the edge node.
- Developed Sqoop commands to pull data from Teradata and push it to HDFS.
- Used Sqoop to load data into Oracle.
- Actively involved in design analysis, coding, and strategy development.
- Developed Hive scripts implementing dynamic partitions and buckets for retail history data (see the sketch after this list).
- Streamlined Hadoop jobs and workflow operations by developing Oozie workflows and scheduling them through Autosys on a monthly basis.
- Used bzip2 compression to store the golden copies.
- Developed DMX-h jobs and tasks to process the input files and generate the load-ready files.
- Used Pig Latin to apply transformations on systems of record.
- Developed Pig scripts and UDFs as per the business logic.
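A minimal sketch of the dynamic-partition load behind the Hive bullet above, driven here from a Spark HiveContext for consistency with the other examples; in the project the statements lived in standalone Hive scripts, and the table and column names (retail_staging, retail_history, sales_month, store_id) are illustrative. The bucketed variant added CLUSTERED BY (store_id) INTO n BUCKETS to the same DDL and was loaded through Hive itself.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object RetailHistoryLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RetailHistoryLoad"))
    val hive = new HiveContext(sc)

    // Dynamic partitioning must be switched on explicitly before the insert.
    hive.sql("SET hive.exec.dynamic.partition = true")
    hive.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Illustrative history table, partitioned by month.
    hive.sql(
      """CREATE TABLE IF NOT EXISTS retail_history
        |  (store_id INT, sku STRING, sales_amt DOUBLE)
        |PARTITIONED BY (sales_month STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: Hive derives the sales_month partition
    // for each row from the data itself instead of a hard-coded value.
    hive.sql(
      """INSERT OVERWRITE TABLE retail_history PARTITION (sales_month)
        |SELECT store_id, sku, sales_amt, sales_month
        |FROM retail_staging""".stripMargin)

    sc.stop()
  }
}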
Environment: Scala, Hadoop, HDFS, UNIX shell scripting, MapReduce, Hive, Oozie with Hue, Apache Spark, MySQL, DMX-h, Autosys, DataStage 11.3, Pig, Teradata, Oracle
Confidential, NC
Big Data Developer
Responsibilities:
- Established, designed, and built a solution for replicating the batch job process as an ETL process using Agile development methodology.
- Migrated current-state mainframe batch processes to CSDP (Customer Data Integration Platform).
- This includes the inbound/outbound, CARI, reporting, and purge processes.
- Retrofitted ongoing BAU project changes as part of multiple new BAC projects.
- Cut over outbound clients from the mainframe to accept provisioned files from the CSDP platform.
- Built a data quality dashboard for critical KBEs (Key Business Elements).
- Captured metadata: business/technical terms and end-to-end data lineage of all the job flows developed.
- Ensured jobs on the CSDP platform meet or exceed the performance SLAs of the COBOL and mainframe production jobs.
- Performed client acceptance testing for inbound/outbound processes.
- Redesigned jobs to take advantage of DataStage features and adopt a better design solution.
Environment: DATASTAGE V8.5, Autosys, DB2, IMS database
Confidential, NC
Big Data Developer
Responsibilities:
- Migrated current-state mainframe batch processes in WCC Production to the WCC Gold Standard Adoption; this includes only outbound processes.
- Retrofitted ongoing BAU project changes (Jul/Aug independent releases).
- Cut over outbound clients from the mainframe to accept provisioned files from the WCC Gold Standard Adoption.
- Built a data quality dashboard for critical Key Business Elements provisioned by WCC. Captured metadata: business/technical terms and end-to-end data lineage of all the job flows developed.
- Ensured jobs on the WCC Gold Standard Adoption meet or exceed the performance SLAs of the COBOL and mainframe production jobs.
- Made WCC application changes in support of the migration and data provisioning (e.g., sunsetting mainframe jobs, changes in CA7 schedules, etc.).
- Performed client acceptance testing for outbound processes.
- Redesigned jobs to take advantage of DataStage features and adopt a better design solution.
Environment: DATASTAGE V8.5, Autosys, DB2, IMS database
Confidential
Big Data Developer
Responsibilities:
- Initial study of the system.
- Preparation of the design document.
- Preparation of job and sequence designs.
- Loading the data into the target system.
- Used DataStage Designer to develop various jobs to extract, cleanse, transform, integrate, and load data into the data warehouse.
- Performance tuning of DataStage job sequencers and jobs.
- Design/test case specifications, performance review, and coding of shared containers and reusable jobs.
- Unit test document preparation and unit testing.
Environment: DATASTAGE V8.5, SQL, PLSQL, DB2
Confidential
Big Data Developer
Responsibilities:
- Worked on DataStage client tools: Designer, Manager, and Director.
- Worked on both server jobs and parallel jobs.
- Worked as an ETL developer.
- Extracted source data from flat files and Oracle tables and loaded it into Oracle tables.
- Understood the source system and business logic.
- Worked on parallel stages: Transformer, Lookup, Join, Aggregator, and Remove Duplicates.
- Tested the jobs.
Environment: DATASTAGE 7.5, SQL, Oracle