
Apache Spark-scala Developer Resume


Wilmington, DE

PROFESSIONAL SUMMARY:

  • Overall 8.5 years of experience in Information Technology, with a strong background in analyzing, designing, developing, testing, implementing, and maintaining Database and Business Intelligence applications across verticals such as Insurance (Commercial and Personal Lines), Finance, and Retail.
  • Around 3 years of extensive experience in the Spark (RDDs, DataFrames & Spark SQL) and Hadoop (MapReduce) ecosystems, with Scala and Core Java as the underlying programming languages.
  • Experience working with the Cloudera and Hortonworks distributions on a 24-node cluster running Spark on YARN.
  • Hands-on knowledge of using Hive to extract, transform, and load (ETL) ~160 GB/month of policy and claims data into a reportable format in a Spark environment (a minimal sketch follows this list).
  • Experience importing and exporting gigabytes of data between HDFS and relational database systems (DB2 and MySQL) using Sqoop.
  • Hands-on knowledge of cleansing and transforming raw data into useful information and loading it into a Kafka queue (subsequently loaded to HDFS) and a NoSQL database for the UI team to display through a web application.
  • Good experience extracting data and generating statistical analyses with the Business Intelligence tool Tableau for better analysis of the data.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie, and of NoSQL databases such as HBase.
  • Solid experience in Mainframe Technologies MVS, z/OS, COBOL, JCL, VSAM, CICS and DB2.
  • Good exposure to On-site, Offshore, and Near-shore models, with strong coordination skills in handling On-site/Offshore teams.
  • Experience working with all stakeholders of a project, including Enterprise Architects, BI/DW Data Modelers, Data Analysts, Business Analysts, System Analysts, the PMO, and various levels of Business Users up to Business Executives and Top Management.
  • Experience in training Business Users in the developed/modified systems.
  • Well versed in software development methodologies such as Rapid Application Development (RAD), Agile, and Scrum.
  • Experience with Object Oriented Analysis and Design (OOAD) methodologies.
  • Directly involved in re-engineering 2 Mainframe applications into Hadoop and Spark to reduce mainframe MIPS and storage cost.
  • Experience in designing and tuning DB2 database tables with the help of the STROBE tool.
  • Experience with the Ispirer tool for analyzing DB2 data in a SQL Server environment.
  • Expert in supporting technical upgrade activities such as DB2, IMS, z/OS and ACF2 (security).
  • Experience with change management tools: JIRA, Maximo, SCCD, and ServiceNow.
  • Experience in designing and coordinating middle-layer services such as BizTalk, Direct Connect, and NDM.
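
A minimal Spark/Scala sketch of the Hive ETL pattern above (assuming Spark 2.x APIs; the database, table, and column names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Sketch only: monthly policy/claims ETL into a reportable Hive table.
object PolicyClaimsEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PolicyClaimsEtl")
      .enableHiveSupport()
      .getOrCreate()

    // Extract: raw policy and claims data landed in Hive by upstream jobs
    val policies = spark.table("raw.policy")
    val claims   = spark.table("raw.claims")

    // Transform: drop obviously bad records and join into a reportable shape
    val reportable = claims
      .filter(col("claim_amount") > 0)
      .join(policies, Seq("policy_id"))
      .withColumn("report_month", date_format(col("claim_date"), "yyyy-MM"))

    // Load: write back as a partitioned Hive table for reporting
    reportable.write
      .mode("overwrite")
      .partitionBy("report_month")
      .saveAsTable("reporting.policy_claims")

    spark.stop()
  }
}
```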

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Tez, Pig, Impala, Sqoop, Flume, Apache Kafka, Oozie and ZooKeeper.

Spark Ecosystem: Spark Core (RDDs, DataFrames and Datasets), Spark SQL

Operating System: UNIX, Linux, Windows XP, IBM z/OS

Databases: HBase, MySQL, DB2, IMS DB, SQL Server

Languages: C, Scala, Core Java, Pig Latin, Spark SQL, HiveQL, Shell Scripting

Development Tools: IntelliJ, Eclipse, NetBeans, SBT.

BI/ETL Tool: Tableau, SSIS (ETL)

Mainframe Skills: COBOL, JCL, VSAM, CICS, STROBE (for performance analysis), Ispirer, CONTROL-M, ENDEVOR, Panvalet, CA-7, ZEKE, File-Aid, Abend-Aid.

Applications: MS Project, Word, Excel, PowerPoint, SharePoint and Top Team.

CERTIFICATIONS:

Hadoop certification from Edureka

IBM Certified Database Associate (Exam 610): DB2 10.1 Fundamentals

INS21 (External) and General Insurance (Internal) certifications in the Insurance domain

Brainbench certifications in COBOL, JCL & DB2

Completed 30 hours of industrial training in Agile and Scrum.

PROFESSIONAL EXPERIENCE:

Confidential, Wilmington, DE

Apache Spark-Scala Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Configured, deployed, and maintained multi-node Dev and Test clusters.
  • Designed and implemented a stream-filtering system on top of Apache Kafka to reduce stream size (a minimal sketch follows this list).
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames/SQL/Datasets as well as RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (a second sketch follows this role's Environment line).
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, and Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcast variables, effective and efficient joins, and transformations during the ingestion process itself.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting Impala in the project.
  • Worked extensively with Sqoop for importing metadata from DB2.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for the Parquet and Avro file formats in Hive.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Good experience with continuous integration of the application using Jenkins.
  • Used reporting tools like Tableau, connected to Hive, to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
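
A minimal sketch of the Kafka stream-filtering stage mentioned above: consume a raw topic, drop records downstream consumers do not need, and republish the rest to a smaller topic. The broker address, topic names, and filter predicate are all hypothetical.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object StreamFilter {
  def main(args: Array[String]): Unit = {
    val brokers = "localhost:9092" // hypothetical broker address

    val cProps = new Properties()
    cProps.put("bootstrap.servers", brokers)
    cProps.put("group.id", "stream-filter")
    cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val pProps = new Properties()
    pProps.put("bootstrap.servers", brokers)
    pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val consumer = new KafkaConsumer[String, String](cProps)
    val producer = new KafkaProducer[String, String](pProps)
    consumer.subscribe(Collections.singletonList("events.raw"))

    while (true) {
      // Poll the raw topic, keep only matching records, republish the rest
      consumer.poll(Duration.ofMillis(500)).asScala.foreach { rec =>
        if (rec.value != null && rec.value.contains("\"type\":\"claim\""))
          producer.send(new ProducerRecord[String, String]("events.filtered", rec.key, rec.value))
      }
    }
  }
}
```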

Environment: Hadoop YARN, Spark Core, Spark SQL, Scala, Hive, Sqoop, Elastic Search, Impala, Tableau, Oozie, Jenkins, Cloudera, DB2 10.1, Linux, JIRA.
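
A Spark 1.6-style sketch of the aggregation-and-export pattern from this role, staging results in Hive so a downstream Sqoop export can push them to the OLTP system. Table, column, and band names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

object ClaimAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClaimAggregation"))
    val sqlContext = new HiveContext(sc)

    // Hypothetical UDF: bucket claim amounts into bands
    val band = udf((amount: Double) =>
      if (amount < 1000) "LOW" else if (amount < 10000) "MEDIUM" else "HIGH")

    val aggregated = sqlContext.table("claims.detail")
      .withColumn("amount_band", band(col("claim_amount")))
      .groupBy("policy_id", "amount_band")
      .agg(sum("claim_amount").as("total_amount"), count(lit(1)).as("claim_count"))

    // Stage in Hive; a separate `sqoop export` job pushes this to DB2
    aggregated.write.mode("overwrite").saveAsTable("staging.claim_summary")

    sc.stop()
  }
}
```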

Confidential, Wilmington, DE

Apache Spark-Scala Developer

Responsibilities:

  • Re-engineered the mainframe Causal file process in the Spark ecosystem using the core concepts of RDDs and DataFrames, with Scala as the underlying language.
  • Off-loaded the promotional data required for this application from mainframe DB2 to HDFS using Sqoop.
  • Converted the business rules maintained in SSIS packages into the Hive warehouse.
  • Worked on Hive partitioning and bucketing concepts and created Hive external and internal tables.
  • Performed iterative data validation and processing on Spark using Scala (a minimal sketch follows this list).
  • Once the report was created, the raw data was fed into the visualization tool Tableau for business research.
  • Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
  • Experience using the Avro, Parquet, ORC, and JSON file formats, and developed UDFs in Hive and Pig.
  • Drove the application from development to production using a Continuous Integration and Continuous Deployment (CI/CD) model with SBT and Jenkins.
  • Collaborated on insights with other Data Scientists, Business Analysts and Partners.
  • Involved in Unit Testing & Debugging.
  • Worked with offshore and onsite teams, with a total of 4 Big Data developers.
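
A minimal sketch of the iterative validation step above, assuming Spark 2.x APIs: a set of named rule predicates is run over the Sqoop-landed data and violations are counted per rule. The table name and rules are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PromoValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PromoValidation")
      .enableHiveSupport()
      .getOrCreate()

    val promo = spark.table("landing.promotions")

    // Each rule is a name plus a predicate the data must satisfy
    val rules = Seq(
      "non_null_promo_id" -> col("promo_id").isNotNull,
      "positive_discount" -> (col("discount_pct") > 0),
      "valid_date_range"  -> (col("start_date") <= col("end_date"))
    )

    // Iterate over the rules, counting violations for each
    rules.foreach { case (name, predicate) =>
      val violations = promo.filter(!predicate).count()
      println(s"rule=$name violations=$violations")
    }

    spark.stop()
  }
}
```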

Environment: Cloudera CDH 5.12 distribution, Spark, Scala, Tableau, Hive, HDFS, Sqoop, DB2, Shell scripting, LINUX, ServiceNow.

Confidential, Philadelphia, PA

Hadoop Developer

Responsibilities:

  • Worked on the proof-of-concept for the Apache Hadoop 0.20.2 framework initiation.
  • Installed and configured Hadoop clusters and eco-system.
  • Developed automated scripts to install Hadoop clusters.
  • Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode.
  • Mapped the DB2 V9.7 and V10.x data types to Hive data types and performed validations.
  • Developed Hive jobs to transfer 8 years of bulk data from DB2 and MS SQL Server to the HDFS layer.
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
  • Built a job automation framework to support and operationalize data loads.
  • Automated the DDL creation process in Hive by mapping the DB2 data types (a minimal sketch follows this list).
  • Monitored Hadoop cluster job performance and capacity planning.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Had experience in the Hadoop framework, HDFS, and MapReduce processing implementation.
  • Tuned Hadoop performance for high availability and was involved in the recovery of Hadoop clusters.
  • Responsible for coding Java batch programs, RESTful services, MapReduce programs, and Hive queries, as well as testing, debugging, peer code review, troubleshooting, and maintaining status reports.
  • Used the Avro and Parquet file formats for serialization of data.
  • Developed several test cases using MRUnit for testing MapReduce applications.
  • Used the bzip2 compression technique to compress files before loading them into Hive.
  • Experience using HBase as the backend database for application development.
  • Supported and troubleshot Hive programs running on the cluster, and was involved in fixing issues arising out of duration testing.
  • Prepared daily and weekly project status reports and shared them with the client.
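
A minimal Scala sketch of the automated DDL generation above: map DB2 column types to Hive types and emit a CREATE TABLE statement. The type map and sample columns are hypothetical and deliberately incomplete.

```scala
object HiveDdlGenerator {
  // Hypothetical, partial DB2-to-Hive type mapping
  val db2ToHive: Map[String, String] = Map(
    "INTEGER"   -> "INT",
    "SMALLINT"  -> "SMALLINT",
    "BIGINT"    -> "BIGINT",
    "DECIMAL"   -> "DECIMAL",
    "CHAR"      -> "STRING",
    "VARCHAR"   -> "STRING",
    "DATE"      -> "DATE",
    "TIMESTAMP" -> "TIMESTAMP"
  )

  // Render a Hive DDL statement from (column name, DB2 type) pairs
  def ddl(table: String, columns: Seq[(String, String)]): String = {
    val cols = columns
      .map { case (name, db2Type) => s"  $name ${db2ToHive.getOrElse(db2Type, "STRING")}" }
      .mkString(",\n")
    s"CREATE EXTERNAL TABLE $table (\n$cols\n) STORED AS PARQUET;"
  }

  def main(args: Array[String]): Unit =
    println(ddl("policy", Seq("policy_id" -> "INTEGER", "holder_name" -> "VARCHAR")))
}
```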

Environment: Hadoop, MapReduce, Flume, Sqoop, Hive, Pig, Web Services, Linux, Core Java, Informatica, HBase, Avro, Git, Cloudera, MRUnit, MS SQL Server, UNIX, DB2, JIRA.

Confidential, Philadelphia, PA

DB2 Developer and Analyst

Responsibilities:

  • Developed a revised design model to overcome the performance bottlenecks. At a high level, the design extracts and transforms information from SQL Server through an SSIS package, then loads the data into DB2 tables via mainframe batch.
  • Worked as a programmer to develop the system, in which 30 million records are loaded in 5 minutes.
  • Coded complex DB2 stored-procedure logic to convert the SSIS business rules.
  • Used DB2's EXCHANGE command to exchange data between the clone and base tables.

Technologies: DB2 Database, JCL, SQL Server, SSIS Package, FTP

Confidential, Wilmington, DE

Senior Mainframe Developer

Responsibilities:

  • Measured, analyzed, and improved the current Confidential batch where there were performance bottlenecks, in both functionality and logical flow.
  • Lowered CPU consumption and DB2 usage, eventually reducing the batch elapsed time in both the online and nightly runs.
  • Minimized DB2 usage based on the following approach:
  • Removed unnecessary and redundant SQL based on STROBE reports.
  • Improved the efficiency of the SQL by using DB2 best practices.
  • Removed DISPLAY statements wherever applicable.
  • Decommissioned obsolete conditions and validations.

Technologies/Tools/Utilities: DB2 Database, COBOL, JCL, Strobe, Explain, Code Profiling

Confidential, Wilmington, DE

Senior Mainframe Developer

Responsibilities:

  • Pitched the idea to the business and presented the proposal.
  • Worked as lead developer and SME to guide the project and implement it successfully.

Technologies: COBOL, DB2, JCL, SQL Server, .NET

Tools/Utilities: CA-7, Strobe, Panvalet

Confidential

Mainframe Developer

Responsibilities:

  • Measured, analyzed, and improved the current Confidential batch where there were performance bottlenecks, in both functionality and logical flow.
  • Lowered CPU consumption and DB2 usage, eventually reducing the batch elapsed time in both the online and nightly runs.
  • Minimized DB2 usage based on the following approach:
  • Removed unnecessary and redundant SQL based on STROBE reports.
  • Improved the efficiency of the SQL by using DB2 best practices.
  • Removed DISPLAY statements wherever applicable.
  • Decommissioned obsolete conditions and validations.

Technologies: DB2 Database, COBOL, SyncSort, Easytrieve, JCL

Tools/Utilities: Omegamon, ZEKE.

Confidential

Mainframe Developer

Responsibilities:

  • Involved as the SME of the Confidential application in migrating the project from the mainframe to .NET and BizTalk; also acted as a coordinator between teams working on different technology layers such as mainframe, .NET, and BizTalk.
  • Coding (developing code for form-specific sub-modules based on the work request).
  • Unit and Integration Testing.

Technologies: DB2 & IMS Database, COBOL, JCL, VSAM, BizTalk.

Tools/Utilities: SYNCSORT, Ispirer

Confidential

Mainframe Developer

Responsibilities:

  • The main objective was to generate forms for a particular policy based on work requests logged in Bug-tracker; we ran the nightly policy print batch cycle to generate the extract file.
  • Coding (developing code for form-specific sub-modules based on the work request).
  • Unit and Integration Testing.
  • Enhance functionality:
  • Develop new reporting systems for state regulatory bodies.
  • Add new reports to be submitted to federal and state regulatory bodies.
  • Build the basis for future extension of the solution (e.g., automation of updating dates).
  • Application maintenance:
  • Maintain the existing application, which includes maintaining the code, databases, and files.
  • Improve efficiency:
  • Automate manual date verification before triggering jobs, where possible.
