Sr. Spark / Scala Developer Resume

Minneapolis, MN

PROFESSIONAL SUMMARY:

  • Around 8 years of IT/business solutions experience in the analysis, design, development, and implementation of cost-effective, high-quality, high-performance, and innovative technology solutions in the Healthcare, Insurance, and IT sectors
  • 4+ years of experience with BIG DATA using the HADOOP framework and related technologies such as HDFS, HBASE, MapReduce, HIVE, PIG, FLUME, KAFKA, OOZIE, SQOOP, AVRO, and ZOOKEEPER
  • Well versed in configuring and administering Hadoop clusters using Cloudera and Hortonworks
  • Experience in creating real-time data streaming solutions using Apache Spark / Spark Streaming / Apache Storm, Kafka, and Flume
  • Currently working on Spark applications, using Scala as the main programming language
  • Processing streaming data using the Spark Streaming API with Scala
  • Used Spark DataFrames, Spark SQL, and the RDD API for various data transformations and dataset building
  • Used the Scala collections framework to store and process metadata and other related information
  • Experience using the DataStax Spark-Cassandra Connector to read data from Cassandra tables and process it with Apache Spark
  • Exposure to Data Lake implementations; developed data pipelines and applied business logic using Apache Spark
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing
  • Hands-on experience with real-time processing on NoSQL databases such as MongoDB, HBase, and Cassandra
  • Experience creating MongoDB clusters; hands-on experience with complex MongoDB aggregation functions and mappings
  • Designed data models in Cassandra and worked with Cassandra Query Language (CQL)
  • Worked with NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration
  • Experience using Flume to load log files into HDFS and Oozie for data scrubbing and processing
  • Experienced in handling skewed and sparse data using Pig Latin scripts and joins
  • Experience in performance tuning of Hive queries and Java MapReduce programs for scalability and faster execution
  • Experienced in handling real-time analytics using HBase on top of HDFS data
  • Experience with transformations, grouping, aggregations, and joins using the Kafka Streams API
  • Hands-on experience deploying Kafka Connect in standalone and distributed mode and creating Docker containers
  • Created topics and wrote Kafka producers and consumers in Java as required; developed Kafka source/sink connectors to stream new data into topics and from topics into different databases via ETL tasks; also used the Akka toolkit with Scala for some builds
  • Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager
  • Knowledge of Storm architecture; experience using data modeling tools such as Erwin
  • Excellent experience in using scheduling tools to automate batch jobs
  • Hands-on experience using Apache Solr/Lucene
  • Expertise with SQL Server: SQL queries, stored procedures, and functions
  • Hands-on experience in application development using Java, Hadoop, RDBMS, and Linux shell scripting
  • Oracle Java Certified Professional and AWS Certified Solutions Architect - Associate
  • Strong experience extending Hive and Pig core functionality by writing custom UDFs
  • Experience in software design, development, and implementation of client/server web applications
  • Hands-on experience using Angular 4, jQuery, JavaScript, JavaBeans, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML, AJAX, Spring Boot, and Hibernate
  • Excellent knowledge of object-oriented design and development
  • Ability to work both in a team and individually on many cutting-edge technologies, with excellent management skills, business understanding, and strong communication skills
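As an illustration of the Scala-collections bullet above, a minimal sketch of the kind of metadata bookkeeping described; the `ColumnMeta` case class and the sample table entries are hypothetical, not drawn from any actual project:

```scala
// Sketch: using immutable Scala collections to store and query table
// metadata before building datasets. All names and entries are
// hypothetical illustrations.
case class ColumnMeta(table: String, column: String, dataType: String, nullable: Boolean)

object MetadataStore {
  private val columns: List[ColumnMeta] = List(
    ColumnMeta("claims",  "claim_id",  "bigint", nullable = false),
    ColumnMeta("claims",  "member_id", "bigint", nullable = false),
    ColumnMeta("claims",  "notes",     "string", nullable = true),
    ColumnMeta("members", "member_id", "bigint", nullable = false)
  )

  // Group column metadata by table name for quick lookup
  val byTable: Map[String, List[ColumnMeta]] = columns.groupBy(_.table)

  // Columns that must be populated when building a dataset for a table
  def requiredColumns(table: String): List[String] =
    byTable.getOrElse(table, Nil).filterNot(_.nullable).map(_.column)
}
```

Keeping the metadata in immutable collections makes it safe to share across transformation steps without synchronization.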

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, Oozie, Zookeeper, Splunk, Hortonworks, Cloudera

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse, NetBeans, IntelliJ, PyCharm

ETL: Talend, SSIS

Frameworks: MVC, Struts, Hibernate, Spring, Spring Boot

Programming languages: C, C++, Java, Python, Scala, Linux shell scripts

Databases: RDBMS (MySQL, DB2, MS SQL Server, PostgreSQL); NoSQL (MongoDB, HBase, Cassandra)

Amazon Web Services: EMR, EC2, S3, RDS, CloudSearch, Redshift, Data Pipeline, Lambda

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, WSDL, Angular 4, SOAP/REST

Build Tools: Ant, Maven, Gradle

Version Control: Git and TortoiseSVN

Cloud: AWS, Microsoft Azure

Development Methodologies: Agile, Scrum, Waterfall

WORK EXPERIENCE:

Sr. Spark / Scala Developer

Confidential, Minneapolis, MN

Responsibilities:

  • Worked with Spark to improve performance and optimize existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and Pair RDDs
  • Developed a preprocessing job using Spark DataFrames to transform JSON documents into flat files
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response
  • Processed big data with Amazon EMR across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with storage on Amazon Simple Storage Service (S3)
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs
  • Worked on Big Data infrastructure for batch processing and real-time processing using Apache Spark
  • Developed Apache Spark applications by using Scala for data processing from various streaming sources
  • Processed web server logs by developing multi-hop Flume agents using an Avro sink, loaded the data into Cassandra for further analysis, and extracted files from Cassandra through Flume
  • Responsible for the design and development of Spark SQL scripts based on functional specifications
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and Cassandra
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing
  • Developed helper classes for abstracting Cassandra cluster connections, which act as a core toolkit
  • Involved in creating a Data Lake by extracting customers' data from various sources into HDFS, including data from Excel, databases, and server logs
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class
  • Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them using Hive
  • Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive tables
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
  • Extended Hive/Pig core functionality using custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs)
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers
  • Used Apache Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging
  • Partitioned data streams using Kafka; designed and configured the Kafka cluster to accommodate heavy throughput
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
  • Used Apache Oozie for scheduling and managing multiple Hive jobs; knowledge of HCatalog for Hadoop-based storage management
  • Migrated an existing on-premises application to Amazon Web Services (AWS), used services such as EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR
  • Developed solutions to preprocess large sets of structured and semi-structured data in file formats such as Text, Avro, Sequence, XML, JSON, and Parquet
  • Generated various reports using Pentaho and Tableau based on client specifications
  • Gained exposure to tools such as Jenkins, Chef, and RabbitMQ
  • Worked with the Scrum team to deliver agreed user stories on time in every sprint
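The JSON-to-flat-file preprocessing described above reduces to a recursive walk over a nested structure. A minimal pure-Scala sketch of that flattening step (the actual job used Spark DataFrames; the nested map below is a hypothetical stand-in for a parsed JSON document):

```scala
// Sketch of JSON-flattening logic: recursively walk a nested Map
// (standing in for a parsed JSON document) and emit dotted flat keys,
// as one would before writing a flat file. Example data is hypothetical.
def flatten(doc: Map[String, Any], prefix: String = ""): Map[String, Any] =
  doc.flatMap {
    case (k, v: Map[String, Any] @unchecked) => flatten(v, s"$prefix$k.")
    case (k, v)                              => Map(s"$prefix$k" -> v)
  }

val record = Map(
  "id"   -> 42,
  "name" -> Map("first" -> "Ada", "last" -> "Lovelace")
)

val flat = flatten(record)
// flat("name.first") == "Ada"; flat("id") == 42
```

In the Spark version the same per-record logic would run inside a DataFrame transformation rather than over an in-memory map.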

Environment: Hadoop, MapReduce, HDFS, YARN, Hive, Sqoop, Oozie, Spark, Scala, AWS, EC2, S3, EMR, Cassandra, Flume, Kafka, Pig, Linux, Shell Scripting

Sr. Spark / Big Data Developer

Confidential, Worcester, MA

Responsibilities:

  • Built a scalable distributed Hadoop cluster running Hortonworks Data Platform (HDP 2.6)
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark SQL, and Pair RDDs
  • Serialized JSON data and stored it in tables using Spark SQL
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations to build the common learner data model, and store the data in a NoSQL store (HBase)
  • Good knowledge of the Spark framework for both batch and real-time data processing
  • Hands-on experience with Spark MLlib, used for predictive intelligence, customer segmentation, and smooth maintenance in Spark Streaming
  • Developed Spark Streaming programs that take data from Kafka and push it to different sinks
  • Loaded data from different sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and into partitioned Hive tables
  • Created various Pig scripts and converted them into shell commands to provide aliases for common operations in the project's business flow
  • Implemented partitioning and bucketing in Hive for better organization of the data
  • Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes
  • Created several Hive UDFs to hide or abstract complex, repetitive rules
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables
  • Developed Bash scripts to fetch log files from the FTP server and process them for loading into Hive tables
  • All Bash scripts are scheduled using the Resource Manager scheduler
  • Developed MapReduce programs to apply business rules to the data
  • Developed a NiFi workflow to pick up data from the Data Lake as well as from servers and send it to the Kafka broker
  • Involved in loading and transforming large sets of structured data from router locations to the EDW using an Apache NiFi data pipeline flow
  • Implemented a Kafka event-log producer to publish logs to a Kafka topic, which is consumed by the ELK (Elasticsearch, Logstash, Kibana) stack to analyze logs produced by the Hadoop cluster
  • Implemented Apache Kafka as a replacement for a more traditional message broker (JMS, Solace) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages
  • Implemented the receiver-based approach in Spark Streaming, linking with the StreamingContext via the Java API and handling proper closing and waiting stages
  • Experience implementing rack topology scripts for the Hadoop cluster
  • Resolved issues related to the old Hazelcast EntryProcessor API
  • Used the Akka toolkit with Scala for a few builds
  • Excellent knowledge of the Talend Administration Console, Talend installation, and the use of context and global map variables in Talend
  • Used dashboard tools such as Tableau
  • Knowledge of Splunk architecture and its components (indexer, forwarder, search head, deployment server)
  • Used the Talend Administration Console Job Conductor to schedule ETL jobs on a daily and weekly basis

Environment: Hadoop HDP, Linux, MapReduce, HBase, HDFS, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Maven, Splunk, open-source technologies (Apache Kafka, Apache Spark, Hazelcast), Git, Talend.

Hadoop Developer

Confidential, Anoka, MN

Responsibilities:

  • Handled data migrations from diversified databases into HDFS and Hive using Sqoop
  • Wrote a Java program using Apache Tika to extract metadata and content from documents, perform cleansing, build clusters, and save the results to MongoDB
  • Implemented dynamic partitions and buckets in Hive for efficient data access
  • Utilized Hive to process large volumes of provider information
  • Used Cloudera Impala as a SQL engine for processing data stored in HBase and HDFS
  • Automated jobs using Linux shell scripts
  • Built an AWS EC2 instance and migrated data to the cloud
  • Developed MapReduce jobs in Java for data cleansing and preprocessing
  • Moved data between DB2/Oracle Exadata and HDFS using Sqoop
  • Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis
  • Created S3 buckets, managed their policies, and utilized S3 and Glacier for archival storage and backup on AWS
  • Implemented AWS Lambda workflows in Python to interact with applications deployed on EC2 instances and S3 buckets
  • Worked with different file formats and compression techniques to determine standards
  • Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality
  • Developed user-defined functions in Pig
  • Analyzed and transformed data with Hive and Pig
  • Developed Hive queries and UDFs to analyze and transform the data in HDFS
  • Developed Hive scripts to implement control-table logic in HDFS
  • Designed and implemented partitioning (static and dynamic) and buckets in Hive
  • Developed Pig scripts and UDFs per the business logic
  • Developed Oozie workflows, scheduled monthly through a scheduler
  • Wrote Python scripts to parse XML documents and load the data into a database
  • Enhanced existing modules written in Python
  • Excellent knowledge of Python collections and multithreading
  • Designed and developed read-lock capability in HDFS
  • Implemented a Hadoop Float equivalent to the DB2 Decimal type
  • Involved in the end-to-end implementation of ETL logic
  • Coordinated effectively with the offshore team and managed project deliverables on time

Environment: Hadoop ecosystem (MapReduce, HDFS, YARN, Hive, Sqoop, Pig Latin, ZooKeeper, Oozie), NoSQL databases (HBase, MongoDB), MySQL, Hortonworks, JIRA, Linux Shell Scripting, Teradata, Eclipse, Java, Python

Hadoop Developer

Confidential, Jefferson, LA

Responsibilities:

  • Hands-on experience loading data from the UNIX file system and Teradata to HDFS
  • Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster on Cloudera's CDH distribution
  • Developed Pig scripts to process semi-structured data using sorting, joins, and grouping
  • Developed Java MapReduce programs on log data to transform it into a structured form for deriving user location, age group, and time spent
  • Collected and aggregated large amounts of weblog data from different sources using Apache Flume and stored the data in HDFS for analysis
  • Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API
  • Used Docker containers in the development environment
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
  • Developed ETL using Hive, Oozie, shell scripts, and Sqoop, and analyzed the weblog data using HiveQL
  • Supported data analysts in running MapReduce programs
  • Experienced in working with Avro data files using the Avro serialization system
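The weblog transformation above boils down to per-line field extraction. An illustrative sketch in Scala for brevity (the actual job used Java MapReduce; the log format and field names below are assumptions, with a hypothetical sample line):

```scala
// Sketch: extract structured fields from an Apache common-log-format line,
// the kind of per-record transform the MapReduce mapper performs.
// The sample line and field choices are illustrative assumptions.
val logPattern =
  """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)""".r

case class LogRecord(ip: String, timestamp: String, method: String,
                     path: String, status: Int)

// Return None for malformed lines instead of failing the whole job
def parseLine(line: String): Option[LogRecord] = line match {
  case logPattern(ip, ts, method, path, status, _) =>
    Some(LogRecord(ip, ts, method, path, status.toInt))
  case _ => None
}

val sample =
  """127.0.0.1 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326"""
```

Returning `Option` keeps malformed records from aborting the batch; in a mapper the `None` cases would typically be counted and dropped.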

Environment: Cloudera Distribution (CDH), HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Avro, Oozie, ETL, and Unix Shell Scripting.

Java Developer

Confidential

Responsibilities:

  • Developed client-side UI changes using JSP, JavaScript, AngularJS, jQuery, HTML, CSS, AJAX, Spring MVC, Spring IoC, Spring JDBC, web services, XML, MS SQL, SOAP, XSD, JSON, and Log4j
  • Used Spring Boot, configured application context files, and performed database object mapping using Hibernate annotations
  • Used Spring dependency injection to make the application easy to test and integrate
  • Designed and developed the application using the MVC architecture
  • Implemented the business logic using Spring MVC and Hibernate
  • Implemented the presentation layer using HTML, CSS, and JSP
  • Worked on the server-side implementation using Spring Core and Spring annotations, with navigation from the presentation layer to other layers via Spring MVC, and integrated Spring with Hibernate using HibernateTemplate to implement the persistence layer
  • Implemented CRUD operations via SQL queries against the MySQL database
  • Used a RESTful client to interact with the services via RESTful URL mappings
  • Performed backend unit testing using JUnit
  • Wrote queries and stored procedures and created tables using SQL Server 2008
  • Created single-page client-side applications using JavaScript, HTML, CSS, and Bootstrap
  • Worked on creating SOAP/REST services using Java and Spring
  • Developed the view models and controller MVC action methods to fetch data from the back-end services and send it as JSON objects to the views
  • Developed servlets to process update information
  • Used JDBC for communicating with the database
  • Involved in UI/UX work; improved core website functionality by fixing broken links and scripting errors; sent reports to all clients (up-to-date server history, ticket summaries, daily server reports)

Work Environment: J2EE, Struts, Spring, Hibernate, JavaBeans, Servlets, SQL, Oracle, HTML, CSS, Bootstrap, JavaScript, Tomcat, XML, JSON