Sr. Spark / Scala Developer Resume

Minneapolis, MN

PROFESSIONAL SUMMARY:

  • Around 8 years of IT/business solutions experience in the analysis, design, development, and implementation of cost-effective, high-quality, high-performance, and innovative technology solutions in the Healthcare, Insurance, and IT sectors
  • 4+ years of experience with BIG DATA using the HADOOP framework and related technologies such as HDFS, HBASE, MapReduce, HIVE, PIG, FLUME, KAFKA, OOZIE, SQOOP, AVRO, and ZOOKEEPER
  • Well versed in configuring and administering Hadoop clusters using Cloudera and Hortonworks
  • Experience in creating real-time data streaming solutions using Apache Spark / Spark Streaming / Apache Storm, Kafka, and Flume
  • Currently working on Spark applications, using Scala as the main programming language
  • Processing streaming data using the Spark Streaming API with Scala
  • Used Spark DataFrames, Spark SQL, and the RDD API for various data transformations and dataset building
  • Used the Scala collections framework to store and process metadata and other related information
  • Experience using the DataStax Spark-Cassandra Connector to read data from Cassandra tables and process it with Apache Spark
  • Exposure to Data Lake implementations; developed data pipelines and applied business logic using Apache Spark
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing
  • Hands-on experience with real-time processing on NoSQL databases such as MongoDB, HBase, and Cassandra
  • Experience creating MongoDB clusters; hands-on experience with complex MongoDB aggregation functions and mappings
  • Designed data models in Cassandra and worked with Cassandra Query Language (CQL)
  • Worked with NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration
  • Experience using Flume to load log files into HDFS and Oozie for data scrubbing and processing
  • Experienced in handling skewed and sparse data using Pig Latin scripts and joins
  • Experience in performance tuning of Hive queries and Java MapReduce programs for scalability and faster execution
  • Experienced in handling real-time analytics using HBase on top of HDFS data
  • Experience with transformations, grouping, aggregations, and joins using the Kafka Streams API
  • Hands-on experience deploying Kafka Connect in standalone and distributed mode and creating Docker containers
  • Created topics and wrote Kafka producers and consumers in Java as required; developed Kafka source/sink connectors to stream new data into topics and from topics into different databases via ETL tasks; also used the Akka toolkit with Scala for some builds
  • Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager
  • Knowledge of Storm architecture; experience using data modeling tools such as Erwin
  • Excellent experience in using scheduling tools to automate batch jobs
  • Hands-on experience using Apache Solr/Lucene
  • Expertise with SQL Server: SQL queries, stored procedures, and functions
  • Hands-on experience in application development using Java, Hadoop, RDBMS, and Linux shell scripting
  • Oracle Java Certified Professional and AWS Certified Solutions Architect - Associate
  • Strong experience extending Hive and Pig core functionality by writing custom UDFs
  • Experience in software design, development, and implementation of client/server web applications
  • Hands-on experience using Angular 4, jQuery, JavaScript, JavaBeans, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML, AJAX, Spring Boot, and Hibernate
  • Excellent knowledge of object-oriented design and development
  • Ability to work both in a team and individually on many cutting-edge technologies, with excellent management skills, business understanding, and strong communication skills
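As an illustration of the Scala-collections bullet above, a minimal sketch of the kind of metadata bookkeeping described; the `ColumnMeta` case class and the sample table entries are hypothetical, not drawn from any actual project:

```scala
// Sketch: using immutable Scala collections to store and query table
// metadata before building datasets. All names and entries are
// hypothetical illustrations.
case class ColumnMeta(table: String, column: String, dataType: String, nullable: Boolean)

object MetadataStore {
  private val columns: List[ColumnMeta] = List(
    ColumnMeta("claims",  "claim_id",  "bigint", nullable = false),
    ColumnMeta("claims",  "member_id", "bigint", nullable = false),
    ColumnMeta("claims",  "notes",     "string", nullable = true),
    ColumnMeta("members", "member_id", "bigint", nullable = false)
  )

  // Group column metadata by table name for quick lookup
  val byTable: Map[String, List[ColumnMeta]] = columns.groupBy(_.table)

  // Columns that must be populated when building a dataset for a table
  def requiredColumns(table: String): List[String] =
    byTable.getOrElse(table, Nil).filterNot(_.nullable).map(_.column)
}
```

Keeping the metadata in immutable collections makes it safe to share across transformation steps without synchronization.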

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, Oozie, Zookeeper, Splunk, Hortonworks, Cloudera

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse, NetBeans, IntelliJ, PyCharm

ETL: Talend, SSIS

Frameworks: MVC, Struts, Hibernate, Spring, Spring Boot

Programming languages: C, C++, Java, Python, Scala, Linux shell scripts

Databases: RDBMS (MySQL, DB2, MS SQL Server, PostgreSQL); NoSQL (MongoDB, HBase, Cassandra)

Amazon Web Services: EMR, EC2, S3, RDS, CloudSearch, Redshift, Data Pipeline, Lambda

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, WSDL, Angular 4, SOAP/REST

Build Tools: Ant, Maven, Gradle

Version Control: Git and TortoiseSVN

Cloud: AWS, Microsoft Azure

Development Methodologies: Agile, Scrum, Waterfall

WORK EXPERIENCE:

Sr. Spark / Scala Developer

Confidential, Minneapolis, MN

Responsibilities:

  • Worked with Spark to improve performance and optimize existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and Pair RDDs
  • Developed a preprocessing job using Spark DataFrames to transform JSON documents into flat files
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response
  • Processed big data with Amazon EMR across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with storage on Amazon Simple Storage Service (S3)
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs
  • Worked on Big Data infrastructure for batch processing and real-time processing using Apache Spark
  • Developed Apache Spark applications by using Scala for data processing from various streaming sources
  • Processed web server logs by developing multi-hop Flume agents using an Avro sink, loaded the data into Cassandra for further analysis, and extracted files from Cassandra through Flume
  • Responsible for the design and development of Spark SQL scripts based on functional specifications
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and Cassandra
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing
  • Developed helper classes for abstracting Cassandra cluster connections, which act as a core toolkit
  • Involved in creating a Data Lake by extracting customers' data from various sources into HDFS, including data from Excel, databases, and server logs
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class
  • Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them using Hive
  • Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive tables
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
  • Extended Hive/Pig core functionality using custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs)
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers
  • Used Apache Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging
  • Partitioned data streams using Kafka; designed and configured the Kafka cluster to accommodate heavy throughput
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
  • Used Apache Oozie for scheduling and managing multiple Hive jobs; knowledge of HCatalog for Hadoop-based storage management
  • Migrated an existing on-premises application to Amazon Web Services (AWS), used services such as EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR
  • Developed solutions to preprocess large sets of structured and semi-structured data in file formats such as Text, Avro, Sequence, XML, JSON, and Parquet
  • Generated various reports using Pentaho and Tableau based on client specifications
  • Gained exposure to tools such as Jenkins, Chef, and RabbitMQ
  • Worked with the Scrum team to deliver agreed user stories on time in every sprint
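The JSON-to-flat-file preprocessing described above reduces to a recursive walk over a nested structure. A minimal pure-Scala sketch of that flattening step (the actual job used Spark DataFrames; the nested map below is a hypothetical stand-in for a parsed JSON document):

```scala
// Sketch of JSON-flattening logic: recursively walk a nested Map
// (standing in for a parsed JSON document) and emit dotted flat keys,
// as one would before writing a flat file. Example data is hypothetical.
def flatten(doc: Map[String, Any], prefix: String = ""): Map[String, Any] =
  doc.flatMap {
    case (k, v: Map[String, Any] @unchecked) => flatten(v, s"$prefix$k.")
    case (k, v)                              => Map(s"$prefix$k" -> v)
  }

val record = Map(
  "id"   -> 42,
  "name" -> Map("first" -> "Ada", "last" -> "Lovelace")
)

val flat = flatten(record)
// flat("name.first") == "Ada"; flat("id") == 42
```

In the Spark version the same per-record logic would run inside a DataFrame transformation rather than over an in-memory map.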

Environment: Hadoop, MapReduce, HDFS, YARN, Hive, Sqoop, Oozie, Spark, Scala, AWS, EC2, S3, EMR, Cassandra, Flume, Kafka, Pig, Linux, Shell Scripting

Sr. Spark / Big Data Developer

Confidential, Worcester, MA

Responsibilities:

  • Built a scalable distributed Hadoop cluster running Hortonworks Data Platform (HDP 2.6)
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark SQL, and Pair RDDs
  • Serialized JSON data and stored it in tables using Spark SQL
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations to build the common learner data model, and store the data in a NoSQL store (HBase)
  • Good knowledge of the Spark framework for both batch and real-time data processing
  • Hands-on experience with Spark MLlib, used for predictive intelligence, customer segmentation, and smooth maintenance in Spark Streaming
  • Developed Spark Streaming programs that take data from Kafka and push it to different sinks
  • Loaded data from different sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and into partitioned Hive tables
  • Created various Pig scripts and converted them into shell commands to provide aliases for common operations in the project's business flow
  • Implemented partitioning and bucketing in Hive for better organization of the data
  • Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes
  • Created several Hive UDFs to hide or abstract complex, repetitive rules
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables
  • Developed Bash scripts to fetch log files from the FTP server and process them for loading into Hive tables
  • All Bash scripts are scheduled using the Resource Manager scheduler
  • Developed MapReduce programs to apply business rules to the data
  • Developed a NiFi workflow to pick up data from the Data Lake as well as from servers and send it to the Kafka broker
  • Involved in loading and transforming large sets of structured data from router locations to the EDW using an Apache NiFi data pipeline flow
  • Implemented a Kafka event-log producer to publish logs to a Kafka topic, which is consumed by the ELK (Elasticsearch, Logstash, Kibana) stack to analyze logs produced by the Hadoop cluster
  • Implemented Apache Kafka as a replacement for a more traditional message broker (JMS, Solace) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages
  • Implemented the receiver-based approach in Spark Streaming, linking with the StreamingContext via the Java API and handling proper closing and waiting stages
  • Experience implementing rack topology scripts for the Hadoop cluster
  • Resolved issues related to the old Hazelcast EntryProcessor API
  • Used the Akka toolkit with Scala for a few builds
  • Excellent knowledge of the Talend Administration Console, Talend installation, and the use of context and global map variables in Talend
  • Used dashboard tools such as Tableau
  • Knowledge of Splunk architecture and its components (indexer, forwarder, search head, deployment server)
  • Used the Talend Administration Console Job Conductor to schedule ETL jobs on a daily and weekly basis

Environment: Hadoop HDP, Linux, MapReduce, HBase, HDFS, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Maven, Splunk, open-source technologies (Apache Kafka, Apache Spark, Hazelcast), Git, Talend.

Hadoop Developer

Confidential, Anoka, MN

Responsibilities:

  • Handled data migrations from diversified databases into HDFS and Hive using Sqoop
  • Wrote a Java program using Apache Tika to extract metadata and content from documents, perform cleansing, build clusters, and save the results to MongoDB
  • Implemented dynamic partitions and buckets in Hive for efficient data access
  • Utilized Hive to process large volumes of provider information
  • Used Cloudera Impala as a SQL engine for processing data stored in HBase and HDFS
  • Automated jobs using Linux shell scripts
  • Built an AWS EC2 instance and migrated data to the cloud
  • Developed MapReduce jobs in Java for data cleansing and preprocessing
  • Moved data between DB2/Oracle Exadata and HDFS using Sqoop
  • Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis
  • Created S3 buckets, managed their policies, and utilized S3 and Glacier for archival storage and backup on AWS
  • Implemented AWS Lambda workflows in Python to interact with applications deployed on EC2 instances and S3 buckets
  • Worked with different file formats and compression techniques to determine standards
  • Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality
  • Developed user-defined functions in Pig
  • Analyzed and transformed data with Hive and Pig
  • Developed Hive queries and UDFs to analyze and transform the data in HDFS
  • Developed Hive scripts to implement control-table logic in HDFS
  • Designed and implemented partitioning (static and dynamic) and buckets in Hive
  • Developed Pig scripts and UDFs per the business logic
  • Developed Oozie workflows, scheduled monthly through a scheduler
  • Wrote Python scripts to parse XML documents and load the data into a database
  • Enhanced existing modules written in Python
  • Excellent knowledge of Python collections and multithreading
  • Designed and developed read-lock capability in HDFS
  • Implemented a Hadoop Float equivalent to the DB2 Decimal type
  • Involved in the end-to-end implementation of ETL logic
  • Coordinated effectively with the offshore team and managed project deliverables on time

Environment: Hadoop ecosystem (MapReduce, HDFS, YARN, Hive, Sqoop, Pig Latin, ZooKeeper, Oozie), NoSQL databases (HBase, MongoDB), MySQL, Hortonworks, JIRA, Linux Shell Scripting, Teradata, Eclipse, Java, Python

Hadoop Developer

Confidential, Jefferson, LA

Responsibilities:

  • Hands-on experience loading data from the UNIX file system and Teradata to HDFS
  • Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster on Cloudera's CDH distribution
  • Developed Pig scripts to process semi-structured data using sorting, joins, and grouping
  • Developed Java MapReduce programs on log data to transform it into a structured form for deriving user location, age group, and time spent
  • Collected and aggregated large amounts of weblog data from different sources using Apache Flume and stored the data in HDFS for analysis
  • Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API
  • Used Docker containers in the development environment
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
  • Developed ETL using Hive, Oozie, shell scripts, and Sqoop, and analyzed the weblog data using HiveQL
  • Supported data analysts in running MapReduce programs
  • Experienced in working with Avro data files using the Avro serialization system
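The weblog transformation above boils down to per-line field extraction. An illustrative sketch in Scala for brevity (the actual job used Java MapReduce; the log format and field names below are assumptions, with a hypothetical sample line):

```scala
// Sketch: extract structured fields from an Apache common-log-format line,
// the kind of per-record transform the MapReduce mapper performs.
// The sample line and field choices are illustrative assumptions.
val logPattern =
  """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)""".r

case class LogRecord(ip: String, timestamp: String, method: String,
                     path: String, status: Int)

// Return None for malformed lines instead of failing the whole job
def parseLine(line: String): Option[LogRecord] = line match {
  case logPattern(ip, ts, method, path, status, _) =>
    Some(LogRecord(ip, ts, method, path, status.toInt))
  case _ => None
}

val sample =
  """127.0.0.1 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326"""
```

Returning `Option` keeps malformed records from aborting the batch; in a mapper the `None` cases would typically be counted and dropped.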

Environment: Cloudera Distribution (CDH), HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Avro, Oozie, ETL, and Unix Shell Scripting.

Java Developer

Confidential

Responsibilities:

  • Developed client-side UI changes using JSP, JavaScript, AngularJS, jQuery, HTML, CSS, AJAX, Spring MVC, Spring IoC, Spring JDBC, web services, XML, MS SQL, SOAP, XSD, JSON, and Log4j
  • Used Spring Boot, configured application context files, and performed database object mapping using Hibernate annotations
  • Used Spring dependency injection to make the application easy to test and integrate
  • Designed and developed the application using the MVC architecture
  • Implemented the business logic using Spring MVC and Hibernate
  • Implemented the presentation layer using HTML, CSS, and JSP
  • Worked on the server-side implementation using Spring Core and Spring annotations, with navigation from the presentation layer to other layers via Spring MVC, and integrated Spring with Hibernate using HibernateTemplate to implement the persistence layer
  • Implemented CRUD operations via SQL queries against the MySQL database
  • Used a RESTful client to interact with the services via RESTful URL mappings
  • Performed backend unit testing using JUnit
  • Wrote queries and stored procedures and created tables using SQL Server 2008
  • Created single-page client-side applications using JavaScript, HTML, CSS, and Bootstrap
  • Worked on creating SOAP/REST services using Java and Spring
  • Developed the view models and controller MVC action methods to fetch data from the back-end services and send it as JSON objects to the views
  • Developed servlets to process update information
  • Used JDBC for communicating with the database
  • Involved in UI/UX work; improved core website functionality by fixing broken links and scripting errors; sent reports to all clients (up-to-date server history, ticket summaries, daily server reports)

Work Environment: J2EE, Struts, Spring, Hibernate, JavaBeans, Servlets, SQL, Oracle, HTML, CSS, Bootstrap, JavaScript, Tomcat, XML, JSON