- About 8 years of IT/business solutions experience in the analysis, design, development, and implementation of cost-effective, high-quality, high-performance, and innovative technology solutions in the Healthcare, Insurance, and IT sectors
- 4+ years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Kafka, Oozie, Sqoop, Avro, and ZooKeeper
- Well versed in configuring and administering the Hadoop cluster using Cloudera and Hortonworks
- Experience in creating real-time data streaming solutions using Apache Spark / Spark Streaming, Apache Storm, Kafka, and Flume
- Currently working on Spark applications extensively, using Scala as the main programming language
- Processing this data using the Spark Streaming API with Scala
- Used Spark DataFrames, Spark SQL, and the RDD API of Spark for performing various data transformations and dataset building
- Used the Scala collection framework to store and process metadata and other related information
- Experience in using DataStax Spark-Cassandra connectors to get data from Cassandra tables and process it using Apache Spark
- Exposure to Data Lake implementation; developed data pipelines and applied business logic utilizing Apache Spark
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing
- Hands-on experience with real-time processing on NoSQL databases like MongoDB, HBase, and Cassandra
- Experience in creating MongoDB clusters and hands-on experience with complex MongoDB aggregate functions and mapping
- Designed data models in Cassandra and worked with the Cassandra Query Language (CQL)
- Worked with NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration
- Experience in using Flume to load log files into HDFS and Oozie for data scrubbing and processing
- Experienced in handling skewed and sparse data using Pig Latin scripts and joins
- Experience in performance tuning of Hive queries and Java MapReduce programs for scalability and faster execution
- Experienced in handling real-time analytics using HBase on top of HDFS data
- Experience in transformations, grouping, aggregations, and joins using the Kafka Streams API
- Hands-on experience deploying Kafka Connect in standalone and distributed mode and creating Docker containers
- Created topics and wrote Kafka producers and consumers in Java as required; developed Kafka source/sink connectors to store the new streaming data into topics and move it from topics to the required databases by performing ETL tasks; also used the Akka toolkit with Scala to perform some builds
- Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager
- Knowledge of Storm architecture; experience in using data modeling tools like Erwin
- Excellent experience in using scheduling tools to automate batch jobs
- Hands-on experience in using Apache Solr/Lucene
- Expertise in using SQL Server: SQL queries, stored procedures, and functions
- Hands-on experience in app development using Java, Hadoop, RDBMS, and Linux shell scripting
- Oracle JAVA certified professional and certified in AWS Solutions Architect-Associate
- Strong experience in Extending Hive and Pig core functionality by writing custom UDFs
- Experience in software design, development, and implementation of client/server web applications
- Excellent knowledge of object-oriented design and development
- Ability to work in a team and individually on many cutting-edge technologies, with excellent management skills, business understanding, and strong communication skills
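The grouping and aggregation patterns referenced above (Spark reduceByKey-style sums, Kafka Streams aggregations) can be sketched in plain Python; the records and field names below are hypothetical stand-ins, since no live cluster is assumed:

```python
from collections import defaultdict

# Hypothetical click-stream records; field names are illustrative only.
records = [
    {"user": "a", "event": "view", "amount": 10},
    {"user": "b", "event": "buy",  "amount": 25},
    {"user": "a", "event": "buy",  "amount": 15},
]

def aggregate_by_key(rows, key, value):
    """Group rows by `key` and sum `value` -- the same shape as a
    reduceByKey / GROUP BY aggregation in Spark or Kafka Streams."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

print(aggregate_by_key(records, "user", "amount"))  # {'a': 25, 'b': 25}
```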
Hadoop/Big Data: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Splunk, Hortonworks, Cloudera
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, NetBeans, IntelliJ, PyCharm
ETL: Talend, SSIS
Frameworks: MVC, Struts, Hibernate, Spring, Spring Boot
Programming languages: C, C++, Java, Python, Scala, Linux shell scripts
Databases: RDBMS (MySQL, DB2, MS SQL Server, PostgreSQL), NoSQL (MongoDB, HBase, Cassandra)
Amazon Web Services: EMR, EC2, S3, RDS, CloudSearch, Redshift, Data Pipeline, Lambda
Web Servers: WebLogic, WebSphere, Apache Tomcat
Build Tools: Ant, Maven, Gradle, Akka
Version Control: Git, Tortoise SVN
Cloud: AWS, Microsoft Azure
Development Methodologies: Agile, Scrum, Waterfall
Sr. Spark / Scala Developer
Confidential, Minneapolis, MN
Roles & Responsibilities:
- Worked with Spark for improving performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs
- Developed a preprocessing job using Spark DataFrames to transform JSON documents into flat files
- Loaded D-Stream data into Spark RDDs and performed in-memory data computation to generate the output response
- Processed big data with Amazon EMR across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs
- Worked on Big Data infrastructure for batch processing and real-time processing using Apache Spark
- Developed Apache Spark applications by using Scala for data processing from various streaming sources
- Processed the web server logs by developing multi-hop Flume agents using an Avro sink and loaded the data into Cassandra for further analysis; extracted files from Cassandra through Flume
- Responsible for design and development of Spark SQL scripts based on functional specifications
- Worked on the large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and Cassandra
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster processing of data
- Developed helper classes for abstracting Cassandra cluster connections to act as a core toolkit
- Involved in creating a Data Lake by extracting customers' data from various data sources into HDFS, which included data from Excel, databases, and log data from servers
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them using Hive
- Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive tables
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
- Extended Hive/Pig core functionality by using custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregating Functions (UDAF) for Hive and Pig
- Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers
- Used Apache Kafka functionality like distribution, partitioning, and the replicated commit log service for messaging
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate heavy throughput
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Used Apache Oozie for scheduling and managing multiple Hive jobs; knowledge of HCatalog for Hadoop-based storage management
- Migrated an existing on-premises application to Amazon Web Services (AWS) and used its services like EC2 and S3 for small-data-set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR
- Developed solutions to pre-process large sets of structured, semi-structured data, with different file formats like Text, Avro, Sequence, XML, JSON, and Parquet
- Generated various kinds of reports using Pentaho and Tableau based on Client specification
- Gained exposure to new tools like Jenkins, Chef, and RabbitMQ
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint
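As an illustration of the JSON-to-flat-file preprocessing described above, here is a minimal stand-alone Python sketch of the flattening step (the sample document and field names are hypothetical; the actual job ran on Spark DataFrames):

```python
import json

def flatten(doc, prefix=""):
    """Recursively flatten a nested JSON document into dot-separated
    column names -- the kind of record a flat-file (CSV/TSV) load expects."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical input document
raw = '{"id": 1, "user": {"name": "ann", "zip": "55401"}}'
print(flatten(json.loads(raw)))
# {'id': 1, 'user.name': 'ann', 'user.zip': '55401'}
```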
Environment: Hadoop, MapReduce, HDFS, Yarn, Hive, Sqoop, Oozie, Spark, Scala, AWS, EC2, S3, EMR, Cassandra, Flume, Kafka, Pig, Linux, Shell Scripting
Sr. Spark / Bigdata Developer
Confidential - Worcester, MA
Roles & Responsibilities:
- Built a scalable distributed Hadoop cluster running Hortonworks Data Platform (HDP 2.6)
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and in optimizing it using SparkContext, Spark SQL, and pair RDDs
- Serialized JSON data and stored it in tables using Spark SQL
- Spark Streaming collects data from Kafka in near real time, performs the necessary transformations and aggregations to build the common learner data model, and stores the data in a NoSQL store (HBase)
- Good knowledge of the Spark framework for both batch and real-time data processing
- Hands-on experience with Spark MLlib, used for predictive intelligence, customer segmentation, and smooth maintenance in Spark Streaming
- Developed Spark Streaming programs that take data from Kafka and push it to different sinks
- Loaded data from different data sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and loaded it into Hive tables, which are partitioned
- Created different Pig scripts and converted them to shell commands to provide aliases for common operations in the project business flow
- Implemented partitioning and bucketing in Hive for better organization of the data
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes
- Created a few Hive UDFs as well, to hide or abstract complex repetitive rules
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and then import it into Hive tables
- Developed bash scripts to bring log files from the FTP server and then process and load them into Hive tables
- All the bash scripts are scheduled using the Resource Manager Scheduler
- Developed MapReduce programs for applying business rules to the data
- Developed a NiFi workflow to pick up data from the Data Lake as well as from servers and send it to a Kafka broker
- Involved in loading and transforming large sets of structured data from router locations to the EDW using an Apache NiFi data pipeline flow
- Implemented a Kafka event-log producer to publish the logs to a Kafka topic, which is consumed by the ELK (Elasticsearch, Logstash, Kibana) stack to analyze the logs produced by the Hadoop cluster
- Implemented Apache Kafka as a replacement for a more traditional message broker (JMS Solace) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages
- Implemented a receiver-based approach, working on Spark Streaming by linking with the StreamingContext using the Java API and handling proper closing and waiting stages
- Experience in implementing rack topology scripts for the Hadoop cluster
- Implemented fixes to resolve issues with the old Hazelcast Entry Processor API
- Used the Akka toolkit with Scala to perform a few builds
- Excellent knowledge of the Talend Administration Console, Talend installation, and using context and global map variables in Talend
- Used dashboard tools like Tableau
- Knowledge of Splunk architecture and its various components (indexer, forwarder, search head, deployment server)
- Used the Talend Admin Console Job Conductor to schedule ETL jobs on a daily and weekly basis
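A minimal HiveQL sketch of the partitioning and bucketing scheme described above (the table, columns, and bucket count are hypothetical):

```sql
-- Hypothetical table; names and types are illustrative only.
CREATE TABLE learner_events (
    user_id BIGINT,
    event   STRING,
    ts      TIMESTAMP
)
PARTITIONED BY (load_date STRING)        -- one HDFS directory per day
CLUSTERED BY (user_id) INTO 32 BUCKETS   -- bucketing for sampling and joins
STORED AS ORC;

-- Dynamic-partition insert from a staging table:
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO TABLE learner_events PARTITION (load_date)
SELECT user_id, event, ts, to_date(ts) FROM learner_events_stg;
```

Partition pruning then lets queries filtered on `load_date` scan only the matching directories, while bucketing keeps rows for the same `user_id` co-located.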
Environment: Hadoop HDP, Linux, MapReduce, HBase, HDFS, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Maven, Splunk, open-source technologies (Apache Kafka, Apache Spark, Hazelcast), Git, Talend
Confidential - Anoka, MN
Roles & Responsibilities:
- Dealt with data migrations from diversified databases into HDFS and Hive using Sqoop
- Wrote a Java program using Apache Tika to get the metadata and content of documents, perform cleansing, build clusters, and save the results to MongoDB
- Implemented dynamic partitions and buckets in Hive for efficient data access
- Utilized Hive to process huge amounts of provider info
- Used Cloudera Impala as a SQL engine for processing the data stored in HBase and HDFS
- Applied automation jobs in the Linux shell
- Built an AWS EC2 instance and migrated data to the cloud
- Developed MapReduce jobs in Java for data cleansing and pre-processing
- Moved data from DB2 and Oracle Exadata to HDFS and vice versa using Sqoop
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Created S3 buckets, managed policies for the S3 buckets, and utilized S3 and Glacier for archival storage and backup on AWS
- Experience with AWS Lambda workflow implementation using Python to interact with applications deployed on EC2 instances and S3 buckets
- Worked with different file formats and compression techniques to determine standards
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines had customized UDFs to extend the ETL functionality
- Developed user-defined functions in Pig
- Analyzed and transformed data with Hive and Pig
- Developed Hive queries and UDFs to analyze/transform the data in HDFS
- Developed Hive scripts for implementing control-table logic in HDFS
- Designed and implemented partitioning (static, dynamic) and buckets in Hive
- Developed Pig scripts and UDFs per the business logic
- Developed Oozie workflows, which are scheduled through a scheduler on a monthly basis
- Wrote Python scripts to parse XML documents and load the data into a database
- Enhanced existing modules written in Python
- Excellent knowledge of Python collections and multithreading
- Designed and developed read lock capability in HDFS
- Implemented a Hadoop float equivalent to the DB2 decimal
- Involved in end-to-end implementation of ETL logic
- Effective coordination with offshore team and managed project deliverable on time
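A minimal sketch of the XML-to-database loading described above, using only the Python standard library (the XML layout and table schema are hypothetical, and SQLite stands in for the real target database):

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML payload; tag names are illustrative only.
XML = """
<providers>
  <provider id="p1"><name>Acme Clinic</name><state>MN</state></provider>
  <provider id="p2"><name>Delta Care</name><state>LA</state></provider>
</providers>
"""

def load_providers(xml_text, conn):
    """Parse provider records out of XML and insert them into a table;
    returns the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS providers (id TEXT, name TEXT, state TEXT)")
    rows = [
        (p.get("id"), p.findtext("name"), p.findtext("state"))
        for p in ET.fromstring(xml_text).iter("provider")
    ]
    conn.executemany("INSERT INTO providers VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the real target database
print(load_providers(XML, conn))    # 2
```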
Environment: Hadoop Ecosystem (MapReduce, HDFS, YARN, Hive, Sqoop, Pig Latin, ZooKeeper, Oozie), NoSQL databases (HBase, MongoDB), MySQL, Hortonworks, JIRA, Linux Shell Scripting, Teradata, Eclipse, Java, Python
Confidential - Jefferson, LA
Roles & Responsibilities:
- Hands on experience in loading data from UNIX file system and Teradata to HDFS
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster on Cloudera's CDH distribution
- Developed Pig scripts for processing semi-structured data using sorting, joins, and grouping of the data
- Developed Java MapReduce programs on log data to transform it into a structured form to find user location, age group, and time spent
- Collected and aggregated large amounts of weblog data from different sources using Apache Flume and stored the data in HDFS for analysis
- Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API
- Used Docker containers in development environment
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Developed ETL using Hive, Oozie, shell scripts, and Sqoop, and analyzed the weblog data using HiveQL
- Supported data analysts in running MapReduce programs
- Experienced in working with Avro data files using the Avro serialization system
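A minimal Flume agent configuration of the kind described above, wiring a log-tailing source through a memory channel to an HDFS sink (the agent name, paths, and capacities are hypothetical):

```properties
# Hypothetical single-hop agent; names and paths are illustrative only.
agent1.sources  = weblogs
agent1.channels = mem
agent1.sinks    = hdfs-out

agent1.sources.weblogs.type = exec
agent1.sources.weblogs.command = tail -F /var/log/httpd/access_log
agent1.sources.weblogs.channels = mem

agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs-out.type = hdfs
agent1.sinks.hdfs-out.hdfs.path = /data/weblogs/%Y-%m-%d
agent1.sinks.hdfs-out.hdfs.fileType = DataStream
agent1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-out.channel = mem
```

A multi-hop topology chains such agents with Avro sinks on one hop feeding Avro sources on the next.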
Environment: Cloudera Distribution (CDH), HDFS, Map Reduce, Pig, Hive, Sqoop, Flume, HBase, Java,Maven, Avro, Oozie, ETL and Unix Shell Scripting.
Roles & Responsibilities:
- Used Spring Boot, configured application context files, and performed database object mapping using Hibernate annotations
- Used Spring dependency injection to make the application easy to test and integrate
- Designed and developed the application using the MVC architecture
- The business logic was implemented using Spring MVC and Hibernate
- The presentation layer was implemented using HTML, CSS, and JSP
- Worked on the server-side implementation using Spring Core and Spring annotations, with navigation from the presentation layer to other layers using Spring MVC, and integrated Spring with Hibernate using HibernateTemplate to implement the persistence layer
- Implemented the CRUD operations via SQL queries to the MySQL database
- Used a RESTful client to interact with the services by providing the RESTful URL mappings
- Performed backend unit testing using Junit
- Wrote queries and stored procedures and created tables using SQL Server 2008
- Worked on creating SOAP/REST services with Java and Spring
- Developed the view models and controller MVC action methods to fetch data from the back-end services and send it as JSON objects to the views
- Developed servlets to process update information
- Used JDBC for communicating with the database
- Involved in UI/UX; improved core website functionality by fixing broken links and scripting errors; sent reports to all clients (up-to-date server history, ticket summaries, daily server reports)
Java/J2EE Developer
Roles & Responsibilities:
- Developed servlets and JavaServer Pages (JSP) to route the submittals to the EJB components; JavaScript handled the front-end validations
- Designed and developed a reconciliation module to generate invoices using PL/SQL, stored procedures, and triggers
- Involved in designing and developing dynamic web pages using HTML and JSP with Struts tag libraries
- Used JPA to persistently store large amounts of data in the database
- Implemented modules using Java APIs, Java collections, threads, and XML, and integrated the modules
- Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller, and MVC
- The application is implemented using JSP, and servlets are used for implementing the business logic
- Developed utility and helper classes and server-side functionality using servlets
- Created DAO classes and wrote various SQL queries to perform DML operations on the data per the requirements
- Used Log4j for External Configuration Files and debugging.
- Created session beans and controller servlets for handling HTTP requests from JSP pages
- Used the Eclipse IDE for developing the application
- Set up the environment, including deploying the application and maintaining the web server
- Analyzed defects and provided fixes
- Developed XML files using XPath and XSLT, with parsing using both SAX and DOM parsers
- Designed and developed XSL stylesheets using XSLT to transform XML and display the customer information on the screen for the user, and also for processing
- Used JUnit for testing Java classes
Work Environment: Java, J2EE, Oracle, RMI, ClearCase, JDBC, UNIX, JUnit, Eclipse, Struts, XML, XSLT, XPath, XHTML, CSS, HTTP