
Spark/Scala Developer Resume

Longwood, FL

SUMMARY

  • Over 8 years of professional experience in project development, implementation, deployment, and maintenance using Java/J2EE, Hadoop, and Spark-related technologies on the Cloudera and Hortonworks distributions.
  • Hadoop Developer with 5 years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
  • Experience in installation, configuration, management, and deployment of Hadoop clusters, HDFS, MapReduce, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase, and ZooKeeper.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and Spark.
  • Experience using Hadoop ecosystem components as data storage and retrieval systems.
  • Good experience creating data ingestion pipelines, data transformations, data management, data governance, and real-time streaming at an enterprise level.
  • Profound experience creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Good knowledge of the Spark ecosystem and Spark architecture.
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, extending the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing (see the UDF sketch after this list).
  • Hands-on experience with full life-cycle implementations using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • Strong knowledge of distributed-systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
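
For illustration, a minimal sketch of a Hive UDF of the kind described above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and masking logic are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hypothetical UDF: mask all but the last four characters of an ID field.
// Hive locates the evaluate() method by reflection at query time.
class MaskId extends UDF {
  def evaluate(id: String): String =
    if (id == null || id.length <= 4) id
    else "*" * (id.length - 4) + id.takeRight(4)
}
```

Packaged into a JAR, such a class would be registered with ADD JAR and CREATE TEMPORARY FUNCTION mask_id AS 'MaskId', then called like any built-in HiveQL function.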

TECHNICAL SKILLS

  • Big Data Technologies: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, ZooKeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios, Splunk, Elasticsearch, Kibana
  • Hadoop Distributions: Cloudera, Hortonworks, AWS
  • Operating Systems: Windows, Macintosh, Linux, Ubuntu, Unix, CentOS, Red Hat
  • Programming Languages: C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting
  • Java Technologies: JSP, Servlets, Spring, Hibernate, Maven
  • Databases: MS SQL, Oracle, MS Access, NoSQL, MySQL
  • Reporting/ETL Tools: Tableau, Informatica, DataStage, Talend, Pentaho, Power View
  • Methodologies: Agile/Scrum, Waterfall, DevOps
  • Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

PROFESSIONAL EXPERIENCE

Confidential, Longwood, FL

Spark/Scala Developer

Responsibilities:

  • Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Worked with Spark SQL to analyze data.
  • Used Scala to write code for all Spark use cases.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs, and YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with Spark SQL on different data formats such as JSON and Parquet.
  • Developed Spark scripts using Scala shell commands as required.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Familiar with the open-source Hadoop stack, including YARN, Kafka, and Hive.
  • Experienced with Kafka for ingesting data into the Spark engine.
  • Worked on a streaming pipeline that uses Spark to read data from Kafka, transform it, and write it to HDFS (see the streaming sketch after this list).
  • Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper.
  • Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
  • Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Designed and developed MapReduce jobs to process data coming in different file formats such as XML, CSV, and JSON.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Involved in preparing JILs for AutoSys jobs.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries.
  • Used Oozie to automate the end-to-end data pipelines and Oozie coordinators to schedule the workflows.
  • Implemented a daily workflow for extraction, processing, and analysis of data with Oozie.
  • Installed and configured Hadoop on multiple nodes on AWS EC2.
  • Implemented Hadoop on a few AWS EC2 instances to gather and analyze log files.
  • Implemented a Continuous Delivery pipeline with Docker, Jenkins and GitHub, Nexus, Maven and AWS.
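
For illustration, a minimal sketch of the Kafka-to-HDFS streaming pipeline described above, using the Spark 1.x Streaming API with the direct Kafka connector; the broker, topic, and output path are hypothetical placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hdfs"), Seconds(30))

    // Receiver-less direct stream from Kafka (placeholder broker and topic)
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // Light transformation, then write each 30-second batch to HDFS as text files
    stream.map { case (_, value) => value.trim }
      .filter(_.nonEmpty)
      .saveAsTextFiles("hdfs:///data/streams/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```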

Environment: Spark 1.5.2, Spark SQL, Java 1.8, Hive, HDFS, HQL, YARN, HBase, MapReduce, Sqoop, Flume, Oozie, Kafka, Scala, AWS, Oracle 12c.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Worked on a live 60-node Hadoop cluster running CDH 5.4.4, CDH 5.2.0, and CDH 5.2.1.
  • Worked on the Hadoop cluster using different big data analytics tools, including Kafka, Pig, Hive, and MapReduce.
  • Developed simple to complex MapReduce streaming jobs using Python, implemented alongside Hive and Pig.
  • Implemented data access jobs through Pig, Hive, HBase (0.98.0), and Storm (0.9.1).
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Altered existing Scala programs to enhance performance and obtain partitioned results with the Spark tool.
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to SchemaRDDs (see the sketch after this list).
  • Used Spark SQL to read and write tables stored in Hive.
  • Involved in importing real-time data into Hadoop using Kafka, and implemented daily Oozie jobs.
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables, and worked with Storm and Kafka.
  • Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers.
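
For illustration, a minimal sketch of the case-class-to-schema conversion described above, using the Spark 1.x HiveContext; the record layout, paths, and table names are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record layout, for illustration only
case class Order(orderId: Long, state: String, amount: Double)

object OrdersToHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("orders-to-hive"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Spark SQL derives the schema from the case class by reflection,
    // turning an RDD[Order] into a DataFrame (the successor of SchemaRDD)
    val orders = sc.textFile("hdfs:///data/raw/orders.csv")
      .map(_.split(","))
      .collect { case Array(id, st, amt) => Order(id.toLong, st, amt.toDouble) }
      .toDF()

    // Query it alongside existing Hive tables, then persist it back to Hive
    orders.registerTempTable("orders")
    sqlContext.sql("SELECT state, SUM(amount) FROM orders GROUP BY state").show()
    orders.write.saveAsTable("analytics.orders")
  }
}
```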

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Storm, Kafka, Linux, Hortonworks distribution, Big Data, Java APIs, Java Collections, SQL, NoSQL, MongoDB, Cassandra.

Confidential, Malvern, PA

Hadoop Administrator/Developer

Responsibilities:

  • Responsible for installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments, such as development, test, and production clusters.
  • Used the JobTracker to assign MapReduce tasks to TaskTrackers in the cluster of nodes.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
  • Implemented Kerberos security in all environments.
  • Defined file system layout and data set permissions.
  • Implemented the Capacity Scheduler to share the resources of the cluster among the MapReduce jobs submitted by users.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Involved in loading data from Linux and Unix file system to HDFS.
  • Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get and copyToLocal commands.
  • Involved in cluster planning and setting up the multi-node cluster.
  • Commissioned and decommissioned nodes from time to time.
  • Involved in HDFS maintenance and administering it through the HDFS Java API (see the sketch after this list).
  • Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues.
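
For illustration, a minimal sketch of routine HDFS maintenance through the Java API, written in Scala for consistency with the rest of this document; the paths are hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsMaintenance {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath
    val fs = FileSystem.get(new Configuration())

    // Report per-file sizes under a (hypothetical) ingest directory
    fs.listStatus(new Path("/data/ingest")).foreach { status =>
      println(s"${status.getPath}\t${status.getLen} bytes")
    }

    // Equivalent of `hdfs dfs -copyToLocal`, for handing results to external systems
    fs.copyToLocalFile(new Path("/data/out/part-00000"), new Path("file:///tmp/part-00000"))
    fs.close()
  }
}
```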

Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Sqoop, Cloudera Hadoop Distribution, HBase, Windows NT, Linux, UNIX Shell Scripting.

Confidential

Hadoop Administrator/Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the mapper sketch after this list).
  • Imported and exported data between HDFS and an Oracle database using Sqoop.
  • Installed and configured Hadoop Cluster for major Hadoop distributions.
  • Used Hive and Pig as ETL tools for event joins, filters, transformations, and pre-aggregations.
  • Created partitions and bucketing across states in Hive to handle structured data.
  • Developed a workflow in Oozie to orchestrate a series of Pig scripts that cleanse data, such as removing personal information or merging many small files into a handful of very large, compressed files, using Pig pipelines in the data preparation stage.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Kafka and Flume, and processed the files using PiggyBank.
  • Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
  • Performed Sqoop transfers of various files through HBase tables for processing data into several NoSQL databases: Cassandra and MongoDB.
  • Created tables, secondary indices, and join indices, and viewed them in the Teradata development environment for testing.
  • Captured data logs from web servers into HDFS using Flume for analysis.
  • Managed and reviewed Hadoop log files.
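
For illustration, a minimal sketch of a map-only cleansing step like the one described above; the original jobs were written in Java, but the mapper is shown in Scala for consistency with this document, and the column layout is hypothetical:

```scala
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Drop malformed rows and redact a hypothetical personal-information column
// before the data enters the preparation stage.
class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split("\\|", -1)
    // Expect 5 pipe-delimited columns; column 3 (index 2) holds the PII
    if (fields.length == 5) {
      fields(2) = "REDACTED"
      context.write(NullWritable.get(), new Text(fields.mkString("|")))
    } // malformed rows are silently dropped
  }
}
```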

Environment: Hive, Pig, MapReduce, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, GitHub, Talend.

Confidential

Java Developer

Responsibilities:

  • Involved in development of JavaScript code for client-side validations.
  • Developed HTML-based web pages for displaying reports.
  • Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
  • Performed data validation in Struts using form beans and Action classes.
  • Developed dynamic content for the presentation layer using JSP.
  • Accessed stored procedures and functions using JDBC CallableStatements (see the sketch after this list).
  • Involved in designing use-case, class, and interaction diagrams in UML with Rational Rose.
  • Implemented Hibernate to persist data into the database and wrote HQL-based queries to implement CRUD operations on the data.
  • Developed SQL and PL/SQL code, including queries, joins, views, procedures/functions, triggers, and packages.
  • Developed rich internet web applications using Java applets, Silverlight, and Java.
  • Used JDBC for database access.
  • Played a key role in the high-level design for the implementation of the application.
  • Designed and established the process, mapping functional requirements to the workflow process.
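
For illustration, a minimal sketch of stored-procedure access through a JDBC CallableStatement; the project code was Java, but the example is in Scala for consistency with this document, and the connection URL, credentials, and procedure name are placeholders:

```scala
import java.sql.{DriverManager, Types}

object StoredProcClient {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details, for illustration only
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/APP", "app_user", "secret")
    try {
      // A CallableStatement wraps the stored-procedure invocation
      val call = conn.prepareCall("{call get_report_count(?, ?)}")
      call.setString(1, "2010-Q1")
      call.registerOutParameter(2, Types.INTEGER)
      call.execute()
      println(s"report rows: ${call.getInt(2)}")
      call.close()
    } finally conn.close()
  }
}
```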

Environment: Java, Servlets, Java Beans, JSP, EJB, J2EE, Struts, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.
