
Spark/Scala Developer Resume

Longwood, FL

SUMMARY

  • Over 8 years of professional experience in project development, implementation, deployment, and maintenance using Java/J2EE, Hadoop, and Spark-related technologies on the Cloudera and Hortonworks distributions.
  • Hadoop Developer with 5 years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
  • Experience in installation, configuration, management, and deployment of Hadoop clusters, HDFS, MapReduce, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase, and ZooKeeper.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and Spark.
  • Experience using Hadoop ecosystem components as data storage and retrieval systems.
  • Good experience creating data ingestion pipelines, data transformations, data management, data governance, and real-time streaming at an enterprise level.
  • Profound experience creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Good knowledge of the Spark ecosystem and Spark architecture.
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, extending the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing (see the UDF sketch after this list).
  • Hands-on experience with full life-cycle implementations using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • Strong knowledge of distributed-systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
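
For illustration, a minimal sketch of a Hive UDF of the kind described above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and masking logic are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hypothetical UDF: mask all but the last four characters of an ID field.
// Hive locates the evaluate() method by reflection at query time.
class MaskId extends UDF {
  def evaluate(id: String): String =
    if (id == null || id.length <= 4) id
    else "*" * (id.length - 4) + id.takeRight(4)
}
```

Packaged into a JAR, such a class would be registered with ADD JAR and CREATE TEMPORARY FUNCTION mask_id AS 'MaskId', then called like any built-in HiveQL function.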

TECHNICAL SKILLS

  • Big Data Technologies: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, ZooKeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios, Splunk, Elasticsearch, Kibana
  • Hadoop Distributions: Cloudera, Hortonworks, AWS
  • Operating Systems: Windows, Macintosh, Linux, Ubuntu, Unix, CentOS, Red Hat
  • Programming Languages: C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting
  • Java Technologies: JSP, Servlets, Spring, Hibernate, Maven
  • Databases: MS SQL, Oracle, MS Access, NoSQL, MySQL
  • Reporting/ETL Tools: Tableau, Informatica, DataStage, Talend, Pentaho, Power View
  • Methodologies: Agile/Scrum, Waterfall, DevOps
  • Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

PROFESSIONAL EXPERIENCE

Confidential, Longwood, FL

Spark/Scala Developer

Responsibilities:

  • Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Worked with Spark SQL to analyze data.
  • Used Scala to write code for all Spark use cases.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs, and YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with Spark SQL on different data formats such as JSON and Parquet.
  • Developed Spark scripts using Scala shell commands as required.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Familiar with the open-source Hadoop stack, including YARN, Kafka, and Hive.
  • Experienced with Kafka for ingesting data into the Spark engine.
  • Worked on a streaming pipeline that uses Spark to read data from Kafka, transform it, and write it to HDFS (see the streaming sketch after this list).
  • Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper.
  • Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
  • Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Designed and developed MapReduce jobs to process data coming in different file formats such as XML, CSV, and JSON.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Involved in preparing JILs for AutoSys jobs.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries.
  • Used Oozie to automate the end-to-end data pipelines and Oozie coordinators to schedule the workflows.
  • Implemented a daily workflow for extraction, processing, and analysis of data with Oozie.
  • Installed and configured Hadoop on multiple nodes on AWS EC2.
  • Implemented Hadoop on a few AWS EC2 instances to gather and analyze log files.
  • Implemented a Continuous Delivery pipeline with Docker, Jenkins and GitHub, Nexus, Maven and AWS.
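
For illustration, a minimal sketch of the Kafka-to-HDFS streaming pipeline described above, using the Spark 1.x Streaming API with the direct Kafka connector; the broker, topic, and output path are hypothetical placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hdfs"), Seconds(30))

    // Receiver-less direct stream from Kafka (placeholder broker and topic)
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // Light transformation, then write each 30-second batch to HDFS as text files
    stream.map { case (_, value) => value.trim }
      .filter(_.nonEmpty)
      .saveAsTextFiles("hdfs:///data/streams/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```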

Environment: Spark 1.5.2, Spark SQL, Java 1.8, Hive, HDFS, HQL, YARN, HBase, MapReduce, Sqoop, Flume, Oozie, Kafka, Scala, AWS, Oracle 12c.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Worked on a live 60-node Hadoop cluster running CDH 5.4.4, CDH 5.2.0, and CDH 5.2.1.
  • Worked on the Hadoop cluster using different big data analytics tools, including Kafka, Pig, Hive, and MapReduce.
  • Developed simple to complex MapReduce streaming jobs using Python, implemented alongside Hive and Pig.
  • Implemented data access jobs through Pig, Hive, HBase (0.98.0), and Storm (0.9.1).
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Altered existing Scala programs to enhance performance and obtain partitioned results with the Spark tool.
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to SchemaRDDs (see the sketch after this list).
  • Used Spark SQL to read and write tables stored in Hive.
  • Involved in importing real-time data into Hadoop using Kafka, and implemented daily Oozie jobs.
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables, and worked with Storm and Kafka.
  • Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers.
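
For illustration, a minimal sketch of the case-class-to-schema conversion described above, using the Spark 1.x HiveContext; the record layout, paths, and table names are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record layout, for illustration only
case class Order(orderId: Long, state: String, amount: Double)

object OrdersToHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("orders-to-hive"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Spark SQL derives the schema from the case class by reflection,
    // turning an RDD[Order] into a DataFrame (the successor of SchemaRDD)
    val orders = sc.textFile("hdfs:///data/raw/orders.csv")
      .map(_.split(","))
      .collect { case Array(id, st, amt) => Order(id.toLong, st, amt.toDouble) }
      .toDF()

    // Query it alongside existing Hive tables, then persist it back to Hive
    orders.registerTempTable("orders")
    sqlContext.sql("SELECT state, SUM(amount) FROM orders GROUP BY state").show()
    orders.write.saveAsTable("analytics.orders")
  }
}
```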

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Storm, Kafka, Linux, Hortonworks distribution, Big Data, Java APIs, Java Collections, SQL, NoSQL, MongoDB, Cassandra.

Confidential, Malvern, PA

Hadoop Administrator/Developer

Responsibilities:

  • Responsible for installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments, such as development, test, and production clusters.
  • Used the JobTracker to assign MapReduce tasks to TaskTrackers in the cluster of nodes.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
  • Implemented Kerberos security in all environments.
  • Defined file system layout and data set permissions.
  • Implemented the Capacity Scheduler to share the resources of the cluster among the MapReduce jobs submitted by users.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Involved in loading data from Linux and Unix file system to HDFS.
  • Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get and copyToLocal commands.
  • Involved in cluster planning and setting up the multi-node cluster.
  • Commissioned and decommissioned nodes from time to time.
  • Involved in HDFS maintenance and administering it through the HDFS Java API (see the sketch after this list).
  • Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues.
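
For illustration, a minimal sketch of routine HDFS maintenance through the Java API, written in Scala for consistency with the rest of this document; the paths are hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsMaintenance {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath
    val fs = FileSystem.get(new Configuration())

    // Report per-file sizes under a (hypothetical) ingest directory
    fs.listStatus(new Path("/data/ingest")).foreach { status =>
      println(s"${status.getPath}\t${status.getLen} bytes")
    }

    // Equivalent of `hdfs dfs -copyToLocal`, for handing results to external systems
    fs.copyToLocalFile(new Path("/data/out/part-00000"), new Path("file:///tmp/part-00000"))
    fs.close()
  }
}
```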

Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Sqoop, Cloudera Hadoop Distribution, HBase, Windows NT, Linux, UNIX Shell Scripting.

Confidential

Hadoop Administrator/Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the mapper sketch after this list).
  • Imported and exported data between HDFS and an Oracle database using Sqoop.
  • Installed and configured Hadoop Cluster for major Hadoop distributions.
  • Used Hive and Pig as ETL tools for event joins, filters, transformations, and pre-aggregations.
  • Created partitions and bucketing across states in Hive to handle structured data.
  • Developed a workflow in Oozie to orchestrate a series of Pig scripts that cleanse data, such as removing personal information or merging many small files into a handful of very large, compressed files, using Pig pipelines in the data preparation stage.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Kafka and Flume, and processed the files using PiggyBank.
  • Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
  • Performed Sqoop transfers of various files through HBase tables for processing data into several NoSQL databases: Cassandra and MongoDB.
  • Created tables, secondary indices, and join indices, and viewed them in the Teradata development environment for testing.
  • Captured data logs from web servers into HDFS using Flume for analysis.
  • Managed and reviewed Hadoop log files.
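
For illustration, a minimal sketch of a map-only cleansing step like the one described above; the original jobs were written in Java, but the mapper is shown in Scala for consistency with this document, and the column layout is hypothetical:

```scala
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Drop malformed rows and redact a hypothetical personal-information column
// before the data enters the preparation stage.
class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split("\\|", -1)
    // Expect 5 pipe-delimited columns; column 3 (index 2) holds the PII
    if (fields.length == 5) {
      fields(2) = "REDACTED"
      context.write(NullWritable.get(), new Text(fields.mkString("|")))
    } // malformed rows are silently dropped
  }
}
```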

Environment: Hive, Pig, MapReduce, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, GitHub, Talend.

Confidential

Java Developer

Responsibilities:

  • Involved in development of JavaScript code for client-side validations.
  • Developed HTML-based web pages for displaying reports.
  • Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
  • Performed data validation in Struts using form beans and Action classes.
  • Developed dynamic content for the presentation layer using JSP.
  • Accessed stored procedures and functions using JDBC CallableStatements (see the sketch after this list).
  • Involved in designing use-case, class, and interaction diagrams in UML with Rational Rose.
  • Implemented Hibernate to persist data into the database and wrote HQL-based queries to implement CRUD operations on the data.
  • Developed SQL and PL/SQL code, including queries, joins, views, procedures/functions, triggers, and packages.
  • Developed rich internet web applications using Java applets, Silverlight, and Java.
  • Used JDBC for database access.
  • Played a key role in the high-level design for the implementation of the application.
  • Designed and established the process, mapping functional requirements to the workflow process.
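
For illustration, a minimal sketch of stored-procedure access through a JDBC CallableStatement; the project code was Java, but the example is in Scala for consistency with this document, and the connection URL, credentials, and procedure name are placeholders:

```scala
import java.sql.{DriverManager, Types}

object StoredProcClient {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details, for illustration only
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/APP", "app_user", "secret")
    try {
      // A CallableStatement wraps the stored-procedure invocation
      val call = conn.prepareCall("{call get_report_count(?, ?)}")
      call.setString(1, "2010-Q1")
      call.registerOutParameter(2, Types.INTEGER)
      call.execute()
      println(s"report rows: ${call.getInt(2)}")
      call.close()
    } finally conn.close()
  }
}
```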

Environment: Java, Servlets, Java Beans, JSP, EJB, J2EE, Struts, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.
