Spark Developer Resume

NC

SUMMARY:

  • 6 years of professional experience in IT, including 4 years of comprehensive experience as an Apache Hadoop and Spark developer and with related technologies.
  • Expertise in writing Hadoop jobs using Java and Scala.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, Spark, and Spark SQL.
  • Experience converting Hive/SQL queries into Spark transformations using Spark RDDs (see the sketch after this list).
  • Experience developing SQL scripts using Spark for handling different data sets and verifying their performance against MapReduce jobs.
  • Experience importing and exporting multiple terabytes of data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Experienced in working with Hadoop/Big Data storage and analytical frameworks on the Amazon AWS cloud using tools such as SSH, PuTTY, and MindTerm.
  • Good understanding of data mining and machine learning techniques such as Random Forest, Logistic Regression, and K-Means.
  • Experience implementing custom Partitioners and Combiners for effective data distribution.
  • Experience writing simple to complex ad hoc Pig scripts and Pig UDFs.
  • Experience writing simple to complex ad hoc Hive scripts as well as Hive UDFs, UDTFs, and UDAFs.
  • Experience writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS.
  • Good knowledge of creating event-processing data pipelines using Kafka and Storm.
  • Good understanding of configuring simple to complex workflows using Oozie.
  • Good understanding of NoSQL databases such as MongoDB and Cassandra.
  • Proficient with development tools including Eclipse and VMware.
  • Very good experience in customer specification study, requirements gathering, requirement analysis, design, development, testing, and implementation.
  • Worked on different operating systems, including UNIX/Linux and Windows.
  • Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.
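
As a concrete illustration of the Hive-to-Spark conversion mentioned above, the following Scala sketch rewrites a simple GROUP BY/SUM query as RDD transformations. The file paths, column layout, and the "orders" table itself are hypothetical assumptions for illustration, not details from an actual engagement.

    import org.apache.spark.{SparkConf, SparkContext}

    object HiveToRddSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveToRddSketch"))

        // Hypothetical CSV input with rows of (user_id, state, amount).
        // Hive equivalent: SELECT state, SUM(amount) FROM orders GROUP BY state;
        val totals = sc.textFile("hdfs:///data/orders.csv") // illustrative path
          .map(_.split(","))
          .map(cols => (cols(1), cols(2).toDouble))         // key each row by state
          .reduceByKey(_ + _)                               // GROUP BY state + SUM(amount)

        totals.saveAsTextFile("hdfs:///data/state_totals")  // illustrative path
        sc.stop()
      }
    }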

TECHNICAL SKILLS:

Languages and Technologies: Java, Scala, R, C, C++, XML, SQL, Shell Script, Pig Latin, Impala, MapReduce, Hive, Sqoop, Spark, Spark SQL, AWS, ZooKeeper, HBase, Kafka, Oozie, Storm, Flume

Operating Systems: Linux, Windows

Databases: MySQL, MSSQL, MongoDB, Cassandra

Tools: Eclipse, WinSCP, Wireshark, JIRA, IBM Tivoli

Scripting Languages: Scala, JavaScript, PHP, Python

Others: HTML, XML, JSON, REST, SOAP

PROFESSIONAL EXPERIENCE:

Confidential

Spark Developer

Responsibilities:

  • Developed Spark code using Scala and Spark SQL/Spark Streaming for faster data processing.
  • Prepared the Spark build from source code and ran Pig scripts on Spark rather than as MapReduce jobs for better performance.
  • Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
  • Implemented machine learning techniques such as Random Forest, K-Means, and Logistic Regression for prediction and pattern identification using Spark MLlib (a K-Means sketch follows this list).
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed Kafka producers and consumers for message handling (a producer sketch also follows this list).
  • Responsible for analyzing multi-platform applications using Python.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Used Storm as an automatic mechanism to analyze large amounts of non-unique data points with low latency and high throughput.
  • Migrated servers, databases, and applications from on-premises infrastructure to AWS, Azure, and Google Cloud Platform.
  • Developed MapReduce jobs in Python for data cleaning and data processing.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
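
As one illustration of the Spark MLlib work listed above, here is a minimal K-Means sketch in Scala using the RDD-based MLlib API of that era. The input path, feature format, cluster count, and iteration count are assumptions for illustration only.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object KMeansSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("KMeansSketch"))

        // Hypothetical input: one space-separated numeric vector per line.
        val features = sc.textFile("hdfs:///data/features.txt") // illustrative path
          .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
          .cache()

        // k = 5 clusters over 20 iterations: illustrative settings, not tuned values.
        val model = KMeans.train(features, 5, 20)
        model.clusterCenters.foreach(println)

        sc.stop()
      }
    }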
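
For the Kafka bullet above, a minimal producer sketch in Scala (using Kafka's Java client) might look like the following. The broker address, topic name, and payload are hypothetical.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object ProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092") // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // Topic "events" and the record below are assumptions for illustration.
        producer.send(new ProducerRecord[String, String]("events", "key1", "sample message"))
        producer.close()
      }
    }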

Environment: CDH4, Scala, Spark, HDFS, AWS, Hive, Pig, Linux, Python, MySQL, MySQL Workbench, Eclipse, PL/SQL, SQL connector.

Confidential, NC

Hadoop Developer

Responsibilities:

  • Worked as a Hadoop developer analyzing large amounts of data for regulatory reports by creating MapReduce jobs in Java.
  • Moved data into HDFS and Hive using Sqoop for report analysis.
  • Worked on user-defined functions in Hive to load data from HDFS and run aggregation functions over multiple rows.
  • Created a MapReduce job to perform look-ups of specific entries using key-value pairs.
  • Developed Pig Latin scripts to load data from output files and place it in HDFS.
  • Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
  • Developed and implemented custom Hive UDFs involving date functions (see the UDF sketch after this list).
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Created stored procedures, triggers, and functions to operate on report data in MySQL.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations (also sketched after this list).
  • Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
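
To make the Hive UDF bullet above concrete, here is a minimal sketch of a custom date UDF written in Scala against Hive's classic UDF API. The class name and date formats are assumptions for illustration.

    import org.apache.hadoop.hive.ql.exec.UDF
    import java.text.SimpleDateFormat

    // Hypothetical UDF that normalizes "MM/dd/yyyy" strings to "yyyy-MM-dd".
    class NormalizeDate extends UDF {
      def evaluate(input: String): String = {
        if (input == null) return null
        val in  = new SimpleDateFormat("MM/dd/yyyy")
        val out = new SimpleDateFormat("yyyy-MM-dd")
        out.format(in.parse(input))
      }
    }

Once packaged in a JAR, a function like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.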
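
The migration POC bullet maps naturally onto the classic word-count comparison: a Java mapper and reducer collapse into a pair of RDD transformations. A minimal Scala sketch, with illustrative paths, follows.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountMigration {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCountMigration"))

        sc.textFile("hdfs:///data/input")       // illustrative input path
          .flatMap(_.split("\\s+"))             // map phase: emit one token per word
          .map(word => (word, 1))
          .reduceByKey(_ + _)                   // reduce phase: sum the counts per word
          .saveAsTextFile("hdfs:///data/wordcounts") // illustrative output path

        sc.stop()
      }
    }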

Environment: UNIX Scripting, Java, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata and Eclipse

Confidential, Durham, NC

Hadoop Developer/ Administrator

Responsibilities:

  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Wrote MapReduce jobs in Java to run over Hadoop clusters.
  • Implemented high availability and automatic failover infrastructure using ZooKeeper services to overcome the NameNode single point of failure.
  • Developed Pig scripts to transform raw data into intelligent data as specified by business users.
  • Worked on the Hadoop cluster and used the Hive data querying tool to store and retrieve data.
  • Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
  • Exported analyzed data from HDFS using Sqoop for generating reports.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Worked on the Oozie workflow engine to run multiple MapReduce jobs.
  • Worked with the applications team to install Hadoop updates and upgrades as required.

Environment: Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata, Eclipse and Unix Scripting.

Confidential

Java Developer

Responsibilities:

  • Followed Agile software development with Scrum methodology.
  • Designed and developed various modules of the application with OOAD.
  • Implemented Java/J2EE design patterns such as Factory, Singleton, DTO, DAO, and Session Facade.
  • Utilized Java SE 7 extensively to develop business logic.
  • Implemented dynamic functionality on screens using jQuery and asynchronous data retrieval using AJAX.
  • Responsible for designing and coding user interfaces using the Spring MVC framework.
  • Implemented AJAX components to fetch dynamic values from the database and update forms.
  • Developed both front-end and back-end code.
  • Developed classes in the DAO and service layers.
  • Consumed RESTful web services using JAX-RS.
  • Used SVN as the source control repository.
  • Used supervised machine learning techniques to develop prediction models and decision logic.
  • Used third-party libraries such as JFreeChart for data visualization.
  • Used Swing components to create the dashboard.
  • Configured Spring with Hibernate properties and validations for dependency injection.

Environment: Java, JSP, Servlets, WebSphere Application Server, Eclipse, JavaScript, Oracle, PL/SQL and JDBC.
