Hadoop/Spark Developer Resume

Collegeville, PA

SUMMARY:

  • 6+ years of total experience in the IT industry in the analysis, design, and development of software applications in Hadoop Big Data, Spark, and Application Packaging.
  • 4 years of exclusive experience in Hadoop and its components such as HDFS, MapReduce, Apache Pig, Hive, Sqoop, HBase, Oozie, Scala, Spark, Flume, and Kafka.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experience in batch processing of big data as well as processing of streaming data.
  • Experience in working with Spark RDDs, DataFrames, Datasets, and the Spark Streaming APIs.
  • Experience in analyzing data using Spark, Hive, Pig, and MapReduce programs.
  • Experience in working with Flume and Kafka to ingest real-time and near-real-time streaming data directly into HDFS.
  • Experience in processing streaming data and visualizing the results.
  • Experience in working with various Hadoop file formats.
  • Experience in working with various compression techniques.
  • Wrote Pig scripts to reduce job execution time compared with MapReduce programs.
  • Experience in modifying and tuning the Hadoop ecosystem as per requirements.
  • Knowledge of NoSQL databases such as Cassandra, MongoDB, and HBase.
  • In-depth understanding of application packaging tools such as Wise Package Studio, AdminStudio, and InstallShield, as well as VBScript and batch scripting.
  • Strong SQL knowledge.
  • Analyzed various reports to identify root causes and find solutions that close open defects.
  • Strong ability to understand business process functionality, leading to excellent domain knowledge.
  • Able and keen to learn, grasp, and deliver both individually and as a proactive team member.
  • Excellent communication, interpersonal, and analytical skills, and a strong ability to perform as part of a team.
  • Exceptional ability to learn new concepts; capable of working in a team as well as independently.

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Kafka, HBase, Oozie, Avro, ZooKeeper, Apache Impala, Scala, Spark, and Spark Streaming.

Programming Languages: Java, SQL, PL/SQL, Unix Shell Scripting, and Perl

Databases: Oracle 8i/9i/10g, Microsoft SQL Server, MySQL

Tools: TOAD, SQL Developer, SoapUI, Maven.

Operating Systems: Windows, UNIX, Linux

PROFESSIONAL EXPERIENCE:

Confidential, Collegeville, PA

Hadoop/Spark Developer

  • Collaborated with business owners and subject matter experts to gather input requirements.
  • Analyzed input data, performed data cleansing, and built automated scripts to transform and store data in HDFS.
  • Used Sqoop to transfer data between relational database systems and HDFS.
  • Built data pipelines using Flume and Kafka to collect streaming data and store it on the distributed storage layer (HDFS).
  • Implemented Spark jobs using Scala and Spark SQL, which are generally faster than MapReduce jobs.
  • Imported data in various file types into Spark RDDs/DataFrames.
  • Participated in Apache Spark POCs for analyzing data based on several business factors.
  • Performed transformations and actions on data using Spark RDDs, DataFrames, and Datasets.
  • Used Spark SQL to process structured data over RDDs.
  • Used Spark to reimplement jobs previously run on MapReduce via Pig/Hive and compared the results.
  • Well versed in the Spark Streaming APIs.
  • Good knowledge of interactive notebooks such as Jupyter and Zeppelin.

Environment: Cloudera, Hadoop, HDFS, YARN, Hive, Sqoop, Flume, Kafka, Linux, Java, Oozie, Spark 1.6, Spark 2.3, Scala, SQL, AWS, S3, Pig, Spark Streaming.

Confidential, Cuyahoga Falls, OH

Hadoop/Spark Developer

Responsibilities:

  • Hands-on experience with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Performed advanced procedures such as text analytics and processing with Spark's in-memory computing capabilities, using Scala.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Implemented Spark sample programs in Python using PySpark.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Developed PySpark code to mimic the transformations performed in the on-premises environment.
  • Used the Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time.
  • Responsible for loading data from web servers and Teradata through pipelines built with Sqoop, Kafka, and the Spark Streaming API.
  • Developed Kafka producers and consumers, Cassandra clients, and Spark jobs, along with components on HDFS and Hive.
  • Populated HDFS and HBase with huge amounts of data using Apache Kafka.
  • Used Kafka to ingest data into the Spark engine.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Experienced with scripting languages such as Python and shell.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez.
  • Experienced in building data pipelines using Kafka and Akka to handle terabytes of data.
  • Wrote shell scripts that run multiple Hive jobs to incrementally update Hive tables, which are used to generate Tableau reports for business use.
  • Experienced in using Apache Spark, with code written in Scala, to implement advanced procedures such as text analytics and processing with its in-memory computing capabilities.
  • Developed Solr web apps to query and visualize Solr-indexed data from HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Worked with Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3.
  • Used the Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Created custom UDFs for Pig and Hive to bring Python methods and functionality into Pig Latin and HiveQL (HQL).
  • Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib.
  • Implemented Hortonworks NiFi (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets using the Hortonworks distribution of the Hadoop ecosystem.
  • Ingested data from RDBMS, performed data transformations, exported the transformed data to Cassandra as per business requirements, and accessed Cassandra through Java services.
  • Experience with NoSQL column-oriented databases such as Cassandra and their integration with a Hadoop cluster.
  • Built servers on AWS: imported volumes, launched EC2 and RDS instances, and created security groups, auto-scaling, and load balancers (ELBs) in the defined virtual private cloud; used OpenStack to provision new machines for clients.
  • Implemented AWS solutions using EC2, S3, RDS, ECS, EBS, Elastic Load Balancer, and Auto Scaling groups; optimized volumes and EC2 instances.
  • Created S3 buckets and managed their policies; used S3 and Glacier for storage and backup on AWS.
  • Performed AWS cloud administration, managing EC2 instances and the S3, SES, and SNS services.
  • Wrote ETL jobs to read from web APIs using REST and HTTP calls and loaded the data into HDFS using Java and Talend.
  • Worked with the infrastructure team to design and develop a Kafka- and Storm-based data pipeline.
  • Used an ORM framework with the Spring Framework for data persistence and transaction management.

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, Agile methodologies, NiFi, MySQL, Tableau, AWS, EC2, S3, Hortonworks, Power BI, Solr.

Confidential

Hadoop Developer

  • Used Sqoop to import existing relational databases into the Hadoop environment (HDFS).
  • Used Sqoop jobs to import data incrementally, periodically updating the data in HDFS while avoiding duplicates.
  • Used Apache Flume to collect log data from source servers and sink it into HDFS.
  • Processed all forms of data accumulated on the Hadoop ecosystem using MapReduce or Pig jobs.
  • Used Pig jobs where required to decrease the latency of MapReduce jobs.
  • Used the transformed data for analysis; the information obtained from that analysis was then used to derive various patterns.
  • Worked with Hive partitioning and bucketing and performed different types of joins in Hive.
  • As part of the Application team, worked on the GENPACT project, creating silent installer packages and working closely with the SCCM team.

Environment: Hadoop, HDFS, YARN, Hive, Pig, Sqoop, Flume, Kafka, Linux, Java, Oozie, SQL

Confidential

Hadoop Developer

  • 1+ years as a Hadoop Developer and SPOC for the Application Packaging team.
  • As part of the Hadoop team, worked on the Target project and used Sqoop to import existing relational databases into the Hadoop environment (HDFS).
  • Used Sqoop jobs to periodically import data incrementally into HDFS.
  • Processed all forms of data accumulated on the Hadoop ecosystem using MapReduce or Pig jobs.
  • As part of the Application team, created silent installer packages and worked closely with the SCCM and SOC teams.
  • Analyzed various reports published by the SOC and SCCM teams to identify the root causes of patch failures, which reduced patch defects.

Environment: Hadoop, HDFS, YARN, Hive, Pig, Sqoop, Flume, Kafka, Linux, Java, Oozie, SQL, AdminStudio, InstallShield, SCCM, VBScript, PowerShell, MSI/Install Script, Wise Scripting, Batch Scripting, and Wise Package Studio

Confidential

Application Packaging Engineer

  • Worked as Application Packaging engineer.
  • Created silent installer packages using application packaging tools such as Wise Package Studio, AdminStudio, and InstallShield.
  • Worked on various scripting tools such as VBScript, PowerShell, MSI/Install Script, Wise Scripting and Batch Scripting.
  • Created silent installation packages and tested them using SCCM before user acceptance testing to ensure everything worked after deployment.

Environment: SQL, AdminStudio, InstallShield, SCCM, VBScript, PowerShell, MSI/Install Script, Wise Scripting, Batch Scripting, and Wise Package Studio.
