Spark/Hadoop Developer Resume Durham NC - Hire IT People

SUMMARY

Around 8 years of IT experience which includes experience as Hadoop/Spark developer using Big data technologies like Hadoop Ecosystem, Spark Ecosystem and experience in application development using J2EE.
Experience in working on various Hadoop data access components like MAPREDUCE, PIG, HIVE, HBASE, SPARK and KAFKA.
Experience handling Hive queries using Spark SQL, which integrates with the Spark environment.
Having good knowledge on Hadoop data management components like HDFS and YARN.
Hands-on experience in using various Hadoop workflow components like SQOOP, FLUME and KAFKA.
Worked on Hadoop data operation components like ZOOKEEPER and OOZIE.
Working knowledge of AWS technologies like S3 and EMR for storage, big data processing and analysis.
Good understanding of Hadoop security components like RANGER and KNOX.
Good experience working with Hadoop distributions such as HORTONWORKS and CLOUDERA.
Excellent programming skills at higher level of abstraction using SCALA and JAVA.
Experience in Java programming with skills in analysis, design, testing and deploying with various technologies like J2EE, JavaScript, JSP, JDBC, HTML, XML and JUNIT.
Having good knowledge on Apache Spark components including SPARK CORE, SPARK SQL, SPARK STREAMING and SPARK MLLIB.
Experience in performing transformations and actions on Spark RDDs using Spark Core.
Experience in using Broadcast variables, Accumulator variables and RDD caching in Spark.
Experience in troubleshooting Cluster jobs using Spark UI
Experience working with Cloudera Distribution Hadoop (CDH) and Hortonworks Data Platform (HDP).
Expert in Hadoop and Big data ecosystem including Hive, HDFS, Spark, Kafka, MapReduce, Sqoop, Oozie and Zookeeper
Good Knowledge on Hadoop Cluster architecture and monitoring the cluster
Hands-on experience in distributed systems technologies, infrastructure administration and monitoring configuration
Expertise in data transformation & analysis using Spark, Hive
Knowledge of writing Hive Queries to generate reports using Hive Query Language
Hands on experience with the Spark SQL for complex data transformations using Scala programming language.
Developed Spark code using Python/Scala and Spark-SQL for faster testing and processing of data
Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts
Extensive experience in data ingestion technologies like Flume, Kafka, Sqoop and NiFi
Used Flume, Kafka and NiFi to ingest real-time and near real-time streaming data into HDFS from different data sources
Good in analyzing data using HiveQL and custom MapReduce program in Java
Good Knowledge in working with AWS (Amazon Web Services) cloud platform
Good knowledge in Unix shell commands
Experience in analyzing log files for Hadoop and ecosystem services to find root causes, and in setting up and managing the batch scheduler on Oozie
Thorough knowledge of Release management, CI/CD process using Jenkins and Configuration management using Visual Studio Online
Experience in extracting data from RDBMS into HDFS using Sqoop ingestion and collecting logs from log collectors into HDFS using Flume
Used Project Management services like JIRA for handling service requests and tracking issues.
Good experience with Software methodologies like Agile and Waterfall.
Experienced working with Zookeeper to provide coordination services to the cluster
Skilled in Tableau 9 for data visualization, Reporting and Analysis
Extensively involved throughout the Software Development Life Cycle (SDLC), from initial planning through implementation of projects, using Agile and Waterfall methodologies
Good team player with ability to solve problems, organize and prioritize multiple tasks.

TECHNICAL SKILLS

Data Access Tools: HDFS, YARN, Hive, Pig, HBase, Solr, Impala, Spark Core, Spark SQL, Spark Streaming

Data Management: HDFS, YARN

Data Workflow: Sqoop, Flume, Kafka

Data Operation: Zookeeper, Oozie

Data Security: Ranger, Knox

BigData Distributions: Hortonworks, Cloudera

Cloud Technologies: AWS (Amazon Web Services) EC2, S3, IAM, CloudWatch, DynamoDB, SNS, SQS, EMR, Kinesis

Programming & Languages: Java, Scala, Pig Latin, HQL, SQL, Shell Scripting, HTML, CSS, JavaScript

IDE/Build Tools: Eclipse, IntelliJ

Java/J2EE Technologies: XML, Junit, JDBC, AJAX, JSON, JSP

Operating Systems: Linux, Windows, Kali Linux

SDLC: Agile/SCRUM, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, Durham NC

Spark/Hadoop Developer

Responsibilities:

Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster processing of data
Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and Spark SQL APIs.
Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze customer behavioral data.
Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using Hadoop.
Developed real time data processing applications by using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka.
Developed Spark jobs and Hive Jobs to summarize and transform data
Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
Experienced in developing Spark scripts for data analysis in Scala.
Used Spark-Streaming APIs to perform necessary transformations.
Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
Worked with Spark to consume data from Kafka and convert it to a common format using Scala.
Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
Wrote new Spark jobs in Scala to analyze customer data and sales history.
Involved in requirement analysis, design, coding and implementation phases of the project.
Used Spark API over Hadoop YARN to perform analytics on data in Hive.
Experience in both SQLContext and SparkSession.
Developed Scala based Spark applications for performing data cleansing, data aggregation, de-normalization and data preparation needed for machine learning and reporting teams to consume.
Worked on troubleshooting Spark applications to make them more error tolerant.
Involved in HDFS maintenance and loading of structured and unstructured data; imported data from mainframe datasets to HDFS using Sqoop and wrote PySpark scripts to process the HDFS data.
Extensively worked on the core and Spark SQL modules of Spark.
Worked with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
Created partitioned tables and loaded data using both static partition and dynamic partition method.
Implemented POCs on migrating to Spark Streaming to process live data.
Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
Ingested data from RDBMS, performed data transformations, and then exported the transformed data to HDFS as per the business requirement.
Used Impala to read, write and query the data in HDFS.
Stored output files for export on HDFS, where downstream systems later picked them up.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.

Environment: Hadoop 2.x, Spark Core, Spark SQL, Spark API, Spark Streaming, Scala, PySpark, Hive, Pig, Kafka, Oozie, Amazon EMR, Tableau, Impala, RDBMS, HDFS, YARN, JIRA, MapReduce.

Confidential, New York, NY

Spark/Hadoop Developer

Responsibilities:

Responsible to collect, clean, and store data for analysis using Kafka, Sqoop, Spark, HDFS
Used Kafka and Spark framework for real time and batch data processing
Ingested large amount of data from different data sources into HDFS using Kafka
Implemented Spark using Scala and performed cleansing of data by applying Transformations and Actions
Used case classes in Scala to convert RDDs into DataFrames in Spark
Processed and analyzed data stored in HBase and HDFS
Developed Spark jobs using Scala on top of Yarn for interactive and Batch Analysis.
Developed Unix shell scripts to load large number of files into HDFS from Linux File System.
Experience in querying data using Spark SQL for faster processing of the data sets.
Offloaded data from EDW into Hadoop Cluster using Sqoop.
Developed Sqoop scripts for importing and exporting data into HDFS and Hive
Created Hive internal and external Tables by Partitioning, bucketing for further Analysis using Hive
Used Oozie workflow to automate and schedule jobs
Used Zookeeper for maintaining and monitoring clusters
Exported the data into RDBMS using Sqoop for BI team to perform visualization and to generate reports
Continuously monitored and managed the Hadoop Cluster using Cloudera Manager
Used JIRA for project tracking and participated in daily scrum meetings

Environment: Spark, Sqoop, Scala, Hive, Kafka, YARN, Teradata, RDBMS, HDFS, Oozie, Zookeeper, HBase, Tableau, Hadoop (Cloudera), JIRA

Confidential, New York, NY

Hadoop Developer

Responsibilities:

Actively participated in interaction with users to fully understand the requirements of the system
Experience with the Hadoop ecosystem and NoSQL database
Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS
Imported data from RDBMS (MySQL, Teradata) to HDFS and vice versa using Sqoop (Big Data ETL tool) for Business Intelligence, visualization and report generation
Worked with Kafka to bring near real-time data onto the big data cluster and feed the required data into Spark for analysis
Used Spark Streaming with Scala to receive near real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as Cassandra
Involved in Analyzing data by writing queries using HiveQL for faster data processing
Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing
Optimized queries in Hive to improve performance and reduce query execution time
Involved in writing Flume and Hive scripts to extract, transform and load the data into Database
Created tables in DataStax Cassandra and loaded large sets of data for processing
Worked on Oozie workflows, coordinators to run multiple Hive jobs
Used Git for version control, JIRA for project tracking and Jenkins for continuous integration
Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review session.

Environment: HDFS, Kafka, Sqoop, Scala, Java, Hive, Oozie, NoSQL, Oracle, MySQL, Git, Zookeeper, DataStax Cassandra, JIRA, Hortonworks Data Platform, Jenkins, Agile (Scrum).
