
Spark/Hadoop Developer Resume

Durham, NC

SUMMARY:

  • Around 8 years of IT experience, including 5+ years as a Hadoop/Spark developer working with Big Data technologies across the Hadoop and Spark ecosystems, and 2 years of application development using J2EE.
  • Experience in working on various Hadoop data access components like MAPREDUCE, PIG, HIVE, HBASE, SPARK and KAFKA.
  • Experience in handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Having good knowledge on Hadoop data management components like HDFS and YARN.
  • Hands-on experience in using various Hadoop workflow components like SQOOP, FLUME and KAFKA.
  • Worked on Hadoop data operation components like ZOOKEEPER and OOZIE.
  • Working knowledge of AWS technologies like S3 and EMR for storage, big data processing and analysis.
  • Good understanding of Hadoop security components like RANGER and KNOX.
  • Good experience working with Hadoop distributions such as HORTONWORKS and CLOUDERA.
  • Excellent programming skills at higher level of abstraction using SCALA and JAVA.
  • Experience in Java programming with skills in analysis, design, testing and deploying with various technologies like J2EE, JavaScript, JSP, JDBC, HTML, XML and JUNIT.
  • Having good knowledge on Apache Spark components including SPARK CORE, SPARK SQL, SPARK STREAMING and SPARK MLLIB.
  • Experience in performing transformations and actions on Spark RDDs using Spark Core.
  • Experience in using Broadcast variables, Accumulator variables and RDD caching in Spark.
  • Experience in troubleshooting cluster jobs using the Spark UI.
  • Experience working with Cloudera Distribution of Hadoop (CDH) and Hortonworks Data Platform (HDP).
  • Expert in Hadoop and Big data ecosystem including Hive, HDFS, Spark, Kafka, MapReduce, Sqoop, Oozie and Zookeeper
  • Good Knowledge on Hadoop Cluster architecture and monitoring the cluster
  • Hands-on experience in distributed systems technologies, infrastructure administration and monitoring configuration
  • Expertise in data transformation & analysis using Spark, Hive
  • Knowledge of writing Hive Queries to generate reports using Hive Query Language
  • Hands on experience with the Spark SQL for complex data transformations using Scala programming language.
  • Developed Spark code using Python/Scala and Spark-SQL for faster testing and processing of data
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts
  • Extensive experience in data ingestion technologies like Flume, Kafka, Sqoop and NiFi
  • Utilized Flume, Kafka and NiFi to ingest real-time and near-real-time streaming data into HDFS from different data sources
  • Proficient in analyzing data using HiveQL and custom MapReduce programs in Java
  • Good Knowledge in working with AWS (Amazon Web Services) cloud platform
  • Good knowledge in Unix shell commands
  • Experience in analyzing log files for Hadoop and ecosystem services to find root causes, and in setting up and managing batch schedules with Oozie
  • Thorough knowledge of Release management, CI/CD process using Jenkins and Configuration management using Visual Studio Online
  • Experience in extracting data from RDBMS into HDFS using Sqoop ingestion and collecting logs from the log collector into HDFS using Flume
  • Used Project Management services like JIRA for handling service requests and tracking issues.
  • Good experience with Software methodologies like Agile and Waterfall.
  • Experienced working with Zookeeper to provide coordination services to the cluster
  • Skilled in Tableau 9 for data visualization, reporting and analysis
  • Extensively involved throughout the Software Development Life Cycle (SDLC), from initial planning through implementation, using Agile and Waterfall methodologies
  • Good team player with ability to solve problems, organize and prioritize multiple tasks.

TECHNICAL SKILLS:

Data Access Tools: HDFS, YARN, Hive, Pig, HBase, Solr, Impala, Spark Core, Spark SQL, Spark Streaming

Data Management: HDFS, YARN

Data Workflow: Sqoop, Flume, Kafka

Data Operation: Zookeeper, Oozie

Data Security: Ranger, Knox

BigData Distributions: Hortonworks, Cloudera

Cloud Technologies: AWS (Amazon Web Services) EC2, S3, IAM, CloudWatch, DynamoDB, SNS, SQS, EMR, Kinesis

Programming & Languages: Java, Scala, Pig Latin, HQL, SQL, Shell Scripting, HTML, CSS, JavaScript

IDE/Build Tools: Eclipse, IntelliJ

Java/J2EE Technologies: XML, Junit, JDBC, AJAX, JSON, JSP

Operating Systems: Linux, Windows, Kali Linux

SDLC: Agile/SCRUM, Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, Durham, NC

Spark/Hadoop Developer

Responsibilities:

  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and Spark SQL APIs.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using Hadoop.
  • Developed real time data processing applications by using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka.
  • Developed Spark jobs and Hive Jobs to summarize and transform data
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Experienced in developing Spark scripts for data analysis in Scala.
  • Used Spark-Streaming APIs to perform necessary transformations.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
  • Worked with Spark to consume data from Kafka and convert it to a common format using Scala.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Wrote new spark jobs in Scala to analyze the data of the customers and sales history.
  • Involved in requirement analysis, design, coding and implementation phases of the project.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Experience with both SQLContext and SparkSession.
  • Developed Scala based Spark applications for performing data cleansing, data aggregation, de-normalization and data preparation needed for machine learning and reporting teams to consume.
  • Worked on troubleshooting spark application to make them more error tolerant.
  • Involved in HDFS maintenance and loading of structured and unstructured data; imported data from mainframe datasets into HDFS using Sqoop and wrote PySpark scripts to process the HDFS data.
  • Extensively worked on the core and Spark SQL modules of Spark.
  • Involved in Spark and Spark Streaming, creating RDDs and applying transformations and actions.
  • Created partitioned tables and loaded data using both static partition and dynamic partition method.
  • Implemented POCs on migrating to Spark Streaming to process live data.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Ingested data from RDBMS, performed data transformations, and exported the transformed data to HDFS as per the business requirements.
  • Used Impala to read, write and query the data in HDFS.
  • Stored the output files for export onto HDFS and later these files are picked up by downstream systems.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
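The transformation-and-action pattern described above can be sketched without a cluster; the following is an illustrative plain-Python analogue of Spark's map/reduceByKey semantics (the sample records and field layout are invented for illustration, not taken from the actual project):

```python
from collections import defaultdict

# Hypothetical CSV-style event records: date, state, event type, amount.
raw_events = [
    "2024-01-01,NC,purchase,120.0",
    "2024-01-01,NY,purchase,80.0",
    "2024-01-02,NC,refund,-20.0",
]

# map-style transformations: parse each line into a (state, amount) pair.
pairs = (line.split(",") for line in raw_events)
keyed = ((state, float(amount)) for _, state, _, amount in pairs)

# reduceByKey-style aggregation: total amount per state.
totals = defaultdict(float)
for state, amount in keyed:
    totals[state] += amount

print(dict(totals))  # {'NC': 100.0, 'NY': 80.0}
```

In Spark the transformations stay lazy until an action (such as `collect`) runs; here the final loop plays the role of that action.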

Environment: Hadoop 2.x, Spark Core, Spark SQL, Spark Streaming, Scala, PySpark, Hive, Pig, Kafka, Oozie, Amazon EMR, Tableau, Impala, RDBMS, HDFS, YARN, JIRA, MapReduce.

Confidential, New York, NY

Spark/Hadoop Developer

Responsibilities:

  • Responsible for collecting, cleaning and storing data for analysis using Kafka, Sqoop, Spark and HDFS
  • Used Kafka and Spark framework for real time and batch data processing
  • Ingested large amount of data from different data sources into HDFS using Kafka
  • Implemented Spark using Scala and performed cleansing of data by applying Transformations and Actions
  • Used case classes in Scala to convert RDDs into DataFrames in Spark
  • Processed and analyzed data stored in HBase and HDFS
  • Developed Spark jobs using Scala on top of YARN for interactive and batch analysis.
  • Developed Unix shell scripts to load large number of files into HDFS from Linux File System.
  • Experience in querying data using Spark SQL for faster processing of the data sets.
  • Offloaded data from the EDW into the Hadoop cluster using Sqoop.
  • Developed Sqoop scripts for importing and exporting data into HDFS and Hive
  • Created Hive internal and external tables with partitioning and bucketing for further analysis using Hive
  • Used Oozie workflow to automate and schedule jobs
  • Used Zookeeper for maintaining and monitoring clusters
  • Exported the data into RDBMS using Sqoop for BI team to perform visualization and to generate reports
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager
  • Used JIRA for project tracking and participated in daily scrum meetings
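Partitioned Hive tables like those described above lay data out in per-value directories, with dynamic partitioning routing each row from its column values. A minimal plain-Python sketch of that directory layout (partition columns and sample rows are hypothetical):

```python
from collections import defaultdict

# Rows destined for a Hive table partitioned by (year, state).
rows = [
    {"year": 2023, "state": "NC", "amount": 10},
    {"year": 2023, "state": "NY", "amount": 20},
    {"year": 2024, "state": "NC", "amount": 30},
]

# With dynamic partitioning, Hive derives the target partition directory
# from each row's column values; mirror its key=value path naming here.
partitions = defaultdict(list)
for row in rows:
    path = f"year={row['year']}/state={row['state']}"
    partitions[path].append(row)

print(sorted(partitions))
# ['year=2023/state=NC', 'year=2023/state=NY', 'year=2024/state=NC']
```

A static partition would instead fix `year=...`/`state=...` in the load statement and write every row to that one directory.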

Environment: Spark, Sqoop, Scala, Hive, Kafka, YARN, Teradata, RDBMS, HDFS, Oozie, Zookeeper, HBase, Tableau, Hadoop (Cloudera), JIRA

Confidential, New York, NY

Hadoop Developer

Responsibilities:

  • Actively participated in interaction with users to fully understand the requirements of the system
  • Experience with the Hadoop ecosystem and NoSQL database
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS
  • Imported data from RDBMS (MySQL, Teradata) to HDFS and vice versa using Sqoop (Big Data ETL tool) for Business Intelligence, visualization and report generation
  • Worked with Kafka to bring near-real-time data onto the big data cluster and the required data into Spark for analysis
  • Used Spark Streaming to receive near-real-time data from Kafka and stored the streamed data in HDFS and NoSQL databases such as Cassandra, using Scala
  • Involved in Analyzing data by writing queries using HiveQL for faster data processing
  • Designing and creating Hive external tables using shared meta-store instead of derby with partitioning, dynamic partitioning and buckets
  • Optimized queries in Hive to increase performance and query execution time
  • Involved in writing Flume and Hive scripts to extract, transform and load the data into Database
  • Created tables in DataStax Cassandra and loaded large sets of data for processing
  • Worked on Oozie workflows, coordinators to run multiple Hive jobs
  • Used Git for version control, JIRA for project tracking and Jenkins for continuous integration
  • Utilized Agile and Scrum methodology to help manage and organize a team of developers with regular code review sessions.
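Spark Streaming consumes a live feed (such as the Kafka topic above) as a series of micro-batches. A stdlib-only sketch of that discretization (the feed is simulated, not a real Kafka consumer, and batching is by count rather than by time interval to keep the example runnable):

```python
def micro_batches(stream, batch_size):
    """Group an incoming record stream into fixed-size micro-batches,
    mimicking how Spark Streaming discretizes a live feed into small
    batches that are then processed with normal Spark jobs."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Simulated feed standing in for records consumed from a Kafka topic.
simulated_feed = (f"event-{i}" for i in range(7))
batches = list(micro_batches(simulated_feed, batch_size=3))
print([len(b) for b in batches])  # [3, 3, 1]
```

Each yielded batch corresponds to one RDD in Spark Streaming's DStream abstraction; downstream transformations then run per batch.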

Environment: HDFS, Kafka, Sqoop, Scala, Java, Hive, Oozie, NoSQL, Oracle, MySQL, Git, Zookeeper, DataStax Cassandra, JIRA, Hortonworks Data Platform, Jenkins, Agile (Scrum).

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
  • Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
  • Created a customized BI tool for the management team to perform query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Estimated the hardware requirements for NameNode and DataNodes & planning the cluster.
  • Developed a framework to import data from the database into HDFS using Sqoop. Developed HQL queries to extract data from Hive tables for reporting.
  • Hands-on experience in writing MR jobs for cleansing the data and copying it from our cluster to the AWS cluster
  • Used open source web scraping framework for python to crawl and extract data from web pages.
  • Possess strong skills in application and system programming using C++ and Python on Windows and Linux platforms, applying Object-Oriented Programming (OOP) principles and design patterns
  • Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Worked with Kafka on a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Involved in Agile SDLC during the development of project.
  • Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
  • Experienced in monitoring the cluster using Cloudera Manager
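The cleansing MR jobs mentioned above follow the classic mapper/reducer contract; with Hadoop Streaming, both sides can be plain Python. A minimal sketch (the input lines and the word-count cleansing step are illustrative, not the actual job):

```python
import itertools

def mapper(line):
    """Emit (word, 1) pairs, normalizing case and dropping extra
    whitespace -- the kind of cleansing a simple MR job performs."""
    for word in line.strip().lower().split():
        yield word, 1

def reducer(pairs):
    """Sum counts per key; assumes pairs arrive sorted by key, as the
    Hadoop shuffle-and-sort phase guarantees between map and reduce."""
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

lines = ["Hadoop hadoop  spark", "spark HIVE"]
# sorted() stands in for the shuffle phase between the two stages.
mapped = sorted(kv for line in lines for kv in mapper(line))
print(dict(reducer(mapped)))  # {'hadoop': 2, 'hive': 1, 'spark': 2}
```

In a real Hadoop Streaming job, the mapper and reducer would read from stdin and write tab-separated key/value lines to stdout, with the framework handling the sort in between.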

Environment: Hadoop, HDFS, HBase, MapReduce, Java, C++, Python, Linux, AWS, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Hue, Storm, Zookeeper, SQL, ETL, Cassandra, Cloudera Manager, MySQL, MongoDB, Agile.

Confidential

Software Developer

Responsibilities:

  • Responsible for the analysis, documenting the requirements and architecting the application based on J2EE standards.
  • Attended Scrum meetings daily as a part of Agile Methodology.
  • Involved in complete Software Development Life Cycle (SDLC) with Object Oriented Approach of client's business process and continuous client feedback.
  • Implemented MVC architecture using the Spring Framework and customized user interfaces. Used Core Java and Spring Aspect-Oriented Programming concepts for logging, security and error-handling mechanisms
  • Developed application modules using Spring MVC, Spring Annotations, Spring Beans and Dependency Injection, with database interfacing using Hibernate.
  • Used the Java Collections API extensively in the application, with security protection for XML, SOAP, REST and JSON to make a secure web deployment.
  • Developed server-side services using Java, Spring, Web Services (SOAP, Restful, WSDL, JAXB, JAX-RPC)
  • Built more user-interactive web pages using jQuery plugins for drag-and-drop and autocomplete, along with AJAX, JSON, AngularJS, JavaScript and Bootstrap.
  • Used XSL to transform XML data structure into HTML pages.
  • Used Struts as the framework in this project and developed struts action classes, form beans.
  • Created dispatch Action classes, and Validation plug-in using Struts framework.
  • Used DB2 as the database and wrote queries to extract data from it.
  • Developed SQL queries and stored procedures.
  • Designed and developed white-box test cases using JUnit, Git, JMeter and the Mockito framework.

Environment: Core Java, Agile, Scrum, XML, HTML, JMeter, SOAP, REST, JDK, JSP, Servlets, JDBC, CSS, JUnit, SQL, MySQL, Windows, Oracle, Eclipse
