
Sr. Hadoop/Spark Developer Resume


Cincinnati, OH

SUMMARY

  • 10+ years of experience as a software professional, spanning requirement gathering, analysis, design, implementation and testing of software products using Java/J2EE technologies as well as Big Data technologies across the Hadoop ecosystem.
  • Experience working with Hadoop ecosystem components such as HDFS, MapReduce, HBase, Spark, YARN, Kafka, ZooKeeper, Pig, Hive, Sqoop, Storm, Oozie and Flume.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real-time streaming engines at an enterprise level.
  • Expertise in Java application development and client/server applications using Core Java, J2EE, Web Services, REST services, Oracle, SQL Server and other relational databases.
  • Involved in creating analytical models used for recommendations, risk modeling, fraud detection and prevention, sentiment analysis, clickstream analysis, etc.
  • Very good experience in real-time data streaming solutions using Apache Spark (Spark SQL, Spark Streaming, MLlib, GraphX), Apache Storm, Kafka and Flume.
  • Very good knowledge of various big data ingestion techniques using Sqoop, Flume, Kafka, the native HDFS Java API, REST APIs, HttpFS and WebHDFS.
  • Worked on cluster maintenance, including troubleshooting, management and performance-related configuration fine-tuning.
  • Experience in working with various Hadoop distributions like Cloudera, Hortonworks and MapR.
  • Good experience in implementing end-to-end data security and governance within the Hadoop platform using Apache Knox, Apache Sentry, Kerberos, etc.
  • Experience with different NoSQL databases like HBase, Accumulo, Cassandra and MongoDB.
  • Worked with different file formats like Avro, ORC and Parquet while moving data into and out of HDFS.
  • Experience with Apache Phoenix to access the data stored in HBase.
  • Good experience in designing, planning, administering, installing, configuring, troubleshooting, performance monitoring and fine-tuning of Cassandra clusters.
  • Excellent knowledge of CQL (Cassandra Query Language) for retrieving data from a Cassandra cluster.
  • Worked with Amazon Web Services (AWS): EC2, S3, EMR, Redshift and DynamoDB.
  • Experience in software component design using UML, including use case, requirement and component diagrams.
  • Experience in data mining and business intelligence tools such as Tableau, QlikView and MicroStrategy.
  • Experience in automating tasks with Python Scripting and Shell Scripting.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart. Well versed with Star-Schema & Snowflake schemas for designing the Data Marts.
  • Developed ETL Scripts for Data acquisition and Transformation using Informatica and Talend.
  • Good experience and understanding of Enterprise Data warehouse (EDW) architecture and possess End to End knowledge of EDW functioning.
  • Experienced in using Agile methodologies including extreme programming, SCRUM and Test-Driven Development (TDD).
  • Strong knowledge of system testing, user acceptance testing and software quality assurance best practices and methodologies.
  • Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.

PROFESSIONAL EXPERIENCE

Sr. Hadoop/Spark Developer

Confidential - Cincinnati, OH

Responsibilities:

  • Developed simple to complex MapReduce jobs using Java language for processing and validating the data.
  • Developed data pipelines using Sqoop, Spark, MapReduce and Hive to ingest, transform and analyze customer behavioral data.
  • Implemented Spark jobs using Python and Spark SQL for faster data processing and for real-time analysis algorithms in Spark.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Used the Spark-Cassandra connector to load data to and from Cassandra, and streamed data in real time using Spark with Kafka (a sketch of this streaming pattern follows this list).
  • Developed Kafka producers and consumers in Java, integrated them with Apache Storm, and ingested data into HDFS and HBase by implementing business rules in Storm.
  • Developed efficient MapReduce programs in Python to perform batch processing on huge unstructured datasets.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Created HBase tables and column families to store the user event data and wrote automated HBase test cases for data quality checks using HBase command line tools.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs, and created UDFs to store specialized data structures in HBase and Cassandra.
  • Developed NiFi workflows to pick up multiple retail files from FTP locations and move them to HDFS on a daily basis.
  • Worked with developer teams on NiFi workflows to pick up data from REST API servers, the data lake and SFTP servers and send it to a Kafka broker.
  • Evaluated Hortonworks NiFi (HDF 2.0) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi, including importing data from Linux servers.
  • Developed product profiles using Pig and custom UDFs, and developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and Pair RDDs.
  • Tuned Spark/Python code to improve the performance of machine learning algorithms for data analysis.
  • Performed data validation on the data ingested using MapReduce by building a custom model to filter all the invalid data and cleanse the data.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
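
The following is a minimal Scala sketch of the Spark Streaming with Kafka pattern referenced above (the project work itself used Python); it relies on the spark-streaming-kafka-0-10 integration, and the broker, topic and HDFS path names are illustrative placeholders rather than actual project values.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object CustomerEventStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("customer-event-stream"), Seconds(30))

        // Kafka consumer settings; broker and group id are placeholders
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "customer-events",
          "auto.offset.reset" -> "latest"
        )

        // Direct stream over the (hypothetical) customer_events topic
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("customer_events"), kafkaParams)
        )

        // Keep non-empty records and land each micro-batch on HDFS
        stream.map(_.value).filter(_.nonEmpty).foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile(s"/data/raw/customer_events/${System.currentTimeMillis}")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }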

Environment: Hadoop, MapReduce, Yarn, Spark, Hive, Pig, Kafka, HBase, Oozie, Sqoop, Python, Bash/Shell Scripting, Flume, Cassandra, Oracle 11g, Core Java, Storm, HDFS, Unix, Teradata, NiFi, Eclipse.

Sr. Spark and Hadoop Developer

Confidential - Richmond, VA

Responsibilities:

  • Built a Spark framework in Scala and migrated existing PySpark applications to it to improve runtime and performance.
  • Built a file watcher service in Java to consume files, write them to HDFS and audit the progress in HBase.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the StreamSets pipelines.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Control-M.
  • Developed BTEQ and FastExport scripts to load data from HDFS to Teradata.
  • Implemented a framework with the Teradata connector to import and export data between Teradata and Hadoop (see the Spark JDBC sketch after this list).
  • Improved performance to export data from Hadoop to Teradata by using performance tuning techniques.
  • Involved in installing Tez, which improved query performance.
  • Involved in installing Apache Phoenix in the Dev and Test environments.
  • Built a dashboard to audit incoming file status using Apache Phoenix.
  • Implemented StreamSets data quality functionality in Java.
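
A minimal Scala sketch of how a Spark JDBC import from Teradata into Hive could look for such a framework; the JDBC URL, credentials, table names and partitioning bounds are illustrative assumptions, not the actual framework code.

    import org.apache.spark.sql.SparkSession

    object TeradataImport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("teradata-import")
          .enableHiveSupport()
          .getOrCreate()

        // Read a Teradata table over JDBC in parallel; all connection values are placeholders
        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:teradata://td-host/DATABASE=sales")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "sales.orders")
          .option("user", "td_user")
          .option("password", "********")
          .option("partitionColumn", "order_id")
          .option("lowerBound", "1")
          .option("upperBound", "10000000")
          .option("numPartitions", "8")
          .load()

        // Land the data in a Hive staging table for downstream jobs
        orders.write.mode("overwrite").saveAsTable("staging.orders")

        spark.stop()
      }
    }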

Environment: Cloudera Distribution, HDFS, Map Reduce, Hue, Impala, Hive, Spark, Kafka, Sqoop, Pig, Streamsets, HBase, Control-M, Scala, Java, Eclipse, Shell Scripts, Teradata

Spark and Hadoop Developer

Confidential - Bellevue, WA

Responsibilities:

  • Created various Spark applications using Scala to perform aggregations on the enterprise data of the users (a sketch of this pattern follows this list).
  • Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data into Spark for real-time processing.
  • Developed custom FTP adaptors to pull the compressed files from FTP servers to HDFS directly using the HDFS FileSystem API.
  • Worked on real-time streaming using Apache Storm with RabbitMQ, processing compressed files and writing them to HDFS and HBase.
  • Implemented Python scripts to fetch data from AWS S3, audit the status in HBase and load the data into Hive.
  • Implemented batch processing of jobs using Spark Scala API.
  • Used Spark SQL and Data Frame API extensively to build spark applications.
  • Used Spark SQL for data analysis and provided the results to data scientists for further analysis.
  • Closely worked with the data science team in building Spark MLlib applications for various predictive models.
  • Implemented Spark using Scala, utilizing Spark Core, Spark Streaming and the Spark SQL API for faster processing of data than MapReduce in Java.
  • Developed multiple MapReduce jobs in Java for complex business requirements including data cleansing and preprocessing.
  • Developed Sqoop scripts to import and export data between RDBMS and HDFS/Hive tables.
  • Worked on analyzing Hadoop clusters using Big Data Analytic tools including Map Reduce, Pig and Hive.
  • Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
  • Developed Java code to read from REST APIs and write to HDFS as ORC files.
  • Worked in Spark to read the data from Hive and write it to Cassandra using Java.
  • Developed Shell Scripts and Python Programs to automate tasks.
  • Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
  • Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
  • Loaded the final processed data into HBase tables to allow the downstream application team to build rich, data-driven applications.
  • Experience in writing Phoenix queries on top of HBase tables to boost query performance.
  • Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
  • Created partitioned tables and loaded data using both static partition and dynamic partition methods.
  • Involved in cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and log files.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Control-M.
  • Worked on a cluster of 1,230 nodes.
  • Worked on processing large volumes of data (12 TB per day).
  • Good experience with continuous integration of applications using Jenkins.
  • Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Developed framework to load data from HDFS to Teradata using Spark JDBC API.
  • Developed BTEQ and FastExport scripts to load data from HDFS to Teradata.
  • Used Sqoop jobs to import data from RDBMS using incremental imports, and exported the analyzed data back to relational databases using Sqoop for visualization and to generate reports for the BI team.
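
A minimal Scala sketch of the kind of DataFrame aggregation over enterprise user data described above, persisting the result to Cassandra through the Spark-Cassandra connector; the Hive table, Cassandra keyspace/table and column names are illustrative assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object UserDailyAggregates {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("user-daily-aggregates")
          .enableHiveSupport()
          .getOrCreate()

        // Enterprise user events kept in Hive (table and columns are placeholders)
        val events = spark.table("enterprise.user_events")

        // Aggregate activity per user per day
        val daily = events
          .groupBy(col("user_id"), to_date(col("event_ts")).as("event_date"))
          .agg(count(lit(1)).as("event_count"), sum("amount").as("total_amount"))

        // Persist the aggregates to Cassandra via the Spark-Cassandra connector
        daily.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "analytics", "table" -> "user_daily_activity"))
          .mode("append")
          .save()

        spark.stop()
      }
    }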

Environment: Horton Works Distribution, Hadoop, Hive, Spark, Scala, Kafka, AKKA, Cassandra, Apache Storm, Python, Java, Zookeeper, Map Reduce, Sqoop, HDFS, Oozie, HBase, SQL, Shell Scripting, Teradata, RabbitMQ, Yarn, Mesos

Sr. Hadoop Developer

Confidential - Phoenix, AZ

Responsibilities:

  • Created various Spark applications using Scala to perform various enrichments of the clickstream data with enterprise data of the users (see the enrichment sketch after this list).
  • Developed custom FTP adaptors to pull the clickstream data from FTP servers to HDFS directly using HDFS File System API.
  • Implemented batch processing of jobs using Spark Scala API.
  • Used Spark SQL and Data Frame API extensively to build spark applications.
  • Used Spark SQL for data analysis and provided the results to data scientists for further analysis.
  • Closely worked with data science team in building Spark MLlib applications to build various predictive models.
  • Developed multiple Map Reduce jobs in Java for complex business requirements including data cleansing and preprocessing.
  • Migrated existing on-premises applications and services to AWS.
  • Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Deployed the Hadoop application on the multi-node cloud cluster, stored data in S3 and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Developed Sqoop scripts to import and export data between RDBMS and HDFS/Hive tables.
  • Worked on analyzing Hadoop clusters using Big Data Analytic tools including Map Reduce, Pig and Hive.
  • Stored the data in tabular formats using Hive tables and Hive SerDe's.
  • Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
  • Worked on implementing Hadoop streaming through Apache Kafka and Spark.
  • Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data into Spark for real-time processing.
  • Involved in building and managing NoSQL databases like HBase and Cassandra.
  • Worked in Spark to read the data from Hive and write it to Cassandra using Java.
  • Involved in developing Pig scripts and Pig UDFs to store unstructured data in HDFS.
  • Involved in designing various stages of migrating data from RDBMS to Cassandra.
  • Developed Shell Scripts and Python Programs to automate tasks.
  • Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
  • Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
  • Loaded the final processed data into HBase tables to allow the downstream application team to build rich, data-driven applications.
  • Experience in writing Phoenix queries on top of HBase tables to boost query performance.
  • Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
  • Created partitioned tables and loaded data using both static partition and dynamic partition methods.
  • Used Oozie for automating the end to end data pipelines and Oozie coordinators for scheduling the work flows.
  • Involved in cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Analyzed the Hadoop log files using Pig scripts to identify errors.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and sequence files for logs.
  • Held daily scrum calls on the status of deliverables with business users, stakeholders and the client, and drove periodic review meetings.
  • Involved in setting up the QA environment and written unit test cases using MRUnit.
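
A minimal Scala sketch of the clickstream enrichment referenced above: raw click events are joined to enterprise user profiles and summarized; the HDFS path, Hive tables and column names are illustrative assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ClickstreamEnrichment {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("clickstream-enrichment")
          .enableHiveSupport()
          .getOrCreate()

        // Raw clickstream landed on HDFS by the FTP adaptors (path is a placeholder)
        val clicks = spark.read.json("/data/raw/clickstream/")

        // Enterprise user profiles maintained in Hive (table name is a placeholder)
        val users = spark.table("enterprise.user_profiles")

        // Enrich each click with user attributes, then summarize by segment and page
        val enriched = clicks.join(users, Seq("user_id"), "left")
        val summary = enriched
          .groupBy("segment", "page")
          .agg(count(lit(1)).as("views"), countDistinct("user_id").as("unique_users"))

        summary.write.mode("overwrite").saveAsTable("analytics.clickstream_summary")

        spark.stop()
      }
    }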

Environment: MapR Distribution, Cassandra 2.1, HDFS, Map Reduce, Hive, Spark, Kafka, Sqoop, Pig, HBase, Oozie, Scala, Java, Eclipse, Shell Scripts, Oracle 10g, Windows, Linux, AWS.

Hadoop Developer

Confidential - Irving, TX

Responsibilities:

  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation and summarization activities on user behavioral data (a Spark RDD sketch follows this list).
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Loaded customer profile, spending and credit data from legacy warehouses onto HDFS using Sqoop.
  • Installed and configured Hadoop Map Reduce, HDFS, developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
  • Worked in the transition team which primarily worked on migration of Informatica to Hadoop.
  • Built data pipeline using Pig and Java Map Reduce to store onto HDFS.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
  • Used Apache Phoenix to access the data stored in HBase.
  • Performed unit testing using MRUnit.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Developed simple to complex Map/reduce Jobs using Hive and Pig.
  • Worked on Real Time/Near Real Time data processing using Flume and Storm.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms like Gzip, Snappy and LZO.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Implemented Pig scripts, integrated them into Oozie workflows and performed integration testing.
  • Used Sqoop jobs to import data from RDBMS using incremental imports, and exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Developed Hive and Pig queries and provided support for data analysts.
  • Extensively worked on data ingestion between heterogeneous RDBMS systems and HDFS using Sqoop.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Exported the result set from Hive to MySQL using Shell scripts.
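
A minimal Scala sketch of the Spark RDD cleansing and summarization pattern referenced above, in the spirit of the MapReduce-to-Spark migration POC; the HDFS paths, record layout and event types are illustrative assumptions.

    import org.apache.spark.{SparkConf, SparkContext}

    object BehaviorCleansing {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("behavior-cleansing"))

        // Raw pipe-delimited behavioral records on HDFS (path and layout are placeholders)
        val raw = sc.textFile("/data/raw/user_behavior/")

        // Cleansing and validation: drop malformed rows and unknown event types
        val valid = raw
          .map(_.split("\\|", -1).map(_.trim))
          .filter(fields => fields.length == 4 && fields(0).nonEmpty)
          .filter(fields => Set("view", "click", "purchase").contains(fields(2)))

        // Summarization: event counts per user, the equivalent of the old reducer step
        val perUser = valid
          .map(fields => (fields(0), 1L))
          .reduceByKey(_ + _)

        perUser.map { case (user, n) => s"$user\t$n" }
          .saveAsTextFile("/data/curated/user_behavior_counts")

        sc.stop()
      }
    }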

Environment: Cloudera Distribution, Hadoop, Hive, Zookeeper, Map Reduce, Sqoop, Pig 0.10 and 0.11, JDK1.6, HDFS, Flume, Oozie, Informatica 9.5, DB2, HBase, PL/SQL, SQL, Shell Scripting.
