
Hadoop/Spark Developer Resume

Long Island, NY

PROFILE SUMMARY:

  • Overall 7 years of IT experience, including 4 years in Big Data and Hadoop ecosystem technologies.
  • Experience in designing and developing application pipelines that allow efficient exchange of data between core database engines and the Hadoop ecosystem.
  • Hands-on experience with Hadoop/Big Data technologies, including storage, processing, querying and analysis of data.
  • Good understanding of Hadoop and Spark architectures, with hands-on experience in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and Hue.
  • Experience in managing Hadoop clusters using Cloudera CDH and Hortonworks HDP distributions.
  • Experience in developing solutions to analyze large data sets efficiently.
  • Hands-on experience in loading data from local systems to HDFS using FTP, with strong knowledge of Apache Hue.
  • Experience with ETL and big data query tools such as Pig Latin and HiveQL.
  • Experience in manipulating and analyzing large datasets to find patterns and insights in structured and unstructured data.
  • Expertise in writing Hadoop jobs for analyzing data using Hive and Pig.
  • Experienced in integrating various data sources such as RDBMS tables, shell script outputs, spreadsheets and text files.
  • Experienced in writing complex MapReduce programs that work with file formats such as Text, SequenceFile, XML, Parquet and Avro.
  • Knowledge of NoSQL databases including HBase and Cassandra.
  • Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Good experience in implementing analytical procedures such as text analytics, using the in-memory computing capabilities of Spark with Scala.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience in creating RDDs, DataFrames and Datasets for the required data and performing transformations using Spark RDDs and Spark SQL (see the sketch at the end of this summary).
  • Good knowledge of building data pipelines in Spark using Scala.
  • Experience in developing Spark programs for batch and real-time processing.
  • Experience in Spark Streaming to ingest data from multiple data sources into HDFS.
  • Extracted real-time feeds using Kafka and Spark Streaming.
  • Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
  • Experienced in designing and using CQL (Cassandra Query Language) to perform CRUD operations on Cassandra tables.
  • Experience in migrating data between HDFS and relational database systems using Sqoop.
  • Knowledge of Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Experience with Hive partitioning and bucketing concepts; designed both managed and external tables in Hive to optimize performance.
  • Experience with Agile development and object modeling using UML.
  • Experience working with various file formats such as log, Avro, JSON and XML files.
  • Experience using columnar file formats such as RCFile, ORC and Parquet.
  • Good understanding of compression codecs used in Hadoop processing, such as Gzip and Snappy.
  • Good knowledge of network protocols, TCP/IP configuration and network architecture.
  • Supported various reporting teams and have experience with the data visualization tool Tableau.
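
As a brief illustration of the Spark RDD/DataFrame work summarized above, the following is a minimal Scala sketch. The input/output paths, column names and filter condition are hypothetical placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TransformationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transformation-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: JSON event records landed on HDFS.
    val events = spark.read.json("hdfs:///data/events/2019/")

    // DataFrame transformations: filter, derive a column, aggregate.
    val daily = events
      .filter($"status" === "ACTIVE")
      .withColumn("event_date", to_date($"event_ts"))
      .groupBy($"event_date", $"event_type")
      .agg(count("*").as("event_count"))

    // The same data can be handled at the RDD level when row-by-row control is needed.
    val typeCounts = events.rdd
      .map(row => (row.getAs[String]("event_type"), 1L))
      .reduceByKey(_ + _)

    daily.write.mode("overwrite").parquet("hdfs:///output/daily_event_counts/")
    typeCounts.saveAsTextFile("hdfs:///output/event_type_counts/")

    spark.stop()
  }
}
```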

TECHNICAL SKILLS:

Operating Systems: Windows, Linux, macOS

Hadoop Distribution: Cloudera (CDH3, CDH4 and CDH5), Hortonworks

Programming Languages: Scala, core Java

Database Languages: MySQL, SQL Server, CQL

Big Data/Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, HBase, Zookeeper, Sqoop, Kafka, Spark, Oozie, Cassandra, Yarn, Flume, NiFi, Scala, FTP and Hue

Cloud Technologies: AWS

Scripting Languages: Shell scripting, HTML

Development/IDE Tools and BI Tools: NetBeans, Eclipse, Visual Studio, Git, Tableau

PROFESSIONAL WORK EXPERIENCE:

Confidential, Long Island, NY

Hadoop/Spark Developer

Responsibilities:

  • Analyzed requirements and determined the system architecture needed to achieve project goals.
  • Used various Spark transformations and actions to process the input data.
  • Worked with Spark RDDs, DataFrames and Datasets to process data and generate the required result sets.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop, using the Spark Context.
  • Used Spark with Scala for interactive queries, processing of streaming data and integration with NoSQL databases for large volumes of data.
  • Developed custom Kafka producer and consumer components for real-time data processing.
  • Created Spark Streaming tasks to import live data from Kafka sources and implemented analysis models.
  • Developed shell scripts to generate CREATE statements from the data and load the data into the tables.
  • Used the Spark Streaming API to develop data models that consume data from Kafka in near real time and persist it to Cassandra (see the sketch after this list).
  • Used the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark jobs in Scala, utilizing the DataFrame and Spark SQL APIs for faster testing and processing of data.
  • Responsible for Spark Core configuration based on the type of input source.
  • Worked with the NoSQL database Cassandra, creating Cassandra tables to load large sets of semi-structured data coming from various sources.
  • Stored the analyzed results back into the Cassandra cluster.
  • Created both managed and external tables based on the requirements.
  • Developed Scala scripts using both DataFrames/Datasets/Spark SQL and RDDs in Spark for data aggregation and queries.
  • Wrote SQL queries to process data using Spark SQL.
  • Used Spark SQL to read Parquet data and create tables using the Scala API.
  • Developed data pipelines using Kafka to store data in HDFS.
  • Used various compression techniques to improve the performance and efficiency of HDFS.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
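
A minimal sketch of the Kafka-to-Cassandra streaming pattern referenced above, assuming the spark-streaming-kafka-0-10 integration and the DataStax spark-cassandra-connector are on the classpath. The broker address, topic, keyspace, table and pipe-delimited message layout are hypothetical, and production concerns such as checkpointing and offset management are omitted.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import com.datastax.spark.connector.streaming._ // DataStax spark-cassandra-connector

// Hypothetical record layout for the events persisted to Cassandra.
case class Event(id: String, eventType: String, payload: String)

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092", // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-consumers",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a hypothetical "events" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Minimal parsing: assume pipe-delimited messages of the form id|type|payload.
    val events = stream
      .map(_.value)
      .map(_.split('|'))
      .filter(_.length == 3)
      .map(f => Event(f(0), f(1), f(2)))

    // Persist each micro-batch to a hypothetical keyspace/table.
    events.saveToCassandra("analytics", "events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```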

Environment: Spark, Spark Streaming, Kafka, Spark RDD, Spark SQL, Cassandra, Oozie, AWS S3, Parquet, JSON, Scala.

Confidential, Chesterfield, MO

Hadoop Developer

Responsibilities:

  • Involved in analyzing business requirements and preparing detailed specifications that follow the project guidelines required for development.
  • Responsible for building scalable distributed data solutions using the Hadoop framework.
  • Used the Spark API over Hortonworks Hadoop to perform analytics on data.
  • Involved in importing large sets of Structured, Semi-structured and Unstructured data into Hadoop system.
  • Performed necessary transformations and aggregations to build the common learner data model in NoSQL store (HBase).
  • Developed Spark Jobs and Hive Jobs to summarize and transform data.
  • Developed an Oozie workflow to orchestrate a series of Pig scripts that remove, merge and compress files in the data preparation stage.
  • Good understanding of ETL tools and their application to Big Data environments.
  • Involved in transforming data from legacy tables to HDFS using Sqoop and storing the results in HBase.
  • Involved in collecting and aggregating large amounts of log data and staging the data in HDFS for further analysis.
  • Designed and developed real-time data streaming solutions using Apache Spark and Spark SQL, and built data pipelines to store large data sets in NoSQL databases such as HBase.
  • Used Sqoop to import data into Hive from other data systems.
  • Worked on creating scatter-and-gather patterns in NiFi, such as executing Sqoop scripts through NiFi.
  • Implemented Spark applications using Spark SQL which is responsible for creating RDDs and Data Frames of large datasets.
  • Created Hive tables as per requirements with appropriate static and dynamic partitions (see the sketch after this list).
  • Installed Oozie workflow engine to run multiple Hive jobs.
  • Knowledge of real-time data analytics using Spark (Spark Streaming, Spark SQL).
  • Involved in writing Spark applications using Scala to perform various operations according to the requirement.
  • Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and sequence files from logs.
  • Helped design scalable Big Data clusters and solutions and participated in defect meetings.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
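
A minimal sketch of the static/dynamic Hive partitioning mentioned above, written with Spark's Hive support to stay consistent with the other examples in this document. The table name, columns, partition keys and staging source are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partition inserts.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical external table partitioned by load_date and region.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS learner_events (
        learner_id STRING,
        course_id  STRING,
        score      DOUBLE
      )
      PARTITIONED BY (load_date STRING, region STRING)
      STORED AS PARQUET
      LOCATION 'hdfs:///warehouse/learner_events'
    """)

    // Static partition for load_date, dynamic partition for region;
    // staging_learner_events is a hypothetical staging table.
    spark.sql("""
      INSERT OVERWRITE TABLE learner_events
      PARTITION (load_date = '2019-06-01', region)
      SELECT learner_id, course_id, score, region
      FROM staging_learner_events
    """)

    spark.stop()
  }
}
```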

Environment: HDFS, Spark, Hive, HBase, MapReduce, Pig, Oozie, NiFi, Scala, Sqoop, HDP.

Confidential, Louisville, KY

Hadoop Developer

Responsibilities:

  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Handled importing data from relational databases such as MySQL using Sqoop, performed transformations and loaded the data to HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Ingested real-time and near-real-time streaming data into HDFS using Flume.
  • Developed MapReduce jobs via Pig to parse raw data, create intermediate data sets and transform them into customer patterns.
  • Created Hive tables, loaded data into them and wrote Hive UDFs (see the UDF sketch after this list).
  • Involved in identifying job dependencies to design workflow for Oozie and resource management for YARN.
  • Extensive knowledge of Pig scripts using bags and tuples.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Experienced in handling different types of joins in Hive such as map joins, reduce-side joins and bucket map joins.
  • Responsible for creating Technical Specification documents for the generated extracts.
  • Exported the analyzed data to relational databases using Sqoop so the BI team could generate visualizations and reports.
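
A minimal sketch of the kind of Hive UDF mentioned above, written in Scala (like the other sketches in this document) against the classic org.apache.hadoop.hive.ql.exec.UDF API. The function name and its normalization logic are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: normalizes free-text codes (trim + upper-case) before joins/aggregations.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

After packaging the class into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode', and then used like any built-in function in HiveQL.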

Environment: Hadoop MapReduce, HDFS, YARN, HDP, Hive, HBase, Java, SQL, Sqoop, Flume, Oozie.

Confidential

Java Developer

Responsibilities:

  • Responsible for understanding the requirements and involved in developing the application.
  • Implemented server-side programs by using Servlets and JSP.
  • Involved in design, development, integration testing and implementation of modules for the given requirements.
  • Designed and documented stored procedures and handled database access by implementing a Controller Servlet.
  • Implemented the back-end business logic using Core Java features including collections, generics, exception handling, reflection and Java I/O.
  • Worked on the database interaction layer for insert, update and retrieval operations on an Oracle database by writing stored procedures (see the sketch after this list).
  • Used Agile methodology for every module in the project for developing the application.
  • Worked on use case diagrams and sequence diagrams using Rational Rose during the design phase.
  • Involved in developing multi-threaded code to improve CPU utilization.
  • Used multithreading to process multiple tables for a user simultaneously.
  • Manipulated database data with SQL queries, including setting up stored procedures and triggers.
  • Wrote JUnit test cases and implemented front-end features such as web page design, data binding and single-page applications using HTML/CSS, JavaScript and jQuery.
  • Presented the logical and physical process flow to various teams using PowerPoint, Visio and Lucidchart diagrams.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
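
This role used Core Java; purely to illustrate the JDBC stored-procedure pattern described above (and to stay in the same language as the other sketches here), below is a minimal Scala sketch on the JVM. The connection URL, credentials and procedure signature are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.{Connection, DriverManager, Types}

object StoredProcedureSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle connection details.
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@//localhost:1521/ORCL", "app_user", "app_password")
    try {
      // Hypothetical procedure: updates a record and returns the affected row count.
      val stmt = conn.prepareCall("{ call update_customer_status(?, ?, ?) }")
      stmt.setLong(1, 1001L)                       // customer id (IN)
      stmt.setString(2, "ACTIVE")                  // new status (IN)
      stmt.registerOutParameter(3, Types.INTEGER)  // rows updated (OUT)
      stmt.execute()
      println(s"Rows updated: ${stmt.getInt(3)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```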

Environment: Core Java, JavaScript, HTML, CSS, AJAX, jQuery, JUnit, JIRA, Oracle DB, SQL Developer.
