Hadoop/Spark Developer Resume
Long Island, NY
PROFILE SUMMARY:
- Overall 7 years of IT experience, including 4 years in Big Data and Hadoop ecosystem technologies.
- Experience in designing and developing application pipelines that allow efficient exchange of data between core database engines and the Hadoop ecosystem.
- Hands-on experience with Hadoop/Big Data technologies, including storage, processing, querying and analysis of data.
- Good understanding of Hadoop and Spark architectures, with hands-on experience with Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and Hue.
- Experience in managing Hadoop clusters using the Cloudera CDH and Hortonworks HDP distributions.
- Experience in developing solutions to analyze large data sets efficiently.
- Hands-on experience in loading data from local systems to HDFS using FTP, and strong knowledge of Apache Hue.
- Experience with ETL and big data query tools such as Pig Latin and HiveQL.
- Experience in manipulating/analyzing large datasets and finding patterns and insights in structured and unstructured data.
- Expertise in writing Hadoop jobs for analyzing data using Hive and Pig.
- Experienced in integrating various data sources such as RDBMS, shell scripts, spreadsheets and text files.
- Experienced in writing complex MapReduce programs that work with file formats such as Text, SequenceFile, XML, Parquet and Avro.
- Knowledge of NoSQL databases including HBase and Cassandra.
- Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Good experience in implementing analytical procedures such as text analytics and in using the in-memory computing capabilities of Spark with Scala.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in creating RDDs, DataFrames and Datasets for the required data and performing transformations using Spark RDDs and Spark SQL.
- Good knowledge of creating data pipelines in Spark using Scala (see the sketch at the end of this summary).
- Experience in developing Spark Programs for Batch and Real-Time Processing.
- Experience in Spark Streaming to ingest data from multiple data sources into HDFS.
- Extracted real-time feeds using Kafka and Spark Streaming.
- Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.
- Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
- Experienced in designing and using CQL (Cassandra Query Language) to perform CRUD operations on Cassandra tables.
- Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa.
- Knowledge of Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Experience with Hive partitioning and bucketing concepts; designed both managed and external tables in Hive to optimize performance.
- Experience with Agile development and object modeling using UML.
- Experience working with various file formats such as log, Avro, JSON and XML files.
- Experience using columnar file formats such as RCFile, ORC and Parquet.
- Good understanding of compression techniques used in Hadoop processing, such as Gzip and Snappy.
- Good knowledge on network protocols, TCP/IP configuration and network architecture.
- Supported various reporting teams and have experience with the data visualization tool Tableau.
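A minimal Scala/Spark sketch of the DataFrame and Spark SQL work summarized above; the input path, column names and output locations are illustrative assumptions only, not taken from any specific project.

    import org.apache.spark.sql.SparkSession

    object EventSummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("EventSummary")
          .getOrCreate()

        // Read JSON event data from HDFS into a DataFrame (hypothetical path and schema).
        val events = spark.read.json("hdfs:///data/events/")

        // DataFrame API: count events per type.
        val countsByType = events.groupBy("eventType").count()

        // Spark SQL on the same data via a temporary view.
        events.createOrReplaceTempView("events")
        val daily = spark.sql(
          "SELECT eventType, to_date(eventTime) AS eventDate, COUNT(*) AS cnt " +
          "FROM events GROUP BY eventType, to_date(eventTime)")

        // Persist the summaries back to HDFS as Parquet.
        countsByType.write.mode("overwrite").parquet("hdfs:///output/counts_by_type/")
        daily.write.mode("overwrite").parquet("hdfs:///output/daily_counts/")

        spark.stop()
      }
    }

The same aggregation could be expressed with the RDD API; the DataFrame/Spark SQL form is shown because it is the style referenced most often in this summary.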
TECHNICAL SKILLS:
Operating System: Windows, Linux, MacOS
Hadoop Distribution: Cloudera (CDH3, CDH4 and CDH5), Hortonworks
Programming Languages: Scala, Core Java
Databases and Query Languages: MySQL, SQL Server, CQL
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, HBase, Zookeeper, Sqoop, Kafka, Spark, Oozie, Cassandra, Yarn, Flume, NiFi, Scala, FTP and Hue
Cloud Technologies: AWS
Scripting Languages: Shell scripting, HTML
Development/IDE and BI Tools: NetBeans, Eclipse, Visual Studio, Git, Tableau
PROFESSIONAL WORK EXPERIENCE:
Confidential, Long Island, NY
Hadoop/Spark Developer
Responsibilities:
- Analyzed requirements and determined the system architecture to achieve project goals.
- Used various Spark Transformations and Actions for processing the input data.
- Worked with Spark RDDs, DataFrames and Datasets to process data and generate the required result sets.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context.
- Used Spark with Scala for interactive queries, processing of streaming data and integration with popular NoSQL databases for high volumes of data.
- Developed custom Kafka producer and consumer components for real-time data processing.
- Created Spark Streaming tasks to ingest live data from Kafka sources and implemented analysis models.
- Developed shell scripts to generate CREATE statements from the data and load the data into the tables.
- Used Spark Streaming APIs to develop data models that consume data from Kafka in near real time and persist it to Cassandra (see the sketch after this list).
- Used the Spark application master to monitor Spark jobs and capture their logs.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Responsible for Spark Core configuration based on the type of input source.
- Worked with the NoSQL database Cassandra, creating tables to load large sets of semi-structured data coming from various sources.
- Stored the analyzed results back into the Cassandra cluster.
- Created both managed and external tables based on the requirements.
- Implemented Spark using Scala for faster testing and processing of data.
- Developed Scala scripts using DataFrames/Spark SQL/Datasets as well as RDDs in Spark for data aggregation and queries.
- Wrote SQL queries to process data using Spark SQL.
- Used Spark SQL to read Parquet data and create tables using the Scala API.
- Developed data pipelines using Kafka to store data in HDFS.
- Used various compression techniques to improve the performance and efficiency of HDFS.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
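A minimal sketch of the Kafka-to-Cassandra flow described above, written with Spark Structured Streaming's Kafka source (the original work may have used the DStream-based Spark Streaming API). The broker address, topic, message schema, keyspace and table are assumptions for illustration, and writing the micro-batches to Cassandra assumes the DataStax spark-cassandra-connector is on the classpath.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object KafkaToCassandra {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("KafkaToCassandra").getOrCreate()
        import spark.implicits._

        // Hypothetical message schema; the real feed layout is not specified here.
        val schema = new StructType()
          .add("userId", StringType)
          .add("eventType", StringType)
          .add("eventTime", TimestampType)

        // Read the live feed from Kafka (broker and topic are placeholders).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()

        // Kafka values arrive as bytes; cast to string and parse into columns.
        val events = raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json($"json", schema).as("e"))
          .select("e.*")

        // Write each micro-batch to Cassandra (keyspace/table assumed to exist).
        val writeBatch: (DataFrame, Long) => Unit = (batch, _) =>
          batch.write
            .format("org.apache.spark.sql.cassandra")
            .options(Map("keyspace" -> "analytics", "table" -> "events"))
            .mode("append")
            .save()

        val query = events.writeStream
          .option("checkpointLocation", "hdfs:///checkpoints/events/")
          .foreachBatch(writeBatch)
          .start()

        query.awaitTermination()
      }
    }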
Environment: Spark, Spark Streaming, Kafka, Spark RDD, Spark SQL, Cassandra, Oozie, AWS S3, Parquet, JSON, Scala.
Confidential, Chesterfield, MO
Hadoop Developer
Responsibilities:
- Involved in analyzing business requirements and preparing detailed specifications that follow the project guidelines required for development.
- Responsible for building scalable distributed data solutions using the Hadoop framework.
- Used the Spark API over Hortonworks Hadoop to perform analytics on data.
- Involved in importing large sets of structured, semi-structured and unstructured data into the Hadoop system.
- Performed necessary transformations and aggregations to build the common learner data model in NoSQL store (HBase).
- Developed Spark Jobs and Hive Jobs to summarize and transform data.
- Developed Oozie workflows to orchestrate a series of Pig scripts that remove, merge and compress files in the data preparation stage.
- Good understanding of ETL tools and their application to Big Data environment.
- Involved in transforming data from legacy tables to HDFS using Sqoop and storing the results in HBase.
- Involved in collecting and aggregating large amounts of log data and staging the data in HDFS for further analysis.
- Designed and developed real-time data streaming solutions using Apache Spark and Spark SQL, and built data pipelines to store large data sets in NoSQL databases such as HBase.
- Used Sqoop to import data into Hive from other data systems.
- Worked on creating scatter-gather patterns in NiFi, such as executing Sqoop scripts through NiFi.
- Implemented Spark applications using Spark SQL which is responsible for creating RDDs and Data Frames of large datasets.
- Created Hive tables as per requirements with appropriate static and dynamic partitions (see the sketch after this list).
- Installed Oozie workflow engine to run multiple Hive jobs.
- Knowledge of real-time data analytics using Spark (Spark Streaming, Spark SQL).
- Involved in writing Spark applications using Scala to perform various operations according to the requirement.
- Wrote shell scripts for exporting log files to the Hadoop cluster through an automated process.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and sequence files from logs.
- Helped design scalable Big Data clusters and solutions, and participated in defect meetings.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
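A minimal Scala sketch of the partitioned Hive table work mentioned above, using Spark with Hive support. The table names, column names and the ORC storage choice are assumptions for illustration only.

    import org.apache.spark.sql.SparkSession

    object LearnerSummaryLoad {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark create and populate Hive tables directly.
        val spark = SparkSession.builder()
          .appName("LearnerSummaryLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Settings required for dynamic partition inserts.
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Hypothetical partitioned table.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS learner_summary (
            |  learner_id STRING,
            |  total_score DOUBLE
            |) PARTITIONED BY (load_date STRING)
            |STORED AS ORC""".stripMargin)

        // Summarize the staged data; the last SELECT column feeds the
        // load_date partition (dynamic partitioning).
        spark.sql(
          """INSERT INTO TABLE learner_summary PARTITION (load_date)
            |SELECT learner_id, SUM(score) AS total_score, event_date AS load_date
            |FROM staged_learner_events
            |GROUP BY learner_id, event_date""".stripMargin)

        spark.stop()
      }
    }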
Environment: HDFS, Spark, Hive, HBase, MapReduce, Pig, Oozie, NiFi, Scala, Sqoop, HDP.
Confidential, Louisville, KY
Hadoop Developer
Responsibilities:
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Handled importing data from relational databases such as MySQL using Sqoop, performed transformations and loaded the results to HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Ingested real-time and near-real-time streaming data into HDFS using Flume.
- Developed MapReduce programs using Pig to parse the raw data, create intermediate datasets and transform them into customer patterns.
- Created Hive tables, loaded data and wrote Hive UDFs (see the sketch after this list).
- Involved in identifying job dependencies to design workflow for Oozie and resource management for YARN.
- Extensive knowledge of Pig scripts using bags and tuples.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Experienced in handling different types of joins in Hive, such as map joins, reduce-side joins and bucket map joins.
- Responsible for creating Technical Specification documents for the generated extracts.
- Exported the analyzed data to relational databases using Sqoop so the BI team could generate visualizations and reports.
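A minimal sketch of the kind of Hive UDF mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name and the normalization logic are hypothetical and only illustrate the pattern.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative UDF that normalizes free-text fields to trimmed lower case.
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
      }
    }

Once packaged into a jar, such a class is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then used in queries like any built-in function.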
Environment: Hadoop MapReduce, HDFS, YARN, HDP, Hive, HBase, Java, SQL, Sqoop, Flume, Oozie.
Confidential
Java Developer
Responsibilities:
- Responsible for understanding the requirements and involved in developing the application.
- Implemented server-side programs by using Servlets and JSP.
- Involved in design, development, integration testing and implementation of modules for the given requirements.
- Designed and documented the stored procedures and handled the database access by implementing Controller Servlet.
- Implemented the back-end business logic using Core Java technologies including Collections, Generics, exception handling, Reflection and Java I/O.
- Worked on the database interaction layer for insert, update and retrieval operations against the Oracle database by writing stored procedures.
- Used Agile methodology for developing every module of the application.
- Worked on use case diagrams and sequence diagrams using Rational Rose during the design phase.
- Developed multi-threaded processing to improve CPU utilization, using threads to process multiple tables simultaneously (see the sketch after this list).
- Manipulated database data with SQL queries, including setting up stored procedures and triggers.
- Wrote JUnit test cases and implemented web page designs, data binding and single-page applications using HTML/CSS, JavaScript and jQuery.
- Presented the logical and physical process flows to various teams using PowerPoint, Visio and Lucidchart diagrams.
- Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
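A minimal sketch of the multi-threaded table processing described above, using the java.util.concurrent thread-pool API (shown in Scala for consistency with the other sketches here; the original work was in Core Java). The table names and the processTable body are hypothetical placeholders.

    import java.util.concurrent.{Executors, TimeUnit}

    object TableProcessor {
      // Hypothetical per-table work unit; the real logic updated Oracle tables.
      def processTable(table: String): Unit =
        println(s"processing $table")

      def main(args: Array[String]): Unit = {
        val tables = Seq("customers", "orders", "payments") // illustrative names
        val pool = Executors.newFixedThreadPool(4)

        // Submit one task per table so tables are processed concurrently.
        tables.foreach { t =>
          pool.submit(new Runnable {
            override def run(): Unit = processTable(t)
          })
        }

        // Wait for all submitted work to finish before exiting.
        pool.shutdown()
        pool.awaitTermination(30, TimeUnit.MINUTES)
      }
    }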
Environment: Core Java, JavaScript, HTML, CSS, AJAX, jQuery, JUnit, JIRA, Oracle DB, SQL Developer.