Hadoop/Spark Developer Resume
Long Island, NY
PROFILE SUMMARY:
- Overall 7 years of IT experience, including 4 years in Big Data and Hadoop ecosystem technologies.
- Experience in designing and developing application pipelines that allow efficient exchange of data between core database engines and the Hadoop ecosystem.
- Hands-on experience with Hadoop/Big Data technologies, including storage, processing, querying and analysis of data.
- Good understanding of Hadoop and Spark architectures, with hands-on experience with Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and Hue.
- Experience in managing Hadoop clusters using the Cloudera CDH and Hortonworks HDP distributions.
- Experience in developing solutions to analyze large data sets efficiently.
- Hands-on experience in loading data from local systems to HDFS using FTP, and strong knowledge of Apache Hue.
- Experience with ETL and big data query tools such as Pig Latin and HiveQL.
- Experience in manipulating/analyzing large datasets and finding patterns and insights in structured and unstructured data.
- Expertise in writing Hadoop jobs for analyzing data using Hive and Pig.
- Experienced in integrating various data sources such as RDBMS, shell scripts, spreadsheets and text files.
- Experienced in writing complex MapReduce programs that work with file formats such as Text, SequenceFile, XML, Parquet and Avro.
- Knowledge of NoSQL databases including HBase and Cassandra.
- Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Good experience in implementing analytical procedures such as text analytics and in using the in-memory computing capabilities of Spark with Scala.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in creating RDDs, DataFrames and Datasets for the required data and performing transformations using Spark RDDs and Spark SQL.
- Good knowledge of creating data pipelines in Spark using Scala (see the sketch at the end of this summary).
- Experience in developing Spark Programs for Batch and Real-Time Processing.
- Experience in Spark Streaming to ingest data from multiple data sources into HDFS.
- Extracted real-time feeds using Kafka and Spark Streaming.
- Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.
- Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
- Experienced in designing and using CQL (Cassandra Query Language) to perform CRUD operations on Cassandra tables.
- Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa.
- Knowledge of Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Experience with Hive partitioning and bucketing concepts; designed both managed and external tables in Hive to optimize performance.
- Experience with Agile development and object modeling using UML.
- Experience working with various file formats such as log, Avro, JSON and XML files.
- Experience using columnar file formats such as RCFile, ORC and Parquet.
- Good understanding of compression techniques used in Hadoop processing, such as Gzip and Snappy.
- Good knowledge on network protocols, TCP/IP configuration and network architecture.
- Supported various reporting teams and have experience with the data visualization tool Tableau.
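A minimal Scala/Spark sketch of the DataFrame and Spark SQL work summarized above; the input path, column names and output locations are illustrative assumptions only, not taken from any specific project.

    import org.apache.spark.sql.SparkSession

    object EventSummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("EventSummary")
          .getOrCreate()

        // Read JSON event data from HDFS into a DataFrame (hypothetical path and schema).
        val events = spark.read.json("hdfs:///data/events/")

        // DataFrame API: count events per type.
        val countsByType = events.groupBy("eventType").count()

        // Spark SQL on the same data via a temporary view.
        events.createOrReplaceTempView("events")
        val daily = spark.sql(
          "SELECT eventType, to_date(eventTime) AS eventDate, COUNT(*) AS cnt " +
          "FROM events GROUP BY eventType, to_date(eventTime)")

        // Persist the summaries back to HDFS as Parquet.
        countsByType.write.mode("overwrite").parquet("hdfs:///output/counts_by_type/")
        daily.write.mode("overwrite").parquet("hdfs:///output/daily_counts/")

        spark.stop()
      }
    }

The same aggregation could be expressed with the RDD API; the DataFrame/Spark SQL form is shown because it is the style referenced most often in this summary.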
TECHNICAL SKILLS:
Operating System: Windows, Linux, MacOS
Hadoop Distribution: Cloudera (CDH3, CDH4 and CDH5), Hortonworks
Programming Languages: Scala, Core Java
Databases and Query Languages: MySQL, SQL Server, CQL
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, HBase, Zookeeper, Sqoop, Kafka, Spark, Oozie, Cassandra, Yarn, Flume, NiFi, Scala, FTP and Hue
Cloud Technologies: AWS
Scripting Languages: Shell scripting, HTML
Development/IDE and BI Tools: NetBeans, Eclipse, Visual Studio, Git, Tableau
PROFESSIONAL WORK EXPERIENCE:
Confidential, Long Island, NY
Hadoop/Spark Developer
Responsibilities:
- Analyzed requirements and determined the system architecture to achieve project goals.
- Used various Spark Transformations and Actions for processing the input data.
- Worked with Spark RDDs, DataFrames and Datasets to process data and generate the required result sets.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context.
- Used Spark with Scala for interactive queries, processing of streaming data and integration with popular NoSQL databases for high volumes of data.
- Developed custom Kafka producer and consumer components for real-time data processing.
- Created Spark Streaming tasks to ingest live data from Kafka sources and implemented analysis models.
- Developed shell scripts to generate CREATE statements from the data and load the data into the tables.
- Used Spark Streaming APIs to develop data models that consume data from Kafka in near real time and persist it to Cassandra (see the sketch after this list).
- Used the Spark application master to monitor Spark jobs and capture their logs.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Responsible for Spark Core configuration based on the type of input source.
- Worked with the NoSQL database Cassandra, creating tables to load large sets of semi-structured data coming from various sources.
- Stored the analyzed results back into the Cassandra cluster.
- Created both managed and external tables based on the requirements.
- Implemented Spark using Scala for faster testing and processing of data.
- Developed Scala scripts using DataFrames/Spark SQL/Datasets as well as RDDs in Spark for data aggregation and queries.
- Wrote SQL queries to process data using Spark SQL.
- Used Spark SQL to read Parquet data and create tables using the Scala API.
- Developed data pipelines using Kafka to store data in HDFS.
- Used various compression techniques to improve the performance and efficiency of HDFS.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
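A minimal sketch of the Kafka-to-Cassandra flow described above, written with Spark Structured Streaming's Kafka source (the original work may have used the DStream-based Spark Streaming API). The broker address, topic, message schema, keyspace and table are assumptions for illustration, and writing the micro-batches to Cassandra assumes the DataStax spark-cassandra-connector is on the classpath.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object KafkaToCassandra {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("KafkaToCassandra").getOrCreate()
        import spark.implicits._

        // Hypothetical message schema; the real feed layout is not specified here.
        val schema = new StructType()
          .add("userId", StringType)
          .add("eventType", StringType)
          .add("eventTime", TimestampType)

        // Read the live feed from Kafka (broker and topic are placeholders).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()

        // Kafka values arrive as bytes; cast to string and parse into columns.
        val events = raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json($"json", schema).as("e"))
          .select("e.*")

        // Write each micro-batch to Cassandra (keyspace/table assumed to exist).
        val writeBatch: (DataFrame, Long) => Unit = (batch, _) =>
          batch.write
            .format("org.apache.spark.sql.cassandra")
            .options(Map("keyspace" -> "analytics", "table" -> "events"))
            .mode("append")
            .save()

        val query = events.writeStream
          .option("checkpointLocation", "hdfs:///checkpoints/events/")
          .foreachBatch(writeBatch)
          .start()

        query.awaitTermination()
      }
    }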
Environment: Spark, Spark Streaming, Kafka, Spark RDD, Spark SQL, Cassandra, Oozie, AWS S3, Parquet, JSON, Scala.
Confidential, Chesterfield, MO
Hadoop Developer
Responsibilities:
- Involved in analyzing business requirements and preparing detailed specifications that follow the project guidelines required for development.
- Responsible for building scalable distributed data solutions using the Hadoop framework.
- Used the Spark API over Hortonworks Hadoop to perform analytics on data.
- Involved in importing large sets of structured, semi-structured and unstructured data into the Hadoop system.
- Performed necessary transformations and aggregations to build the common learner data model in NoSQL store (HBase).
- Developed Spark Jobs and Hive Jobs to summarize and transform data.
- Developed Oozie workflows to orchestrate a series of Pig scripts that remove, merge and compress files in the data preparation stage.
- Good understanding of ETL tools and their application to Big Data environment.
- Involved in transforming data from legacy tables to HDFS using Sqoop and storing the results in HBase.
- Involved in collecting and aggregating large amounts of log data and staging the data in HDFS for further analysis.
- Designed and developed real-time data streaming solutions using Apache Spark and Spark SQL, and built data pipelines to store large data sets in NoSQL databases such as HBase.
- Used Sqoop to import data into Hive from other data systems.
- Worked on creating scatter-gather patterns in NiFi, such as executing Sqoop scripts through NiFi.
- Implemented Spark applications using Spark SQL which is responsible for creating RDDs and Data Frames of large datasets.
- Created Hive tables as per requirements with appropriate static and dynamic partitions (see the sketch after this list).
- Installed Oozie workflow engine to run multiple Hive jobs.
- Knowledge of real-time data analytics using Spark (Spark Streaming, Spark SQL).
- Involved in writing Spark applications using Scala to perform various operations according to the requirement.
- Wrote shell scripts for exporting log files to the Hadoop cluster through an automated process.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and sequence files from logs.
- Helped design scalable Big Data clusters and solutions, and participated in defect meetings.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
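A minimal Scala sketch of the partitioned Hive table work mentioned above, using Spark with Hive support. The table names, column names and the ORC storage choice are assumptions for illustration only.

    import org.apache.spark.sql.SparkSession

    object LearnerSummaryLoad {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark create and populate Hive tables directly.
        val spark = SparkSession.builder()
          .appName("LearnerSummaryLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Settings required for dynamic partition inserts.
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Hypothetical partitioned table.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS learner_summary (
            |  learner_id STRING,
            |  total_score DOUBLE
            |) PARTITIONED BY (load_date STRING)
            |STORED AS ORC""".stripMargin)

        // Summarize the staged data; the last SELECT column feeds the
        // load_date partition (dynamic partitioning).
        spark.sql(
          """INSERT INTO TABLE learner_summary PARTITION (load_date)
            |SELECT learner_id, SUM(score) AS total_score, event_date AS load_date
            |FROM staged_learner_events
            |GROUP BY learner_id, event_date""".stripMargin)

        spark.stop()
      }
    }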
Environment: HDFS, Spark, Hive, HBase, MapReduce, Pig, Oozie, NiFi, Scala, Sqoop, HDP.
Confidential, Louisville, KY
Hadoop Developer
Responsibilities:
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Handled importing data from relational databases such as MySQL using Sqoop, performed transformations and loaded the results to HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Ingested real-time and near-real-time streaming data into HDFS using Flume.
- Developed MapReduce programs using Pig to parse the raw data, create intermediate datasets and transform them into customer patterns.
- Created Hive tables, loaded data and wrote Hive UDFs (see the sketch after this list).
- Involved in identifying job dependencies to design workflow for Oozie and resource management for YARN.
- Extensive knowledge of Pig scripts using bags and tuples.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Experienced in handling different types of joins in Hive, such as map joins, reduce-side joins and bucket map joins.
- Responsible for creating Technical Specification documents for the generated extracts.
- Exported the analyzed data to relational databases using Sqoop so the BI team could generate visualizations and reports.
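A minimal sketch of the kind of Hive UDF mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name and the normalization logic are hypothetical and only illustrate the pattern.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative UDF that normalizes free-text fields to trimmed lower case.
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
      }
    }

Once packaged into a jar, such a class is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then used in queries like any built-in function.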
Environment: Hadoop MapReduce, HDFS, YARN, HDP, Hive, HBase, Java, SQL, Sqoop, Flume, Oozie.
Confidential
Java Developer
Responsibilities:
- Responsible for understanding the requirements and involved in developing the application.
- Implemented server-side programs by using Servlets and JSP.
- Involved in design, development, integration testing and implementation of modules for the given requirements.
- Designed and documented the stored procedures and handled the database access by implementing Controller Servlet.
- Implemented the back-end business logic using Core Java technologies including Collections, Generics, exception handling, Reflection and Java I/O.
- Worked on the database interaction layer for insert, update and retrieval operations against the Oracle database by writing stored procedures.
- Used Agile methodology for developing every module of the application.
- Worked on use case diagrams and sequence diagrams using Rational Rose during the design phase.
- Developed multi-threaded processing to improve CPU utilization, using threads to process multiple tables simultaneously (see the sketch after this list).
- Manipulated database data with SQL queries, including setting up stored procedures and triggers.
- Wrote JUnit test cases and implemented web page designs, data binding and single-page applications using HTML/CSS, JavaScript and jQuery.
- Presented the logical and physical process flows to various teams using PowerPoint, Visio and Lucidchart diagrams.
- Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
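A minimal sketch of the multi-threaded table processing described above, using the java.util.concurrent thread-pool API (shown in Scala for consistency with the other sketches here; the original work was in Core Java). The table names and the processTable body are hypothetical placeholders.

    import java.util.concurrent.{Executors, TimeUnit}

    object TableProcessor {
      // Hypothetical per-table work unit; the real logic updated Oracle tables.
      def processTable(table: String): Unit =
        println(s"processing $table")

      def main(args: Array[String]): Unit = {
        val tables = Seq("customers", "orders", "payments") // illustrative names
        val pool = Executors.newFixedThreadPool(4)

        // Submit one task per table so tables are processed concurrently.
        tables.foreach { t =>
          pool.submit(new Runnable {
            override def run(): Unit = processTable(t)
          })
        }

        // Wait for all submitted work to finish before exiting.
        pool.shutdown()
        pool.awaitTermination(30, TimeUnit.MINUTES)
      }
    }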
Environment: Core Java, JavaScript, HTML, CSS, AJAX, jQuery, JUnit, JIRA, Oracle DB, SQL Developer.