Sr. Hadoop Developer Resume

New York, NY

PROFESSIONAL SUMMARY:

  • Good understanding of Spark Streaming with Kafka for real-time processing.
  • Around 7 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
  • Experience in analysis, design, development and integration using Big Data and Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, AWS, Cloudera, Hortonworks, Impala, Avro, data processing, Java/J2EE and SQL.
  • Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
  • Hands-on experience installing, configuring and using Hadoop ecosystem components such as HDFS, Hive, Spark, Scala, Spark SQL, MapReduce, Pig, Sqoop, Flume, HBase, ZooKeeper and Oozie.
  • Extensive knowledge of Hadoop technology, with experience in storage, writing queries, and processing and analysis of data.
  • Experience extending Pig and Hive functionality with custom UDFs for data analysis and file processing, running Pig Latin scripts and using Hive Query Language.
  • Experience working with the Amazon AWS cloud, including services such as EC2, S3, RDS, EBS, Elastic Beanstalk and CloudWatch.
  • Worked on data modelling using various machine learning (ML) algorithms via R and Python.
  • Experienced in transferring data from different data sources into HDFS using Kafka.
  • Experience configuring the Hive Metastore with MySQL, which stores the metadata for Hive tables.
  • Strong experience and knowledge of real-time data analytics using Storm, Kafka, Flume and Spark.
  • Strong knowledge of using Flume for streaming data to HDFS.
  • Good knowledge of job scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Proficient in developing web-based user interfaces using HTML5, CSS3, JavaScript, jQuery, AJAX, XML, JSON, jQuery UI, Bootstrap, AngularJS, Node.js and Ext JS.
  • Expertise in working with various databases, writing SQL queries, stored procedures, functions and triggers using PL/SQL and SQL.
  • Experience with NoSQL databases, including column-oriented stores, such as Cassandra, HBase, MongoDB and FiloDB, and their integration with Hadoop clusters.
  • Experience in installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
  • Strong experience troubleshooting operating systems such as Linux, Red Hat and UNIX, and resolving cluster issues and Java-related bugs.
  • Experience developing Spark jobs in Scala in a test environment for faster data processing, using Spark SQL for querying.
  • Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
  • Well experienced in OOP principles (inheritance, encapsulation, polymorphism) and core Java concepts (collections, multithreading, synchronization, exception handling).
  • Created applications in core Java and built database-backed client-server applications requiring constant connectivity, using JDBC, JSP, Spring and Hibernate.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
  • Good knowledge of developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, Node.js, AJAX, jQuery and AngularJS.

TECHNICAL SKILLS:

Programming Languages: Java, J2EE, C, SQL/PLSQL, Pig Latin, Scala, HTML, XML

Hadoop: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro.

Web Technologies: JDBC, JSP, JavaScript, AJAX, SOAP.

Scripting Languages: JavaScript, Pig Latin, Python 2.7 and Scala.

RDBMS Languages: Oracle, Microsoft SQL Server, MySQL.

NoSQL: MongoDB, HBase, Apache Cassandra, FiloDB.

SOA: Web Services (SOAP, WSDL)

IDEs: MyEclipse, Eclipse and RAD

Operating System: Linux, Windows, UNIX, CentOS.

Methodologies: Agile, Waterfall model.

Testing Hadoop: MRUnit testing, Quality Center, Hive testing.

Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, JIRA, Visual Source, QC, Agile Methodology

PROFESSIONAL EXPERIENCE:

Confidential, New York, NY

Sr. Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed Spark and Hive jobs to summarize and transform data.
  • Developed Spark scripts for data analysis in both Python and Scala.
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
  • Created reports in Tableau to visualize the resulting data sets, and tested native Drill, Impala and Spark connectors.
  • Analysed the SQL scripts and designed the solution for implementation in Scala.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Responsible for bulk loading large amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a proof of concept (POC).
  • Worked on Solr configuration and customization based on requirements.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Designed and developed SSIS (ETL) packages to validate, extract, transform and load data from the OLTP system to the data warehouse and report data mart.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Responsible for developing a data pipeline by implementing Kafka producers and consumers.
  • Worked on the ETL scripts and fixed issues arising during data loads from various data sources.
  • Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Performed data analysis with HBase using Apache Phoenix.
  • Managing and reviewing Hadoop Log files to resolve any configuration issues.
  • Developed a program to extract named entities from OCR files.
  • Used Git for version control.

Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Shell Scripting, Gerrit, Java, Redis.

Confidential, Nashville, TN

Hadoop/Spark Developer

Responsibilities:

  • Worked with Hadoop Ecosystem components like Cassandra, Sqoop, Flume, Oozie, Hive and Pig.
  • Developed Pig and Hive UDFs in Java to extend Pig and Hive, and wrote Pig scripts for sorting, joining, filtering and grouping the data.
  • Developed Spark programs in Scala, created Spark SQL queries and developed Oozie workflows for Spark jobs.
  • Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
  • Developed Hive queries to analyse the data and generate the end reports used by business users.
  • Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster data processing.
  • Developed Spark code with Spark SQL and Spark Streaming for faster testing and processing of data.
  • Developed a data pipeline using Kafka, Cassandra and Hive to ingest, transform and analyse customer behavioural data.
  • Worked extensively with Hive joins and used HQL to query the databases, eventually developing complex Hive UDFs.
  • Responsible for migrating iterative MapReduce programs into Spark transformations using Spark and Scala.
  • Used Scala to write the code for all the use cases in Spark and Spark SQL.
  • Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements, and implemented Spark batch jobs.
  • Worked with the Spark Core, Spark Streaming and Spark SQL modules of Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Explored various Spark modules and worked with DataFrames, RDDs and SparkContext.
  • Developed a data pipeline using Spark and Hive to ingest, transform and analyse data.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism and memory tuning.
  • Analysed the SQL scripts and designed the solution for implementation in Scala.
  • Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Worked with AWS cloud services such as EC2, S3, EMR and RDS.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Cassandra, Sqoop, Amazon AWS, Tableau, Oozie, Cloudera, Oracle, Linux.

Confidential, WI

Hadoop/Spark Developer

Responsibilities:

  • Created Kafka topics and partitions and wrote custom partitioner classes.
  • Experienced in writing Spark applications in Scala and Python (PySpark).
  • Imported Avro files using Apache Kafka and performed analytics using Spark in Scala.
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them and storing the results in Cassandra.
  • Experience in building Real-time Data Pipelines with Kafka Connect and Spark Streaming.
  • Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Developed a script that loads the data into Spark DataFrames and performs in-memory computation to generate the output response.
  • Used sbt to build Spark projects written in Scala and ran them with spark-submit.
  • Developed Spark code using Scala with Spark SQL and Spark Streaming for faster testing and processing of data.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Built Cassandra nodes on AWS and set up the Cassandra cluster using Ansible automation tools.
  • Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS and VPC.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into an RDBMS through Sqoop.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Developed Oozie bundles to schedule Pig, Sqoop and Hive jobs to create data pipelines.
  • Experience in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Developed Hive queries to analyse the data and generate the end reports used by business users.
  • Used Spark and Spark SQL to read the Parquet data and create tables in Hive using the Scala API.
  • Designed solutions for various system components using Microsoft Azure.
  • Wrote a generic, extensive data quality check framework for the application using Impala.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala, after an initial implementation in Python (PySpark).
  • Involved in Cassandra data modelling and building efficient data structures.
  • Understanding of Kerberos authentication in Oozie workflow for Hive and Cassandra.

Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, MapReduce, Git, HDFS, Cassandra, Apache Kafka, Storm, Linux, Solr, Confluence, Jenkins.

Confidential

Java Developer

Responsibilities:

  • Developed modules in Java and integrated with MySQL database.
  • Responsible for coding using Java Servlets, Java Beans and XML.
  • Worked with OOP concepts such as inheritance, encapsulation, abstraction and polymorphism.
  • Worked with Java collections, exception handling and multithreading.
  • Developed web applications using Spring MVC framework.
  • Involved in Analysis, Design and Development of different phases of Process Flow module.
  • Designed and developed highly customized front-end screens using the Sencha Ext JS framework library, JavaScript, HTML and CSS as a Rich Internet Application (RIA).
  • Designed graphical user interfaces using JSPs.
  • Worked with various design patterns, UML and Enterprise Application Integration.
  • Implemented Action classes and Action Forms using Struts.
  • Worked on the design of the entire end-to-end architecture for the Classification Web Application.
  • Added dynamic functionality to the user interface using JavaScript.
  • Implemented components and wireframes using cross-browser compatible JavaScript, jQuery and AJAX.
  • Experience in Programming with SQL, PL/SQL.
  • Used JDBC for administering and managing users and clients.
  • Implemented XSLT transformation for converting XML to HTML.
  • Implemented database tables, middleware design, client-side web programming and server-side Java programming.
  • Followed Scrum Agile methodology for the iterative development of the application.
  • Scripted test cases based on the specifications received for each request.
  • Utilized various Testing methodologies for testing application on various levels like system testing and integration.

Environment: Java, JavaScript, JSP, Java Beans, Struts, Java Servlets, jQuery, Apache Tomcat, Eclipse, AJAX, Windows, PL/SQL, JDBC, XML, CSS, HTML.
