Sr. Hadoop Developer Resume
New York, NY
PROFESSIONAL SUMMARY:
- Good understanding of Spark Streaming with Kafka for real-time processing.
- Around 7 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
- Experience in analysis, design, development, and integration using Big Data Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, AWS, Cloudera, Hortonworks, Impala, Avro, data processing, Java/J2EE, and SQL.
- Good knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, Hive, Spark, Scala, Spark SQL, MapReduce, Pig, Sqoop, Flume, HBase, ZooKeeper, and Oozie.
- Extensive knowledge of Hadoop technology, with experience in storage, writing queries, and processing and analysis of data.
- Experience in extending Pig and Hive functionality with custom UDFs for data analysis and file processing, running Pig Latin scripts and using Hive Query Language.
- Experience working with the Amazon AWS cloud, including services such as EC2, S3A, RDS, EBS, Elastic Beanstalk, and CloudWatch.
- Worked on data modelling using various machine learning (ML) algorithms in R and Python.
- Experienced in transferring data from different data sources into HDFS systems using Kafka.
- Experience in configuring the Hive Metastore with MySQL, which stores the metadata for Hive tables.
- Strong experience in and knowledge of real-time data analytics using Storm, Kafka, Flume, and Spark.
- Strong knowledge of using Flume for streaming data to HDFS.
- Good knowledge of job scheduling and monitoring tools such as Oozie and ZooKeeper.
- Proficient in developing web-based user interfaces using HTML5, CSS3, JavaScript, jQuery, AJAX, XML, JSON, jQuery UI, Bootstrap, AngularJS, Node.js, and Ext JS.
- Expertise in working with various databases, writing SQL queries, stored procedures, functions, and triggers using PL/SQL and SQL.
- Experience in NoSQL and column-oriented databases such as Cassandra, HBase, MongoDB, and FiloDB, and their integration with Hadoop clusters.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH 5.X) distributions and on Amazon Web Services (AWS).
- Strong experience in troubleshooting operating systems such as Linux, Red Hat, and UNIX, resolving cluster issues, and fixing Java-related bugs.
- Experience in developing Spark jobs using Scala in a test environment for faster data processing, and in using Spark SQL for querying.
- Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
- Well experienced in OOP principles (inheritance, encapsulation, polymorphism) and core Java principles (collections, multithreading, synchronization, exception handling).
- Java experience: created applications in core Java and built applications requiring database access and constant connectivity, such as client-server models, using JDBC, JSP, Spring, and Hibernate.
- Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
- Good knowledge of developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, Node.js, AJAX, jQuery, and AngularJS.
TECHNICAL SKILLS:
Programming Languages: Java, J2EE, C, SQL/PLSQL, Pig Latin, Scala, HTML, XML
Hadoop: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro.
Web Technologies: JDBC, JSP, JavaScript, AJAX, SOAP.
Scripting Languages: JavaScript, Pig Latin, Python 2.7, and Scala.
RDBMS Languages: Oracle, Microsoft SQL Server, MySQL.
NoSQL: MongoDB, HBase, Apache Cassandra, FiloDB.
SOA: Web Services (SOAP, WSDL)
IDEs: MyEclipse, Eclipse, and RAD
Operating System: Linux, Windows, UNIX, CentOS.
Methodologies: Agile, Waterfall model.
Testing Hadoop: MRUnit Testing, Quality Center, Hive Testing.
Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, JIRA, Visual Source, QC, Agile Methodology
PROFESSIONAL EXPERIENCE:
Confidential, New York, NY
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau to visualize the data sets created, and tested native Drill, Impala, and Spark connectors.
- Analysed SQL scripts and designed the solution for implementation in Scala.
- Implemented complex Hive UDFs to execute business logic with Hive queries.
- Responsible for bulk-loading large amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
- Designed and developed SSIS (ETL) packages to validate, extract, transform, and load data from the OLTP system to the data warehouse and report data mart.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Experienced in loading and transforming large sets of structured, semi-structured, and
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Responsible for developing data pipeline by implementing Kafka producers and consumers.
- Worked on the ETL scripts and fixed issues arising during data loads from various data sources.
- Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Performed data analysis with HBase using Apache Phoenix.
- Managed and reviewed Hadoop log files to resolve configuration issues.
- Developed a program to extract named entities from OCR files.
- Used Git for version control.
Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Shell Scripting, Gerrit, Java, Redis.
Confidential, Nashville, TN
Hadoop/Spark Developer
Responsibilities:
- Worked with Hadoop Ecosystem components like Cassandra, Sqoop, Flume, Oozie, Hive and Pig.
- Developed Pig and Hive UDFs in Java to extend Pig and Hive, and wrote Pig scripts for sorting, joining, filtering, and grouping data.
- Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
- Developed Hive queries to analyse the data and generate the end reports used by business users.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Developed a data pipeline using Kafka, Cassandra, and Hive to ingest, transform, and analyse customer behavioural data.
- Strong familiarity with Hive joins; used HQL for querying the databases and went on to develop complex Hive UDFs.
- Responsible for migrating iterative MapReduce programs into Spark transformations using Spark and Scala.
- Used Scala to write the code for all the use cases in Spark and Spark SQL.
- Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements. Implemented Spark batch jobs.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyse data.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
- Analysed SQL scripts and designed the solution for implementation in Scala.
- Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Worked with AWS cloud services such as EC2, S3, EMR, and RDS.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Cassandra, Sqoop, Amazon AWS, Tableau, Oozie, Cloudera, Oracle, Linux.
Confidential, WI
Hadoop/Spark Developer
Responsibilities:
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Experienced in writing Spark Applications in Scala and Python (PySpark).
- Imported Avro files using Apache Kafka and performed analytics using Spark in Scala.
- Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra.
- Experience in building real-time data pipelines with Kafka Connect and Spark Streaming.
- Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
- Developed a script that loads data into Spark DataFrames and performs in-memory computation to generate the output response.
- Used sbt to build Scala-based Spark projects and executed them using spark-submit.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Built Cassandra nodes using AWS and set up the Cassandra cluster using Ansible automation tools.
- Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS, and VPC.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
- Developed Oozie Bundles to Schedule Pig, Sqoop and Hive jobs to create data pipelines.
- Experience in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
- Developed Hive queries to analyse the data and generate the end reports used by business users.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Designed solutions for various system components using Microsoft Azure.
- Wrote a generic, extensive data quality check framework using Impala for use by the application.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
- Involved in Cassandra data modelling and building efficient data structures.
- Understanding of Kerberos authentication in Oozie workflow for Hive and Cassandra.
Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, MapReduce, Git, HDFS, Cassandra, Apache Kafka, Storm, Linux, Solr, Confluence, Jenkins.
Confidential
Java Developer
Responsibilities:
- Developed modules in Java and integrated with MySQL database.
- Responsible for coding using Java Servlets, Java Beans and XML.
- Worked with OOPS concepts such as Inheritance, Encapsulation, Abstraction and Polymorphism.
- Worked extensively with Collections, Exception Handling, and Multithreading.
- Developed web applications using Spring MVC framework.
- Involved in Analysis, Design and Development of different phases of Process Flow module.
- Designed and developed highly customized front-end screens as a Rich Internet Application (RIA) using the Sencha Ext JS framework library, JavaScript, HTML, and CSS.
- Designed graphical user interfaces using JSPs.
- Worked with various design patterns, UML, and Enterprise Application Integration.
- Implemented Action classes and Action Forms using Struts.
- Worked on the design of the entire end-to-end architecture for the Classification Web Application.
- Added dynamic functionality to the user interface using JavaScript.
- Implemented components and wireframes using cross-browser compatible JavaScript, jQuery, and AJAX.
- Experience in Programming with SQL, PL/SQL.
- Used JDBC for administering and managing users and clients.
- Implemented XSLT transformation for converting XML to HTML.
- Implemented database tables, middleware designing, client-side web programming and server-side java programming.
- Followed Scrum Agile methodology for the iterative development of the application.
- Scripted test cases based on the specifications received for each request.
- Utilized various Testing methodologies for testing application on various levels like system testing and integration.
Environment: Java, JavaScript, JSP, Java Beans, Struts, Java Servlets, jQuery, Apache Tomcat, Eclipse, AJAX, Windows, PL/SQL, JDBC, XML, CSS, HTML.
