
Sr. Hadoop Developer Resume


New Hartford, NY

PROFESSIONAL SUMMARY:

  • 8+ years of professional experience spanning the analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
  • Hands-on experience using various Hadoop distributions (Apache, Hortonworks, Cloudera, MapR).
  • Experience working with Amazon EMR, Cloudera (CDH3, CDH4 and CDH5) and Hortonworks Hadoop distributions.
  • Expertise in Hadoop ecosystem tools including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Kafka, Spark, ZooKeeper and Oozie.
  • Good knowledge of EMR (Elastic MapReduce) for performing big data operations in AWS.
  • Knowledge of working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Excellent understanding of Spark and its benefits in Big Data Analytics.
  • Hands-on experience with stream processing frameworks such as Storm and Spark Streaming.
  • Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Hands-on experience using Scala, Spark Streaming and batch processing to process streaming and batch data.
  • Implemented advanced procedures such as text analytics using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Experience in data analysis using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka (see the sketch after this list).
  • Experienced with Hadoop big data technologies (HDFS and MapReduce programs), Hadoop ecosystem tools (HBase, Hive, Pig) and the NoSQL database MongoDB.
  • Experience querying and analyzing data in Cassandra for quick searching, sorting and grouping using CQL.
  • Scraped and analyzed data using Machine Learning algorithms in Python and SQL.
  • Experience using the DataStax Spark Cassandra Connector to load data to and from Cassandra.
  • Experience writing applications against NoSQL databases such as HBase, Cassandra and MongoDB.
  • Extensive experience importing and exporting data using Flume and Kafka.
  • Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
  • Expertise in loading data from different sources such as Teradata and DB2 into HDFS using Sqoop and loading it into partitioned Hive tables.
  • Experience developing data pipelines that use Kafka to store data in HDFS.
  • Experience migrating data with Sqoop from HDFS to relational database systems and vice versa, according to client requirements.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Worked with NiFi to manage the flow of data from sources to HDFS.
  • Good experience with source control repositories such as CVS, Git and SVN.
  • Experience working with scripting technologies such as Python and UNIX shell scripts.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
  • Experience working with the Spring and Hibernate frameworks in Java.
  • Experience developing web page interfaces using HTML and JSP, and desktop interfaces using Java Swing.
  • Used Spring Core annotations for dependency injection (Spring DI), Spring MVC for REST APIs, and Spring Boot for microservices.
  • Good understanding and working experience on Cloud based architectures.
  • Experience handling various file formats such as Avro, Parquet and SequenceFile.
  • Expert implementation knowledge of enterprise, web and client-server applications using Java/J2EE.
  • Expertise in Oracle ORMB and stored procedure concepts.
  • Good understanding and experience with Software Development methodologies like Agile and Waterfall and performed Testing such as Unit, Regression, White-box and Black-box.
  • Ability to work with onsite and offshore team members.
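
As an illustration of the Kafka-to-HBase streaming pattern referenced above, below is a minimal Java sketch using the Spark Streaming Kafka 0-10 integration. The broker address, consumer group, topic name and the HBase write step are hypothetical placeholders, not details from any actual engagement.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public final class Db2ChangeStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("db2-to-hbase");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "db2-change-consumer");     // hypothetical group id

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("db2.changes"), kafkaParams)); // hypothetical topic

            // For each micro-batch, build HBase mutations from the records (write step omitted).
            stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
                while (records.hasNext()) {
                    ConsumerRecord<String, String> record = records.next();
                    // e.g. construct a Put from record.value() and send it via a BufferedMutator
                }
            }));

            jssc.start();
            jssc.awaitTermination();
        }
    }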

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, HBase, Cassandra, MongoDB, Spark, Solr, Ambari, Hue, Avro, Mahout, Impala, Oozie, NiFi and ZooKeeper

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Databases: MySQL, Oracle 10g/11g, PL/SQL, MS SQL Server 2012

NoSQL Databases: HBase, Cassandra and MongoDB

Programming Languages: C, C++, Java, JavaScript, Python, Scala

Frameworks: Struts, Spring, Hibernate, Spring Boot, Microservices

Operating System: Windows 7/8/10, Vista, Ubuntu, Linux, UNIX, Mac OS

Cloud Platforms: AWS Cloud, Google Cloud

Application Servers: WebLogic, WebSphere, Tomcat

Architecture: Client-Server Architecture, Relational DBMS, OLAP, OLTP

Testing: Selenium WebDriver, JUnit

Modelling Tools: Visual Paradigm for UML, Rational Rose, StarUML

ETL/BI Tools: Talend, Informatica, Tableau

IDE Tools: NetBeans, Eclipse, IntelliJ IDEA, Visual Studio Code

Build Tools: Maven, Jenkins

Development Methodologies: Waterfall, Agile/Scrum

PROFESSIONAL EXPERIENCE:

Confidential, New Hartford, NY

Sr. Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Developed Spark and Hive jobs to summarize and transform data.
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
  • Created reports in Tableau for visualization of the data sets created, and tested native Drill, Impala and Spark connectors.
  • Analyzed the SQL scripts and designed the solution to implement them using Scala.
  • Implemented complex Hive UDFs to execute business logic within Hive queries (a minimal sketch follows this list).
  • Responsible for bulk-loading data into HBase with MapReduce by directly creating HFiles and loading them.
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
  • Worked on Solr configuration and customizations based on requirements.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis.
  • Responsible for developing data pipelines by implementing Kafka producers and consumers.
  • Performed data analysis with HBase using Apache Phoenix.
  • Exported the analyzed data to Impala to generate reports for the BI team.
  • Developed multiple PySpark jobs for data cleaning and pre-processing.
  • Managed and reviewed Hadoop log files to resolve configuration issues.
  • Developed a program to extract named entities from OCR files.
  • Used Gradle for building and testing the project.
  • Fixed defects as needed during the QA phase, supported QA testing, and troubleshot defects to identify their source.
  • Used Mingle and later moved to JIRA for task/bug tracking.
  • Used GIT for version control.
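
As a companion to the Hive UDF work noted above, here is a minimal sketch of a Hive UDF in Java; the class name and the normalization rule are illustrative placeholders, not the actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Example UDF; registered in Hive with:
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'com.example.NormalizeCode';
    public final class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // Hive passes NULLs through
            }
            // Illustrative rule: trim whitespace and upper-case the code.
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once registered, it would be callable from HiveQL like any built-in, e.g. SELECT normalize_code(code) FROM some_table.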

Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis.

Confidential, Kenilworth, NJ

Hadoop Developer

Responsibilities:

  • Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
  • Extensively used the Spark stack to develop pre-processing jobs, using the RDD, Dataset and DataFrame APIs to transform data for upstream consumption (see the sketch after this list).
  • Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from various streaming sources such as Kafka, Flume and JMS.
  • Worked on extracting and enriching HBase data across multiple tables using joins in Spark.
  • Worked on writing APIs to load the processed data to HBase tables.
  • Replaced the existing MapReduce programs with Spark applications written in Scala.
  • Built on-premises data pipelines using Kafka and Spark Streaming, consuming the feed from the API streaming gateway REST service.
  • Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
  • Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
  • Good knowledge of the Kafka Streams API for data transformation.
  • Implemented a logging framework, the ELK stack (Elasticsearch, Logstash and Kibana), on AWS.
  • Set up Spark on EMR to process large volumes of data stored in Amazon S3.
  • Developed Oozie workflow for scheduling & orchestrating the ETL process.
  • Used Talend tool to create workflows for processing data from multiple source systems.
  • Created sample flows in Talend and StreamSets with custom-coded JARs and analyzed the performance of StreamSets and Kafka Streams.
  • Developed Hive queries to analyze the data in HDFS to identify issues and behavioral patterns.
  • Involved in writing optimized Pig Script along with developing and testing Pig Latin Scripts.
  • Used Python's pandas and NumPy modules for data analysis, data scraping and parsing.
  • Deployed applications using the Jenkins framework, integrating Git version control with it.
  • Participated in production support on a regular basis to support the analytics platform.
  • Used Rally for task/bug tracking.
  • Used GIT for version control.
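
A minimal sketch of the kind of DataFrame pre-processing job described above, written against Spark's Java API; the HDFS paths and column names are assumptions for illustration only.

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.to_date;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class PreprocessEvents {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("preprocess-events").getOrCreate();

            // Hypothetical raw feed landed in HDFS by the ingestion pipeline.
            Dataset<Row> raw = spark.read().json("hdfs:///data/raw/events");

            Dataset<Row> curated = raw
                .filter(col("eventId").isNotNull())                   // drop incomplete records
                .dropDuplicates(new String[] {"eventId"})             // de-duplicate replayed events
                .withColumn("eventDate", to_date(col("eventTs")));    // derive a partition column

            curated.write()
                .mode("overwrite")
                .partitionBy("eventDate")
                .parquet("hdfs:///data/curated/events");

            spark.stop();
        }
    }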

Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.

Confidential, Dublin, OH

Hadoop Developer

Responsibilities:

  • Loaded data from different sources (Teradata, DB2, Oracle and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Created various Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project's business flow.
  • Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
  • Created several Hive UDFs to hide or abstract complex, repetitive rules.
  • Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
  • Involved in end-to-end implementation of ETL logic.
  • Reviewed ETL application use cases before onboarding them to Hadoop.
  • Developed Bash scripts to bring log files from an FTP server and process them for loading into Hive tables.
  • Scheduled all the Bash scripts using the Resource Manager scheduler.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Developed MapReduce programs to apply business rules to the data.
  • Implemented Apache Kafka as a replacement for a more traditional message broker (Confidential) to reduce licensing costs, decouple processing from data producers and buffer unprocessed messages.
  • Created HBase tables and column families to store the user event data.
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools.
  • Implemented a receiver-based approach in Spark Streaming, linking with the StreamingContext through the Java API and handling proper closing and waiting for stages (see the sketch after this list).
  • Maintained the authentication module to support Kerberos.
  • Implemented rack topology scripts for the Hadoop cluster.
  • Resolved issues related to the old Hazelcast EntryProcessor API.
  • Participated with the admin team in designing and upgrading the cluster from CDH3 to CDH4.
  • Developed helper classes to abstract the Cassandra cluster connection, acting as a core toolkit.
  • Enhanced existing modules written in Python.
  • Used dashboard tools like Tableau.
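
A minimal Java sketch of the receiver-based StreamingContext pattern with graceful shutdown mentioned above; the host, port and per-record processing are placeholders, not the actual feed.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public final class ReceiverStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("receiver-stream");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // Receiver-based source; a socket stream stands in for the real feed here.
            JavaReceiverInputDStream<String> lines = jssc.socketTextStream("feed-host", 9999);
            lines.filter(line -> !line.isEmpty())
                 .foreachRDD(rdd -> rdd.foreach(line -> {
                     // apply business rules per record (placeholder)
                 }));

            // Stop gracefully so in-flight batches finish before the context closes.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> jssc.stop(true, true)));

            jssc.start();
            jssc.awaitTermination();
        }
    }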

Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Oracle 10g, Maven, open-source technologies (Apache Kafka, Apache Spark), ETL, Hazelcast, Git, Mockito, Python.

Confidential

Java Developer

Responsibilities:

  • Involved in Analysis, Design, Development and Testing of the application.
  • Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
  • Enhanced the Port search functionality by adding a VPN Extension Tab.
  • Created end to end functionality for view and edit of VPN Extension details.
  • Used the Agile process to develop the application, as it allows faster development compared to RUP.
  • Designed UI screens using JSP, jQuery, Ajax and HTML.
  • Used Hibernate as the persistence framework.
  • Used Struts MVC framework and WebLogic Application Server in this application.
  • Involved in creating DAOs and used Hibernate for ORM mapping (a minimal sketch follows this list).
  • Built REST API end-points for various concepts.
  • Wrote procedures and triggers to validate the consistency of metadata.
  • Wrote SQL code blocks using cursors to shift records between tables based on checks.
  • Wrote Java classes to test the UI and web services through JUnit and JWebUnit.
  • Performed functional and integration testing.
  • Extensively involved in critical release and deployment activities.
  • Tested the entire application using JUnit and JWebUnit.
  • Used Log4J to log both user-interface and domain-level messages.
  • Used Perforce for version control.
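
A minimal sketch of the DAO pattern with Hibernate ORM mapping referenced above; the User entity is a hypothetical mapped class, and transaction handling is reduced to the essentials.

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;

    // Hypothetical DAO wrapping Hibernate session handling for a mapped User entity.
    public class UserDao {
        private final SessionFactory sessionFactory;

        public UserDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public User findById(Long id) {
            Session session = sessionFactory.openSession();
            try {
                // get() returns null when no row exists, unlike load().
                return (User) session.get(User.class, id);
            } finally {
                session.close();
            }
        }

        public void save(User user) {
            Session session = sessionFactory.openSession();
            try {
                session.beginTransaction();
                session.saveOrUpdate(user);
                session.getTransaction().commit();
            } finally {
                session.close();
            }
        }
    }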

Environment: Java, JSP, Servlets, J2EE, EJB, Struts Framework, JDBC, WebLogic Application Server, Hibernate, Oracle 9i, UNIX, Web Services, CVS, Eclipse, Rational Rose, JUnit, JWebUnit.

Confidential

Java Developer

Responsibilities:

  • Involved in analysis, design and coding in a Java/JSP front-end environment.
  • Responsible for developing use cases and class and sequence diagrams for the modules, using UML and Rational Rose Enterprise Edition, as a feature owner.
  • Developed application using Spring, Servlets, JSP and EJB.
  • Implemented MVC (Model View Controller) architecture.
  • Designed the Application flow using Rational Rose.
  • Used web servers like Apache Tomcat.
  • Implemented Application prototype using HTML, CSS and JavaScript.
  • Developed the user interfaces with the Spring tag libraries.
  • Developed build and deployment scripts using Apache Ant to customize WAR, EAR and EJB JAR files.
  • Prepared field-validation and scenario-based test cases using JUnit, testing the module in three phases: unit testing, system testing and regression testing.
  • Coded and unit-tested according to client standards.
  • Used an Oracle database for data storage, coding stored procedures, functions and triggers.
  • Wrote DB queries using SQL to interact with the database.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Created components using Java, Spring and JNDI.
  • Prepared Spring deployment descriptors using XML.
  • Handled problem management during QA, implementation and post-production support.
  • Developed a logging component using Apache Log4J to log messages and errors, and wrote test cases to verify the code under different conditions using JUnit (see the sketch below).
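
A minimal sketch of such a Log4J logging component; the logger categories and method names are illustrative placeholders.

    import org.apache.log4j.Logger;

    // Thin wrapper so user-interface and domain code log through one component.
    public final class AppLogger {
        private static final Logger UI_LOG = Logger.getLogger("app.ui");
        private static final Logger DOMAIN_LOG = Logger.getLogger("app.domain");

        private AppLogger() {}

        public static void ui(String message) {
            UI_LOG.info(message);
        }

        public static void domainError(String message, Throwable cause) {
            DOMAIN_LOG.error(message, cause);
        }
    }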

Environment: Java, HTML, Spring, JSP, Servlets, DBMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, SQL, PL/SQL, CSS.
