Sr. Hadoop Developer Resume
New Hartford, NY
PROFESSIONAL SUMMARY:
- 8+ years of professional experience spanning analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
- Hands-on experience with various Hadoop distributions (Apache, Hortonworks, Cloudera, MapR).
- Experience working with Amazon EMR, Cloudera (CDH3, CDH4 & CDH5) and Hortonworks Hadoop distributions.
- Expertise in Hadoop ecosystem tools, including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Kafka, Spark, Zookeeper and Oozie.
- Good knowledge of EMR (Elastic MapReduce) for performing big data operations in AWS.
- Knowledge of working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Excellent understanding of Spark and its benefits in Big Data Analytics.
- Hands-on experience with stream processing frameworks such as Storm and Spark Streaming.
- Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Hands-on experience using Scala and Spark for processing both streaming and batch data.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience in data analysis using HiveQL, Pig Latin and custom Map Reduce programs in Java.
- Hands-on experience on fetching the live stream data from DB2 to HBase table using Spark Streaming and Apache Kafka.
- Experienced working with Hadoop big data technologies (HDFS and MapReduce programs), Hadoop ecosystem tools (HBase, Hive, Pig) and the NoSQL database MongoDB.
- Experience querying and analyzing data from Cassandra for quick searching, sorting and grouping through CQL.
- Scraped and analyzed data using Machine Learning algorithms in Python and SQL.
- Experience using the DataStax Spark-Cassandra Connector to load data to and from Cassandra.
- Experience writing applications against NoSQL databases such as HBase, Cassandra and MongoDB.
- Extensive Experience on importing and exporting data using Flume and Kafka.
- Experience in configuring the Zookeeper to coordinate servers in clusters and to maintain data consistency.
- Expertise in loading data from different sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Experience developing data pipelines using Kafka to land data in HDFS.
- Experience migrating data with Sqoop between HDFS and relational database systems according to client requirements.
- Used Cassandra CQL with Java API’s to retrieve the data from Cassandra tables.
- Worked with NiFi to manage the flow of data from sources to HDFS.
- Good experience with source control repositories such as CVS, Git and SVN.
- Experience working with scripting technologies such as Python and UNIX shell scripts.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Experience working with the Spring and Hibernate frameworks in Java.
- Experience developing web page interfaces using HTML, JSP and Java Swing.
- Used Spring Core annotations for dependency injection (Spring DI), Spring MVC for REST APIs and Spring Boot for microservices.
- Good understanding and working experience on Cloud based architectures.
- Experience handling various file formats such as Avro, Parquet and SequenceFile.
- Expertise in implementing enterprise, web and client-server applications using Java/J2EE.
- Expertise in Oracle ORMB and Stored procedures concepts.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box and black-box testing.
- Ability to work with onsite and offshore team members.
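The Sqoop loads into date-partitioned Hive tables mentioned above typically follow a fixed directory-naming convention in HDFS. A minimal stdlib sketch of that convention; the warehouse path, database and table names here are illustrative assumptions, not from any particular engagement:

```python
from datetime import date

def hive_partition_path(warehouse: str, db: str, table: str, dt: date) -> str:
    """Build the HDFS directory a daily Hive partition (dt=YYYY-MM-DD) maps to.

    A Sqoop import can target this directory with --target-dir, after which
    ALTER TABLE ... ADD PARTITION registers it with the Hive metastore.
    """
    return f"{warehouse}/{db}.db/{table}/dt={dt.isoformat()}"

def add_partition_ddl(db: str, table: str, dt: date, location: str) -> str:
    """DDL that registers the loaded directory as a Hive partition."""
    return (
        f"ALTER TABLE {db}.{table} "
        f"ADD IF NOT EXISTS PARTITION (dt='{dt.isoformat()}') "
        f"LOCATION '{location}'"
    )

path = hive_partition_path("/user/hive/warehouse", "sales", "orders", date(2017, 3, 1))
print(path)                 # /user/hive/warehouse/sales.db/orders/dt=2017-03-01
print(add_partition_ddl("sales", "orders", date(2017, 3, 1), path))
```

Keeping path construction and partition DDL in one place makes daily incremental loads idempotent: re-running a day simply re-registers the same directory.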
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, HBase, Cassandra, MongoDB, Spark, Solr, Ambari, Hue, Avro, Mahout, Impala, Oozie, Nifi and Zookeeper
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Databases: MySQL, Oracle 10g/11g, PL/SQL, MS SQL Server 2012
NoSQL Databases: HBase, Cassandra and MongoDB
Programming Languages: C, C++, Java, JavaScript, Python, Scala
Frameworks: Struts, Spring, Hibernate, Spring Boot, Micro-services
Operating System: Windows 7/8/10, Vista, Ubuntu, Linux, UNIX, Mac OS
Cloud Platforms: AWS Cloud, Google Cloud
Application Servers: WebLogic, WebSphere, Tomcat
Architecture: Client-Server Architecture, Relational DBMS, OLAP, OLTP
Testing: Selenium Web Driver, Junit
Modelling Tools: Visual Paradigm for UML, Rational Rose, StarUML
ETL/BI Tools: Talend, Informatica, Tableau
IDE Tools: NetBeans, Eclipse, IntelliJ, Visual Studio Code
Build Tools: Maven, Jenkins
Development Methodologies: Waterfall, Agile/Scrum
PROFESSIONAL EXPERIENCE:
Confidential, New Hartford, NY
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau for visualization of the data sets created, and tested native Drill, Impala and Spark connectors.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Implemented Hive complex UDF's to execute business logic with Hive Queries.
- Responsible for bulk-loading large amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Handled importing data from different data sources into HDFS using Sqoop and performed transformations using Hive and MapReduce.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Responsible for developing data pipeline by implementing Kafka producers and consumers.
- Performed data analysis with HBase using Apache Phoenix.
- Exported the analyzed data to Impala to generate reports for the BI team.
- Developed multiple Spark jobs in PySpark for data cleaning and pre-processing.
- Managing and reviewing Hadoop Log files to resolve any configuration issues.
- Developed a program to extract named entities from OCR files.
- Used Gradle for building and testing the project.
- Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects and identified their sources.
- Used Mingle and later moved to JIRA for task/bug tracking.
- Used Git for version control.
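The Kafka producer/consumer pipeline above revolves around one pattern: producers publish serialized events to a topic, and consumers pull them in batches for downstream staging. A minimal stdlib sketch of that decoupling; `queue.Queue` stands in for the Kafka broker, and the record shape is an assumption for illustration:

```python
import json
import queue

# Stand-in for a Kafka topic: the real pipeline used a Kafka broker;
# queue.Queue merely illustrates the producer/consumer decoupling.
topic = queue.Queue()

def produce(records):
    """Producer side: serialize events and publish them to the topic."""
    for rec in records:
        topic.put(json.dumps(rec).encode("utf-8"))

def consume(batch_size):
    """Consumer side: pull up to batch_size events and deserialize them."""
    batch = []
    while len(batch) < batch_size and not topic.empty():
        batch.append(json.loads(topic.get().decode("utf-8")))
    return batch

produce([{"user": "a", "event": "click"}, {"user": "b", "event": "view"}])
print(consume(10))  # [{'user': 'a', 'event': 'click'}, {'user': 'b', 'event': 'view'}]
```

Because neither side knows about the other, producers can keep publishing while consumers are down; the topic buffers unprocessed messages in between.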
Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis.
Confidential, Kenilworth, NJ
Hadoop Developer
Responsibilities:
- Research and recommend suitable technology stack for Hadoop migration considering current enterprise architecture.
- Extensively used the Spark stack to develop pre-processing jobs using the RDD, Dataset and DataFrame APIs to transform data for upstream consumption.
- Developed real-time data processing applications using Scala and Python, implementing Apache Spark Streaming from various streaming sources such as Kafka, Flume and JMS.
- Worked on extracting and enriching HBase data between multiple tables using joins in Spark.
- Worked on writing APIs to load the processed data to HBase tables.
- Replaced the existing MapReduce programs into Spark application using Scala.
- Built on-premise data pipelines using Kafka and Spark Streaming, consuming the feed from the API streaming gateway REST service.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
- Good knowledge on Kafka streams API for data transformation.
- Implemented the ELK stack logging framework (Elasticsearch, Logstash & Kibana) on AWS.
- Set up Spark on EMR to process large volumes of data stored in Amazon S3.
- Developed Oozie workflow for scheduling & orchestrating the ETL process.
- Used Talend tool to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs and analyzed the performance of StreamSets and Kafka Streams.
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioural patterns.
- Involved in writing optimized Pig Script along with developing and testing Pig Latin Scripts.
- Used Python's pandas and NumPy modules for data analysis, data scraping and parsing.
- Deployed applications using the Jenkins framework, integrating Git version control with it.
- Participated in production support on a regular basis to support the analytics platform.
- Used Rally for task/bug tracking.
- Used Git for version control.
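The data-quality Hive UDFs above boil down to a predicate applied per row to split clean records from rejects. A stdlib sketch of that pattern; the field names and the validation rule are assumptions for illustration, not the production logic:

```python
import re

# Illustrative quality rule, analogous to a Hive UDF that flags bad rows:
# a row passes only if 'id' is non-empty and 'email' is well-formed.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_clean(row):
    """Return True when the required fields are present and valid."""
    return bool(row.get("id")) and bool(EMAIL_RE.match(row.get("email", "")))

def filter_clean(rows):
    """Split rows into (clean, rejected), as the UDF-driven Hive query did."""
    clean = [r for r in rows if is_clean(r)]
    rejected = [r for r in rows if not is_clean(r)]
    return clean, rejected

rows = [{"id": "1", "email": "a@x.com"}, {"id": "", "email": "bad"}]
clean, rejected = filter_clean(rows)
print(len(clean), len(rejected))  # 1 1
```

Keeping the rejects rather than dropping them lets a downstream audit table explain why each record failed, which is why the split returns both lists.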
Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.
Confidential, Dublin, OH
Hadoop Developer
Responsibilities:
- Loaded data from different data sources (Teradata, DB2, Oracle and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created various Pig scripts and wrapped them in shell commands to provide aliases for common operations in the project's business flow.
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
- Created several Hive UDFs to hide or abstract complex, repetitive rules.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Involved in End to End implementation of ETL logic.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Developed bash scripts to fetch log files from the FTP server and process them for loading into Hive tables.
- All the bash scripts are scheduled using Resource Manager Scheduler.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed Map Reduce programs for applying business rules to the data.
- Implemented Apache Kafka as a replacement for a more traditional message broker ( Confidential ) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Implemented a receiver-based approach with Spark Streaming, linking to the StreamingContext via the Java API and handling proper closing and waiting for stages.
- Maintained the authentication module to support Kerberos.
- Implemented rack topology scripts for the Hadoop cluster.
- Resolved issues related to the legacy Hazelcast EntryProcessor API.
- Participated with the admin team in designing and carrying out the upgrade from CDH 3 to HDP 4.
- Developed helper classes abstracting the Cassandra cluster connection to act as a core toolkit.
- Enhanced existing modules written in Python.
- Used dashboard tools like Tableau.
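The bash scripts above that pull log files from the FTP server and prepare them for Hive amount to line-by-line parsing into delimited records. A stdlib sketch of that step; the log line layout here (timestamp, level, user, action) is a hypothetical format for illustration, not the actual logs:

```python
import re

# Hypothetical log line: "2017-03-01 12:00:05 INFO user=alice action=login".
# This layout is assumed for the example; the production logs differed.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) user=(?P<user>\S+) action=(?P<action>\S+)$"
)

def to_hive_row(line):
    """Convert one log line to a tab-delimited row for a Hive external table.

    Malformed lines return None and are dropped, as the scripts did.
    """
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    return "\t".join(m.group("date", "time", "level", "user", "action"))

print(to_hive_row("2017-03-01 12:00:05 INFO user=alice action=login"))
```

Tab-delimited output pairs naturally with a Hive table declared with `FIELDS TERMINATED BY '\t'`, so the loaded files need no further transformation.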
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Oracle 10g, Maven, open-source technologies (Apache Kafka, Apache Spark), ETL, Hazelcast, Git, Mockito, Python.
Confidential
Java Developer
Responsibilities:
- Involved in Analysis, Design, Development and Testing of the application.
- Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
- Enhanced the Port search functionality by adding a VPN Extension Tab.
- Created end to end functionality for view and edit of VPN Extension details.
- Used the Agile process to develop the application, as it allows faster development compared to RUP.
- Designed UI screens using JSP, jQuery, Ajax and HTML.
- Used Hibernate as the persistence framework.
- Used Struts MVC framework and WebLogic Application Server in this application.
- Involved in creating DAO’s and used Hibernate for ORM mapping.
- Built REST API end-points for various concepts.
- Wrote procedures and triggers for validating the consistency of metadata.
- Wrote SQL code blocks using cursors for shifting records between various tables based on checks.
- Wrote Java classes to test the UI and web services through JUnit and JWebUnit.
- Performed functional and integration testing.
- Extensively involved in release/deployment related critical activities.
- Tested the entire application using JUnit and JWebUnit.
- Log4J was used to log both User Interface and Domain Level Messages.
- Used Perforce for version control.
Environment: Java, JSP, Servlets, J2EE, EJB, Struts Framework, JDBC, WebLogic Application Server, Hibernate, Oracle 9i, UNIX, Web Services, CVS, Eclipse, Rational Rose, JUnit, JWebUnit.
Confidential
Java Developer
Responsibilities:
- Involved in analysis, design and coding in a Java/JSP front-end environment.
- Responsible for developing use cases, class and sequence diagram for the modules using UML and Rational Rose Enterprise edition as a Feature owner.
- Developed application using Spring, Servlets, JSP and EJB.
- Implemented MVC (Model View Controller) architecture.
- Designed the Application flow using Rational Rose.
- Used web servers like Apache Tomcat.
- Implemented Application prototype using HTML, CSS and JavaScript.
- Developed the user interfaces with the Spring tag libraries.
- Developed build and deployment scripts using Apache Ant to customize WAR, EAR and EJB JAR files.
- Prepared field-validation and scenario test cases using JUnit, testing the module in three phases: unit, system and regression testing.
- Coded and unit tested according to client standards.
- Used Oracle Database for data storage and coding stored procedures, functions and Triggers.
- Wrote DB queries using SQL for interacting with database.
- Designed and developed XML processing components for dynamic menus in the application.
- Created Components using JAVA, Spring and JNDI.
- Prepared Spring deployment descriptors using XML.
- Problem Management during QA, Implementation and Post- Production Support.
- Developed a logging component using Apache Log4j to log messages and errors, and wrote test cases to verify the code under different conditions using JUnit.
Environment: Java, HTML, Spring, JSP, Servlets, DBMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, SQL, PL/SQL, CSS.
