Sr. Hadoop Developer Resume
New Hartford, NY
PROFESSIONAL SUMMARY:
- 8+ years of professional experience spanning analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
- Hands-on experience with various Hadoop distributions (Apache, Hortonworks, Cloudera, MapR).
- Experience working with Amazon EMR, Cloudera (CDH3, CDH4 & CDH5) and Hortonworks Hadoop distributions.
- Expertise in Hadoop ecosystem tools, including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Kafka, Spark, Zookeeper and Oozie.
- Good knowledge of EMR (Elastic MapReduce) for performing big data operations in AWS.
- Knowledge of working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Excellent understanding of Spark and its benefits in Big Data Analytics.
- Hands-on experience with stream processing frameworks such as Storm and Spark Streaming.
- Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Hands-on experience using Scala and Spark for processing both streaming and batch data.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience in data analysis using HiveQL, Pig Latin and custom Map Reduce programs in Java.
- Hands-on experience on fetching the live stream data from DB2 to HBase table using Spark Streaming and Apache Kafka.
- Experienced working with Hadoop big data technologies (HDFS and MapReduce programs), Hadoop ecosystem tools (HBase, Hive, Pig) and the NoSQL database MongoDB.
- Experience querying and analyzing data from Cassandra for quick searching, sorting and grouping through CQL.
- Scraped and analyzed data using Machine Learning algorithms in Python and SQL.
- Experience using the DataStax Spark-Cassandra Connector to load data to and from Cassandra.
- Experience writing applications against NoSQL databases such as HBase, Cassandra and MongoDB.
- Extensive Experience on importing and exporting data using Flume and Kafka.
- Experience in configuring the Zookeeper to coordinate servers in clusters and to maintain data consistency.
- Expertise in loading data from different sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Experience developing data pipelines using Kafka to land data in HDFS.
- Experience migrating data with Sqoop between HDFS and relational database systems according to client requirements.
- Used Cassandra CQL with Java API’s to retrieve the data from Cassandra tables.
- Worked with NiFi to manage the flow of data from sources to HDFS.
- Good experience with source control repositories such as CVS, Git and SVN.
- Experience working with scripting technologies such as Python and UNIX shell scripts.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Experience working with the Spring and Hibernate frameworks in Java.
- Experience developing web page interfaces using HTML, JSP and Java Swing.
- Used Spring Core annotations for dependency injection (Spring DI), Spring MVC for REST APIs and Spring Boot for microservices.
- Good understanding and working experience on Cloud based architectures.
- Experience handling various file formats such as Avro, Parquet and SequenceFile.
- Expertise in implementing enterprise, web and client-server applications using Java/J2EE.
- Expertise in Oracle ORMB and Stored procedures concepts.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box and black-box testing.
- Ability to work with onsite and offshore team members.
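The Sqoop loads into date-partitioned Hive tables mentioned above typically follow a fixed directory-naming convention in HDFS. A minimal stdlib sketch of that convention; the warehouse path, database and table names here are illustrative assumptions, not from any particular engagement:

```python
from datetime import date

def hive_partition_path(warehouse: str, db: str, table: str, dt: date) -> str:
    """Build the HDFS directory a daily Hive partition (dt=YYYY-MM-DD) maps to.

    A Sqoop import can target this directory with --target-dir, after which
    ALTER TABLE ... ADD PARTITION registers it with the Hive metastore.
    """
    return f"{warehouse}/{db}.db/{table}/dt={dt.isoformat()}"

def add_partition_ddl(db: str, table: str, dt: date, location: str) -> str:
    """DDL that registers the loaded directory as a Hive partition."""
    return (
        f"ALTER TABLE {db}.{table} "
        f"ADD IF NOT EXISTS PARTITION (dt='{dt.isoformat()}') "
        f"LOCATION '{location}'"
    )

path = hive_partition_path("/user/hive/warehouse", "sales", "orders", date(2017, 3, 1))
print(path)                 # /user/hive/warehouse/sales.db/orders/dt=2017-03-01
print(add_partition_ddl("sales", "orders", date(2017, 3, 1), path))
```

Keeping path construction and partition DDL in one place makes daily incremental loads idempotent: re-running a day simply re-registers the same directory.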
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, HBase, Cassandra, MongoDB, Spark, Solr, Ambari, Hue, Avro, Mahout, Impala, Oozie, Nifi and Zookeeper
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Databases: MySQL, Oracle 10g/11g, PL/SQL, MS SQL Server 2012
NoSQL Databases: HBase, Cassandra and MongoDB
Programming Languages: C, C++, Java, JavaScript, Python, Scala
Frameworks: Struts, Spring, Hibernate, Spring Boot, Micro-services
Operating System: Windows 7/8/10, Vista, Ubuntu, Linux, UNIX, Mac OS
Cloud Platforms: AWS Cloud, Google Cloud
Application Servers: WebLogic, WebSphere, Tomcat
Architecture: Client-Server Architecture, Relational DBMS, OLAP, OLTP
Testing: Selenium Web Driver, Junit
Modelling Tools: Visual Paradigm for UML, Rational Rose, StarUML
ETL/BI Tools: Talend, Informatica, Tableau
IDE Tools: NetBeans, Eclipse, IntelliJ, Visual Studio Code
Build Tools: Maven, Jenkins
Development Methodologies: Waterfall, Agile/Scrum
PROFESSIONAL EXPERIENCE:
Confidential, New Hartford, NY
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau for visualization of the data sets created, and tested native Drill, Impala and Spark connectors.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Implemented Hive complex UDF's to execute business logic with Hive Queries.
- Responsible for bulk-loading large amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Handled importing data from different data sources into HDFS using Sqoop and performed transformations using Hive and MapReduce.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Responsible for developing data pipeline by implementing Kafka producers and consumers.
- Performed data analysis with HBase using Apache Phoenix.
- Exported the analyzed data to Impala to generate reports for the BI team.
- Developed multiple Spark jobs in PySpark for data cleaning and pre-processing.
- Managing and reviewing Hadoop Log files to resolve any configuration issues.
- Developed a program to extract named entities from OCR files.
- Used Gradle for building and testing the project.
- Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects and identified their sources.
- Used Mingle and later moved to JIRA for task/bug tracking.
- Used Git for version control.
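The Kafka producer/consumer pipeline above revolves around one pattern: producers publish serialized events to a topic, and consumers pull them in batches for downstream staging. A minimal stdlib sketch of that decoupling; `queue.Queue` stands in for the Kafka broker, and the record shape is an assumption for illustration:

```python
import json
import queue

# Stand-in for a Kafka topic: the real pipeline used a Kafka broker;
# queue.Queue merely illustrates the producer/consumer decoupling.
topic = queue.Queue()

def produce(records):
    """Producer side: serialize events and publish them to the topic."""
    for rec in records:
        topic.put(json.dumps(rec).encode("utf-8"))

def consume(batch_size):
    """Consumer side: pull up to batch_size events and deserialize them."""
    batch = []
    while len(batch) < batch_size and not topic.empty():
        batch.append(json.loads(topic.get().decode("utf-8")))
    return batch

produce([{"user": "a", "event": "click"}, {"user": "b", "event": "view"}])
print(consume(10))  # [{'user': 'a', 'event': 'click'}, {'user': 'b', 'event': 'view'}]
```

Because neither side knows about the other, producers can keep publishing while consumers are down; the topic buffers unprocessed messages in between.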
Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis.
Confidential, Kenilworth, NJ
Hadoop Developer
Responsibilities:
- Research and recommend suitable technology stack for Hadoop migration considering current enterprise architecture.
- Extensively used the Spark stack to develop pre-processing jobs using the RDD, Dataset and DataFrame APIs to transform data for upstream consumption.
- Developed real-time data processing applications using Scala and Python, implementing Apache Spark Streaming from various streaming sources such as Kafka, Flume and JMS.
- Worked on extracting and enriching HBase data between multiple tables using joins in Spark.
- Worked on writing APIs to load the processed data to HBase tables.
- Replaced the existing MapReduce programs into Spark application using Scala.
- Built on-premise data pipelines using Kafka and Spark Streaming, consuming the feed from the API streaming gateway REST service.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
- Good knowledge on Kafka streams API for data transformation.
- Implemented the ELK stack logging framework (Elasticsearch, Logstash & Kibana) on AWS.
- Set up Spark on EMR to process large volumes of data stored in Amazon S3.
- Developed Oozie workflow for scheduling & orchestrating the ETL process.
- Used Talend tool to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs and analyzed the performance of StreamSets and Kafka Streams.
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioural patterns.
- Involved in writing optimized Pig Script along with developing and testing Pig Latin Scripts.
- Used Python's pandas and NumPy modules for data analysis, data scraping and parsing.
- Deployed applications using the Jenkins framework, integrating Git version control with it.
- Participated in production support on a regular basis to support the analytics platform.
- Used Rally for task/bug tracking.
- Used Git for version control.
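The data-quality Hive UDFs above boil down to a predicate applied per row to split clean records from rejects. A stdlib sketch of that pattern; the field names and the validation rule are assumptions for illustration, not the production logic:

```python
import re

# Illustrative quality rule, analogous to a Hive UDF that flags bad rows:
# a row passes only if 'id' is non-empty and 'email' is well-formed.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_clean(row):
    """Return True when the required fields are present and valid."""
    return bool(row.get("id")) and bool(EMAIL_RE.match(row.get("email", "")))

def filter_clean(rows):
    """Split rows into (clean, rejected), as the UDF-driven Hive query did."""
    clean = [r for r in rows if is_clean(r)]
    rejected = [r for r in rows if not is_clean(r)]
    return clean, rejected

rows = [{"id": "1", "email": "a@x.com"}, {"id": "", "email": "bad"}]
clean, rejected = filter_clean(rows)
print(len(clean), len(rejected))  # 1 1
```

Keeping the rejects rather than dropping them lets a downstream audit table explain why each record failed, which is why the split returns both lists.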
Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.
Confidential, Dublin, OH
Hadoop Developer
Responsibilities:
- Loaded data from different data sources (Teradata, DB2, Oracle and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created various Pig scripts and wrapped them in shell commands to provide aliases for common operations in the project's business flow.
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
- Created several Hive UDFs to hide or abstract complex, repetitive rules.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Involved in End to End implementation of ETL logic.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Developed bash scripts to fetch log files from the FTP server and process them for loading into Hive tables.
- All the bash scripts are scheduled using Resource Manager Scheduler.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed Map Reduce programs for applying business rules to the data.
- Implemented Apache Kafka as a replacement for a more traditional message broker ( Confidential ) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Implemented a receiver-based approach with Spark Streaming, linking to the StreamingContext via the Java API and handling proper closing and waiting for stages.
- Maintained the authentication module to support Kerberos.
- Implemented rack topology scripts for the Hadoop cluster.
- Resolved issues related to the legacy Hazelcast EntryProcessor API.
- Participated with the admin team in designing and carrying out the upgrade from CDH 3 to HDP 4.
- Developed helper classes abstracting the Cassandra cluster connection to act as a core toolkit.
- Enhanced existing modules written in Python.
- Used dashboard tools like Tableau.
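The bash scripts above that pull log files from the FTP server and prepare them for Hive amount to line-by-line parsing into delimited records. A stdlib sketch of that step; the log line layout here (timestamp, level, user, action) is a hypothetical format for illustration, not the actual logs:

```python
import re

# Hypothetical log line: "2017-03-01 12:00:05 INFO user=alice action=login".
# This layout is assumed for the example; the production logs differed.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) user=(?P<user>\S+) action=(?P<action>\S+)$"
)

def to_hive_row(line):
    """Convert one log line to a tab-delimited row for a Hive external table.

    Malformed lines return None and are dropped, as the scripts did.
    """
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    return "\t".join(m.group("date", "time", "level", "user", "action"))

print(to_hive_row("2017-03-01 12:00:05 INFO user=alice action=login"))
```

Tab-delimited output pairs naturally with a Hive table declared with `FIELDS TERMINATED BY '\t'`, so the loaded files need no further transformation.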
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Oracle 10g, Maven, open-source technologies (Apache Kafka, Apache Spark), ETL, Hazelcast, Git, Mockito, Python.
Confidential
Java Developer
Responsibilities:
- Involved in Analysis, Design, Development and Testing of the application.
- Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
- Enhanced the Port search functionality by adding a VPN Extension Tab.
- Created end to end functionality for view and edit of VPN Extension details.
- Used the Agile process to develop the application, as it allows faster development compared to RUP.
- Designed UI screens using JSP, jQuery, Ajax and HTML.
- Used Hibernate as the persistence framework.
- Used Struts MVC framework and WebLogic Application Server in this application.
- Involved in creating DAO’s and used Hibernate for ORM mapping.
- Built REST API end-points for various concepts.
- Wrote procedures and triggers for validating the consistency of metadata.
- Wrote SQL code blocks using cursors for shifting records between various tables based on checks.
- Wrote Java classes to test the UI and web services through JUnit and JWebUnit.
- Performed functional and integration testing.
- Extensively involved in release/deployment related critical activities.
- Tested the entire application using JUnit and JWebUnit.
- Log4J was used to log both User Interface and Domain Level Messages.
- Used Perforce for version control.
Environment: Java, JSP, Servlets, J2EE, EJB, Struts Framework, JDBC, WebLogic Application Server, Hibernate, Oracle 9i, UNIX, Web Services, CVS, Eclipse, Rational Rose, JUnit, JWebUnit.
Confidential
Java Developer
Responsibilities:
- Involved in analysis, design and coding in a Java/JSP front-end environment.
- Responsible for developing use cases, class and sequence diagram for the modules using UML and Rational Rose Enterprise edition as a Feature owner.
- Developed application using Spring, Servlets, JSP and EJB.
- Implemented MVC (Model View Controller) architecture.
- Designed the Application flow using Rational Rose.
- Used web servers like Apache Tomcat.
- Implemented Application prototype using HTML, CSS and JavaScript.
- Developed the user interfaces with the Spring tag libraries.
- Developed build and deployment scripts using Apache Ant to customize WAR, EAR and EJB JAR files.
- Prepared field-validation and scenario test cases using JUnit, testing the module in three phases: unit, system and regression testing.
- Coded and unit tested according to client standards.
- Used Oracle Database for data storage and coding stored procedures, functions and Triggers.
- Wrote DB queries using SQL for interacting with database.
- Designed and developed XML processing components for dynamic menus in the application.
- Created Components using JAVA, Spring and JNDI.
- Prepared Spring deployment descriptors using XML.
- Problem Management during QA, Implementation and Post- Production Support.
- Developed a logging component using Apache Log4j to log messages and errors, and wrote test cases to verify the code under different conditions using JUnit.
Environment: Java, HTML, Spring, JSP, Servlets, DBMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, SQL, PL/SQL, CSS.
