Sr. Big Data Hadoop Developer Resume
Shelton, CT
SUMMARY:
- Over 8 years of professional IT experience encompassing a wide range of skills in Big Data technologies.
- Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
- Excellent experience with AWS and Cloudera, maintaining and optimizing AWS infrastructure (EC2 and EBS), along with good knowledge of MS Azure.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Experience developing custom UDFs for Pig and Apache Hive to incorporate Java methods and functionality into Pig Latin and HiveQL.
- Experience collecting log and JSON data into HDFS using Flume and processing the data with Hive and Pig.
- Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Sqoop, and ZooKeeper.
- Experience developing Java MapReduce jobs for data cleaning and manipulation as required by the business.
- Good knowledge of Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL).
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
- Deployed customized UDFs in Java to extend Hive and Pig core functionality (a minimal UDF sketch follows this summary).
- Expertise in web page development using JSP, HTML, JavaScript, jQuery, and Ajax.
- Involved in efficiently converting JSON files to XML and CSV files in Talend.
- Worked with Bootstrap, AngularJS, NodeJS, Knockout, Ember, and the Java Persistence API (JPA).
- Experience developing Spark batch applications in Scala to ingest data into a common data lake.
- Good understanding of Amazon Web Services for designing data pipelines with its various services.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Excellent networking and communication skills with all levels of stakeholders, including executives, application developers, business users, and customers.
- Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrame, Spark SQL, Spark ML, and Spark Streaming APIs.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Proven ability to manage all stages of project development, with strong problem-solving and analytical skills and the ability to make balanced, independent decisions.
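The custom Hive UDF work above can be illustrated with a minimal sketch; the class name, logic, and registration statements are hypothetical examples rather than actual project code.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: trims and lower-cases a string column before joins.
// Registered in Hive with (illustrative names):
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public final class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // propagate SQL NULLs unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```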
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, YARN, Apache Flume 1.8, Kafka 1.1, ZooKeeper
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
Hadoop Distributions: Cloudera, Hortonworks, MapR
Programming Language: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Web Technologies: HTML5, CSS3, JavaScript, jQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Databases: Oracle 12c/11g, SQL
Operating Systems: Linux, Unix, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere
SDLC Methodologies: Agile, Waterfall
Version Control: GIT, SVN, CVS
WORK EXPERIENCE:
Confidential, Shelton, CT
Sr. Big Data Hadoop Developer
Responsibilities:
- Worked as a Big Data Developer on Hadoop ecosystem components including Hive, HBase, Oozie, Pig, ZooKeeper, Spark Streaming, and MCS on the MapR distribution.
- Designed and deployed the full SDLC of a Hadoop cluster based on the client's business needs.
- Primarily involved in the data migration process on Azure, integrating with the GitHub repository and Jenkins.
- Analyzed the Hadoop cluster and various big data analytics tools, including MapReduce, Hive, and Spark.
- Performed transformations and analysis by writing complex HQL queries in Hive and exported results to HDFS in discrete file formats.
- Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance.
- Used Spark to create structured data from large amounts of unstructured data from various sources (see the sketch after this section).
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded it into Cassandra.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Implemented security in web applications using Azure and deployed them to Azure.
- Participated in maintaining data integrity between Oracle and SQL databases.
- Used RESTful web services with MVC for parsing and processing XML data.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Installed and configured Hortonworks Ambari for easy management of existing Hadoop cluster.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Developed PL/SQL scripts to validate and load data into interface tables
- Designed, developed and maintained Big Data streaming and batch applications using Storm.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Developed and designed data integration and migration solutions in Azure.
- Integrated Apache Kafka with Elasticsearch using the Kafka Elasticsearch connector to stream messages from different partitions and topics into Elasticsearch for search and analysis.
- Exported event weblogs by creating an HDFS sink that deposits the weblogs directly in HDFS.
- Implemented Cassandra and managed other processing tools running on YARN.
- Developed customized classes for serialization and deserialization in Hadoop.
- Worked with Apache NiFi as an ETL tool for batch and real-time processing.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Used Spring JDBC DAO as the data access technology to interact with the database.
- Involved in data acquisition, pre-processing, and exploration for a telecommunications project in Scala.
Environment: Hadoop 3.0, Oozie 4.3, Pig 0.17, ZooKeeper 3.4, Hive 2.3, HBase 1.2, Jenkins 2.1, Azure, GitHub, MapReduce, Spark 2.4, Cassandra 3.0, Flume 1.8, Sqoop 1.4, Oracle 12c, XML, MongoDB 4.0, Teradata r15, SQL, PL/SQL, MySQL 8.0, Kafka 1.1, Elasticsearch 6.6, HDFS, ETL, Scala 2.1.
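As referenced above, a minimal sketch of turning raw weblog JSON into structured output with Spark SQL; it uses Spark's Java API, and the paths, view name, and column names are illustrative assumptions rather than actual project code.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class WeblogDailyHits {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("WeblogDailyHits")
                .getOrCreate();

        // Read semi-structured JSON weblogs from HDFS into a DataFrame (path is hypothetical).
        Dataset<Row> logs = spark.read().json("hdfs:///data/weblogs/");
        logs.createOrReplaceTempView("weblogs");

        // Aggregate with Spark SQL and write the result back to HDFS as Parquet.
        Dataset<Row> dailyHits = spark.sql(
                "SELECT log_date, COUNT(*) AS hits FROM weblogs GROUP BY log_date");
        dailyHits.write().mode("overwrite").parquet("hdfs:///warehouse/daily_hits/");

        spark.stop();
    }
}
```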
Confidential, Lowell, AR
Spark/Hadoop Developer
Responsibilities:
- Created Spark applications extensively using Spark DataFrames and the Spark SQL API.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Prepared Spark builds from MapReduce source code for better performance.
- Involved in importing the real-time data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Worked on AWS, provisioning EC2 infrastructure and deploying applications behind Elastic Load Balancing.
- Implemented Kafka high-level consumers to read data from Kafka partitions and move it into HDFS.
- Used the Parquet file format for published tables and created views on the tables.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Wrote an event-driven link-tracking system to capture user events and feed them to Kafka, which pushes them to HBase.
- Used Maven in building the application and auto deploying it to the environment.
- Developed several REST web services that produce both XML and JSON to perform tasks, leveraged by both web and mobile applications.
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Worked with teams setting up AWS EC2 instances using AWS services such as S3, EBS, Elastic Load Balancing, Auto Scaling groups, VPC subnets, and CloudWatch.
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Worked on NoSQL databases like Cassandra and MongoDB for a POC on storing images and URIs.
- Coordinated with Hortonworks support team through support portal to sort out the critical issues during upgrades.
- Migrated complex MapReduce programs into Spark RDD transformations and actions (see the sketch after this section).
- Explored MLlib algorithms in Spark to understand the machine learning functionality applicable to our use case.
- Used the Spring Framework for the logging implementation and extensively used Spring AOP to handle cross-cutting concerns.
- Involved in unit and integration testing, bug fixing, acceptance testing with test cases, and code reviews.
- Wrote ad-hoc queries using Presto and Impala and used Impala analytical functions.
Environment: Spark 2.4, Scala 2.1, Oozie 4.3, Kafka 1.1, Hadoop 3.0, MySQL 8.0, YARN, HBase 1.2, AWS, MapReduce, Maven 2.0, XML, JSON, PL/SQL, Sqoop 1.4, Cassandra 3.0, MongoDB 4.0.
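A minimal sketch of the MapReduce-to-Spark RDD migration referenced above. The project code was written in Scala; this version uses Spark's Java API purely for illustration, and the paths and class name are hypothetical.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public final class EventCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("EventCount");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // The map and reduce phases of the old MapReduce program become
            // flatMap/mapToPair/reduceByKey transformations on an RDD.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/events/");   // hypothetical input path
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(token -> new Tuple2<>(token, 1))
                    .reduceByKey(Integer::sum);
            counts.saveAsTextFile("hdfs:///output/event_counts/");         // hypothetical output path
        }
    }
}
```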
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH3 and CDH4 distributions.
- Created Hive tables and applied HiveQL queries on them, which automatically invoke and run MapReduce jobs.
- Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Created conversion scripts using Oracle SQL queries, functions, and stored procedures, along with test cases and plans, before ETL migrations.
- Worked with Spark for in-memory computations and compared DataFrames to optimize performance.
- Developed Java modules implementing business rules and workflows using the Spring MVC web framework.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Developed different kinds of custom filters and applied pre-defined filters to HBase data using the client API (see the sketch after this section).
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Used open-source packages and designed a POC to demonstrate integration of Kafka/Flume with Spark Streaming for real-time data ingestion and processing.
- Developed shell scripts to read files from the edge node and ingest them into HDFS partitions based on the file naming pattern.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Set up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
Environment: Hadoop 2.3, ETL, Oracle, HDFS, MySQL, Sqoop, Hive 2.1, ZooKeeper, Pig, Spring MVC 4, Scala, MapReduce.
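The pre-defined HBase filter work referenced above might look roughly like this sketch. It uses the standard HBase client API (a newer style than the CDH3/CDH4-era API), and the table name, column family, and values are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public final class ActiveCustomerScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customers"))) { // hypothetical table
            // Pre-defined filter: keep only rows whose info:status column equals "active".
            Scan scan = new Scan();
            scan.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("info"), Bytes.toBytes("status"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("active")));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```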
Confidential, McLean, VA
Java/J2EE Developer
Responsibilities:
- Developed extensive additions to existing Struts/Java/J2EE Web Application utilizing Service Oriented Architecture (SOA) techniques.
- Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Spring.
- Performed J2EE application tuning, performance testing, and analysis.
- Developed User Interface having animations and effects using JSF, JavaScript and HTML.
- Used Hibernate for mapping POJO's to relational database tables using XML files.
- Used Java Servlets, JSPs, AJAX, HTML and CSS for developing the Web component of the application.
- Developed RESTful web services using Java, Spring Boot, databases like PostgreSQL.
- Created several JSP's and populated them with data from databases using JDBC.
- Used Object/Relational mapping Hibernate framework as the persistence layer for interacting with Database.
- Developed the Product Builder UI screens using Angular-JS, NodeJS, HTML5, CSS, and JavaScript.
- Wrote Hibernate annotation-based mappings of Java classes to Oracle database tables (see the sketch after this section).
- Designed, developed and maintained the data layer using Hibernate and performed configuration of Struts Application Framework.
- Used log4j for logging and SVN for version control.
- Implemented log4j for application logging and to troubleshoot issues in debug mode.
- Involved in developing test cases using JUnit during development.
- Used Eclipse as Java IDE tool for creating various J2EE artifacts like Servlets, JSP's and XML.
- Used JMS (Asynchronous/Synchronous) for sending and getting messages from the MQ Series.
- Created and implemented insert, update, and delete queries for the Cassandra and MongoDB databases.
- Implemented SOA architecture with Web Services using SOAP, WSDL and XML to integrate other legacy systems.
- Used the continuous integration tool Jenkins to push and pull project code to and from GitHub repositories.
- Involved in unit testing and bug fixing and achieved maximum code coverage using JUnit test cases.
Environment: Java 8, J2EE, POJOs, XML, Spring Boot, HTML5, log4j, SOA, WSDL, Jenkins, JUnit, SOAP, Hibernate, JMS.
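A minimal sketch of the annotation-based Hibernate mapping referenced above; the entity, table, and column names are hypothetical rather than taken from project code.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

// Hypothetical entity mapped to an Oracle table; Hibernate persists and loads
// instances of this class without hand-written SQL.
@Entity
@Table(name = "CUSTOMER_ACCOUNTS")
public class CustomerAccount {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    @Column(name = "ACCOUNT_ID")
    private Long accountId;

    @Column(name = "ACCOUNT_NAME", nullable = false)
    private String accountName;

    @Column(name = "STATUS")
    private String status;

    // Getters and setters omitted for brevity.
}
```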
Confidential
Java Developer
Responsibilities:
- Used JavaScript to update content in the database and manipulate files, and generated Java and Spring MVC forms to record data from online users.
- Developed and tested many features for dashboard, created using Bootstrap, CSS, and JavaScript.
- Developed single-page applications using AngularJS and implemented two-way data binding.
- Used the Spring DAO concept to interact with the database via JdbcTemplate and HibernateTemplate (see the sketch after this section).
- Created an XML configuration file for Hibernate for Database connectivity.
- Created test suites in SoapUI projects and internal test cases according to the requirements.
- Used Maven for building and deploying the project on the WebSphere application server.
- Involved in the deployment of the application using JBoss, WebLogic servers.
- Developed RESTful web services for sending data to and receiving data from external interfaces in JSON format.
- Created reusable client-side components using TypeScript in AngularJS and used React.js and NodeJS for fast data access.
- Built web-based application using Spring MVC Architecture and REST Web-services.
- Developed the custom Logging framework used to log transactions executed across the various applications using Log4j.
- Used SVN as version control system for the source code and project documents.
- Used React.js to render changing currency spreads and to dynamically update the DOM
- Developed Enterprise JavaBeans (EJB) with both stateless session beans and entity beans using CMP.
- Used XSL/XSLT for transforming and displaying reports and developed DTDs for XML.
- Developed common business-related custom tags using JSP and published them to the rest of the teams. Developed the user interface using JSP and the Struts tag library.
- Implemented various design patterns such as Singleton, Data Access Object, Data Transfer Object, and MVC.
- Developed many JSP pages and used the Dojo JavaScript library and jQuery UI for client-side validation.
Environment: Java 7, Bootstrap, JavaScript, AngularJS, JDBC, Hibernate 3.0, XML, SOAP, ReactJS, NodeJS, JSON, DOM, EJB, Log4j, Dojo, jQuery.
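The Spring DAO / JdbcTemplate usage referenced above could look like this sketch; the DAO class, table, and queries are hypothetical and shown only to illustrate the pattern.

```java
import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical DAO backed by Spring's JdbcTemplate; the users table and queries
// are illustrative only.
public class UserDao {

    private final JdbcTemplate jdbcTemplate;

    public UserDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Simple read: JdbcTemplate handles connection, statement, and result-set plumbing.
    public int countActiveUsers() {
        return jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM users WHERE active = 1", Integer.class);
    }

    // Simple write with a bind parameter.
    public int deactivateUser(long userId) {
        return jdbcTemplate.update("UPDATE users SET active = 0 WHERE id = ?", userId);
    }
}
```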