Sr. Big Data / Hadoop Developer Resume
Boston, MA
PROFESSIONAL SUMMARY:
- Overall 9 years of experience in design, development, maintenance, and support of Big Data/Hadoop & Java/J2EE solutions.
- Excellent experience with the Hadoop ecosystem and in-depth understanding of MongoDB and the Hadoop infrastructure.
- Hands-on experience with Cloudera and multi-node clusters on the Hortonworks Sandbox.
- Extensively worked on major components of the Hadoop ecosystem such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, Cassandra and ZooKeeper.
- Excellent knowledge of and working experience with the SDLC, including Agile and Waterfall methodologies.
- Excellent experience with the Amazon, Cloudera and Hortonworks Hadoop distributions.
- Maintained and optimized AWS infrastructure (EMR, EC2, S3, EBS, Redshift, and Elasticsearch).
- Good working experience using Spark SQL to manipulate DataFrames in Python (see the sketch following this summary).
- Experience with MongoDB native drivers, including the Java and Python drivers.
- Hands-on experience with NoSQL databases such as HBase and Cassandra, plus basic knowledge of MongoDB.
- Performed benchmarking of the NoSQL databases Cassandra and HBase, and integrated bulk data into the Cassandra file system using MapReduce programs.
- Involved in creating data models for customer data using Cassandra Query Language (CQL).
- Experience working with NoSQL databases including Cassandra, MongoDB and HBase.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Experience with the Apache Spark ecosystem using Spark SQL, DataFrames and RDDs, and knowledge of Spark MLlib.
- Experience in ingestion, storage, querying, processing and analysis of Big Data.
- Hands-on experience in Big Data including Apache Spark, Spark SQL and Spark Streaming.
- Knowledge of developing Spark Streaming jobs using RDDs and leveraging the Spark shell.
- Strong knowledge of MongoDB concepts, including CRUD operations, the aggregation framework and document schema design.
- Experience in maintenance and bug-fixing of web-based applications on various platforms.
- Experience in managing the life cycle of MongoDB, including sizing, automation, monitoring and tuning.
- Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
- Hands-on experience importing and exporting data between databases such as Oracle and MySQL and HDFS/Hive using Sqoop.
- Developed Hive queries for data analysis to meet the business requirements.
- Proficient in development methodologies such as Scrum, Agile, and Waterfall.
- Expertise in developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Experience implementing solutions on various Hadoop distributions (Cloudera, Hortonworks, MapR).
- Worked on custom Pig Loader and Storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Experience working with scripting technologies such as Python and Unix shell scripts.
- Expertise in web page development using JSP, HTML, JavaScript, jQuery and AJAX.
- Experience writing database objects such as stored procedures, functions, triggers, PL/SQL packages and cursors for SQL Server and Sybase databases.
- Implemented several Kafka clusters as adoption grew organically in different parts of the organization.
- Hands-on experience with the Spark architecture and its integrations, such as Spark SQL and DataFrames.
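A minimal PySpark sketch of the Spark SQL / DataFrame manipulation referenced above; the input path, column names and filter threshold are hypothetical examples, not taken from any specific project.

```python
# Minimal PySpark sketch of DataFrame manipulation with Spark SQL.
# The input path, column names and threshold are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-example").getOrCreate()

# Load a CSV file into a DataFrame, inferring the schema from the data
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/orders.csv"))

# DataFrame API: filter and aggregate
high_value = (orders
              .filter(F.col("amount") > 100)
              .groupBy("customer_id")
              .agg(F.sum("amount").alias("total_amount")))

# Equivalent Spark SQL over a temporary view
orders.createOrReplaceTempView("orders")
high_value_sql = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    WHERE amount > 100
    GROUP BY customer_id
""")

high_value.show(10)
```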
WORK EXPERIENCE:
Confidential - Boston, MA
Sr. Big Data / Hadoop Developer
Responsibilities:
- Worked as a Sr. Big Data/Hadoop Developer with Hadoop Ecosystems components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Experienced in writing Spark Applications in Scala and Python (PySpark).
- Involved in Agile methodologies, daily scrum meetings, sprint planning.
- Worked in an Agile development environment using the Kanban methodology; actively involved in daily Scrum and other design-related meetings.
- Configured Hadoop clusters and coordinated with Big Data Admin for cluster maintenance.
- Used reporting tools such as Tableau, connected to Hive, to generate daily data reports.
- Installed and configured Hive, HDFS and Apache NiFi, and implemented the CDH cluster.
- Assisted with performance tuning and monitoring.
- Imported Avro files using Apache Kafka and performed analytics using Spark in Scala.
- Worked on installing and configuring the Hortonworks HDP and Cloudera (CDH 5.5.1) clusters in Dev and Production environments.
- Worked with NoSQL databases HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
- Worked on the NoSQL database MongoDB for storing images and URIs.
- Extracted real-time data using Kafka and Spark Streaming, converted it into RDDs, processed it and stored it in Cassandra (see the sketch at the end of this section).
- Designed solutions for various system components using Microsoft Azure.
- Configured Azure cloud services for endpoint deployment.
- Experience in NoSQL Column-Oriented Databases like Cassandra and its Integration with Hadoop cluster.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done in Python (PySpark).
- Involved in ingesting data into Cassandra and consuming the ingested data from Cassandra to Hadoop Data Lake.
- Wrote multiple Spark jobs to perform data quality checks on data before files were moved to the data processing layer.
- Involved in integrating tools such as Elasticsearch with existing source systems.
- Used Elasticsearch and MongoDB for storing and querying the offers and non-offers data.
- Worked on MongoDB for distributed storage and processing.
- Loaded JSON-styled documents into a NoSQL database (MongoDB) and deployed the data to a cloud storage service.
- Implemented test scripts to support test driven development and continuous integration.
- Implemented Cassandra and managed the other processing tools running on YARN.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Evaluated the performance of Apache Spark in analyzing genomic data.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Implemented Spark RDD transformations to map business analysis logic and applied actions on top of those transformations.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Windows Azure SQL Reporting Services to create reports with tables, charts and maps.
- Built the automated build and deployment framework using Jenkins, Maven, etc.
- Involved in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Deployed data from various sources into HDFS and built reports using Tableau.
- Extended Hive and Pig core functionality by writing custom UDFs in Java.
Environment: Hadoop 3.0, HBase, Sqoop 1.4, ZooKeeper 3.4, Oozie 4.3, Hive 2.3, Pig 0.17, Spark 2.3, Scala 2.12, Python 3.7, Agile, Apache NiFi 1.7, Apache Kafka 1.1, NoSQL, MongoDB 4.0, Cassandra 3.11, Microsoft Azure, MapReduce, Elasticsearch 6.3, YARN, Tableau, Java, MySQL
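A minimal sketch of the Kafka-to-Spark-Streaming-to-Cassandra flow described in this section, using the DStream API; the topic, broker address, keyspace, table and field names are hypothetical, and it assumes the spark-streaming-kafka and spark-cassandra-connector packages are available on the cluster.

```python
# Sketch of a Kafka -> Spark Streaming -> Cassandra pipeline; all names are hypothetical.
import json

from pyspark import SparkConf, SparkContext
from pyspark.sql import Row, SparkSession
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = (SparkConf()
        .setAppName("kafka-to-cassandra")
        .set("spark.cassandra.connection.host", "127.0.0.1"))
sc = SparkContext(conf=conf)
spark = SparkSession.builder.getOrCreate()
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# Direct stream from a hypothetical Kafka topic; messages are JSON strings
stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "localhost:9092"})


def save_to_cassandra(rdd):
    """Convert each micro-batch RDD into a DataFrame and append it to Cassandra."""
    if rdd.isEmpty():
        return
    rows = (rdd.map(lambda kv: json.loads(kv[1]))
               .map(lambda d: Row(event_id=d.get("event_id"), amount=d.get("amount"))))
    df = spark.createDataFrame(rows)
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(keyspace="analytics", table="events")  # hypothetical keyspace/table
       .mode("append")
       .save())


stream.foreachRDD(save_to_cassandra)
ssc.start()
ssc.awaitTermination()
```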
Confidential - St. Louis, MO
Sr. Big data / Hadoop Developer
Responsibilities:
- Worked as a Sr. Big Data/Hadoop Developer providing solutions for big data problems.
- Involved in various stages of Software Development Life Cycle (SDLC) deliverables of the project using the Agile software development methodology.
- Started using Apache NiFi to copy data from the local file system to HDP.
- Responsibilities included resource management, client meetings, implementation and design, coordinating off shore teams, budgetary analysis and risk management.
- Followed the Agile-Scrum project development methodology, taking part in daily Scrum and sprint meetings.
- Managed Hadoop clusters on the cloud using AWS instances.
- Worked on customizing MapReduce code in Amazon EMR using the Hive, Pig and Impala frameworks.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the sketch at the end of this section).
- Created real time data ingestion of structured and unstructured data using Kafka and Spark streaming to Hadoop and MemSQL.
- Implemented solutions for ingesting data from various sources.
- Worked with Data utilizing Big Data Technologies such as Hive, Spark, Pig, Sqoop, HBase, MapReduce, etc.
- Analyzed requirements and designed data model for Cassandra, Hive from the current relational database in Oracle.
- Loaded customer profile, spending and credit data from legacy warehouses onto HDFS using Sqoop.
- Supported the data analytics team by providing data from various sources in Hive using Spark SQL.
- Set up the architecture for big data capture, representation, information extraction and fusion.
- Created a Hive aggregator to update the Hive table after running the data profiling job.
- Analyzed large data sets by running Hive queries.
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
- Implemented Partitioning, Dynamic Partitioning and Bucketing in Hive.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Modeled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
- Implemented a script to transmit sys print information from Oracle to HBase using Sqoop.
- Loaded JSON-styled documents into a NoSQL database (MongoDB) and deployed the data to the Amazon Redshift cloud service.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Identified data sources, created source-to-target mappings, estimated storage, and provided support for Hadoop cluster setup and data partitioning.
- Wrote DDL and DML files to create and manipulate tables in the database.
- Developed the Unix shell/Python scripts for creating the reports from Hive data.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Environment: Agile, Apache NiFi 1.7, AWS, Hadoop 3.0, MapReduce, Amazon EMR, Hive 2.3, Pig 0.17, Impala, Kafka 1.1, Spark, Sqoop, HBase, Cassandra 3.11, Oracle 12c, HDFS, Spark SQL, JSON, NoSQL, MongoDB, Unix, Apache Flume
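A minimal sketch of the S3-to-Spark-RDD flow described in this section; the bucket, path and field layout are hypothetical, and it assumes the s3a connector and AWS credentials are already configured.

```python
# Sketch of reading raw data from S3 into an RDD and applying transformations/actions.
# Bucket, path and field positions are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="s3-rdd-example")

# Read raw text files from S3 (assumes the s3a connector and credentials are configured)
lines = sc.textFile("s3a://example-bucket/landing/transactions/*.csv")

# Transformations: parse rows, drop malformed records, key by customer id
parsed = (lines.map(lambda line: line.split(","))
               .filter(lambda fields: len(fields) == 3)
               .map(lambda f: (f[0], float(f[2]))))   # (customer_id, amount)

# Action: total spend per customer, collected to the driver
totals = parsed.reduceByKey(lambda a, b: a + b).collect()
for customer_id, total in totals[:10]:
    print(customer_id, total)
```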
Confidential - Eden Prairie, MN
Sr. Java/Hadoop Developer
Responsibilities:
- Worked in a multi-cluster Hadoop ecosystem environment.
- Analyzed Object Oriented Design and presented with UML Sequence, Class Diagrams.
- Researched suitable technology for Hadoop migration considering current enterprise architecture.
- Worked on importing and exporting data into and out of HDFS and Hive using Sqoop.
- Worked on creating Hive tables and wrote Hive queries for data analysis to meet business requirements.
- Worked on extracting files from MongoDB through Sqoop, placing them in HDFS and processing them.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Planned the Cassandra cluster, including data sizing estimation, and identified hardware requirements based on the estimated data size and transaction volume.
- Developed real-time data processing applications using Scala and Python.
- Implemented Apache Spark Streaming from various streaming sources like Kafka and JMS.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Implemented application using MVC architecture integrating Hibernate and Spring frameworks.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch at the end of this section).
- Designed the user interface and implemented validations using JavaScript.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Developed components using Java multithreading concepts.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed cascading style sheets and XSLT and XML part of Order entry Module & Product Search Module and did client side validations with JavaScript.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Created ODBC connection through Sqoop between Hortonworks and SQL Server.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
- Used Impala to read, write and query the Hadoop data in HDFS or HBase or Cassandra.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Environment: Hadoop, HDFS, Hive, Sqoop, MongoDB, Cassandra, Scala, Python, Apache Spark, Kafka, Hibernate, JavaScript, JDBC, Java, XSLT, XML, Hortonworks, SQL Server
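A minimal sketch of loading data to and from Cassandra with the Spark-Cassandra Connector, as referenced in this section; the keyspace, table and column names are hypothetical, and the connector package is assumed to be on the Spark classpath.

```python
# Sketch of reading from and writing to Cassandra via the Spark-Cassandra Connector.
# Keyspace/table/column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cassandra-io")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

# Read a Cassandra table into a DataFrame
orders = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="shop", table="orders")
          .load())

# Simple aggregation, then write the result back to another Cassandra table
daily = orders.groupBy("order_date").count()
(daily.write
 .format("org.apache.spark.sql.cassandra")
 .options(keyspace="shop", table="orders_by_day")
 .mode("append")
 .save())
```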
Confidential - Wilmington, DE
Sr. Java / J2EE Developer
Responsibilities:
- Responsible for system analysis, design and development using J2EE architecture.
- Actively participated in requirements gathering, analysis, design and testing phases.
- Responsible for use case diagrams, class diagrams and sequence diagrams using Rational Rose in the Design phase.
- Implemented application using MVC architecture integrating Hibernate and Spring frameworks.
- Designed client application using Java Server Pages (JSP), Cascading Style Sheets (CSS) and XML.
- Implemented the Enterprise JavaBeans to handle various transactions.
- Worked on Linux environment and extensively configured in Linux.
- Developed web services to transfer data between client and server, and vice versa, using REST, SOAP, WSDL and UDDI.
- Built an application on the Java Financial platform, integrating technologies such as Struts and Web Flow.
- Designed the application by implementing Struts based on the MVC architecture, with simple JavaBeans as the Model, JSP UI components as the View and Action Servlets as the Controller.
- Developed MVC design pattern-based User Interface using JSP, XML, HTML and Struts.
- Developed custom validations and used the Struts Validator framework to validate user input.
- Managed version control using Subversion.
- Used Spring Security for Authentication and authorization extensively.
- Used Spring Core for dependency injection/Inversion of Control (IoC).
- Used Log4j for debugging the issues and exceptions.
- Participated in designing Web services framework in support of the product.
- Was responsible for writing complex SQL stored procedures to retrieve data from the SQL Server database.
- Involved in end-to-end development, integrating the front end and back end and debugging.
- Wrote extensive unit and integration test cases using mock objects and JUnit.
- Used XML to transfer the application data between client and server.
- Used the JDBC for data retrieval from the database for various inquiries.
- Performed application design, development, maintenance, enhancements and testing using the JUnit framework.
- Used J2EE patterns such as Controller, Singleton and Factory, along with the MVC architecture, in this application.
- Implemented Spring Framework IOC (Inversion of Control) design pattern for relationship between application components.
- Used Hibernate for mapping claim data by connecting to Oracle database.
- Designed and developed the REST based Micro services using the Spring Boot, Spring Data with JPA.
- Used Hibernate extensively to have Database access mechanism with HQL (Hibernate query language) queries.
Environment: J2EE, MVC, Spring framework, JSP, CSS, XML, JavaBeans, Linux, Web Services, SOAP, Java, Struts, HTML, SQL, JDBC, JUnit, Oracle, Hibernate
Confidential
Java developer
Responsibilities:
- As a Java Developer, worked on both the back-end and front-end development teams.
- Involved in the Software Development Life Cycle (SDLC), including Analysis, Design and Implementation.
- Developed REST web service clients to consume those web services as well as other enterprise-wide web services.
- Implemented Spring RESTful web services that produce JSON.
- Responsible for maintaining the code quality, coding and implementation standards by code reviews.
- Developed the front end of the application using HTML, CSS, JSP and JavaScript.
- Created RESTful APIs using Spring MVC.
- Used JavaScript and AJAX technologies for front end user input validations and Spring validation framework for backend validation for the User Interface.
- Used both annotation-based and XML-based configuration.
- Developed application service components and configured beans using (applicationContext.xml) Spring IOC.
- Implemented persistence mechanism using Hibernate (ORM Mapping).
- Developed the DAO layer for the application using Spring Hibernate Template support.
- Used WebLogic Workshop and the Eclipse IDE to develop the application.
- Performed the code build and deployment using Maven.
- Used the SVN version control system to maintain code versions.
- Worked on web applications using open source MVC frameworks.
- Developed Web interface using JSP, Standard Tag Libraries (JSTL), and Spring Framework.
- Responsible for use case diagrams, class diagrams and sequence diagrams using Rational Rose in the Design phase.
- Responsible to write complex SQL and HQL queries to retrieve data from the Oracle database.
- Wrote extensive unit and integration test cases using mock objects and JUnit.
- Used XML to transfer the application data between client and server.
- Used the JDBC for data retrieval from the database for various inquiries.
- Developed ANT scripts that checkout code from SVN repository, build EAR files.
- Created tables, triggers, stored procedures, SQL queries, joins, integrity constraints and views for multiple databases.
- Used XML web services over SOAP to transfer information to the supply chain and domain-expertise monitoring systems.
- Used JavaScript for client-side validations and Struts validation for server-side validations.
- Used Eclipse and the Tomcat web server for developing and deploying the applications.
- Implemented logger for debugging and testing purposes using Log4j.
Environment: JSON, HTML, CSS, JSP, JavaScript, Spring MVC, AJAX, XML, Hibernate, Eclipse, Maven, JSTL, Oracle, JUnit, JDBC, ANT, SOAP, Apache Tomcat, Log4j
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig 0.17, Hive 2.3, Sqoop 1.4, Apache Impala 3.0, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper 3.4
Hadoop Distributions: Cloudera, Hortonworks, MapR
Cloud: AWS, Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
Programming Language: Java, Scala 2.12, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Web Technologies: HTML5, CSS, JavaScript, JQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Databases: Oracle 12c/11g, SQL
Database Tools: TOAD, SQL PLUS, SQL
Operating Systems: Linux, Unix, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB
Web/Application Server: Apache Tomcat 9.0.7, JBoss, Web Logic, Web Sphere
SDLC Methodologies: Agile, Waterfall
Version Control: GIT, SVN, CVS, Maven
