Sr. Big Data/Hadoop Developer Resume
Charleston, SC
SUMMARY:
- Overall 9 years of experience in various IT technologies, including hands-on experience in Big Data and Java/J2EE technologies.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and Zookeeper.
- Rich working experience loading data into Hive tables and writing Hive queries using joins, ORDER BY, GROUP BY, etc., on data imported from RDBMS sources via Sqoop.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience on Hadoop distributions like Cloudera, MapR and Hortonworks.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra and MongoDB.
- Strong grasp of Apache Spark concepts with Scala, writing transformations in Scala for live streaming data; performed clickstream analysis using Spark with Scala, gathering data from Kafka and Flume.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Parquet and Avro.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa according to client's requirement.
- Extensive Experience on importing and exporting data using stream processing platforms like Flume and Kafka.
- Wrote Scala code for data analytics in Spark using transformations such as map, reduceByKey and groupByKey to analyze real-time streaming data (a sketch of this aggregation pattern follows this summary).
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
- Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
- Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Experience in using various IDEs (Eclipse, IntelliJ) and repositories (SVN, Git).
- Experience using build tools such as Ant and Maven.
- Strong knowledge of Spark for handling large data processing in streaming process along with Scala.
- Experience in designing components using UML Use Case, Class, Sequence, Deployment and Component diagrams for the requirements.
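Sample sketch of the Spark pair-RDD aggregation pattern mentioned above (see the reduceByKey/groupByKey bullet). The code uses the Spark Java API; the click-count use case, class name and HDFS paths are illustrative assumptions, and the batch form is shown for brevity rather than the streaming job itself.

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.SparkSession;
    import scala.Tuple2;

    public class ClickCountsByUser {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("ClickCountsByUser")
                    .getOrCreate();

            // Each input line is assumed to be "userId,pageUrl,timestamp".
            JavaRDD<String> lines = spark.read()
                    .textFile("hdfs:///data/clickstream/raw")   // hypothetical path
                    .javaRDD();

            // reduceByKey combines values on the map side, avoiding the full shuffle
            // that groupByKey would need for a simple count.
            JavaPairRDD<String, Long> clicksPerUser = lines
                    .mapToPair(line -> new Tuple2<>(line.split(",")[0], 1L))
                    .reduceByKey(Long::sum);

            clicksPerUser.saveAsTextFile("hdfs:///data/clickstream/clicks_per_user");
            spark.stop();
        }
    }

reduceByKey is generally preferred over groupByKey for simple aggregations because partial sums are computed before the shuffle, which keeps network traffic down.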
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and Zookeeper.
Languages: C, Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, Java Script, JSP, Servlets, EJB, JSF, JQuery
Frameworks: MVC, Struts, Spring, Hibernate
NoSQL Databases: HBase, Cassandra, MongoDB
Operating Systems: HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL.
Web/Application servers: Apache Tomcat, WebLogic, JBoss.
Databases: Oracle, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Version control: SVN, Confidential, Git
Web Services: REST, SOAP
PROFESSIONAL EXPERIENCE:
Confidential - Charleston, SC
Sr. Big Data/Hadoop Developer
Responsibilities:
- Worked as a Sr. Big Data/Hadoop Developer with Hadoop ecosystem components such as HBase, Sqoop, Zookeeper, Oozie, Hive and Pig on the Cloudera Hadoop distribution.
- Followed the Agile development methodology and was an active member in scrum meetings.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
- Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts.
- Implemented solutions on various Azure services such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
- Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
- Managed and supported enterprise Data Warehouse operations and advanced predictive big data application development using Cloudera and Hortonworks HDP.
- Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
- Utilized Apache Spark with Python to develop and execute Big Data Analytics and Machine learning applications, executed machine learning use cases under Spark ML and MLlib.
- Installed Hadoop, MapReduce and HDFS on Azure and developed multiple MapReduce jobs in Pig and Hive for data cleansing and pre-processing.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Improved the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed a Spark job in Java which indexes data into Elastic Search from external Hive tables which are in HDFS.
- Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, and loaded final data into HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used Spark Streaming to receive real-time data from Kafka and persisted the streams to HDFS and to NoSQL databases such as HBase and Cassandra using Scala (see the ingestion sketch after this project).
- Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
- Performed transformations such as event joins, filtering of bot traffic and some pre-aggregations using Pig.
- Explored MLlib algorithms in Spark to understand the possible machine learning functionality that could be used for our use case.
- Used Windows Azure SQL Reporting Services to create reports with tables, charts and maps.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Developed Java code that creates mappings in Elasticsearch before data is indexed into it.
- Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
- Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Azure, Hadoop 3.0, Sqoop 1.4.6, Pig 0.17, Hive 2.3, MapReduce, Spark 2.2.1, shell scripts, SQL, Hortonworks, Python, MLlib, HDFS, YARN, Java, Kafka 1.0, Cassandra 3.11, Oozie, Agile
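A sketch of the Kafka-to-HDFS ingestion described in this project (see the Spark Streaming bullet). It uses Spark Structured Streaming, which is available in the Spark 2.2.x listed above, rather than the DStream API; the broker address, topic name and HDFS paths are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("KafkaToHdfs")
                    .getOrCreate();

            // Read the raw event stream from Kafka (requires the spark-sql-kafka connector).
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092")   // hypothetical broker
                    .option("subscribe", "clickstream-events")           // hypothetical topic
                    .load()
                    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

            // Persist micro-batches to HDFS as Parquet, with a checkpoint for recovery.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "hdfs:///data/streaming/clickstream")
                    .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
                    .start();

            query.awaitTermination();
        }
    }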
Confidential - Deerfield, IL
Sr. Big Data/Hadoop Developer
Responsibilities:
- Worked as a Big Data/Hadoop Developer providing solutions for big data problems.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
- Designed, architected and helped maintain scalable solutions on the big data analytics platform for the enterprise module.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Created real time data ingestion of structured and unstructured data using Kafka and Spark streaming to Hadoop and MemSQL.
- Populated data into dimension and fact tables and was actively involved in creating Talend mappings.
- Started using Apache NiFi to copy data from the local file system to HDP.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Migrated the physical data center environment to AWS; also designed, built and deployed a multitude of applications utilizing much of the AWS stack (EC2, S3, RDS).
- Implemented solutions for ingesting data from various sources and processing it utilizing big data technologies.
- Used input and output data as delimited files in HDFS with Talend Big Data Studio and various Hadoop components.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the UDF sketch after this project).
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Created tables in the RDBMS, inserted data and then loaded the same tables into HDFS and Hive using Sqoop.
- Worked with business stakeholders and translated business objectives and requirements into technical requirements and design.
- Defined the application architecture and design for the Big Data Hadoop initiative to maintain structured and unstructured data; created the reference architecture for the enterprise.
- Identified data sources, created source-to-target mappings and storage estimates, and provided support for Hadoop cluster setup and data partitioning.
- Developed scripts for data ingestion using Sqoop and Flume, wrote Spark SQL and Hive queries for analyzing the data, and performed optimization.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in Amazon EMR.
- Wrote DDL and DML files to create and manipulate tables in the database
- Developed the Unix shell/Python scripts for creating the reports from Hive data.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Analyzed data using the Hadoop components Hive and Pig and created Hive tables for end users.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Environment: Agile, Hive 2.3, Pig 0.17, Kafka, Spark, Apache NiFi, AWS, HDFS, Scala, Zookeeper, Sqoop, HBase, Spark SQL, Amazon EMR, Apache Flume
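A sketch of the DataFrame/Spark SQL UDF aggregation pattern referenced in this project. The actual work was done in Scala; the example is kept in Java for consistency with the other sketches, and the column names, S3 bucket and output path are illustrative assumptions.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.types.DataTypes;

    public class OrderAggregation {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("OrderAggregation")
                    .getOrCreate();

            // Register a simple UDF that normalizes free-text status codes.
            spark.udf().register("normalize_status",
                    (UDF1<String, String>) s -> s == null ? "UNKNOWN" : s.trim().toUpperCase(),
                    DataTypes.StringType);

            // Input is assumed to be delimited order data landed on S3.
            Dataset<Row> orders = spark.read()
                    .option("header", "true")
                    .csv("s3a://example-bucket/orders/");   // hypothetical bucket

            orders.createOrReplaceTempView("orders");

            // Aggregate with the UDF applied, then write the results back to HDFS.
            Dataset<Row> byStatus = spark.sql(
                    "SELECT normalize_status(status) AS status, COUNT(*) AS order_count "
                  + "FROM orders GROUP BY normalize_status(status)");

            byStatus.write().mode("overwrite").parquet("hdfs:///warehouse/orders_by_status");
            spark.stop();
        }
    }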
Confidential - Rocky Hill, CT
Sr. Java/Hadoop Developer
Responsibilities:
- Worked as a Java/Hadoop Developer responsible for all development and support tasks related to the clusters.
- Developed Spark scripts using Java and Python shell commands as per the requirements.
- Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked on Spark SQL and Data frames for faster execution of Hive queries using Spark SQL Context.
- Performed analysis on implementing Spark using Scala.
- Used DataFrames/Datasets to write SQL-style queries using Spark SQL against datasets residing on HDFS.
- Extracted data from MongoDB through Sqoop, placed it in HDFS and processed it.
- Created and imported various collections, documents into MongoDB and performed various actions like query, project, aggregation, sort and limit.
- Extensively experienced in deploying, managing and developing MongoDB clusters.
- Created Hive tables to import large data sets from various relational databases using Sqoop and export the analyzed data back for visualization and report generation by the BI team.
- Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
- Implemented some of the big data operations on AWS cloud.
- Used Hibernate reverse engineering tools to generate domain model classes, perform association mapping and inheritance mapping using annotations and XML.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Maintained the cluster securely using Kerberos and kept the cluster up and running at all times.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data, and in moving data between the Hadoop Distributed File System and relational database systems using Sqoop.
- Created Hive tables to store the processed results in a tabular format.
- Used Hive QL to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Performed data transformations by writing MapReduce jobs as per business requirements (see the MapReduce sketch after this project).
- Implemented schema extraction for Parquet and Avro file Formats in Hive.
- Involved in implementing and integrating NoSQL databases such as HBase and Cassandra.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store it in HDFS.
Environment: Java, Spark, Python, HDFS, YARN, Hive, Scala, SQL, MongoDB, Sqoop, AWS, Pig, MapReduce, Cassandra, NoSQL
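A sketch of the kind of MapReduce transformation mentioned in this project (see the MapReduce bullet): a mapper that drops malformed records and a reducer that sums a numeric field per key. The record layout, class name and sales-by-region use case are illustrative assumptions.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SalesByRegion {

        public static class SalesMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Expected record layout: region,productId,amount
                String[] fields = value.toString().split(",");
                if (fields.length == 3) {   // drop malformed rows
                    context.write(new Text(fields[0]), new LongWritable(Long.parseLong(fields[2])));
                }
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable v : values) {
                    total += v.get();
                }
                context.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "sales-by-region");
            job.setJarByClass(SalesByRegion.class);
            job.setMapperClass(SalesMapper.class);
            job.setCombinerClass(SumReducer.class);   // combiner cuts shuffle volume
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }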
Confidential - Philadelphia, PA
Sr. Java/J2EE Developer
Responsibilities:
- Worked on developing the application using Spring MVC and RESTful web services (see the controller sketch after this project).
- Responsible for designing Rich user Interface Applications using JavaScript, CSS, HTML, XHTML and AJAX.
- Developed Spring AOP code to configure logging for the application.
- Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
- Developed code using Core Java to implement technical enhancement following Java Standards.
- Worked with Swing and RCP using Oracle ADF to develop a search application which is a migration project.
- Implemented Hibernate utility classes, session factory methods and different annotations to work with back-end database tables.
- Implemented Ajax calls using JSF-Ajax integration and implemented cross-domain calls using jQuery Ajax methods.
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
- Used JPA (Java Persistence API) with Hibernate as Persistence provider for Object Relational mapping.
- Used JDBC and Hibernate for persisting data to different relational databases.
- Developed and implemented a Swing, Spring and J2EE based MVC (Model-View-Controller) framework for the application.
- Implemented application-level persistence using Hibernate and Spring.
- Integrated Data Warehouse (DW) data from different sources in different formats (PDF, TIFF, JPEG, web crawls, and RDBMS data from MySQL, Oracle, SQL Server, etc.).
- Used XML and JSON for transferring/retrieving data between different Applications.
- Wrote complex PL/SQL queries using joins, stored procedures, functions, triggers, cursors and indexes in the data access layer.
- Implemented a RESTful web services architecture for client-server interaction and implemented the respective POJOs.
- Designed and developed SOAP Web Services using CXF framework for communicating application services with different application and developed web services interceptors.
- Implemented the project using JAX-WS based Web Services using WSDL, UDDI, and SOAP to communicate with other systems.
- Involved in writing application level code to interact with APIs, Web Services using AJAX, JSON and XML.
- Wrote JUnit test cases for all the classes. Worked with Quality Assurance team in tracking and fixing bugs.
- Developed back-end interfaces using embedded SQL, PL/SQL packages, stored procedures, functions, exception handling in PL/SQL programs, and triggers.
- Used Log4j to capture logs, including runtime exceptions, and for informational logging.
- Used ANT as the build tool and developed build files for compiling the code and creating WAR files.
- Used Tortoise SVN for Source Control and Version Management.
- Responsibilities include design for future user requirements by interacting with users, as well as new development and maintenance of the existing source code.
Environment: JDK 1.5, Servlets, JSP, XML, JSF, Web Services (JAX-WS: WSDL, SOAP), Spring MVC, JNDI, Hibernate 3.6, JDBC, SQL, PL/SQL, HTML, DHTML, JavaScript, Ajax, Oracle 10g, SOAP, SVN, SQL, Log4j, ANT.
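A sketch of the Spring MVC REST endpoint pattern referenced in this project (see the first bullet). The Account DTO and the in-memory lookup are hypothetical stand-ins for the real Hibernate-backed service layer.

    import java.util.HashMap;
    import java.util.Map;

    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseBody;

    @Controller
    @RequestMapping("/accounts")
    public class AccountController {

        // Simple DTO; Spring's message converters serialize it to JSON.
        public static class Account {
            private final long id;
            private final String owner;

            public Account(long id, String owner) { this.id = id; this.owner = owner; }
            public long getId() { return id; }
            public String getOwner() { return owner; }
        }

        // Hypothetical in-memory store standing in for the Hibernate-backed DAO.
        private final Map<Long, Account> accounts = new HashMap<Long, Account>();

        public AccountController() {
            accounts.put(1L, new Account(1L, "sample-owner"));
        }

        // GET /accounts/{id} returns the account as JSON, or 404 if it is not found.
        @RequestMapping(value = "/{id}", method = RequestMethod.GET)
        @ResponseBody
        public ResponseEntity<Account> getAccount(@PathVariable("id") long id) {
            Account account = accounts.get(id);
            return account != null
                    ? new ResponseEntity<Account>(account, HttpStatus.OK)
                    : new ResponseEntity<Account>(HttpStatus.NOT_FOUND);
        }
    }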
Confidential
Java Developer
Responsibilities:
- Involved in various Software Development Life Cycle (SDLC) phases of the project which was modeled using Rational Unified Process (RUP)
- Prepared high level technical documents by analyzing the user requirements and implementing the use cases.
- Implemented the DAO pattern for database connectivity and Hibernate for object persistence.
- Used Maven for build and Jenkins as the continuous integration tool for the application development
- Used WebLogic application server for deploying in dev environments and used Apache Tomcat in local environment.
- Responsible for the design and development of data loader and data exporter with file feed interface.
- Troubleshot and debugged applications and provided fixes in a timely manner.
- Involved in SDLC stages of the application, including requirements analysis, design, implementation and testing.
- Extensively used the Spring MVC framework for controlling the application.
- Extensively used Spring RESTful web services for designing the endpoints.
- Developed web applications using Spring Core, Spring MVC, Apache Tomcat, JSTL and Spring tag libraries.
- Developed the web interface using HTML, CSS, JavaScript, JQuery, AngularJS, and Bootstrap
- Used Ant to build and package the application.
- Used XML for data loading and reading from different sources.
- Enhanced and modified the presentation layer and GUI framework written in JSP, implemented client-side validations using JavaScript, and designed enhanced wireframe screens.
- Deployed the Application on Tomcat server.
- Used Eclipse as IDE to write the code and debug application using separate log files.
- Wrote unit and system test cases for modified processes and supported continuous integration with the help of the QC and Configuration teams in a timely manner.
- Involved in a test-driven development model using JUnit.
- Developed JMS senders and receivers for loose coupling between modules and implemented asynchronous request processing using Message-Driven Beans (see the JMS sketch after this project).
- Developed XML configuration files, properties files used in Spring framework for validating Form inputs on server side.
- Involved in deployment of application on WebLogic Application Server in Development & QA environment.
- Used Log4j for External Configuration Files and debugging.
- Used Git version control to track and maintain the different versions of the project.
Environment: Hibernate, Maven, Jenkins, Apache Tomcat, MVC, HTML, CSS, JavaScript, JQuery, AngularJS, Bootstrap, Ant, XML, Eclipse, JUnit
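A sketch of the JMS sender / asynchronous receiver pattern referenced in this project (see the JMS bullet), using the plain javax.jms API; the connection-factory and queue JNDI names are illustrative assumptions about the server configuration, and the standalone listener plays the role a Message-Driven Bean plays inside the container.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class OrderMessaging {

        // Sender: publishes a text message to the order queue.
        public static void send(String payload) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // hypothetical JNDI name
            Queue queue = (Queue) ctx.lookup("jms/OrderQueue");                             // hypothetical JNDI name

            Connection connection = cf.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage(payload));
            } finally {
                connection.close();
            }
        }

        // Receiver: registers an asynchronous listener for incoming messages.
        public static void listen() throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

            Connection connection = cf.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(queue);
            consumer.setMessageListener(new MessageListener() {
                public void onMessage(Message message) {
                    try {
                        System.out.println("Received: " + ((TextMessage) message).getText());
                    } catch (JMSException e) {
                        e.printStackTrace();
                    }
                }
            });
            connection.start();   // deliveries begin only after start()
        }
    }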