Hadoop Developer Resume
Tampa, FL
PROFESSIONAL SUMMARY:
- 5+ years of professional experience in IT, including 3 years of experience in Hadoop development and Big Data.
- Strong experience with the Hadoop Distributed File System and its ecosystem: Hive, Pig, HBase, ZooKeeper, Sqoop, Impala, and Flume.
- Worked with the Big Data distributions Cloudera CDH5, CDH4, CDH3 and Hortonworks 2.5.
- Used Ambari to configure the initial development environment on the Hortonworks standalone sandbox and to monitor the Hadoop ecosystem.
- Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop MapReduce (MR1), YARN (MR2), Hive, Pig, Sqoop, Spark, Flume, and Oozie.
- Good knowledge of setting up, maintaining, and configuring Hadoop clusters.
- Clear understanding of Hadoop architecture and its components, such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode, and MapReduce programming.
- Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Apache, Cloudera, and Hortonworks distributions and AWS.
- Hands-on experience with Sqoop for importing and exporting data between HDFS and relational databases.
- Imported different kinds of data, such as JSON and log data, into Pig using the available loaders.
- Experience in ingesting streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Experienced in writing complex SQL queries using inner, outer, and cross joins.
- Worked with all the major SQL features, including aggregates, views, database objects, and stored procedures.
- Skilled in the extraction, manipulation, analysis, and validation of large volumes of data.
- Good experience with all major flavors of Hadoop (Cloudera, Hortonworks, MapR).
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Developed custom UDFs for Pig and Hive in Java to process and analyze data (a brief sketch follows this summary).
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Expertise in writing SQL queries in Teradata.
- Worked on various databases such as MySQL, Oracle, and MS SQL Server.
- Worked on NoSQL databases including HBase, Cassandra.
- Developed various cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro, and Parquet.
- Performed transformations and actions using PySpark to join, aggregate, and calculate statistics over the data.
- Extensively used Scala and Spark to improve the performance and optimization of existing algorithms and queries in Hadoop and Hive, using SparkContext, Spark SQL (DataFrames and Datasets), and pair RDDs.
- Familiar with Docker.
- Experience with cloud technologies such as Amazon Web Services (AWS).
- Good knowledge of Apache Kafka, including configuring its producers and consumers.
- Experience in converting Hive/SQL queries into Spark transformations using Java.
- Good knowledge of creating workflows to execute Pig and Hive jobs using the Oozie workflow engine.
- Excellent Java development skills using J2EE, Servlets, JUnit, JSP, and JDBC.
- Expertise in Java multithreading, exception handling, JSP, Servlets, JavaScript, jQuery, AJAX, CSS, HTML, Spring, Hibernate, Enterprise Java Beans, JDBC, RMI, and XML-related technologies.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Excellent communication skills; an analytical, research-minded, technically competent team player who enjoys facing challenges and takes a solution-oriented, positive-minded approach.
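For illustration, below is a minimal sketch of the kind of custom Hive UDF referenced above. It uses the simple (non-generic) UDF API, and the package, class name, and masking logic are hypothetical placeholders rather than code from an actual engagement.

```java
package com.example.hive.udf; // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that masks all but the last four characters of a string
// (e.g. an account number) before it is exposed to ad-hoc HiveQL queries.
public class MaskValue extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                    // pass NULLs through unchanged
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);         // too short to mask meaningfully
        }
        return new Text("****" + value.substring(value.length() - 4));
    }
}
```

Packaged into a JAR, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION mask_value AS 'com.example.hive.udf.MaskValue'; and then called like any built-in function.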
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, ZooKeeper, MapR, Amazon Web Services, EMR, MRUnit, Spark, Storm, RStudio
Java & J2EE Technologies: Core Java, JDBC, Servlets, JSP, JNDI, Struts, Spring, Hibernate and Web Services (SOAP and RESTful)
IDEs: Eclipse, MyEclipse, IntelliJ
Frameworks: MVC, Struts, Hibernate, Spring
Programming languages: C, C++, Java, Python, Linux shell scripts, R
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, MongoDB, Graph DB
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, RESTful Web Services
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica, QlikView and Cognos
PROFESSIONAL EXPERIENCE:
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analysis algorithms.
- Developed MapReduce programs to join data from different data sources, using optimized bucketed joins or map joins depending on the requirement.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Implemented custom Kafka partitioners to send data to different categorized topics (see the sketch at the end of this list).
- Implemented Kafka spouts for streaming data and different bolts to consume it.
- Created Hive tables and partitions and implemented incremental imports to perform ad-hoc queries on structured data.
- Implemented a Storm topology with stream groupings to perform real-time analytical operations.
- Created Hive generic UDFs to process business logic in HiveQL.
- Optimized Hive queries and improved performance by configuring Hive query parameters.
- Extensively used Pig for data cleansing.
- Created risk segments using clustering algorithms to protect reinsurance assets, using Python and scikit-learn.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Created several new analytical approaches from scratch across data mining, instance modeling, and predictive analytics using SQL, Python, pandas, NumPy, SciPy, R, Apache Spark, HDFS, S3, and scikit-learn.
- Used Sqoop to load data from Teradata into HDFS on a regular basis.
- Developed and carried forward a coherent research strategy in predictive modeling, machine learning, big data analysis, and environmental information systems with Python, SAS, and R.
- Involved in the design of database objects, dynamic SQL queries, composite data types, and global temporary tables.
- Responsible for running Hadoop Streaming jobs to process terabytes of XML data.
- Developed Oozie workflows for orchestrating and scheduling the ETL process.
- Involved in implementation of Avro, ORC, and Parquet data formats for Apache Hive computations to handle the custom business requirements.
- Wrote Unix shell scripts in combination with Talend data maps to process source files and load them into the staging database.
- Involved in creation of virtual machines and infrastructure in the Azure Cloud environment.
- Implemented Kafka consumers and producers by extending the Kafka high-level API in Java and ingested data into HDFS or HBase depending on the context.
- Involved in developing Azure Web role and Worker roles.
- Implemented Spark jobs using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Worked on creating workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed SQL scripts using Spark for handling different data sets and verified their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Python.
- Implemented test cases and tested MapReduce programs using MRUnit and other mocking frameworks.
- Developed Spark scripts using Python shell commands as per the requirements.
- Used Hadoop benchmarks for monitoring and testing the Hadoop cluster.
- Implemented machine learning techniques in Spark using Spark MLlib.
- Involved in cluster maintenance, including adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
- Retrieved transaction data from an RDBMS into HDFS, computed the total transacted amount per user using MapReduce, and saved the output to a Hive table.
- Implemented Maven build scripts for Maven projects and integrated them with Jenkins.
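As referenced above, a minimal sketch of a custom Kafka partitioner, assuming the org.apache.kafka.clients.producer API; the "PRIORITY" key prefix and topic layout are hypothetical.

```java
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Routes records whose key starts with "PRIORITY" to partition 0 and spreads
// all other records across the remaining partitions by key hash.
public class CategoryPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) {
        // no custom configuration needed for this sketch
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String k = (key == null) ? "" : key.toString();
        if (numPartitions <= 1 || k.startsWith("PRIORITY")) {
            return 0;                                  // dedicated partition
        }
        // keep the hash non-negative and reserve partition 0 for priority data
        return 1 + Math.abs(k.hashCode() % (numPartitions - 1));
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}
```

A partitioner like this is wired into the producer through the partitioner.class property.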
Environment: Hadoop, MapReduce, Hive, Pig, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell scripting, Oozie, Python, Scala, Cassandra, Git, XML, Java, Maven, Eclipse, Oracle.
Confidential, King of Prussia, PA
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Responsible for migrating the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
- Designed and implemented real-time applications using Apache Storm, Storm Trident, Kafka, the Apache Ignite in-memory grid, and Accumulo.
- Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports; the heuristics were used to improve campaign targeting and efficiency.
- Experienced with batch processing of data sources using Apache Spark, Elastic Search.
- Streamed data into Apache Ignite by setting up caches for efficient data analysis.
- Created Hive external tables, loaded data into them, and queried the data using HiveQL.
- Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR).
- Installed and maintained the Hadoop/Spark cluster from scratch in a plain Linux environment and defined code outputs as PMML.
- Implemented data loading using Spark, Storm, Kafka, and Elasticsearch.
- Integrated Cassandra with Elasticsearch and Hadoop.
- Stored data in AWS S3 in the same way as HDFS and ran EMR jobs on data stored in S3.
- Extensive experience with Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java, and Python scripts to transform raw data from several data sources into baseline data.
- Hands-on expertise in running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
- Implemented Spark batch jobs on AWS instances using Amazon Simple Storage Service (Amazon S3).
- Performed performance tuning for Spark Streaming, e.g., setting the right batch interval, the correct level of parallelism, and the appropriate serialization, along with memory tuning.
- Created HBase tables to store variable data formats of input data coming from different portfolios.
- Involved in adding huge volumes of data in rows and columns to store data in HBase.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Developed Spark code using Scala and Spark SQL for batch processing of data.
- Integrated Cassandra with Talend and automated jobs.
- Involved in the requirements and design phases to implement a streaming Lambda architecture for real-time streaming using Spark and Kafka.
- Designed and developed T-SQL procedures, views, functions, and triggers to fetch data from different tables using joins.
- Designed and developed database operations in PostgreSQL.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Hands-on experience working with databases such as HBase (NoSQL) and PostgreSQL.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket (see the sketch at the end of this list).
- Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.
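A minimal Java sketch of the DataFrame-style S3 processing referenced above; the bucket paths and column names are placeholders, and the SparkSession API shown assumes Spark 2.x rather than the 1.5.2 release mentioned earlier.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

// Reads semi-structured JSON events from S3, aggregates them per user,
// and writes the result back to S3 as Parquet.
public class TransactionSummaryJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("transaction-summary")
                .getOrCreate();

        // hypothetical input and output locations
        Dataset<Row> events = spark.read().json("s3a://example-bucket/raw/transactions/");

        Dataset<Row> totals = events
                .filter(col("amount").isNotNull())
                .groupBy(col("userId"))
                .agg(sum(col("amount")).alias("totalAmount"));

        totals.write().mode("overwrite").parquet("s3a://example-bucket/curated/transaction-totals/");

        spark.stop();
    }
}
```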
Environment: Hadoop, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Pig, Sqoop, Oozie, Java, SQL, Python, Scala, Impala, AWS, Shell scripting, Talend
Confidential
Java Developer
Responsibilities:
- Extensively used CVS for version control management.
- Extensive use of Maven to build and deploy the application into the dev and QA environments, and worked with the front-end developers on displaying the data.
- Extensive use of Collections Framework features such as Map and List to retrieve data from web services, manipulate the data to incorporate business logic, and save it to the Oracle database.
- Used multithreading to process tables simultaneously, as and when a specific user's data was completed in one table.
- Used the Struts Tiles framework for layout management.
- Used TortoiseSVN and Git for repository management.
- Consumed web services using Java to retrieve the required information and populate it in the database.
- Set up Jenkins server and build jobs to provide continuous automated builds.
- Worked on the Asset Management module to develop services using RESTful web services (see the sketch at the end of this list).
- Used SoapUI to verify the WSDL endpoint URL.
- Extensive use of core Java features such as multithreading, caching, and messaging, along with AngularJS, React JS, DHTML, and Hibernate, to develop middleware for the application.
- Wrote a PL/SQL stored procedure using TOAD to archive data daily for a monthly report and scheduled the job using DBMS Scheduler.
- Externalized business logic from the code by storing dynamic rules, maintained by the business through a UI, in the database and applying these rules to data in the code.
- Implemented routing logic and screen-to-screen navigation, as well as client-side login functionality, in AngularJS.
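A minimal sketch of the kind of RESTful service built for the Asset Management module, using standard JAX-RS annotations; the resource path, DTO, and lookup logic are hypothetical placeholders.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Exposes a read-only endpoint for looking up an asset by its identifier.
@Path("/assets")
public class AssetResource {

    @GET
    @Path("/{assetId}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getAsset(@PathParam("assetId") String assetId) {
        // In the real module this would delegate to a service/DAO layer;
        // a placeholder lookup keeps the sketch self-contained.
        Asset asset = new Asset(assetId, "Placeholder asset");
        return Response.ok(asset).build();
    }

    // Simple DTO serialized to JSON by the JAX-RS provider (e.g. Jackson).
    public static class Asset {
        public String id;
        public String name;

        public Asset(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }
}
```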
Environment: WebLogic 9.2, Spring 3.0, HTML, AngularJS, Hibernate, Spring MVC, DHTML, Core Java, Struts, JDBC, Jenkins, Maven, Servlets, RESTful Web Services, React JS, MySQL.
Confidential
Java developer
Responsibilities:
- Involved in the design, development, testing, and implementation of process systems, working on iterative life-cycle business requirements and creating detailed design documents.
- Developed JAX-RS RESTful web services that consume and produce both XML and JSON content using Jersey to retrieve specific details for Case Management System products.
- Configured the Java Persistence API (JPA) to interact with the Oracle 11g database, with Hibernate as the provider, and created POJO classes as JPA entities.
- Converted XML into Java objects using the JAXB API (see the sketch at the end of this list).
- Involved in development of the application using Spring Web MVC and other components of the Spring Framework.
- Used Hibernate as an Object-Relational Mapping (ORM) tool to store persistent data and communicate with the database.
- Used UNIX shell scripts to deploy the application on an Amazon web server.
- Developed the user interface using HTML, CSS, JavaScript, Bootstrap, AJAX, and JSON.
- Used jQuery to perform the AJAX calls and to load the surveys.
- Extensively used Alpaca forms for various form fields to fetch the inputs from the user/customer.
- Wrote Embedded JS (EJS) templates to combine data with a template and produce HTML.
- Applied AngularJS to define routes to the REST services and render the EJS templates.
- Responsible for developing new REST APIs utilizing JAX-RS on WebSphere.
- Utilized the WebLogic application server to build and deploy the enterprise application.
- Utilized Alpaca forms to create interactive HTML5 forms with jQuery.
- Used a GitHub repository for code check-in, check-out, and merging, as well as issue tracking and wikis.
- Used Maven to build and deploy the application.
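A minimal sketch of converting XML into Java objects with the JAXB API, as referenced above; the customer element and its fields are hypothetical.

```java
import java.io.StringReader;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbExample {

    // JAXB-annotated POJO bound to a <customer> XML element.
    @XmlRootElement(name = "customer")
    public static class Customer {
        public String id;
        public String name;
    }

    public static void main(String[] args) throws JAXBException {
        String xml = "<customer><id>42</id><name>Jane Doe</name></customer>";

        JAXBContext context = JAXBContext.newInstance(Customer.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        Customer customer = (Customer) unmarshaller.unmarshal(new StringReader(xml));

        System.out.println(customer.id + " -> " + customer.name);
    }
}
```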
Environment: Java, J2EE, JSP, Struts, Hibernate, AngularJS, JUnit, MVC, Eclipse, AJAX, Apache Tomcat, Log4j, SVN, MySQL, HTML, CSS, JavaScript.