Hadoop Developer Resume
Tampa, FL
SUMMARY:
- 5+ years of professional experience in IT, which includes three years of experience in Hadoop development and Big Data.
- Strong experience in Hadoop Distributed File System and Ecosystem: Hive, Pig, HBase, Zookeeper, Sqoop, Impala and Flume.
- Worked with Big Data distributions Cloudera CDH5, CDH4, CDH3 and Hortonworks 2.5.
- Used Ambari to configure the initial development environment on the Hortonworks standalone sandbox and to monitor the Hadoop ecosystem.
- Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce (MR1), YARN (MR2), Hive, Pig, Sqoop, Spark, Flume and Oozie.
- Good knowledge of setting up, maintaining and configuring Hadoop clusters.
- Clear understanding of Hadoop architecture and its components such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode and MapReduce programming.
- Experience in installing, configuring, supporting and monitoring Hadoop clusters using Apache, Cloudera and Hortonworks distributions and AWS.
- Hands-on experience using Sqoop to import data from relational databases into HDFS and to export it back.
- Imported different kinds of data, such as JSON and log data, into Pig using the available loaders.
- Experience in ingesting streaming data into Hadoop using Spark, Storm Framework and Scala.
- Skilled in extracting, manipulating, analyzing and validating large volumes of data.
- Good experience with Hadoop distributions including Cloudera, Hortonworks and MapR.
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Developed custom UDFs for Pig and Hive in Java to process and analyze data.
- Experience in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
- Worked on various databases such as MySQL, Oracle and MS SQL Server.
- Worked on NoSQL databases including HBase, Cassandra.
- Developed cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
- Performed different transformations and actions using PySpark to join, aggregate and calculate different statistical values of the data.
- Familiar with Docker.
- Experience with cloud technologies such as Amazon Web Services (AWS).
- Good knowledge of Apache Kafka, including configuring producers and consumers.
- Experience in converting Hive/SQL queries into Spark transformations using Java.
- Good knowledge of creating workflows that run Pig and Hive jobs using the Oozie workflow engine.
- Excellent Java development skills using J2EE, Servlets, JUnit, JSP and JDBC.
- Expertise in Java multithreading, exception handling, JSP, Servlets, JavaScript, jQuery, AJAX, CSS, HTML, Spring, Hibernate, Enterprise JavaBeans, JDBC, RMI and XML-related technologies.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Excellent communication skills; a team player who is analytical, research-minded, technically competent, approach-oriented and positive, and who enjoys facing challenges.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services, EMR, MRUnit, Spark, Storm, R, RStudio
Java & J2EE Technologies: Core Java, JDBC, Servlets, JSP, JNDI, Struts, Spring, Hibernate and Web Services (SOAP and RESTful)
IDEs: Eclipse, MyEclipse, IntelliJ
Frameworks: MVC, Struts, Hibernate, Spring
Programming languages: C, C++, Java, Python, Linux shell scripts, R
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, MongoDB, Graph DB
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, RESTful WS
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica, QlikView and Cognos
PROFESSIONAL EXPERIENCE:
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analytics algorithms.
- Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Implemented custom Kafka partitioners to send data to different categorized topics.
- Implemented Kafka spouts for streaming data and different bolts to consume the data.
- Created Hive tables and partitions and implemented incremental imports to support ad hoc queries on structured data.
- Implemented Storm topologies with stream groupings to perform real-time analytical operations.
- Created Hive generic UDFs to process business logic in HiveQL.
- Optimized Hive queries and improved performance by configuring Hive query parameters.
- Extensively used Pig for data cleansing.
- Identified risk segments using clustering algorithms in Python and scikit-learn to protect assets in reinsurance.
- Created several new analytical approaches from scratch across data mining, instance modeling and predictive analytics using SQL, Python, Pandas, NumPy, SciPy, R, Apache Spark, HDFS, S3 and scikit-learn.
- Developed and carried forward a coherent research strategy in predictive modeling, machine learning, big data analysis and environmental information systems with Python, SAS, and R programming.
- Responsible for running Hadoop streaming jobs to process terabytes of XML Data.
- Developed Oozie workflows for orchestrating and scheduling the ETL process.
- Involved in implementation of Avro, ORC, and Parquet data formats for Apache Hive computations to handle the custom business requirements.
- Wrote Unix shell scripts in combination with Talend data maps to process the source files and load them into the staging database.
- Involved in creation of virtual machines and infrastructure in the Azure Cloud environment.
- Implemented Kafka consumers and producers using the Kafka high-level API in Java, ingesting data into HDFS or HBase depending on the context.
- Involved in developing Azure web roles and worker roles.
- Created the workflow to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Developed Spark SQL scripts to handle different data sets and compared their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Python.
- Implemented test cases and tested MapReduce programs using MRUnit and other mocking frameworks.
- Developed Spark scripts using Python shell commands as required.
- Used Hadoop benchmarks to monitor and test the Hadoop cluster.
- Implemented machine learning techniques in Spark using Spark MLlib.
- Involved in cluster maintenance which includes adding, removing cluster nodes, cluster monitoring and troubleshooting, reviewing and managing data backups and Hadoop log files.
- Retrieved transaction data from an RDBMS into HDFS, computed the total transacted amount per user using MapReduce and saved the output in a Hive table.
- Implemented Maven build scripts for Maven projects and integrated them with Jenkins.
Environment: Hadoop, MapReduce, Hive, Pig, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Python, Scala, Cassandra, Git, XML, Java, Maven, Eclipse, Oracle.
Confidential, New Jersey
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Migrated Map Reduce jobs to Spark Jobs to achieve better performance.
- Migrated the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
- Designed and implemented real-time applications using Apache Storm, Storm Trident, Kafka, the Apache Ignite in-memory data grid and Accumulo.
- Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Streamed data into Apache Ignite by setting up caches for efficient data analysis.
- Created Hive external tables, loaded data into them and queried the data using HQL.
- Good knowledge in cloud integration with Amazon Elastic Map Reduce (EMR).
- Installed and maintained a Hadoop-Spark cluster from scratch in a plain Linux environment and defined the code outputs as PMML.
- Implemented data loading using Spark, Storm, Kafka and Elasticsearch.
- Experience in integrating Cassandra with Elasticsearch and Hadoop.
- Stored data in AWS S3 in the same way as in HDFS, and ran EMR programs on the data stored in S3.
- Extensive experience in Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java and Python scripts to transform raw data from several data sources into baseline data.
- Hands-on expertise in running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
- Implemented Spark batch jobs on AWS instances through Amazon Simple Storage Service (Amazon S3).
- Tuned Spark Streaming performance, e.g. setting the right batch interval, the correct level of parallelism, the appropriate serialization and memory settings.
- Created HBase tables to store variable data formats of input data coming from different portfolios.
- Involved in adding huge volumes of data in rows and columns to store data in HBase.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Developed Spark code using Scala and Spark SQL for batch processing of data.
- Integration of Cassandra with Talend and automation of jobs.
- Involved in the requirements and design phases to implement a streaming Lambda Architecture for real-time streaming using Spark and Kafka.
- Design and development of database operations in PostgreSQL.
- Explored Spark to improve the performance of, and optimize, existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Hands-on experience working with databases such as HBase (NoSQL) and PostgreSQL (relational).
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket.
- Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.
Environment: Hadoop, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Pig, Sqoop, Oozie, Java, SQL, Python, Scala, Impala, AWS, Shell script, Talend
Confidential
Hadoop Developer
Responsibilities:
- Managed data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Involved in writing HiveQL to extract, transform and load the data into the database.
- Performed Hive partitioning and bucketing, executed different types of joins on Hive tables and implemented Hive SerDes such as JSON and Avro.
- Fine-tuned Hive jobs for optimized performance.
- Developed Pig Latin scripts for transformations, event joins and filters.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Experienced in working with Apache Storm.
- Involved in designing, reviewing, optimizing data transformation processes using Apache Storm.
- Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Imported data from Kafka consumers into HBase using Spark Streaming.
- Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Automated all jobs that ingest data from relational databases into Hive tables, using Oozie workflows.
- Performance tuning of queries in Impala for faster retrieval.
- Created Hive, HBase tables and HBase integrated Hive tables as per the design using ORC file format and Snappy compression.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Experience in HBase database manipulation with structured, unstructured and semi-structured datasets.
- Involved in ETL, Data Integration, Ingestion and Migration.
- Implemented ETL pipelines to ingest data from the traditional EDW into Hadoop.
- Generated various marketing reports using Tableau with Hadoop as a source for data.
- Implemented Continuous Integration using Jenkins.
- Took part in daily Scrum meetings to discuss development progress and was active in making the meetings more productive.
Environment: Apache Hadoop, Apache Hive, HDFS, Map Reduce, Pig, Spark, Storm, Impala, Oozie, Flume, Kafka, Tableau, Jenkins, Java, Python, Linux, Agile.
Confidential
Java Developer
Responsibilities:
- Involved in the design, development, testing and implementation of the process systems, working on iterative life cycles and business requirements, and creating the Detail Design Document.
- Developed JAX-RS RESTful web services that consume and produce both XML and JSON content, using Jersey, to retrieve specific details for Case Management System products.
- Configured the Java Persistence API (JPA), with Hibernate as the provider, to interact with the Oracle 11g database, and created POJO classes as JPA entities.
- Converted XML into Java objects using the JAXB API.
- Involved in development of the application using Spring Web MVC and other components of the Spring Framework.
- Used Hibernate as the Object-Relational Mapping (ORM) tool to persist data and communicate with the database.
- Used UNIX shell scripts to deploy the application on an Amazon web server.
- Developed the user interface using HTML, CSS, JavaScript, Bootstrap, Ajax and JSON.
- Used jQuery to perform the AJAX calls and to load the surveys.
- Extensively used Alpaca forms for various form fields to fetch the inputs from the user/customer.
- Wrote Embedded JS (EJS) templates that combine data with a template to produce HTML.
- Applied AngularJS to define routes to the REST services and render the EJS templates.
- Developed new REST APIs using JAX-RS on WebSphere.
- Utilized the WebLogic application server to build and deploy the enterprise application.
- Utilized Alpaca forms to create interactive HTML5 forms with jQuery.
- Used GitHub repositories to check in, check out and merge code, and for issue tracking and wikis.
- Used Maven to build and deploy the application.
Environment: Java, J2EE, JSP, Struts, Hibernate, AngularJS, JUnit, MVC, Eclipse, AJAX, Apache Tomcat, Log4J, SVN, MySQL, HTML, CSS, JavaScript.
