Hadoop Developer Resume Tampa, FL - Hire IT People

SUMMARY:

5+ years of professional experience in IT, which includes three years of experience in Hadoop development and Big Data.
Strong experience in Hadoop Distributed File System and Ecosystem: Hive, Pig, HBase, Zookeeper, Sqoop, Impala and Flume.
Worked with Big Data distributions Cloudera CDH5, CDH4, CDH3 and Hortonworks 2.5.
Used Ambari to configure the initial development environment on a Hortonworks standalone sandbox and to monitor the Hadoop ecosystem.
Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components such as Hadoop MapReduce (MR1), YARN (MR2), Hive, Pig, Sqoop, Spark, Flume and Oozie.
Good knowledge of setting up, maintaining and configuring Hadoop clusters.
Clear understanding of Hadoop architecture and its components such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode and MapReduce programming.
Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera, Hortonworks distributions and AWS.
Hands-on experience with Sqoop for importing data from relational databases into HDFS and exporting it back.
Imported different kinds of data, such as JSON and log data, into Pig using the available loaders.
Experience in ingesting streaming data into Hadoop using Spark, Storm Framework and Scala.
Skilled in extraction, manipulation, analysis and validation of large volumes of data.
Good experience with the major Hadoop distributions (Cloudera, Hortonworks, MapR).
Developed applications using Java, RDBMS, and Linux shell scripting.
Developed custom UDFs for Pig and Hive in Java to process and analyze data.
Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
Worked on various databases such as MySQL, Oracle and MS SQL Server.
Worked on NoSQL databases including HBase and Cassandra.
Developed various cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
Performed different transformations and actions using PySpark to join, aggregate and calculate different statistical values of the data.
Familiar with Docker.
Experience with cloud technologies such as Amazon Web Services (AWS).
Good knowledge of Apache Kafka, including configuring producers and consumers.
Experience in converting Hive/SQL queries into Spark transformations using Java.
Good knowledge of creating workflows to execute Pig and Hive jobs using the Oozie workflow engine.
Excellent Java development skills using J2EE, Servlets, JUnit, JSP and JDBC.
Expertise in Java multithreading, exception handling, JSP, Servlets, JavaScript, jQuery, AJAX, CSS, HTML, Spring, Hibernate, Enterprise JavaBeans, JDBC, RMI and XML-related technologies.
Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
Excellent communication skills; a team player who is analytical, research-minded, technically competent, approach-oriented, positive-minded and enjoys facing challenges.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services, EMR, MRUnit, Spark, Storm, R, RStudio

Java & J2EE Technologies: Core Java, JDBC, Servlets, JSP, JNDI, Struts, Spring, Hibernate and Web Services (SOAP and RESTful)

IDEs: Eclipse, MyEclipse, IntelliJ

Frameworks: MVC, Struts, Hibernate, Spring

Programming languages: C, C++, Java, Python, Linux shell scripts, R

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, MongoDB, Graph DB

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, RESTful WS

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

ETL Tools: Informatica, QlikView and Cognos

PROFESSIONAL EXPERIENCE:

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analytics algorithms.
Developed MapReduce programs to join data from different data sources, using bucketed joins or map joins depending on the requirement.
Imported data from structured data source into HDFS using Sqoop incremental imports.
Implemented custom Kafka partitioners to send data to different categorized topics.
Implemented Kafka spouts for streaming data and various bolts to consume the data.
Created Hive tables and partitions, and implemented incremental imports to support ad-hoc queries on structured data.
Implemented a Storm topology with stream groupings to perform real-time analytical operations.
Created Hive generic UDFs to implement business logic in HiveQL.
Optimized Hive queries and improved performance by tuning Hive query parameters.
Extensively used Pig for data cleansing.
Identified risk segments using clustering algorithms in Python and scikit-learn to protect reinsurance assets.
Created several new analytical approaches from scratch across data mining, instance modeling, and predictive analytics using SQL, Python, Pandas, NumPy, SciPy, R, Apache Spark, HDFS, S3 and scikit-learn.
Developed and carried forward a coherent research strategy in predictive modeling, machine learning, big data analysis and environmental information systems with Python, SAS, and R programming.
Responsible for running Hadoop streaming jobs to process terabytes of XML data.
Developed Oozie workflows for orchestrating and scheduling the ETL process.
Involved in implementation of Avro, ORC, and Parquet data formats for Apache Hive computations to handle the custom business requirements.
Wrote Unix shell scripts in combination with Talend data maps to process source files and load them into the staging database.
Involved in creation of virtual machines and infrastructure in the Azure Cloud environment.
Implemented Kafka consumers and producers by extending the Kafka high-level API in Java, ingesting data into HDFS or HBase depending on the context.
Involved in developing Azure web roles and worker roles.
Created workflows to run multiple Hive and Pig jobs, which are triggered independently based on time and data availability.
Developed Spark SQL scripts for handling different data sets and compared their performance against MapReduce jobs.
Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Python.
Implemented test cases for MapReduce programs using MRUnit and other mocking frameworks.
Developed Spark scripts using Python shell commands as required.
Used Hadoop benchmarks to monitor and test the Hadoop cluster.
Implemented machine learning techniques in Spark using Spark MLlib.
Involved in cluster maintenance, including adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
Retrieved transaction data from an RDBMS into HDFS, computed the total transacted amount per user using MapReduce and saved the output in a Hive table.
Implemented Maven build scripts for Maven projects and integrated them with Jenkins.

Environment: Hadoop, MapReduce, Hive, Pig, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Python, Scala, Cassandra, Git, XML, Java, Maven, Eclipse, Oracle.

Confidential, New Jersey

Hadoop Developer

Responsibilities:

Imported and exported data into and out of HDFS and Hive using Sqoop.
Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Migrated MapReduce jobs to Spark jobs to achieve better performance.
Migrated the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
Designed and implemented real-time applications using Apache Storm, Storm Trident, Kafka, the Apache Ignite in-memory data grid and Accumulo.
Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
Streamed data into Apache Ignite by setting up caches for efficient data analysis.
Created Hive external tables, loaded data into them and queried the data using HQL.
Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR).
Installed and maintained the Hadoop-Spark cluster from scratch in a plain Linux environment and defined the code outputs as PMML.
Implemented data loading using Spark, Storm, Kafka and Elasticsearch.
Experience integrating Cassandra with Elasticsearch and Hadoop.
Stored data in AWS S3 in the same way as in HDFS, and ran EMR programs on the data stored in S3.
Extensive experience in Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java and Python scripts to transform raw data from several data sources into baseline data.
Hands-on expertise in running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
Implemented Spark batch jobs on AWS instances using Amazon Simple Storage Service (Amazon S3).
Tuned Spark Streaming performance, e.g. by setting the right batch interval and level of parallelism, selecting the appropriate serialization and tuning memory.
Created HBase tables to store variable data formats of input data coming from different portfolios.
Involved in loading large volumes of data into HBase rows and columns.
Developed Spark code using Scala and Spark SQL for batch processing of data.
Integrated Cassandra with Talend and automated jobs.
Involved in the requirements and design phases of implementing a streaming Lambda architecture for real-time streaming with Spark and Kafka.
Designed and developed database operations in PostgreSQL.
Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Hands-on experience working with databases such as HBase (NoSQL) and PostgreSQL (relational).
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket.
Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.

Environment: Hadoop, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Pig, Sqoop, Oozie, Java, SQL, Python, Scala, Impala, AWS, Shell script, Talend

Confidential

Hadoop Developer

Responsibilities:

Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
Wrote HiveQL to extract, transform and load data into the database.
Performed Hive partitioning and bucketing, executed different types of joins on Hive tables and implemented Hive SerDes such as JSON and Avro.
Fine-tuned Hive jobs for optimized performance.
Developed Pig Latin scripts for transformations, event joins and filters.
Used Spark Streaming to divide streaming data into batches as an input to the Spark engine for batch processing.
Experienced in working with Apache Storm.
Involved in designing, reviewing, optimizing data transformation processes using Apache Storm.
Developed Spark SQL code to load tables into HDFS and run select queries on top of them.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
Imported data from a Kafka consumer into HBase using Spark Streaming.
Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
Automated all jobs for ingesting data from relational databases into Hive tables using Oozie workflows.
Tuned the performance of Impala queries for faster retrieval.
Created Hive tables, HBase tables and HBase-integrated Hive tables as per the design, using the ORC file format and Snappy compression.
Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system-specific jobs.
Experience in HBase database manipulation with structured, unstructured and semi-structured datasets.
Involved in ETL, Data Integration, Ingestion and Migration.
Implemented ETL pipelines to ingest data from the traditional EDW into Hadoop.
Generated various marketing reports using Tableau with Hadoop as a source for data.
Implemented Continuous Integration using Jenkins.
Participated in daily Scrum meetings to discuss development progress and actively helped make the meetings more productive.

Environment: Apache Hadoop, Apache Hive, HDFS, MapReduce, Pig, Spark, Storm, Impala, Oozie, Flume, Kafka, Tableau, Jenkins, Java, Python, Linux, Agile.

Confidential

Java Developer

Responsibilities:

Involved in the design, development, testing and implementation of the process systems, working on iterative life cycles and business requirements, and creating the Detailed Design Document.
Developed JAX-RS RESTful web services that consume and produce both XML and JSON content using Jersey to retrieve specific details for Case Management System products.
Configured the Java Persistence API (JPA), with Hibernate as the provider, to interact with the Oracle 11g database, and created POJO classes as JPA entities.
Converted XML into Java objects using the JAXB API.
Involved in development of the application using Spring Web MVC and other components of the Spring Framework.
Used Hibernate as the Object-Relational Mapping (ORM) tool to persist data and communicate with the database.
Used UNIX shell scripts to deploy the application on an Amazon web server.
Developed the user interface using HTML, CSS, JavaScript, Bootstrap, AJAX and JSON.
Used jQuery to perform the AJAX calls and to load the surveys.
Extensively used Alpaca forms for various form fields to fetch the inputs from the user/customer.
Wrote Embedded JavaScript (EJS) templates to combine data with a template and produce HTML.
Used AngularJS to define routes to the REST services and render the EJS templates.
Responsible for developing new REST APIs using JAX-RS on WebSphere.
Utilized WebLogic application server to build and deploy the enterprise application.
Utilized Alpaca forms to create interactive HTML5 forms with jQuery.
Used a GitHub repository to check in, check out and merge code, and for issue tracking and wikis.
Used Maven to build and deploy the application.

Environment: Java, J2EE, JSP, Struts, Hibernate, AngularJS, JUnit, MVC, Eclipse, AJAX, Apache Tomcat, Log4J, SVN, MySQL, HTML, CSS, JavaScript.
