Sr. Hadoop Developer Resume
Phoenix, AZ
SUMMARY
- 6+ years of experience in analysis, design, development, integration, testing, and maintenance of various applications using Java/J2EE technologies, including around 4 years of Big Data/Hadoop experience.
- Experienced in building highly scalable Big Data solutions using Hadoop with multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Expertise in big data architecture with the Hadoop Distributed File System and its ecosystem tools: MapReduce, HBase, Hive, Pig, ZooKeeper, Oozie, Kafka, Flume, Avro, Impala, and Apache Spark.
- Hands-on experience performing data quality checks on petabytes of data.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Developed, deployed, and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Experience in writing MapReduce programs and using the Apache Hadoop API for analyzing data.
- Strong experience in developing, debugging, and tuning MapReduce jobs in a Hadoop environment.
- Expertise in developing Pig and Hive scripts for data analysis.
- Hands-on experience in data mining, implementing complex business logic, optimizing queries in HiveQL, and controlling data distribution with partitioning and bucketing techniques to enhance performance (a table-layout sketch follows this summary).
- Experience working with Hive data and extending the Hive library with custom UDFs to query data in non-standard formats.
- Experience in performance tuning of MapReduce jobs, Pig jobs, and Hive queries.
- Involved in ingesting data from databases such as DB2 and SQL Server into HDFS using Sqoop.
- Experience working with Flume to handle large volumes of streaming data.
- Good working knowledge of Hue in the Hadoop ecosystem.
- Extensive experience in migrating ETL operations into HDFS using Pig scripts.
- Good knowledge of big data analytics libraries (MLlib) and of Spark SQL for exploratory data analysis.
- Experienced in using Apache Ignite for handling streaming data.
- Experience in implementing a distributed messaging queue integrated with Cassandra using Apache Kafka and ZooKeeper.
- Expert in creating and designing data-ingestion pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
- Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
- Worked with different file formats such as TextFile, Avro, and ORC for Hive querying and processing.
- Used compression techniques (Snappy) with these file formats to optimize storage in HDFS.
- Experienced with the build tools Maven and Ant and with continuous-integration tools such as Jenkins.
- Hands-on experience with relational databases such as Oracle, MySQL, PostgreSQL, and MS SQL Server.
- Experience using IDEs such as Eclipse 3.0, MyEclipse, RAD, and NetBeans.
- Hands-on development experience with RDBMS, including writing SQL queries, PL/SQL, views, stored procedures, and triggers.
- Participated in all Business Intelligence activities related to data warehousing, ETL, and report-development methodology.
- Expertise in Waterfall and Agile software development models and in project planning using Microsoft Project Planner and JIRA.
- Highly motivated, dynamic self-starter with a keen interest in emerging technologies.
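As an illustration of the Hive partitioning and bucketing approach mentioned above, the sketch below creates a partitioned, bucketed ORC table and loads it with a dynamic-partition insert. It is a minimal Scala/Spark SQL example with hypothetical table and column names (txn_partitioned, txn_staging, customer_id, txn_date), not code from any specific engagement.

    import org.apache.spark.sql.SparkSession

    object PartitionedTableSketch {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support so the DDL lands in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("hive-partition-bucket-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partition by load date and bucket by customer_id: date filters prune
        // whole partitions, and joins on customer_id can use bucketed joins.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS txn_partitioned (
            |  txn_id      BIGINT,
            |  customer_id BIGINT,
            |  amount      DOUBLE)
            |PARTITIONED BY (txn_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS ORC""".stripMargin)

        // Dynamic-partition insert from a hypothetical staging table.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT OVERWRITE TABLE txn_partitioned PARTITION (txn_date)
            |SELECT txn_id, customer_id, amount, txn_date FROM txn_staging""".stripMargin)

        spark.stop()
      }
    }

Partitioning by txn_date lets queries skip whole directories, while bucketing by customer_id keeps each customer's rows co-located for bucketed joins and sampling.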
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, ZooKeeper, Kafka, Impala, Apache Spark, Hue, Ambari, Apache Ignite
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks
Languages: Java, C, SQL, Python, PL/SQL, Pig Latin, HQL
IDE Tools: Eclipse, IntelliJ
Frameworks: Hibernate, Spring, Struts, JUnit
Operating Systems: Windows (XP/7/8), UNIX, Linux, Ubuntu, CentOS
Application Servers: JBoss, Tomcat, WebLogic, WebSphere, Servlets
Reporting Tools/ETL Tools: Tableau, Power View for Microsoft Excel, Informatica
Databases: Oracle, MySQL, DB2, Derby, PostgreSQL, NoSQL databases (HBase, Cassandra)
PROFESSIONAL EXPERIENCE
Confidential, Phoenix, AZ
Sr. Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper, and Sqoop.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Migrated an existing on-premises application to AWS.
- Migrated a MongoDB sharded/replica cluster from one data center to another without downtime.
- Managed and monitored large production MongoDB sharded cluster environments holding terabytes of data.
- Worked on importing and exporting data between RDBMS and HDFS with Hive and Pig using Sqoop.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala and Python.
- Set up MongoDB profiling to identify slow queries.
- Configured Hive and Oozie to store metadata in Microsoft SQL Server.
- Migrated HiveQL queries to Impala to minimize query response time.
- Developed and ran MapReduce jobs on YARN/Hadoop clusters to produce daily and monthly reports per user needs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Spark scripts using the Scala shell as per requirements.
- Improved the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (an aggregation sketch follows this list).
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Developed a data pipeline to store data into HDFS.
- Configured, deployed, and maintained multi-node development and test Kafka clusters.
- Deployed Hadoop YARN, Spark, and Storm integrated with Cassandra, Ignite, and Kafka.
- Moved data between clusters using DistCp; supported and maintained Sqoop jobs and programs; designed and developed Spark RDDs and Spark SQL queries.
- Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and ZooKeeper-based log collection platform (a minimal producer sketch also follows this list).
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases, to compare against historical data.
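A minimal sketch of the DataFrame/pair-RDD style referenced in the optimization bullet above, assuming a hypothetical Hive table clickstream.events with user_id, event_date, and bytes columns; it computes the same daily roll-up with Spark SQL functions and with a pair RDD, mirroring how older MapReduce-style logic maps onto Spark.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DailyReportSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-report-sketch")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Hypothetical Hive table of click events, cached because it feeds both paths below.
        val events = spark.table("clickstream.events")
          .select($"user_id", $"event_date", $"bytes")
          .cache()

        // DataFrame / Spark SQL aggregation: daily traffic per user.
        val daily = events
          .groupBy($"event_date", $"user_id")
          .agg(sum($"bytes").as("total_bytes"), count("*").as("events"))

        // The same roll-up as a pair RDD, for comparison with older MapReduce-style logic.
        val dailyRdd = events.rdd
          .map(r => ((r.getAs[String]("event_date"), r.getAs[Long]("user_id")),
                     r.getAs[Long]("bytes")))
          .reduceByKey(_ + _)

        daily.write.mode("overwrite").saveAsTable("reports.daily_user_traffic")
        println(s"pair-RDD row count: ${dailyRdd.count()}")
        spark.stop()
      }
    }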
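A minimal sketch of the Scala log producer mentioned above, under the simplifying assumptions of a single append-only log file, a placeholder broker address, and a hypothetical app-logs topic; a production version would persist offsets and handle log rotation.

    import java.nio.file.Paths
    import java.util.Properties
    import scala.io.Source
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object LogShipperSketch {
      def main(args: Array[String]): Unit = {
        val logFile = Paths.get("/var/log/app/application.log").toFile // hypothetical log file
        val topic   = "app-logs"                                       // hypothetical topic

        val props = new Properties()
        props.put("bootstrap.servers", "kafka-broker:9092") // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        var sent = 0
        while (true) {
          // Re-read the file and ship only the lines appended since the last pass.
          val source = Source.fromFile(logFile)
          val lines  = try source.getLines().toVector finally source.close()
          lines.drop(sent).foreach { line =>
            producer.send(new ProducerRecord[String, String](topic, line))
          }
          sent = lines.size
          producer.flush()
          Thread.sleep(5000)
        }
      }
    }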
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, ZooKeeper, Kafka, Impala, Akka, Apache Spark, Spark Streaming, Hortonworks, HBase, MongoDB
Confidential, Phoenix, AZ
Hadoop Developer/Admin
Responsibilities:
- Hands-on experience with Scala, Spark, Hive, Kafka, shell scripting, SQL, Tableau, and Rally.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations to build the common learner data model, and store the data in a NoSQL store (HBase); a streaming sketch follows this list.
- Installed and configured various components of the Hadoop ecosystem, such as Flume, Hive, Pig, Sqoop, Oozie, ZooKeeper, Kafka, and Storm, and maintained their integrity.
- Good knowledge of Confidential internal data sources such as Cornerstone, WSDW, IDN, and SQL.
- Used Apache Kafka Connect for streaming data between Apache Kafka and other systems.
- Performed advanced procedures such as text analytics using the in-memory computing capabilities of Spark with Scala.
- Partitioned data streams using Kafka; designed and configured the Kafka cluster to accommodate heavy throughput.
- Responsible for fetching real-time data using Kafka and processing it with Spark Streaming in Scala.
- Implemented Hive partitioning and bucketing on the collected data in HDFS.
- Wrote real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
- Worked in the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
- Implemented Sqoop jobs for large data exchanges (import/export) between RDBMS and Hive platforms.
- Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
- Used Kafka Connect as a utility for streaming data between MapR Event Store for Apache Kafka and other storage systems.
- Worked with the visualization tool Tableau for visually analyzing the data.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Migrated MapReduce programs into Spark transformations using Scala.
- Deployed and scheduled applications in the cloud on Spark/Ambari.
- Created a user interface to access data from landing-zone tables and automated the SQL queries to provide flexibility to users.
- Tracked data moving across hops, with counts and amount summaries in both local currency and USD.
- Documented any record drops in detail, along with the reason for and expected size of each drop.
- Performed drill-down and drill-through on the data set at an aggregated level to understand the data better.
- Covered use cases such as product level and country level to analyze the data at a granular level.
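A minimal sketch of the Spark Streaming flow described in the second bullet above: it subscribes to a hypothetical learner-events Kafka topic, applies a light transformation, and writes each micro-batch to an HBase table. The broker address, topic, table, and column-family names are placeholders, not values from the actual project.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object LearnerEventStreamSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("learner-events"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "kafka-broker:9092",   // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "learner-model",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

        // Light transformation, then write each partition to HBase with one connection.
        stream.map(record => record.value.split(","))
          .filter(_.length >= 3)
          .foreachRDD { rdd =>
            rdd.foreachPartition { rows =>
              val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
              val table = conn.getTable(TableName.valueOf("learner_profile")) // hypothetical table
              rows.foreach { cols =>
                val put = new Put(Bytes.toBytes(cols(0))) // row key = learner id
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event"), Bytes.toBytes(cols(1)))
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("score"), Bytes.toBytes(cols(2)))
                table.put(put)
              }
              table.close(); conn.close()
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }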
Environment: MapReduce, Scala, Spring Framework 2.1.3, Oracle 11.2.0.3, Kafka connectors, Maven 4.0, Spark, Hive SQL, Node.js v8.11.1, NoSQL, Java 1.8, Tableau, Ambari user views, Spark real-time data sources, cloud platform, consumers
Confidential - Blue Ash, OH
Java/ Hadoop Developer
Responsibilities:
- Involved in sprint planning as part of monthly deliveries.
- Involved in daily scrum calls and stand-up meetings as part of the Agile methodology.
- Hands-on experience with the VersionOne tool for updating work details and working hours for each task.
- Involved in designing the views.
- Involved in writing Spring configuration files and business logic based on requirements.
- Involved in code-review sessions.
- Implemented JUnit tests for the business logic of backlog items assigned in the sprint plan.
- Implemented fixtures to execute FitNesse test tables.
- Created Jenkins CI jobs and Sonar jobs.
Environment: Core Java, Spring, Maven, XMF Services, JMS, Oracle 10g, PostgreSQL 9.2, Eclipse, SVN
Confidential - NY
Java Developer
Responsibilities:
- Responsible for all Java-related activities, including analysis, design, and development.
- Participated in R&D for two months and provided a solution for making this application work with the upgraded backend SAP system.
- This solution is now widely used in the organization for SAP upgrade-related projects.
- Carried out further development on this project using J2EE and related technologies.
- Interacted with clients to give demos and gather requirements and suggestions.
Environment: Java, NWDI, NWDS, Spring IOC.