Sr. Hadoop Developer Resume
Lawrenceville, NJ
SUMMARY
- Around 7 years of IT experience across a variety of industries, including 4+ years of hands-on experience in Big Data analytics and development
- Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Impala, Sqoop, Spark, YARN, and Oozie
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm
- Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase, and with monitoring them through Cloudera Manager
- Extensive experience in using Hive Query Language for data analytics
- Experience in migrating data between HDFS and relational database systems, in both directions, using Sqoop
- Good knowledge of Oozie for job scheduling and of monitoring tools
- Experience in designing and developing Spark applications in Python to compare the performance of Spark with Hive and SQL/Oracle
- Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications
- Experience in manipulating and analyzing large datasets and finding patterns and insights within structured and semi-structured data
- Extensive experience in building and deploying applications on web/application servers such as IBM WebSphere
- Strong experience with Hadoop distributions such as Cloudera
- Experience working on a 35-node (2 PB/10 TB) cluster running CDH 5.9.x
- Strong experience in database design and in writing complex SQL queries and stored procedures
- Work closely with product management and development teams to rapidly translate the understanding of customer data and requirements into products and solutions
- Excellent communication skills and experienced in client interaction while providing technical support and knowledge transfer
- Strong problem-solving, communication, and interpersonal skills; a good team player
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, MapReduce, Spark, Hive, Sqoop, Pig, Impala, Oozie, YARN, HBase
Programming Languages: Linux shell scripting, Python (basics), HQL, MySQL, C, HTML (basics)
Microsoft Suite: Advanced Excel (V-lookup, Pivot Tables), PowerPoint, Project
Tools: CDH 5.9.1, Cloudera Navigator, Cloudera Manager (basics), IBM WebSphere Application Server, VMware, Wireshark
KEY COURSES: Big Data Analytics, Data Warehousing Concepts, Multivariate Data Analysis, Data Mining, Database Management Systems, Software Project Management, Computer Network Design and Analysis, Computer Communication Networks
PROFESSIONAL EXPERIENCE
Confidential - Lawrenceville, NJ
Sr. Hadoop Developer
Responsibilities:
- Monitoring day-to-day activities of jobs in the production and staging environments.
- Working with Spark and Hadoop components and managing analytics services for digital media systems, including identifying and implementing products and services for web-based delivery.
- Working on transferring data from the Acheron and Hades APIs to the Mercury endpoint, with the final goal of deprecating the Analytics Kafka cluster.
- Demonstrating competence in full life cycle management of REST and web service APIs, including specification and system deployment.
- Working with Google Cloud and Amazon EC2 instances; tracking real-time data using Spark and Apache Kafka and storing it in the backend Hadoop Distributed File System (see the streaming sketch after this section).
- Optimizing Hadoop MapReduce jobs, Java and Scala code, and shell scripts for better scalability, reliability, and performance, and scheduling the jobs in Task Forest.
- Deploying software components into cloud-based infrastructures, caching and scaling out using cloud APIs.
- Developing code using Scala, Java, Spark SQL, Oracle PL/SQL, and MySQL, with IntelliJ IDEA and Databricks as development environments.
Environment: Spark 1.6 and 2.x, Hadoop 2.x, Hive, Airflow, GCS, Linux, VMware 14.0.0, Cloudera 5.9.1, MySQL
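A minimal sketch of the Kafka-to-HDFS ingestion pattern described in the bullets above, using Spark 2.x Structured Streaming in Python. The broker addresses, topic name, and HDFS paths are hypothetical placeholders, not the project's actual configuration (the Acheron/Hades/Mercury endpoints are not modeled here).

```python
# Hypothetical sketch: consume a Kafka topic with Spark Structured Streaming
# (Spark 2.x) and persist it to HDFS as Parquet. Submit with the Kafka source
# on the classpath, e.g.:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 job.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Read the raw event stream; Kafka delivers key/value as binary columns.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # hypothetical brokers
          .option("subscribe", "analytics-events")                         # hypothetical topic
          .load()
          .selectExpr("CAST(value AS STRING) AS payload", "timestamp"))

# Append each micro-batch to HDFS as Parquet; the checkpoint directory stores
# Kafka offsets so the job resumes cleanly after a restart.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/analytics/events")           # hypothetical landing path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```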
Confidential, Woonsocket, RI
Sr. Hadoop Developer
Responsibilities:
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the Sqoop sketch after this section).
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
- Worked with the data management team on data integration, focusing on data quality and data profiling for integration, transformation, and data provisioning.
- Designed and Implemented real-time Big Data processing to enable real-time analytics, event detection and notification for Data-in-Motion.
- Involved in importing data from web logs and application logs using Flume.
- Involved in MapReduce and Hive optimization.
- Involved in importing data from Oracle to HDFS using Sqoop.
- Involved in writing MapReduce programs and Hive queries to load and process data in the Hadoop file system.
- Involved in creating Hive tables, loading them with data, and extensively writing Hive queries.
- Worked on performance tuning of Pig queries.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote Python scripts for automation and monitoring.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: Linux, Hadoop 2.x and 3.x HDFS clusters with Cloudera Manager 5.10.x, HDFS, Hive, Pig, MapReduce, Spark, Sqoop, Oozie, Flume, Teradata, Scala, HBase, SQL, Unix.
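A minimal sketch of the kind of Sqoop import referenced in the bullets above, driven from a Python automation script. The JDBC URL, credentials, table, and HDFS paths are hypothetical placeholders; a real job would read them from configuration rather than literals.

```python
# Hypothetical sketch: pull a MySQL table into HDFS with Sqoop, launched from
# Python so it can be wired into automation (e.g. an Oozie shell action).
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db-host:3306/sales",    # hypothetical source database
    "--username", "etl_user",                          # hypothetical account
    "--password-file", "hdfs:///user/etl/.mysql.pwd",  # keeps the password off the command line
    "--table", "orders",                               # hypothetical table
    "--target-dir", "/data/raw/orders",                # HDFS landing directory
    "--num-mappers", "4",                              # parallel import tasks
]

# check=True raises CalledProcessError on failure, so the scheduler can mark
# the workflow action as failed instead of silently continuing.
subprocess.run(sqoop_cmd, check=True)
```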
Confidential
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Involved in loading data from an Oracle database into HDFS using Sqoop
- Developed Spark scripts in Python on Jupyter notebooks as per requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Involved in creating Hive tables, loading data, and analyzing data using Hive queries
- Worked on tuning the performance of Hive queries
- Implemented partitioning, dynamic partitions, and bucketing in Hive (see the Hive sketch after this section)
- Responsible to manage data coming from different sources
- Configured time-based schedulers that pull data from multiple sources in parallel using Oozie workflows
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop 1.x, MapReduce, HDFS, Pig, Hive, Python, Oozie, Java, Linux, Cloudera 5.9.1, MySQL
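A minimal sketch of the Hive partitioning, dynamic-partition, and bucketing work mentioned above, with the HiveQL run through the Hive CLI from Python. The database, table, and column names are hypothetical placeholders.

```python
# Hypothetical sketch: a daily-partitioned, bucketed ORC table plus a
# dynamic-partition load, executed via `hive -e` from Python.
import subprocess

# DDL: partitioning gives one HDFS directory per order_date; bucketing by
# customer_id into 16 buckets helps joins and sampling.
ddl = """
CREATE TABLE IF NOT EXISTS sales.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 16 BUCKETS
STORED AS ORC
"""

# DML: the SET statements let Hive derive the partition value from the last
# SELECT column; hive.enforce.bucketing is needed on Hive 1.x (CDH 5) so
# writes honor the bucket definition.
load = """
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;
INSERT OVERWRITE TABLE sales.orders PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date
FROM sales.orders_staging
"""

# check=True propagates failures to the caller (e.g. an Oozie shell action).
for script in (ddl, load):
    subprocess.run(["hive", "-e", script], check=True)
```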