Hadoop Developer Resume
New York
SUMMARY
- Close to 5 years of IT experience as a Developer, Designer, and Quality Tester, with cross-platform integration experience in Hadoop development.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components: HDFS, MapReduce, Pig, Hive, HBase, Spark, Sqoop, Flume, and Oozie.
- Strong understanding of various Hadoop services, MapReduce and YARN architecture.
- Responsible for writing MapReduce programs.
- Experienced in importing and exporting data to and from HDFS using Sqoop.
- Experience loading data into Hive partitions and creating buckets in Hive.
- Developed MapReduce jobs to automate data transfer from HBase.
- Expertise in analysis using Pig, Hive, and MapReduce.
- Experienced in developing UDFs for Hive and Pig using Java.
- Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
- Scheduled Hadoop, Hive, Sqoop, and HBase jobs using Oozie.
- Experience with HDFS data storage and with supporting MapReduce jobs.
- Experience with Chef, Puppet, and related configuration management tools.
- Worked on analyzing Hadoop clusters and various big data analytic tools, including Pig, the HBase database, and Sqoop.
- Involved in infrastructure setup and installation of the HDP stack on the Amazon cloud.
- Experience ingesting data from RDBMS sources such as Oracle, SQL, and Teradata into HDFS using Sqoop.
- Experience in big data technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, ZooKeeper, and NoSQL.
- Added and removed components through Cloudera Manager.
- Extensive experience importing and exporting streaming data into HDFS using stream processing platforms such as Flume and the Kafka messaging system.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Responsible for provisioning, installing, configuring, monitoring, and maintaining HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, SmartSense, Storm, and Kafka.
- Experience with AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and a self-motivated, team-oriented, focused approach, with excellent interpersonal, technical, and communication skills.
- Experience in real-time analytics with Apache Spark (RDDs, DataFrames, and the Streaming API).
- Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data (a brief sketch follows this summary).
- Experience integrating Hadoop with Kafka; expertise in loading clickstream data from Kafka into HDFS.
- Expert in using Kafka as a publish-subscribe messaging system.
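A minimal sketch of the Spark DataFrames-over-Hive analytics mentioned above, written against the Spark 2.x API: it assumes Hive support is enabled and a hypothetical Hive table analytics.customer_orders with region and order_total columns, and it illustrates the pattern rather than the actual project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count}

object HiveOrderAnalytics {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("HiveOrderAnalytics")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table; replace with the real database.table name.
    val orders = spark.table("analytics.customer_orders")

    // DataFrame aggregation: order volume and average order value per region.
    val summary = orders
      .groupBy("region")
      .agg(count("*").as("order_count"), avg("order_total").as("avg_order_total"))

    summary.show(20, truncate = false)
    spark.stop()
  }
}
```

On a Spark 1.x cluster the same logic would use a HiveContext in place of SparkSession; in either case hive-site.xml must be on the classpath so the metastore can be resolved.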
PROFESSIONAL EXPERIENCE
Confidential, New York
Hadoop Developer
Responsibilities:
- Developed Spark applications in Scala using DataFrames and the Spark SQL API for faster data processing.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to requirements.
- Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Installed and configured various components of the Hadoop ecosystem, such as Hive, Pig, Sqoop, Oozie, ZooKeeper, Kafka, Storm, Ranger, and TDCH, and maintained their integrity.
- Managed Hadoop cluster environments (cluster sizing, cluster configuration, service allocation, security setup, performance tuning, and ongoing monitoring).
- Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
- Designed, configured, and managed backup and disaster recovery for HDFS data.
- Implemented the HDFS snapshot feature and enforced a 30-day snapshot retention policy.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Analyzed the data by writing Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Helped DevOps engineers deploy code and debug issues.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed HiveQL scripts to denormalize and aggregate the data.
- Scheduled and executed workflows in Oozie to run various jobs.
- Experience in using the Hadoop ecosystem and processing data on Amazon AWS.
- Developed simple to complex MapReduce streaming jobs in Java for processing and validating the data.
- Developed a data pipeline using MapReduce, Flume, Sqoop, and Pig to ingest customer behavioral data into HDFS for analysis.
- Developed MapReduce and Spark jobs to discover trends in data usage by users.
- Implemented Spark using Python and Spark SQL for faster processing of data.
- Implemented algorithms for real-time analysis in Spark.
- Used Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for high data volumes.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (sketched at the end of this list).
- Streamed data in real time using Spark with Kafka and SOA.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the transformed data back into HDFS.
- Created HBase tables and column families to store the user event data (a sketch follows the environment line below).
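A sketch of the Spark-Cassandra Connector usage mentioned in the list above, using the DataStax connector's RDD API. The keyspace, table, and column names are hypothetical, and the contact point is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable / saveToCassandra

object CassandraLoadSketch {
  def main(args: Array[String]): Unit = {
    // The connector needs a Cassandra contact point; the host is a placeholder.
    val conf = new SparkConf()
      .setAppName("CassandraLoadSketch")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)

    // Read an existing table into an RDD of CassandraRow.
    val events = sc.cassandraTable("analytics", "user_events")

    // Derive per-user event counts and write them back to a (hypothetical)
    // summary table with columns (user_id, event_count).
    val counts = events
      .map(row => (row.getString("user_id"), 1L))
      .reduceByKey(_ + _)

    counts.saveToCassandra("analytics", "user_event_counts",
      SomeColumns("user_id", "event_count"))
    sc.stop()
  }
}
```

The spark-cassandra-connector package would need to be on the classpath, for example via --packages at submit time.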
Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux.
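A minimal sketch of the HBase table creation noted in the last bullet, written against the HBase 1.x client API; the table name and column families are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

object CreateUserEventTable {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath (ZooKeeper quorum, etc.).
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin

    val tableName = TableName.valueOf("user_events") // hypothetical table name
    if (!admin.tableExists(tableName)) {
      val descriptor = new HTableDescriptor(tableName)
      // Two column families: raw event payload and derived metadata.
      descriptor.addFamily(new HColumnDescriptor("event"))
      descriptor.addFamily(new HColumnDescriptor("meta"))
      admin.createTable(descriptor)
    }

    admin.close()
    connection.close()
  }
}
```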
Confidential, New York, New York
Hadoop Developer
Responsibilities:
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (PySpark).
- Created a serverless data ingestion pipeline on AWS using Lambda functions.
- Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the streamed data in DynamoDB (a sketch follows this list).
- Developed Apache Spark applications in Scala and Python and implemented a Spark data processing module to handle data from various RDBMS and streaming sources.
- Developed and scheduled various Spark streaming and batch jobs using Python (PySpark) and Scala.
- Developed Spark code in PySpark, applying various transformations and actions for faster data processing.
- Achieved high-throughput, scalable, fault-tolerant stream processing of live data streams using Apache Spark Streaming.
- Used Spark stream processing in Scala to bring data into memory, created RDDs and DataFrames, and applied transformations and actions.
- Used various Python libraries with PySpark to create DataFrames and store them in Hive.
- Created Sqoop jobs and Hive queries for data ingestion from relational databases to analyze historical data.
- Experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Knowledge of running Hive queries through Spark SQL integrated with the Spark environment.
- Executed Hadoop/Spark jobs on AWS EMR using programs stored in S3 buckets.
- Knowledge of creating user-defined functions (UDFs) in Hive.
- Worked with different file formats, such as Avro and Parquet, for Hive querying and processing based on business logic.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Implemented Hive UDFs to encapsulate business logic and performed extensive data validation using Hive (a UDF sketch follows this section).
- Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Developed code to generate various DataFrames based on business requirements and created temporary tables in Hive.
- Utilized AWS CloudWatch to monitor environment instances for operational and performance metrics during load testing.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Deployed builds to production and worked with the teams to identify and troubleshoot any issues.
- Worked with MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
- Developed a fully functional login page for the company's user facing website with complete UI and validations.
- Resolved tickets submitted by users, including P1 issues, by troubleshooting, documenting, and resolving the errors.
- Implemented Oozie workflows for the ETL process for critical data feeds across the platform.
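A minimal sketch of the Kafka-to-Spark Streaming ingestion described in the list above, written against the spark-streaming-kafka-0-10 integration. The broker address, consumer group, topic, and output path are placeholders, and the DynamoDB write mentioned in the bullet is replaced here by a plain HDFS write, since the DynamoDB sink depends on a separate connector.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Broker list, group id, and topic name are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Persist each non-empty micro-batch of raw messages; a DynamoDB sink
    // would replace this step in the actual pipeline.
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"/data/clickstream/batch-${System.currentTimeMillis}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

For fault tolerance in production, checkpointing and explicit offset management would also be configured.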
Environment: HDFS, Map Reduce, Hive 1.1.0, Kafka, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, Apache Hadoop 2.6, Spark, SOLR, Storm, Cloudera Manager, Red Hat, MySQL, Prometheus, Docker, Puppet.
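A sketch of a Hive UDF of the kind described in this section, written in Scala against Hive's classic UDF class; the normalization rule and class name are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trims and upper-cases free-text status codes so they can
// be validated and joined consistently in Hive queries.
class NormalizeStatus extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

After packaging the class into a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.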
Confidential, Jersey City, New Jersey
Hadoop Developer
Responsibilities:
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources like HDFS and HBase into Spark RDDs, developed a data pipeline using Kafka and Storm to store data in HDFS, and performed real-time analysis on the incoming data.
- Worked extensively with Sqoop for importing and exporting data between the HDFS data lake and relational database systems like Oracle and MySQL.
- Developed Python scripts to collect data from source systems and store it on HDFS.
- Involved in converting Hive or SQL queries into Spark transformations using Python and Scala.
- Built a Kafka REST API to collect events from the front end.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
- Worked on integrating Apache Kafka with Spark Streaming process to consume data from external sources and run custom functions
- Created PySpark programs to load data into Hive and MongoDB from PySpark DataFrames.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Developed Spark jobs and Hive Jobs to summarize and transform data
- Optimized performance on large datasets using partitioning, Spark broadcasts, and effective, efficient joins and transformations during the ingestion process.
- Used Spark for interactive queries, processing of streaming data, and integration with the HBase database for high data volumes.
- Checked AWS logs and Docker logs for any issues during deployment.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (a sketch follows this list).
- Redesigned the HBase tables to improve performance according to the query requirements.
- Developed MapReduce jobs in Java to convert data files into the Parquet file format.
- Developed Hive queries for data sampling and analysis for the analysts.
- Executed Hive queries that helped analyze trends by comparing new data with existing data warehouse reference tables and historical data.
- Configured Oozie workflows to run multiple Hive jobs that run independently based on time and data availability.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
- Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
- Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Implemented Spark jobs in Scala for faster testing and processing of data.
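A minimal sketch of the dynamic-partition loading into Hive mentioned in the list above, using Spark SQL with Hive support. The database, table, and column names are hypothetical, and bucketing is omitted for brevity.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Required for dynamic-partition inserts into Hive tables.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical target table, partitioned by event_date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.events_by_date (
        |  user_id STRING,
        |  event_type STRING
        |) PARTITIONED BY (event_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic partitioning: each row is routed to the partition named by the
    // trailing event_date column selected from a hypothetical staging table.
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.events_by_date PARTITION (event_date)
        |SELECT user_id, event_type, event_date
        |FROM analytics.events_staging""".stripMargin)

    spark.stop()
  }
}
```

Queries that filter on event_date can then benefit from partition pruning.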
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.