Hadoop Developer Resume
New York, NY
PROFESSIONAL SUMMARY:
- Around 6 years of IT experience in analysis, implementation, and testing of enterprise-wide applications, data warehouses, client-server technologies, and web-based applications.
- Over 3 years of experience in administrative tasks such as multi-node Hadoop installation and maintenance.
- Experience in deploying Hadoop 2.0 (YARN) and administering HBase, Hive, Sqoop, and HDFS.
- Installed, configured, supported, and managed Apache Ambari on Hortonworks Data Platform 2.5, Cloudera Distribution Hadoop (CDH) 5.x, Linux, Rackspace, and AWS cloud infrastructure.
- Understood the security requirements for Hadoop and integrated clusters with Kerberos infrastructure.
- Good knowledge of Kerberos security; maintained clusters by adding and removing nodes.
- Hands-on experience in Linux administration activities.
- Experience in minor and major upgrades of Hadoop and the Hadoop ecosystem.
- Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
- Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
- Set up Linux environments: passwordless SSH, creating file systems, disabling firewalls, and installing Java.
- Experienced in job scheduling using Oozie.
- Hands-on experience analyzing log files for Hadoop ecosystem services and finding root causes.
- Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
- Involved with file transmission and electronic data interchange; trade capture, verification, processing, and routing operations; banking report generation; and operational management.
- Experience working with Hadoop clusters and integrating with ecosystem components such as Hive, HBase, Sqoop, Spark, and Oozie.
- Experienced in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Performed systems analysis for several information systems, identifying and documenting performance and administrative bottlenecks.
- Monitored platform health, generated performance reports, and provided continuous improvements.
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, New York, NY
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
- Loaded the generated HFiles into HBase for fast access to a large customer base without taking a performance hit.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Involved in designing and creating data ingest pipelines using technologies such as Kafka.
- Developed Spark scripts using Scala shell commands per requirements.
- Created HBase tables to store various data coming from different portfolios.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Involved in managing and reviewing Hadoop log files.
- Responsible for managing data coming from different sources.
- Transferred data from AWS S3 to AWS Redshift using Informatica; involved in file movement between HDFS and AWS S3.
- Created a complete processing engine, based on the Hortonworks distribution, tuned for performance.
- Provided a batch processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
- Developed Spark code using Scala and Spark-SQL for faster processing and testing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the Scala sketch at the end of this list).
- Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
- Handled Cassandra administration: installing, upgrading, and managing distributions.
- Assisted in performing unit testing of MapReduce jobs using MRUnit.
- Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
- Integrated Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
- Experienced in working with the Spark ecosystem, using Scala and Hive queries on different data formats such as text files and Parquet.
- Used Apache Kafka to develop a data pipeline of logs as a stream of messages using producers and consumers (a producer sketch follows the Environment line for this role).
- Worked with the Hue GUI for easy job scheduling, file browsing, and job browsing.
- Worked with Talend on a POC for integrating data from the data lake.
- Involved in development/implementation of Cassandra environment.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
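For illustration, the Hive/SQL-to-Spark conversion described above can look like the minimal Scala sketch below; the claims table, state column, and aggregation are assumed placeholders, not the actual project schema.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: "claims" and "state" are hypothetical table/column names.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport()            // requires a Hive-enabled Spark build
      .getOrCreate()

    // Original HiveQL: SELECT state, COUNT(*) FROM claims GROUP BY state
    // Re-expressed as RDD transformations:
    val claims = spark.table("claims").rdd
    val countsByState = claims
      .map(row => (row.getAs[String]("state"), 1L))   // key each record by state
      .reduceByKey(_ + _)                             // aggregate counts per state

    countsByState.take(10).foreach(println)
    spark.stop()
  }
}
```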
Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, YARN, Cloudera 5.13, Spark, Tableau, Microsoft Azure, Data Fabric, Data Mesh.
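A minimal sketch, assuming a hypothetical broker address, log path, and topic name, of the kind of Kafka log-producer pipeline mentioned in this role:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Broker address, log path, and topic name below are hypothetical placeholders.
object LogProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Publish each application log line as one message on the "app-logs" topic.
    scala.io.Source.fromFile("/var/log/app/app.log").getLines().foreach { line =>
      producer.send(new ProducerRecord[String, String]("app-logs", line))
    }
    producer.close()
  }
}
```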
Hadoop Developer
Confidential, New York, NY
Responsibilities:
- Worked on analyzing Cloudera and Hortonworks Hadoop clusters and different big data analytic tools, including Hive and Sqoop.
- Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, and MapReduce to ingest behavioral data into HDFS for analysis.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Experienced in using Kafka as a data pipeline between JMS and Spark Streaming applications.
- Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.
- Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
- Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
- Created Hive generic UDFs, UDAFs, and UDTFs in Java to process business logic that varies based on policy.
- Moved relational database data into Hive dynamic partition tables using Sqoop and staging tables.
- Consolidated customer data from lending, insurance, trading, and billing systems into a data warehouse and subsequently a data mart for business intelligence reporting.
- Optimized Hive queries using partitioning and bucketing techniques to control the data distribution.
- Experienced in loading streaming data into HDFS using the Kafka messaging system.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (a connector sketch follows the Environment line for this role).
- Worked with NoSQL database HBase to create tables and store data.
- Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
- Designed a technical solution for real-time analytics using Kafka and HBase.
- Created UDFs to store specialized data structures in HBase and Cassandra.
- Collaborated with business users to gather requirements for building Tableau reports per business needs.
- Experience in upgrading Apache Ambari, CDH, and HDP clusters.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Imported structured data and tables into HBase.
- Involved in Backup, HA, and DR planning of applications in AWS.
- Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra.
- Experienced with different kinds of compression techniques such as LZO, gzip, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Used AWS Patch Manager to select and deploy operating system and software patches across EC2 instances.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Set up, configured, and optimized the Cassandra cluster. Developed a real-time Java-based application to work along with the Cassandra database.
- Implemented optimized joins of different data sets to get top claims based on state using MapReduce.
- Converted queries to Spark SQL and used Parquet files as the storage format.
- Developed analytical component using Scala, Spark and Spark Stream.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala (a streaming sketch follows this list).
- Wrote Spark programs in Scala and ran Spark jobs on YARN.
- Assembled Hive and HBase with Solr to build a full pipeline for data analysis.
- Wrote a Storm topology to emit data into Cassandra.
- Implemented MapReduce programs to perform map-side joins using the distributed cache in Java. Developed unit test cases using JUnit, EasyMock, and MRUnit testing frameworks.
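The Kafka-to-HDFS streaming bullets above correspond to work along the lines of this minimal Scala sketch (spark-streaming-kafka-0-10 API); the broker, topic, group id, batch interval, and output path are assumptions for illustration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Broker, topic, group id, and HDFS path are hypothetical placeholders.
object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc = new StreamingContext(conf, Seconds(60))   // 60-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-loader",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each micro-batch of message values to a timestamped HDFS directory.
    stream.map(_.value()).saveAsTextFiles("hdfs:///data/raw/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```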
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python, Machine Learning, Microsoft Azure.
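A minimal sketch of the Spark-Cassandra Connector usage referenced in this role; the keyspace, tables, and columns (analytics, users, active_users, user_id, logins) are illustrative assumptions.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Keyspace, table, and column names here are hypothetical placeholders.
object CassandraRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spark-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)

    // Load rows from Cassandra into an RDD.
    val users = sc.cassandraTable("analytics", "users")
    val active = users.filter(_.getInt("logins") > 0)

    // Write a transformed result back to another table.
    active.map(row => (row.getString("user_id"), row.getInt("logins")))
      .saveToCassandra("analytics", "active_users", SomeColumns("user_id", "logins"))

    sc.stop()
  }
}
```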
Hadoop Developer
Confidential, McLean, VA
Responsibilities:
- Launched and configured Amazon EC2 cloud instances and S3 buckets using AWS, Ubuntu Linux, and RHEL.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
- Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
- Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch at the end of this list).
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
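The HDFS-to-RDD/DataFrame and Spark-SQL bullets above are illustrated by the minimal Scala sketch below; the input path, delimiter, schema, and query are assumptions, not the actual project data.

```scala
import org.apache.spark.sql.SparkSession

// Input path, schema, and query below are hypothetical placeholders.
object HdfsSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hdfs-spark-sql").getOrCreate()
    import spark.implicits._

    // Import raw delimited records from HDFS into an RDD, then project into a DataFrame.
    val raw = spark.sparkContext.textFile("hdfs:///data/incoming/transactions")
    val txns = raw.map(_.split(','))
      .filter(_.length >= 3)                       // basic cleaning/filtering
      .map(f => (f(0), f(1), f(2).toDouble))
      .toDF("account", "state", "amount")

    txns.createOrReplaceTempView("transactions")
    // Spark-SQL stands in for an equivalent Hive query during faster testing.
    spark.sql(
      """SELECT state, SUM(amount) AS total
        |FROM transactions
        |GROUP BY state
        |ORDER BY total DESC""".stripMargin)
      .show(10)

    spark.stop()
  }
}
```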
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.