
Hadoop Developer Resume

New York, NY


  • Around 6 years of IT experience in analysis, implementation, and testing of enterprise-wide applications, data warehouses, client-server technologies, and web-based applications.
  • Over 3 years of experience in administrative tasks such as multi-node Hadoop installation and maintenance.
  • Experience in deploying Hadoop 2.0 (YARN) and administering HBase, Hive, Sqoop, and HDFS.
  • Installed, configured, supported, and managed Apache Ambari on Hortonworks Data Platform 2.5 and Cloudera Distribution Hadoop 5.x across Linux, Rackspace, and AWS cloud infrastructure.
  • Understood the security requirements for Hadoop and integrated clusters with Kerberos infrastructure.
  • Good knowledge of Kerberos security; successfully maintained secured clusters while adding and removing nodes.
  • Hands-on experience in Linux administration activities.
  • Experience in minor and major upgrades of Hadoop and the Hadoop ecosystem.
  • Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
  • Performed Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Involved in benchmarking Hadoop/HBase cluster file systems under various batch jobs and workloads.
  • Set up Linux environments: configured password-less SSH, created file systems, disabled firewalls, and installed Java.
  • Experienced in job scheduling using Oozie.
  • Hands-on experience in analyzing log files for Hadoop ecosystem services and finding root causes.
  • Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
  • Involved with file transmission and electronic data interchange; trade capture, verification, processing, and routing operations; banking report generation; and operational management.
  • Experience in managing Hadoop clusters and integrating ecosystem components such as Hive, HBase, Sqoop, Spark, and Oozie.
  • Experienced in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
  • Performed systems analysis for several information systems, documenting and identifying performance and administrative bottlenecks.
  • Monitored platform health, generated performance reports, and provided continuous improvements.


Hadoop Developer

Confidential, New York, NY


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
  • Loaded the generated HFiles into HBase for fast access to a large customer base without taking a performance hit.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Involved in designing and creating data-ingest pipelines using technologies such as Kafka.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Created HBase tables to store various data coming from different portfolios.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Involved in managing and reviewing Hadoop log files.
  • Responsible for managing data coming from different sources.
  • Transferred the data using Informatica tool from AWS S3 to AWS Redshift. Involved in file movements between HDFS and AWS S3.
  • Created a complete processing engine, based on the Hortonworks distribution, tuned for performance.
  • Provided a batch-processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
  • Handled administration of Cassandra: installing, upgrading, and managing distributions.
  • Assisted in performing unit testing of MapReduce jobs using MRUnit.
  • Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Integrated Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
  • Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Exposure to Apache Kafka for developing data pipelines of logs as streams of messages using producers and consumers.
  • Worked with the Hue GUI for job scheduling, file browsing, and job browsing.
  • Worked with Talend on a POC for integration of data from the data lake.
  • Involved in development/implementation of Cassandra environment.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
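The data-cleaning MapReduce bullets above follow the classic map/shuffle/reduce pattern. The actual jobs were written in Java against the Hadoop API; as an illustration only, here is a minimal pure-Python sketch of the same pattern, with hypothetical record fields (portfolio, amount):

```python
from collections import defaultdict

def map_phase(records):
    """Map step: clean each raw record and emit (key, value) pairs."""
    for line in records:
        parts = line.strip().split(",")
        if len(parts) != 2:          # drop malformed records (the cleaning step)
            continue
        portfolio, amount = parts
        yield portfolio.strip().lower(), float(amount)

def reduce_phase(pairs):
    """Shuffle + reduce: group values by key, then aggregate each group."""
    groups = defaultdict(list)
    for key, value in pairs:         # shuffle: group values by key
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}  # reduce

raw = ["equities, 10.0", "bonds, 5.5", "equities, 2.5", "bad-record"]
totals = reduce_phase(map_phase(raw))
print(totals)  # {'equities': 12.5, 'bonds': 5.5}
```

In Hadoop the shuffle between the two phases is done by the framework across the cluster; here it is a single in-memory grouping.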

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, YARN, Cloudera 5.13, Spark, Tableau, Microsoft Azure, Data Fabric, Data Mesh.


Confidential, New York, NY


  • Worked on analyzing Cloudera Hadoop and Hortonworks clusters and different big data analytic tools, including Hive and Sqoop.
  • Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed a data pipeline using Flume, Sqoop, and MapReduce to ingest behavioral data into HDFS for analysis.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Created a customized BI tool for the management team to perform query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Experienced in using Kafka as a data pipeline between JMS and Spark Streaming applications.
  • Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.
  • Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Created generic Hive UDFs, UDAFs, and UDTFs in Java to process business logic that varies by policy.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
  • Consolidated customer data from lending, insurance, trading, and billing systems into a data warehouse, and subsequently into marts, for business intelligence reporting.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Loaded streaming data into HDFS using the Kafka messaging system.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Worked with the NoSQL database HBase to create tables and store data.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
  • Designed technical solutions for real-time analytics using Kafka and HBase.
  • Created UDFs to store specialized data structures in HBase and Cassandra.
  • Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
  • Upgraded Apache Ambari, CDH, and HDP clusters.
  • Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
  • Imported structured data, tables into HBase.
  • Involved in Backup, HA, and DR planning of applications in AWS.
  • Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra.
  • Experienced with different kinds of compression techniques, such as LZO, Gzip, and Snappy.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Used AWS Patch Manager to select and deploy operating system and software patches across EC2 instances.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Set up, configured, and optimized the Cassandra cluster; developed a real-time Java-based application to work along with the Cassandra database.
  • Implemented optimized joins across different data sets to get top claims by state using MapReduce.
  • Converted queries to Spark SQL, using Parquet files as the storage format.
  • Developed an analytical component using Scala, Spark, and Spark Streaming.
  • Wrote Spark programs in Scala and ran Spark jobs on YARN.
  • Integrated Hive and HBase with Solr to build a full pipeline for data analysis.
  • Wrote a Storm topology to emit data into Cassandra.
  • Implemented MapReduce programs to perform map-side joins using the distributed cache in Java. Developed unit test cases using JUnit, EasyMock, and MRUnit testing frameworks.
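The partitioning and bucketing bullets above both come down to routing rows by key: a partition groups rows by a column's value (e.g. state), while a bucket spreads rows across a fixed number of files by key hash, which is what makes bucket-based joins cheap. In Hive this is declared in DDL; purely as an illustration, here is a pure-Python sketch of the routing logic, with hypothetical columns (state, customer_id, amount):

```python
NUM_BUCKETS = 4

def route(rows):
    """Assign each row a partition (by column value) and a bucket (by key hash),
    mirroring how Hive lays out a partitioned, bucketed table on disk."""
    layout = {}
    for state, customer_id, amount in rows:
        bucket = hash(customer_id) % NUM_BUCKETS  # like CLUSTERED BY (customer_id) INTO 4 BUCKETS
        key = (state, bucket)                     # like PARTITIONED BY (state)
        layout.setdefault(key, []).append((customer_id, amount))
    return layout

rows = [("NY", 1, 10.0), ("NY", 5, 20.0), ("VA", 1, 7.5)]
layout = route(rows)
# Rows with the same customer_id land in the same bucket number in every
# partition, so a bucket-based join only has to pair up matching bucket files.
```

Partition pruning then skips whole (state, bucket) groups that a query's predicate rules out, which is the "controlling the data distribution" effect the bullets describe.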

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, Microsoft Azure.


Confidential, McLean, VA


  • Launched and configured Amazon EC2 Cloud Instances and S3 buckets using AWS, Ubuntu Linux and RHEL
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS Cloud watch.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
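Spark Streaming, referenced in the bullets above, is a micro-batching model: an unbounded stream is chopped into small batches that are each handed to the batch engine. The real jobs used the Spark Streaming API in Scala; as a toy illustration, here is a pure-Python sketch of that idea, with a hypothetical batch size and aggregation:

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Chop an unbounded iterator into fixed-size batches, the way Spark
    Streaming slices a stream into per-interval RDDs for the batch engine."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process(batch):
    """Stand-in for a Spark batch job: aggregate one micro-batch."""
    return sum(batch)

events = range(10)  # stand-in for a Kafka event stream
results = [process(b) for b in micro_batches(events, batch_size=4)]
print(results)  # [6, 22, 17]
```

In Spark the batch boundary is a time interval rather than a count, but the processing model (each slice runs through the same batch engine) is the same.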

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
