
Hadoop Developer/Admin Resume


New York

SUMMARY:

  • Over 9 years of IT experience as a Developer, Designer and Quality Tester with cross-platform integration experience using the Hadoop ecosystem, Java and software functional testing.
  • Hands-on experience installing, configuring and using the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive, HBase, Spark, Sqoop, Flume and Oozie.
  • Hands on experience using Cloudera and Hortonworks Hadoop Distributions.
  • Strong understanding of various Hadoop services, MapReduce and YARN architecture.
  • Responsible for writing Map Reduce programs.
  • Experienced in importing and exporting data to and from HDFS using Sqoop.
  • Experience loading data into Hive partitions and creating buckets in Hive (see the sketch after this list).
  • Developed MapReduce jobs to automate data transfer from HBase.
  • Expertise in analysis using Pig, Hive and MapReduce.
  • Experienced in developing UDFs for Hive and Pig using Java.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Scheduled all Hadoop, Hive, Sqoop and HBase jobs using Oozie.
  • Experience setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
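
Illustrative sketch for the Hive partition loading mentioned above: a minimal Spark-with-Scala job (one common way to load a partitioned Hive table) that writes a DataFrame into a Hive table using dynamic partitions. The HDFS path, database, table and column names (analytics.events_partitioned, event_date) are hypothetical placeholders, not details from an actual engagement.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object HivePartitionLoad {
      def main(args: Array[String]): Unit = {
        // SparkSession with Hive support so saveAsTable writes to the Hive metastore.
        val spark = SparkSession.builder()
          .appName("hive-partition-load")
          .enableHiveSupport()
          .config("hive.exec.dynamic.partition.mode", "nonstrict")
          .getOrCreate()

        // Hypothetical source path and layout; replace with the real feed.
        val events = spark.read
          .option("header", "true")
          .csv("hdfs:///data/raw/events/")

        // Write into a Hive table partitioned by event_date; Hive bucketing
        // would be declared in the table DDL (CLUSTERED BY ... INTO n BUCKETS).
        events.write
          .mode(SaveMode.Append)
          .partitionBy("event_date")
          .saveAsTable("analytics.events_partitioned")

        spark.stop()
      }
    }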

TECHNICAL SKILLS:

Hadoop/Big Data: Hadoop, Map Reduce, HDFS, Zookeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, Yarn, HBase, Spark with Scala.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

PROFESSIONAL EXPERIENCE:

Confidential, New York

Hadoop Developer/Admin

Responsibilities:

  • Worked on installing Kafka on Virtual Machine.
  • Created topics for different users.
  • Installed ZooKeeper, Kafka brokers, Schema Registry and Control Center on multiple machines.
  • Set up ACL/SSL security for different users and assigned users to multiple topics.
  • Assigned topic access to users through their individual logins.
  • Created process documentation and server diagrams, prepared server requisition documents, and uploaded them to SharePoint.
  • Used Puppet to automate deployments to the servers.
  • Monitored errors and warnings on the servers using Splunk.
  • Set up the machines with network controls, static IPs, disabled firewalls and swap memory.
  • Created a POC on AWS based on the services required by the project.
  • Created a POC on Hortonworks and recommended best practices for the HDP and HDF platforms and NiFi.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed Ambari server on the cloud instances.
  • Set up security using Kerberos and AD on Hortonworks clusters.
  • Managed cluster configuration to meet the needs of the analysis, whether I/O-bound or CPU-bound.
  • Worked on setting up high availability for the major production cluster. Performed Hadoop version updates using automation tools.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Automated the setup of Hadoop clusters and the creation of nodes.
  • Monitored CPU utilization improvements and maintained them.
  • Performance-tuned and managed growth of the OS, disk usage and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system to HDFS.
  • Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for the consumer products.
  • Analyzed the latest Big Data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Optimized and tuned the application.
  • Created user guides and training overviews for supporting teams.
  • Provided troubleshooting and best-practice methodology for development teams, including process automation and new application onboarding.
  • Designed monitoring solutions and baseline statistics reporting to support the implementation.
  • Experience designing and building solutions for both real-time and batch data ingestion using Sqoop, Pig, Impala and Kafka.
  • Extremely good knowledge of and experience with MapReduce, Spark Streaming and Spark SQL for data processing and reporting.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch after this section).
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Used Apache Kafka to import real-time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Optimized MapReduce code by writing Pig Latin scripts.
  • Imported data from external tables into Hive using the LOAD command.
  • Created tables in Hive and used static and dynamic partitioning as a data-slicing mechanism.
  • Working experience monitoring the cluster, identifying risks, and establishing good practices to be followed in a shared environment.
  • Good understanding of cluster configuration and resource management using YARN.

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NIFI, Linux, Splunk, Java, Puppet, Apache Yarn, Pig, Spark.
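
Sketch for the Spark Streaming and Kafka ingestion bullets above: a minimal Scala application, assuming the spark-streaming-kafka-0-10 integration, that reads a Kafka topic in 30-second micro-batches and persists each batch to HDFS. The broker addresses, topic name (network-logs) and HDFS paths are hypothetical.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaLogIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-log-ingest")
        // Each micro-batch covers 30 seconds of incoming log records.
        val ssc = new StreamingContext(conf, Seconds(30))

        // Hypothetical broker list and consumer settings; substitute real values.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092,broker2:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "network-log-ingest",
          "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("network-logs"), kafkaParams))

        // Persist each micro-batch of log lines to HDFS as text files.
        stream.map(record => record.value)
          .foreachRDD { (rdd, time) =>
            if (!rdd.isEmpty()) {
              rdd.saveAsTextFile(s"hdfs:///data/network-logs/batch-${time.milliseconds}")
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }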

Confidential, New York

Hadoop Developer/Admin

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created a POC on Hortonworks and recommended best practices for the HDP and HDF platforms.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed Ambari server on the cloud instances.
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
  • Assigned access to users through their individual logins.
  • Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extensively used Cloudera Manager for managing multiple clusters with petabytes of data.
  • Knowledgeable in documenting processes, creating server diagrams and preparing server requisition documents.
  • Set up the machines with network controls, static IPs, disabled firewalls and swap memory.
  • Managed cluster configuration to meet the needs of the analysis, whether I/O-bound or CPU-bound.
  • Worked on setting up high availability for the major production cluster. Performed Hadoop version updates using automation tools.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Automated operations, installation and monitoring of the Hadoop framework, specifically HDFS, MapReduce, YARN and HBase.
  • Automated the setup of Hadoop clusters and the creation of nodes.
  • Monitored CPU utilization improvements and maintained them.
  • Performance-tuned and managed growth of the OS, disk usage and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system to HDFS.
  • Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for the consumer products.
  • Analyzed the latest Big Data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of MapReduce jobs.
  • Responsible for managing data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Job management using Fair scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Worked on tuning Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they map to MapReduce jobs.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Created Oozie workflows to run multiple MapReduce, Hive and Pig jobs.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Involved in the development of a Spark Streaming application for one of the data sources, using Scala and Spark and applying the required transformations.
  • Imported data from different sources such as HDFS and MySQL into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this section).

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn.
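
Sketch for the Hive-to-Spark conversion bullet above: the same hypothetical HiveQL aggregation expressed once through spark.sql and once as DataFrame transformations. The orders table and its columns are illustrative placeholders, not from an actual data set.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, sum}

    object HiveQueryToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-query-to-spark")
          .enableHiveSupport()
          .getOrCreate()

        // The original HiveQL aggregation, run through Spark SQL
        // against a hypothetical Hive table (orders).
        val viaSql = spark.sql(
          """SELECT customer_id, SUM(amount) AS total_amount
            |FROM orders
            |WHERE order_date >= '2017-01-01'
            |GROUP BY customer_id""".stripMargin)

        // The same query expressed as DataFrame transformations.
        val viaDataFrame = spark.table("orders")
          .filter(col("order_date") >= "2017-01-01")
          .groupBy("customer_id")
          .agg(sum("amount").as("total_amount"))

        viaSql.show(10)
        viaDataFrame.show(10)
        spark.stop()
      }
    }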

Confidential, New York

Hadoop Developer

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Performed S3 bucket creation and policy setup, including IAM role-based policies and customization of the JSON templates.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed servers on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
  • Developed Pig scripts to transform raw data into meaningful data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modellers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
  • Installed Hadoop, MapReduce and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in creating Hive tables, loading data, and writing Hive queries and Pig scripts.
  • Assisted in upgrading, configuring and maintaining various Hadoop infrastructure components such as Pig, Hive and HBase.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch after this section).
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Worked on tuning Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they map to MapReduce jobs.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.
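
Sketch for the Spark RDD in-memory computation bullet above: a minimal Scala job that loads a hypothetical delimited clickstream file from HDFS, caches the parsed pairs in memory, aggregates per user and writes the result back to HDFS. The paths and record layout are assumptions made for illustration only.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddAggregation {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("rdd-aggregation")
        val sc = new SparkContext(conf)

        // Hypothetical pipe-delimited input on HDFS: userId|pageId|durationMs
        val lines = sc.textFile("hdfs:///data/clickstream/")

        // Parse, keep well-formed records, and cache the pairs in memory.
        val pairs = lines
          .map(_.split('|'))
          .filter(_.length == 3)
          .map(fields => (fields(0), fields(2).toLong))
          .cache()

        // In-memory aggregation: total duration per user.
        val totalsPerUser = pairs.reduceByKey(_ + _)

        // Persist the aggregated response back to HDFS.
        totalsPerUser.saveAsTextFile("hdfs:///data/clickstream-output/")

        sc.stop()
      }
    }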
