- Around 7 years of experience in Information Technology, including Big Data and the Hadoop ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, and ZooKeeper.
- Cloudera Certified Administrator for Apache Hadoop (CCAH)
- 5 years of experience installing, configuring, and testing Hadoop ecosystem components.
- Experience in writing MapReduce programs in Java.
- Hands-on experience with 5 TB of data covering data migration, preprocessing, validation, and analysis in HDFS.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance across different Hadoop distributions: Cloudera CDH and Hortonworks HDP.
- Strong knowledge in Administration and installation of Hadoop ecosystem in a multi node cluster.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Good understanding of HDFS design, daemons, federation, and HDFS high availability (HA).
- Experience in managing Hadoop clusters using Cloudera Manager and Ambari.
- Experience in importing and exporting data using Sqoop from Relational Database Systems (RDBMS) to HDFS.
- Hands-on experience with the Java build tool Apache Maven.
- Knowledge of UNIX and shell scripting.
- Good knowledge in integration of various data sources like RDBMS, Spreadsheets, Text files and XML files.
- In-depth knowledge of object-oriented programming (OOP) methodologies and features such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Experienced in configuring Workflow scheduling using Oozie.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop File System (HDFS).
- Designed and implemented a hybrid-cloud virtual data center on AWS providing servers, storage, and networking, along with high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Cluster planning and engineering of POC and Production Clusters.
- Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners, and partitioners to deliver the best results for large datasets.
- Used Apache Kafka for tracking data ingestion into the Hadoop cluster.
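As an illustration of the Sqoop-based RDBMS-to-HDFS ingestion mentioned above, a typical import looks like the following sketch (the connection string, table, and directory names are hypothetical placeholders, not from any actual engagement):

```shell
# Illustrative Sqoop import: pulls a `customers` table from MySQL into HDFS
# as Avro files, using 4 parallel mappers split on the primary key.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table customers \
  --split-by customer_id \
  --num-mappers 4 \
  --as-avrodatafile \
  --target-dir /data/raw/customers
```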
Operating systems: WINDOWS, LINUX (Fedora, CentOS), UNIX
Languages and Technologies: C, C++, Java, SQL, PLSQL
Scripting Languages: Shell scripting
Databases: Oracle, MySQL, PostgreSQL
IDE: Eclipse, NetBeans, SBT
Application Servers: Apache Tomcat, Apache HTTP Server
Hadoop Ecosystem: Hadoop MapReduce, HDFS, Flume, Sqoop, Hive, Pig, Oozie, Cloudera Manager, ZooKeeper, AWS EC2
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala, Spark with Python
Cluster Mgmt. & Monitoring: Cloudera Manager, Hortonworks Ambari, Ganglia, Nagios
Confidential, Buffalo, NEW YORK
Senior Big Data / Hadoop Developer
- Hands on Experience in installing, configuring and using ecosystem components like MapReduce, HDFS, Pig, Hive, Sqoop and Flume.
- Experience using the Hortonworks Data Platform (HDP).
- Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked in an AWS environment for development and deployment of custom Hadoop applications.
- Experience working with Apache NiFi.
- Developed Pig UDFs to pre-process data for analysis.
- Experience in implementing ETL/ELT processes with MapReduce, Pig, and Hive.
- Used Sqoop to import customer information from a MySQL database into HDFS for data processing.
- Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Experience using Sequence, Avro, and Parquet file formats; managed and reviewed Hadoop log files.
- Dealt with high volumes of data in the cluster.
- Collected log data from web servers and integrated it into HDFS using Apache Flume.
- Tuned the developed ETL jobs for better performance.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Commissioned and decommissioned nodes on a CDH5 Hadoop cluster on Red Hat Linux.
- Good troubleshooting skills in Hue, which provides a GUI for developers and business users for day-to-day activities.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Used Oozie and Zookeeper for workflow scheduling and monitoring.
- Created Hive Managed and External tables defined with static and dynamic partitions.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark Streaming to ingest web server log files.
- Integrated Apache Spark with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
Environment: AWS, Hadoop, CDH4, CDH5, Cloudera Manager, Hortonworks Data Platform, MapReduce, Hive, Pig, Spark, Scala, Talend, SQL, Sqoop, Flume, Eclipse.
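The Hive external tables with static and dynamic partitions described in this role might be defined along the following lines (database, table, and column names are illustrative, not taken from the actual project):

```sql
-- Illustrative HiveQL: external table over HDFS data, partitioned by date.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

CREATE EXTERNAL TABLE IF NOT EXISTS clickstream (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET
LOCATION '/data/warehouse/clickstream';

-- Dynamic-partition insert: Hive routes each row to its event_date partition,
-- because the partition column comes last in the SELECT list.
INSERT OVERWRITE TABLE clickstream PARTITION (event_date)
SELECT user_id, url, ts, to_date(ts) AS event_date
FROM clickstream_staging;
```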
Confidential, Des Moines, IA
Senior Hadoop Developer
- Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Good expertise in ETL and reporting tools (Ab Initio, Informatica, Talend).
- Responsible for cluster maintenance and monitoring: commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Hands-on experience writing MR jobs to cleanse data and copy it from our cluster to the AWS cluster.
- Used the Hortonworks Data Platform.
- Experienced in adding, installing, and removing components through Ambari.
- Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie.
- Worked with ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Developed Spark applications using Scala.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Configured a MySQL database to store Hive metadata.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Responsible for loading unstructured data into Hadoop File System (HDFS).
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modification using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on MapReduce joins for querying multiple semi-structured datasets as per analytic needs.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
Environment: Hortonworks, Hadoop, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle, Spark, Scala, Storm, Elasticsearch, ZooKeeper, Oozie, Red Hat Linux, Tableau.
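The Oozie coordinator jobs and bundles mentioned in this role are driven by XML definitions; a minimal coordinator sketch (the app name, dates, and workflow path are hypothetical placeholders) might look like:

```xml
<!-- Illustrative Oozie coordinator: triggers a workflow once per day -->
<coordinator-app name="daily-etl" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/etl/workflow.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
```

A bundle definition would then group several such coordinators so they can be started and stopped together.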
Confidential, Seattle, WA
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, HBase, ZooKeeper, and Sqoop.
- Extensively involved in installation and configuration of the Cloudera distribution of Hadoop: NameNode, Secondary NameNode, ResourceManager, NodeManagers, and DataNodes.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Worked with Kafka for the proof of concept for carrying out log processing on distributed system.
- Developed a data pipeline using Flume, Sqoop, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Worked with the Hue interface for querying data.
- Automated system tasks using Puppet.
- Created Hive tables to store the processed results in a tabular format.
- Utilized cluster coordination services through ZooKeeper.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Nagios.
- Configured Sqoop and exported/imported data into HDFS.
- Configured NameNode high availability and NameNode federation.
- Experienced in loading data from UNIX local file system to HDFS.
- Managing and scheduling Jobs on a Hadoop cluster using Oozie.
- Used Sqoop to import and export data between HDFS and relational databases.
- Performed data analysis by running Hive queries.
- Generated reports using the Tableau report designer.
Environment: HDFS, Cloudera Manager, MapReduce, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Puppet, Tableau, and Java.
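The NameNode high availability configured in this role is expressed in hdfs-site.xml; a trimmed sketch (the nameservice and host names below are placeholders) might look like:

```xml
<!-- Illustrative hdfs-site.xml fragment for NameNode HA with two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2.example.com:8020</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

A full HA setup also needs shared edits storage (e.g. a JournalNode quorum) and a client failover proxy provider, omitted here for brevity.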
- Re-architected all the applications to use the latest infrastructure within a span of three months and helped developers implement them successfully.
- Designed the Hadoop jobs to create the product recommendation using collaborative filtering.
- Designed the COSA pretest utility framework using JSF MVC, JSF validation, tag libraries, and JSF backing beans.
- Integrated the Order Capture system with Sterling OMS using JSON Web service
- Configured the ESB to transform the Order capture XML to Sterling message.
- Configured and Implemented Jenkins, Maven and Nexus for continuous integration.
- Mentored the team on, and implemented, test-driven development (TDD) strategies.
- Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.
- Developed data transformation scripts using Hive and MapReduce.
- Designed and implemented an Open API using Spring REST web services.
- Proposed an integration-pipeline testing strategy using Cargo.