- Around 7 years of experience in Information Technology, including Big Data and the Hadoop ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, and ZooKeeper.
- Cloudera Certified Administrator for Apache Hadoop (CCAH)
- 5 years of experience installing, configuring, and testing Hadoop ecosystem components.
- Experience in writing MapReduce programs in Java.
- Hands-on experience with 5 TB of data covering data migration, preprocessing, validation, and analysis in HDFS.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance across different Hadoop distributions: Cloudera CDH and Hortonworks HDP.
- Strong knowledge in Administration and installation of Hadoop ecosystem in a multi node cluster.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Good understanding of HDFS design, daemons, federation, and HDFS high availability (HA).
- Experience in managing Hadoop clusters using Cloudera Manager and Ambari.
- Experience in importing and exporting data using Sqoop from Relational Database Systems (RDBMS) to HDFS.
- Hands-on experience with the Java build tool Apache Maven.
- Knowledge of UNIX and shell scripting.
- Good knowledge in integration of various data sources like RDBMS, Spreadsheets, Text files and XML files.
- In-depth knowledge of object-oriented programming (OOP) methodologies and features such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Experienced in configuring Workflow scheduling using Oozie.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop File System (HDFS).
- Designed and implemented a hybrid-cloud virtual data center on AWS providing servers, storage, and networking, along with high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Cluster planning and engineering of POC and Production Clusters.
- Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners, and partitioners to deliver the best results for large datasets.
- Used Apache Kafka for tracking data ingestion into the Hadoop cluster.
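As an illustration of the Sqoop-based RDBMS-to-HDFS ingestion mentioned above, a typical import looks like the following sketch (the connection string, table, and directory names are hypothetical placeholders, not from any actual engagement):

```shell
# Illustrative Sqoop import: pulls a `customers` table from MySQL into HDFS
# as Avro files, using 4 parallel mappers split on the primary key.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table customers \
  --split-by customer_id \
  --num-mappers 4 \
  --as-avrodatafile \
  --target-dir /data/raw/customers
```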
Operating systems: WINDOWS, LINUX (Fedora, CentOS), UNIX
Languages and Technologies: C, C++, Java, SQL, PLSQL
Scripting Languages: Shell scripting
Databases: Oracle, MySQL, PostgreSQL
IDE: Eclipse, NetBeans, SBT
Application Servers: Apache Tomcat, Apache HTTP Server
Hadoop Ecosystem: Hadoop MapReduce, HDFS, Flume, Sqoop, Hive, Pig, Oozie, Cloudera Manager, ZooKeeper, AWS EC2
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala, Spark with Python
Cluster Mgmt. & Monitoring: Cloudera Manager, Hortonworks Ambari, Ganglia, Nagios
Confidential, Buffalo, NEW YORK
Senior Big Data / Hadoop Developer
- Hands on Experience in installing, configuring and using ecosystem components like MapReduce, HDFS, Pig, Hive, Sqoop and Flume.
- Experience using the Hortonworks Data Platform (HDP).
- Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked in an AWS environment for development and deployment of custom Hadoop applications.
- Experience working with Apache NiFi.
- Developed Pig UDFs to pre-process data for analysis.
- Experience in implementing ETL/ELT processes with MapReduce, Pig, and Hive.
- Used Sqoop to import customer information from a MySQL database into HDFS for data processing.
- Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Experience using Sequence, Avro, and Parquet file formats; managed and reviewed Hadoop log files.
- Dealt with high volumes of data in the cluster.
- Collected log data from web servers and integrated it into HDFS using Apache Flume.
- Tuned the developed ETL jobs for better performance.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Commissioned and decommissioned nodes on a CDH5 Hadoop cluster on Red Hat Linux.
- Good troubleshooting skills in Hue, which provides a GUI for developers and business users for day-to-day activities.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Used Oozie and Zookeeper for workflow scheduling and monitoring.
- Created Hive Managed and External tables defined with static and dynamic partitions.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark Streaming to ingest web server log files.
- Integrated Apache Spark with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
Environment: AWS, Hadoop, CDH4, CDH5, Cloudera Manager, Hortonworks Data Platform, MapReduce, Hive, Pig, Spark, Scala, Talend, SQL, Sqoop, Flume, Eclipse.
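The Hive external tables with static and dynamic partitions described in this role might be defined along the following lines (database, table, and column names are illustrative, not taken from the actual project):

```sql
-- Illustrative HiveQL: external table over HDFS data, partitioned by date.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

CREATE EXTERNAL TABLE IF NOT EXISTS clickstream (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET
LOCATION '/data/warehouse/clickstream';

-- Dynamic-partition insert: Hive routes each row to its event_date partition,
-- because the partition column comes last in the SELECT list.
INSERT OVERWRITE TABLE clickstream PARTITION (event_date)
SELECT user_id, url, ts, to_date(ts) AS event_date
FROM clickstream_staging;
```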
Confidential, Des Moines, IA
Senior Hadoop Developer
- Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Good expertise in ETL and reporting tools (Ab Initio, Informatica, Talend).
- Responsible for cluster maintenance and monitoring: commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Hands-on experience writing MR jobs to cleanse data and copy it from our cluster to the AWS cluster.
- Used the Hortonworks Data Platform.
- Experienced in adding, installing, and removing components through Ambari.
- Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie.
- Worked with ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Developed Spark applications using Scala.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Configured a MySQL database to store Hive metadata.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Responsible for loading unstructured data into Hadoop File System (HDFS).
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modification using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on MapReduce joins for querying multiple semi-structured datasets as per analytic needs.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
Environment: Hortonworks, Hadoop, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle, Spark, Scala, Storm, Elasticsearch, ZooKeeper, Oozie, Red Hat Linux, Tableau.
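The Oozie coordinator jobs and bundles mentioned in this role are driven by XML definitions; a minimal coordinator sketch (the app name, dates, and workflow path are hypothetical placeholders) might look like:

```xml
<!-- Illustrative Oozie coordinator: triggers a workflow once per day -->
<coordinator-app name="daily-etl" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/etl/workflow.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
```

A bundle definition would then group several such coordinators so they can be started and stopped together.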
Confidential, Seattle, WA
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, HBase, ZooKeeper, and Sqoop.
- Extensively involved in installation and configuration of the Cloudera distribution of Hadoop: NameNode, Secondary NameNode, ResourceManager, NodeManagers, and DataNodes.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Worked with Kafka for the proof of concept for carrying out log processing on distributed system.
- Developed a data pipeline using Flume, Sqoop, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Worked with the Hue interface for querying data.
- Automated system tasks using Puppet.
- Created Hive tables to store the processed results in a tabular format.
- Utilized cluster coordination services through ZooKeeper.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Nagios.
- Configured Sqoop and exported/imported data into HDFS.
- Configured NameNode high availability and NameNode federation.
- Experienced in loading data from UNIX local file system to HDFS.
- Managing and scheduling Jobs on a Hadoop cluster using Oozie.
- Used Sqoop to import and export data between HDFS and relational databases.
- Performed data analysis by running Hive queries.
- Generated reports using the Tableau report designer.
Environment: HDFS, Cloudera Manager, MapReduce, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Puppet, Tableau, and Java.
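The NameNode high availability configured in this role is expressed in hdfs-site.xml; a trimmed sketch (the nameservice and host names below are placeholders) might look like:

```xml
<!-- Illustrative hdfs-site.xml fragment for NameNode HA with two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2.example.com:8020</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

A full HA setup also needs shared edits storage (e.g. a JournalNode quorum) and a client failover proxy provider, omitted here for brevity.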
- Re-architected all the applications to use the latest infrastructure within a span of three months and helped developers implement them successfully.
- Designed the Hadoop jobs to create the product recommendation using collaborative filtering.
- Designed the COSA pretest utility framework using JSF MVC, JSF validation, tag libraries, and JSF backing beans.
- Integrated the Order Capture system with Sterling OMS using JSON Web service
- Configured the ESB to transform the Order capture XML to Sterling message.
- Configured and Implemented Jenkins, Maven and Nexus for continuous integration.
- Mentored the team on, and implemented, test-driven development (TDD) strategies.
- Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.
- Developed data transformation scripts using Hive and MapReduce.
- Designed and implemented an Open API using Spring REST web services.
- Proposed an integration-pipeline testing strategy using Cargo.