- Over 7 years of IT experience, including 4+ years of Big Data experience.
- Experience in installing, configuring, and troubleshooting Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Impala, Pig, Flume, Hive, HBase, and ZooKeeper.
- Experience with Hadoop distributions CDH3, CDH4, and CDH5.
- Experience in upgrading existing Hadoop clusters to the latest releases.
- Experienced in using NFS (Network File System) for NameNode metadata backup.
- Experience in using Cloudera Manager 4.x and 5.x for installation and management of Hadoop clusters.
- Experience in supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud.
- Experience exporting data to and importing data from Amazon S3.
- Configured NameNode HA on the existing Hadoop cluster using a ZooKeeper quorum.
- Expertise in writing UNIX shell scripts in ksh and bash.
- Experienced in developing and implementing MapReduce jobs in Java to process and perform various analytics on large datasets.
- Good experience in writing Pig Latin scripts and Hive queries
- Good understanding of Data Structure and Algorithms
- Good experience in developing ETL scripts for data cleansing and transformation.
- Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience in supporting analysts by administering and configuring HIVE.
- Hands-on programming experience in technologies such as Java, JSP, Servlets, SQL, JDBC, HTML, XML, and REST, using Eclipse and Visual Studio on Windows.
- Experience writing SQL queries and working with Oracle and MySQL.
- Expertise in object-oriented analysis and design (OOAD), UML, and the use of various design patterns.
- Worked with end users on requirements gathering, user experience, and issue resolution.
- Experience in preparing deployment packages, deploying to Dev and QA environments, and preparing deployment instructions for the production deployment team.
- Team player with excellent analytical, communication and project documentation skills
- Experienced in Agile methodology and iterative development.
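As an illustration of the kind of ETL cleansing-and-transformation script mentioned above (the field names `customer_id` and `email` are hypothetical placeholders, not values from any actual project):

```python
import csv
import io

def cleanse(rows):
    """Minimal data-cleansing pass: trim whitespace, normalize case,
    and drop records missing a required field."""
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        if not row.get("customer_id"):   # required field: skip incomplete records
            continue
        row["email"] = row["email"].lower()
        yield row

# Local dry run on an in-memory CSV; a real job would read from HDFS or a staging area.
raw = io.StringIO("customer_id,email\n 42 , Bob@Example.COM \n,missing@id\n")
cleaned = list(cleanse(csv.DictReader(raw)))
```

The same generator pattern scales to streaming large files line by line without loading them into memory.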
Programming Languages: Java, C, C++, SQL
Hadoop Ecosystem: HDFS, Hive, Pig, Flume, Impala, Oozie, ZooKeeper, HBase and Sqoop.
Operating Systems: Linux, Windows XP, Windows Server 2003, Windows Server 2008.
Databases: Oracle, MySQL and SQL Server.
Tools: Ant, Maven, TOAD, ArgoUML, WinSCP, PuTTY
Version Control: VSS, SVN and CVS
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop ecosystem components on a 50-node production cluster.
- Installed and configured the Hadoop NameNode HA service using ZooKeeper.
- Installed and configured Hadoop security and access controls using Kerberos and Active Directory.
- Responsible for managing data coming from different sources into HDFS through Sqoop and Flume.
- Troubleshooting and monitoring Hadoop services using Cloudera Manager.
- Monitoring and tuning MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Developed several MapReduce programs for data preprocessing.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet the business requirements.
- Moved data between RDBMS and Hive/HDFS in both directions using Sqoop.
- Prepared System Design document with all functional implementations.
- Involved in data modeling sessions to develop models for Hive tables.
- Analyzed the existing enterprise data warehouse setup and provided design and architecture recommendations for converting it to Hadoop using MapReduce, Hive, Sqoop, and Pig Latin.
- Converted existing ETL logic to Hadoop mappings.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Parsed XML files using MapReduce to extract sales-related attributes and store them in HDFS.
- Involved in building TBUILD scripts to import data from Teradata using Teradata Parallel Transporter (TPT) APIs.
Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, XML, Cloudera Manager, Teradata
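A minimal sketch of the kind of scheduled Sqoop import described above, as a Python driver that assembles the command line. The JDBC URL, table, paths, and credentials are hypothetical placeholders; the flags shown (`--connect`, `--table`, `--target-dir`, etc.) are standard Sqoop 1 import options.

```python
import subprocess

def build_sqoop_import(jdbc_url, table, target_dir, username, password_file):
    """Assemble the argument list for a Sqoop import from MySQL into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--username", username,
        "--password-file", password_file,  # keep credentials out of the command line
        "--target-dir", target_dir,
        "--num-mappers", "4",              # parallel map tasks for the import
    ]

cmd = build_sqoop_import(
    "jdbc:mysql://dbhost:3306/sales",      # hypothetical host and schema
    "orders", "/data/raw/orders", "etl_user", "/user/etl/.pw")
# subprocess.run(cmd, check=True)          # would execute on a cluster edge node
```

Wrapping the command in a function makes it easy to schedule the same import for many tables from cron or an Oozie shell action.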
Hadoop DevOps Consultant
- Development and ETL Design in Hadoop
- Developed a custom MapReduce InputFormat to read a Visa-specific data format.
- Performance tuning of Hive queries written by data analysts.
- Developed Hive queries and UDFs as per requirements.
- Migrated existing Ab Initio transformation logic to Hadoop Pig Latin and UDFs.
- Used Sqoop to efficiently transfer data from DB2 to HDFS and from Oracle Exadata to HDFS.
- Designed ETL flows for several newly onboarding Hadoop applications.
- Worked on implementing Hadoop Streaming and Python MapReduce jobs for Visa analytics.
- Implemented NLineInputFormat to split a single file into multiple small input splits.
- Designed and developed Oozie workflows with HCatalog/Pig integration.
- Documented ETL best practices to be implemented with Hadoop.
- Monitored and debugged Hadoop jobs/applications running in production.
- Worked on the Hadoop CDH upgrade from CDH4.x to CDH5.x.
- Provided user support and application support on the Hadoop infrastructure.
- Reviewed ETL application use cases before onboarding to Hadoop.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped the testing team get up to speed on Hadoop application testing.
- Worked on integration of HiveServer2 with Tableau.
- Worked on Impala performance tuning with different workloads and file formats.
- Installed a 20-node UAT Hadoop cluster.
- Worked on a proof of concept (POC) of Talend integration with Hadoop.
Technologies: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, DB2, Oracle, XML, Cloudera Manager
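The Hadoop Streaming / Python MapReduce work mentioned above can be sketched as a word-count-style pair of map and reduce stages. This is a generic illustration, not code from the engagement; on a cluster the two functions would each run as a separate task wired up through `hadoop-streaming.jar`, reading stdin and writing stdout.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit one tab-separated (key, 1) pair per token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_pairs):
    """Reduce phase: sum counts per key. Hadoop Streaming delivers reducer
    input sorted by key, which is what lets groupby() work here."""
    split = (pair.split("\t") for pair in sorted_pairs)
    for key, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"

# Local dry run: sorting between the stages stands in for the shuffle phase.
counts = list(reducer(sorted(mapper(["visa debit visa", "credit debit"]))))
```

Because both stages are plain stdin/stdout filters, the same script can be tested locally with `cat input | mapper | sort | reducer` before submitting it to the cluster.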
- Worked with the Hadoop FileSystem Java API to compute disk usage statistics.
- Developed Hive queries for transformations, aggregations, and mappings on the Maps data.
- Developed MapReduce programs for applying business rules to the data.
- Developed and executed Hive queries for denormalizing the data.
- Automated workflows using shell scripts.
- Performed performance tuning on Hive queries.
- Involved in migration of data from one Hadoop cluster to another.
- Worked on configuring multiple MapReduce pipelines for the new Hadoop cluster.
- Worked on configuring the new Hadoop cluster.
Environment: Java, CDH, Hadoop, HDFS, MapReduce, Hive and Sqoop
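The disk-usage statistics task above used Hadoop's FileSystem Java API (the equivalent of `hadoop fs -du -s`); a local-filesystem analogue of the same computation, shown here only to illustrate the recursive-summing logic, looks like this:

```python
import os
import tempfile

def du(path):
    """Total bytes under `path`, analogous to FileSystem#getContentSummary
    (or `hadoop fs -du -s`) but walking the local filesystem."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Quick local demo: two small files totalling 15 bytes.
demo = tempfile.mkdtemp()
for name, size in [("a.log", 10), ("b.log", 5)]:
    with open(os.path.join(demo, name), "wb") as f:
        f.write(b"x" * size)
total_bytes = du(demo)
```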
- Developed user interface screens using Swing to accept various system inputs, such as contractual terms and monthly data pertaining to production, inventory, and transportation.
- Involved in designing database connections using JDBC.
- Created tables and stored procedures for data manipulation and retrieval in SQL Server 2000; performed database modifications in Oracle using SQL, PL/SQL, stored procedures, triggers, and views.
- Developed the business components (in core Java) used for the calculation module (calculating various entitlement attributes).
- Involved in the logical and physical database design and implemented it by creating suitable tables, views and triggers.
- Created the related procedures and functions used by JDBC calls in the above components.
- Involved in fixing bugs and minor enhancements for the front-end modules.
Environment: Java, HTML, JavaScript, CSS, Oracle, JDBC, Swing and Eclipse.