- Experience in designing and implementing large-scale data processing, data storage and distributed data systems.
- Expertise in working with large data sets on the Hadoop Distributed File System (HDFS) using MapReduce, Hive, Pig, Sqoop, Flume and Oozie to build robust Big Data solutions.
- Efficient in building Hive and Pig scripts over data captured via Sqoop from relational databases that provide SQL interfaces.
- Strong understanding of Big Data analytics platforms and ETL in the context of Big Data.
- Experienced in designing scalable Hadoop clusters and tuning configuration parameters to arrive at values for optimal cluster performance.
- Expertise in data cleansing and data mining; built Hive and Pig scripts to load data sets into Hive for ETL (Extract, Transform, Load) operations.
- Developed a free-text search solution with Hadoop and SOLR.
- Strong technical background with the ability to perform business analysis; understood business requirements and was extensively involved in client deliverables.
- Involved in preparing technical and functional documentation in line with project requirements.
- Extensive ETL development and deployment experience in the Banking, Insurance and Infrastructure domains, with a strong understanding of data analytics; loaded data using Informatica and SSIS.
- Flexible, highly motivated and effective leader with excellent analytical, troubleshooting and problem-solving skills; develops creative solutions for challenging client requirements.
- Hadoop: HDFS, MapReduce, Hive, Pig, Flume, Mahout, Avro, Oozie, ZooKeeper and SOLR.
- Dev. Tools: CentOS, Ubuntu, Linux, Eclipse, Xcode
- Languages: Java, Python, C/C++
- ETL Tools: SSIS 2012/2008/2005, Informatica 7.1
- Databases: Oracle 11g/8i, SQL Server 2012/2008
- Knowledge in NoSQL: HBase, MongoDB and Cassandra.
- Knowledge of Hortonworks, Cloudera, Tez, Ambari, Spark and Storm.
Role : Big Data Lead Consultant
Environment : Hadoop, MapReduce, Flume, Hive, Pig, Sqoop, Oozie, Ubuntu and CentOS
- Coordinated with the business to understand analytics requirements.
- Extracted log files from different sources and loaded them into HDFS for analysis using Flume.
- Created Pig scripts implementing the required logic and computations.
- Automated jobs that pull data from different sources into HDFS tables using Oozie workflows.
- Interfaced with SMEs, the analytics team, account managers and domain architects to review the to-be-developed solution.
- Converted the high-level solution into deliverables by generating a controllable, manageable activity list.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
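The Flume log-ingestion step described above can be sketched as an agent configuration. This is a minimal illustration, assuming a tail-style exec source and an HDFS sink; the agent, channel and path names are hypothetical, not the project's actual setup.

```properties
# Hypothetical Flume agent: tails an application log and lands events in HDFS.
# All names and paths below are illustrative assumptions.
agent1.sources = applog
agent1.channels = memch
agent1.sinks = hdfssink

# Source: tail a local application log
agent1.sources.applog.type = exec
agent1.sources.applog.command = tail -F /var/log/app/app.log
agent1.sources.applog.channels = memch

# Channel: buffer events in memory
agent1.channels.memch.type = memory
agent1.channels.memch.capacity = 10000

# Sink: write events into HDFS, partitioned by date
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.channel = memch
agent1.sinks.hdfssink.hdfs.path = /data/logs/%Y-%m-%d
agent1.sinks.hdfssink.hdfs.fileType = DataStream
agent1.sinks.hdfssink.hdfs.useLocalTimeStamp = true
```

With a layout like this, the downstream Pig scripts read the dated HDFS directories rather than touching the source machines directly.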
Use Case - 1
- Integrated home loan credit data by loading it into staging using ETL.
- Fetched the ETL stage data from the RDBMS and moved it into HDFS using Sqoop; initiated the automated process on a daily basis using Oozie workflows.
- Created Pig tasks and implemented the logic to load the data and store the refined data into the EDW.
- Automated jobs pulling data from HDFS into Pig using Oozie workflows.
- Involved in the administration of Hadoop, Hive and Pig.
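A minimal Pig Latin sketch of the refinement step above, assuming a comma-delimited staging layout; the paths, schema and filter rules are hypothetical illustrations, not the actual project code.

```pig
-- Hypothetical daily refinement job: load Sqoop-staged loan data,
-- cleanse it, and write a per-customer summary for the EDW load.
raw = LOAD '/staging/home_loans/credit' USING PigStorage(',')
      AS (loan_id:chararray, customer_id:chararray,
          credit_score:int, loan_amount:double);

-- Basic cleansing: drop records without a usable credit score
clean = FILTER raw BY credit_score IS NOT NULL AND credit_score > 0;

-- Aggregate per customer before handing off to the EDW
by_customer = GROUP clean BY customer_id;
summary = FOREACH by_customer GENERATE
          group AS customer_id,
          AVG(clean.credit_score) AS avg_score,
          SUM(clean.loan_amount) AS total_amount;

STORE summary INTO '/refined/home_loans/summary' USING PigStorage(',');
```

An Oozie workflow can then chain the Sqoop import, this script and the EDW load so the whole pipeline runs unattended each day.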
Use Case - 2
- Used Sqoop to import the data into HDFS.
- Set up HBase tables to load large sets of structured and semi-structured data.
- Implemented the business logic and created reports for the BI team.
- Moved claim data from the source system into Hadoop using Sqoop on a daily basis via scheduled Oozie workflows.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Implemented business logic for the categorization of claims using Pig.
- Moved the refined data from HDFS to the relational database using Sqoop on a daily basis via scheduled Oozie workflows.
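The daily Sqoop transfer back to the relational database, scheduled through Oozie as described above, could look roughly like the following workflow action. The JDBC URL, table and directory names are hypothetical, not the project's real connection details.

```xml
<!-- Hypothetical Oozie workflow wrapping the daily Sqoop export of
     refined claim data back to the RDBMS. -->
<workflow-app name="claims-daily-export" xmlns="uri:oozie:workflow:0.4">
  <start to="sqoop-export"/>
  <action name="sqoop-export">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>export --connect jdbc:mysql://db-host/claims --table refined_claims --export-dir /refined/claims --username ${dbUser} --password ${dbPass}</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Sqoop export of refined claims failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Pairing this workflow with an Oozie coordinator gives the daily schedule, so no manual trigger is needed.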
Role : ETL Database Architect
Environment : SSIS, SQL Server 2008/2005, Remedy 7.6.04, ESM Lotus Notes, Cognos 8.3 and SSRS.
- Performed data mapping and analysis of service levels across various ITIL areas with respect to business requirements.
- Responsible for DBA support to multiple regions globally.
- Integrated data from various sources such as Remedy, Avaya, CMDB, Survey Central, Lotus Notes and flat files.
- Implemented and maintained database security: created and maintained users and roles, and assigned privileges.
- Created SSIS packages covering data loading, scheduled package execution, data error logging and triggering of automated emails.
- Deployed SSIS packages, including a master package that executes child packages. The packages use a variety of tasks and transformations such as Execute SQL Task, Script Task, Execute Package Task, File Connection, Derived Column and For Each Loop.
- Coordinated with customers, other teams and vendors to resolve issues.
- Participated in the detailed requirement analysis for the design of data marts and schemas.
- Rebuilt and monitored indexes at regular intervals for better performance.
- Implemented numerous components as part of an end-to-end implementation of BI solutions, including data marts, ETL design and SQL queries.
- Responsible for monitoring and making recommendations for performance improvement in hosted databases; this involved index creation, removal and modification, file-group modifications, and scheduled jobs to re-index and update statistics in databases.
- Designed a metric summary page providing a high-level view of account health; graphic visualizations include Incident Management, Problem Management, Change Management, Service Desk and Configuration Management.
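The scheduled re-index and statistics-update routine described above can be sketched in T-SQL; the database and table names are hypothetical placeholders, not the hosted environment's actual objects.

```sql
-- Hypothetical index-maintenance job, run on a schedule via SQL Server Agent.
USE ReportingDB;

-- Rebuild all indexes on a heavily updated fact table
ALTER INDEX ALL ON dbo.IncidentFact REBUILD;

-- Refresh optimizer statistics across the database after re-indexing
EXEC sp_updatestats;
```

In practice a job like this would first check fragmentation (e.g. via sys.dm_db_index_physical_stats) and reorganize rather than rebuild lightly fragmented indexes, to keep the maintenance window short.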