- Sr. Big Data Developer with 11 years of professional experience building big data and data warehouse solutions using modern frameworks, tools, and design patterns.
- Experience delivering real-world analytics projects on the Hadoop framework (MapReduce, HDFS) and its ecosystem: Hive, Pig, Sqoop, and Oozie.
- Experience in batch and streaming big data analytics using Spark, Scala, and Apache Kafka.
- Experience working with relational databases (Oracle) and NoSQL databases such as HBase.
- Hands-on experience with AWS services including S3, EC2, EMR, Lambda, SNS, SQS, and Step Functions.
- Experience with Informatica PowerCenter, SQL, data-intensive processing, data modeling, and building ETL data pipelines.
- Experience working in Scrum/Agile teams and with continuous integration development practices.
- Strong aptitude and willingness to learn and implement new technologies.
- Hadoop, MapReduce-Java, Spark-Scala, Hive, Pig, Sqoop, Oozie
- HBase, Oracle 9i/10g, MySQL
- AWS Cloud
- Machine Learning
- SQL, PL/SQL, Python, R, Unix shell scripting
- Informatica PowerCenter
- Data modeling
Sr. Big Data Developer
- Generated market performance reports comparing Amex against its competitors from third-party data, using Hive and the MapReduce framework on the MapR platform.
- Implemented complex string-matching algorithms in Java MapReduce jobs.
- Built data pipelines and data-intensive processing jobs to clean, transform, and match externally sourced data against Amex internal data.
- Improved ETL performance by migrating MapReduce (Java/Hive) jobs to the Spark (Scala) framework.
- Implemented a data quality module in Python to validate incoming files and generate statistics.
- Scheduled batch jobs in Oozie, using fork and join nodes to run independent actions in parallel.
- Coordinated with product managers and other developers to translate loosely defined requirements into completed stories, delivered using Agile methodology.
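The file-validation side of a data quality module like the one above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the delimiter, expected column count, and the particular statistics collected here are assumptions.

```python
import csv
import io

def validate_delimited_file(text, expected_columns, delimiter="|"):
    """Validate a delimited file and collect simple statistics.

    Illustrative sketch: the delimiter, column-count rule, and stats
    gathered here stand in for whatever the real validation rules were.
    """
    stats = {"total": 0, "valid": 0, "bad_column_count": 0, "empty_fields": 0}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        stats["total"] += 1
        if len(row) != expected_columns:
            # Reject records with the wrong number of fields outright.
            stats["bad_column_count"] += 1
            continue
        stats["empty_fields"] += sum(1 for field in row if not field.strip())
        stats["valid"] += 1
    return stats
```

In practice a job like this runs over each landed file and emits the statistics for downstream reconciliation, so bad feeds are caught before they reach the pipeline.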
Big Data Developer
- For a major retail client, implemented a business-critical data analytics solution using Spark (Scala), Hive, Hadoop HDFS, and HBase storage, replacing the existing ETL process for better efficiency.
- Migrated ETL workflows to the AWS cloud on S3 storage and the EMR platform; automated them with AWS Step Functions and notification services (SNS, SQS), with event-based triggering via Lambda.
- Implemented and automated Hive queries to extract data into HDFS for analytics and reporting.
- Implemented Pig scripts for data cleansing and processing.
- Used Sqoop to extract data from relational databases into HDFS.
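One common shape for the event-based triggering described above is a Lambda handler that reads the S3 event and starts a Step Functions execution. A minimal sketch, assuming a hypothetical state machine ARN and the standard S3 notification payload; in the deployed Lambda the client would be boto3's `stepfunctions` client, injected here so the handler can run without AWS.

```python
import json

# Hypothetical ARN, for illustration only.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:etl-pipeline"

def build_execution_input(event):
    """Turn an S3 ObjectCreated notification into a Step Functions input."""
    record = event["Records"][0]
    return json.dumps({
        "bucket": record["s3"]["bucket"]["name"],
        "key": record["s3"]["object"]["key"],
    })

def handler(event, context, sfn_client):
    # In production: sfn_client = boto3.client("stepfunctions").
    return sfn_client.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=build_execution_input(event),
    )
```

Keeping the event parsing in its own function makes the trigger logic unit-testable without standing up S3, Lambda, or Step Functions.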
- Served as a key ETL (Informatica) and big data developer, as well as data analyst, for one of the largest telecom clients in India, and acted as a technical mentor across the team.
- Worked in an onsite-offshore delivery model with development teams and customers distributed across geographies.
- Designed and implemented complex ETL data processes for a major insurance client using Informatica PowerCenter and advanced SQL queries (analytic functions).
- Optimized queries and wrote stored procedures and functions in Oracle PL/SQL.
- Designed and implemented the dimensional model for the warehouse system.
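A core step in loading a dimensional model is assigning surrogate keys to dimension rows. A minimal sketch in Python of a Type 1 (overwrite) dimension load; the natural-key name and attributes are illustrative, not from the actual warehouse:

```python
def assign_surrogate_keys(dimension, incoming_rows, natural_key="customer_id"):
    """Load rows into a dimension as Type 1 (overwrite in place).

    `dimension` maps natural key -> {"sk": surrogate_key, ...attributes}.
    New natural keys receive the next surrogate key; rows for existing
    keys overwrite the stored attributes but keep their surrogate key.
    """
    next_sk = max((row["sk"] for row in dimension.values()), default=0) + 1
    for row in incoming_rows:
        key = row[natural_key]
        if key in dimension:
            dimension[key].update(row)  # Type 1: overwrite, keep the same sk
        else:
            dimension[key] = {"sk": next_sk, **row}
            next_sk += 1
    return dimension
```

Fact tables then store the stable surrogate key rather than the natural key, which is what lets dimension attributes change without rewriting history.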