Responsible for designing and implementing the infrastructure needed to store, process, and analyze huge data sets, collaborating with cross-functional teams and customers.
Big Data Platform: Hadoop, Confidential, Cloudera, Hortonworks
Cloud Implementation: AWS, Azure, Oracle (OCI), CloudFormation, Terraform
Micro-services: Docker, Kubernetes, Mesosphere
Data Ingestion/ETL: Spark, Kafka, Pig, Hive, Elasticsearch, Flume, Talend
NoSQL/MPP DB: Cassandra, HBase, MongoDB, Presto, EnterpriseDB
Scripting Tools: RStudio, Python, Scala, Perl, Zeppelin
BI Tools: Tableau, Kibana, Pentaho
- Performing hands-on architecture and solution design of analytics applications used by customers and their internal users at scale on Cloud Infrastructure.
- Building solutions that use Apache Kafka alongside Hadoop, relational and NoSQL databases, message queues, and related products such as MPP databases, Scala, and Python.
- Working closely with Management and Decision Science team in the identification, acquisition, cleansing and visualization of data to support various data analytical projects.
- Preparing and presenting potential technical solutions, and advising business and product owners on the technical value of proposals, including tradeoffs and opportunities.
- Building a next-generation data services platform using microservices, Docker containers, Kubernetes, and cloud-native services.
- Developed data pipelines for advanced analytics with processing frameworks such as Hadoop and Apache Spark and streaming technologies such as Kafka, Flume, and Kinesis.
- Built highly scalable and extensible Big Data platforms on AWS, which enabled collection and analysis of massive data sets, including IoT and streaming data.
- Managed Hadoop and Spark cluster environments on bare-metal and container infrastructure, including cluster configuration and capacity planning.
- Leading tool selection and development for key projects implementing Big Data technologies in clients' on-premises data centers or on cloud infrastructure.
Hadoop Lead Engineer
- Developed an automation framework for installing Hadoop ecosystem components such as HBase, HDFS, MapReduce, YARN, Oozie, Pig, Hive, Impala, Spark, and Kafka.
- Leveraging tools such as Pentaho, Control-M, and Subversion to develop efficient solutions for data management, conversion, migration, and integration.
- Migrated legacy POS application data from the mainframe to a distributed Hadoop platform.
- Applying statistical techniques such as segmentation analysis and time series analysis to develop and monitor scorecards.
- Worked with the Data Science team to implement machine learning algorithms for advanced analytics use cases, including demand forecasting, classification, and clustering.
Senior Data Engineer
- Implemented a Big Data ecosystem (Hadoop, MapReduce, Sqoop, HBase, Hive, Pig, and MongoDB) to derive insights and analytics from data.
- Built applications in R to compute baseline statistics (mean, deviation) and to implement social media analysis, recommendation, and trend prediction.
- Collaborating with project teams (managers, architects, data scientists) on tools and technology related to the design and development of Big Data solutions in an Agile environment.
- Developed entity-relationship models for Retail, Insurance, and Life Science customers, with specific work in modeling data warehouses and operational data stores.
- Applied advanced SQL programming and performance tuning techniques for data integration and consumption within OLTP, OLAP, and MPP architectures.
- Built ETL mappings for processing fact and dimension tables with complex transformations.
- Developed key modules: mappings and workflows using Informatica, and reports using OBIEE.