- In-depth understanding of Snowflake cloud technology.
- In-depth understanding of Snowflake multi-cluster sizing and credit usage.
- Played a key role in migrating Teradata objects into the Snowflake environment.
- Experience with Snowflake multi-cluster warehouses.
- Experience with Snowflake Virtual Warehouses.
- Experience in building Snowpipe.
- In-depth knowledge of Data Sharing in Snowflake.
- In-depth knowledge of Snowflake database, schema, and table structures.
- Experience in using Snowflake Clone and Time Travel.
- In-depth understanding of NiFi.
- Experience in building ETL pipelines using NiFi.
- Deep knowledge of various NiFi processors.
- Experience with the Splunk reporting system.
- Understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
- Excellent knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Progressive experience in Big Data technologies and software development, including design, integration, and maintenance.
- Worked on Cloudera and Hortonworks distributions.
- Experience in analyzing data using HiveQL and Hive LLAP.
- Experience in Apache Druid.
- Experience using Sqoop to ingest data from relational databases into Hive.
- Experience with Elasticsearch and Kibana.
- Experience with various data ingestion patterns into Hadoop.
- Good knowledge of ETL concepts and hands-on ETL experience.
- Hands-on experience with HBase and Pig.
- Experience with methodologies such as Waterfall and Agile.
- Experience in working on Unix/Linux operating systems.
Cloud Technologies: Snowflake, AWS.
Big Data Ecosystem: Spark, Hive LLAP, Beeline, HDFS, MapReduce, Pig, Sqoop, HBase, Oozie, Flume
Reporting Systems: Splunk
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: Scala, Python, Perl, Shell scripting.
Dashboard: Ambari, Elasticsearch, Kibana
Data Warehousing: Snowflake, Teradata
DBMS: Oracle, SQL Server, MySQL, DB2
Operating Systems: Windows, Linux, Solaris, CentOS, OS X
Servers: Apache Tomcat
Data Integration Tool: NiFi, SSIS
- Involved in migrating objects from Teradata to Snowflake.
- Created Snowpipe for continuous data load.
- Used COPY to bulk load the data.
- Created internal and external stages and transformed data during load.
- Used the FLATTEN table function to produce a lateral view of VARIANT, OBJECT, and ARRAY columns.
- Worked with both Maximized and Auto-scale functionality.
- Used Temporary and Transient tables on different datasets.
- Cloned Production data for code modifications and testing.
- Shared sample data with customers for UAT by granting access.
- Used Time Travel (up to 56 days) to recover missed data.
- Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
- Heavily involved in testing Snowflake to determine the best way to use cloud resources.
- Developed ELT workflows using NiFi to load data into Hive and Teradata.
- Worked on migrating NiFi jobs from development to pre-production and production clusters.
- Scheduled different Snowflake jobs using NiFi.
- Used NiFi to ping Snowflake to keep client sessions alive.
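The Snowflake work above (Snowpipe, bulk COPY, FLATTEN over semi-structured columns, Time Travel) can be illustrated with a minimal sketch. All object names (`raw_stage`, `events_pipe`, `raw.events`, the `payload` column) are hypothetical, assumed only for illustration:

```python
# Hypothetical Snowflake statements illustrating the bullets above.
# All stage, pipe, table, and column names are illustrative only.

# Bulk load staged JSON files into a table.
copy_sql = """
COPY INTO raw.events
FROM @raw_stage/events/
FILE_FORMAT = (TYPE = 'JSON');
"""

# Snowpipe: continuously auto-ingest new files landing in the stage.
pipe_sql = """
CREATE PIPE events_pipe AUTO_INGEST = TRUE AS
COPY INTO raw.events
FROM @raw_stage/events/
FILE_FORMAT = (TYPE = 'JSON');
"""

# Lateral view over a VARIANT column holding nested OBJECT/ARRAY data.
flatten_sql = """
SELECT f.value:id::STRING AS item_id
FROM raw.events,
     LATERAL FLATTEN(input => payload:items) f;
"""

# Time Travel: query the table as it was 24 hours (86400 seconds) ago.
time_travel_sql = """
SELECT * FROM raw.events AT (OFFSET => -86400);
"""

for name, sql in [("copy", copy_sql), ("pipe", pipe_sql),
                  ("flatten", flatten_sql), ("time travel", time_travel_sql)]:
    print(f"-- {name} --{sql}")
```

In practice these statements would be submitted through a Snowflake session (for example from a NiFi processor, as in the bullets above); the sketch only shows the statement shapes.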
Big Data Engineer
- Played a key role in testing Hive LLAP and ACID properties to leverage row-level transactions in Hive.
- Volunteered to design an architecture for a dataset in Hadoop with an estimated data size of 2 PB/day.
- Integrated Splunk reporting services with Hadoop eco system to monitor different datasets.
- Used Avro, Parquet, and ORC data formats to store data in HDFS.
- Developed workflow in SSIS to automate the tasks of loading the data into HDFS and processing using hive.
- Developed alerts and timed reports; developed and managed Splunk applications.
- Provided leadership and key stakeholders with the information and venues to make effective, timely decisions.
- Established and ensured adoption of best practices and development standards.
- Communicated routinely with peers and supervisors; documented work, meetings, and decisions.
- Worked with multiple data sources.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Implemented Apache Pig scripts to load data into Hive.
- Worked with various HDFS file formats such as Avro and SequenceFile, and compression codecs such as Snappy and Gzip.
- Used Spark SQL to create SchemaRDDs, load them into Hive tables, and handle structured data.
- Analyzed SQL scripts and designed solutions implemented in PySpark.
- Partnered with source teams to bring data into Hadoop in support of data science models.
- Used Avro, Parquet, and ORC data formats to store data in HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Involved in creating Teradata FastLoad scripts.
- Provide assistance to business users for various reporting needs.
- Worked on data transfer mechanisms from Hive to Teradata.
- Worked with different platform teams to resolve cross dependency.
- Involved in code review discussions and demos to stakeholders.
- Worked on data ingestion from Oracle to Hive.
- Managed and scheduled jobs on a Hadoop cluster using ActiveBatch and crontab.
- Involved in different data migration activities.
- Involved in fixing various issues related to data quality, data availability and data stability.
- Worked in determining various strategies related to data security.
- Worked on Hue interface for Loading the data into HDFS and querying the data.
- Played a key role in Hadoop 2.5.3 Testing.
- Involved in creating and partitioning Hive tables for data loading and analysis, which runs internally as MapReduce jobs.
- Worked with the business users to gather, define business requirements and analyze the possible technical solutions.
- Used real-time streaming frameworks such as Apache Storm to load data from messaging systems such as Apache Kafka into HDFS.
- Involved in setting up a 3-node Storm and Kafka cluster on OpenStack servers using Chef.
- Provided support to data analysts in running Hive queries.
- Created partitioned tables in Hive for better performance and faster querying.
- Used Hive to compute various metrics for reporting.
- Implemented dynamic partitions in Hive.
- Involved in Hadoop jobs for processing billions of records of text data.
- Involved in importing data using Sqoop from traditional RDBMSs such as DB2, Oracle, and MySQL, as well as Teradata, into Hive.
- Involved in importing data in formats such as JSON, TXT, CSV, and TSV into HDFS and Hive.
- Monitored jobs to analyze performance statistics.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Trained the team members regarding different data ingestion patterns.
- Used Kibana for data analysis and product metric visualizations.
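The Hive partitioning and Sqoop ingestion work above can be sketched as follows. The table names, HDFS path, and JDBC connection string are hypothetical, assumed only for illustration:

```python
# Hypothetical HiveQL and Sqoop commands illustrating partitioned
# external tables, dynamic partitioning, and RDBMS-to-Hive ingestion.
# All names, paths, and connection strings are illustrative only.

# External table over an HDFS location, partitioned by date.
create_table_hql = """
CREATE EXTERNAL TABLE IF NOT EXISTS clicks (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC
LOCATION '/data/clicks';
"""

# Dynamic partitioning: Hive derives the dt partition from the data.
dynamic_insert_hql = """
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE clicks PARTITION (dt)
SELECT user_id, url, dt FROM clicks_staging;
"""

# Sqoop import from Oracle straight into a Hive staging table.
sqoop_cmd = (
    "sqoop import "
    "--connect jdbc:oracle:thin:@//db-host:1521/ORCL "
    "--username etl_user -P "
    "--table CLICKS "
    "--hive-import --hive-table clicks_staging"
)

print(create_table_hql)
print(dynamic_insert_hql)
print(sqoop_cmd)
```

Partitioning on `dt` means queries filtered by date scan only the matching partition directories rather than the whole table, which is the performance point made in the bullets above.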
- Worked on the maintenance of ISU web pages.
- Involved in requirements discussion with department heads.
- Performed POC on Drupal Framework.