- Progressive experience in big data technologies and software development, including design, integration, and maintenance.
- In-depth understanding of the Snowflake cloud data platform.
- Excellent knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
- Worked on Cloudera and Hortonworks distributions.
- Experience in analyzing data using HiveQL.
- Experience in building ETL pipelines using NiFi.
- Experience with the Splunk reporting system.
- Experience with Apache Druid.
- Experience using Sqoop to ingest data from relational databases into Hive.
- Experience with real-time streaming frameworks such as Apache Storm.
- Experience with messaging systems such as Apache Kafka.
- Experience with Elasticsearch and Kibana.
- Experience with various data ingestion patterns into Hadoop.
- Good knowledge of ETL concepts and hands-on ETL experience.
- Hands-on experience with HBase and Pig.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Experience with build tools such as Ant and Maven.
- Experience with methodologies such as Waterfall and Agile.
- Experience working on Unix/Linux operating systems.
Cloud Technologies: Snowflake, AWS
Big Data Technologies: Spark, Hive LLAP, HDFS, MapReduce, Pig, Sqoop, HBase, Oozie, Flume
Reporting Systems: Splunk
Streaming Frameworks: Apache Storm, Apache Kafka
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: Scala, Python, Perl, Shell scripting
J2EE Technologies: J2EE, Servlets, JSP, JDBC, Spring, Hibernate
Dashboards: Elasticsearch, Kibana, Ambari
Data Warehousing: Teradata, Snowflake
DBMS: Oracle, SQL Server, MySQL, DB2
Operating Systems: Windows, Linux, Solaris, CentOS, OS X
Servers: Apache Tomcat
Data Integration Tools: SSIS
- Played a key role in migrating Teradata objects into the Snowflake environment.
- Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
- Heavily involved in testing Snowflake to determine the best way to use cloud resources.
- Developed ELT workflows using NiFi.
- Integrated Splunk reporting services with the Hadoop ecosystem to monitor different datasets.
- Played a key role in testing Hive LLAP and ACID properties to leverage row-level transactions in Hive.
- Volunteered to design an architecture for a dataset in Hadoop with an estimated data volume of 2 PB/day.
- Used Avro, Parquet, and ORC data formats to store data in HDFS.
Spark/Big Data Engineer
- Developed SSIS workflows to automate loading data into HDFS and processing it with Hive.
- Developed alerts and timed reports; developed and managed Splunk applications.
- Provided leadership and key stakeholders with the information and venues to make effective, timely decisions.
- Established and ensured adoption of best practices and development standards.
- Communicated with peers and supervisors routinely; documented work, meetings, and decisions.
- Worked with multiple data sources.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Implemented Apache Pig scripts to load data into Hive.
- Worked with various HDFS file formats, such as Avro and SequenceFile, and compression codecs, such as Snappy and gzip.
- Used Spark SQL to create SchemaRDDs and load them into Hive tables, and handled structured data with Spark SQL.
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
- Partnered with source teams to bring data into Hadoop to support data science models.
- Used Avro, Parquet, and ORC data formats to store data in HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Involved in creating Teradata FastLoad scripts.
- Provided assistance to business users with various reporting needs.
- Worked on data transfer mechanisms from Hive to Teradata.
- Worked with different platform teams to resolve cross-dependencies.
- Involved in code review discussions and demos to stakeholders.
- Worked on data ingestion from Oracle to Hive.
- Managed and scheduled jobs on a Hadoop cluster using ActiveBatch and crontab.
- Involved in various data migration activities.
- Involved in fixing issues related to data quality, availability, and stability.
- Worked on determining strategies related to data security.
- Worked with the Hue interface to load data into HDFS and query it.
- Played a key role in Hadoop 2.5.3 testing.
- Involved in creating and partitioning Hive tables for data loading and analysis; these queries run internally as MapReduce jobs.
- Worked with business users to gather and define business requirements and analyze possible technical solutions.
- Used real-time streaming frameworks such as Apache Storm to load data from messaging systems such as Apache Kafka into HDFS.
- Involved in setting up a 3-node Storm and Kafka cluster on OpenStack servers using Chef.
- Provided support to data analysts in running Hive queries.
- Created partitioned tables in Hive for better performance and faster querying.
- Used Hive to compute various metrics for reporting.
- Implemented dynamic partitioning in Hive.
- Involved in Hadoop jobs processing billions of text records.
- Involved in importing data using Sqoop from traditional RDBMSs, including DB2, Oracle, MySQL, and Teradata, into Hive.
- Involved in importing data in different formats (JSON, txt, CSV, TSV) into HDFS and Hive.
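As an illustration of handling these mixed input formats, the sketch below normalizes JSON-lines, CSV, and TSV text into a uniform list of records before loading. This is a hypothetical Python example with made-up field names, not the production ingestion code:

```python
import csv
import io
import json

def parse_records(text, fmt):
    """Normalize JSON-lines, CSV, or TSV input into a list of dicts."""
    if fmt == "json":
        # One JSON object per line (JSON-lines), a common layout in HDFS.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        return [dict(row) for row in reader]
    raise ValueError(f"unsupported format: {fmt}")
```

Once every source is reduced to the same record shape, a single downstream load path into HDFS/Hive can serve all formats.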
- Monitored jobs to analyze performance statistics.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Trained team members on different data ingestion patterns.
- Used Kibana for data analysis and product metric visualizations.
- Worked on analyzing the Hadoop cluster using big data analytic tools, including Pig and MapReduce.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and Pig to pre-process the data.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Computed various metrics using Java MapReduce, including metrics that quantify user experience and revenue.
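A metric such as revenue per user follows the standard MapReduce pattern: a map phase emits (user, amount) pairs and a reduce phase sums them per key. The actual jobs were Java MapReduce; this is a small pure-Python sketch of the same pattern with illustrative field names:

```python
from collections import defaultdict

def map_phase(events):
    """Map: emit (user, revenue) pairs from raw event records."""
    for event in events:
        yield event["user"], event["revenue"]

def reduce_phase(pairs):
    """Reduce: sum revenue per user, mirroring a MapReduce reducer."""
    totals = defaultdict(float)
    for user, revenue in pairs:
        totals[user] += revenue
    return dict(totals)

events = [
    {"user": "u1", "revenue": 10.0},
    {"user": "u2", "revenue": 5.0},
    {"user": "u1", "revenue": 2.5},
]
```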
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Designed and implemented various metrics that can statistically signify the success of an experiment.
- Involved in using Sqoop to import and export data between HDFS and Hive.
- Involved in processing ingested raw data using MapReduce, Apache Pig, and Hive.
- Involved in developing Pig scripts for change data capture and delta record processing between newly arrived data and data already in HDFS.
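The delta-processing logic above classifies each incoming record against the data already landed: new keys are inserts, existing keys with changed values are updates, and the rest are unchanged. The production implementation was in Pig; this Python sketch (with a hypothetical `id` key) shows the idea:

```python
def delta_records(existing, incoming, key="id"):
    """Classify newly arrived records against records already in HDFS.

    Returns (inserts, updates, unchanged): records with new keys,
    records whose key exists but whose values changed, and exact matches.
    """
    current = {row[key]: row for row in existing}
    inserts, updates, unchanged = [], [], []
    for row in incoming:
        old = current.get(row[key])
        if old is None:
            inserts.append(row)
        elif old != row:
            updates.append(row)
        else:
            unchanged.append(row)
    return inserts, updates, unchanged
```

Only the inserts and updates need to be merged back, which keeps the daily merge proportional to the change volume rather than the full dataset.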
- Involved in pivoting HDFS data from rows to columns and columns to rows.
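The row/column pivot can be sketched in plain Python: long-form (key, attribute, value) rows become one wide record per key, and the inverse flattens wide records back into rows. This is illustrative only (the production work ran on Hive/Pig, and the keys below are made up):

```python
def rows_to_columns(rows):
    """Pivot long-form (key, attribute, value) rows into one record per key."""
    out = {}
    for key, attr, value in rows:
        out.setdefault(key, {})[attr] = value
    return out

def columns_to_rows(records):
    """Inverse pivot: flatten wide records back into (key, attribute, value) rows."""
    return [(key, attr, value)
            for key, attrs in records.items()
            for attr, value in attrs.items()]
```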
- Involved in exporting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get, or copyToLocal.
- Involved in developing shell scripts to orchestrate the execution of other scripts (Pig, Hive, and MapReduce) and to move data files within and outside of HDFS.
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Responsible for installing, configuring, and managing a Hadoop cluster spanning multiple racks.
- Developed Pig Latin scripts to analyze large datasets in areas where extensive coding needed to be reduced.
- Used Sqoop to extract data from relational databases into Hadoop.
- Involved in performance enhancement and code optimization by writing custom comparators and combiner logic.
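Combiner logic cuts shuffle traffic by pre-aggregating on the map side, so each mapper emits one partial count per distinct key instead of one pair per input record. A minimal Python sketch of in-mapper combining for a word count (the actual optimizations were written as Hadoop combiners in Java):

```python
from collections import Counter

def map_with_combiner(lines):
    """Mapper with in-mapper combining: aggregate counts locally so far
    fewer (key, count) pairs are shuffled to the reducers."""
    local = Counter()
    for line in lines:
        local.update(line.split())
    return list(local.items())  # partial counts, one pair per distinct word

def reduce_counts(partials):
    """Reducer: merge the partial counts produced by all mappers."""
    totals = Counter()
    for pairs in partials:
        totals.update(dict(pairs))
    return dict(totals)
```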
- Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
- Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs receive, on average, an equal share over time, and familiarity with the Capacity Scheduler.
- Responsible for performing peer code reviews, troubleshooting issues, and maintaining status reports.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in identifying possible ways to improve system efficiency.
- Involved in requirement analysis, design, development, and unit testing using MRUnit and JUnit.
- Prepared daily and weekly project status reports and shared them with the client.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.