Senior Data Engineer (Snowflake Developer) Resume
San Francisco, CA
SUMMARY
- Over 10 years of experience in Big Data, RDBMS, and Business Intelligence, with a strong focus on designing and implementing statistically significant analytics solutions to build enterprise applications.
- 4+ years of implementation and extensive working experience in writing Hadoop jobs for analyzing data using a wide array of Big Data tools like Hive, Pig, MapReduce, Spark, Flume, Oozie, Sqoop, Kafka, ZooKeeper, and HBase.
- 5+ years of implementation and extensive working experience in creating enterprise BI applications using MicroStrategy, Looker, Tableau, and Power BI.
- Having good experience in Hadoop framework and related technologies like HDFS, MapReduce, Pig, Hive, HBase, Spark, ZooKeeper, Kafka, Sqoop and Oozie.
- Experience in complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
- Experience in managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, using the Cloudera CDH and Hortonworks HDP distributions.
- Experience in monitoring, tuning, and administering Hadoop clusters.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Experience with Apache Hadoop technologies like HDFS (Hadoop Distributed File System), the MapReduce framework, Hive, Pig, PySpark, Sqoop, Oozie, HBase, Spark, Scala, and Python.
- Proficient in SQLite, MySQL and SQL databases with Python.
- 3+ years of experience in using Python as an ETL framework and PySpark to process large volumes of data daily.
- Good understanding of HDFS Designs, Daemons, federation and HDFS high availability (HA).
- Good working experience in importing data using Sqoop from various sources like RDBMS, Teradata, Mainframes, Oracle, and Netezza to HDFS, and performing transformations and aggregations on it using Hive, Pig, and Spark.
- Experience in AWS cloud solution development using Lambda, SQS, SNS, Dynamo DB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
- Experience in using Microsoft Azure SQL database, Data Lake, Azure ML, Azure data factory, Functions, Databricks and HDInsight.
- Working experience in Big Data on the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding 300 TB of data.
- Extensive experience in migrating on-premise Hadoop platforms to cloud solutions using AWS and Azure.
- Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for data mining, data cleansing, and data munging.
- Extensive knowledge in using Elasticsearch and Kibana.
- Good experience in optimizing MapReduce algorithms using Mappers, Reducers, combiners and partitioners to deliver the best results for the large datasets.
- Expertise in using optimization techniques on Hive tables such as Partitioning, Bucketing, and Vectorization (see the PySpark sketch at the end of this summary).
- Extensive Knowledge of using Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Experience in writing Complex HQL queries for filtering the data loaded in Hive tables.
- Hands on experience in developing SPARK applications using Spark tools like RDD transformations, Spark core, Spark Streaming and Spark SQL.
- Experience working with multiple data structures in Spark like RDD, Data frames, and Datasets.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Hands-on experience in Multiplexing, Replicating, and Consolidation with Flume.
- Hands on experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka messaging system.
- Worked with Kafka tools like Kafka migration, MirrorMaker, and Consumer Offset Checker.
- Good knowledge in using the Streams and Connect APIs in the Kafka messaging system.
- Excellent Programming skills at a higher level of abstraction using Scala.
- Experience in writing Sqoop jobs to perform incremental updates.
- Exposure to Data Lake Implementation using Apache Spark.
- Designed ETL workflows in Tableau and deployed data from various sources to HDFS.
- Experience in automated scripts using UNIX shell scripting to perform database activities.
- Good understanding of all aspects of testing, such as Unit, Regression, Agile, White-box, and Black-box.
- Experience in the MicroStrategy suite (Desktop, OLAP Services, Reporting Services, Dashboards, and Scorecards).
- Experience with Data Modeling, designing Data Marts using Star Schema and Snowflake Schema, and Physical and Logical Tables.
- Experience working on MicroStrategy Semantic modeling.
- Expert in MicroStrategy architecture, including creating Schema Objects (Facts, Attributes, Hierarchies, Transformations) and Public Objects (Metrics, Filters, Prompts, Custom Groups, Consolidations, Drill Maps, and Templates).
- Hands-on development assisting users in creating and modifying worksheets and data visualization dashboards in Tableau.
- Experience working on Dashboarding and Scorecards using the MicroStrategy Reporting Suite.
- Extensively worked on Visual Insights for data discovery, data analysis, and dashboard prototyping using file-based and database sources.
- Extensively worked on creating and publishing Intelligence Cubes.
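As referenced above, the following is a minimal PySpark sketch of the Hive table optimization pattern (partitioning plus bucketing). All paths, database, table, and column names are illustrative placeholders rather than details from any actual engagement.

```python
# Hedged sketch: persist raw data as a partitioned, bucketed Hive table
# so that queries benefit from partition pruning and bucketed joins.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-optimization-sketch")
    .enableHiveSupport()              # needed to write managed Hive tables
    .getOrCreate()
)

# Hypothetical raw data already landed on HDFS as Parquet.
orders = spark.read.parquet("hdfs:///data/raw/orders")

(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")        # partition pruning on the date column
    .bucketBy(32, "customer_id")      # bucketing speeds up joins and aggregations
    .sortBy("customer_id")
    .format("parquet")
    .saveAsTable("analytics.orders_optimized")
)
```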
TECHNICAL SKILLS
Big Data Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Apache Kafka, Zookeeper, Ambari, Oozie, MongoDB, Puppet, Avro, Parquet, and Snappy.
NoSQL Databases: HBase, Cassandra, and MongoDB
Graph Databases: Neo4j and Amazon Neptune
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5) and Hortonworks.
Languages: C, C++, Java, Scala, XML, HTML, AJAX, CSS, SQL, PL/SQL, Pig Latin, HiveQL, Unix, Shell Scripting, Python
Java & J2EE Technologies: Spring Framework, Spring-Boot, Core Java, JDBC, Junit, MR-Unit, PigUnit, JQuery, Servlets, ActiveMQ, AMQStreams, RabbitMQ
Operating Systems: UNIX, Red Hat LINUX, Mac OS and Windows Variants
Source Code Control: GitHub, CVS, SVN
Databases: Teradata, Microsoft SQL Server, MySQL, DB2, Elasticsearch
DB languages: MySQL, PL/SQL, PostgreSQL & Oracle
Build Tools: Jenkins, Maven, ANT, Gradle, Log4j
Business Intelligence Tools: MicroStrategy, Looker, Power BI, Tableau, QlikView, Kibana
Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, NetBeans
ETL Tools: Talend, Pentaho, Informatica
Development Methodologies: Agile, Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, San Francisco, CA
Senior Data Engineer (Snowflake Developer)
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), including requirement gathering, modeling, analysis, architecture design, prototyping, development, and testing.
- Designed streaming platform using AMQStreams, Kafka, Camel, Spring, Spark.
- Used staging table to import and synchronize the data from Oracle and Hadoop.
- Used Sqoop to load data for creation of RDDs, Datasets, and DataFrames in Spark SQL.
- Experience in using Avro, Parquet, ORCFile and JSON file formats, developed UDFs in Hive and Pig.
- Used Qlik Replicate to manage the Change Data Capture (CDC) and automate the data loading into HDFS.
- Used Spring Kafka API calls to process the messages smoothly on Kafka Cluster setup
- Worked on real-time data streaming and analytics using KStreams.
- Helped in the deployment and maintenance of Kafka on Kubernetes
- Wrote ETL flows and MapReduce jobs to process data from AWS S3 into Hive and HBase.
- Extensive experience managing the Kafka topics.
- Created Services to consume REST APIs and to communicate between components using Dependency Injection provided by the Spring Framework.
- Developed a server-side application to interact with the database using Spring Boot.
- Used Swagger and Postman to test the RESTful API for HTTP requests such as GET, POST, and PUT.
- Performed reverse engineering to map POJO classes to the database.
- Created DAO interfaces, abstract classes, and concrete classes to interact with persistence entities.
- Worked on container-based technologies like Kubernetes and OpenShift.
- Implemented APIs to solve technology problems using NoSQL and graph databases like Neo4j and MongoDB.
- Created custom Serialization and Deserialization methods for data manipulation.
- Involved in planning and implementation of data formats like Protocol buffers.
- Created PySpark data frames to bring data from DB2 to Amazon S3.
- Created data sharing between two Snowflake accounts (see the sketch at the end of this role).
- Provided guidance to the development team working on PySpark as the ETL platform.
- Experienced in developing templates and pipelines for Elasticsearch.
- Expert in using data visualization tools like Kibana.
- Redesigned the views in Snowflake to increase performance.
- Expertise in managing logs through Kafka with Logstash.
- Experienced in using Docker tools for end-to-end integration testing.
- Used Apache Camel jobs to create data pipelines between multiple Kafka messaging systems.
- Experienced in version control tools like Git and ticket tracking platforms like Jira.
- Expert at handling unit testing using JUnit 4, JUnit 5, and Mockito.
- Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
- Created reports in Looker based on Snowflake connections.
- Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
- Strong experience using the Maven 3.0 build system.
- Experienced in using Atlassian tools like Jira and Bitbucket.
- Architected logical data models and identified source tables to build MicroStrategy schema objects, including Attributes, Facts, Hierarchies, and Relationships.
- Created various public objects such as Filters, Metrics, Custom Groups, and Consolidations according to the requirements.
- Trained business users on MicroStrategy Dossiers and MicroStrategy Office.
- Created complex metrics using pass through functions, conditional statements, and level dimensionality.
- Created custom Enterprise Manager dashboards for advanced usage analysis.
- Created Migration packages, integrity manager test cases for validation.
Environment: Spring Framework, Spring Boot, Kafka, Kubernetes, OpenShift, Docker, Snowflake, Python, Java (JDK SE 8), Scala, Shell scripting, Linux, MongoDB, Neo4j, Elasticsearch, Kibana, Protocol Buffers, PostgreSQL, Oracle, Qlik Replicate, Git, Bamboo, Maven, IntelliJ, Atlassian tools, Agile Methodologies, MicroStrategy 11, Tableau and Power BI.
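As referenced in the data sharing bullet above, the following is a hedged sketch of how a share between two Snowflake accounts can be configured from Python with the snowflake-connector-python driver. Every account, role, database, schema, and table name is an illustrative placeholder.

```python
# Hedged sketch: set up Snowflake secure data sharing from the provider account.
import snowflake.connector

conn = snowflake.connector.connect(
    account="provider_account",   # placeholder provider account locator
    user="etl_user",
    password="********",
    role="ACCOUNTADMIN",          # share management requires elevated privileges
)

statements = [
    "CREATE SHARE analytics_share",
    "GRANT USAGE ON DATABASE analytics_db TO SHARE analytics_share",
    "GRANT USAGE ON SCHEMA analytics_db.reporting TO SHARE analytics_share",
    "GRANT SELECT ON TABLE analytics_db.reporting.daily_sales TO SHARE analytics_share",
    # Add the consumer account so it can create a database from the share.
    "ALTER SHARE analytics_share ADD ACCOUNTS = consumer_account",
]

cur = conn.cursor()
try:
    for stmt in statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()
```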
Confidential, Bedminster, NJ
Senior Data Engineer (Hadoop/Spark Developer)
Responsibilities:
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with MapReduce and Hive.
- Built a 16-node cluster to design the data lake with the Cloudera distribution.
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Implemented reprocessing of failed messages in Kafka using offset IDs (see the sketch at the end of this role).
- Wrote Hive queries on the analyzed data for aggregation and reporting.
- Worked on converting dynamic XML data for ingestion into HDFS.
- Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python.
- Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and building major regulatory and financial reports using advanced SQL queries in Snowflake.
- Loaded data from the UNIX file system to HDFS.
- Participated in building the data lake environment on Hadoop (Cloudera) and building the Campaign Preprocessing and opportunity generation pipeline using Hadoop services such as Hive and Spark.
- AWS Infrastructure setup on EC2 and S3 API implementation for accessing S3 bucket data file.
- Created databases and tables using Redshift and DynamoDB and wrote complex EMR scripts to process terabytes of data in AWS S3.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Used several RDD transformations to filter the data ingested into Spark SQL.
- Used HiveContext and SQLContext to integrate Hive metastore and Spark SQL for optimum performance.
Environment: Spark SQL, HDFS, Hive, Pig, Apache Sqoop, Python, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, PostgreSQL, IntelliJ, Oracle, Subversion and Agile Methodologies.
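As referenced in the Kafka bullet above, the following is a hedged sketch of replaying failed messages from a known offset using the kafka-python client. Broker, topic, group, and offset values are illustrative placeholders.

```python
# Hedged sketch: reprocess a failed batch of Kafka messages by offset id.
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="broker1:9092",
    group_id="reprocess-failed-orders",   # separate group so live consumers are unaffected
    enable_auto_commit=False,
)

# Pin the consumer to the partition that held the failed batch and rewind
# to the first failed offset recorded by the error handler.
partition = TopicPartition("orders-events", 0)
consumer.assign([partition])
consumer.seek(partition, 150_000)          # offset id of the first failed message

for message in consumer:
    # Re-run the original processing logic here (placeholder).
    print(message.offset, message.value)
    if message.offset >= 150_100:          # stop once the failed range is replayed
        break

consumer.close()
```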
Confidential, Scottsdale, AZ
AWS Data Engineer (Scala Developer)
Responsibilities:
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with MapReduce and Hive.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python (see the sketch at the end of this role).
- Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Worked on setting up AWS EMR, EC2 clusters, and a multi-node Hadoop cluster inside the developer environment.
- Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper.
- Implemented reprocessing of failed messages in Kafka using offset IDs.
- Used HiveQL to analyze the partitioned and bucketed data and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
- Wrote Hive queries on the analyzed data for aggregation and reporting.
- Developed Sqoop Jobs to load data from RDBMS to external systems like HDFS and HIVE.
- Worked on converting dynamic XML data for ingestion into HDFS.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster processing of data.
- Loaded data from the UNIX file system to HDFS.
- Used the Python Boto3 library to configure AWS services such as Glue, EC2, and S3.
- Worked with Snowflake utilities, Snowflake SQL, Snowpipe, etc.
- Developed complete end-to-end ETL using Lambda functions and AWS Glue DataBrew.
- Worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to a data mart (SQL Server).
- Analyzed SQL scripts and designed the solution to implement using PySpark.
- Developed Spark code using Python (PySpark) for faster processing and testing of data.
- Used the Spark API to perform analytics on data in Hive; created databases and tables using Redshift and DynamoDB and wrote complex EMR scripts to process terabytes of data in AWS S3.
- Responsible for converting row-oriented regular Hive external tables into columnar, Snappy-compressed Parquet tables with key-value pairs.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Used several RDD transformations to filter the data ingested into Spark SQL.
- Used HiveContext and SQLContext to integrate Hive metastore and Spark SQL for optimum performance.
- Used Control-M to schedule the jobs on a daily basis and validated the jobs.
Environment: AWS, Spark SQL, HDFS, Hive, Pig, Apache Sqoop, Scala, Python, Shell scripting, Linux, MySQL, Oracle Enterprise DB, PostgreSQL, IntelliJ, Oracle, Subversion, Control-M, Teradata, ETL and Agile Methodologies.
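As referenced in the Lambda/Glue bullet above, the following is a hedged Python sketch of a Lambda handler that reacts to new S3 files and triggers a Glue crawler so the on-demand tables stay current in the Glue Catalog. Bucket, key, and crawler names are illustrative placeholders.

```python
# Hedged sketch: Lambda handler triggered by S3 object-created events.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # S3 event notifications carry the bucket/key of each uploaded object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object landed: s3://{bucket}/{key}")

    # Start the crawler that maintains the on-demand table in the Glue Catalog.
    try:
        glue.start_crawler(Name="s3_landing_zone_crawler")
    except glue.exceptions.CrawlerRunningException:
        # The crawler is already running; the new files will be picked up.
        pass

    return {"status": "crawler triggered"}
```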
Confidential, Charlotte, NC
Data Engineer (Hadoop/Spark Developer)
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark for improving performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark Sessions, Spark SQL, Data Frames, Pair RDDs, and Spark YARN.
- Designed and developed complex ETL data pipelines and maintained data quality to support a rapidly growing business.
- Created, developed, and designed ETL data pipelines using Spark and Scala.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with MapReduce and Hive.
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Hands-on design and development of an application using Hive UDFs.
- Used HUE for Hive Query execution.
- Developed data pipelines and statistical models using Python-based ETL.
- Used HiveQL to analyze the partitioned and bucketed data and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
- Built real-time data pipelines by developing Kafka producers and Spark Streaming applications for consuming data (see the sketch at the end of this role).
- Wrote Hive queries on the analyzed data for aggregation and reporting.
- Developed end-to-end data processing pipelines that begin with receiving data via the Kafka distributed messaging system and persist the data into HBase.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Used several RDD transformations to filter the data ingested into Spark SQL.
- Good understanding of the DAG cycle for the entire Spark application flow.
- Analyzed the SQL scripts and designed the solution to implement using Pyspark.
- Used HiveContext to integrate Hive metastore and Spark SQL for optimum performance.
- Developed Sqoop Jobs to load data from RDBMS to external systems like HDFS and HIVE.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Worked on ETL services by developing and deploying AWS Lambda functions for generating a serverless data pipeline which can be written to Glue Catalog and can be queried from Athena.
- Worked in Agile Iterative sessions to create Hadoop Data Lake for the client
- Developed stored procedures, functions, and views for the Oracle database using PL/SQL.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
- Worked closely with architects and front-end developers to design data models and coding optimizations to build ingestion and aggregation tables.
Environment: Spark SQL, CDH, HDFS, Hive, Pig, Apache Sqoop, Scala, Shell scripting, PySpark, Linux, MySQL, Oracle Enterprise DB, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, and Agile Methodologies.
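As referenced in the real-time pipeline bullet above, the following is a hedged PySpark Structured Streaming sketch of consuming a Kafka topic. The actual pipeline persisted into HBase, while this sketch writes Parquet to HDFS for simplicity; broker, topic, and path names are illustrative placeholders.

```python
# Hedged sketch: consume a Kafka topic with Spark Structured Streaming
# and land the events on HDFS as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-streaming-sketch")
    .getOrCreate()
)

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "customer-events")
    .option("startingOffsets", "latest")
    .load()
    # Kafka keys/values arrive as binary; cast to string before downstream parsing.
    .select(col("key").cast("string"), col("value").cast("string"))
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streams/customer_events")
    .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
    .outputMode("append")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()
```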
Confidential, Grand Rapids, MI
Business Intelligence Consultant
Responsibilities:
- Interacted with the business users to build the sample report layouts.
- Worked with the Data Modeling team on logical data models and identified source tables to build MicroStrategy schema objects including Attributes, Facts, Hierarchies, and Relationships.
- Worked on logical data modeling by creating Semantic Layer views.
- Worked extensively on creating Metrics and Compound Metrics, Filters, Custom Groups and Consolidations.
- Used Pass-Through Functions in Attributes, Metrics, and Filters.
- Implemented intelligent cubes for sourcing datasets to build Dashboards.
- Created and shared Intelligent Cubes to reduce database load and decrease report execution time using Cube Services.
- Created Dynamic Enterprise Dashboards and utilized the new features to convert the dashboards into Flash mode.
- Involved in troubleshooting MicroStrategy Web Reports, optimizing the SQL using the VLDB Properties.
- Developed Auto Prompt Filters that give users a choice of different filtering criteria each time they run the filter.
- Used Intelligent Cube datasets to provide performance for dashboards.
- Created and Converted dashboards to suit to MicroStrategy Mobile, especially for iPad.
- Used Object Manager to deploy MicroStrategy objects from the development stage to QA and then to the Production environment.
- Used Enterprise Manager to generate reports to analyze the system performance.
- Sliced and diced data using quick filters, targeting graphs, widgets for interactivity using MicroStrategy Visual Insights.
- Created multi-layered analyses using layers and panels to show KPIs using MicroStrategy Visual Insights.
- Hands-on experience with Narrowcast, including creation of Services, Information Objects, Subscription Sets, Publications, and Schedules as per the requirements.
- Used Cube Advisor to determine best practices for supporting dynamic sourcing for the existing project.
- Tested all the reports by running queries against the warehouse using TOAD, and compared those queries with the queries generated by the MicroStrategy SQL Engine.
- Exported data from Teradata using Fast Export utility.
- Experience in using Teradata utilities such as Fast load, Multiload and Fast Export.
- Involved in troubleshooting MicroStrategy prompts, filter, template, consolidations, and custom group objects in an enterprise data warehouse team environment.
- Optimized Query performance by using VLDB settings.
- Created Transformations (table based, expression based) for time based comparative analysis.
- Worked with ETL team to finalize tables for national level aggregate data for new product launch reporting.
- Worked with the team on the upgrade process from MicroStrategy 9.2.1 to 9.3.1 and then to 9.4.1.
Environment: MicroStrategy 9.2/9.2.1/9.3.1/9.4.1 (Architect, Desktop, Command Manager, Object Manager, Narrowcast Server, Enterprise Manager), Teradata 14.
Confidential, Greenwich, Connecticut
MicroStrategy Consultant
Responsibilities:
- Worked with multiple Business Users to discuss and finalize Requirements.
- Experienced in providing LOE estimates for Agile development.
- Worked with the Data Modeling team and Data Architect on logical data models and identified source tables to build MicroStrategy schema objects including Attributes, Facts, Hierarchies, and Relationships.
- Created intelligent cubes, datasets with derived elements and cube-based dashboards.
- Created various Metrics as Conditional Metrics, Nested Metrics and Level Metrics as recommended.
- Worked with Data Engineering team on data ingestion work and creating aggregated tables.
- Created Aggregate fact tables and Logical views on Database Using SQL.
- Created schedules to send email notifications (PDF reports, failure notifications, etc.).
- Created various public objects such as Filters and Metrics according to the requirements.
- Headed Conversion of existing reports and dashboards from Tableau and Power BI to MicroStrategy.
- Connected to Multiple databases (TERADATA/HANA/HADOOP) for supporting Logistics and Digital.
- Worked with Data Modeling and ETL/DE teams to create the Enterprise Data Model.
- Architected and delivered a self-service application on Dossiers.
- Trained Business users on Dossiers.
- Experience in Data Acquisition and data visualizations for client and customers.
- Experienced in enterprise data governance.
- Experienced with clickstream data and Adobe Analytics.
- Planned and handled migrations.
- Created Custom Views on Teradata.
- Designed and documented the enterprise self-service data architecture.
- Installed and configured MicroStrategy Server.
- Created Workspace and content packs for business users to view the developed reports.
- Scheduled automatic refreshes and email subscriptions to send out more than 3,000 emails every day.
- Created Audit reports using Enterprise Manager on the Usage of MicroStrategy.
- Created MicroStrategy Schema objects and Application objects by creating facts, attributes, hierarchies, reports, filters, metrics and templates using MicroStrategy Desktop.
- Extensive experience and knowledge of Advanced Prompts and Conditional, Non-Aggregatable, Transformation, and Level Metrics for creating complex reports.
Environment: MicroStrategy 11.2/10.11 (Architect, Desktop, Command Manager, Object Manager, Enterprise Manager), SQL Server developer, Teradata, Hadoop, Teradata express.