Data Engineer Resume
Des Moines, IA
SUMMARY
- 10+ years of experience as a Data Engineer/Scientist/Analyst in the Data Science and Analytics domain, including Machine Learning, Data Mining, and Statistical Analysis.
- Proficient in managing the data science project life cycle, including data acquisition, cleaning, feature engineering, feature scaling, and modeling (Regression, Classification and Clustering, Decision Trees, Naive Bayes, Random Forest, Gradient Boosting, SVM, KNN, Neural Networks).
- Strong experience in migrating other databases to Snowflake.
- Work with domain experts, engineers, and other data scientists to develop, implement, and improve upon existing systems.
- Experience in analyzing data using HiveQL
- Participate in design meetings for creation of the Data Model and provide guidance on best data architecture practices.
- Hands-on exposure to big-data technologies like Hadoop, HDFS, Spark, and Hive.
- In-depth understanding of Hadoop architecture, including YARN and components such as Resource Manager, Node Manager, Name Node, and Data Node.
- Ability to build machine learning solutions using PySpark for large datasets on Hadoop ecosystem.
- Experience in building PySpark and Spark-Scala applications for interactive analysis, batch processing and stream processing.
- Able to develop shell and Python scripts to automate Spark jobs and Hive scripts.
- Knowledge of tools like Snowflake, SSIS, SSAS, SSRS to design warehousing applications.
- Experience in data mining, including Predictive Behavior Analysis, Optimization and Customer Segmentation analysis using SAS and SQL.
- Hands on experience in creating real time data streaming solutions using Apache Spark Core, Spark SQL, and Data Frames.
- Experience in Applied Statistics, Exploratory Data Analysis and Visualization using matplotlib, Tableau, Power BI, and Google Analytics.
- Expertise in developing multiple Kafka producers and consumers per the software requirement specifications.
- Data pipelining using frameworks based on Kafka and Spark, with Kafka producer APIs sending live-stream data into various Kafka topics (a minimal sketch of such a pipeline appears after this summary).
- Participate in the development, improvement, and maintenance of Snowflake database applications.
- Strong experience in Apache Airflow development.
- Strong understanding of data warehouse concepts and ETL tools (Informatica, Pentaho, Apache Airflow).
- Experience in various methodologies like Waterfall and Agile.
- Extensive experience in developing complex stored Procedures/BTEQ Queries.
- In-depth understanding of Data Warehouse/ODS, ETL concept and modeling structure principles
- Build the logical and physical data models for Snowflake as per the required changes.
- Define roles and privileges required to access different database objects.
- In-depth knowledge of Snowflake database, schema, and table structures.
- Define virtual warehouse sizing for Snowflake for different types of workloads.
- Worked with cloud architects to set up the environment.
- Coding for Stored Procedures/ Triggers.
- Design batch cycle procedures on major projects using scripting and Control-M.
- Develop SQL queries using SnowSQL.
- Develop transformation logic using Snowpipe.
- Optimize and fine-tune queries.
- Performance tuning of Big Data workloads.
- Good knowledge of and hands-on experience with ETL.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Analyze, design, and build modern data solutions using Azure PaaS services to support visualization of data. Understand the current production state of the application and determine the impact of new implementations on existing business processes.
- Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks.
- Operationalize data ingestion, data transformation and data visualization for enterprise use.
- Mentor and train junior team members and ensure coding standards are followed across the project.
- Help talent acquisition team in hiring quality engineers.
- Experience in real time streaming frameworks like Apache Storm.
- Worked on Cloudera and Hortonworks distributions.
- Progressive experience in the field of Big Data technologies and software programming and development, including design, integration, and maintenance.
- Hands-on experience with Snowflake utilities (SnowSQL, Snowpipe) and Big Data modeling techniques using Python/Java.
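The following is a minimal, illustrative PySpark sketch of the kind of Kafka-to-Spark streaming pipeline described above; the broker address, topic name, payload schema, and output paths are hypothetical placeholders rather than values from any actual project.

```python
# Illustrative sketch: Kafka -> Spark Structured Streaming -> Parquet on HDFS.
# Broker, topic, schema, and paths are placeholders, not real project values.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-example").getOrCreate()

# Assumed JSON payload schema for the incoming events
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                      # placeholder topic
       .load())

# Kafka delivers the payload as bytes; parse the JSON value into columns
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streams/events")                 # placeholder output path
         .option("checkpointLocation", "/data/checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```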
TECHNICAL SKILLS
Cloud Technologies: Snowflake, SnowSQL, Snowpipe, AWS.
Data Warehousing: Snowflake, Redshift, Teradata
DBMS: Oracle, SQL Server, MySQL, Db2
Operating System: Windows, Linux, Solaris, Centos, OS X
IDEs: Eclipse, NetBeans.
Servers: Apache Tomcat
Big Data Ecosystem: Spark, Hive LLAP, Beeline, HDFS, MapReduce, Pig, Sqoop, HBase, Oozie, Flume
Reporting Systems: Splunk
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: Scala, Python, Perl, Shell scripting
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Evaluate Snowflake design considerations for any change in the application.
- Build the logical and physical data models for Snowflake as per the required changes.
- Define roles, privileges required to access different database objects.
- Define virtual warehouse sizing for Snowflake for different types of workloads.
- Design and code required Database structures and components
- Experience working with various Hadoop distributions such as Cloudera, Hortonworks, and MapR.
- Worked with cloud architects to set up the environment.
- Worked on Oracle databases, Redshift, and Snowflake.
- Major challenges of the system were integrating and accessing many systems spread across South America, creating a process to involve third-party vendors and suppliers, and creating authorization for various department users with different roles.
- Developed workflows in SSIS to automate the tasks of loading data into HDFS and processing it using Hive.
- Developed alerts and timed reports; developed and managed Splunk applications.
- Involved in various Transformation and data cleansing activities using various Control flow and data flow tasks in SSIS packages during data migration
- Applied various data transformations like Lookup, Aggregate, Sort, Multicast, Conditional Split, Derived Column, etc.
- Worked on scheduling and configuring Airflow jobs and fixing ongoing issues.
- Work with multiple data sources.
- Developed mappings, sessions, and workflows to extract, validate, and transform data according to the business rules using Informatica.
- Worked with various HDFS file formats like Avro and SequenceFile, and compression formats like Snappy and GZip.
- Worked on data ingestion from Oracle to Hive.
- Involved in fixing various issues related to data quality, data availability and data stability.
- Worked in determining various strategies related to data security.
- Performance monitoring and Optimizing Indexes tasks by using Performance Monitor, SQL Profiler, Database Tuning Advisor and Index tuning wizard.
- Worked on the Hue interface for loading data into HDFS and querying the data.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Wrote scripts and indexing strategy for a migration to Confidential Redshift from SQL Server and MySQL databases
- Used Spark SQL to create schema RDDs and load them into Hive tables, and handled structured data using Spark SQL (see the sketch after this list).
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used JSON schema to define table and column mapping from S3 data to Redshift
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Used Avro, Parquet, and ORC data formats to store data in HDFS.
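A minimal sketch of the Spark SQL pattern referenced above, expressing a former Hive query as a Spark SQL transformation and persisting the result back as a columnar Hive table; the database, table, and column names are hypothetical.

```python
# Illustrative sketch only: table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-parquet-example")
         .enableHiveSupport()       # needed to read/write Hive tables
         .getOrCreate())

# Express the former Hive query as a Spark SQL transformation
orders = spark.sql("""
    SELECT customer_id, order_date, SUM(order_amount) AS total_amount
    FROM   sales_db.orders
    GROUP  BY customer_id, order_date
""")

# Persist the result to HDFS in a columnar format and register it in Hive
(orders.write
 .mode("overwrite")
 .format("parquet")
 .saveAsTable("sales_db.daily_order_totals"))
```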
Confidential, Des Moines, IA
Sr. Snowflake Data Engineer
Responsibilities:
- Designed and implemented scalable infrastructure and platforms for large-scale data ingestion, aggregation, integration, and analytics in Hadoop, including Spark, Hive, and HBase.
- Creating job workflows using Oozie scheduler.
- Involved in designing, developing, testing, and documenting an application to combine personal loan, credit card, and mortgage data from different countries and load it from the Hive database into a Sybase database for reporting insights.
- Developing an architecture to move the project from Ab Initio to PySpark and Spark with Scala.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks.
- Used Chef for configuration management of hosted Instances within GCP. Configuring and Networking of Virtual Private Cloud (VPC).
- Using Sqoop to move data between HDFS, Hive, MySQL, and many other sources on a daily basis.
- Creating MapReduce programs to enable data for transformation, extraction and aggregation of multiple formats like Avro, Parquet, XML, JSON, CSV and other compressed file formats.
- Use Python and Scala on a daily basis to perform transformations that apply business logic.
- Writing Hive queries in Spark SQL for analyzing and processing the data.
- Setting up an HBase column-based storage repository for archiving data on a daily basis.
- Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.
- Using an enterprise data lake to support various use cases, including analytics, storage, and reporting of voluminous, rapidly changing structured and unstructured data.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Converting data-load pipeline algorithms written in Python and SQL to Spark with Scala and PySpark (an illustrative sketch follows this list).
- Mentor and support other members of the team (both onshore and offshore) to assist in completing tasks and meeting objectives.
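A hypothetical example of the kind of pipeline step that gets converted from Python/SQL to PySpark; the source/target table names and the business rule shown are placeholders, not details of the actual application.

```python
# Hypothetical pipeline step rewritten in PySpark; names and rules are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("pipeline-step-example")
         .enableHiveSupport()
         .getOrCreate())

loans = spark.table("staging.personal_loans")     # placeholder Hive table

# Former SQL CASE expression re-expressed as a DataFrame transformation
scored = (loans
          .filter(F.col("status") == "ACTIVE")
          .withColumn("risk_band",
                      F.when(F.col("balance") > 50000, "HIGH")
                       .when(F.col("balance") > 10000, "MEDIUM")
                       .otherwise("LOW")))

scored.write.mode("overwrite").saveAsTable("curated.personal_loans_scored")
```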
Confidential
Snowflake Data Engineer
Responsibilities:
- Served as the Snowflake Database Administrator responsible for leading the data model design and database migration deployment production releases, ensuring our database objects and corresponding metadata were successfully implemented in the production platform environments (Dev, Qual, and Prod) on AWS Cloud (Snowflake).
- Performed day-to-day integration with the Database Administrator (DBA) DB2, SQL Server, Oracle, and AWS Cloud teams to ensure the insertion of database tables, columns, and their metadata was successfully implemented in the DEV, QUAL, and PROD environments in AWS Cloud (Aurora and Snowflake).
- Performed ETL data translation using Informatica, turning functional requirements into source-to-target data mapping documents to support large datasets (Big Data) in the AWS Cloud databases Snowflake and Aurora.
- Performed logical and physical data structure design and DDL generation to facilitate the implementation of database tables and columns in the DB2, SQL Server, AWS Cloud (Snowflake), and Oracle DB schema environments using ERwin Data Modeler Model Mart Repository version 9.6 (a sketch of applying such DDL appears after this list).
- Assisted Project Managers and Developers in performing ETL solution design and development to produce reporting, dashboarding and data analytics deliverables.
- Technical team member of the T. Rowe Price Information Architect-Data Modeling Agile team, responsible for developing enterprise conceptual, logical, and physical data models and the data dictionary, supporting the three business units: Retirement Plan Services (RPS), Shared Support Platforms, and Global Investment Services (GIS).
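An illustrative sketch of deploying ERwin-generated DDL to a Snowflake schema with the snowflake-connector-python client; the connection parameters and the table definition are hypothetical placeholders, not the actual project objects.

```python
# Illustrative sketch: apply generated DDL to a Snowflake schema.
# Connection parameters and DDL text are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder
    user="deploy_user",        # placeholder
    password="***",            # placeholder; a real deployment would use a secrets store
    warehouse="DEPLOY_WH",
    database="ANALYTICS_DEV",
    schema="RPS",
)

ddl = """
CREATE TABLE IF NOT EXISTS retirement_plan (
    plan_id        NUMBER,
    plan_name      VARCHAR(200),
    effective_date DATE
)
"""

try:
    conn.cursor().execute(ddl)
finally:
    conn.close()
```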
Hadoop & Snowflake Data Engineer
Confidential, Charlotte, NC
Responsibilities:
- Involved in migrating objects using the custom ingestion framework from a variety of sources such as Oracle, SAP HANA, MongoDB, and Teradata.
- Created Snowpipe for continuous data loading from staged data residing on cloud gateway servers.
- Used COPY to bulk load the data.
- Using the FLATTEN table function to produce a lateral view of VARIANT, OBJECT, and ARRAY columns (see the sketch after this list).
- Working with both Maximized and Auto-scale functionality while running multi-cluster warehouses.
- Using Temporary and Transient tables on different datasets.
- Sharing sample data by granting customers access for UAT/BAT.
- Used Snowflake time travel feature to access historical data.
- Heavily involved in testing Snowflake to understand the best possible way to use the cloud resources.
- Working on migration of jobs from Tidal to Control-M & creating new scheduled jobs in Control-M.
- Worked on analyzing data using Hive.
- Orchestrating scrum calls for a couple of functions in Supply Chain to track project progress.
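A brief, hedged illustration of the COPY bulk-load and FLATTEN lateral-view patterns mentioned above, using snowflake-connector-python; the stage, table, and column names are hypothetical and the connection values are placeholders.

```python
# Illustrative sketch of bulk load and semi-structured querying in Snowflake.
# Stage, table, and column names are hypothetical; connection values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",   # placeholders
    warehouse="LOAD_WH", database="SUPPLY_CHAIN", schema="RAW",
)
cur = conn.cursor()

# Bulk load staged JSON files into a table with a single VARIANT column (payload)
cur.execute("""
    COPY INTO raw_events
    FROM @landing_stage/events/
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Lateral view over the VARIANT column using FLATTEN
cur.execute("""
    SELECT r.payload:order_id::STRING AS order_id,
           f.value:sku::STRING        AS sku,
           f.value:qty::NUMBER        AS qty
    FROM   raw_events r,
           LATERAL FLATTEN(input => r.payload:line_items) f
""")
rows = cur.fetchall()
conn.close()
```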
Environment: Snowflake Web UI, SnowSQL, Hadoop MapR 5.2, Hive, Hue, Toad 12.9, SharePoint, Control-M, Tidal, ServiceNow, Teradata Studio, Oracle 12c, Tableau.
Hadoop Developer
Confidential, Oak Brook, IL
Responsibilities:
- Developed Hive and Bash scripts for source data validation and transformation. Automated data loading into HDFS and Hive for pre-processing the data using One Automation (a validation sketch follows this list).
- Gather data from Data warehouses in Teradata and Snowflake.
- Developed Spark/Scala and Python code for regular-expression projects in the Hadoop/Hive environment.
- Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata.
- Experience in building Big Data applications using Cassandra and Hadoop.
- Utilized Sqoop, ETL, and Hadoop FileSystem APIs to implement data ingestion pipelines.
- Worked on batch data of varying granularity, ranging from hourly and daily to weekly and monthly.
- Hands-on experience in Hadoop administration and support activities for installations and configuring Apache Big Data Tools and Hadoop clusters using Cloudera Manager.
- Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows.
- Assisted in upgrading, configuration, and maintenance of various Hadoop infrastructures like Ambari, PIG, and Hive.
- Developing and writing SQL and stored procedures in Teradata. Loading data into Snowflake and writing SnowSQL scripts.
- Wrote TDCH scripts for full and incremental refreshes of Hadoop tables.
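A hypothetical sketch of the kind of source data validation mentioned above: a simple row-count reconciliation between a staging table and its warehouse target using PySpark; the table names are placeholders.

```python
# Hypothetical row-count reconciliation between source and target Hive tables.
# Table names are placeholders, not actual project objects.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("load-validation-example")
         .enableHiveSupport()
         .getOrCreate())

source_count = spark.table("staging.daily_transactions").count()
target_count = spark.table("warehouse.daily_transactions").count()

if source_count != target_count:
    raise SystemExit(
        f"Validation failed: source={source_count}, target={target_count}")
print(f"Validation passed: {source_count} rows loaded")
```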