
Associate Data Engineer Resume


Irving, TX

SUMMARY

  • 8+ years of experience in implementing various Big Data/Cloud Engineering, Snowflake, Data Warehouse, Data Mart, Data Visualization, Reporting, Data Quality, Data Virtualization, and Data Science solutions.
  • Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures
  • Adept in programming languages like R and Python, as well as Big Data technologies like Hadoop and Hive
  • Excellent Knowledge of Relational Database Design, Data Warehouse/OLAP concepts, and methodologies
  • Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures
  • Expertise in OLTP/OLAP system study, analysis, and E-R modeling, developing database schemas such as Star and Snowflake schemas used in relational, dimensional, and multidimensional modeling
  • Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Sqoop, Oozie, HBase, MongoDB, Spark, and Kafka
  • Experienced in Data Architecture and data modeling using Erwin, ER-Studio and MS Visio
  • Experience in coding SQL for developing Procedures, Triggers, and Packages
  • Experience in creating separate virtual data warehouses with different size classes in Snowflake on AWS
  • Hands-on experience in bulk loading & unloading data into Snowflake tables using the COPY command (see the loading sketch after this list)
  • Experience with data transformations utilizing SnowSQL in Snowflake
  • Experience writing Spark Streaming and Spark batch jobs, and using Spark MLlib for analytics
  • Experience in importing and exporting data using Sqoop between HDFS and Relational Database Systems (RDBMS) such as Oracle, DB2, and SQL Server
  • Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments
  • Solid understanding of AWS (Redshift, S3, EC2) and of Apache Spark and Scala processes and concepts
  • Hands on experience in machine learning, big data, data visualization, R and Python development, Linux, SQL, GIT/GitHub
  • Experience with data visualization using tools like ggplot, Matplotlib, Seaborn, and Tableau, and in using Tableau to publish and present dashboards and storylines on web and desktop platforms
  • Experienced in Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy, SciPy, and Pandas for data analysis and numerical computations
  • Hands on experience with RStudio for doing data pre-processing and building machine learning algorithms on different datasets
  • Experienced working on NoSQL databases like MongoDB and HBase
  • Extensive working experience with Python, including Scikit-learn, SciPy, Pandas, and NumPy, for developing machine learning models and manipulating and handling data
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python
  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian, XGBoost, K-Means Clustering, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles
  • Implemented machine learning algorithms on large datasets to understand hidden patterns and capture insights
  • Experienced in building and optimizing big data pipelines, architectures, and data sets using the TensorFlow Data API (tf.data), Spark, and Hive.
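
As a rough illustration of the COPY-based bulk loading called out above, the sketch below uses the Snowflake Python connector to run COPY INTO against a staged file set. The account, warehouse, stage, and table names are placeholders, not details from the original work.

```python
# Minimal sketch: bulk-loading staged CSV files into a Snowflake table with COPY INTO.
# All connection parameters, stage, and table names below are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="LOADER",                  # placeholder credentials
    password="***",
    account="xy12345.us-east-1",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Assumes an external stage (e.g. over an S3 bucket) already exists.
    cur.execute("""
        COPY INTO STAGING.ORDERS_RAW
        FROM @ORDERS_STAGE/2023/
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    print(cur.fetchall())           # COPY returns one result row per loaded file
finally:
    conn.close()
```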

TECHNICAL SKILLS

Big Data: Hadoop, HDFS, Sqoop, HBase, Hive, MapReduce, Spark, Cassandra, Kafka

Languages: Python (Jupyter Notebook, PyCharm IDE), R, Java, C

Cloud Computing Tools: Snowflake, SnowSQL, AWS, Databricks, GCP, Azure Data Lake services

ETL Tools: TensorFlow Data API (tf.data), PySpark

Modeling and Architecture Tools: Erwin, ER Studio, Star-Schema and Snowflake-Schema Modeling, Fact and Dimension tables, Pivot Tables

Databases: Snowflake Cloud Database, Oracle, MS SQL Server, Teradata, MySQL, DB2

Database Tools: SQL Server Data Tools, Visual Studio, Spotlight, SQL Server Management Studio, Query Analyzer, Enterprise Manager, JIRA, Profiler

Reporting Tools: MS Excel, Tableau, Tableau server, Tableau Reader, Power BI, QlikView

Machine Learning Algorithms: Logistic Regression, Linear Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors, Random Forests, Gradient Boosted Decision Trees, Stacking Classifiers, Cascading Models, Naive Bayes, K-Means Clustering, Hierarchical Clustering, and Density-Based Clustering

PROFESSIONAL EXPERIENCE

Confidential, Irving, TX

Associate Data Engineer

Responsibilities:

  • Migrated the existing data from Teradata/SQL Server to Hadoop and performed ETL operations on it.
  • Responsible for loading structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.
  • Worked on different data formats such as JSON and applied machine learning algorithms in Python.
  • Created a task scheduling application to run in an EC2 environment on multiple servers.
  • Applied strong knowledge of various data warehousing methodologies and data modeling concepts.
  • Created Hive partitioned tables using Parquet and Avro formats to improve query performance and space utilization.
  • Responsibilities included database design and creation of user databases.
  • Moved ETL pipelines from SQL Server to the Hadoop environment.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the PySpark sketch after this list).
  • Implemented a CI/CD pipeline using Jenkins and Airflow for Docker containers deployed on Kubernetes (a minimal orchestration sketch follows the Environment line below).
  • Used SSIS, NiFi, Python scripts, and Spark applications for ETL operations to create data flow pipelines, and was involved in transforming data from legacy tables to Hive, HBase tables, and S3 buckets for handoff to business and data scientists to build analytics over the data.
  • Supported current and new services that leverage AWS cloud computing architecture, including EC2, S3, and other managed service offerings.
  • Used advanced SQL methods to code, test, debug, and document complex database queries.
  • Designed relational database models for small and large applications.
  • Designed and developed Scala workflows for data pulls from cloud-based systems and applied transformations to them.
  • Developed reliable, maintainable, and efficient code in SQL, Linux shell, and Python.
  • Implemented Apache Spark code to read multiple tables from real-time records and filter the data based on requirements.
  • Stored final computation results in Cassandra tables and used Spark SQL and Spark Datasets to perform data computations.
  • Used Spark for data analysis and stored final computation results in HBase tables.
  • Troubleshot and resolved complex production issues while providing data analysis and data validation.
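
A minimal PySpark sketch of the extract-transform-load pattern described in the bullets above: reading semi-structured JSON, transforming it with Spark SQL functions, and writing a partitioned Parquet table. The paths, column names, and table name are illustrative assumptions, not taken from the actual project.

```python
# Sketch of a PySpark ETL job: read JSON, cleanse/aggregate, write partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("usage-etl")
         .enableHiveSupport()
         .getOrCreate())

# Extract: semi-structured JSON landed in object storage (placeholder path)
raw = spark.read.json("s3a://example-bucket/landing/usage/*.json")

# Transform: basic cleansing and daily aggregation of usage events
daily_usage = (raw
               .withColumn("event_date", F.to_date("event_ts"))
               .filter(F.col("customer_id").isNotNull())
               .groupBy("event_date", "customer_id")
               .agg(F.count("*").alias("events"),
                    F.sum("duration_sec").alias("total_duration_sec")))

# Load: write as a Hive table partitioned by date, stored as Parquet
(daily_usage.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("event_date")
 .saveAsTable("analytics.daily_usage"))
```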

Environment: Teradata, SQL Server, Hadoop, ETL operations, Data Warehousing, Data Modeling, Cassandra, AWS cloud computing architecture, EC2, S3, advanced SQL, NiFi, Python, Linux, Apache Spark, Scala, Spark SQL, HBase
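
The CI/CD and orchestration bullet above mentions Airflow; as a rough, hypothetical illustration of how daily ingest and Spark transform steps like those listed could be scheduled, here is a minimal DAG sketch. The DAG id, JDBC connection string, paths, and commands are assumptions, not details of the actual pipeline.

```python
# Generic Airflow 2.x DAG chaining a placeholder Sqoop ingest and a Spark transform.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_usage_etl",          # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Placeholder ingest step; credentials and connection details omitted
    ingest = BashOperator(
        task_id="sqoop_import",
        bash_command=(
            "sqoop import --connect 'jdbc:sqlserver://db-host:1433;databaseName=sales' "
            "--table ORDERS --target-dir /landing/orders"
        ),
    )
    # Placeholder Spark job submission
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /opt/jobs/usage_etl.py",
    )

    ingest >> transform
```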

Confidential, San Francisco, CA

Data Engineer

Responsibilities:

  • Gathered, analyzed, and translated business requirements to technical requirements, communicated with other departments to collect client business requirements and access available data
  • Acquired, cleaned, and structured data from multiple sources and maintained databases/data systems; identified, analyzed, and interpreted trends and patterns in complex data sets
  • Developed, prototyped, and tested predictive algorithms; filtered and cleaned data, and reviewed reports and performance indicators
  • Developed and implemented data collection systems and other strategies that optimized statistical efficiency and data quality
  • Created and statistically analyzed large data sets of internal and external data
  • Worked closely with the marketing team to deliver actionable insights from huge volumes of data coming from different marketing campaigns and customer interaction metrics such as web portal usage, email campaign responses, public site interaction, and other customer-specific parameters
  • Performed incremental loads as well as full loads to transfer data from OLTP systems to a Snowflake-schema Data Warehouse using different data flow and control flow tasks, and provided maintenance for existing jobs
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources
  • Used Kafka as a message broker to collect large volumes of data and analyze the collected data in the distributed system (see the producer sketch after this list)
  • Designed an ETL process using the Talend tool to load data from source systems to Snowflake through data transformations
  • Knowledgeable in partitioning Kafka messages and setting up replication factors in a Kafka cluster
  • Developed Snowpipes for continuous ingestion of data using event notifications from AWS (S3 bucket)
  • Designed and developed an end-to-end ETL process from various source systems to the staging area and from staging to Data Marts, including data loads
  • Loaded data into Snowflake tables from the internal stage using SnowSQL
  • Prepared data warehouse using Star/Snowflake schema concepts in Snowflake using SnowSQL
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python
  • Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features
  • Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems
  • Used information value, principal component analysis, and Chi-square feature selection techniques
  • Used Python and R scripting to implement machine learning algorithms for prediction and forecasting
  • Experienced in developing packages in RStudio with a Shiny interface
  • Improved model efficiency and accuracy by evaluating and refining models with Python and R scripts
  • Experimented with multiple classification algorithms, such as Random Forest and Gradient Boosting, using Python Scikit-learn, and evaluated performance on customer discount optimization for millions of customers (see the evaluation sketch after the Environment line below)
  • Built models using Python and PySpark to predict the probability of attendance for various campaigns and events
  • Performed data visualization and designed dashboards with Tableau, generating complex reports including charts, summaries, and graphs to communicate findings to the team and stakeholders
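
A minimal kafka-python sketch of the message-broker usage described above: a producer publishing JSON-encoded customer interaction events to a topic. The broker address, topic name, and event fields are assumptions for illustration only.

```python
# Sketch of a producer publishing marketing/interaction events to Kafka.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],            # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"customer_id": 123, "action": "email_open", "campaign": "spring"}
producer.send("marketing-events", value=event)        # topic name is assumed
producer.flush()
producer.close()
```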

Environment: Snowflake, SnowSQL, AWS S3, Hadoop, Hive, HBase, Spark, R/RStudio, Python (Pandas, NumPy, Scikit-learn, SciPy, Seaborn, Matplotlib), SQL, PowerShell, Machine Learning, Kafka.
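
To illustrate the classifier experiments mentioned above (Random Forest versus Gradient Boosting with Scikit-learn), here is a minimal cross-validation comparison. The synthetic data is a placeholder standing in for the confidential customer features.

```python
# Compare Random Forest and Gradient Boosting classifiers with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for engineered customer features
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```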
