Data Analyst/Data Engineer Resume
Milpitas, CA
SUMMARY:
- 5 years’ experience as a data analyst and 3 years’ experience as a data engineer, with broad knowledge of big data analysis, data cleaning, visualization, and exploration, building ETL pipelines, and cloud computing
- Experience identifying, analyzing and interpreting trends and patterns in complex datasets, drawing insights to answer business questions and identify opportunities for improvement
- Worked on data cleaning, exploratory data analysis, data visualization, and data mining using SQL, Python, and R
- Advanced knowledge of Tableau for designing interactive dashboards and reports from different databases that present key metrics and data-driven insights to managers and other staff
- Experience defining key metrics to assess overall business performance and delivering in-depth analysis and recommendations
- Experience building ETL pipelines to query data from relational databases such as MySQL, SQLite, and SQL Server and non-relational databases such as MongoDB, scheduling tasks in a specific order with Airflow
- Hands-on experience with cloud computing, using AWS and Google Cloud to run big data analysis
- Knowledge of database design, including creating views and tables, managing access permissions for databases, tables, and views, normalizing and denormalizing databases, and partitioning tables
- Hands-on experience using Hadoop for big data processing and ETL with Hive and Spark
- Experience developing machine learning models with Linear Regression, Logistic Regression, Decision Tree, K-Nearest Neighbor, Support Vector Machines, Random Forests, Boosting, and K-means
- Experience collaborating with marketing and strategy teams to identify business goals and translate them into IT specifications
- Hands-on experience using Git to track changes across versions of source code and coordinate work on those versions with other people
- Experience using Jira to capture, track, and resolve issues and to configure custom workflows and projects
SKILLS:
Programming Languages: SQL | Python (NumPy, Pandas, Statsmodels, Matplotlib, Seaborn, Folium, spaCy, scikit-learn, BeautifulSoup, PySpark, Flask, TensorFlow) | R (dplyr, ggplot2, Shiny, lme4)
Databases: MySQL | SQL Server | SQLite | MongoDB
Tools: Tableau | Hadoop (Hive, Spark) | Visual Studio | Jupyter Notebook | RStudio | Git | Jira | Microsoft Office (Word, Excel, PowerPoint)
Clouds: AWS (S3, EC2, DynamoDB) | Google Cloud Platform (BigQuery, Cloud Storage, Kubernetes)
Transferable Skills: teamwork | critical thinking and problem solving | attention to detail | self-motivation | written and verbal communication
Machine Learning: Linear Regression | Logistic Regression | SVM | K-Nearest Neighbors | Naïve Bayes | Random Forest | Gradient Boosting | K-means | Time Series | NLP
PROFESSIONAL EXPERIENCE:
Confidential, Milpitas, CA
Data Analyst/Data Engineer
Responsibilities:
- Created database objects such as tables, views, procedures, and functions using SQL to define and structure data and maintain it efficiently
- Created dashboards and interactive charts using Tableau to provide insights for managers and stakeholders and enable decision-making for market development
- Designed ETL pipelines to load datasets from MySQL and MongoDB into an AWS S3 bucket, and managed bucket and object access permissions
- Performed data cleaning and wrangling using Python with a cluster computing framework like Spark
- Managed multiple project tasks with changing priorities and tight deadlines in an Agile environment
- Applied statistical analysis in R to examine hypothesis assumptions and select features for machine learning
- Worked with a cross-functional team to design, develop, and implement a BI solution for marketing strategies
- Implemented feature engineering in Spark to tokenize text data and transform features with scaling, normalization, and imputation
- Helped build machine learning pipelines for customer segmentation with Spark, applying PCA and K-means clustering, and assisted the Data Science team in implementing association rule mining
- Developed presentations using MS PowerPoint for internal and external audiences
Confidential
Data Analyst/Data Engineer
Responsibilities:
- Collaborated with the Engineering team to design and maintain MySQL databases for storing and retrieving customer review data
- Used SQL to build ETL pipelines that filter, aggregate, and join various tables to retrieve the desired data from MySQL databases
- Ingested, explored, cleaned, and integrated data from MySQL and MongoDB databases on AWS EC2 using Python and Hadoop to perform initial investigation, discover patterns, and check assumptions
- Provided BI Analysis for the marketing team to review impact on key metrics in relation to the project
- Used R to query the data, run statistical analysis and create reports or dashboards
- Prepared project progress reports and status reports and submitted them to the management team on an ongoing basis
- Built compelling visualizations and dashboards using Tableau to deliver actionable insights
- Built feature engineering pipelines in Python to normalize and scale numerical features and tokenize categorical features, and applied PCA to reduce dimensionality
- Contributed to building machine learning models with the scikit-learn library in Python, including Logistic Regression, SVM, Random Forest, and Naive Bayes models
Confidential
Data Analyst
Responsibilities:
- Collaborated with data managers to define and implement data standards and common data elements for data collection
- Built an ETL pipeline using SQL to query telecom data from a MySQL database by filtering, joining, and aggregating various tables
- Used Tableau to design and maintain reports and dashboards to track and communicate customer churn prediction performance
- Manipulated raw data with the NumPy and Pandas libraries in Python for data cleaning, exploratory analysis, and feature engineering
- Generated interactive charts with the Matplotlib and Seaborn libraries in Python for exploring and explaining data
- Collaborated with Marketing Managers to identify root causes of customer discontent and constructed dashboards that reflect the churn predictions
- Designed A/B tests to identify variables that contributed to customer churn and used the Shiny library in R to turn analyses into dashboards
- Applied data mining in Spark to extract diverse features that provide additional information to enhance the churn prediction
- Supported the construction of machine learning models using the scikit-learn library in Python to predict customer churn, including Decision Tree, Random Forest, and Gradient Boosting models
Confidential
Data Analyst Intern
Responsibilities:
- Assisted in the maintenance of all MySQL database applications and resolved database-related issues submitted to the help desk ticketing system
- Retrieved raw data and applied data cleaning, transformation, and exploratory analysis with the NumPy and Pandas libraries in Python
- Used the Matplotlib library in Python to monitor and analyze weekly, monthly, and yearly sales data to identify market trends and patterns
- Designed and conducted statistical analysis with the lme4 library in R to identify and remediate data quality/integrity issues and to identify metrics for monitoring product performance
- Developed dashboards and frameworks with Tableau to monitor business and product performance
- Supported the business lead on special assignments and helped ensure production efficiency
