Data Engineer Resume
San Ramon, CA
SUMMARY
- 5 years of experience writing SQL queries for data manipulation using window functions and subqueries (see the sketch after this list)
- Experienced in designing and maintaining relational databases using MySQL, MS SQL Server and PostgreSQL
- Experienced in handling and manipulating NoSQL databases using MongoDB
- Experienced in data warehouse and ETL (extract, transform and load) technologies
- Experienced in web scraping with Python using Beautiful Soup and in parsing JSON and XML
- Experienced in data cleaning using Python packages like NumPy and pandas
- Experienced in data visualization with Python libraries (matplotlib, Seaborn, Plotly, Bokeh)
- Experienced in natural language analysis with Python NLTK and TextBlob libraries
- Experienced in AWS Lambda, CloudWatch, DynamoDB, Cloud9 and API Gateway
- Experienced in creating dashboards and delivering business insights using Tableau
- Extensive knowledge of data mining and analysis with R
- Solid at applying Machine Learning and Deep Learning models, such as Linear Regression, Logistic Regression, SVM (Support Vector Machine), Random Forest, Gradient Boosting, CNN (Convolutional Neural Networks), RNN (Recurrent Neural Networks) and KNN (K-Nearest Neighbors)
- Adept knowledge of big data tools like Hadoop (HDFS, Hive, MapReduce) and Spark (SparkSQL, Spark MLlib)
- Extensive knowledge of A/B Testing
- Working knowledge of object and bucket operations on Amazon Web Services S3 and instance operations on Amazon Web Services EC2
- Extensive knowledge of Google Cloud Platform (GCP)
- Experienced in creating and manipulating tables and cleaning data on Databricks Cloud
- Working knowledge of deploying applications, managing images and containers, and configuring Docker registries
- Working experience developing cloud storage applications in Go
- Excellent understanding of SDLC (systems development life cycle), Agile and Waterfall
- Extensive experience with the version control tool Git
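A minimal sketch of the window-function-plus-subquery SQL style referenced above, run through Python's built-in sqlite3 module (SQLite 3.25+ supports window functions; the table, columns, and data are illustrative, not from any actual project):

```python
import sqlite3

# In-memory database with illustrative order data (all names hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'alice', 120.0), (2, 'alice', 80.0),
        (3, 'bob',   200.0), (4, 'bob',   50.0);
""")

# The window function ranks each customer's orders by amount; the outer
# query wraps it as a subquery and keeps only each customer's largest order.
query = """
    SELECT order_id, customer, amount
    FROM (
        SELECT order_id, customer, amount,
               RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
        FROM orders
    )
    WHERE rnk = 1;
"""
for row in conn.execute(query):
    print(row)  # -> (1, 'alice', 120.0) and (3, 'bob', 200.0)
conn.close()
```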
TECHNICAL SKILLS
Programming Languages: Python, SQL, Go, R, Java
Database tools: MySQL, Oracle SQL, SQL Server, PostgreSQL, MongoDB (NoSQL)
Python libraries & big data tools: NumPy, pandas, NLTK, Matplotlib, Seaborn, PyTorch, TensorFlow, scikit-learn, Spark, Hadoop, Hive, Pig, AWS S3, Google Cloud Platform, Databricks, Docker
Data visualization & BI tools: Tableau, Microsoft Excel, Power BI, Pentaho, Splunk
Operating systems: macOS, Windows, Unix/Linux
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential | San Ramon, CA
Responsibilities:
- Worked with AWS Lambda, CloudWatch, Cloud9, API Gateway, and other AWS resources
- Executed periodic data loading jobs using SQL and Pentaho to load data into databases
- Validated data with Oracle SQL and Excel, composed reports, and notified business analysts of the results
- Composed documentation for multiple Oracle SQL databases
- Designed and implemented monitoring processes in Python for products on AWS Lambda, CloudWatch, and DynamoDB (see the sketch after this list)
- Engaged in software development and testing using Python
- Used Git for version control
- Designed and created Splunk dashboards for monitoring and reporting
- Created and updated AutoSys and Linux jobs to transfer data between databases
- Worked with teams to consolidate database structures and documentation
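A minimal sketch of what such a Python monitoring process could look like with boto3, assuming a DynamoDB table whose item count is published as a custom CloudWatch metric (the region, namespace, and table name are hypothetical; the production jobs monitored different resources):

```python
import boto3

# Clients for the AWS services involved (region is illustrative).
cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
dynamodb = boto3.resource("dynamodb", region_name="us-west-2")


def check_table_and_report(table_name: str) -> None:
    """Publish a DynamoDB table's item count as a custom CloudWatch metric."""
    table = dynamodb.Table(table_name)
    item_count = table.item_count  # approximate; refreshed roughly every 6 hours

    cloudwatch.put_metric_data(
        Namespace="ProductMonitoring",  # hypothetical namespace
        MetricData=[{
            "MetricName": "TableItemCount",
            "Dimensions": [{"Name": "TableName", "Value": table_name}],
            "Value": float(item_count),
            "Unit": "Count",
        }],
    )


if __name__ == "__main__":
    check_table_and_report("products")  # hypothetical table name
```

A CloudWatch alarm on the custom metric can then notify the team when the count drifts out of its expected range.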
Data Analyst
Confidential
Responsibilities:
- Managed MySQL database administration and internal and external MySQL database security
- Performed ETL (extract, transform and load) on unstructured NoSQL data
- Responsible for data cleaning and filling in missing values with Python packages (NumPy, pandas)
- Engaged in data plotting using Python packages (matplotlib, seaborn, Plotly)
- Defined, executed, and interpreted simple to complex SQL queries involving joins, window functions, and subqueries for diverse business requirements
- Designed new REST APIs with Flask on the company's Python-based infrastructure (see the sketch after this list)
- Wrote Python code to improve the infrastructure and make API access fast, easy, and reliable
- Conducted data visualization using Tableau in collaboration with marketing team and product team
- Plotted geographical data, average revenue, and average spend, and designed new features in Tableau
- Predicted business outcomes such as order volume by splitting data into training and testing sets for a regression model
- Worked on Hadoop, Hive, Pig, and Spark to manage data processing and storage for big data applications running in clustered systems
- Demonstrated results using charts in Microsoft Excel
- Used Git for version control
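A minimal sketch of the Flask REST style described above, with an in-memory store standing in for the company's actual infrastructure (routes and data are illustrative):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Illustrative in-memory store; the real service sat on the
# company's Python infrastructure and proper databases.
ORDERS = {1: {"customer": "alice", "amount": 120.0}}


@app.route("/orders/<int:order_id>", methods=["GET"])
def get_order(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(order)


@app.route("/orders", methods=["POST"])
def create_order():
    payload = request.get_json()
    order_id = max(ORDERS, default=0) + 1
    ORDERS[order_id] = payload
    return jsonify({"id": order_id}), 201


if __name__ == "__main__":
    app.run(debug=True)  # development server only
```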
Data Analyst
Confidential | Boston, MA
Responsibilities:
- Analyzed user preferences and order information by using Python Pandas library with aggregate functions and lambda functions
- Applied Python (pandasql) to find dormant one-time buyers and developed a marketing program targeting them with incentives to purchase again
- Used Google BigQuery to query datasets (transactions, accounts) from Google Cloud for further analysis, using complex SQL: subqueries, joins, and grouping/aggregate functions
- Benchmarked alternative SQL formulations of the same query to find the best-performing version
- Clustered geographical data, average revenue, and average spend, and designed new dashboards in Tableau
- Engaged in deploying applications on AWS EC2
- Implemented object and bucket operations as well as multipart copy and upload functions on AWS S3 (see the sketch after this list)
- Deployed the storage driver on Docker registry
- Integrated with VMware Harbor for better image management and higher security
- Worked with an Agile team and Waterfall procedures to deliver each part of the project within a short period
- Communicated with the client and composed an instruction manual covering installation, configuration, and usage
- Wrote shell scripts for Linux systems
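A minimal sketch of those S3 bucket and object operations with boto3, forcing multipart behavior through a transfer config (the bucket name, keys, and thresholds are illustrative):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart uploads/copies for anything over 8 MB (thresholds illustrative).
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)

BUCKET = "example-bucket"  # hypothetical bucket name

# Bucket operation: list all buckets in the account.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

# Object operation: upload; boto3 splits files above the threshold into parts.
s3.upload_file("local_data.csv", BUCKET, "raw/data.csv", Config=config)

# Server-side copy; large copies are likewise broken into multipart parts.
s3.copy({"Bucket": BUCKET, "Key": "raw/data.csv"},
        BUCKET, "backup/data.csv", Config=config)
```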
Data Analyst
Confidential
Responsibilities:
- Gathered and organized 7 GB of speech data with Python
- Explored data and plotted features using Python packages like LibROSA and Matplotlib
- Built and trained a Chinese speech-to-text model based on Convolutional Neural Networks (CNNs) and TensorFlow
- Investigated 4 existing speech recognition APIs and compared their performance using Python packages
- Applied the Python package Matplotlib and Google Data Studio to visualize the datasets (histograms, cumulative-sum charts, Q-Q plots, pie charts)
- Created and manipulated tables and cleaned data with Apache Spark on Databricks Cloud
- Collected and prepared 10 GB of data from the NYTimes Archive API using Beautiful Soup
- Analyzed readers’ preferences for 3 topics using Python NLTK
- Extracted articles mentioning presidential candidates from a JSON-format dataset using the Python glob package to compare their popularity
- Performed sentiment analysis on selected article reviews with Python to predict the presidential election result (see the sketch after this list)
- Generated area plot and scatter plot using Matplotlib
- Combined MySQL and Tableau to create and modify Tableau worksheets with table-level calculations
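A minimal sketch of that sentiment step with NLTK's VADER analyzer; the snippets below stand in for article text pulled from the NYTimes Archive API (the real pipeline read it from JSON files):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# The VADER lexicon must be downloaded once before first use.
nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# Illustrative snippets standing in for article text.
articles = [
    "The candidate delivered an inspiring, well-received speech.",
    "Critics called the proposal confusing and deeply unpopular.",
]

for text in articles:
    scores = sia.polarity_scores(text)  # neg/neu/pos plus compound in [-1, 1]
    label = "positive" if scores["compound"] >= 0 else "negative"
    print(f"{label:8s} {scores['compound']:+.3f}  {text}")
```

Aggregating compound scores per candidate over time yields the popularity comparison described above.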