Data Engineer Resume
San Ramon, CA
SUMMARY
- 5 years of experience writing SQL queries for data manipulation using window functions and subqueries (see the sketch after this list)
- Experienced in designing and maintaining relational databases using MySQL, MS SQL Server and PostgreSQL
- Experienced in handling and manipulating NoSQL databases using MongoDB
- Experienced in data warehouse and ETL (extract, transform and load) technologies
- Experienced in web scraping with Python using Beautiful Soup and in parsing JSON and XML
- Experienced in data cleaning using Python packages like NumPy and pandas
- Experienced in data visualization with Python libraries (matplotlib, Seaborn, Plotly, Bokeh)
- Experienced in natural language analysis with Python NLTK and TextBlob libraries
- Experienced in AWS Lambda, CloudWatch, DynamoDB, Cloud9 and API Gateway
- Experienced in creating dashboards and delivering business insights using Tableau
- Extensive knowledge of data mining and analysis with R
- Solid at applying Machine Learning and Deep Learning models, such as Linear Regression, Logistic Regression, SVM (Support Vector Machine), Random Forest, Gradient Boosting, CNN (Convolutional Neural Networks), RNN (Recurrent Neural Networks) and KNN (K-Nearest Neighbors)
- Adept knowledge of big data tools like Hadoop (HDFS, Hive, MapReduce) and Spark (SparkSQL, Spark MLlib)
- Extensive knowledge of A/B Testing
- Working knowledge of object and bucket operations on Amazon Web Services S3 and instance operations on Amazon Web Services EC2
- Extensive knowledge of Google Cloud Platform (GCP)
- Experienced in creating and manipulating tables and cleaning data on Databricks Cloud
- Working knowledge of deploying applications, managing images and containers, and configuring Docker registries
- Working experience developing cloud storage applications in Go
- Excellent understanding of SDLC (systems development life cycle), Agile and Waterfall
- Extensive experience with the version control tool Git
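A minimal sketch of the window-function-plus-subquery SQL style referenced above, run through Python's built-in sqlite3 module (SQLite 3.25+ supports window functions; the table, columns, and data are illustrative, not from any actual project):

```python
import sqlite3

# In-memory database with illustrative order data (all names hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'alice', 120.0), (2, 'alice', 80.0),
        (3, 'bob',   200.0), (4, 'bob',   50.0);
""")

# The window function ranks each customer's orders by amount; the outer
# query wraps it as a subquery and keeps only each customer's largest order.
query = """
    SELECT order_id, customer, amount
    FROM (
        SELECT order_id, customer, amount,
               RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
        FROM orders
    )
    WHERE rnk = 1;
"""
for row in conn.execute(query):
    print(row)  # -> (1, 'alice', 120.0) and (3, 'bob', 200.0)
conn.close()
```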
TECHNICAL SKILLS
Programming Languages: Python, SQL, Go, R, Java
Database tools: MySQL, Oracle SQL, SQL Server, PostgreSQL, MongoDB (NoSQL)
Python libraries & big data tools: NumPy, pandas, NLTK, Matplotlib, Seaborn, PyTorch, TensorFlow, scikit-learn, Spark, Hadoop, Hive, Pig, AWS S3, Google Cloud Platform, Databricks, Docker
Data visualization & BI tools: Tableau, Microsoft Excel, Power BI, Pentaho, Splunk
Operating systems: macOS, Windows, Unix/Linux
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential | San Ramon, CA
Responsibilities:
- Worked with AWS Lambda, CloudWatch, Cloud9, API Gateway, and other AWS resources
- Executed periodic data loading jobs using SQL and Pentaho to load data into databases
- Validated data with Oracle SQL and Excel, composed reports, and notified business analysts of the results
- Composed documentation for multiple Oracle SQL databases
- Designed and implemented monitoring processes in Python for products on AWS Lambda, CloudWatch, and DynamoDB (see the sketch after this list)
- Engaged in software development and testing using Python
- Used Git for version control
- Designed and created Splunk dashboards for monitoring and reporting
- Created and updated AutoSys and Linux jobs to transfer data between databases
- Worked with teams to consolidate database structures and documentation
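A minimal sketch of what such a Python monitoring process could look like with boto3, assuming a DynamoDB table whose item count is published as a custom CloudWatch metric (the region, namespace, and table name are hypothetical; the production jobs monitored different resources):

```python
import boto3

# Clients for the AWS services involved (region is illustrative).
cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
dynamodb = boto3.resource("dynamodb", region_name="us-west-2")


def check_table_and_report(table_name: str) -> None:
    """Publish a DynamoDB table's item count as a custom CloudWatch metric."""
    table = dynamodb.Table(table_name)
    item_count = table.item_count  # approximate; refreshed roughly every 6 hours

    cloudwatch.put_metric_data(
        Namespace="ProductMonitoring",  # hypothetical namespace
        MetricData=[{
            "MetricName": "TableItemCount",
            "Dimensions": [{"Name": "TableName", "Value": table_name}],
            "Value": float(item_count),
            "Unit": "Count",
        }],
    )


if __name__ == "__main__":
    check_table_and_report("products")  # hypothetical table name
```

A CloudWatch alarm on the custom metric can then notify the team when the count drifts out of its expected range.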
Data Analyst
Confidential
Responsibilities:
- Managed MySQL database administration and internal and external MySQL database security
- Performed ETL (extract, transform and load) on unstructured NoSQL data
- Responsible for data cleaning and filling in missing values with Python packages (NumPy, pandas)
- Engaged in data plotting using Python packages (matplotlib, seaborn, Plotly)
- Defined, executed, and interpreted simple to complex SQL queries involving joins, window functions, and subqueries for diverse business requirements
- Designed new REST APIs with Flask on the company's Python-based infrastructure (see the sketch after this list)
- Wrote Python code to improve the infrastructure and make API access fast, easy, and reliable
- Conducted data visualization using Tableau in collaboration with marketing team and product team
- Plotted geographical data, average revenue, and average spend, and designed new features in Tableau
- Predicted business outcomes such as order volume by splitting data into training and testing sets for a regression model
- Worked on Hadoop, Hive, Pig, and Spark to manage data processing and storage for big data applications running in clustered systems
- Demonstrated results using charts in Microsoft Excel
- Used Git for version control
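A minimal sketch of the Flask REST style described above, with an in-memory store standing in for the company's actual infrastructure (routes and data are illustrative):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Illustrative in-memory store; the real service sat on the
# company's Python infrastructure and proper databases.
ORDERS = {1: {"customer": "alice", "amount": 120.0}}


@app.route("/orders/<int:order_id>", methods=["GET"])
def get_order(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(order)


@app.route("/orders", methods=["POST"])
def create_order():
    payload = request.get_json()
    order_id = max(ORDERS, default=0) + 1
    ORDERS[order_id] = payload
    return jsonify({"id": order_id}), 201


if __name__ == "__main__":
    app.run(debug=True)  # development server only
```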
Data Analyst
Confidential | Boston, MA
Responsibilities:
- Analyzed user preferences and order information by using Python Pandas library with aggregate functions and lambda functions
- Applied Python (pandasql) to find dormant one-time buyers and developed a marketing program targeting them with incentives to purchase again
- Used Google BigQuery to query datasets (transactions, accounts) from Google Cloud for further analysis, using complex SQL: subqueries, joins, and grouping/aggregate functions
- Benchmarked alternative SQL formulations of the same query to find the best-performing version
- Clustered geographical data, average revenue, and average spend, and designed new dashboards in Tableau
- Engaged in deploying applications on AWS EC2
- Implemented object and bucket operations as well as multipart copy and upload functions on AWS S3 (see the sketch after this list)
- Deployed the storage driver on Docker registry
- Integrated with VMware Harbor for better image management and higher security
- Worked with an Agile team and Waterfall procedures to deliver each part of the project within a short period
- Communicated with the client and composed an instruction manual covering installation, configuration, and usage
- Wrote shell scripts for Linux systems
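A minimal sketch of those S3 bucket and object operations with boto3, forcing multipart behavior through a transfer config (the bucket name, keys, and thresholds are illustrative):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart uploads/copies for anything over 8 MB (thresholds illustrative).
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)

BUCKET = "example-bucket"  # hypothetical bucket name

# Bucket operation: list all buckets in the account.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

# Object operation: upload; boto3 splits files above the threshold into parts.
s3.upload_file("local_data.csv", BUCKET, "raw/data.csv", Config=config)

# Server-side copy; large copies are likewise broken into multipart parts.
s3.copy({"Bucket": BUCKET, "Key": "raw/data.csv"},
        BUCKET, "backup/data.csv", Config=config)
```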
Data Analyst
Confidential
Responsibilities:
- Gathered and organized 7 GB of speech data with Python
- Explored data and plotted features using Python packages like LibROSA and Matplotlib
- Built and trained a Chinese speech-to-text model based on Convolutional Neural Networks (CNNs) and TensorFlow
- Investigated 4 existing speech recognition APIs and compared their performance using Python packages
- Applied the Python package Matplotlib and Google Data Studio to visualize the datasets (histograms, cumulative-sum charts, Q-Q plots, pie charts)
- Created and manipulated tables and cleaned data with Apache Spark on Databricks Cloud
- Collected and prepared 10 GB of data from the NYTimes Archive API using Beautiful Soup
- Analyzed readers’ preferences for 3 topics using Python NLTK
- Extracted articles mentioning presidential candidates from a JSON-format dataset using the Python glob package to compare their popularity
- Performed sentiment analysis on selected article reviews with Python to predict the presidential election result (see the sketch after this list)
- Generated area plot and scatter plot using Matplotlib
- Combined MySQL and Tableau to create and modify Tableau worksheets with table-level calculations
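A minimal sketch of that sentiment step with NLTK's VADER analyzer; the snippets below stand in for article text pulled from the NYTimes Archive API (the real pipeline read it from JSON files):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# The VADER lexicon must be downloaded once before first use.
nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# Illustrative snippets standing in for article text.
articles = [
    "The candidate delivered an inspiring, well-received speech.",
    "Critics called the proposal confusing and deeply unpopular.",
]

for text in articles:
    scores = sia.polarity_scores(text)  # neg/neu/pos plus compound in [-1, 1]
    label = "positive" if scores["compound"] >= 0 else "negative"
    print(f"{label:8s} {scores['compound']:+.3f}  {text}")
```

Aggregating compound scores per candidate over time yields the popularity comparison described above.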