Data Analyst Resume
SUMMARY
Data analyst with a diverse background in data engineering, statistical modeling, software development, and project management.
TECHNICAL SKILLS
- Bayesian Hierarchical Modeling
- Generalized/Linear Mixed Modeling
- AWS (Lambda, EC2, S3, Spot Instances)
- R, Python, Bash, Linux, SQL, PL/SQL
- Web scraping (R, Python, Selenium)
- Agile, Git (version control), Docker
- GIS/QGIS
- Cluster Computing
- Stochastic & Numerical Simulations
- PostgreSQL, MS SQL
PROFESSIONAL EXPERIENCE
Data Analyst
Confidential - Houston, TX
- Prototyped mixed-model regressions to improve short- and long-term forecasting methodologies.
- Collaborated with domain experts to develop algorithms that perform spatial analyses quantifying oil and gas risking metrics.
- Developed a data pipeline (Python, Selenium) to aggregate time-series data on oil rigs for inferring movement patterns, drilling efficiencies, and utilization rates.
- Aligned separate data sources using fuzzy-matching methods (set/sort tokenization) to couple oil-well production data with time-series data on rig locations, enabling inference of individual rig production rates (see the matching sketch after this list).
- Developed API endpoints to isolate logic for data transformations using Python and Flask.
- Currently migrating the data pipeline to a serverless architecture on S3, AWS Lambda, and AWS Fargate (writing CloudFormation templates for resources and functions, deploying Lambda code, and building containers and Bash scripts to manage long-running processes).
- Established ETL processes for aggregating internal data sources (e.g., migrating data out of databases, scraping internal workbooks, and parsing XML data archives).
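
A minimal sketch of the fuzzy-matching step referenced above, assuming two hypothetical pandas DataFrames with illustrative column names ("well_name", "rig_well_name"); the rapidfuzz library and the 90-point threshold are stand-ins, not necessarily the project's actual stack.

```python
# Hypothetical sketch: couple well production records to rig time series by
# fuzzy name matching with token-sort/token-set scorers (rapidfuzz).
import pandas as pd
from rapidfuzz import fuzz, utils

def best_match(name, candidates, threshold=90):
    """Return the candidate most similar to `name` under token-based scoring,
    or None if nothing clears the (illustrative) threshold."""
    scored = [
        (cand, max(fuzz.token_sort_ratio(name, cand, processor=utils.default_process),
                   fuzz.token_set_ratio(name, cand, processor=utils.default_process)))
        for cand in candidates
    ]
    cand, score = max(scored, key=lambda pair: pair[1])
    return cand if score >= threshold else None

# Placeholder data; the real inputs were well-production and rig-location tables.
production = pd.DataFrame({"well_name": ["Smith #1-H", "Jones Unit 2"]})
rigs = pd.DataFrame({"rig_well_name": ["SMITH 1H", "JONES UNIT #2", "Doe 3"]})

# Attach the best-matching rig key to each production record.
production["rig_key"] = production["well_name"].apply(
    lambda n: best_match(n, rigs["rig_well_name"].tolist())
)
print(production)
```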
Data Analyst
Confidential - Houston, TX
- Developed protocols and methods to sort, identify, and compile 150 disparate time-series datasets (up to 30 years in length) into a PostgreSQL database.
- Designed, maintained, and updated database records and schemas, and handled database migrations on local and remote servers (Linux and AWS).
- Developed ETL tools with graphical interfaces to transform raw data into a normalized, standard format (using Python, Pandas, SQLAlchemy, and PyQt; see the ETL sketch after this list).
- Automated processes for quality assurance checks, implemented prototypes, tested code, assessed product performance, and updated products as necessary.
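
A minimal sketch of the Pandas/SQLAlchemy normalization-and-load step referenced above; the file name, column names, table name, and connection string are hypothetical placeholders for the project's actual sources.

```python
# Hypothetical sketch: transform a raw time-series extract into a normalized,
# standard format and load it into PostgreSQL.
import pandas as pd
from sqlalchemy import create_engine

def normalize(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, parse timestamps, coerce readings to numeric,
    and drop rows that fail to parse."""
    df = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df.dropna(subset=["timestamp", "value"])

engine = create_engine("postgresql://user:pass@localhost/timeseries")  # placeholder
raw = pd.read_csv("raw_extract.csv")  # placeholder raw input
normalize(raw).to_sql("readings", engine, if_exists="append", index=False)
```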
Graduate Student (National Science Foundation Fellow)
Confidential - Houston, TX
- Parallel-processed 10 GB of data to numerically analyze a theoretical model of species extinction and persistence.
- Used power analyses to determine experimental sample size requirements.
- Developed novel hierarchical Bayesian models (linear and logistic regression) for analyses.
- Used stochastic simulations to assess the accuracy and precision of novel statistical methods (see the simulation sketch after this list).
- Developed methods to visualize complex conditions of theoretical models and results.
- Presented work orally at local and national conferences and in written manuscripts.
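
A minimal sketch of the simulation approach referenced above: repeatedly draw synthetic data with a known parameter, re-estimate it, and summarize bias (accuracy) and spread (precision). The Poisson model, sample size, and seed are illustrative stand-ins, not the models from the research.

```python
# Hypothetical sketch: Monte Carlo check of an estimator's accuracy/precision.
import numpy as np

rng = np.random.default_rng(42)
true_rate, n_obs, n_sims = 3.0, 50, 10_000

# Re-estimate the rate from each of n_sims simulated datasets.
estimates = np.array([
    rng.poisson(true_rate, size=n_obs).mean() for _ in range(n_sims)
])

bias = estimates.mean() - true_rate   # accuracy: systematic error
spread = estimates.std(ddof=1)        # precision: sampling variability
print(f"bias = {bias:+.4f}, sd = {spread:.4f}")
```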