Data Engineer And Analyst Resume

Raleigh, NC


Data Science/Data Engineering professional with experience at Confidential and the U.S. EPA, skilled in Data Analysis, Statistics, Operations Research, Big Data, Data Warehousing, and ETL | Traits: adaptable, self-reliant, team player


Data Formatting, Cleaning, and Processing: Python Pandas, SciPy, NumPy, SQL, MongoDB, SAS, R, Apache Storm

Optimization and Simulation: IBM ILOG CPLEX, GAMS, Lingo, Simio

Web Scraping: Requests, Scrapy, Selenium, ChromeDriver, PhantomJS, Web browser simulation, curl, Postman

Data Structures and Formats: JSON, CSV, XML, HTML, PDF, Plain Text, xlsx

Parsing Technologies: XPath, regex, PDFMiner, PDF APIs like PDFTables, XQuery, IBM Content Analytics, GATE

Data Storage: AWS (S3 REST API), Boto3, FTP, Vertica, SQL Server, DataGrip, RoboMongo

Programming Languages and Data Visualization: Python, Java, Tableau, Jupyter Dashboards, Python matplotlib


Confidential, Raleigh, NC

Data Engineer and Analyst


  • Communicated data stories to data scientists, financial analysts, and portfolio managers using data visualization in Tableau and matplotlib, helping them understand their business problems and leading them to use data through insights and recommendations. This work also required implementing data modelling.
  • Utilized SQL Server, MongoDB, and Vertica (for terabytes of data) to store and query data in analytical pipelines
  • Designed and implemented robust data ingestion and ETL solutions for problems in Healthcare, Banking and Mortgage Institutions, Oil and Gas, Transportation, Entertainment, E-commerce, and Sports fields
  • Developed more than 35 web scrapers in Python and Java to run on an hourly, daily, weekly, or monthly basis, providing data scientists, analysts, and portfolio managers with the highest quality (recent, complete, and correct) data
  • Employed Tableau, Python Pandas, and NumPy to find outliers and data stories in the data
  • Communicated with more than 6 sector data analysts and data scientists from different business groups to understand their data problems and devise appropriate data modelling and data quality assurance solutions
  • Resolved data quality issues such as timeliness, completeness, and correctness in a team framework by parsing Python and Linux logs for display in an internally developed visualization tool

Confidential, Upper Gwynedd, PA

Data Scientist and Analyst


  • Applied machine learning techniques in Python using SciPy, Pandas, NumPy, and matplotlib
  • Used Python scikit-learn, text mining, and text analytics tools (GATE) for feature extraction and entity resolution
  • Implemented data analytics for optimization and planning problems
  • Designed and implemented optimization models for representative planning and sample allocation problems

U.S. Environmental Protection Agency, Research Triangle Park, NC, USA

Software Developer, Summer Intern (2015)

  • Defined a problem in a database management system setting and solved it using Java, JavaFX, and a DBMS

Quantitative Research


  • Designed, developed, and implemented personalized product-ranking software in Java and GATE using sentiment analysis and opinion mining to extract opinions from a database of customers’ online reviews
  • Developed a recommendation system in Java and MongoDB for data visualization software (Tableau)
  • Designed and developed an optimization package integrating Java and IBM ILOG CPLEX

