Data Scientist Resume

Dallas, TX

PROFESSIONAL SUMMARY:

  • 7+ years of professional experience: about 4 years as a data scientist and 3+ years as a data analyst, performing statistical analysis with Python, SQL, etc.
  • Strong Experience in Data Analysis, Data Cleaning, Data Migration, Data Integration and Data Conversion.
  • Machine learning skills to analyze real-world datasets and validate findings through testing, feature selection, and algorithm tuning for maximum performance.
  • Deep learning experience using various Python-based frameworks.
  • Experience with statistical analysis packages (Python) and A/B Testing; develop, validate, evaluate, deploy, and optimize modelling techniques/algorithms that support many aspects of the business.
  • Experience in Statistical Modeling, Data Mining and Data Visualization.
  • Worked on performance tuning and query optimization techniques in transactional and data warehouse environments.
  • Effective team player with good oral and written communication skills.
  • Expert programming skills with knowledge of data structures; worked on various optimization techniques in C++, Python, etc.
  • Participated in code reviews with managers and team leads to ensure modifications adhere to established standards and to simplify the development process.
  • Application of infrastructure tools such as Docker and scripting languages for model deployment and management.
  • Good experience with AWS tools, including EC2 and the Management Console, to understand system load during data processing and identify process improvements.
  • Hands-on experience writing SQL queries to extract, transform, and load (ETL) data from large datasets using data staging.

PROFESSIONAL EXPERIENCE:

Confidential

Data Scientist

Responsibilities:

  • Created statistical models, both distributed and standalone, to build various diagnostic, predictive, and prescriptive solutions.
  • Analyzed detailed logical flow charts and translated them into object-oriented Python.
  • Prototyped algorithms for various products using supervised machine learning, informed by data analysis and simulations.
  • Developed APIs in Python to process tens of TB of data on Linux in a multithreaded framework.
  • Rearchitected the neural network pipeline as an IaaS on the AWS cloud to accelerate model building on multiple GPUs and to integrate all application components.
  • Handled client-facing operations involving product requirement gathering and identification of development goals.
  • Used business intelligence and data visualization tools to simplify decision making.
  • Cleaned data with Pandas and NumPy to ensure quality, consistency, and integrity.
  • Provided input and recommendations on technical issues to BI Engineers, Business & Data Analysts, and Data Scientists.
  • Dockerized the model-building process on AWS EC2 to ease model deployment, using data connectors to feed data into the container.

Environment: Machine learning, Neural Networks, AWS, EC2, Digital Ocean, Linux, Python (Scikit-Learn/SciPy/Numpy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Git, JIRA, NLP.

Confidential

Data Scientist

Responsibilities:

  • Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
  • Collaborated with data engineers and the operations team to implement an ETL process; wrote and optimized SQL queries for data extraction to fit the analytical requirements.
  • Applied CNNs to image identification using the torch Python library on AWS.
  • Developed algorithms that use NVIDIA GPUs on AWS to optimize and scale up the model-building process.
  • Developed in Jupyter Notebooks for quick comparison and prototyping.
  • Developed and containerized a neural network API for easy deployment of CNN models on AWS EKS and EC2.
  • Performed data transformation: normalization, standardization, and aggregation.
  • Designed dashboards with various tools for complex reports, including summaries, charts, and graphs, to communicate findings to the team and stakeholders.
  • Worked with the ETL and BI teams to understand and support various ongoing projects.
  • Generated weekly and monthly reports for various business users according to business requirements.
  • Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing; performed data imputation using various methods from the scikit-learn package in Python.
  • Organized reports and an app demo, and produced rich Matplotlib visualizations that present data in human-readable form to show the client how prediction can help the business.
  • Used F-score, AUC/ROC, confusion matrix, precision, and recall to evaluate the performance of different models.
  • Performed analysis, auditing, forecasting, programming, research, report generation, and software integration to build an expert understanding of the current end-to-end BI platform architecture in support of the deployed solution.

Environment: Machine learning, AWS, EC2, ELB, Linux, Python (Scikit-Learn/SciPy/Numpy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Git, JIRA.

Confidential

Data Scientist

Responsibilities:

  • Developed a GUI using Python and Django to dynamically display test block documentation and other features of Python code in a web browser.
  • Regularly used JIRA and other internal issue trackers for project development.
  • Coordinated across teams on simulation workflow injection and data quality checks via feeds from several sources.
  • Reported data using Word, charts, graphs, and other visualizations to present findings.
  • Retrieved data from the database using SQL and performed analysis enhancements.
  • Addressed overfitting and underfitting by tuning the algorithm's hyperparameters and by using L1 and L2 regularization.
  • Created multiple visualization reports and dashboards using Dual Axis charts, Histograms, Filled Maps, Bubble charts, Bar charts, Line charts, Tree Maps, Box and Whisker Plots, Stacked Bars, etc.
  • Developed multi-tiered ETL pipeline feeds for generating hundreds of TB of simulation data and deploying it into Confidential central databases.
  • Performed event analysis to classify fake tracks based on several TB of data using a variety of ML/AI techniques with good precision scores.

Environment: Confidential, HDFS, Linux, Python (3.x, 2.x), R, SQL, MongoDB

Confidential

Data Analyst

Responsibilities:

  • Developed anomaly detection methodologies using various ML-based techniques to identify the feature size.
  • Designed and automated a forecasting model with 92% in the domain of NLP.
  • Reduced computational overhead and noise by trimming precision only to the extent that insights remained meaningful.
  • Performed data integration across multiple databases, data cubes, and files.
  • Developed ad hoc tests and scripts within existing frameworks for data validation and for identifying trends and buy-sell opportunities.
  • Ran complex simulations lasting several days on the computing cloud for parameter space scanning.

Environment: Tableau, Linux, SQL, SQL Connectors, Python, Git, JavaScript

Confidential, Dallas, TX.

Data Analyst

Responsibilities:

  • Conducted adaptive pricing to reduce the effort required for A/B testing in different markets.
  • Worked with data profiling to answer business questions by providing insights to business users.
  • Documented process workflows such as implementation, integration, and reporting services.
  • Wrote Bash scripts to automate tests and tasks for various services.
  • Worked with large datasets on the order of terabytes for data association pairing and extracting meaning from the results.
  • Involved in test data preparation and reporting.
  • Developed data transformation tools for different formats such as TSV, JSON, CSV, etc.

Environment: Linux, Bash, SQL, SQL Connectors, Python, HTML, JSON, CSS.

TECHNICAL SKILLS:

Programming Languages: Python, C, C++, Bash, Go, JavaScript

Packages: NumPy, SciPy, Pandas, Matplotlib, scikit-learn, seaborn, ROOT

Operating Systems: Linux/Unix, Windows

Databases: Relational (MySQL, PostgreSQL), NoSQL (MongoDB, Hive), Cache (Redis)

Modeling techniques: Predictive Modeling, Linear Regression, Logistic Regression, Cluster Analysis

Machine Learning/Artificial Intelligence: Naïve Bayes, Decision Trees, Regression models, random forests, K-means clustering, Market Basket Analysis, Time-series, SVM, Preprocessing

AI Techniques: Language Processing, Convolutional Neural Networks

Version Control/Issue Tracking: Git, JIRA, GGUS, Jupyter Notebook