Data Scientist Resume Dallas, TX. - Hire IT People

PROFESSIONAL SUMMARY:

Over 7 years of professional experience: about 4 years as a data scientist and 3+ years as a data analyst, performing statistical analysis using Python, SQL, and related tools.
Strong Experience in Data Analysis, Data Cleaning, Data Migration, Data Integration and Data Conversion.
Applied machine learning to analyze real-world datasets, validating findings through testing, feature selection, and algorithm tuning for maximum performance.
Deep learning experience using various Python-based frameworks.
Experience with Python statistical analysis packages and A/B testing; develop, validate, evaluate, deploy, and optimize modeling techniques and algorithms that support many aspects of the business.
Experience in Statistical Modeling, Data Mining and Data Visualization.
Worked on performance tuning and query optimization techniques in transactional and data warehouse environments.
Effective team player with good oral and written communication skills.
Expert programming skills with knowledge of data structures; applied various optimization techniques in C++ and Python.
Participated in code reviews with managers and team leads to ensure modifications adhered to established standards and to simplify the development process.
Applied infrastructure tools such as Docker, along with scripting languages, for model deployment and management.
Good experience with AWS EC2 and the AWS Management Console for monitoring system load during data processing to drive process improvements.
Hands-on experience writing SQL queries to extract, transform, and load (ETL) data from large datasets using data staging.

PROFESSIONAL EXPERIENCE:

Confidential

Data Scientist

Responsibilities:

Created statistical models, both distributed and standalone, to build diagnostic, predictive, and prescriptive solutions.
Translated detailed logical flow charts into object-oriented Python code.
Prototyped algorithms for various products using supervised machine learning, informed by data analysis and simulations.
Developed Python APIs on Linux, within a multithreaded framework, to process tens of TB of data.
Re-architected the neural network pipeline as IaaS on the AWS cloud to accelerate model building on multiple GPUs and to integrate all application components.
Handled client-facing operations, including product requirement gathering and identification of development goals.
Used business intelligence and data visualization tools to simplify decision making.
Performed data cleaning with Pandas and NumPy to ensure data quality, consistency, and integrity.
Provided input and recommendations on technical issues to BI engineers, business and data analysts, and data scientists.
Dockerized the model-building process on AWS EC2 to ease model deployment, using data connectors to feed data into the container.

Environment: Machine learning, Neural Networks, AWS, EC2, DigitalOcean, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Git, JIRA, NLP.

Confidential

Data Scientist

Responsibilities:

Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
Applied CNNs to classify images using torch (PyTorch), the Python library, on AWS.
Developed algorithms that use NVIDIA GPUs on AWS to optimize and scale up the model-building process.
Used Jupyter Notebooks for rapid prototyping and quick comparison of approaches.
Developed and containerized neural network APIs for easy deployment of CNN models on AWS EKS and EC2.
Performed data transformations: normalization, standardization, and aggregation.
Designed dashboards with various tools for complex reports, including summaries, charts, and graphs, to present findings to the team and stakeholders.
Worked with the ETL and BI teams to understand and support various ongoing projects.
Generated weekly and monthly reports for business users according to business requirements.
Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing; performed data imputation using various methods in scikit-learn.
Organized reports and an app demo, and produced rich Matplotlib visualizations that present the data in human-readable form, showing the client how predictions can help the business.
Used F-score, AUC/ROC, confusion matrices, precision, and recall to evaluate and compare model performance.
Performed analysis, auditing, forecasting, programming, research, report generation, and software integration, developing an expert understanding of the current end-to-end BI platform architecture to support the deployed solution.

Environment: Machine learning, AWS, EC2, ELB, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Git, JIRA.

Confidential

Data Scientist

Responsibilities:

Developed a web-based GUI using Python and Django to dynamically display test block documentation and other features of Python code in a browser.
Regularly used JIRA and other internal issue trackers for project development.
Coordinated across teams on simulation workflow injection and data quality checks for feeds from several sources.
Prepared data reports using Word, charts, graphs, and other visualizations to present findings.
Retrieved data from the database using SQL and performed analysis enhancements.
Addressed overfitting and underfitting by tuning algorithm hyperparameters and applying L1 and L2 regularization.
Created multiple visualization reports and dashboards using dual-axis charts, histograms, filled maps, bubble charts, bar charts, line charts, tree maps, box-and-whisker plots, stacked bar charts, and more.
Developed multi-tiered ETL pipelines to generate hundreds of TBs of simulation data and load it into Confidential central databases.
Performed event analysis to classify fake tracks in several TB of data using a variety of ML/AI techniques, achieving good precision scores.

Environment: Confidential, HDFS, Linux, Python (3.x, 2.x), R, SQL, MongoDB

Confidential

Data Analyst

Responsibilities:

Developed anomaly detection methodologies using various ML-based techniques to identify the feature size.
Designed and automated a forecasting model in the domain of NLP, with a reported score of 92%.
Reduced computational overhead and noise by trimming precision to the extent that insights remained meaningful.
Integrated data from multiple databases, data cubes, and files.
Developed ad hoc tests and scripts within existing frameworks for data validation and for identifying trends and buy-sell opportunities.
Ran complex, multi-day simulations on a computing cloud for parameter-space scanning.

Environment: Tableau, Linux, SQL, SQL Connectors, Python, Git, JavaScript

Confidential, Dallas, TX.

Data Analyst

Responsibilities:

Implemented adaptive pricing to reduce the effort required for A/B testing across different markets.
Used data profiling to answer business questions and provide insights to business users.
Documented process workflows such as implementation, integration, and reporting services.
Wrote Bash scripts to automate tests and tasks for various services.
Worked with terabyte-scale datasets for data association pairing and extracted meaning from the results.
Involved in test data preparation and reporting.
Developed tools to transform data from different formats such as TSV, JSON, and CSV.

Environment: Linux, Bash, SQL, SQL Connectors, Python, HTML, JSON, CSS.

TECHNICAL SKILLS:

Programming Languages: Python, C, C++, Bash, Go, JavaScript

Packages: NumPy, SciPy, Pandas, Matplotlib, scikit-learn, seaborn, ROOT

Operating Systems: Linux/Unix, Windows

Databases: Relational (MySQL, PostgreSQL), NoSQL (MongoDB, Hive), Cache (Redis)

Modeling techniques: Predictive modeling, linear regression, logistic regression, cluster analysis

Machine Learning/Artificial Intelligence: Naïve Bayes, Decision Trees, Regression models, random forests, K-means clustering, Market Basket Analysis, Time-series, SVM, Preprocessing

AI Techniques: Natural Language Processing, Convolutional Neural Networks

Version Control/Issue Tracking: Git, JIRA, GGUS

Development Tools: Jupyter Notebook
