Data Scientist Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- 7+ years of professional experience: about 4 years as a data scientist and 3+ years as a data analyst, performing statistical analysis using Python, SQL, etc.
- Strong Experience in Data Analysis, Data Cleaning, Data Migration, Data Integration and Data Conversion.
- Machine learning skills to analyze real-world datasets, validate findings through testing, and apply feature selection and algorithm tuning for maximum performance.
- Deep learning experience using various Python-based frameworks.
- Experience with statistical analysis packages (Python) and A/B Testing; develop, validate, evaluate, deploy, and optimize modelling techniques/algorithms that support many aspects of the business.
- Experience in Statistical Modeling, Data Mining and Data Visualization.
- Worked on performance tuning and query optimization techniques in transactional and data warehouse environments.
- Effective team player with good oral and written communication skills.
- Expert programming skills with knowledge of data structures; worked on various optimization techniques in C++, Python, etc.
- Participated in code reviews with managers and team leads to ensure modifications adhere to established standards and to simplify the development process.
- Applied infrastructure tools such as Docker, along with scripting languages, for model deployment and management.
- Good experience with AWS tools for EC2 and the management console, used to understand system load during data processing and identify process improvements.
- Hands-on experience in writing SQL queries to extract, transform, and load (ETL) data from large datasets using data staging.
PROFESSIONAL EXPERIENCE:
Confidential
Data Scientist
Responsibilities:
- Created statistical models, both distributed and standalone, to build various diagnostic, predictive, and prescriptive solutions.
- Analyzed detailed logical flow charts and translated them into object-oriented Python code.
- Prototyped algorithms for various products using supervised machine learning, based on data analysis and simulations.
- Developed APIs in Python on the Linux platform to process tens of TB of data in a multithreaded framework.
- Rearchitected the neural network pipeline as IaaS on the AWS cloud to accelerate model building on multiple GPUs and to integrate all the application components.
- Handled client-facing operations, including gathering product requirements and identifying development goals.
- Used business intelligence and data visualization tools to simplify decision making.
- Cleaned data using Pandas and NumPy to ensure data quality, consistency, and integrity.
- Provided input and recommendations on technical issues to BI engineers, business and data analysts, and data scientists.
- Dockerized the model building process on AWS EC2 to ease model deployment, using data connectors to feed data into the container.
Environment: Machine learning, Neural Networks, AWS, EC2, DigitalOcean, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Git, JIRA, NLP.
Confidential
Data Scientist
Responsibilities:
- Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
- Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to extract data that fit the analytical requirements.
- Applied CNNs to identify images using torch, a Python library, on AWS.
- Developed algorithms that use NVIDIA GPUs on AWS to optimize and scale up the model building process.
- Developed in Jupyter Notebooks for quick comparison and prototyping.
- Developed and containerized a neural network API for easy deployment of CNN models on AWS EKS and EC2.
- Data transformation: normalization, standardization, and aggregation.
- Designed dashboards with various tools for complex reports, including summaries, charts, and graphs, to present findings to the team and stakeholders.
- Interacted with the ETL and BI teams to understand and support various ongoing projects.
- Generated weekly and monthly reports for various business users according to the business requirements.
- Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing; performed data imputation using various methods in the Scikit-learn package in Python.
- Organized reports and an app demo, and produced rich data visualizations with Matplotlib that put data into human-readable form to show the client how prediction can help the business.
- Used F-score, AUC/ROC, confusion matrix, precision, and recall to evaluate the performance of different models.
- Performed analysis, auditing, forecasting, programming, research, report generation, and software integration, building an expert understanding of the current end-to-end BI platform architecture to support the deployed solution.
Environment: Machine learning, AWS, EC2, ELB, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Git, JIRA.
Confidential
Data Scientist
Responsibilities:
- Developed a GUI using Python and Django to dynamically display the test block documentation and other features of Python code in a web browser.
- Regularly used JIRA and other internal issue trackers for project development.
- Coordinated across teams on simulation workflow injection and data quality checks via feeds from several sources.
- Reported data using Word, charts, graphs, and other visualizations to present findings.
- Responsible for retrieving data from the database using SQL and performing analysis enhancements.
- Addressed overfitting and underfitting by tuning the algorithm's hyperparameters and by using L1 and L2 regularization.
- Created multiple visualization reports and dashboards using dual-axis charts, histograms, filled maps, bubble charts, bar charts, line charts, tree maps, box-and-whisker plots, stacked bars, etc.
- Developed multi-tiered ETL pipeline feeds for hundreds of TBs of simulation data generation and deployment into Confidential central databases.
- Performed event analysis to classify fake tracks from several TB of data using a variety of ML/AI techniques with good precision scores.
Environment: Confidential, HDFS, Linux, Python (3.xy, 2.xy), R, SQL, MongoDB
Confidential
Data Analyst
Responsibilities:
- Developed anomaly detection methodologies using various ML-based techniques to identify the feature size.
- Designed and automated a forecasting model with 92% in the domain of NLP.
- Reduced computational overhead and noise by trimming precision to the extent that insights remained meaningful.
- Data integration: Integration of multiple databases, data cubes, or files.
- Developed ad hoc tests and scripts within the existing frameworks for data validation and for identifying trends and buy-sell opportunities.
- Ran complex simulations over several days on the computing cloud for parameter space scanning.
Environment: Tableau, Linux, SQL, SQL Connectors, Python, Git, JavaScript
Confidential, Dallas, TX.
Data Analyst
Responsibilities:
- Conducted adaptive pricing to reduce the effort required for A/B testing in different markets.
- Worked with data profiling to answer business questions by providing insights to business users.
- Documented process workflows such as implementation, integration, and reporting services.
- Wrote bash scripts to automate tests and tasks for various services.
- Worked with large data sets on the order of terabytes for data association pairing and extracting meaning from the results.
- Involved in test data preparation and reporting.
- Developed tools to transform data from different formats such as TSV, JSON, CSV, etc.
Environment: Linux, Bash, SQL, SQL Connectors, Python, HTML, JSON, CSS.
TECHNICAL SKILLS:
Programming Languages: Python, C, C++, Bash, Go, JavaScript
Packages: NumPy, SciPy, Pandas, Matplotlib, scikit-learn, seaborn, ROOT
Operating Systems: Linux/Unix, Windows
Databases: Relational (MySQL, PostgreSQL), NoSQL (MongoDB, Hive), Cache (Redis)
Modeling techniques: Predictive Modeling, Linear Regression, Logistic Regression, Cluster Analysis
Machine Learning/Artificial Intelligence: Naïve Bayes, Decision Trees, Regression models, random forests, K-means clustering, Market Basket Analysis, Time-series, SVM, Preprocessing
AI Techniques: Language Processing, Convolutional Neural Networks
Version Control/Issue Tracking: Git, JIRA, GGUS, Jupyter Notebook
