Data Scientist Resume
New York, NY
SUMMARY:
- Data scientist with 6 years of experience in transforming business requirements into actionable data models, prediction models and informative reporting solutions.
- Experience in developing business solutions and generating data-driven ideas while working in different industries and platforms.
- Expert in the entire Data Science process life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation and Visualization.
- Strong knowledge in Statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA), Sampling Distributions and Time Series Analysis.
- Proficient in Python and its libraries such as Numpy, Pandas, Scikit-learn, Matplotlib and Seaborn.
- Expert in preprocessing data in Pandas using visualization, data cleaning and engineering methods such as looking for Correlations, Imputations, Scaling and Handling Categories.
- Experience in building various machine learning models using algorithms such as Linear Regression, Gradient Descent, Support Vector Machines (SVM), Logistic Regression, KNN, Decision Tree, and Ensembles such as Random Forest, AdaBoost and Gradient Boosting Trees.
- Experience in working with Hadoop Big Data tools such as HDFS, Hive, Pig Latin and Spark.
- Experience in using Amazon Web Services (AWS) cloud services such as EC2 and S3 to work with different virtual machines.
- Experience in tuning algorithms using methods such as Grid Search, Randomized Search, K-Fold Cross Validation and Error Analysis.
- Experience in Unsupervised learning working on social network datasets using K-means Clustering and Dimension Reduction methods.
- Expertise in writing, implementing and optimizing HiveQL queries.
- Experience in building machine learning solutions using PySpark for large data sets on the Hadoop ecosystem.
- Experience in building and publishing interactive reports and dashboards in Tableau, with design customizations based on stakeholders' needs.
- Deep knowledge of SQL for writing Queries, Stored Procedures, User-Defined Functions, Views, Triggers, Indexes, etc.
- Experience in developing and designing ETL packages and reporting solutions using MS BI Suite (SSIS/SSRS) and Tableau.
- Knowledge and experience in agile environments such as Scrum and using project management tools like Jira/Confluence and version control tools such as Github/Git.
- Quick learner in new business industries and software environments, delivering the best solutions adapted to new requirements and challenges.
- Driven by delivering the best results and taking ownership of work while being an effective communicator with teammates and a constant learner in a changing technology landscape.
TECHNICAL SKILLS:
Languages: Python 2.7/3.3 - SQL (T-SQL, MySQL) - C - Java.
BI Tools: Tableau 9.x/10.x - MS Suite 2012 (SSIS/SSRS) - MS Excel
Big Data Tools: Hadoop 2 (HDFS, Hive, Pig) - Spark 2.0 (PySpark).
Python Libraries: Numpy - Pandas - Scikit-learn - Matplotlib - Seaborn.
Operating Systems: Windows 7/8/10 - Linux - Mac.
Other Tools: Git/Github - JIRA/Confluence - AWS.
ML Algorithms: Linear Regression, Logistic Regression, KNN, SVM, Decision Trees, Naïve Bayes, Stochastic Gradient Descent, Random Forest, AdaBoost, Gradient Boosting Trees, PCA, LDA, K-Means.
PROFESSIONAL EXPERIENCE:
Confidential, New York, NY
Data Scientist
Responsibilities:
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization to deliver data science solutions.
- Built a predictive model for a direct mail marketing campaign for loan pre-approvals to optimize the number of letters sent.
- Worked on fraud detection analysis of payment transactions using transaction history with supervised learning methods.
- Collected data using Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
- Worked on Amazon Web Services cloud virtual machines to do machine learning on big data.
- Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
- Implemented a Python-based distributed random forest via PySpark.
- Used Pandas, Numpy, Seaborn, Matplotlib and Scikit-learn in Python to develop various machine learning models, utilizing algorithms such as Linear Regression, Logistic Regression, Gradient Boosting, SVM and KNN.
- Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
- Used PCA and other feature engineering techniques for high dimensional datasets while maintaining the variance of the most important features.
- Created Transformation Pipelines for preprocessing large amounts of data with methods such as imputing, scaling, selecting, etc.
- Applied Ensemble methods, including different Bagging and Boosting methods, to increase model accuracy.
Technology Stack: Hadoop 2.x, HDFS, Hive, Pig Latin, PySpark, Python 3.x (Numpy, Pandas, Scikit-learn, Matplotlib), Jupyter, Github, Linux
Confidential, Chicago, IL
Data Scientist
Responsibilities:
- Designed and developed machine learning models to improve advertising agencies' programmatic strategies for optimal bidding on impression opportunities.
- Worked on unsupervised segmentation and targeting based on social network activity, finding clusters of user groups using the k-means method.
- Worked with high dimensional data sets retrieved from users, media agencies or third-party apps and used methods such as PCA, LDA and Kernel Approximations.
- Used different feature engineering methods in Python (Pandas, Numpy, Matplotlib, Seaborn) to cleanse high dimensional datasets and prepare them for modeling.
- Developed supervised classification models to predict whether users will click on certain ads, using algorithms such as Stochastic Gradient Descent (SGD), Logistic Regression, Random Forest, SVM and more.
- Analyzed and visualized different segments of users to understand their advertisement behaviors better with Tableau.
- Helped the team generate more ideas and ask new questions about the dataset to improve our accuracy and gain more insights.
- Worked in an agile environment using Jira for ticketing and Confluence for documentation.
Technology Stack: Python 3.x (Pandas, Numpy, Matplotlib, Seaborn), Jira/Confluence, Tableau 9.x, Github, Jupyter, Linux
Confidential, Chicago, IL
Data Analyst/Data Scientist
Responsibilities:
- Worked with data scientists and the research team to gain valuable insights.
- Worked with Amazon EC2 based cloud-hosted architecture systems to provide solutions for clients.
- Developed a portfolio optimization system for long strategies in Python and Scikit-learn, using stock history, budget and number of days for investment as inputs to recommend the most profitable stocks to buy.
- Used Time Series Analysis to gain insight into stocks and derivative investments based on the historical data available.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Used big data technologies such as Apache Hadoop HDFS, Hive and Pig to access and extract data.
- Applied good knowledge of the AWS environment to load data files from cloud servers.
- Preprocessed the data based on the trading strategy and created new financial features in the dataset.
- Collaborated with business leaders to analyze problems, optimize processes and build presentation dashboards.
- Created ad-hoc reports about stocks, futures, options and other investments in Tableau.
- Implemented data refreshes on Tableau Server at biweekly and monthly intervals based on business changes to ensure that the views and dashboards displayed the changed data accurately.
Technology Stack: Apache Hadoop 2, Hive, Pig Latin, Linux, SQL, Tableau, Python 3.2 (Numpy, Pandas, Scikit-learn, Matplotlib), AWS
Confidential, Jersey City, NJ
Database/BI Developer
Responsibilities:
- Became an expert in the company’s database and business model and actively took part in gathering user/project requirements from different stakeholders to convert into the documentation required for the project at hand.
- Extracted data in SQL Server using T-SQL, writing Queries, Stored Procedures, Triggers, Views, Temp Tables and User-Defined Functions (UDFs).
- Designed and developed ETL packages using SSIS to create Data Warehouses from different tables and file sources like Flat and Excel files.
- Used different methods in SSIS such as derived columns, aggregations, Merge joins, count, conditional split and more to transform the data.
- Developed reporting solutions in SSRS for different stakeholders, from mock-up through deployment, in areas such as Claims, Transactions, Supply, Assets and others.
- Optimized Queries in T-SQL by removing redundancies, retrieving essential data and using SQL methods like Joins efficiently.
Technology Stack: MS SQL Server 2008/2012 (T-SQL), SQL Server Management Studio, SQL Server Integration Services, SQL Server Reporting Services, Windows 7, MS Office Suite 2010
