Data Scientist Resume
San Jose, CA
SUMMARY:
- Data Scientist experienced in building multivariate linear and logistic regression models, classification trees, and other machine learning algorithms in Python. Proficient in writing SQL.
- Experienced in providing end-to-end analytical solutions, from data acquisition and manipulation through exploratory analysis and building and validating predictive models to generating actionable insights from the analysis.
SKILLS:
Machine Learning: Regression analysis, Ridge and Lasso Regression, K-NN, Decision Tree, Support Vector Machine (SVM), Ensemble methods such as Bagging, Boosting, and Stacking, K-Means clustering, Deep Learning.
Python Libraries: Jupyter Notebook, NumPy, Pandas, scikit-learn, SciPy, TensorFlow, NLTK, Keras, re.
Platforms: Unix, Windows, MacOS.
Databases: MySQL, SQL Server, Oracle, Informix.
Programming: Python, SQL, Hive, GitHub/Git
Data Pipeline / ETL Tools: Airflow
Cloud Services: AWS (S3, EC2, RDS, ECS)
Reporting & Visualization Tools: Seaborn, Matplotlib, ggplot2, Tableau Desktop, Tableau Online.
Statistics: Hypothesis Testing, ANOVA, Chi-Square, Confidence Intervals, MLE, Principal Component Analysis (PCA), Cross-Validation, Correlation.
EXPERIENCE:
Data Scientist
Confidential, San Jose, CA
Responsibilities:
- Developed machine learning tools to provide industrial solutions.
- Implemented data models and algorithms to deliver machine learning solutions as a member of the data analyst/data scientist team.
- Conducted exploratory analysis and feature engineering to fit the best models using scikit-learn and Python.
- Optimized clients’ product portfolios by evaluating market segmentation data queried from SQL Server.
- Proposed strategies to increase clients’ market shares by performing brand competition data analysis.
- Explored the factors that influence the cancellation of client subscriptions.
- Performed feature engineering: handled missing values and created new features from existing ones.
- Used the Matplotlib and Seaborn libraries in Python for visualization.
- Used PySpark to initialize Spark, load large datasets, retrieve RDD information, and sort, filter, and sample the data.
- Applied ANOVA and chi-square statistical tests and ensemble methods to understand the predictive power of, and relationships among, the categorical features.
- Selected influential features to develop simpler but more robust machine learning models.
- Designed Logistic Regression, Random Forest, SVC, and Neural Network models, compared their performance, and tuned hyperparameters to improve results.
- Developed a classification model to predict whether a client will cancel the service.
- Predicted the turnout at Fenerbahce games using various (implicit) surveys and historical ticket sales data.
- Optimized the seating plan of Fenerbahce Stadium and raised the club's annual income by predicting combined (season) ticket sales.
- Optimized the fan club merchandise range and raised club revenue by classifying supporters according to their social status and attitudes.
- Used linear regression, random forest regression, and SVM in Python for prediction.
- Developed ensemble techniques to predict the value of transactions for potential customers.
Data Engineer
Confidential
Responsibilities:
- Participated in Turkey's largest historical archives digitization project, whose Phase 1 was successfully completed in 18 months, including 3 months of preliminary work. This phase covered 30M pages and used 4.5 petabytes of storage.
- Designed and built data-driven archive applications aimed at optimizing business and operational efficiency.
- Performed database design, tuning and optimization on MS SQL Server.
- Optimized complex SQL queries, indexes, and database objects.
- Processed, cleansed and verified the integrity of data used for analysis.
- Developed machine learning algorithms for document classification. Tools: Python, scikit-learn, NLTK, SVM.
- Followed agile methodologies to manage the life-cycle of the project.
Data Architect & Project Manager
Confidential
Responsibilities:
- Designed and built data-driven archive applications aimed at optimizing business and operational efficiency.
- Managed numerous web projects that have won national and international awards.
- Provided product improvement vision to the company (document digitization, e-commerce, B2B, B2C, web content management, e-government, archive management, open innovation platform).
- Completed on time Turkey's first major historical and top-secret document digitization project, which targeted the digitization of 5,000,000 pages and 1 petabyte of data.
- The Turkish Presidency’s project comprised numerous stages: document ontology, classification, document scanning, digitization, data entry, data tagging, digital signature, a secure sharing platform, a searching platform, a new technical approach for Ottoman handwritten document character recognition using NLP, and Turkish character recognition.
- Served as project manager and data architect on this project.
- Developed and managed revisions of the Content Policy Document with the client team.
Software Development Specialist
Confidential
Responsibilities:
- Performed system development activities to structure the information technology systems of a newly built airport.
- Provided technical and administrative management for the planning, establishment, and operation of systems such as IGCS, SITA, FPL, BRS, METAR, FIDS, ATIS, ATCS, AODB, Finance Systems, and Intranet.
- Developed Intranet, Airport Operational Database Application and Aircraft Maintenance Application.
Tools: Oracle, ASP, Delphi, IIS, Windows.