Data Science Consultant/Software Conversion Lead Resume
SUMMARY
- About 5 years of experience in Data Science with large datasets of structured and unstructured data, Predictive modelling, Data analysis, Data acquisition, Data validation and Data visualization.
- Hands-on experience with algorithms such as Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis and Data Visualization Tools.
- Data scientist with proven expertise in Data Analysis, Machine Learning, and Modeling.
- Experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
- Experience in applying predictive modeling and Machine Learning algorithms for analytical reports.
- Experience using technology to work efficiently with datasets such as scripting, Data cleaning tools, statistical software packages.
- Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
- Very strong in Python, statistical analysis tools, and modeling.
- Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.
- Strong programming skills in a variety of languages such as Python, R and SQL.
- Experience implementing Machine Learning back-end pipelines with Scikit-learn, Pandas, and NumPy.
- Working knowledge of Extract, Transform, and Load (ETL) components and process flow using Talend.
- Experience with AWS cloud services: EC2, S3, SNS, and ECS.
- Experience working with Agile and Waterfall models, dealing with sprints and resolving issues within each story of a sprint.
TECHNICAL SKILLS
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.
Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbours (K-NN).
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, SAP Business Intelligence, Amazon Redshift
Packages: pandas, NumPy, seaborn, SciPy, scikit-learn, Beautiful Soup, matplotlib, ggplot, NLTK
Languages: Python, R, C, C++
Databases: SQL, MySQL, Oracle, Netezza, MS Access, MongoDB
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.
Version Control Tools: GitHub, Bitbucket
Operating System: Windows, Linux, Unix
PROFESSIONAL EXPERIENCE
Data Science Consultant/Software Conversion Lead
Confidential
Responsibilities:
- Lead consultant and technical expert to the world’s leading measurement and data company with presence in more than 100 countries and services covering more than 90% of the globe’s GDP and population.
- Established execution plan for transitioning analytics from SAS to Python, eventually paving the way for saving hundreds of thousands of dollars in SAS subscription and licensing fees every year.
- Converted and enhanced over half a dozen SAS processes and spreadsheet models to Python scripts that were modular, scalable and highly efficient. Trained associates to adapt and excel in the new Pythonic environment.
- Developed robust Python scripts that automated processes, leading to a tremendous increase in efficiency and bringing down execution time from hours to minutes and seconds.
- Collaborated across teams and departments to conceptualize, design, and develop churn-rate forecasting models that accurately predicted panelist turnover rates over short-term and long-term horizons.
- Developed and optimized SQL queries to extract data from Oracle, Netezza and MySQL databases. Cleaned and validated the data to make it suitable for further processing.
- Successfully managed projects using Agile development methodology (Scrum and Kanban frameworks). Facilitated stakeholder meetings and sprint reviews to drive project completion.
- Planned project deliverables and milestones using Gantt charts, while preparing weekly and monthly reports for the client highlighting progress, challenges and future plans.
- Liaised with a team of three data scientists to migrate the outlier detection and mitigation application to AWS, using services including S3, ECR, ECS, and SNS.
Confidential
Data Science Consultant
Responsibilities:
- Determined customer satisfaction and helped enhance customer experience using NLP.
- Worked with the NLTK library for NLP data processing and pattern discovery.
- Worked on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
- Categorized comments from different social networking sites into positive and negative clusters using Sentiment Analysis and Text Analytics.
- Implemented NLP techniques such as Noise Removal, Lemmatization, Stemming, Tokenizing, POS tagging, Bag of Words, Topic Modelling, TF-IDF, and Word2Vec.
- Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
- Used the AWS S3 service to store daily data from Twitter feeds in a text format using the Python Boto library.
- Served as a member of the Analytical Group and assisted in designing and developing statistical models for end clients. Coordinated with end users to design and implement e-commerce analytics solutions as per project proposals.
- Conducted market research for client; developed and designed sampling methodologies, and analyzed the survey data for pricing and availability of clients' products. Investigated product feasibility by performing analyses that include market sizing, competitive analysis and positioning.
- Optimized Python code for a variety of data mining and machine learning tasks.
- Facilitated stakeholder meetings and sprint reviews to drive project completion. Successfully managed projects using Agile development methodology.
- Project experience in data mining, segmentation analysis, business forecasting and association rule mining using large datasets with Machine Learning.
Environment: Python, MATLAB, MongoDB, exploratory analysis, Naïve Bayes, K-Means Clustering, Hierarchical Clustering, Spark (MLlib, PySpark), Tableau, SAS, Hadoop 2.7, OLTP, Random Forest, AWS.
Confidential, San Francisco, CA
Sr. Data Analyst
Responsibilities:
- Performed intensive data pre-processing, feature engineering, and feature scaling using Python.
- Developed and implemented innovative data quality improvement tools.
- Involved in Peer Reviews, Functional and Requirement Reviews.
- Developed project requirements and deliverable timelines; executed efficiently to meet the planned timelines.
- Created and supported a data management workflow covering data collection, storage, analysis, training, and validation.
- Involved with data analysis, primarily identifying data sets, source data, metadata, data definitions and data formats.
- Performed Data Cleaning, Feature Scaling, and Feature Engineering using the Pandas and NumPy libraries and applied Principal Component Analysis (PCA) for dimensionality reduction.
- Analyzed requirements and the significance of weld point data and energy efficiency using large datasets.
- Wrangled large datasets (acquired and cleaned the data) and analyzed trends by building visualizations with Power BI, Tableau 9.0, matplotlib, and Python.
- Analyzed business problems and applied appropriate statistical models to the data to generate insights.
- Used advanced Microsoft Excel features to create pivot tables and pivot reports, and used the VLOOKUP function.
- Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
Environment: Python, Machine learning, Pandas, SQL, Spark, AWS (S3/Redshift), Scikit-learn, Data Warehouse, Apache, Tableau.
Confidential
Data Analyst
Responsibilities:
- Worked with the Analysis & Marketing Team to support business decisions.
- Worked with key departments to analyze areas and discuss the primary model requirements for the project.
- Documented methodology, data reports and machine learning model results and communicated with the Project Team Manager to share the knowledge.
- Experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Performed machine learning to estimate the probability of a new customer being classified as good or bad.
- Designed, developed, and produced reports that connect quantitative data to give better insights.
- Involved in defining the source to business rules, target data mappings, data definitions.
- Built various graphs for business decision making using Python matplotlib library.
- Used the Python library Beautiful Soup for web scraping to extract data for building graphs.
- Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a primary source of data for both customers and the internal customer service team.
- Responsible for defining the key identifiers for each mapping/interface.
- Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
- Produced data quality and traceability documents for each source interface.
- Established standard procedures.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
Environment: Python, R, Linux, Spark, Tableau, SQL Server 2012, Microsoft Excel, MATLAB, SQL, Scikit-learn, Pandas, XML, SQL Profiler, and Query Analyzer.
