Data Scientist Resume
Austin, TX
SUMMARY
- Around 9 years of experience in the IT industry, concentrated on working with data using Python, R, and SQL.
- Over 3 years of professional experience in data science, advanced analytics, data visualization, machine learning, and AI techniques.
- Extensive experience in data analysis, including uncovering insights and discovering patterns.
- Highly skilled and experienced in the Financial, Telecom, E-commerce, and Retail Marketing domains.
- Extensive knowledge of NLTK for NLP, text mining, and social network analysis.
- Well versed in machine learning and data mining techniques such as feature engineering, semantic analysis, and the construction and automation of machine learning pipelines.
- Proficient with deep learning frameworks such as TensorFlow, Keras, and PyTorch.
- Expert in creating dashboards and visualization charts using Tableau, Python (Matplotlib, Seaborn, and Bokeh), and R (Shiny, ggplot2).
- Deep knowledge of Hadoop, Spark, Hive, HBase, and related technologies.
- Experienced with and knowledgeable about various database management systems and data platforms, such as MySQL, Sybase, SQLite, NoSQL stores including MongoDB, and AWS S3.
- Hands-on experience working on cloud and distributed computing platforms such as Apache Spark and Microsoft Azure.
- Strong knowledge of and experience in processing structured, semi-structured, and unstructured data, and in handling file formats such as CSV, XML, and JSON.
- Good understanding of statistical techniques and concepts such as A/B testing, experiment design, properties of distributions, statistical tests, hypothesis testing, ANOVA, and time series analysis (a brief sketch follows this summary).
- Expertise in developing analytical approaches to meet business requirements.
- Experience in Agile Methodology and Scrum Software Development processes.
- Excellent communication and presentation skills, including presenting research to large audiences.
- Ability to learn complex methodologies quickly and to convey complex analytical approaches and findings clearly.
- Successfully worked in fast-paced, dynamic environments, both independently and in collaborative teams.
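As a brief illustration of the A/B testing and hypothesis testing mentioned above, the following is a minimal sketch using SciPy; the conversion rates and sample sizes are hypothetical, not taken from any actual engagement.

```python
import numpy as np
from scipy import stats

# Hypothetical A/B test: simulate conversion outcomes for a control
# group and a variant group (rates and sizes are illustrative).
rng = np.random.default_rng(42)
control = rng.binomial(1, 0.10, size=5000)   # baseline ~10% conversion
variant = rng.binomial(1, 0.12, size=5000)   # variant ~12% conversion

# Two-sample t-test on the observed conversion indicators.
t_stat, p_value = stats.ttest_ind(variant, control)
print(f"control={control.mean():.3f}  variant={variant.mean():.3f}  p={p_value:.4f}")
```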
TECHNICAL SKILLS
Machine Learning: Classification, Regression, Feature Engineering, Clustering, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, KNN, Ensemble Methods, K-Means Clustering.
Statistical Analysis: Time Series Analysis, Regression models, Confidence Intervals, Principal Component Analysis and Dimensionality Reduction.
Programming Languages: Python (pandas, NumPy, scikit-learn), R (dplyr, ggplot2), SQL, Spark (MLlib).
Big Data: Hadoop Ecosystem - Hive, Pig, MapReduce, Apache Spark.
Databases: Oracle, MySQL, PostgreSQL, HDFS, HBase, Teradata, MongoDB, Sybase.
Data Visualization Tools: Tableau 9.x/10.x, Seaborn/Matplotlib, Plotly, SSRS, Shiny, Dash.
Cloud Data Systems: AWS (Redshift, S3).
IDE: JupyterLab, RStudio, Eclipse, Spyder, Atom, Notepad++, Sublime.
Version Control Tools: Git, GitHub.
SDLC Methodologies: Agile, Scrum & Waterfall.
PROFESSIONAL EXPERIENCE
Confidential, Austin, TX
Data Scientist
Responsibilities:
- Cleaned and manipulated complex datasets to create the data foundation for further analysis and the development of key insights (MS SQL Server, R, Tableau, Excel).
- Applied machine learning algorithms and advanced statistical analysis, including decision trees, regression models, SVM, and clustering, to identify volume, using the scikit-learn package in Python.
- Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques to handle missing values in the dataset using Python (see the sketch after this list).
- Worked on the technical implementation of advanced analytics projects: defined the mathematical approaches, developed new and effective analytics algorithms, and wrote the key pieces of code using R, Python, and other tools and languages as needed.
- Developed analytical approaches to answer high-level questions and develop insightful recommendations.
- Conducted statistical analyses such as linear regression and ANOVA, and applied classification models to the data.
- Extracted customers' Big Data from various data sources (Excel, flat files, Oracle, SQL Server, and Teradata, as well as log data from servers) into Hadoop HDFS.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Used pandas, NumPy, seaborn, SciPy, Matplotlib, and scikit-learn in Python to develop various machine learning algorithms.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
- Created various types of data visualizations using Python and Tableau.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Developed and applied metrics and prototypes that could be used to drive business decisions.
- Participated in ongoing research, and evaluation of new technologies and analytical solutions.
- Used problem-solving skills to find and correct data problems, and applied statistical methods to adjust and project results when necessary.
- Worked across cross-functional teams in a matrix environment in completion of technical proof of concepts.
- Involved in Agile/SCRUM process, attending daily stand up and completing tasks in sprints.
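The following is a minimal sketch of the imputation-plus-modeling workflow described above, assuming a hypothetical customer dataset; the column names, imputation strategy, and model parameters are illustrative, not the actual project code.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical dataset with missing values (column names are illustrative).
df = pd.DataFrame({
    "age":    [34, None, 45, 29, 52, None, 41, 38],
    "income": [58000, 72000, None, 43000, 91000, 66000, None, 55000],
    "churn":  [0, 1, 0, 0, 1, 1, 0, 1],
})
X, y = df[["age", "income"]], df["churn"]

# Impute missing values with the column mean before fitting the model.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_imputed, y, test_size=0.25, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```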
Environment: Python, R, MongoDB, MS SQL Server, HDFS, Pig, Hive, Tableau, SQL, Hadoop Framework, Spark SQL, Spark MLlib, HBase, PySpark, Excel, Linux.
Confidential, San Francisco, CA
Data Scientist
Responsibilities:
- Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
- Used the K-Means algorithm with different numbers of clusters to find meaningful customer segments, and evaluated the accuracy of the model (see the sketch after this list).
- Performed data mining using complex SQL queries to discover patterns, and used extensive SQL-based data profiling and analysis to guide the design of the data model.
- Mined large data sets and connected data from different sources to identify insights and inform designs.
- Analyzed data and predicted end customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
- Used R and SQL to create statistical algorithms involving multivariate regression, linear regression, logistic regression, PCA, random forest models, decision trees, and support vector machines for estimating risk.
- Involved in creating a Data Lake by extracting customers' Big Data from various data sources (Excel, flat files, Oracle, SQL Server, MongoDB, HBase, and Teradata, as well as log data from servers) into Hadoop HDFS.
- Helped in migration and conversion of data from the Oracle database, preparing mapping documents and developing partial SQL scripts as required.
- Generated ad-hoc SQL queries to fetch data from SQL Server database systems.
- Worked on predictive and what-if analysis using R on data from HDFS, and loaded files into HDFS and from HDFS into Hive.
- Created dashboards in Tableau Desktop based on data collected from MS Excel and CSV files and MS SQL Server databases.
- Prepared and presented complex written and verbal reports, findings, and presentations using visualization tools such as Matplotlib and ggplot2.
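A minimal sketch of the customer-segmentation approach described above, assuming hypothetical two-feature customer data; the features, the simulated segments, and the use of the silhouette coefficient as the quality measure are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical customer features: annual spend and purchase frequency.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([200.0, 2.0], [50.0, 0.5], size=(100, 2)),    # low-spend segment
    rng.normal([800.0, 10.0], [100.0, 2.0], size=(100, 2)),  # high-spend segment
])
X = StandardScaler().fit_transform(X)

# Try different cluster counts and score each with the silhouette coefficient.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}  silhouette={silhouette_score(X, labels):.3f}")
```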
Environment: Python, R, Machine Learning, Data Mining, PySpark, Spark MLlib, SQL, Excel, CSV, SQL Server, Flat Files, XML, Tableau.
Confidential, Austin, TX
Scrum Master
Responsibilities:
- Organized and facilitated project planning, daily stand-up meetings, reviews, Program Increment and Sprint retrospectives, release planning, demos, and other Scrum-related meetings.
- Initiated and participated in managing cross-organization dependencies with other Scrum teams.
- Supported Product Owners in managing customer expectations for project deliverables and in managing stakeholder communications.
- Collaborated with the team to identify and resolve problems, removing impediments and barriers for the team directly or via escalation.
- Teamed with Product Owners to ensure user stories were properly written and the backlog was prioritized and groomed.
- Conducted design sessions with the Business and IT communities to identify the best workable solution.
- Gave presentations on the design and solution of the project to the Business and Management communities.
- Wrote custom SQL queries in SQL Server Management Studio 2014 to extract data from data marts and data warehouses for analysis of business data (see the sketch after this list).
- Used LOD (level of detail) expressions, calculated fields, parameters, actions (drill-throughs), and quick table calculations with advanced selections in Tableau.
- Extracted data from multiple data sources and created Tableau Data Extract (TDE) files for fast query execution in Tableau.
- Designed report views as a proof of concept in Tableau for stakeholder approval before developing the application.
- Created data views and dashboards in various formats for analysis and decision-making using Tableau BI (10.4), and published applications to Tableau Server.
- Worked with globally located teams in APJ, EMEA and AMS zones.
- Prepared and ran burndown charts to estimate the remaining work in each Sprint, and presented weekly status reports on project health to senior management.
- Coordinated with Release Managers during production deployments on SAP ERP-based Fusion and obtained IT and business sign-off.
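As a sketch of the kind of extraction query described above, run from Python for follow-on analysis: the connection string, table, and column names are assumptions, not the actual data-mart schema.

```python
import pandas as pd
import pyodbc

# Hypothetical connection details; server, database, and schema are assumptions.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=SalesDW;Trusted_Connection=yes;"
)

# Custom query against a hypothetical fact table in the data mart.
query = """
    SELECT region, SUM(sales_amount) AS total_sales
    FROM fact_sales
    WHERE order_date >= '2017-01-01'
    GROUP BY region
"""
df = pd.read_sql(query, conn)
print(df.sort_values("total_sales", ascending=False))
```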
Environment: MS Visio 2013, SAP ERP 6.0, SharePoint 2016, Tableau Business Intelligence (10.4), MS Office 2016, .NET 4.6, JIRA, Scrum Methodology.
Confidential, Tampa, FL
Sybase Developer
Responsibilities:
- Involved in requirements gathering and analysis. Designed and created database objects based on business requirements.
- Created database Tables, Constraints, Views, Synonyms, Sequences, Indexes and Object Types.
- Created Entity Relationship Diagrams for database integrations using Microsoft Visio.
- Developed stored procedures on the Sybase database.
- Worked with users and created views for simplified reporting.
- Worked with developers and business users to tune T-SQL.
- Created UNIX shell scripts to automate backend jobs and load data into the database using BCP commands (see the sketch after this list).
- Used database triggers for data validation and for maintaining a history of inserts, updates, and deletes, along with other audit routines.
- Developed and modified database procedures and triggers to enhance and improve functionality using T-SQL.
- Involved in data loading, using BCP and T-SQL to load data into the database.
- Created users and assigned roles.
- Provided DBA support for a range of database administration activities, including performance monitoring and tuning, debugging, documentation, capacity planning, data modeling, and applying database software patches and releases.
- Generated T-SQL scripts to create and drop database objects, including tables, views, primary keys, indexes, constraints, sequences, grants, and synonyms.
- Provided production support for various applications and data warehouses.
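The shell scripts themselves are not reproduced here; the following is a minimal Python rendering of the same BCP load idea (the originals were KSH scripts), with hypothetical server, database, table, and credential values.

```python
import subprocess

# Hypothetical sketch of an automated BCP load; all names and
# credentials below are placeholders, not actual production values.
server, database, table = "SYBASE_PROD", "trades_db", "daily_trades"
data_file = "daily_trades.dat"

cmd = [
    "bcp", f"{database}..{table}", "in", data_file,
    "-S", server,
    "-U", "load_user",   # hypothetical login
    "-P", "********",    # password placeholder
    "-c",                # character-mode transfer
    "-t", "|",           # field terminator
    "-b", "5000",        # commit every 5,000 rows
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout or result.stderr)
```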
Environment: T-SQL, PL/SQL, KSH, Perl, UNIX, Sybase 12.x, SQL Loader, BCP, ISQL, CVS, DB Artisan.
Confidential, Orlando, FL
Sybase Developer
Responsibilities:
- Performed analysis, design, construction, and testing of the Sybase database.
- Involved in smaller projects (e.g., development of new reports, market data analysis, and support of scenario developments).
- Provided support for ad-hoc analysis.
- Worked extensively on Stored Procedures and fixed existing bugs in Stored Procedures.
- Created ad-hoc reports using T-SQL.
- Created many data-loading and batch scripts using Perl and KSH.
- Took part in backend support using Perl and T-SQL.
- Worked extensively to coordinate support effort with DBA/SA.
- Participated in query and stored procedure performance tuning.
- Created many Tables, Indexes, complex Stored Procedures and Triggers.
- Used BCP extensively for data handling between data files and Sybase tables.
- Worked on query optimization.
- Created many Perl modules and packages.
- Involved in documentation and testing.
- Involved in scheduling jobs using crontab.
Environment: Sybase 11.5, SecureCRT, UNIX, Shell scripting, MS Outlook, Rapid SQL, PuTTY.