Data Scientist Resume
Bartlesville, OK
OBJECTIVE:
A position that calls for a strong analytical thinker with an optimistic outlook, the ability to prioritize, and a commitment to creating and sustaining high work standards.
SUMMARY:
- Over 6 years of experience working with large structured and unstructured datasets, spanning data visualization, data acquisition, predictive modeling, and data validation.
- Experience developing statistical machine learning, text analytics, and data mining solutions for a range of business problems, and generating data visualizations using Python, R, and Tableau.
- Expertise in transforming business requirements into analytical models, designing algorithms, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Proficient in machine learning techniques (decision trees, linear and logistic regression, random forests, SVM, Bayesian methods, XGBoost, K-nearest neighbors) and statistical modeling for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
- Experience designing visualizations and stories in Tableau on web and desktop platforms, and publishing and presenting dashboards.
- Experience with advanced SAS programming techniques such as PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Hands-on experience implementing LDA and Naive Bayes; skilled in decision trees, random forests, linear and logistic regression, SVM, clustering, and neural networks, with good knowledge of recommender systems (see the sketch after this list).
- Expertise in the complete software development life cycle, covering design, development, testing, and implementation, in the Hadoop ecosystem, the Documentum SP2 suite of products, and Java technologies.
- Good domain knowledge of the retail and airline industries.
- Highly skilled in using visualization tools such as Tableau and Matplotlib to create dashboards.
- Extensive working experience with Python, including scikit-learn, pandas, and NumPy.
- Well versed in normalization and denormalization techniques for optimal performance in relational and dimensional database environments.
- Integration architect and data scientist experience across analytics, big data, SOA, ETL, and cloud technologies.
- Experience with foundational machine learning models and concepts (decision trees, regression, boosting, GBM, neural networks, HMMs, CRFs, MRFs, deep learning).
- Skilled in systems analysis, dimensional data modeling, database design, and implementing RDBMS-specific features.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, and Teradata.
- Developed data variation analysis and data pair association analysis in the Bioinformatics field.
- Regularly used JIRA and other internal issue trackers during project development.
- Facilitated and helped translate complex quantitative methods into simplified solutions for users.
- Experienced in proofs of concept and gap analysis; gathered the data needed for analysis from different sources and prepared it for exploration using data munging.
- Analyzed data using R, Perl, and Hadoop, and queried both structured and unstructured databases.
- Highly experienced in developing analytical and statistical models to organizational requirements, with the ability to produce alternative cost-effective and efficient models.
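A minimal sketch of the kind of scikit-learn modeling workflow described above, assuming a hypothetical CSV file with numeric features and a binary "churned" target; the file name, columns, and hyperparameters are illustrative stand-ins, not a record of any specific project:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical dataset: numeric feature columns plus a binary "churned" label.
df = pd.read_csv("customer_data.csv")
X = df.drop(columns=["churned"])
y = df["churned"]

# Hold out 20% of rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit and score two of the classifiers named above on the held-out split.
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))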
TECHNICAL SKILLS:
Databases: SQL Server, MS Access, Teradata, Oracle
NoSQL Databases: HBase
Programming Languages: C, C++, MATLAB, R, Python, JavaScript, Scala, Pig
Markup Languages: XML, HTML, DHTML, XSLT, XPath, XQuery and UML
ETL Tools: Informatica PowerCenter, SSIS, MDM
Data Modeling Tools: MS Visio, Rational Rose, Erwin
Testing Tools: HP Quality Center ALM
Machine Learning: Supervised and Unsupervised learning
Machine Learning Algorithms: Linear regression, KNN, Naive Bayes, Logistic regression, LDA/QDA, SVM, CART, Random Forest, Boosting, K-means clustering, Hierarchical clustering, Collaborative Filtering, Neural Network, NLP
Big Data Tools: Hadoop, Hive, Apache Spark, Pig
Cloud Technology: Amazon Web Services, EC2, Elastic MapReduce (EMR), Kinesis
Operating Systems: UNIX, Linux, Windows
Reporting & Visualization: Tableau, Matplotlib, Seaborn, ggplot, SAP Business Objects, Crystal Reports, SSRS, Cognos, Shiny
WORK EXPERIENCE:
Confidential, Bartlesville, OK
Responsibilities:
- Performing data profiling and analysis on different source systems that are required for Customer Master.
- Identifying the Customer and account attributes required for MDM implementation from disparate sources and preparing detailed documentation.
- Developing extensions and customizations to the database tier of Informatica MDM Hub.
- Reviewing MDM unit tests and walking through code with developers.
- Using T-SQL queries to pull data from disparate systems and the data warehouse across different environments.
- Working closely with the Data Governance Office team to assess source systems for project deliverables.
- Presenting DQ analysis reports and scorecards on all validated data elements to business teams and stakeholders.
- Using data quality validation techniques to validate critical data elements (CDEs) and identify anomalies.
- Extensively using open-source tools, RStudio (R) and Spyder (Python), for statistical analysis and building machine learning models.
- Involved in defining source-to-target data mappings, business rules, and data definitions.
- Performing data validation and data reconciliation between disparate source and target systems (Salesforce, Cisco-UIC, Cognos, Data Warehouse) for various projects (see the sketch after this list).
- Interacting with business teams and project managers to clearly articulate anomalies, issues, and findings discovered during data validation.
- Writing complex SQL queries to validate data against various reports generated by Cognos.
- Extracting data from different databases per business requirements using SQL Server Management Studio.
- Interacting with the ETL and BI teams to understand and support various ongoing projects.
- Extensively using MS Excel for data validation.
- Generating weekly and monthly reports for various business users according to business requirements; manipulating and mining data from database tables (Redshift, Oracle, Data Warehouse).
- Providing analytical support to improve quality and standardize work results.
- Creating statistical models, both distributed and standalone, to build diagnostic, predictive, and prescriptive solutions.
- Interfacing with other technology teams to extract, transform, and load (ETL) data from a wide variety of sources.
- Utilizing a broad variety of statistical and big data packages, including SAS, R, MLlib, Hadoop, Spark, MapReduce, and others.
- Providing input and recommendations on technical issues to business and data analysts, BI engineers, and data scientists.
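As referenced in the data validation bullet above, a minimal sketch of one way to reconcile records between a source system and a target warehouse with pandas; the connection strings, table names, and key column are hypothetical:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical ODBC DSNs for the source system and the target warehouse.
source = create_engine("mssql+pyodbc://@source_dsn")
target = create_engine("mssql+pyodbc://@target_dsn")

src = pd.read_sql("SELECT customer_id, email FROM dbo.customers", source)
tgt = pd.read_sql("SELECT customer_id, email FROM dw.dim_customer", target)

# Outer-join on the business key and flag where each record appears.
merged = src.merge(tgt, on="customer_id", how="outer",
                   suffixes=("_src", "_tgt"), indicator=True)
missing_in_target = merged[merged["_merge"] == "left_only"]
missing_in_source = merged[merged["_merge"] == "right_only"]

# Flag value mismatches on records present in both systems.
both = merged[merged["_merge"] == "both"]
mismatches = both[both["email_src"] != both["email_tgt"]]
print(len(missing_in_target), len(missing_in_source), len(mismatches))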
Environment: Data Governance, SQL Server, ETL, MS Office Suite - Excel (pivot tables, VLOOKUP), DB2, R, Python, Visio, HP ALM, Agile, Azure, Data Quality, Tableau and Reference Data Management.
Data Scientist
Confidential, Boston, MA
- Worked as a liaison between multiple teams to gather and document requirements, and developed a data science platform covering the end-to-end machine learning workflow: managing data; training, evaluating, and deploying models; making predictions; and monitoring those predictions. Applied methodologies including regression, Bayesian methods, decision trees, random forests, SVM and kernel SVM, clustering, instance-based methods, association rules, and dimensionality reduction (see the sketch after this list).
- Collaborated with the business to understand company needs and devise possible solutions.
- Analyzed and solved business problems, finding patterns and insights within structured and unstructured data.
- Cleaned, analyzed, and selected data to gauge customer experience.
- Implemented new statistical and mathematical methodologies as needed for specific models and analysis.
- Used algorithms and programming to efficiently work through large datasets, applying treatments, filters, and conditions as needed.
- Created meaningful data visualizations to communicate findings and relate them back to business impact.
- Utilized a diverse array of technologies and tools to deliver insights as needed, including Python, R, SAS, MATLAB, Tableau, and more.
- Presented proposals and results in a clear manner backed by data and coupled with actionable conclusions to drive business decisions.
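A minimal sketch of the end-to-end workflow described above: train, evaluate, persist, and later reload a model for scoring. The file names are hypothetical and the regressor is a generic scikit-learn stand-in, not any specific production model:

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Manage data: hypothetical training table with a numeric "target" column.
df = pd.read_csv("training_data.csv")
X, y = df.drop(columns=["target"]), df["target"]

# Train and evaluate: 5-fold cross-validated R^2 before fitting on all rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
print("CV R^2:", cross_val_score(model, X, y, cv=5).mean())
model.fit(X, y)

# Deploy: persist the fitted model as an artifact.
joblib.dump(model, "model.joblib")

# Predict and monitor: reload the artifact and score an incoming batch
# (assumed to carry the same feature columns as the training data).
loaded = joblib.load("model.joblib")
batch = pd.read_csv("incoming_batch.csv")
predictions = loaded.predict(batch)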
Environment: Data Governance, SQL Server, ETL, MS Office Suite - Excel (pivot tables, VLOOKUP), DB2, R, Python, Visio, HP ALM, Agile, Azure, MDM, SharePoint, Data Quality, Tableau and Reference Data Management.
Data Analyst
Confidential
- Responsible for studying and creating SAS code and SQL queries, enhancing analyses, and documenting the system.
- Used R, Python, SAS, and SQL to manipulate data and to develop and validate quantitative models.
- Led brainstorming sessions and proposed hypotheses, approaches, and techniques.
- Created and optimized processes in the Data Warehouse to import, retrieve and analyze data from the Cyber Life database.
- Analyzed data collected in stores (JCL jobs, stored procedures, and queries) and provided reports to the business team by storing the data in Excel/SPSS/SAS files.
- Performed analysis and interpretation of reports on various findings.
- Prepared test documents for zaps before and after changes in the Model, Test, and Production regions.
- Responsible for production support, including abend resolution and other support activities, and for comparing seasonal trends in the data using Excel.
- Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data and prepare programs (see the sketch after this list).
- Successfully migrated the client's application from the Test/DSS/Model regions to Production.
- Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
- Provided full support in analyzing trends in the financial time series data.
- Performed various statistical tests to give the client a clear understanding of the results.
- Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
- Provided training to beginners on the Cyber Life system and other basics.
- Provided complete support across all regions (Test/Model/System/Regression/Production).
- Actively involved in Analysis, Development, and Unit testing of the data.
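A minimal sketch of pivot-table and VLOOKUP-style analysis reproduced in pandas, as referenced in the Excel bullet above; the tables and column names are invented for illustration:

import pandas as pd

# Hypothetical fact table and lookup table.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount": [100.0, 150.0, 90.0, 120.0],
})
lookup = pd.DataFrame({
    "region": ["East", "West"],
    "manager": ["Alice", "Bob"],
})

# Pivot-table equivalent: total amount by region and quarter.
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="amount", aggfunc="sum")

# VLOOKUP equivalent: left-join the manager for each region.
joined = sales.merge(lookup, on="region", how="left")
print(pivot, joined, sep="\n\n")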
Environment: Python, R/R Studio, SQL Enterprise Manager, SAS, Microsoft Excel, Microsoft Access, MS Outlook.
Data Analyst
Confidential
- Partnered with analysts and live producers to identify strategic business questions, key metrics, and actionable insights.
- Conducted feasibility study and performed GAP and Impact analysis for the proposed state.
- Conducted JAD sessions to allow different stakeholders to communicate their perspective with each other, resolve any issues and come to an agreement quickly.
- Managed direct relationships with third-party vendors and offshore development teams; created a communication channel and project reviews for management.
- Collaborated with Project Architect/Designer to detail technical requirements and refine the solution design.
- Created use case scenarios and documented workflow and business processes.
- Prepared Business Requirement Document (BRD) and Functional Specification Document (FSD).
- Queried the database using complex SQL queries.
- Reviewed business requirements documented by other business analysts to scope level of effort in testing functionality and identifying possible inter-dependencies.
- Assisted the Project Manager with creating detailed project plans and in developing, scheduling and tracking project timelines.
- Created executive summary with the risks and mitigation plan for the projects.
- Collaborated with the QA team to ensure adequate testing on software, maintained quality procedures, and ensured that appropriate documentation was in place.
- Led defect triage sessions with the client during the UAT phase, served as the point of contact for defect management for the project team, and led the change management process for the PMO.
- Utilized data analytics to evaluate workload performance data and recommend changes.
- Conducted peer review meetings periodically to keep track of the project's milestones.
- Facilitated training sessions for internal and external teams for smooth transition.