Sr. Data Scientist Resume
Cincinnati, OH
SUMMARY:
- Over 8 years of experience designing, building, and implementing enterprise analytical applications using machine learning with Python, R, and Scala.
- Experience implementing machine learning algorithms using the Python scientific stack and R.
- Ability to apply Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering, and Association Rules efficiently (see the sketch at the end of this summary).
- Experience in applying predictive modeling and Machine Learning algorithms for analytical reports.
- Experience with machine learning and deep learning frameworks such as scikit-learn, TensorFlow, and Keras.
- Solid understanding of Data Mapping, Data Warehousing, Data Mining, Data Governance and Data Management services with quality assurance.
- Adept in Statistical Data Analysis, Exploratory Data Analysis, Machine Learning, Data Mining, Data Visualization using R, Python, Base SAS, SAS Enterprise Guide, SAS Enterprise Miner, Tableau and SQL.
- Ability to identify subtle patterns and bring unique perspectives to building new analytical solutions.
- Collaborated with engineers and developers to deploy successful models and algorithms into production environments.
- Ability to prepare ETL mappings (Source-to-Stage, Stage-to-Integration, ISD), gather requirements, and deliver data reporting, data visualization, and advanced business dashboards, presenting them effectively to clients.
- Experience in handling various forms of data such as Master Data, Metadata, Source Data.
- Experience leveraging the Hadoop framework to ingest, store, process, and analyze big data.
- Experience working on Integrated Development Environments (IDE) like PyCharm, Sublime Text and Eclipse.
- Developed highly scalable classifiers and tools by leveraging Machine Learning, Apache Spark and Deep Learning.
- Developed MapReduce programs to perform Data Transformation and analysis.
- Delivered multiple end-to-end big data analytics solutions and distributed systems using Apache Spark.
- Worked closely with the QA team to execute test scenarios and plans, provide test data, create test cases, issue STRs upon identification of bugs, and collect test metrics to quickly iterate on the prototype algorithm.
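For illustration, a minimal scikit-learn sketch of the kind of classical modeling listed above; the synthetic dataset and all parameter values are placeholders rather than project specifics:

# Illustrative only: one supervised and one unsupervised algorithm from the list above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic data standing in for a real analytical dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Supervised: logistic regression classifier with a holdout evaluation.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", clf.score(X_test, y_test))

# Unsupervised: K-Means groups the same observations into three clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)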
TECHNICAL SKILLS:
Languages: SQL, PL/SQL, MATLAB, Python and R
Statistical Analysis: R, Python, SAS E-Miner, SAS Programming, MATLAB, Minitab and Jupyter
Databases: SQL Server 2014/2012/2008/2005, MS Access, Oracle 11g/10g/9i
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Pentaho, Kettle, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1, R Studio
Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/ Snowflake Schema modeling, Fact & Dimensions tables, Physical and logical data modeling, Normalization and Denormalization techniques.
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import and Export Wizard, Microsoft Management Console, Visual SourceSafe 6.0, DTA, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau.
PROFESSIONAL EXPERIENCE:
Confidential, Cincinnati, OH
Sr. Data Scientist
Responsibilities:
- Performed data extraction, scaling, transformation, modeling, and visualization using R, SQL, and Tableau, depending on requirements.
- Writing R scripts working in parallel with Oracle R Enterprise (ORE).
- Performed ad-hoc reporting, customer profiling, and segmentation using R and Python.
- Created various Database Schemas (Oracle) with several tables containing data related to application details, DB and OS details, Asset configuration details, Server details and developed several queries to obtain encryption readiness results.
- Developed MapReduce and Spark Python modules for machine learning and predictive analytics in Hadoop (see the Spark sketch after this role).
- Created several corresponding ISDs (Integration Specific Documents), each covering the interface list (inbound/outbound), detailed file sizes, and production support details (SLAs, servers, etc.).
- Performed wellness checks (SQL) to verify that both staging and production environments were collecting data as intended, and addressed all production-related issues.
- Worked closely with ADMs (Application Development Managers) to determine the DB and OS types for various versions, then created a GTAC-supported compatibility matrix using Spark Excel to identify which applications needed software and hardware upgrades.
- Designed machine learning pipelines using Microsoft Azure Machine Learning to build predictive and prescriptive solutions for a given data problem.
- Designed and developed NLP models for neural-network-based sentiment analysis (see the Keras sketch after this role).
- Worked with dimensional and relational database design, ETL, and lifecycle development using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, and Workflow Monitor).
- Worked extensively on data mining; responsible for generating reports and dashboards with relevant figures and graphs to verify that processing times and pipeline behavior met requirements.
Environment: SQL/Server, Oracle 11g/10g, MS Office, Informatica, ER Studio, XML, Hive, HDFS, R connector, Python, R and Tableau 9.2.
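A minimal sketch of the kind of Spark Python module described in this role, assuming PySpark's DataFrame-based ML API; the toy rows and column names are hypothetical stand-ins for data read from Hive/HDFS:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("predictive-module").getOrCreate()

# Hypothetical frame standing in for data read from Hive/HDFS.
df = spark.createDataFrame(
    [(1.0, 2.0, 0), (2.0, 1.0, 0), (5.0, 7.0, 1), (6.0, 8.0, 1)],
    ["f1", "f2", "label"],
)

# Assemble feature columns into the single vector column Spark ML expects.
assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

# Train a distributed logistic regression model and inspect predictions.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(assembled)
model.transform(assembled).select("label", "prediction").show()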
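Likewise, a minimal Keras sketch of a neural-network sentiment classifier like the one referenced above; the vocabulary size, layer sizes, and random stand-in data are illustrative assumptions only:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, max_len = 10000, 100

# Small sequential model: embedding -> pooling -> sigmoid output (P(positive)).
model = keras.Sequential([
    layers.Embedding(vocab_size, 32),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random token sequences stand in for tokenized review text.
X = np.random.randint(1, vocab_size, size=(200, max_len))
y = np.random.randint(0, 2, size=(200,))
model.fit(X, y, epochs=1, batch_size=32, verbose=0)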
Confidential, Frederick, MD
Data Scientist
Responsibilities:
- A highly immersive Data Science role involving Data Manipulation and Visualization, Web scraping, Machine Learning, Python programming, Scala, SQL and Hadoop.
- Used R and Python for exploratory data analysis and A/B testing, applying ANOVA and hypothesis tests to compare and measure the effectiveness of creative campaigns (see the testing sketch after this role); worked with HQL, VQL, PySpark, and data lake environments.
- Created functional specific document for the Phase 3 work including but not limited to Informatica requirements, architectural references, ETL sequence diagrams, data mappings, quality management, use cases and data reconciliation details.
- Developed complex Informatica mappings, applying data warehousing concepts and standard ETL transformation methodologies.
- Handled defects and escalations, addressing them within timelines by updating the corresponding documents (DDLs, mappings, model changes, etc.).
- Created wellness check scripts (SQL) to ensure data collection is as intended.
- Explained the limitations of the statistical models and the possible outliers in the collected samples, and provided guidance on removing them before decisions were made.
- Identified patterns, data quality issues and leveraged insights by communicating with BI team.
- Used correlation analysis and graphical techniques to gain insights into the claims data during the exploration stage.
- Used tools such as R, Python, ODS, DB2, metadata repositories, and MS Excel extensively to analyze data from multiple perspectives and deliver a robust machine learning algorithm.
Environment: R Studio, Python, Tableau, SQL Server 2012/2014, and Oracle 11g/10g.
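A minimal SciPy sketch of the A/B and ANOVA testing referenced in this role; the campaign metrics below are synthetic placeholders, not real results:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical conversion metrics for two creative campaigns (A vs. B).
campaign_a = rng.normal(loc=0.12, scale=0.03, size=500)
campaign_b = rng.normal(loc=0.13, scale=0.03, size=500)

# Welch's two-sample t-test: is the difference in means significant?
t_stat, p_value = stats.ttest_ind(campaign_a, campaign_b, equal_var=False)
print(f"t={t_stat:.3f}, p={p_value:.4f}")

# One-way ANOVA generalizes the comparison to three or more campaigns.
campaign_c = rng.normal(loc=0.11, scale=0.03, size=500)
f_stat, p_anova = stats.f_oneway(campaign_a, campaign_b, campaign_c)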
Confidential
Data Scientist
Responsibilities:
- Worked as part of a team on a corpus of enterprise data to reduce redundancies, coordinated with different functional teams, delivered insights against client requirements, and built tooling that helped the team perform tasks faster and more reliably.
- Discovered data sources, scraped data from public web pages and company databases, then cleaned and condensed it to create features and define classes from the underlying data (see the scraping sketch after this role).
- Designed and built statistical models and feature extraction systems, used them to solve business problems related to the company's data pipeline, and communicated the solutions to executive stakeholders.
- Performed missing value imputation and outlier identification with statistical methodologies using pandas and NumPy (see the RFM sketch after this role).
- Analyzed customer purchasing behavior, quantified customer value with RFM analysis, and applied customer segmentation with clustering algorithms (covered in the same sketch).
- Developed complex Informatica mappings, applying data warehousing concepts and standard ETL transformation methodologies.
- Handled defects and escalations, addressing them within timelines by updating the corresponding documents (DDLs, mappings, model changes, etc.).
- Created wellness check scripts (SQL) to ensure data collection is as intended.
- Explained the limitations of the statistical models and the possible outliers in the collected samples, and provided guidance on removing them before decisions were made.
- Collected feedback after deployment and retrained models to improve the performance.
- Used tools such as R, Python, ODS, DB2, metadata repositories, and MS Excel extensively to analyze data from multiple perspectives and deliver a robust machine learning algorithm.
Environment: Oracle 11g, DB2, SQL Server 2008, SQL, PL/SQL, XML, Windows NT, Tableau, Workday, SPSS, SAS, Business Objects, Teradata.
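A minimal web-scraping sketch of the kind referenced in this role, assuming requests and BeautifulSoup; the URL and HTML structure are hypothetical:

import requests
from bs4 import BeautifulSoup

# Hypothetical public page; the URL and row selector are placeholders.
url = "https://example.com/products"
resp = requests.get(url, timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")

# Pull the text of every table row into a list of raw records for cleaning.
records = [row.get_text(strip=True) for row in soup.find_all("tr")]
print(len(records), "rows scraped")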
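And a minimal pandas/scikit-learn sketch of the imputation, outlier screening, and RFM segmentation described above; every column name and value is a hypothetical stand-in:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical transactions; customer IDs, amounts, and dates are invented.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 3],
    "amount": [120.0, np.nan, 35.0, 40.0, 900.0, 15.0, 22.0],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-01-20",
         "2024-03-01", "2024-02-15", "2024-03-05", "2024-03-20"]),
})

# Missing value imputation: fill absent amounts with the median.
df["amount"] = df["amount"].fillna(df["amount"].median())

# Outlier identification with the 1.5 * IQR rule.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]

# RFM features: recency (days since last order), frequency, monetary value.
now = df["order_date"].max()
rfm = df.groupby("customer_id").agg(
    recency=("order_date", lambda s: (now - s.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Segment customers by clustering the standardized RFM features.
z = (rfm - rfm.mean()) / rfm.std()
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)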
Confidential
Data Analyst
Responsibilities:
- Worked as part of the team that developed NLP-based machine learning models in Python to surface insights on fraudulent claims and recommend actions to counter them (see the classifier sketch after this role).
- Gathered, analyzed, and translated business requirements, communicating with other departments to collect client business requirements and assess available data.
- Actively involved in gathering requirements and system specifications.
- Developed and optimized database structures, stored procedures, DDL triggers and user-defined functions.
- Designed and implemented a customized linear regression model that drew on diverse data sources to predict sales, demand, risk, and price elasticity (see the regression sketch after this role).
- Created stored procedures to transform data and worked extensively with T-SQL to handle various transformation needs during loading.
- Participated in developing prototype modeling and communicated the results.
Environment: Statistical Modeling, Machine Learning, NLP, Python, Sklearn, Genesis, Pyspark, R, SAS, Seaborn, Tableau, SQL.
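A minimal scikit-learn sketch of an NLP fraud classifier like the one referenced in this role; the claim narratives and labels are invented placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented claim narratives with fraud labels (1 = fraudulent).
texts = [
    "claimant reports vehicle stolen days after policy start",
    "routine windshield repair after hail storm",
    "third duplicate invoice submitted for the same treatment",
    "minor fender bender, police report attached",
]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["duplicate invoice for stolen vehicle claim"]))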
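And a minimal sketch of a sales-prediction linear regression like the one described above; the predictors and synthetic coefficients are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical predictors: price, promotion spend, and a demand index.
X = rng.uniform(0, 1, size=(300, 3))

# Synthetic sales with a negative price effect and positive promo effect.
sales = 50 - 20 * X[:, 0] + 15 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 2, 300)

reg = LinearRegression().fit(X, sales)

# The fitted price coefficient approximates elasticity under this model.
print("coefficients:", reg.coef_, "intercept:", reg.intercept_)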
Confidential
Data Analyst
Responsibilities:
- Developed the ETL (SSIS) Pipelines for data extraction.
- Developed software tools in Python to automatically scrutinize documents and electronic content (see the scanning sketch after this role).
- Developed the Database SQL schema for the data pipelines.
- Performed data analysis and produced follow-up reports for QA teams to prioritize issues.
- Ensured data quality so that compliance standards were properly incorporated.
- Participated in requirements gathering and the development of value-adding use cases and applications in cooperation with other intra-organizational units, product managers, and product development teams.
- Developed the SQL jobs for generating the analytic reports.
- Assisted with maintaining electronic filing system for all reports, documentation and official client communication.
Environment: Python, SSIS, SSRS, SQL, Sklearn, Matplotlib.
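A minimal sketch of a Python document-scrutiny tool like the one referenced in this role; the flagged terms and directory path are hypothetical:

import re
from pathlib import Path

# Hypothetical compliance terms; extend the pattern for real policies.
FLAGGED = re.compile(r"\b(ssn|password|account number)\b", re.IGNORECASE)

def scan_documents(folder: str) -> dict:
    """Return {filename: match count} for every .txt file in the folder."""
    results = {}
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(errors="ignore")
        results[path.name] = len(FLAGGED.findall(text))
    return results

if __name__ == "__main__":
    # "./documents" is a placeholder path.
    print(scan_documents("./documents"))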