Data Scientist/Data Analyst Resume
Newbury Park, CA
PROFESSIONAL SUMMARY
- 9+ years of experience in Machine Learning, Data Mining, and Data Analysis with large datasets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python and Tableau.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experience in designing visualizations using Tableau and in publishing and presenting dashboards and Storylines on web and desktop platforms.
- Designing the Physical Data Architecture of new system engines.
- Hands-on experience in implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis; good knowledge of Recommender Systems.
- Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
- Developing Logical Data Architecture with adherence to Enterprise Architecture.
- Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
- Adept in statistical programming languages such as R and Python, and in Big Data technologies such as Hadoop and Hive.
- Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
- Experience working with data modeling tools like Erwin, Power Designer and ER/Studio.
- Experience in designing Star schema and Snowflake schema for Data Warehouse and ODS architecture.
- Experience in Data collection, Data Extraction, Data Cleaning, Data Aggregation, Data Mining, Data verification, Data analysis, Reporting, and data warehousing environments.
- Extensive experience in querying languages using SQL, PL/SQL, T-SQL, SAS.
- Proficient in Data Analysis with sound knowledge in extraction of data from various database sources like MySQL, MSSQL, Oracle, Teradata and other database systems.
- Expertise in developing advanced PL/SQL code through Stored Procedures, Triggers, Cursors, Tables, Views and User Defined Functions.
- Experience in building Data Integration, Workflow Solutions and Extract, Transform, and Load (ETL) solutions for data warehousing using SQL Server Integration Services (SSIS).
- Experience in developing OLAP Cubes using SQL Server Analysis Services (SSAS) and defining Data Source Views, Dimensions, Measures, Hierarchies, Attributes, Calculations using Multidimensional Expressions (MDX), Perspectives and Roles.
- Expertise in Normalization/Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Developed Merge jobs in Python to extract and load data into MySQL database.
- Experience and technical proficiency in designing and data modeling Online Applications; Solution Lead for architecting Data Warehouse/Business Intelligence Applications.
- Good understanding of Teradata SQL Assistant, Teradata Administrator and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Extracted data from various database sources like Oracle, SQL Server, and DB2, regularly using JIRA and other internal issue trackers for project development.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS-specific features.
- Worked on Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging and Teradata.
- Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
TECHNICAL SKILLS
Data Science: Predictive Modeling, Machine Learning, Statistics & Probability, Data Warehouse, Data Mining, SAS, Data Analysis, Python, R
Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, Fact and Dimension tables, Pivot Tables.
Databases: Oracle 11g/12c, MS Access, SQL Server 2012/2014, Sybase, DB2, Teradata 14/15, Hive.
Big Data Tools: Hadoop, MapReduce, Hive, Apache Spark, Pig, HBase, Sqoop, Flume.
BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP Business Objects, Crystal Reports
Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server
Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX.
Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python, C, C++, Java, HTML, UNIX shell scripting, Perl, R.
Applications: Toad for Oracle, Oracle SQL Developer, MS Word, MS Excel, MS PowerPoint, Teradata, Designer 6i.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential, Newbury Park, CA
Data Scientist/Data Analyst
Responsibilities:
- Designed, developed and produced reports that connect quantitative data to insights that drive and change business.
- Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques to achieve project objectives.
- Accounts Receivables program; ensured data feeds and supervised program function.
- Developed implementation strategies for Sales Plan integration into the system.
- Trained Business Analysts on system operation. Provided requirements gathering and UAT support for capability development.
- Defined configuration specifications and business analysis requirements.
- Performed quality assurance and defined reporting and alerting requirements.
- Assisted in designing, documenting and maintaining system processes.
- Reported on common sources of technical issues or questions and made recommendations to the product team.
- Communicated key insights and findings to the product team.
- Executed data integration using ETL processes of the source systems.
- Performed data mining tasks related to system break/fix issues, and provided workarounds and problem solving.
- Designed dashboards with Tableau and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
- Progressive and experienced background in analytics and root cause analysis.
- Proposed and suggested meaningful solutions to the upper management after analyzing dashboard/KPIs.
- Tracked key project milestones and adjusted project plans and resources to meet the needs of customers.
- Provided support for projects in project planning, quality plan, risk management, requirements management, change management, defect management and release management.
- Worked independently or collaboratively throughout the complete analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results.
- Developed and deployed Machine learning as a service on Microsoft Azure cloud service.
- Worked with the Sales and Marketing teams to partner and collaborate with a cross-functional team to frame and answer important data questions.
- Built and maintained queries for analysis/extraction for different databases.
- Developed Excel Services Reports for the Network Team.
- Created technical documentation for each Mapping for future development.
- Maintained data warehouse tables through the loading of data and monitored system configurations to ensure data integrity.
- Administered database password/security settings and approvals.
- Acted as a liaison between the Developer teams and Sales.
- Performed statistical analysis and built statistical models in R and Python using various Supervised and Unsupervised Machine learning algorithms like Regression, Decision Trees, Random
Environment: Machine Learning, ER Studio 9.7, Tableau 9.03, SAS, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLLib, regression, logistic regression, Hadoop, NoSQL, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.
Confidential - Dallas, TX.
Data Analyst
Responsibilities:
- Worked in the Amazon Web Services (AWS) cloud computing environment.
- Used Tableau to automatically generate reports; worked with partially adjudicated insurance flat files, internal records, 3rd-party data sources, JSON, XML and more.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
- Implemented end-to-end systems for Data Analytics and Data Automation, and integrated them with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Gathered the required data from multiple data sources and created datasets for use in analysis.
- Performed Exploratory Data Analysis and Data Visualization using R and Tableau.
- Performed EDA with univariate and bivariate analysis to understand the intrinsic and combined effects.
- Worked with Data Governance, Data Quality, Data Lineage, and Data Architects to design various models and processes.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an Architect implemented MDM hub to provide clean, consistent data for a SOA implementation.
- Developed, implemented and maintained the Conceptual, Logical and Physical Data Models using Erwin for Forward/Reverse Engineered Databases.
- Validated and selected models using k-fold cross-validation and confusion matrices, and worked on optimizing models for a high recall rate.
- Implemented Ensemble Models with majority votes to enhance the efficiency and performance.
- Designed rich data visualizations with Tableau.
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store the data and perform data cleaning steps for huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Responsible for creating ETL design specification document to load data from operational data store to data warehouse.
- Prepared scripts in the R programming language to ensure proper data access, manipulation and reporting functions.
- Formulated procedures for integration of R programming plans with data sources and delivery systems.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
Confidential - Bloomington, IL
Data Analyst
Responsibilities:
- Deployed and implemented information management systems which collected data from over 4,700 participants.
- Performed data merging, cleaning, and quality control procedures by programming data object rules into a database management system.
- Actively reviewed over 208 unique variables and 4,700 rows of data using Excel and Python.
- Created detailed reports for management.
- Reported daily on returned survey data and thoroughly communicated survey progress statistics, data issues, and their resolution.
- Assisted in the development of a new data review and coding system which finished the delivery task two weeks early and required fewer staff to complete overall.
- Performed data harmonization between two distinct data sources to create a master data delivery file.
- Coordinated training and technical materials for a staff of five in survey collection and issue resolution.
- Developed a master data flowchart which was used to measure the completion of study objectives.
- Served as primary contact for the acceptance or rejection of surveys where unique or rare issues were involved.
- Involved in data analysis and quality checks.
- Created the source to target mapping spreadsheet detailing the source, target data structure and transformation rule around it.
- Wrote Python scripts to parse XML documents and load the data into the database, to extract weekly information from XML files, and to clean the raw data.
- Worked on datasets of various file types including HTML, Excel, PDF, Word, XML and its conversions.
- Involved in testing the XML files and checked whether data is parsed and loaded to staging tables.
- Mined and analyzed data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies.
- Performed database and ETL development per new requirements and was actively involved in improving overall system performance by optimizing slow-running/resource-intensive queries.
- Implemented big data processing applications to collect, clean and normalize large volumes of open data using Hadoop ecosystem tools such as PIG, HIVE, and HBase.
- Python and resolved customer issues and recommended solutions for improvement.
- Developed data mapping documentation to establish relationships between source and target tables including transformation processes using SQL.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created several types of data visualizations using Python and Tableau.
- Data wrangling and scripting in Python, database cleanup in SQL, advanced model building in R/Python, and expertise in data visualization and Tableau dashboard development.
- Effectively led multiple client projects. These projects contained a heavy Python, SQL, Tableau, modeling, and forecasting component.
- Created views in Tableau Desktop that were published to internal team for review and further data analysis and customization using filters and actions.
- Created interactive dashboards with filters using Tableau Desktop 9.1/10.
- Designed an automated validation system in Python that generates a detailed report explaining the differences between two data sets of any format, with comparisons through visualizations.
- Used extracted data for analysis and carried out various mathematical operations for calculation purposes using the Python libraries NumPy and SciPy.
- Designed a mobile implementation of the site and its analytical services.
- Implemented a strategic business model to create cheaper certifications for people through semi-virtual program.
- Worked on an ISO SaaS platform.
- Conducted performance tuning of complex SQL queries and stored procedures by using SQL Profiler and index tuning wizard. Used Database Mirroring for increasing database availability.
- Participated in data modeling discussion and provided inputs on both logical and physical data modelling.
- Participated in stakeholder discussions, change adoption discussion and job scheduling discussion to ensure smooth implementation with minimal impact to other service areas.
- Reviewed the UTRs, STRs and Performance Test results to ensure all the test results meet requirement needs.
- Opened Risks or Issues that the current project is facing and worked towards resolving them.
- Worked on a big data migration project for North America's largest auto insurer as a requirements-gathering resource from a data movement point of view.
- Created master Data workbook which represents the ETL requirements such as mapping rules, physical Data element structure and their description.
- Participated in DMCM (Data Model Change Management) & RCN (Requirement Change Notification) process.
Confidential - Alexandria, VA
Data Analyst
Responsibilities:
- Performed Data Profiling to learn about behavior with various features such as traffic pattern, location, date, and time.
- Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
- Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
- Integrated new tools and developed technology frameworks/prototypes to accelerate the data integration process and empower the deployment of predictive analytics by developing Spark Scala modules with R.
- Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull request.
- Responsible for business case analysis, requirements gathering, use case documentation, prioritization, product/portfolio strategic roadmap planning, high-level design and data model.
- Oversaw development of all Tableau dashboards for the organization.
- Coordinated data delivery from other developers in order to update dashboards on a monthly basis.
- Experience parsing data stored in Excel, CSV, JSON, HTML, PDF, TXT, and other file formats.
- Completed a project focusing on predicting bloodborne infections in patients after surgery.
- Built object-oriented framework to easily allow construction of multi-layer ensemble machine-learning models, using Scikit-learn, XGBoost, Theano, and other Python toolkits.
- Developed Java application to extract text features from hundreds of thousands of clinical encounters.
- Developed simulations to study and understand the effects of CMS bundled payment model on hospital output using claims data and hospital accounting data.
- Learned HTML, CSS, and JavaScript to develop a web application to demonstrate organizations.
- Built a website to act as a code repository for all the organization's parsers.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
Confidential
Data Analyst
Responsibilities:
- Communicated and coordinated with other departments to collect business requirements.
- Worked on missing value and outlier detection with statistical methodologies using Pandas and NumPy.
- Applied dimensionality reduction technique PCA to reduce the dimensionality of given data.
- Designed, built and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
- Participated in feature engineering such as feature creation, feature scaling and one-hot encoding.
- Visualized the data using matplotlib charts such as bar charts, heat maps, and histograms.
- Implemented machine learning models such as logistic regression and SVM with Python Scikit-learn.
- Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
- Performed detailed data analysis (i.e., determined the structure, content, and quality of the data through examination of source systems and data samples) using SQL and Python.
- Used Python's multiple data science packages like Pandas, NumPy, matplotlib, Seaborn, SciPy, Scikit-learn and NLTK.
- Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn in Python for developing various machine learning algorithms.
- Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
- Improved fraud prediction performance by using random forest and gradient boosting.
- Optimized the algorithm with stochastic gradient descent.
- Fine-tuned the algorithm parameters with manual tuning and automated tuning such as Bayesian Optimization.
- Validated and selected models using k-fold cross-validation and confusion matrices, and worked on optimizing models for a high recall rate.
- Implemented Ensemble Models with majority votes to enhance the efficiency and performance.
Confidential
SSIS/SSRS/SSAS/SQL Developer
Responsibilities:
- Worked closely with management and end-users to create and evaluate business requirements.
- Responsible for the Extraction, Transformation and Loading (ETL) of data from Multiple Sources to Data Warehouse using SSIS.
- Used different Control Flow Tasks and Data Flow Tasks for creating SSIS Packages. Designed packages using different types of Transformations for Data Conversion and Derived Columns with multiple Data Flow tasks.
- Involved in the Extraction, Transformation & Loading (ETL) process and used Informatica PowerCenter tools: Source Analyzer, Warehouse Designer, Mapping Designer, Workflow Manager and Workflow Monitor.
- Involved in creating complex ad-hoc Reports, Sub Reports, Linked Reports, Charts, Drill-through and Drill-down Reports.
- Experience in writing custom code expressions in SSRS.
- Designed & created OLAP Cubes with Star schema using SSAS.
- Created Dashboards and Scorecards with Key Performance Indicators (KPI) in SQL Server Analysis Services (SSAS).
- Created complex stored Procedures, Functions, Indexes, Tables, Views and other T-SQL code and SQL Joins for applications.
- Created and maintained data model/architecture standards, including master data management (MDM).
- Managed security and user access to Analysis Services cubes by creating Windows AD groups and roles, and Perspectives within the OLAP cube.
- Monitored the nightly ETL process from a variety of very different source systems, including SQL-based databases and Excel files. Also ensured that nightly backup jobs, cube processing, and other ETL jobs didn't interfere with each other. Managed SQL 2005 and 2008 R2 databases and nightly SSIS processes.
- Monitored the scheduled SSRS and Crystal reports, re-running reports in case of failures or data mismatches.
- Designed and implemented a variety of SSRS reports such as Parameterized, Drilldown, Ad hoc and Sub-reports using Report Designer and Report Builder based on the requirements.
- Created the logical and physical data models using the Erwin tool.
- Designed SSRS reports with sub reports, dynamic sorting, defining data source and subtotals for the report.
- Followed agile methodology and coordinated daily scrum meetings.
- Performed data cleansing for accurate reporting. Thoroughly analyzed data and integrated different data sources to process matching functions.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
Environment: Windows XP, MS SQL Server 2005/2008, SQL Server Management Studio, MSBI (SSRS, SSAS, SSIS), MS Excel, T-SQL, ERWIN.
