Data Scientist Resume

Irving, TX

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Machine Learning, Data Mining, and Data Analysis with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Experience in designing visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Designing the Physical Data Architecture of new system engines.
  • Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Developing Logical Data Architecture with adherence to Enterprise Architecture.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages like R and Python, including Big Data technologies like Hadoop and Hive.
  • Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis (see the pandas sketch after this list).
  • Experience working with data modeling tools like Erwin, Power Designer, and ER/Studio.
  • Experience in designing star schema and snowflake schema for Data Warehouse and ODS architecture.
  • Experience in data collection, data extraction, data cleaning, data aggregation, data mining, data verification, data analysis, reporting, and data warehousing environments.
  • Extensive experience with query languages including SQL, PL/SQL, T-SQL, and SAS.
  • Proficient in Data Analysis with sound knowledge of extracting data from various database sources like MySQL, MS SQL Server, Oracle, Teradata, and other database systems.
  • Expertise in developing advanced PL/SQL code through Stored Procedures, Triggers, Cursors, Tables, Views and User Defined Functions.
  • Experience in building Data Integration, Workflow Solutions, and Extract, Transform, and Load (ETL) solutions for data warehousing using SQL Server Integration Services (SSIS).
  • Experience in developing OLAP cubes using SQL Server Analysis Services (SSAS); defined data source views, dimensions, measures, hierarchies, attributes, and calculations using Multidimensional Expressions (MDX), perspectives, and roles.
  • Expertise in Normalization/Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Developed merge jobs in Python to extract and load data into MySQL databases.
  • Experience and technical proficiency in designing and data modeling for online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
  • Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
  • Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2, regularly using JIRA and other internal issue trackers for project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging and Teradata.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
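
As a quick illustration of the exploratory data analysis workflow noted above, a minimal pandas sketch; the file name and column names are hypothetical placeholders rather than project data:

    # Minimal EDA sketch in pandas. "customers.csv", "region", and "spend"
    # are hypothetical placeholders.
    import pandas as pd

    df = pd.read_csv("customers.csv")

    print(df.shape)             # rows x columns
    print(df.dtypes)            # column data types
    print(df.isnull().sum())    # missing values per column
    print(df.describe())        # summary statistics for numeric columns

    # Group-level aggregation, e.g. spend statistics per region
    print(df.groupby("region")["spend"].agg(["mean", "median", "count"]))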

TECHNICAL SKILLS:

Data Science: Predictive Modeling, Machine Learning, Statistics & Probability, Data Warehouse, Data Mining, Data Analysis, Python, R

Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables.

Databases: Oracle 11g/12c, MS Access, SQL Server 2012/2014, Sybase, DB2, Teradata 14/15, Hive.

Big Data Tools: Hadoop, Map Reduce, Hive, Apache Spark, Pig, HBase, Sqoop, Flume.

BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP Business Objects, Crystal Reports

Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server

Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX.

Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python, C, C++, Java, HTML, UNIX shell scripting, Perl, R.

Applications: Toad for Oracle, Oracle SQL Developer, MS Word, MS Excel, MS PowerPoint, Teradata, Designer 6i.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE:

Confidential - Irving, TX

Data Scientist

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
  • Completed a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, UNIX commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER/Studio 9.7.
  • Worked on missing value imputation and outlier identification with statistical methodologies using pandas and NumPy.
  • Participated in feature engineering such as feature creation, feature scaling, and one-hot encoding with scikit-learn.
  • Tackled a highly imbalanced fraud dataset using undersampling and oversampling with SMOTE, ensemble methods, and cost-sensitive algorithms (see the sketch after this list).
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn.
  • Implemented a machine learning model (logistic regression) with Python scikit-learn.
  • Optimized algorithms with stochastic gradient descent.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naïve Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad for handling various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience with Hadoop ecosystem components like MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Transformed data from various sources, organized it, and extracted features from raw and stored data.
  • Validated the machine learning classifiers using ROC Curves and Lift Charts.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
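
A minimal sketch of the imbalanced-fraud workflow above, combining SMOTE oversampling (from imbalanced-learn) with a scikit-learn logistic regression validated by ROC-AUC; the synthetic dataset is a stand-in for the actual fraud data, which is not shown:

    # Oversample the minority (fraud) class with SMOTE, fit logistic
    # regression, and validate with ROC-AUC. Synthetic data stands in
    # for the real dataset.
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Roughly 99:1 class imbalance, mimicking a fraud label.
    X, y = make_classification(n_samples=10000, weights=[0.99], random_state=42)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=42)

    # Resample only the training split so the test set stays untouched.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    print("ROC-AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))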

Environment: Machine Learning, ER/Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, UNIX, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.

Confidential - Dallas, TX

Data Analyst

Responsibilities:

  • Worked in the Amazon Web Services cloud computing environment.
  • Used Tableau to automatically generate reports; worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Gathered all required data from multiple data sources and created datasets for use in analysis.
  • Performed exploratory data analysis and data visualizations using R and Tableau.
  • Performed thorough EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
  • Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
  • Independently coded new programs and designed tables to load and test programs effectively for the given PoCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained Conceptual, Logical, and Physical data models using Erwin for forward/reverse-engineered databases.
  • Validated and selected models using k-fold cross-validation and confusion matrices, and optimized models for a high recall rate (see the sketch after this list).
  • Implemented Ensemble Models with majority votes to enhance the efficiency and performance.
  • Designed rich data visualizations with Tableau.
  • Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN, and MapReduce.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store the data and perform data cleaning steps on huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
  • Responsible for creating ETL design specification document to load data from operational data store to data warehouse.
  • Prepared scripts to ensure proper data access, manipulation, and reporting functions with the R programming language.
  • Formulated procedures for integrating R programming plans with data sources and delivery systems.
  • Created customized business reports and shared insights with management.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
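
A minimal sketch of the validation approach above: a majority-vote ensemble scored with k-fold cross-validation on recall. The base learners and synthetic data are illustrative assumptions, not the project's actual models:

    # Majority-vote ensemble validated with 5-fold cross-validation,
    # scored on recall. Synthetic data stands in for the real dataset.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, random_state=0)

    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(max_depth=5)),
            ("rf", RandomForestClassifier(n_estimators=100)),
        ],
        voting="hard",  # each model gets one vote; majority wins
    )

    scores = cross_val_score(ensemble, X, y, cv=5, scoring="recall")
    print("recall per fold:", scores, "mean:", scores.mean())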

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential - Bloomington, IL

Data Analyst

Responsibilities:

  • Deployed and implemented information management systems which collected data from over 4,700 participants.
  • Performed data merging, cleaning, and quality control procedures by programming data object rules into a database management system.
  • Actively reviewed over 208 unique variables and 4,700 rows of data using Excel and Python.
  • Created detailed reports for management.
  • Reported daily on returned survey data and thoroughly communicated survey progress statistics, data issues, and their resolution.
  • Assisted in the development of a new data review and coding system, which finished the delivery task two weeks early and required fewer staff overall.
  • Performed data harmonization between two distinct data sources to create a master data delivery file.
  • Coordinated training and technical materials for a staff of five in survey collection and issue resolution.
  • Developed a master data flowchart which was used to measure the completion of study objectives.
  • Served as primary contact for the acceptance or rejection of surveys where unique or rare issues were involved.
  • Involved in data analysis and quality checks.
  • Created the source-to-target mapping spreadsheet detailing the source and target data structures and the transformation rules around them.
  • Wrote Python scripts to parse XML documents and load the data into the database; used Python to extract weekly information from XML files and developed Python scripts to clean the raw data (see the sketch after this list).
  • Worked on datasets of various file types including HTML, Excel, PDF, Word, and XML, and their conversions.
  • Involved in testing the XML files and checking whether data was parsed and loaded into staging tables.
  • Mined and analyzed data from company databases to drive optimization and improvement of product development, marketing techniques, and business strategies.
  • Performed Database and ETL development per new requirements as well as actively involved in improving overall system performance by optimizing slow running/resource intensive queries.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Resolved customer issues using Python and recommended solutions for improvement.
  • Developed data mapping documentation to establish relationships between source and target tables including transformation processes using SQL.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created several types of data visualizations using Python and Tableau.
  • Performed data wrangling and scripting in Python, database cleanup in SQL, and advanced model building in R/Python, with expertise in data visualization and Tableau dashboard development.
  • Effectively led multiple client projects with heavy Python, SQL, Tableau, modeling, and forecasting components.
  • Created views in Tableau Desktop that were published to internal team for review and further data analysis and customization using filters and actions.
  • Created interactive dashboards using Tableau desktop 9.1/10 using filters.
  • Designed an automated validation system in Python that generates a detailed report explaining the differences between two datasets of any format, with comparisons through visualizations.
  • Used extracted data for analysis and carried out various mathematical operations for calculation purposes using the Python libraries NumPy and SciPy.
  • Conducted performance tuning of complex SQL queries and stored procedures by using SQL Profiler and index tuning wizard. Used Database Mirroring for increasing database availability.
  • Participated in data modeling discussions and provided input on both logical and physical data modeling.
  • Participated in stakeholder discussions, change adoption discussion and job scheduling discussion to ensure smooth implementation with minimal impact to other service areas.
  • Reviewed the UTRs, STRs, and performance test results to ensure all test results met requirement needs.
  • Opened risks or issues that the current project was facing and worked towards resolving them.
  • Worked on the big data migration project of one of North America's largest auto insurers as a requirements-gathering resource from the data movement point of view.
  • Created a master data workbook representing the ETL requirements such as mapping rules, physical data element structure, and their descriptions.
  • Participated in DMCM (Data Model Change Management) & RCN (Requirement Change Notification) process.
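
A minimal sketch of the XML parsing and staging-table load described above; the XML element names, file name, and SQLite staging database are hypothetical stand-ins for the actual sources and targets:

    # Parse an XML export and load the records into a staging table.
    # "weekly_export.xml" and the <record> structure are hypothetical.
    import sqlite3
    import xml.etree.ElementTree as ET

    root = ET.parse("weekly_export.xml").getroot()

    rows = []
    for record in root.iter("record"):
        rows.append((
            record.findtext("id"),
            record.findtext("respondent"),
            (record.findtext("status") or "").strip(),  # basic cleaning
        ))

    # SQLite stands in for the actual staging database.
    conn = sqlite3.connect("staging.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_survey (id TEXT, respondent TEXT, status TEXT)")
    conn.executemany("INSERT INTO stg_survey VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()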

Confidential - Alexandria, VA

Data Analyst

Responsibilities:

  • Performed data profiling to learn about behavior with various features such as traffic pattern, location, time, and date.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naïve Bayes, Random Forests, K-means, and KNN for data analysis.
  • Integrated new tools and developed technology frameworks/prototypes to accelerate the data integration process and empower the deployment of predictive analytics by developing Spark Scala modules with R.
  • Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull request.
  • Responsible for business case analysis, requirements gathering, use case documentation, prioritization, product/portfolio strategic roadmap planning, high-level design, and data modeling.
  • Oversaw development of all Tableau dashboards for the organization.
  • Coordinated data delivery from other developers in order to update dashboards on a monthly basis.
  • Experience parsing data stored in Excel, CSV, JSON, HTML, PDF, TXT, and other file formats.
  • Finished a project focused on predicting bloodborne infections in patients after undergoing surgery.
  • Built an object-oriented framework to easily allow construction of multi-layer ensemble machine learning models, using scikit-learn, XGBoost, Theano, and other Python toolkits (see the sketch after this list).
  • Developed Java application to extract text features from hundreds of thousands of clinical encounters.
  • Developed simulations to study and understand the effects of CMS bundled payment model on hospital output using claims data and hospital accounting data.
  • Learned HTML, CSS, and JavaScript to develop a demonstration web application for the organization.
  • Built a website to act as a code repository for all the organization's parsers.
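
A minimal sketch of a multi-layer (stacked) ensemble in the spirit of the framework above, using scikit-learn's StackingClassifier with XGBoost; the synthetic data is a stand-in for the clinical dataset, which is not shown:

    # Two-layer stacked ensemble: XGBoost and random forest feed a
    # logistic-regression meta-model via out-of-fold predictions.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=5000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    stack = StackingClassifier(
        estimators=[
            ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
            ("rf", RandomForestClassifier(n_estimators=200)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold predictions train the meta-model
    )
    stack.fit(X_train, y_train)
    print("accuracy:", stack.score(X_test, y_test))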

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential

Data Analyst

Responsibilities:

  • Communicated and coordinated with other departments to collect business requirements.
  • Worked on missing value and outlier detection with statistical methodologies using pandas and NumPy.
  • Applied the dimensionality reduction technique PCA to reduce the dimensionality of the given data (see the sketch after this list).
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for various user behavior predictions and support multiple marketing segmentation programs.
  • Participated in feature engineering such as feature creation, feature scaling, and one-hot encoding.
  • Visualized the data using matplotlib charts such as bar charts, heat maps, and histograms.
  • Implemented machine learning models like logistic regression and SVM with Python scikit-learn.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Performed detailed data analysis (i.e., determined the structure, content, and quality of the data through examination of source systems and data samples) using SQL and Python.
  • Used Python's multiple data science packages like Pandas, NumPy, matplotlib, Seaborn, SciPy, Scikit-learn and NLTK.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Improved fraud prediction performance by using random forest and gradient boosting.
  • Optimized algorithms with stochastic gradient descent.
  • Fine-tuned algorithm parameters with manual tuning and automated tuning such as Bayesian optimization.
  • Validated and selected models using k-fold cross-validation and confusion matrices, and optimized models for a high recall rate.
  • Implemented ensemble models with majority voting to enhance efficiency and performance.
  • Designed and implemented a variety of SSRS reports such as parameterized, drilldown, ad hoc, and sub-reports using Report Designer and Report Builder, based on the requirements.
  • Created the logical and physical data modeling using Erwin tool.
  • Designed SSRS reports with sub-reports, dynamic sorting, data source definitions, and subtotals.
  • Followed agile methodology and coordinated daily scrum meetings.
  • Performed data cleansing for accurate reporting; thoroughly analyzed data and integrated different data sources to process matching functions.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
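
A minimal sketch of the PCA step above, standardizing features before projecting onto principal components; the synthetic matrix is a stand-in for the actual data:

    # Standardize, then reduce dimensionality with PCA, keeping enough
    # components to explain 95% of the variance.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = make_classification(n_samples=1000, n_features=30, random_state=0)

    X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

    pca = PCA(n_components=0.95)  # float selects components by variance
    X_reduced = pca.fit_transform(X_scaled)

    print("dims:", X.shape[1], "->", X_reduced.shape[1])
    print("explained variance:", pca.explained_variance_ratio_[:5])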

Environment: Windows XP, MS SQL Server 2005/2008, SQL Server Management Studio, MSBI (SSRS, SSAS, SSIS), MS Excel, T-SQL, Erwin.
