Data Scientist Resume

Auburn Hills, MI

SUMMARY

  • Over 8 years of experience in Machine Learning and Data Mining with large structured and unstructured datasets, including Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Experience coding in SQL and PL/SQL using Procedures, Triggers, and Packages.
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
  • Excellent Knowledge in Relational Database Design, Data Warehouse/OLAP concepts and methodologies.
  • Demonstrated expertise in using the ETL tools Informatica PowerCenter 9.x/8.6.1/8.1/7.1/6.2 and PowerExchange 9.x/8.6.1 to develop data warehouse loads per client requirements.
  • Experience with Big Data technologies like Hadoop and Spark.
  • Experience working in Pricing and/or Revenue Management.
  • Experience in multiple software tools and languages to provide data-driven analytical solutions to decision makers or research teams.
  • Experience in developing Statistical Machine Learning, Text Analytics, and Data Mining solutions to various business problems and generating data visualizations using Python.
  • Experienced in Machine Learning techniques such as ANOVA, PCA, Forecasting, Time Series Regression, Linear/Nonlinear Regression, Logistic Regression, Clustering, and Tree-based models.
  • Good Knowledge in NoSQL databases like MongoDB and HBase.
  • Descriptive and Inferential Statistics, Hypothesis Testing, and Sampling.
  • Time Series Analysis (ARIMA), Neural Networks, Sentiment Analysis, Forecasting, and Text Mining.
  • Develop, maintain and teach new tools and methodologies related to data science and high performance computing.
  • Extensively used ETL methodology for supporting Data Extraction, Transformation and Loading process, in a corporate-wide-ETL Solution using different versions of Informatica.
  • Involved in Testing/Debugging SQL for performance issues, used different scenarios, and fixed different test cases.
  • Skilled in implementing Change Data Capture (CDC) for incremental data and slowly growing targets in data warehouses.
  • Experience in creating Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, and Gantt charts with the Show Me functionality.
  • Cluster Analysis, Principal Component Analysis (PCA), Association Rules, Recommender Systems.
  • Hands-on experience and credentials in database management and data visualization.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages such as R and Python, and in Big Data technologies such as Hadoop and Hive.
  • Hands-on experience with RStudio for data preprocessing and building machine learning models on different datasets.
  • Collaborated with the lead Data Architect to model the data warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
  • Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2.
  • Experienced with machine learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression, and k-means.
  • Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.
  • Predictive Modeling Algorithms: Logistic Regression, Linear Regression, Decision Trees, K-Nearest Neighbors, Bootstrap Aggregation (Bagging), Naive Bayes Classifier, Random Forests, Boosting, Support Vector Machines.

TECHNICAL SKILLS

Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, DAX, Python, Java, R, and Base SAS

Databases: SQL Server, MS Access, Oracle, Teradata, Big Data, Hadoop

Data Analysis Tools: Tableau, IBM SPSS

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib

Operating Systems: Windows, Linux, Mac OS

Visualization Tools: Tableau, Power BI, Pivot Tables, MS Excel, SQL Server Reporting Services - Report Builder

Big Data Ecosystems: MapReduce V2, HBase, Hive, Sqoop, Oozie, Kafka

PROFESSIONAL EXPERIENCE

Confidential, Auburn Hills, MI

Data Scientist

Responsibilities:

  • Transformation of data using SSIS.
  • Build analytical solutions and models by manipulating large data sets.
  • A highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, MongoDB, Hadoop.
  • Applied machine learning and statistical techniques to large datasets to find actionable insights.
  • Involved in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.
  • Played a critical role in collecting data from different data sources and data systems such as SAP, JDE, Lubes, and Hadoop.
  • Extensively used ETL processes to load data from various source systems such as SQL Server, flat files, and XML files into the target SQL Server system by applying business logic on transformation mappings for inserting and updating records when loaded.
  • Involved in designing the ETL extract from various sources like Teradata and flat files and loading the data into the target using Teradata ODBC as well as MultiLoad target connections; also created stage tables in Teradata.
  • Extensively used Informatica client tools: Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
  • Worked in Teradata systems and used utilities like MultiLoad, FastLoad, FastExport, BTEQ, TPump, and Teradata SQL.
  • Conducted user training on how to interact with, filter, sort, and customize views in existing visualizations generated through Tableau Desktop.
  • Utilized advanced Tableau features to link data from different connections on one dashboard and to filter data in multiple views at once.
  • Extensively used tabadmin and tabcmd commands to create and restore backups of the Tableau repository.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Built models using statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Worked on different data formats such as JSON, XML and performed Machine Learning algorithms in Python.
  • Transformed and merged all the weekly client data into a yearly file using SSIS ETL.
  • Used Visual Studio Team Foundation Server for version control, source control, and reporting.
  • Held knowledge transfer (KT) sessions with the client to understand their various data management systems and the data.
  • Creating metadata and a data dictionary for the future data use/data refresh of the same client.
  • Structuring the Data Marts to store and organize the customer's data.
  • Used pandas, NumPy, seaborn, matplotlib, Scikit-learn, SciPy, NLTK in Python for developing various Machine learning algorithms.
  • Mapping the flow of trade cycle data from source to target and documenting the same.
  • Performing QA on the data extracted, transformed, and exported to Excel.

Environment: Hadoop, Oracle 11g, MS Office, SSMS, SSIS, Power BI, Microsoft reporting tools, Big Data, Machine Learning, Tableau.

Confidential, San Diego, CA

Data Scientist

Responsibilities:

  • A highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT, Unix Commands, NoSQL, MongoDB, Hadoop.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Created ETL jobs for the PDAC and DME data warehouses and used history preserving, map operation, and table comparison transforms to capture changes (CDC).
  • Redesigned broken ETL processes using Informatica to load data from heterogeneous sources to target Oracle and DB2 Warehouse database.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, which allowed end users to understand the data on the fly using quick filters for on-demand information.
  • Extracted, transformed, and loaded OLTP data into the staging area and data warehouse using Informatica mappings and complex transformations (Aggregator, Joiner, Lookup, Normalizer, Filter, Sorter, etc.).
  • Involved in publishing of various kinds of live, interactive data visualizations, dashboards, reports and workbooks from Tableau Desktop to Tableau servers.
  • Apply different Machine Learning algorithms/methods on data sets to predict credit risk, fraud detection, customer churn, and target marketing.
  • Identify and assess available Machine Learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Extracted and loaded data using Python scripts and PL/SQL packages.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Data Manipulation and Aggregation from different source using Nexus, Toad, Business Objects, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad for data migration and ETL tasks from OLTP source systems to OLAP target systems.
  • Experience in Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Flume including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.

Environment: regression, logistic regression, Hadoop, Teradata, OLTP, Unix, Python, MLlib, SAS, random forest, OLAP, HDFS, NLTK, SVM, JSON, Machine Learning, and XML.

Confidential - Boston, MA

Data Scientist

Responsibilities:

  • Review and determine risk profiles of data based on metadata and underlying data elements
  • Worked on Developing Fraud Detection Platform using various machine learning algorithms in Python
  • Worked on Linear Discriminant Analysis, Greedy Forward Selection, Greedy Backward Selection, and feature reduction algorithms like Principal Component Analysis (PCA) and Factor Analysis
  • Manipulated and cleaned data using missing-value treatment in pandas and performed standardization
  • Implemented Classification using Supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes
  • Worked on customer segmentation using an unsupervised learning technique - clustering
  • Performed Exploratory Data Analysis and Data Visualizations using Python
  • Strong skills in data visualization with libraries such as matplotlib and seaborn
  • Created different charts such as heat maps, bar charts, and line charts
  • Experienced in working with SVM kernels such as RBF, polynomial, and linear
  • Implemented Ensemble models like Boosting and Bagging
  • Worked with cross validation technique and grid search to improve project model results

Environment: Python, SQL, Microsoft Excel

Confidential, Overland Park, KS

Data Analyst

Responsibilities:

  • Generated a cost-benefit analysis to quantify the impact of the model implementation compared with the former situation
  • Worked on model selection based on confusion matrices, minimizing the Type II error
  • Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and pandas in Python
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to perform data extraction and merging from large volumes of historical data stored in Oracle 11g, validating the ETL-processed data in the target database
  • Applied various machine learning algorithms and statistical modeling techniques like decision trees, logistic regression, and Gradient Boosting Machine to build predictive models using the scikit-learn package in Python
  • Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency
  • Identified the variables that significantly affect the target
  • Continuously collected business requirements during the whole project life cycle.
  • Conducted model optimization and comparison using a stepwise function based on AIC value
  • Generated data analysis reports using Matplotlib and Tableau; successfully delivered and presented the results to C-level decision makers

Environment: SAP ERP, SCM APO GATP Software, User acceptance testing (UAT), Rational ClearCase, ClearQuest, Use cases, UML, MS Office, RequisitePro.

Confidential

Data Modeler

Responsibilities:

  • Worked with large amounts of structured and unstructured data.
  • Knowledge in Machine Learning concepts (Generalized Linear models, Regularization, Random Forest, Time Series models, etc.)
  • Worked in Business Intelligence tools and visualization tools such as Business Objects, Tableau, ChartIO, etc.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX.
  • Configured the project on WebSphere 6.1 application servers
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
  • Handled end-to-end project from data discovery to model deployment.
  • Monitoring the automated loading processes.
  • Communicated with other health care information systems via Web Services using SOAP, WSDL, and JAX-RPC
  • Used Singleton, Factory, and DAO design patterns based on the application requirements
  • Used SAX and DOM parsers to parse the raw XML documents
  • Used RAD as Development IDE for web applications.
  • Preparing and executing Unit test cases
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application
  • Doing functional and technical reviews
  • Supported the testing team for system, integration, and UAT testing.
  • Guaranteeing quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from requirements to production support
  • Created test plan documents for all back-end database modules
  • Implemented the project in a Linux environment.

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS

Confidential

Data Analyst

Responsibilities:

  • Performed data profiling in the source systems required for the New Customer Engagement (NCE) data mart.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Manipulating, cleansing & processing data using Excel, Access and SQL.
  • Responsible for loading, extracting and validation of client data.
  • Liaising with end users and third-party suppliers. Analyzing raw data, drawing conclusions, and developing recommendations; writing SQL scripts to manipulate data for data loads and extracts.
  • Developing data analytical databases from complex financial source data. Performing daily system checks. Data entry, data auditing, creating data reports & monitoring all data for accuracy. Designing, developing and implementing new functionality.
  • Monitoring the automated loading processes. Advising on the suitability of methodologies and suggesting improvements.
  • Involved in defining the source-to-target data mappings, business rules, and business and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Coordinated with the business users to provide an appropriate, effective, and efficient way to design for new reporting needs based on user requirements and the existing functionality.
  • Document data quality and traceability documents for each source interface.
  • Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.

Environment: SQL Server, Oracle 10g & 11g, MS Office, Netezza, Teradata, Enterprise Architect, Informatica Data Quality, ER Studio, TOAD, Business Objects, Greenplum Database, PL/SQL.
