
Data Scientist Resume


Farmington, MI

SUMMARY:

  • Over 8 years of experience working as a Data Scientist/Data Analyst/Data Modeler with emphasis on Data Mapping and Data Validation in Data Warehousing environments.
  • Extensive experience with business intelligence (BI) tools such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
  • Worked with a variety of Python modules such as requests, Boto, flake8, Flask, mock, and nose.
  • Efficient in developing logical and physical data models and organizing data per business requirements using Sybase PowerDesigner, Erwin, and ER Studio in both OLTP and OLAP applications.
  • Strong understanding of when to use an ODS, a data mart, or a data warehouse.
  • Experienced in employing R, MATLAB, SAS, Tableau, and SQL for data cleaning, data visualization, risk analysis, and predictive analytics.
  • Adept at using the SAS Enterprise suite, R, Python, and Big Data technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, MapReduce, and Cloudera Manager for the design of business intelligence applications.
  • Ability to provide wing-to-wing analytic support including pulling data, preparing analysis, interpreting data, making strategic recommendations and presenting to client/product teams.
  • Hands-on experience with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis, and data visualization tools (a PCA sketch follows this list).
  • Strong programming skills in a variety of languages such as Python and SQL.
  • Familiarity with Crystal Reports and SSRS: query, reporting, analysis, and Enterprise Information Management.
  • Excellent knowledge of creating reports in Pentaho Business Intelligence.
  • Experienced with databases including Oracle, DB2, Teradata 15/14, Netezza, and SQL Server, as well as XML, Big Data, and NoSQL sources.
  • Worked with engineering teams to integrate algorithms and data into Return Path solutions
  • Worked closely with other data scientists to create data-driven products
  • Strong experience in statistical modeling/machine learning and visualization tools.
  • Experienced in working with large-scale data sets.
  • Expert at full SDLC processes involving requirements gathering, source data analysis, creating data models, source-to-target data mapping, DDL generation, and performance tuning for data models.
  • Extensively used the Agile methodology as the organization standard to implement data models.
  • Experienced with machine learning tools and libraries such as Scikit-learn, R, Spark, and Weka.
  • Experienced working with large, real-world data: big, messy, incomplete, and full of errors.
  • Hands-on experience with NLP and mining of structured, semi-structured, and unstructured data.
  • Experienced in, and with in-depth knowledge of, SAS Enterprise Miner and the Python programming language.
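As a brief illustration of the dimensionality-reduction skills listed above, the following is a minimal Principal Component Analysis sketch with scikit-learn; the synthetic data, feature count, and 95% variance threshold are assumptions for the example, not details from any specific engagement.

```python
# Minimal PCA sketch (illustrative; data and threshold are assumed).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))           # stand-in for a real feature matrix
X = StandardScaler().fit_transform(X)    # PCA is sensitive to feature scale

# Keep enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print("components kept:", pca.n_components_)
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
```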

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Zookeeper, and Oozie

Languages: C, C++, HTML5, DHTML, WSDL, CSS3, XML, R/R Studio, SAS, SAS Enterprise Guide, R (Caret, Weka, ggplot), Python (NumPy, SciPy, Pandas), SQL, PL/SQL, Pig Latin, HiveQL, Shell Scripting

Cloud Computing Tools: Amazon AWS

Databases: Microsoft SQL Server 2008, MySQL 4.x/5.x, Oracle 10g/11g/12c, DB2, Teradata, Netezza

NoSQL Databases: HBase, Cassandra, MongoDB, MariaDB

Build Tools: Maven, ANT, Toad, SQL Loader, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns

Version Control Tools and Testing: Git, GitHub, SVN, and JUnit

ETL Tools: Informatica PowerCenter, SSIS

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0

Operating Systems: UNIX (all versions), Linux, Windows, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE:

Confidential- Farmington, MI

Data Scientist

Responsibilities:

  • Responsible for applying machine learning techniques such as regression and classification to predict outcomes.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Developed large data sets from structured and unstructured data and performed data mining.
  • Partnered with modelers to develop data frame requirements for projects.
  • Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
  • Tracked various campaigns, generating customer profiling analysis and data manipulation.
  • Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables. Responsible for data mining.
  • Analyzed large datasets to answer business questions by generating reports and outcome- driven marketing strategies.
  • Used Python to apply time series models to identify fast-growth opportunities for our clients (a sketch follows this list).
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Created class models, sequence diagrams, and activity diagrams for the SDLC process of the applications.
  • Supported the testing team through system testing, integration testing, and UAT.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from the requirements to the production support.
  • Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API, parsed the JSON-formatted Twitter data, and loaded it into the existing system's database.
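The time-series bullet above refers to modeling of the kind sketched below with statsmodels; the synthetic monthly series and the ARIMA(1, 1, 1) order are illustrative assumptions, not values from the actual client data.

```python
# Minimal time-series forecasting sketch (synthetic data; assumed model order).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series standing in for client data: trend plus noise.
rng = np.random.default_rng(42)
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
series = pd.Series(np.linspace(100, 180, 48) + rng.normal(0, 5, 48), index=idx)

# Fit a simple ARIMA(1, 1, 1) model and forecast six months ahead.
fitted = ARIMA(series, order=(1, 1, 1)).fit()
print(fitted.forecast(steps=6))
```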

Environment: Spark, Hadoop, R 3.0, Erwin 9.5, Tableau 8.0, MDM, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, MapReduce, MySQL, R Studio.

Confidential - Collegeville, PA

Data Scientist

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Performed data profiling to learn about user behavior across features such as traffic pattern, location, date, and time.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Developed Spark/Scala and Python code for a regular-expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and to classify unlabeled data (a sketch follows this list).
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Performed data visualization with Tableau and D3.js, and generated dashboards to present the findings.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
  • Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection.
  • Partnered with the sales and marketing teams and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into the production system for different business needs.
  • Researched the existing client processes and guided the team in aligning with the HIPAA rules and regulations for the systems for all the EDI transaction sets.
  • Consulted with the healthcare insurance company to develop conversion specifications for other insurance Coordination of Benefits (including Medicare).
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Ensured that the model has a low False Positive Rate.
  • Addressed overfitting by implementing regularization methods such as L1 and L2.
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Created and designed reports that will use gathered metrics to infer and draw logical conclusions of past and future behavior.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for the new route.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network
  • Used MLLib, Spark's Machine learning library to build and evaluate different models.
  • Implemented a rule-based expertise system from the results of exploratory analysis and information gathered from the people from different departments.
  • Performed Data Cleaning, features scaling, features engineering using pandas and NumPy packages in python.
  • Communicated the results with operations team for taking best decisions.
  • Collected data needs and requirements by Interacting with the other departments.
  • Developed MapReduce pipeline for feature extraction using Hive .
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
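The K-Means outlier step mentioned above can be sketched as follows with scikit-learn; the synthetic records, cluster count, and 1% distance threshold are assumptions for the example.

```python
# K-Means outlier-flagging sketch (illustrative; data and threshold assumed).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))            # stand-in for unlabeled records
X = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Distance of each point to its assigned centroid.
dists = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# Flag the farthest 1% of points as candidate outliers.
outliers = np.where(dists > np.quantile(dists, 0.99))[0]
print(f"{len(outliers)} candidate outliers out of {len(X)} records")
```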

Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential - Pennsylvania

Data Scientist

Responsibilities:

  • Developed applications of Machine Learning, Statistical Analysis, and Data Visualization for challenging data-processing problems in the sustainability and biomedical domains.
  • The goal was to identify subtypes of autism for the development of targeted, more effective therapies.
  • Used hierarchical clustering methods to identify clusters in the data based on important features; further analysis to identify the most significant brain volumes is underway (a sketch follows this list).
  • Compiled data from various sources, including public and private databases, to perform complex analysis and data manipulation for actionable results.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Worked on Natural Language Processing with NLTK module of python for application development for automated customer response.
  • Applied concepts of probability, distributions, and statistical inference to the given dataset to unearth interesting findings through the use of comparisons, t-tests, F-tests, R-squared, p-values, etc.
  • Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes, Naive Bayes, fitting functions, etc. to data with the help of the Scikit, SciPy, NumPy, and Pandas modules of Python.
  • Applied clustering algorithms such as hierarchical and K-Means with the help of Scikit and SciPy.
  • Developed visualizations and dashboards using ggplot, Tableau
  • Worked on the development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools like SQL and NoSQL.
  • Built and analyzed datasets using R, SAS, MATLAB,and Python (in decreasing order of usage).
  • Applied linear regression in Python and SAS to understand the relationships between different attributes of the dataset and the causal relationships between them.
  • Performed complex pattern recognition of financial time-series data and forecasting of returns through ARMA and ARIMA models and exponential smoothing for multivariate time-series data.
  • Used Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Expertise in Business Intelligence and data visualization using R and Tableau.
  • Expert in Agile and Scrum Process.
  • Validated the Macro-Economic data (e.g. BlackRock, Moody's etc.) and predictive analysis of world markets using key indicators in Python and machine learning concepts like regression, Bootstrap Aggregation and Random Forest.
  • Worked in large-scale database environments like Hadoop and MapReduce, with working mechanism of Hadoop clusters, nodes and Hadoop Distributed File System (HDFS).
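The hierarchical clustering bullet above can be illustrated with the following minimal SciPy sketch; the synthetic three-dimensional features and the two-cluster cut are assumptions for the example, not the actual brain-volume data.

```python
# Hierarchical clustering sketch (synthetic stand-in for the study data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two loose groups in three dimensions, standing in for volume features.
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 3)),
               rng.normal(3.0, 0.5, size=(30, 3))])

# Ward linkage builds the cluster tree; fcluster cuts it into flat clusters.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```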

Environment: AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector.

Confidential - IN

Data Analyst

Responsibilities:

  • Designed and built dimensions and cubes with star schema and snowflake schema using SQL Server Analysis Services (SSAS).
  • Participated in JAD sessions with business users and sponsors to understand and document the business requirements in alignment with the financial goals of the company.
  • Involved in the analysis of business requirements and in the design and development of high-level and low-level designs, unit testing, and integration testing.
  • Performed data analysis and data profiling using complex SQL on various source systems including Teradata and SQL Server (a sketch follows this list).
  • Developed logical and physical data models that capture current-state and target-state data elements and data flows using ER Studio.
  • Reviewed and implemented the naming standards for the entities, attributes, alternate keys, and primary keys for the logical model.
  • Performed second and third normal form normalization for the ER data model of an OLTP system.
  • Worked with the data compliance and data governance teams to maintain data models, metadata, and data dictionaries; defined source fields and their definitions.
  • Translated business and data requirements into logical data models in support of Enterprise Data Models, ODS, OLAP, OLTP, operational data structures, and analytical systems.
  • Designed and modeled the reporting data warehouse considering current and future reporting requirements.
  • Involved in the daily maintenance of the database that involved monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
  • Worked with Data Scientists to create data marts for data-science-specific functions.
  • Determined data rules and conducted Logical and Physical design reviews with business analysts, developers, and DBAs.
  • Used external loaders such as MultiLoad, TPump, and FastLoad to load data into Oracle, and supported database analysis, development, testing, implementation, and deployment.
  • Reviewed the logical model with application developers, ETL Team, DBAs, and testing team to provide information about the data model and business requirements.
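The data-profiling bullet above describes the kind of complex SQL sketched below; it runs against an in-memory SQLite database so the example is self-contained, and the table and column names are hypothetical stand-ins for the Teradata/SQL Server sources.

```python
# Data-profiling sketch: row counts, null rates, and distinct values.
# SQLite stands in for Teradata/SQL Server; table/columns are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, state TEXT, balance REAL);
    INSERT INTO customers VALUES
        (1, 'MI', 120.5), (2, 'PA', NULL), (3, 'MI', 88.0), (4, NULL, 42.1);
""")

profile_sql = """
    SELECT COUNT(*)                                         AS row_count,
           SUM(CASE WHEN state   IS NULL THEN 1 ELSE 0 END) AS null_state,
           SUM(CASE WHEN balance IS NULL THEN 1 ELSE 0 END) AS null_balance,
           COUNT(DISTINCT state)                            AS distinct_states
    FROM customers
"""
print(conn.execute(profile_sql).fetchone())
```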

Environment: Erwin r7.0, Informatica 6.2, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL.

Confidential - IN

Data Analyst

Responsibilities:

  • Used the SAS PROC SQL pass-through facility to connect to Oracle tables and created SAS datasets using various SQL joins such as left join, right join, inner join, and full join.
  • Performed data validation and transformed data from the Oracle RDBMS to SAS datasets.
  • Produced quality customized reports using PROC TABULATE, PROC REPORT styles, and ODS RTF, and provided descriptive statistics using PROC MEANS, PROC FREQ, and PROC UNIVARIATE (a sketch follows this list).
  • Developed SAS macros for data cleaning, reporting, and support of routine processing.
  • Performed advanced querying using SAS Enterprise Guide: calculating computed columns, applying filters, and manipulating and preparing data for reporting, graphing, summarization, and statistical analysis, finally generating SAS datasets.
  • Involved in Developing, Debugging, and validating the project-specific SAS programs to generate derived SAS datasets, summary tables, and data listings according to study documents.
  • Created datasets per the approved specifications, collaborated with project teams to complete scientific reports, and reviewed reports to ensure accuracy and clarity.
  • Performed different calculations like Quick table calculations, Date Calculations, Aggregate Calculations, String and Number Calculations.
  • Demonstrated expertise in building dashboards and stories based on the available data points.
  • Created action filters, user filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Expertise in Agile Scrum Methodology to implement project life cycles of reports design and development
  • Combined Tableau visualizations into Interactive Dashboards using filter actions, highlight actions etc. and published them to the web.
  • Created Rich dashboards using Tableau Dashboard and prepared user stories to create compelling dashboards to deliver actionable insights
  • Worked with the manager to prioritize requirements and prepared reports on a weekly and monthly basis.
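The reporting work above was done in SAS; as a self-contained analogue, the following pandas sketch mirrors PROC MEANS-style descriptive statistics and a PROC FREQ-style frequency table (the DataFrame and its column names are invented for illustration).

```python
# Pandas analogue of PROC MEANS / PROC FREQ output (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "sales":  [250.0, 310.5, 198.2, 402.0, 275.9, 330.1],
})

# PROC MEANS-style descriptive statistics for a numeric column.
print(df["sales"].describe())

# PROC FREQ-style one-way frequency table for a categorical column.
print(df["region"].value_counts())
```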

Environment: SQL Server, Oracle 11g/10g, MS Office Suite, PowerPivot, PowerPoint, SAS Base, SAS Enterprise Guide, SAS/MACRO, SAS/SQL, SAS/ODS, SQL, PL/SQL, Visio.
