Data Scientist Resume

Sunnyvale, CA

SUMMARY

  • 8+ years of professional experience as a Data Scientist/Data Analyst in data science and analytics, including Machine Learning, Data Mining and Statistical Analysis.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Implemented Bagging and Boosting to enhance model performance.
  • Worked extensively with Python 3.5 (NumPy, Pandas, Matplotlib, NLTK and Scikit-learn).
  • Experience implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2, caret, dplyr) and Excel.
  • Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modeling and data visualization, with large sets of structured and unstructured data.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression and k-means.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS like SQL Server 2008 and NoSQL databases like MongoDB 3.2.
  • Strong experience in Big Data technologies like Spark 1.6, SparkSQL, PySpark, Hadoop 2.X, HDFS, Hive 1.X
  • Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, Hypothesis Testing, Normal Distribution and other advanced statistical and econometric techniques.
  • Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.
  • Worked with applications such as R, SAS, MATLAB and SPSS to develop neural networks and perform cluster analysis.
  • Experience with visualization tools like Tableau 9.x and 10.x for creating dashboards.
  • Excellent understanding of Agile and Scrum development methodologies.
  • Used version control tools like Git 2.x.
  • Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making
  • Ability to maintain a fun, casual, professional and productive team atmosphere
  • Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Strong SQL programming skills, with experience in working with functions, packages and triggers.
  • Experienced in developing applications with Visual Basic for Applications (VBA) and VB.
  • Worked with NoSQL databases including HBase, Cassandra and MongoDB.
  • Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.
  • Experienced in data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio SSIS, SSAS and SSRS.
  • Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms like Tableau.
  • Worked in development environments such as Git and VMs.
  • Experienced in the full software development life cycle (SDLC) under Agile and Scrum methodologies.
  • Excellent communication skills (verbal and written) to communicate with clients and team, and to prepare and deliver effective presentations.

TECHNICAL SKILLS

Big Data/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper and Oozie

Languages: C, C++, HTML5, DHTML, WSDL, CSS3, XML, R/RStudio, SAS Enterprise Guide, SAS, R (caret, Weka, ggplot2), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Shell Scripting.

Java & J2EE Technologies: Core Java, JSP, Servlets, JDBC, JAAS, JNDI, Hibernate, Spring, Struts, JMS, EJB, RESTful

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Cloud Computing Tools: Amazon AWS

Databases: Microsoft SQL Server 2008 … MySQL 4.x/5.x, MariaDB, Oracle 10g, 11g, 12c, DB2, Teradata, Netezza

NoSQL Databases: HBase, Cassandra, MongoDB

Build Tools: Jenkins, Maven, ANT, Toad, SQL Loader, RTC, RSA, Control- Confidential, Oozie, Hue, SOAP UI

Business Intelligence Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift and Azure Data Warehouse

Development Tools: Microsoft SQL Server Management Studio, Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns

Version Control Tools and Testing: Git, GitHub, SVN and JUnit

ETL Tools: Informatica PowerCenter, SSIS

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

Data Modelling Tools: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, Oracle Designer, SAP PowerDesigner, Enterprise Architect.

Operating Systems: UNIX, Windows, Linux, Mac OS, Sun Solaris

PROFESSIONAL EXPERIENCE

Confidential, Sunnyvale, CA

Data Scientist

Responsibilities:

  • Responsible for data preparation and exploratory analysis to develop machine learning models.
  • Created standard data summaries, extracted subsets of data, split the data and created data partitions.
  • Created various types of data visualizations using R and Tableau
  • Extracted data from source systems into HDFS and prepared it for exploratory analysis through data munging.
  • Segmented data by implementing the k-means algorithm and created visualizations using R.
  • Gathered requirements for various data mining projects
  • Loaded data from Hive and imported it into R for data analysis and visualization.
  • Responsible for data identification, collection, exploration and cleaning for modeling; participated in model development.
  • Visualized and interpreted data, reported findings and developed strategic uses of data.
  • Performed exploratory analysis and model building to develop predictive insights.
  • Gained ground-up experience in data understanding, hypothesis formulation, data preparation and model building.
  • Accountable for the business requirements gathering process and for converting requirements into functional and technical requirements (HLDs, LLDs).
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions that address those needs.
  • Responsible for providing reporting, analysis and insightful recommendations to business leaders on key performance metrics pertaining to sales and marketing.
  • Designed, developed and produced reports that connect quantitative data to insights that drive and change the business.
  • Used Natural Language Processing (NLP) for response modeling and credit card fraud detection.
  • Identified, analyzed and interpreted trends or patterns in complex data sets using data mining tools
  • Helped create and design reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Performed statistical data analysis, modeling/machine learning, data visualization and reporting of big data related to digital advertising.
  • Used Pig and Hive to retrieve data from the Hadoop cluster, PostgreSQL to retrieve data from PostgreSQL and the Greenplum distributed database, and SQL to retrieve data from the Oracle database for data analysis.
  • Used R and SAS for exploratory data analysis, A/B testing, ANOVA tests and hypothesis tests to compare and identify the effectiveness of creative email campaigns. Created clusters to classify and group campaigns into control and test groups.
  • Used Python, R, SAS and SQL to build machine learning models, including Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest, Matrix Factorization and Bayesian collaborative models, to target users with email campaigns and native ads.
  • Performed text analytics on historical email subject lines to retrieve effective keywords and suggested them to the creative team for new subject lines that would increase open and delivery rates.
  • Analyzed email users' click history and third-party data for pattern recognition, and to support and change targeting algorithms.
  • Supported the client by developing machine learning algorithms on big data using PySpark for transaction fraud analysis, cluster analysis, etc.
  • Performed ad hoc custom analysis as needed using SQL and R.
  • Created data quality scripts using SQL and Pig to validate successful data loads and data quality.
  • Performed time series analysis using R on channels' historical revenue to forecast weekly, monthly and quarterly revenue.
  • Created a revenue optimization algorithm to divert click traffic to different advertisers throughout the day to maximize revenue.

Environment: R, RStudio, CDH5, HDFS, Hadoop, Pig, Hive, Impala, Sqoop, Linux, Tableau Desktop, Tableau Server, Unix Shell scripting, Python, Perl, SQL, MySQL, SAS, SQL Server, Microsoft Excel

Confidential, Dallas, Texas

Data Scientist

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib and Python, along with a broad variety of machine learning methods including classification, regression and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Performed data profiling to learn about behavior across various features such as traffic patterns, location, date and time.
  • Applied various machine learning algorithms and statistical models, such as decision trees, regression models, neural networks, SVM and clustering, to identify volume using the Scikit-learn package in Python and MATLAB.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and classify unlabeled data.
  • Used NLP to determine customer satisfaction and help enhance customer experience.
  • Performed data visualization with Tableau and D3.js, and generated dashboards to present the findings
  • Recommended and evaluated marketing approaches based on analytics of customer consumption behavior.
  • Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection.
  • Analyzed traffic patterns by calculating autocorrelation at different time lags.
  • Ensured that the model had a low false positive rate.
  • Addressed overfitting by implementing regularization methods such as L1 and L2.
  • Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
  • Designed and created reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Applied Multinomial Logistic Regression, Random Forest, Decision Tree and SVM models to classify whether a package would be delivered on time on a new route.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network
  • Used MLlib, Spark's Machine learning library to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
  • Communicated results to the operations team to support decision making.
  • Collected data needs and requirements by interacting with other departments.
  • Developed a MapReduce pipeline for feature extraction using Hive.
  • Created data quality scripts using SQL and Hive to validate successful data loads and data quality. Created various types of data visualizations using Python and Tableau.

Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential

Data Scientist

Responsibilities:

  • Involved in the detailed design of data marts using a star schema and planned data marts involving shared dimensions.
  • Conducted one-to-one sessions with business users to gather data for Data Warehouse requirements.
  • Part of team analyzing database requirements in detail with the project stakeholders through Joint Requirements Development (JRD) sessions.
  • Developed an object model in UML for the conceptual data model using Enterprise Architect.
  • Developed logical and physical data models using Erwin to design OLTP systems for different applications.
  • Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
  • Worked with DBA group to create Best-Fit Physical Data Model with DDL from the Logical Data Model using Forward engineering.
  • Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
  • Extensive system study, design, development and testing were carried out in the Oracle environment to meet the customer requirements.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW)
  • Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
  • Involved in migration projects that moved data from Oracle/DB2 data warehouses to Teradata.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions and data life cycle management in both RDBMS and Big Data environments.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables
  • Created entity process association matrices using Zachman Framework, functional decomposition diagrams and data flow diagrams from business requirements documents.
  • Used Model Manager Option in Erwin to synchronize the data models in Model Mart approach.
  • Gathered various reporting requirements from Business Analysts.
  • Worked on enhancements to the Data Warehouse model using Erwin as per the business reporting requirements.
  • Hands-on experience implementing Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks and Principal Component Analysis.
  • Performed K-means clustering, Multivariate analysis, and Support Vector Machines in R.
  • Wrote complex Hive and SQL queries for data analysis and to implement business requirements.
  • Reverse-engineered the reports and identified the data elements (in the source system), dimensions, facts and measures required for the reports.
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Developed and maintained Data Dictionary to create Metadata Reports for technical and business purpose using Erwin report designer.
  • Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).

Environment: Erwin r9.5, DB2, Teradata, SQL Server 2008, Informatica 9.1, Enterprise Architect, Power Designer, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word and Access.

Confidential - Alpharetta, GA

Data Scientist

Responsibilities:

  • Used data mining techniques to identify products that satisfy customer demand.
  • Collected data from various databases and cleaned it for statistical analysis and modeling.
  • Business reporting requirements analysis: interacted with clients and participated in requirements gathering and system analysis.
  • Acted as the single point of contact between organization management and the appropriate client or groups, from solution planning and sizing to fulfillment.
  • Performed time series forecasting to assess the future effects of business projections.
  • Responsible for analyzing and extracting relevant information from large data sets.
  • Involved in Data preparation over multiple iterations with inputs from senior analysts for the problem at hand.
  • Identified, analyzed, and interpreted trends within data, investigated divergences, and developed recommendations to the leadership team.
  • Worked closely with cross functional teams to encourage statistical best practices with respect to experimental design and data analysis.
  • Developed scripts and ad-hoc tests to ascertain data validity and correct attribute calculation.
  • Performed statistical and predictive analysis on corporate market data to identify trends and buy/sell opportunities.
  • Optimized data access by efficient SQL coding for high throughput and low latency.
  • Ran detailed reports after close of business to give users instant access to the previous day's reports.
  • Performed correlation and time-series analysis to recommend pairs trading strategies to management.
  • Performed Data Mining, Data Analytics, Statistical Modelling and Predictive Analytics.
  • Used clustering and k-NN algorithms to categorize business expenses, improving budgeting.
  • Generated ad-hoc or management specific reports using SSRS, SPSS, and Excel.
  • Developed and implemented customer data management system.
  • Contributed to team effort by accomplishing related results as needed.
  • Closely monitored the operating and financial results against plans and budgets.
  • Used R to develop regression models for data analysis.
  • Increased the pace and confidence of the learning algorithm by combining statistical methods; provided expertise and assistance in integrating advanced analytics into ongoing business processes.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format.
  • Interpreted complex simulation data using statistical methods.
  • Designed various reports using pivot tables and pivot charts (bar, pie, line, etc.).
  • Put in place adequate operational planning and financial control systems.
  • Found insights and identified patterns that could help the client understand the business better.
  • Established strategic goals by gathering pertinent business, financial, service, and operations information.

Environment: Microsoft SQL Server, SSIS, SSRS, SPSS, MS Excel, MS Access, Power Point, Python, C#, Crystal Reports, RStudio, Statistical Analysis, Machine Learning.

Confidential

Data Architect/Data Modeler

Responsibilities:

  • Communicated with other health care information systems through web services using SOAP, WSDL and JAX-RPC.
  • Used the Singleton, Factory and DAO design patterns based on application requirements.
  • Used SAX and DOM parsers to parse raw XML documents.
  • Used RAD as the development IDE for web applications.
  • Prepared and executed unit test cases.
  • Used the Log4j logging framework to write log messages at various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript and AJAX.
  • Configured the project on WebSphere 6.1 application servers.
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP and WSDL.
  • Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams and activity diagrams for the application's SDLC.
  • Supported the testing team in system, integration and UAT testing.
  • Ensured quality in the deliverables.
  • Conducted design reviews and technical reviews with other project stakeholders.
  • Was part of the complete project life cycle, from requirements to production support.
  • Created test plan documents for all back-end database modules
  • Implemented the project in Linux environment.

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, Hadoop (HDFS), Teradata 14.1, JSON, MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential

Data Analyst

Responsibilities:

  • Designed and built dimensions and cubes with star and snowflake schemas using SQL Server Analysis Services (SSAS).
  • Participated in JAD session with business users and sponsors to understand and document the business requirements in alignment to the financial goals of the company.
  • Involved in the analysis of business requirements, the design and development of high-level and low-level designs, and unit and integration testing.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Teradata and SQL Server.
  • Developed logical and physical data models that capture current-state and future-state data elements and data flows using ER Studio.
  • Created the conceptual model for the data warehouse using Erwin data modeling tool.
  • Reviewed and implemented the naming standards for the entities, attributes, alternate keys, and primary keys for the logical model.
  • Normalized the ER data model of the OLTP system to second and third normal form.
  • Worked with data compliance and data governance teams to maintain data models, metadata and data dictionaries, and to define source fields and their definitions.
  • Translated business and data requirements into logical data models in support of Enterprise Data Models, ODS, OLAP, OLTP, Operational Data Structures and Analytical systems.
  • Designed and modeled the reporting data warehouse considering current and future reporting requirements.
  • Involved in the daily maintenance of the database that involved monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
  • Worked with data scientists to create data marts for data-science-specific functions.
  • Determined data rules and conducted Logical and Physical design reviews with business analysts, developers and DBAs.
  • Used external loaders such as MultiLoad, TPump and FastLoad to load data into Teradata; also involved in database analysis, development, testing, implementation and deployment.
  • Reviewed the logical model with application developers, ETL Team, DBAs and testing team to provide information about the data model and business requirements.

Environment: Erwin r7.0, Informatica 6.2, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL