
Data Scientist Resume

Fremont, CA

PROFESSIONAL SUMMARY:

  • Over 8 years of strong experience in Data Science, Machine Learning, and Data Mining with large data sets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, Data Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages such as R, Python, SAS, and MATLAB, as well as Apache Spark and Big Data technologies like Hadoop, Hive, and Pig.
  • Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
  • Deep understanding of Big Data analytics and algorithms using Hadoop, MapReduce, NoSQL, and distributed computing tools.
  • Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
  • Experienced in dimensional and relational data modeling using ER/Studio, Erwin, and Sybase PowerDesigner, including Star Schema/Snowflake Schema modeling, fact and dimension tables, and conceptual, logical, and physical data models.
  • Hands-on experience in implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Expertise in managing the entire data science project life cycle, actively involved in all phases including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization (an illustrative sketch follows this summary).
  • Experienced in writing Pig Latin scripts, MapReduce jobs and HiveQL.
  • Extensively used SQL, Numpy, Pandas, Scikit-learn, Spark, Hive for Data Analysis and Model building.
  • Extensively worked on the Erwin tool with features such as reverse engineering, forward engineering, Subject Areas, Domains, and Naming Standards documents.
  • Experience in using various packages in R and Python such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, and Rpy2.
  • Experienced in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
  • Extensively worked on Sqoop, Hadoop, Hive, Spark, and Cassandra to build ETL and data processing systems spanning various data sources, targets, and formats.
  • Strong experience and knowledge in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
  • Experienced with Integration Services (SSIS), Reporting Service (SSRS) and Analysis Services (SSAS)
  • Expertise in Normalization to 3NF/De-normalization techniques for optimum performance in relational and dimensional database environments.
  • Extensive experience in ER modeling, dimensional modeling (Star Schema, Snowflake Schema), data warehousing, and OLAP tools.
  • Expertise in database programming (SQL, PL/SQL), XML, DB2, Informix, Teradata, database tuning, and query optimization.
  • Experience in designing, developing, scheduling reports/dashboards using Tableau and Cognos.
  • Expertise in performing data analysis and data profiling using complex SQL on various source systems including Oracle and Teradata.
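
A minimal, self-contained sketch of the validation workflow mentioned in the summary above (feature scaling, PCA for dimensionality reduction, and K-fold cross-validation scored with ROC AUC), using scikit-learn on synthetic data. All column counts and parameters are illustrative rather than details from any actual project.

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for a cleaned feature matrix and binary target.
    X, y = make_classification(n_samples=1000, n_features=30, random_state=42)

    pipeline = Pipeline([
        ("scale", StandardScaler()),                # feature scaling
        ("pca", PCA(n_components=10)),              # dimensionality reduction
        ("clf", LogisticRegression(max_iter=1000))  # baseline classifier
    ])

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    auc = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
    print("Mean ROC AUC over 5 folds:", auc.mean())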

TECHNICAL SKILLS:

Database Design Tools and Data Modeling: Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball methodology.

Databases: SQL Server 2017, MS Access, Oracle 11g, Sybase, and DB2.

Languages: PL/SQL, SQL, T-SQL, C, C++, XML, HTML, DHTML, HTTP, Matlab, Python.

Tools and Utilities: SQL Server 2016/2017, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio v14, .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office 2007/10/13, Excel Power Pivot, Excel Data Explorer, Tableau 8/10, JIRA

Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX

PROFESSIONAL EXPERIENCE:

Confidential, Fremont, CA

Data Scientist

Responsibilities:

  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications, and executed machine learning use cases with Spark ML and MLlib (see the sketch after this list).
  • Identified areas of improvement in the existing business by unearthing insights from vast amounts of data using machine learning techniques.
  • Interpreted problems and provided solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
  • Designed and developed NLP models for sentiment analysis.
  • Led discussions with users to gather business processes requirements and data requirements to develop a variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, Microstrategy.
  • Worked on machine learning on large-scale data using Spark and MapReduce.
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and utilized optimization techniques, linear regression, K-means clustering, Naive Bayes, and other approaches.
  • Developed Spark (Scala) and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Worked on Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad, and FastExport.
  • Used data warehousing concepts such as Ralph Kimball methodology, Bill Inmon methodology, OLAP, OLTP, Star Schema, Snowflake Schema, fact tables, and dimension tables.
  • Refined time-series data and validated mathematical models using analytical tools like R and SPSS to reduce forecasting errors.
  • Performed data pre-processing and cleaning for feature engineering and applied data imputation techniques for missing values in the dataset using Python.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
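
A minimal sketch of the kind of Spark ML use case referenced in the first bullet above. The input path, feature columns, and label column are hypothetical placeholders, not details from the actual engagement, and a running Spark environment is assumed.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("spark-ml-sketch").getOrCreate()

    # Hypothetical CSV extract; header and schema inference keep the sketch short.
    df = spark.read.csv("hdfs:///data/example/training.csv", header=True, inferSchema=True)

    assembler = VectorAssembler(
        inputCols=["feature_a", "feature_b", "feature_c"],  # placeholder columns
        outputCol="features",
    )
    lr = LogisticRegression(featuresCol="features", labelCol="label")

    model = Pipeline(stages=[assembler, lr]).fit(df)
    model.transform(df).select("label", "prediction", "probability").show(5)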

Environment: Hadoop, MapReduce, Spark, Spark MLlib, Tableau, SQL, Excel, VBA, SAS, Matlab, AWS, SPSS, Cassandra, Oracle, MongoDB, SQL Server 2012, DB2, T-SQL, PL/SQL, XML.

Confidential, Jacksonville, FL

Data Scientist

Responsibilities:

  • Collaborated with cross-functional teams in support of business case development and identified modeling method(s) to provide business solutions. Determined the appropriate statistical and analytical methodologies to solve business problems within specific areas of expertise.
  • Generated data models using Erwin 9.6, developed relational database systems, and performed logical modeling using dimensional modeling techniques such as Star Schema and Snowflake Schema.
  • Guided the full lifecycle of a Hadoop solution, including requirements analysis, platform selection, technical architecture design, application design and development, testing, and deployment.
  • Consulted on broad areas including data science, spatial econometrics, machine learning, information technology and systems, and economic policy using R.
  • Performed data mapping from source systems to target systems, performed logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Used various R data-structure techniques to get the data into the right format for analysis, which was later used by other internal applications to calculate thresholds.
  • Maintained conceptual, logical, and physical data models along with corresponding metadata.
  • Performed data migration from an RDBMS to a NoSQL database and provided a complete picture of the data deployed across various data systems (see the sketch after this list).
  • Developed triggers, stored procedures, functions, and packages using cursor and ref cursor concepts associated with the project using PL/SQL.
  • Used a metadata tool for importing metadata from the repository, creating new job categories, and creating new data elements.
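
A minimal sketch of the RDBMS-to-NoSQL migration pattern mentioned above, assuming cx_Oracle and pymongo as the client libraries; the connect string, table, and collection names are placeholders, and the actual migration tooling used is not specified in this resume.

    import cx_Oracle
    from pymongo import MongoClient

    # Placeholder connections; real credentials and hosts would differ.
    oracle_conn = cx_Oracle.connect("scott/tiger@dbhost:1521/ORCLPDB1")
    target = MongoClient("mongodb://localhost:27017")["analytics"]["customers"]

    cursor = oracle_conn.cursor()
    cursor.execute("SELECT customer_id, name, segment FROM customers")  # placeholder table
    columns = [col[0].lower() for col in cursor.description]

    # Copy rows in batches to limit round trips to MongoDB.
    batch = []
    for row in cursor:
        batch.append(dict(zip(columns, row)))
        if len(batch) >= 1000:
            target.insert_many(batch)
            batch = []
    if batch:
        target.insert_many(batch)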

Environment: R, Oracle 12c, MS SQL Server, Hive, NoSQL, PL/SQL, MS Visio, Informatica, T-SQL, SQL, Crystal Reports 2008, Java, SPSS, SAS, Tableau, Excel, HDFS, PIG, SSRS, SSIS, Metadata.

Confidential, Santa Ana, CA

Data Scientist/Data Analyst

Responsibilities:

  • Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and pandas in Python.
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to perform data extraction and merging from large volumes of historical data stored in Oracle 11g, and validated the ETL-processed data in the target database.
  • Identified the variables that significantly affected the target.
  • Continuously collected business requirements during the whole project life cycle.
  • Conducted model optimization and comparison using a stepwise selection function based on AIC values.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, logistic regression, and Gradient Boosting Machines to build predictive models using the scikit-learn package in Python (see the sketch after this list).
  • Developed Python scripts to automate data sampling process. Ensured the data integrity by checking for completeness, duplication, accuracy, and consistency
  • Generated data analysis reports using Matplotlib and Tableau, and successfully delivered and presented the results to C-level decision makers.
  • Generated a cost-benefit analysis to quantify the model implementation compared with the previous situation.
  • Worked on model selection based on confusion matrices and minimized the Type II error.
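
A minimal sketch of the modeling and model-selection steps described above: a gradient boosting classifier from scikit-learn evaluated with a confusion matrix, with the decision threshold lowered to reduce Type II errors. The data is synthetic and the threshold value is illustrative.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    # Synthetic, imbalanced stand-in for the historical data described above.
    X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2],
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, random_state=0)
    gbm.fit(X_train, y_train)

    # Lowering the decision threshold trades extra false positives for fewer
    # false negatives (Type II errors), as in the model-selection bullet above.
    proba = gbm.predict_proba(X_test)[:, 1]
    preds = (proba >= 0.35).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    print(f"TN={tn} FP={fp} FN={fn} TP={tp}")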

Environment: Tableau 7, Python 2.6.8, Numpy, Pandas, Matplotlib, Scikit-Learn, MongoDB, Oracle 10g, SQL

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Created and maintained logical and physical models for the data mart. Created partitions and indexes for the tables in the data mart.
  • Performed data profiling and analysis, applied various data cleansing rules, designed data standards and architecture, and designed the relational models.
  • Developed SQL scripts for creating tables, sequences, triggers, views, and materialized views.
  • Worked on query optimization and performance tuning using SQL Profiler and performance monitoring.
  • Developed mappings to load fact and dimension tables, SCD Type 1 and SCD Type 2 dimensions, and incremental loads, and unit tested the mappings.
  • Utilized Erwin's forward/reverse engineering tools and target database schema conversion process.
  • Worked on creating an enterprise-wide model (EDM) for products and services in the Teradata environment based on data from the PDM. Conceived, designed, developed, and implemented this model from scratch.
  • Performed extensive data validation by writing several complex SQL queries, was involved in back-end testing, and worked on data quality issues (see the sketch after this list).
  • Developed and executed load scripts using Teradata client utilities MultiLoad, FastLoad, and BTEQ.
  • Exported and imported data between different platforms such as SAS and MS Excel.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS)
  • Wrote SQL scripts to test the mappings and developed a Traceability Matrix of business requirements mapped to test scripts to ensure any change control in requirements leads to test case updates.
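
A minimal sketch of the kind of back-end data validation referenced above, expressed in Python against an in-memory SQLite stand-in for the actual DB2/Oracle sources so the example is self-contained; the staging and dimension table names and the specific checks are illustrative assumptions, not the original SQL.

    import sqlite3

    # In-memory stand-in schema so the sketch runs on its own.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_customers (customer_id INTEGER, name TEXT);
        CREATE TABLE dim_customer (customer_sk INTEGER, customer_id INTEGER,
                                   name TEXT, current_flag TEXT);
        INSERT INTO stg_customers VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO dim_customer VALUES (10, 1, 'Ann', 'Y'), (11, 2, 'Bob', 'Y');
    """)

    # Each check should return 0 when the load is clean.
    checks = {
        "row_count_difference": """
            SELECT (SELECT COUNT(*) FROM stg_customers) -
                   (SELECT COUNT(*) FROM dim_customer WHERE current_flag = 'Y')""",
        "duplicate_business_keys": """
            SELECT COUNT(*) FROM (
                SELECT customer_id FROM dim_customer WHERE current_flag = 'Y'
                GROUP BY customer_id HAVING COUNT(*) > 1)""",
        "null_surrogate_keys":
            "SELECT COUNT(*) FROM dim_customer WHERE customer_sk IS NULL",
    }

    for name, sql in checks.items():
        value = conn.execute(sql).fetchone()[0]
        print(f"{name}: {'PASS' if value == 0 else 'FAIL'} ({value})")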

Environment: DB2, Oracle SQL Developer, PL/SQL, Business Objects, Erwin, MS office suite, Windows XP, TOAD, SQL*PLUS, SQL*LOADER.

Confidential

Data Analyst

Responsibilities:

  • Designed different types of Star schemas for detailed data marts and plan data marts in the OLAP environment.
  • Developed and executed load scripts using Teradata client utilities MultiLoad, FastLoad, and BTEQ.
  • Responsible for development and testing of conversion programs for importing data from text files into the Oracle database utilizing Perl shell scripts and SQL*Loader (see the sketch after this list).
  • Used graphical Entity-Relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
  • Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues.
  • Formatted the data sets read into SAS by using the Format statement in the data step as well as Proc Format.
  • Applied Business Objects best practices during development with a strong focus on reusability and better performance.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Coordinated with various business users, stakeholders, and SMEs to obtain functional expertise, review design and business test scenarios, participate in UAT, and validate financial data.
  • Wrote SQL scripts to test the mappings and developed a Traceability Matrix of business requirements mapped to test scripts to ensure any change control in requirements leads to test case updates.
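
A minimal sketch of a text-file-to-Oracle load of the kind described above; the original work used Perl shell scripts driving SQL*Loader, so this Python version simply writes a sample control file and shells out to sqlldr. The file names, table, and connect string are placeholders, and sqlldr plus the target schema are assumed to be available.

    import subprocess
    from pathlib import Path

    # Placeholder SQL*Loader control file for a pipe-delimited extract.
    control = """\
    LOAD DATA
    INFILE 'customers.txt'
    APPEND INTO TABLE stg_customers
    FIELDS TERMINATED BY '|'
    (customer_id, name, segment)
    """
    Path("customers.ctl").write_text(control)

    # Assumes sqlldr is on PATH and the target table exists.
    subprocess.run(
        ["sqlldr", "userid=scott/tiger@ORCLPDB1",
         "control=customers.ctl", "log=customers.log"],
        check=True,
    )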

Environment: Oracle SQL Developer, PL/SQL, Business Objects, TOAD, Tableau, Informatica, MS SQL Server, SQL*PLUS, SQL*LOADER, XML.
