
Data Scientist Resume


Columbus, OH

SUMMARY

  • Highly experienced Data Scientist with over 8 years of experience in Data Extraction, Data Modeling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
  • Experience in conducting Joint Application Development (JAD) sessions for requirements gathering, analysis, and design, and Rapid Application Development (RAD) sessions to converge early on a design acceptable to the customer and feasible for the developers, limiting the project's exposure to the forces of change.
  • Experience in coding SQL/PL-SQL using Procedures, Triggers, and Packages.
  • Experience in creating functional/technical specifications, data design documents, data lineage documents based on the requirements.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Regularly used JIRA and other internal issue trackers for project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
  • Expertise in all aspects of the Software Development Life Cycle (SDLC), from requirement analysis through design, development, testing, implementation, and maintenance.
  • Experience in implementing MDM software solutions with Informatica MDM (formerly Siperian); strong exposure to scheduling tools such as AutoSys and Control-M.
  • Technical proficiency in designing and data modeling online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
  • Expertise in implementing core concepts of Java and JEE technologies: JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
  • Extensive experience working in Test-Driven Development and Agile-Scrum environments.
  • Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX Shell Scripting.
  • Domain knowledge and experience in the Retail, Banking, and Manufacturing industries.
  • Proficiency in multiple databases like Teradata, MongoDB, Cassandra, MySQL, ORACLE and MS SQL Server
  • Experience in creating Teradata SQL scripts using OLAP functions such as RANK() OVER to improve query performance while pulling data from large tables.
  • Proficient in statistical methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis, and Discriminant Analysis.
  • Knowledge of time series analysis using AR, MA, ARIMA, GARCH, and ARCH models.
  • Worked in large scale database environment like Hadoop and MapReduce, with working mechanism of Hadoop clusters, nodes and Hadoop Distributed File System (HDFS).
  • Strong experience with Python (2.x, 3.x) for developing analytic models and solutions.
  • Proficient in Python 2.x/3.x with SciPy Stack packages including NumPy, Pandas, and SciPy.
  • Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL, and PySpark.
  • Very good experience and knowledge in provisioning virtual clusters on the AWS cloud, including services such as EC2, S3, and EMR.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.
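The supervised-learning techniques listed above can be illustrated with a minimal scikit-learn sketch, fitting a Logistic Regression and a Random Forest on a toy dataset (illustrative only, not project code):

```python
# Minimal classification sketch: toy iris data, two of the model
# families named above (parameters chosen only for illustration).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=42)):
    model.fit(X_train, y_train)
    # .score() returns held-out accuracy for classifiers
    print(type(model).__name__, round(model.score(X_test, y_test), 3))
```

The same fit/score pattern extends to the SVM, KNN, and XGBoost models mentioned above.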

TECHNICAL SKILLS

Programming Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, DAX, Python

Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i, Teradata, Big Data (Hadoop)

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA

PROFESSIONAL EXPERIENCE:

Confidential

Data Scientist

Responsibilities:

  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed Gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Implemented Agile methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naïve Bayes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Utilized the ADO.NET object model to implement middle-tier components that interacted with an MS SQL Server 2000 database.
  • Participated in the AMS (Alert Management System) Java and Sybase project: designed the Sybase database using ERWIN, customized error messages using sp_addmessage and sp_bindmsg, created indexes, made query optimizations, and wrote stored procedures and triggers using T-SQL.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
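The JSON and XML record handling mentioned above can be sketched with only Python's standard library (the record fields below are hypothetical, for illustration):

```python
# Sketch: reading one JSON record and one XML record into plain dicts.
import json
import xml.etree.ElementTree as ET

json_record = '{"order_id": 1001, "amount": 250.75, "status": "shipped"}'
order = json.loads(json_record)          # dict with native Python types

xml_record = '<order><order_id>1002</order_id><amount>99.50</amount></order>'
root = ET.fromstring(xml_record)
xml_order = {child.tag: child.text for child in root}  # all values are strings

print(order["status"])        # shipped
print(xml_order["order_id"])  # 1002
```

Note that XML values come back as strings and need explicit type conversion, while JSON preserves numbers natively.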

Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.

Confidential, Columbus, OH

Data Scientist

Responsibilities:

  • Applied Lean Six Sigma process improvement in the plant and developed capacity calculation systems using a purchase order tracking system, improving inbound efficiency by 23.56%.
  • Worked with machine learning algorithms such as regressions (linear, logistic), SVMs, and decision trees to classify groups and analyze the most significant variables, such as FTE, waiting times of purchase orders, and available capacities, and applied process improvement techniques.
  • Calculated a Process Cycle Efficiency of 33.2% and identified value-added and non-value-added activities.
  • Utilized SAS to develop Pareto charts identifying the highest-impact categories in modules, found the workforce distribution, and created various data visualization charts.
  • Performed univariate, bivariate and multivariate analysis of approx. 4890 tuples using bar charts, box plots and histograms.
  • Participated in feature engineering such as feature creation, feature scaling, and one-hot encoding with Scikit-learn.
  • Converted raw data to processed data by merging, finding outliers, errors, trends, missing values and distributions in the data.
  • Generated detailed report after validating the graphs using R, and adjusting the variables to fit the model.
  • Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
  • Developed descriptive and inferential statistics for logistics optimization, average hours per job, and value throughput data at a 95% confidence interval.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions using SQL server management studio.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization and generated scatter plots and high-low graphs to identify relationships between different variables.
  • Worked on Business forecasting, segmentation analysis and Data mining and prepared management reports defining the problem; documenting the analysis and recommending courses of action to determine the best outcomes.
  • Worked on various Statistical models like DOE, hypothesis testing, Survey testing and queuing theory.
  • Experience with risk analysis, root cause analysis, cluster analysis, correlation and optimization and K-means algorithm for clustering data into groups.
  • Coordinated with data scientists and senior technical staff to identify client needs and document assumptions.
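The feature-scaling and one-hot-encoding step described above could look roughly like this in scikit-learn (the column names and values are hypothetical placeholders for the purchase-order data):

```python
# Sketch: scale a numeric feature and one-hot encode a categorical one
# in a single preprocessing step via ColumnTransformer.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "wait_hours": [4.0, 9.5, 2.2, 7.1],   # e.g. purchase-order waiting time
    "plant": ["A", "B", "A", "C"],        # e.g. plant/site category
})

pre = ColumnTransformer([
    ("scale", StandardScaler(), ["wait_hours"]),
    ("onehot", OneHotEncoder(), ["plant"]),
])
X = pre.fit_transform(df)
print(X.shape)  # (4, 4): 1 scaled column + 3 one-hot columns for A/B/C
```

The fitted transformer can then be reused on new data with `pre.transform`, keeping train and test preprocessing consistent.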

Environment: SQL Server 2012, Jupyter, R 3.1.2, Python, MATLAB, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, Microsoft office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access.

Confidential

Data Scientist

Responsibilities:

  • Gathered, analyzed, documented, and translated application requirements into data models; supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Sqoop, Pig, Flume, Hive, Map Reduce and HDFS.
  • Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Applied clustering algorithms such as Hierarchical and K-means using scikit-learn and SciPy.
  • Performed complex pattern recognition of automotive time series data and forecast demand through ARMA and ARIMA models and exponential smoothing for multivariate time series data.
  • Delivered and communicated research results, recommendations, and opportunities to the managerial and executive teams, and implemented the techniques for priority projects.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports repository in Tableau Desktop.
  • Generated complex calculated fields and parameters, toggle and global filters, dynamic sets, groups, actions, custom color palettes, and statistical analysis to meet business requirements.
  • Implemented visualizations and views such as combo charts, stacked bar charts, Pareto charts, donut charts, geographic maps, sparklines, and crosstabs.
  • Published workbooks and extract data sources to Tableau Server, implemented row-level security and scheduled automatic extract refresh.
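The K-means and hierarchical clustering mentioned above can be sketched with scikit-learn and SciPy on synthetic two-cluster data (all parameters below are illustrative):

```python
# Sketch: fit K-means and Ward hierarchical clustering on two
# well-separated synthetic blobs and compare the cluster counts.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)),   # blob around (0, 0)
               rng.normal(5, 0.5, (20, 2))])  # blob around (5, 5)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

Z = linkage(X, method="ward")                 # agglomerative merge tree
hier_labels = fcluster(Z, t=2, criterion="maxclust")

print(len(set(km_labels)), len(set(hier_labels)))  # 2 2
```

On real data, choosing `n_clusters` typically involves inspecting the dendrogram from `linkage` or an elbow/silhouette analysis rather than fixing it up front.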

Environment: Machine learning (KNN, Clustering, Regressions, Random Forest, SVM, Ensemble), Linux, Python 2.x (Scikit-Learn/SciPy/NumPy/Pandas), R, Tableau (Desktop 8.x/Server 8.x), Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Oracle 11g, SQL Server 2012

Confidential, Irvine, California

Data Analytics

Responsibilities:

  • Strong grasp of Cloudera and HDP ecosystem components.
  • Used Elasticsearch (Big Data) to retrieve data into the application as required.
  • Ran MapReduce programs on the cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
  • Worked with large amounts of structured and unstructured data.
  • Knowledge of Machine Learning concepts (Generalized Linear models, Regularization, Random Forest, Time Series models, etc.)
  • Worked with Business Intelligence and visualization tools such as Business Objects, Tableau, and ChartIO.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
  • Configured the project on WebSphere 6.1 application servers
  • Implemented the online application by using Core Java, JDBC, JSP, Servlets and EJB 1.1, Web Services, SOAP, WSDL.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database.
  • Handled end-to-end project from data discovery to model deployment.
  • Monitoring the automated loading processes.
  • Communicated with other health care systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
  • Used Singleton, Factory, and DAO design patterns based on the application requirements.
  • Used SAX and DOM parsers to parse the raw XML documents
  • Used RAD as Development IDE for web applications.
  • Prepared and executed unit test cases.
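The data-cleaning jobs above were Java MapReduce programs on Hadoop; the map/reduce pattern itself can be sketched in a few lines of pure Python (the toy log lines are invented for illustration):

```python
# Illustrative map/reduce word count: the map phase emits per-line
# partial counts, the reduce phase merges them into totals.
from collections import Counter
from functools import reduce

lines = ["error timeout", "ok", "error disk full"]  # hypothetical log lines

# Map phase: one partial count per input line.
mapped = [Counter(line.split()) for line in lines]

# Reduce phase: merge all partial counts (Counter supports +).
totals = reduce(lambda a, b: a + b, mapped, Counter())

print(totals["error"])  # 2
```

In a real Hadoop job the map outputs are shuffled by key across the cluster before reduction; here the single `reduce` call stands in for that step.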

Environment: SQL Server, Oracle, MS Office, Teradata 14.1, Informatica, ER Studio, XML, Business Objects, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, R Studio, Mahout, Java, Hive, AWS.

Confidential

Data Analyst

Responsibilities:

  • Developed and implemented predictive models using Natural Language Processing techniques and machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA, and regularization for data analysis.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Developed visualizations and dashboards using ggplot, Tableau.
  • Worked on development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools such as SQL and NoSQL databases.
  • Built and analyzed datasets using R, SAS, Matlab and Python (in decreasing order of usage).
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed Gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Implemented Agile methodology for building an internal application.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naïve Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Supported the testing team for System/Integration/UAT testing.
  • Involved in preparation & design of technical documents like Bus Matrix Document, PPDM Model, and LDM & PDM.
  • Understanding the client business problems and analyzing the data by using appropriate Statistical models to generate insights.
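A hedged sketch of a sentiment classifier along the lines described above, pairing TF-IDF text features with a Naive Bayes model in scikit-learn (the tiny corpus and labels below are invented purely for illustration):

```python
# Sketch: TF-IDF features feeding a Multinomial Naive Bayes classifier
# for binary sentiment (1 = positive, 0 = negative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great product, works well", "terrible, broke in a day",
         "love it", "awful experience",
         "really great value", "worst purchase ever"]
labels = [1, 0, 1, 0, 1, 0]  # hypothetical sentiment labels

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["great experience"])[0])
```

A production model would add tokenization/stop-word choices (e.g. via NLTK, as mentioned above) and a held-out evaluation set; the pipeline shape stays the same.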

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, ML Lib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.

Confidential

Data Analyst

Responsibilities:

  • Identified the factors that could influence overall consumer satisfaction, rated on a 1-5 range.
  • Considered the SMG (Service Management Group) database survey results in analyzing the impact on overall customer satisfaction.
  • Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems.
  • Used ordinal logistic regression to explain the importance of features.
  • The analysis involved predicting the overall satisfaction (an ordinal rating) by analyzing the impact of each independent factor in explaining the output.
  • Packages used: the MASS package and the plot function for the ordinal logistic regression model.
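The MASS model above is an ordinal (cumulative-logit) regression in R; an approximate Python analogue fits one binary logistic model per rating cutpoint, modeling P(rating > k) for each k. Everything below (features, coefficients, simulated ratings) is invented for illustration:

```python
# Sketch: approximate ordinal logistic regression by fitting one binary
# logistic model per cutpoint of a simulated 1-5 satisfaction rating.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                 # two hypothetical survey features
latent = X @ np.array([1.5, -0.8]) + rng.logistic(size=200)
y = np.digitize(latent, bins=[-2.0, -0.5, 0.5, 2.0]) + 1   # ordinal rating 1..5

# One binary model per cutpoint k: predicts P(rating > k).
models = {k: LogisticRegression(max_iter=1000).fit(X, (y > k).astype(int))
          for k in range(1, 5)}

# Feature importances (log-odds) should be similar across cutpoints
# under the proportional-odds assumption.
print([round(m.coef_[0][0], 2) for m in models.values()])
```

A cumulative-link model such as MASS::polr fits shared slopes with per-cutpoint intercepts in one step; this per-cutpoint sketch only approximates that behavior.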

Environment: R Studio, SQL Server, Dplyr, Tidyr, ggplot2, Tableau, MASS package - plot function
