Data Scientist Resume

Pittsburgh, PA

SUMMARY

  • Overall 8+ years of experience in Information Technology and Data Science, including Machine Learning and Data Mining with large data sets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization, developing predictive models that provide intelligent solutions.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating Data Visualizations using R, Python and Tableau.
  • Proficient in advising on the use of data for compiling personnel and statistical reports, preparing personnel action documents, identifying patterns within data, analyzing data, and interpreting results.
  • Strong ability to analyze sets of data for signals, patterns, ways to group data to answer questions and solve complex data puzzles.
  • Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Proficient in: Data Acquisition, Storage, Analysis, Integration, Predictive Modeling, Logistic Regression, Decision Trees, Data Mining Methods, Forecasting, Factor Analysis, Cluster Analysis, Neural Networks and other advanced statistical and econometric techniques.
  • Adept in writing R code and T-SQL scripts to manipulate data for data loads and extracts.
  • Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy.
  • Skilled in web search and data collection, web data mining, extracting data from websites, data entry, and data processing.
  • Strong experience with R Visualization, QlikView and Tableau to use in data analytics and graphic visualization.
  • Extensively worked with major statistical analysis tools such as R, SQL, and SAS.
  • Strong knowledge of all phases of the SDLC (Software Development Life Cycle), from analysis, design, development, and testing through implementation and maintenance, with timely delivery against deadlines.
  • Good knowledge and understanding of data mining techniques like classification, clustering, regression techniques and random forests.
  • Extensive experience creating MapReduce jobs, running SQL on Hadoop using Hive, building ETL with Pig scripts, and using Flume to transfer unstructured data to HDFS.
  • Strong Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Experience in all phases of Data warehouse development from Requirements, analysis, design, development, testing and post production support.
  • Strong in-depth knowledge in doing data analysis, data quality and source system analysis.
  • Independent, Self-starter, enthusiastic team player with strong adaptability to new technologies.
  • Experience in Big Data Technologies using Hadoop, Sqoop, Pig and Hive.
  • Experience in writing Hive and Unix shell scripts.
  • Excellent track record in delivering quality software on time to meet the business priorities.
  • Developed Data Warehouse/Data Mart systems, using various RDBMS (Oracle, MS-SQL Server, Mainframes and DB2).
  • Assisted in the collection and documentation of user requirements and the development of user stories, estimates, and work plans.
  • Adhere to high-quality development principles while delivering solutions on-time and on-budget.
  • Used Apache Spark to handle huge data sets and built machine learning models using Spark ML libraries.
  • Excellent initiative and innovative thinking skills, with the ability to analyze details while maintaining a big-picture view; excellent organizational, project management, and problem-solving skills.
  • Excellent oral and written communication skills. Ability to explain complex technical information to technical and non-technical contacts.
  • Excellent interpersonal skills. Ability to effectively build relationships, promote a collaborative and team environment, and influence others.

TECHNICAL SKILLS

Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (caret, Weka, ggplot2), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Software/Libraries: Keras, Caffe, TensorFlow, OpenCV, Scikit-learn, Pandas, NumPy, Microsoft Visual Studio, Microsoft Office.

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

Machine Learning Algorithms: Neural Networks, Decision Trees, Support Vector Machines, Random Forest, Convolutional Neural Networks, Logistic Regression, PCA, K-means, KNN.

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

PROFESSIONAL EXPERIENCE

Confidential, IN

Data Scientist

Responsibilities:

  • Developed and maintained applications to send personalized emails to customers
  • Involved in web scraping of Instacart data and generating frequent itemsets using the Apriori algorithm
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis
  • Worked on customer segmentation over the customer database to improve personalized marketing, using an unsupervised learning technique, K-means clustering (see the sketch after this list)
  • Responsible for clarifying business objectives, data collection, data wrangling, data preprocessing, exploratory data analysis, feature engineering, machine learning modeling, model tuning, and deploying models.
  • Used Spark ML to leverage the computational power of Spark for machine learning, improving the performance and optimization of existing algorithms using Spark Context, Spark SQL, and Spark DataFrames.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis
  • Identified, analyzed, and interpreted trends and patterns in complex data sets and presented findings as visualizations to the business
  • Created and developed machine learning models such as regression, classification, and Natural Language Processing models for Information Retrieval and Recommender Systems
  • Leveraged AWS SageMaker to build, train, tune, and deploy state-of-the-art Machine Learning and Deep Learning models.
  • Addressed overfitting and underfitting by tuning the hyperparameters of the machine learning algorithms using lasso and ridge regularization (see the regularization sketch after this list).
  • Cleaning, organizing and preprocessing the raw data using various statistical techniques primarily in Python, making sure the data is consistent
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques
  • Working closely with the Retail Business Unit to deliver actionable insights from huge volumes of data coming from different marketing campaigns and customer interaction metrics such as web portal usage, email campaign responses, public site interaction, and other customer-specific parameters
  • Worked with statistical models for data analysis, predictive modeling, machine learning approaches and recommendation and optimization algorithms.
  • Working in Business and Data Analysis, Data Profiling, Data Migration, Data Integration and Metadata Management Services.
  • Worked extensively on Databases preferably SQL and writing PL/SQL scripts for multiple purposes.
  • Built models using statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest using Python packages
  • Worked with data compliance teams, data governance team to maintain data models, Metadata, data Dictionaries, define source fields and its definitions.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS
  • Managed existing team members and led the recruiting and onboarding of a larger Data Science team to address analytical knowledge requirements.
  • Worked directly with upper executives to define requirements of scoring models.
  • Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping
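
A minimal sketch of the K-means segmentation workflow referenced above, using scikit-learn; the input file and feature names (recency, frequency, monetary) are illustrative assumptions, not the project's actual fields:

```python
# K-means customer segmentation sketch (hypothetical columns and file name).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customers.csv")  # hypothetical input
features = customers[["recency", "frequency", "monetary"]]

# K-means is distance-based, so scale features to comparable ranges first.
X = StandardScaler().fit_transform(features)

# Fit K-means and attach a segment label to each customer.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(X)

# Per-segment profiles drive the personalized marketing decisions.
print(customers.groupby("segment").mean(numeric_only=True))
```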
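
The regularization bullet above amounts to a cross-validated search over the penalty strength; a minimal sketch with synthetic data and an illustrative alpha grid:

```python
# Tuning lasso and ridge regularization strength to control over/underfitting.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

for name, model in [("lasso", Lasso(max_iter=10000)), ("ridge", Ridge())]:
    # Cross-validated grid search over the regularization hyperparameter alpha.
    search = GridSearchCV(model, {"alpha": np.logspace(-3, 2, 12)}, cv=5)
    search.fit(X, y)
    print(name, search.best_params_, round(search.best_score_, 3))
```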

Environment: Python, Clustering, Linear and Logistic Regression, Support Vector Machine, Association Rule Mining, Tableau, Neural Networks, Microsoft Azure, Databricks, Jupyter Notebooks, Apache Spark

Confidential, Pittsburgh, PA

Data Scientist

Responsibilities:

  • Performed Data Profiling to learn about behavior across various features such as traffic pattern, location, and date and time.
  • Used Principal Component Analysis (PCA) in feature engineering to analyze high-dimensional data (see the PCA sketch after this list).
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling; applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the Scikit-learn package in Python and MATLAB.
  • Researched the existing client processes and guided the team in aligning with the HIPAA rules and regulations for the systems for all the EDI transaction sets.
  • Optimized algorithms with stochastic gradient descent; fine-tuned algorithm parameters with manual tuning and automated tuning such as Bayesian Optimization (see the SGD sketch after this list).
  • Developed a technical brief based on the business brief, containing detailed steps and stages of developing and delivering the project, including timelines.
  • Optimizing the search relevance for a recommender system in accordance with the user behavior and NLP.
  • Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Models, SVM, Random Forest, Boosting, and Neural Networks. Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
  • Communicated the results to the operations team to support decision-making.
  • Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes. Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for the new route.
  • Performed data analysis by using Hive to retrieve the data from the Hadoop cluster, SQL to retrieve data from Oracle database.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, dimensionality reduction, etc.
  • Used Natural Language Processing (NLP) for response modeling and fraud detection efforts for credit cards. Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop. Implemented a Python-based distributed random forest via Python streaming.
  • Performed Data Cleaning, features scaling, features engineering using pandas and NumPy packages in python.
  • Developed a MapReduce pipeline for feature extraction using Hive.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Collected data needs and requirements by Interacting with the other departments.
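
A minimal sketch of the PCA-based feature analysis referenced above, using scikit-learn; the digits dataset stands in for the project's high-dimensional data:

```python
# PCA for high-dimensional data: keep components explaining 95% of variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=0.95)  # float selects components by explained variance
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print("explained variance:", round(pca.explained_variance_ratio_.sum(), 3))
```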
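
The SGD optimization and parameter tuning referenced above can be sketched with scikit-learn's SGDClassifier and a grid search (a stand-in for the Bayesian Optimization used on the project); the dataset and parameter grid are illustrative:

```python
# Stochastic gradient descent with hyperparameter tuning (illustrative grid).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# SGD converges poorly on unscaled features, so standardize inside a pipeline.
pipe = make_pipeline(StandardScaler(),
                     SGDClassifier(loss="log_loss", random_state=0))
grid = {
    "sgdclassifier__alpha": [1e-5, 1e-4, 1e-3],
    "sgdclassifier__penalty": ["l2", "l1"],
}
search = GridSearchCV(pipe, grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```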

Environment: SAS, Python, C++, MS Excel, Perl, MS SQL Server, HIPAA, EDI, Power BI, Tableau, T-SQL, ETL, MS Access, XML, JSON, MS Office 2010, Outlook.

Confidential, Pittsburgh, PA

Data Scientist

Responsibilities:

  • Worked on data cleaning and reshaping, generated segmented subsets using NumPy and Pandas in Python. Developed Python scripts to automate data sampling process. Ensured the data integrity by checking for completeness, duplication, accuracy, and consistency.
  • Developed tools using Python, Shell scripting, XML to automate some of the menial tasks. Interfacing with supervisors, artists, systems administrators, and production to ensure production deadlines are met.
  • Worked on model selection based on confusion matrices and minimized the Type II error (see the threshold sketch after this list). Generated cost-benefit analysis to quantify the model implementation compared with the former situation.
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to perform data extraction and merging from large volumes of historical data stored in Oracle 11g, validating the ETL processed data in the target database.
  • Conducted model optimization and comparison using a stepwise function based on AIC values.
  • Applied various machine learning algorithms and statistical modeling techniques like decision trees, logistic regression, and Gradient Boosting Machines to build a predictive model using the Scikit-learn package in Python.
  • Continuously collected business requirements during the whole project life cycle. Identified the variables that significantly affect the target.
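
A minimal sketch of the confusion-matrix-driven threshold tuning referenced above; the data is synthetic and the thresholds are illustrative:

```python
# Lowering the decision threshold trades false positives for fewer false
# negatives (Type II errors).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for threshold in (0.5, 0.3):
    y_pred = (model.predict_proba(X_te)[:, 1] >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
```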

Environment: Decision Tree, Logistic regression, Hadoop, Teradata, Python, MLLib, SAS, random forest, OLAP, HDFS, NLTK, SVM, JSON, and XML.

Confidential, Austin, TX

Data Engineer

Responsibilities:

  • Developed applications of Machine Learning, Statistical Analysis, and Data Visualizations with challenging data Processing problems in sustainability and biomedical domain.
  • Compiled data from various public and private databases to perform complex analysis and data manipulation for actionable results.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Worked on Natural Language Processing with the NLTK module of Python for application development for automated customer response (see the sentiment sketch after this list).
  • Used predictive modeling with tools in SAS, SPSS, R, and Python.
  • Applied concepts of probability, distributions, and statistical inference on the given dataset to unearth interesting findings using comparisons, t-tests, F-tests, R-squared, p-values, etc.
  • Applied linear regression, multiple regression, the ordinary least squares method, mean-variance, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes, Naive Bayes, fitting functions, etc. to data with the help of the Scikit-learn, SciPy, NumPy, and Pandas modules of Python.
  • Applied clustering algorithms, e.g., hierarchical and K-means, with the help of Scikit-learn and SciPy.
  • Developed visualizations and dashboards using ggplot2 and Tableau
  • Worked on development of data warehouse, data lake, and ETL systems using relational and non-relational tools such as SQL and NoSQL.
  • Built and analyzed datasets using R, SAS, MATLAB, and Python (in decreasing order of usage).
  • Applied linear regression in Python and SAS to understand the relationship between different attributes of the dataset and causal relationship between them
  • Performed complex pattern recognition of financial time series data and forecast of returns through ARMA and ARIMA models and exponential smoothing for multivariate time series data (see the ARIMA sketch after this list)
  • Pipelined (ingest/clean/munge/transform) data for feature extraction toward downstream classification.
  • Used Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Expertise in Business Intelligence and data visualization using R and Tableau.
  • Expert in Agile and Scrum Process.
  • Validated the Macro-Economic data (e.g. BlackRock, Moody's etc.) and predictive analysis of world markets using key indicators in Python and machine learning concepts like regression, Bootstrap Aggregation and Random Forest.
  • Worked in large-scale database environments like Hadoop and MapReduce, with working mechanism of Hadoop clusters, nodes and Hadoop Distributed File System (HDFS).
  • Interfaced with large-scale database system through an ETL server for data extraction and preparation.
  • Identified patterns, data quality issues, and opportunities and leveraged insights by communicating opportunities with business partners.
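
A minimal sketch of NLTK-based sentiment analysis as referenced above, using the VADER analyzer (one common NLTK approach; the project's actual models may have differed):

```python
# Sentiment scoring with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

sia = SentimentIntensityAnalyzer()
for text in ["The support team resolved my issue quickly!",
             "My order arrived late and damaged."]:
    scores = sia.polarity_scores(text)  # neg/neu/pos plus compound in [-1, 1]
    print(round(scores["compound"], 3), text)
```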
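
The ARMA/ARIMA forecasting referenced above can be sketched with statsmodels; the series here is simulated and the (p, d, q) order is illustrative:

```python
# ARIMA forecast of (simulated) daily returns.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, 250))  # stand-in for a return series

model = ARIMA(returns, order=(1, 0, 1))  # ARMA(1, 1) on stationary returns
fit = model.fit()
print(fit.forecast(steps=5))  # next five forecast values
```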

Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.

Confidential

Data Analyst

Responsibilities:

  • Used the SAS PROC SQL pass-through facility to connect to Oracle tables and created SAS datasets using various SQL joins such as left join, right join, inner join, and full join (see the join sketch after this list).
  • Performed data validation and transformed data from the Oracle RDBMS to SAS datasets.
  • Produced quality customized reports using PROC TABULATE, PROC REPORT styles, and ODS RTF, and provided descriptive statistics using PROC MEANS, PROC FREQ, and PROC UNIVARIATE.
  • Developed SAS macros for data cleaning, reporting, and to support routine processing.
  • Performed advanced querying using SAS Enterprise Guide: calculating computed columns, applying filters, manipulating and preparing data for reporting, graphing, summarization, and statistical analysis, and finally generating SAS datasets.
  • Involved in Developing, Debugging, and validating the project-specific SAS programs to generate derived SAS datasets, summary tables, and data listings according to study documents.
  • Created datasets as per the approved specifications, collaborated with project teams to complete scientific reports, and reviewed reports to ensure accuracy and clarity.
  • Experienced in working with data modelers to translate business rules/requirements into conceptual/logical dimensional models and worked with complex de-normalized and normalized data models
  • Performed different calculations like Quick table calculations, Date Calculations, Aggregate Calculations, String and Number Calculations.
  • Designed the ETL process using Informatica to populate the Data Mart from flat files to the Oracle database
  • Used Agile Scrum methodology to implement the project life cycle of report design and development
  • Combined Tableau visualizations into Interactive Dashboards using filter actions, highlight actions etc. and published them on the web.
  • Gathered business requirements and created business requirement documents (BRD/FRD).
  • Worked closely with business leaders and users to define and design data source requirements and data access. Coded, tested, identified, implemented, and documented technical solutions utilizing JavaScript, PHP, and MySQL.
  • Created rich dashboards using Tableau and prepared user stories to create compelling dashboards that deliver actionable insights
  • Worked with the manager to prioritize requirements and prepared reports on a weekly and monthly basis.
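
The four SQL join types named above have a direct analogue in Python's pandas; a minimal sketch with made-up tables (pandas' "outer" corresponds to a SQL full join):

```python
# Left, right, inner, and full joins expressed as pandas merges.
import pandas as pd

patients = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Ben", "Cal"]})
visits = pd.DataFrame({"id": [2, 3, 4],
                       "visit_date": ["2020-01-05", "2020-02-11", "2020-03-02"]})

for how in ("left", "right", "inner", "outer"):
    merged = patients.merge(visits, on="id", how=how)
    print(how, "->", len(merged), "rows")
```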

Environment: SQL Server, Oracle 11g/10g, MS Office Suite, PowerPivot, PowerPoint, SAS Base, SAS Enterprise Guide, SAS/MACRO, SAS/SQL, SAS/ODS, SQL, PL/SQL, Visio.

Confidential

Data Analyst

Responsibilities:

  • Collected requirements from business clients, and designed report models to meet business requirements.
  • Created, managed, and delivered interactive web-based reports to support weekly operations.
  • Designed SSIS packages to extract, transform, and load (ETL) data across different platforms, validate the data, and archive the data from the database.
  • Developed and implemented several types of financial reports by using SSRS and Tableau
  • Developed parameterized dynamic performance reports (Gross Margin, Revenue based on geographic regions) and ran the reports every month and distributed them to respective departments through mailing server subscriptions and SharePoint server
  • Designed and developed new reports and maintained existing reports using SSRS and Microsoft Excel to support the business strategy and management
  • Generated complex calculated fields and parameters, toggle and global filters, dynamic sets, groups, actions, custom color palettes, and statistical analysis to meet business requirements
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business change to ensure that the views and dashboards were displaying the change data accurately

Environment: SQL Server 2012, SQL Server Management Studio, Microsoft BI Suite(SSIS/SSRS), T-SQL, Visual Studio 2010, Tableau, and Python.
