Sr. Data Scientist Resume
Riverwoods, IL
PROFESSIONAL SUMMARY:
- Around 8 years of IT experience as a Data Scientist, including profound expertise and experience in statistical data analysis such as transforming business requirements into analytical models, designing algorithms, and building strategic solutions that scale across massive volumes of data.
- Proficient in statistical methods such as regression models, hypothesis testing, confidence intervals, principal component analysis, and dimensionality reduction.
- Expert in R and Python scripting. Worked with statistical functions in NumPy, visualization using Matplotlib/Seaborn, and Pandas for organizing data.
- 4 years of experience in Scala and Spark.
- Experience in using various packages in R and Python such as ggplot2, caret, dplyr, RWeka, rjson, plyr, SciPy, scikit-learn, Beautiful Soup, and Rpy2.
- Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
- Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
- Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
- Skilled in performing data parsing, data manipulation, and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape (see the pandas sketch at the end of this summary).
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Professional working experience in machine learning algorithms such as linear regression, logistic regression, Naive Bayes, decision trees, clustering, and principal component analysis.
- Hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
- Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase and SQL Server databases.
- Experienced in writing complex SQL queries, stored procedures, triggers, joins, and subqueries.
- Interpret problems and provide solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
- Knowledge of working with proofs of concept (PoCs) and gap analysis; gathered necessary data for analysis from various sources and prepared data for exploration using data munging and Teradata.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Ability to work with managers and executives to understand business objectives and deliver per business needs; a firm believer in teamwork.
- Experience and domain knowledge in various industries such as healthcare, insurance, retail, banking, media and technology.
- Work closely with customers, cross-functional teams, research scientists, software developers, and business teams in an Agile/Scrum work environment to drive data model implementations and algorithms into practice.
- Strong written and oral communication skills for giving presentations to non-technical stakeholders.
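The data-preparation methods above can be illustrated with a minimal, hedged pandas sketch; the customer and order frames below are hypothetical stand-ins, not project data:

```python
# Minimal sketch of the data-preparation steps listed above, in pandas.
# The 'customers' and 'orders' frames are hypothetical stand-ins.
import pandas as pd

customers = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "name": ["Ann Lee", "Bob Ray", "Cal Poe"],
    "segment": ["a", "b", "a"],
})
orders = pd.DataFrame({
    "cust_id": [1, 1, 2, 3],
    "q1_spend": [120.0, 80.0, 45.5, 300.0],
    "q2_spend": [95.0, 60.0, 50.0, 280.0],
})

print(orders.describe())                              # descriptive statistics
customers[["first", "last"]] = (
    customers["name"].str.split(r"\s+", expand=True)  # regex split and combine
)
customers["segment"] = customers["segment"].map(      # remap coded values
    {"a": "retail", "b": "wholesale"}
)
merged = customers.merge(orders, on="cust_id")        # merge
subset = merged[merged["q1_spend"] > 50].reset_index(drop=True)  # subset + reindex
long_form = subset.melt(                              # melt / reshape to long form
    id_vars=["cust_id", "segment"],
    value_vars=["q1_spend", "q2_spend"],
    var_name="quarter", value_name="spend",
)
print(long_form)
```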
TECHNICAL SKILLS:
Databases: Oracle, MySQL, SQLite, NoSQL, RDBMS, SQL Server 2014, HBase 1.2, MongoDB 3.2, Teradata, Netezza, Cassandra.
Database Tools: PL/SQL Developer, Toad, SQL Loader, Erwin.
Web Programming: HTML, CSS, XML, JavaScript.
Programming Languages: R, Python, SQL, Scala, UNIX shell, C, Java, Tableau.
DWH BI Tools: DataStage 9.1/11.3, Tableau Desktop, D3.js.
Machine Learning: Regression, clustering, SVM, decision trees, classification, recommendation systems, association rules, survival analysis, etc.
Data Visualization: QlikView, Tableau 9.4/9.2, ggplot2 (R), D3.js, Zeppelin.
Big Data Frameworks: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, Amazon EC2, S3, Redshift, Spark, Storm, Impala, Kafka.
Technologies/Tools: Azure Machine Learning, SPSS, Rattle, Caffe, TensorFlow, Theano, Torch, Keras, NumPy.
Scheduling Tools: Autosys, Control-M.
Operating Systems: AIX, LINUX, UNIX.
Cloud Environments: AWS, Azure, Databricks.
PROFESSIONAL EXPERIENCE:
Confidential, Riverwoods, IL
Sr. Data Scientist
Responsibilities:
- This project focused on customer clustering, an ML and statistical modeling effort that included building predictive models and generating data products to support customer classification and segmentation.
- Developed an estimation model for various bundled product and service offerings to optimize and predict gross margin.
- Built a sales model for the various bundled product and service offerings.
- Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled services.
- Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, including product recommendation and allocation planning.
- Worked with the sales and marketing teams, partnering and collaborating with a cross-functional team to frame and answer important data questions.
- Prototyped and experimented with ML algorithms and integrated them into the production system for different business needs.
- Applied machine learning algorithms with Spark MLlib standalone and R/Python.
- Worked on multiple datasets containing 2 billion values of structured and unstructured data about web application usage and online customer surveys.
- Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for various user behavior predictions and support multiple marketing segmentation programs.
- Segmented the customers based on demographics using K-means clustering (see the sketch after this section).
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
- Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI.
Environment: MS SQL Server, R/RStudio, Python, Spark framework, Redshift, MS Excel, Tableau, T-SQL, ETL, RNN, LSTM, MS Access, XML, MS Office 2007, Outlook.
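A hedged sketch of the K-means segmentation step above: the demographic columns and the choice of k=3 are illustrative assumptions, not the project's actual features.

```python
# Sketch of demographic customer segmentation with K-means,
# using a hypothetical demographic sample.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.DataFrame({
    "age": [23, 45, 31, 52, 29, 61],
    "income": [38000, 92000, 55000, 110000, 47000, 88000],
    "household_size": [1, 3, 2, 4, 1, 2],
})

# Standardize so that income does not dominate the Euclidean distances.
X = StandardScaler().fit_transform(df)

# k=3 is illustrative; in practice choose k via the elbow method
# or silhouette scores.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(X)
print(df.groupby("segment").mean())  # profile each resulting segment
```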
Confidential, Deerfield, IL
Data Scientist
Responsibilities:
- Responsible for analyzing large data sets to develop multiple custom models and algorithms to drive innovative business solutions.
- Performed data profiling and preliminary data analysis; handled anomalies such as missing values, duplicates, and outliers, and imputed or removed irrelevant data.
- Removed outliers using proximity-distance and density-based techniques.
- Involved in Analysis, Design and Implementation/translation of Business User requirements.
- Experienced in using supervised, unsupervised and regression techniques in building models.
- Performed market basket analysis to identify the groups of assets moving together and advised the client on the associated risks.
- Determined trends and significant data relationships using advanced statistical methods.
- Implemented techniques like forward selection, backward elimination, and stepwise approaches for selecting the most significant independent variables.
- Performed feature selection and feature extraction (dimensionality reduction) methods to identify significant variables.
- Used RMSE, confusion matrices, ROC curves, cross-validation, and A/B testing to evaluate model performance in both simulated and real-world environments (see the sketch after this section).
- Performed exploratory data analysis using R and generated various graphs and charts for analyzing the data using Python libraries.
- Involved in the execution of multiple business plans and projects; ensured business needs were being met and interpreted data to identify trends that carry across future data sets.
- Developed interactive dashboards, created various Ad Hoc reports for users in Tableau by connecting various data sources.
Environment: Python, SQL Server, Hadoop, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, Mahout, LSTM, RNN, Spark MLlib, MongoDB, Tableau, Unix/Linux.
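A hedged sketch of the evaluation workflow named above (cross-validation, confusion matrix, ROC): the synthetic dataset and logistic regression model stand in for the confidential data and models.

```python
# Illustrative evaluation workflow: cross-validation on the training
# split, then a confusion matrix and ROC-AUC on a hold-out test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation to estimate generalization before touching the test set.
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print("CV ROC-AUC: %.3f +/- %.3f" % (cv_auc.mean(), cv_auc.std()))

# Fit on the full training split, then evaluate once on the hold-out set.
model.fit(X_train, y_train)
print(confusion_matrix(y_test, model.predict(X_test)))
print("Test ROC-AUC: %.3f" % roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```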
Confidential - Princeton, NJ
Data Analyst
Responsibilities:
- Involved in Analysis, Design and Implementation/translation of Business User requirements.
- Worked on collecting large data sets using Python scripting and Spark SQL.
- Worked on large sets of Structured and Unstructured data.
- Worked on creating deep learning (DL) models using LSTM and RNN architectures.
- Actively involved in designing and developing data ingestion, aggregation, and integration in Hadoop environment.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Created Hive tables with partitioning and bucketing (see the PySpark sketch after this section).
- Performed data analysis and data profiling using complex SQL queries on various source systems, including Oracle 10g/11g and SQL Server 2012.
- Identified inconsistencies in data collected from different sources.
- Worked with business owners/stakeholders to assess risk impact and provided them with solutions.
- Determined trends and significant data relationships using advanced statistical methods.
- Carried out specified data processing and statistical techniques such as sampling, estimation, hypothesis testing, time series, correlation, and regression analysis using R.
- Applied various data mining techniques: linear and logistic regression, classification, and clustering.
- Took personal responsibility for meeting deadlines and delivering high quality work.
- Strived to continually improve existing methodologies, processes, and deliverable templates.
Environment: R, SQL Server, Oracle, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, NoSQL, Tableau, RNN, LSTM, Unix/Linux, Core Java.
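A hedged sketch of the incremental-load and partitioned-Hive-table pattern above, written in PySpark rather than raw Sqoop/HiveQL; the paths, table names, and cut-off date are hypothetical.

```python
# Incremental load into a date-partitioned Hive table with PySpark.
# Assumes a Hive-enabled SparkSession; names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("incremental-load")
         .enableHiveSupport()
         .getOrCreate())

# Read the landed source extract (e.g. produced by Sqoop) and keep only
# records newer than the last load date, mirroring Sqoop's
# --incremental append / --last-value behavior.
last_load = "2016-01-31"
txns = (spark.read.parquet("/landing/transactions")   # hypothetical path
        .where(F.col("txn_date") > F.lit(last_load)))

# Append into a Hive table partitioned by date; bucketing by customer id
# would be declared in the table DDL (CLUSTERED BY ... INTO n BUCKETS).
(txns.write
     .mode("append")
     .partitionBy("txn_date")
     .saveAsTable("warehouse.transactions"))
```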
Confidential
Data Analyst
Responsibilities:
- Provide expertise and recommendations for physical database design, architecture, testing, performance tuning and implementation.
- Designed logical and physical data models for multiple OLTP and Analytic applications.
- Extensively used the Erwin design tool and Erwin Model Manager to create and maintain the data mart.
- Designed the physical model for implementing the model in an Oracle 9i physical database.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Performed performance tuning of the database, including indexing, optimizing SQL statements, and monitoring the server.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Collaborated on the source-to-target data mapping document and the data quality assessments for the source data.
- Used expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific database.
- Coordinated with various business users, stakeholders, and SMEs for functional expertise, design and business test scenario reviews, UAT participation, and validation of financial data.
- Worked very closely with data architects and the DBA team to implement data model changes in the database in all environments.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Improved the performance of the existing data warehouse applications to increase the efficiency of the existing system.
- Designed and developed use case, activity, and sequence diagrams and object-oriented designs (OOD) using UML and Visio.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential
SQL Developer
Responsibilities:
- Extensively experienced working with different Data Flow and Control Flow tasks, For Loop containers, Sequence containers, Script tasks, Execute SQL tasks, and package configuration.
- Created new procedures to handle complex business logic and modified existing stored procedures, functions, views, and tables for new enhancements of the project and to resolve existing defects.
- Loaded data from various sources such as OLE DB and flat files into a SQL Server 2012 database using SSIS packages, and created data mappings to load the data from source to destination.
- Created batch jobs and configuration files to build automated processes using SSIS.
- Created SSIS packages to pull data from SQL Server and export it to Excel spreadsheets, and vice versa.
- Built SSIS packages to fetch files from remote locations such as FTP and SFTP, decrypt them, transform them, and load them into the data warehouse mart, with proper error handling and alerting (see the sketch after this section).
- Made extensive use of Expressions, Variables, and Row Count in SSIS packages.
- Performed data validation and cleansing of staged input records before loading into the data warehouse.
- Automated the process of extracting various files, such as flat and Excel files, from sources like FTP and SFTP (Secure FTP).
- Deployed and scheduled reports using SSRS to generate daily, weekly, monthly, and quarterly reports.
Environment: MS SQL Server 2005 & 2008, SQL Server Business Intelligence Development Studio, SSIS-2008, SSRS-2008, Report Builder, Office, Excel, Flat Files, .NET, T-SQL
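SSIS packages are assembled in the visual designer rather than written as code, so the following is only a rough Python analogue of the fetch/stage/load flow above (the decryption step is omitted); the host, credentials, paths, and table names are hypothetical.

```python
# Rough Python analogue of the SSIS fetch/stage/load flow:
# pull a flat file over SFTP, then bulk-load it into SQL Server
# with per-row error handling. All names are hypothetical.
import csv
import paramiko
import pyodbc

# 1. Fetch the flat file from the remote SFTP location.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="etl_user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/outbound/daily_sales.csv", "staging/daily_sales.csv")
sftp.close()
transport.close()

# 2. Load the staged rows into the warehouse, rejecting bad rows.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dwserver;DATABASE=SalesDW;Trusted_Connection=yes;"
)
cursor = conn.cursor()
with open("staging/daily_sales.csv", newline="") as f:
    for row in csv.DictReader(f):
        try:
            cursor.execute(
                "INSERT INTO dbo.DailySales (SaleDate, Amount) VALUES (?, ?)",
                row["sale_date"], float(row["amount"]),
            )
        except (pyodbc.Error, ValueError) as exc:
            print("rejected row:", row, exc)  # alerting would hook in here
conn.commit()
conn.close()
```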