Lead Data Analytic Scientist Resume
Detroit, MI
SUMMARY
- Extensive experience in Predictive Modeling and Machine Learning (Classification, Regression & Clustering): Neural Networks, Support Vector Machines, Fisher Discriminant, KNN, Naive Bayes, Generalized Linear Models, Decision Trees, Random Forests
- Extensive experience applying statistical methods to large-scale, complex data analysis: maximum likelihood, least squares, hypothesis testing, distribution fitting, confidence region estimation, error analysis, multivariate analysis, principal component analysis
- Experience with Big Data ecosystems: Hadoop, HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Ranger, Spark (Scala, pySpark, SparkR, SparkSQL), Jupyter & Zeppelin notebooks
- Proficient in data visualization and analytics using Tableau, QlikView and SiSense
- Led distributed technical teams in an on-shore/off-shore global delivery model
TECHNICAL SKILLS
Programming languages: C, C++, Java, MatLab, Python, R, Scala
Scripting languages: Shell and Perl scripting
Databases: SQL Server, Oracle, Teradata, GreenPlum, NoSQL, HBase
PROFESSIONAL EXPERIENCE
Confidential, Detroit, MI
Lead Data Analytic Scientist
Responsibilities:
- Building the Customer Interaction Repository (CIR) in a Hadoop environment to bring data from different business units (Marketing, Sales, Call Centers, FordPass, Confidential Credit, etc.) onto a single Hadoop platform. The repository records each customer's historical interactions with Confidential in a single actionable view, enabling better business decisions based on predictive modeling.
- Responsible for creating interactions from Confidential Credit data.
- Wrote thousands of lines of Hive & SparkSQL code and set up automated processing using Oozie workflows (a minimal sketch follows this list).
- Collaborated with other groups using Jupyter & Zeppelin notebooks.
- Wrote hundreds of lines of bash scripting for Oozie workflows.
- Analyze data and build models in Scala, pySpark, SparkR.
- Collaborate on code development and integrate code with other team members using GitHub.
- Lead the team building a Data Governance & Data Quality compliance framework in Scala, with a dashboard to visualize data quality metrics and an alerting system for any anomaly observed in the data.
- Wrote Sqoop scripts and Alteryx & Informatica workflows for ETL purposes.
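For illustration, a minimal PySpark sketch of an interaction-building job of the kind described above; all table and column names (credit_events, cir.interactions, customer_id, event_ts) are hypothetical, not the production schema.

```python
# Minimal PySpark sketch: fold credit-account events into a consolidated
# customer interaction view. Table and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("cir-credit-interactions")
         .enableHiveSupport()
         .getOrCreate())

# Read raw credit events from Hive and shape them as interactions.
credit = spark.table("credit_events")
interactions = (credit
    .select(
        F.col("customer_id"),
        F.lit("CREDIT").alias("channel"),
        F.col("event_type").alias("interaction_type"),
        F.col("event_ts").cast("timestamp").alias("interaction_ts"))
    .dropDuplicates(["customer_id", "interaction_type", "interaction_ts"]))

# Append to the consolidated repository; an Oozie workflow would schedule this.
interactions.write.mode("append").insertInto("cir.interactions")
```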
Confidential, Chicago, IL
Lead Visualization & Analytics
Responsibilities:
- Built a Data Lake (GreenPlum) to bring data from different data sources and developed analytics and visualization on top of the data lake.
- The data are mainly locomotive data extracted from onboard controllers and sensors, along with geo-coordinates and weather data.
- The motivation is to analyze engine faults and correlate them with operating parameters so that a predictive model can be built to reduce failures or provide an advance warning system to avoid them.
- Led development of predictive analytics in R and Python, using advanced tools such as H2O deep learning. Worked specifically on three machine learning projects:
- Found significant fault codes and their correlations to the parameters affecting them, using logistic regression and random forest. The code was written in R (see the sketch after this list).
- Found Root-Cause-Codes (RCC) using NLP. The code was initially written in Python and later ported to R with H2O deep learning for faster processing.
- Wrote a modified clustering model to match zero-demand parts in inventory with active parts so that they could be reused, saving millions of dollars.
- Led the visualization team of 6-8 people to build visualizations of the data and analytics results.
- Worked on a Hadoop POC environment to move the analytical efforts from GreenPlum to the Hadoop platform, and built a POC for visualization & analytics on Hadoop.
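A hedged sketch of the fault-code analysis described above, comparing logistic regression and random forest via scikit-learn; the actual work was done in R, and the data and feature names here are synthetic.

```python
# Fit a logistic regression and a random forest on (synthetic) locomotive
# parameters, then compare which parameters drive a given fault code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
params = rng.normal(size=(5000, 4))  # e.g. temperature, rpm, oil pressure, load
fault = (params[:, 0] + 0.5 * params[:, 2]
         + rng.normal(scale=0.5, size=5000) > 1.0)  # toy fault-code indicator

X_train, X_test, y_train, y_test = train_test_split(params, fault, random_state=0)

logit = LogisticRegression().fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Signed effect sizes vs. nonparametric importances point to the same drivers.
print("logit coefficients:   ", logit.coef_.round(2))
print("forest importances:   ", forest.feature_importances_.round(2))
print("logit/forest accuracy:", logit.score(X_test, y_test),
      forest.score(X_test, y_test))
```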
Confidential, Waukesha, WI
Data Analytics Leader
Responsibilities:
- Built an RM&D (Remote Monitoring & Diagnostics) solution using engine and sensor data collected from assets, applying machine learning and predictive modeling to improve asset performance, optimize planned downtime, and reduce unplanned downtime, increasing reliability and profitability.
- Worked on the following specific tasks:
- Data collection, data cleaning, data quality checks, and data ingestion; defined KPIs (Key Performance Indicators) and their mathematical derivations
- Built data analytics and predictive models using MatLab, R & Python; wrote engine oil life modeling in R (a simplified sketch follows this list)
- Built reports using Tableau for executives, operations managers, business analysts, and sales & service teams
- Designed and built dashboards for APM (Asset Performance Management) development, for deployment on GE's Predix platform
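A simplified Python sketch of an oil-life style KPI (the production model was written in R); it assumes a simple linear degradation of a normalized oil-quality index toward a replacement threshold, and all constants are illustrative.

```python
# Estimate remaining oil life by fitting a linear degradation trend to
# sampled oil-quality readings and extrapolating to the replacement threshold.
import numpy as np

hours = np.array([0, 100, 200, 300, 400])          # engine hours at sampling
quality = np.array([1.0, 0.93, 0.85, 0.79, 0.71])  # normalized oil-quality index
threshold = 0.4                                     # replace oil below this level

# Fit the degradation rate by least squares and extrapolate to the threshold.
slope, intercept = np.polyfit(hours, quality, deg=1)
hours_at_threshold = (threshold - intercept) / slope
remaining_life = hours_at_threshold - hours[-1]
print(f"estimated remaining oil life: {remaining_life:.0f} engine hours")
```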
Confidential, Atlanta, GA
Senior Software Consultant
Responsibilities:
- Developed a MatLab-based GUI for live (real-time) signal analysis of ultra-short (femtosecond) LASER pulses using the FROG technique.
- The patented API is used in a commercial product for live LASER pulse analysis and characterization.
- Interfaced MatLab with C++ to speed up the optimization algorithm for live (real-time) data analysis, displaying plots and metrics that characterize the LASER pulse.
Confidential, Chicago, IL
Data Analyst
Responsibilities:
- Led a team of physicists and engineers in an R&D effort to develop sensors based on magnetic principles, mainly for the automotive and aerospace industries.
- Led the design of experiments, simulation, data collection, and data analysis.
- Analyzed test data for product performance improvement and led the implementation of improvements based on the data.
- Worked with leading automakers (GM, Confidential, Toyota, BMW) on different R&D projects. Primary contributor to several published patents in sensor development.
- Developed an algorithm for calibration and linearization of a contactless linear position sensor for dual-clutch transmissions.
- Wrote a MatLab-based GUI for sensor calibration on the production line; improved the algorithm, reducing calibration error for better performance, and shortened calibration time, saving production cost.
- Developed an algorithm to eliminate cross-magnetic influence between sensors in close proximity to each other.
- Developed and implemented a 2D lookup table (LUT) to compensate for temperature and other environmental effects (see the sketch below)
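A minimal Python sketch of the 2D-LUT compensation idea: bilinearly interpolate a correction term over a temperature/position grid and add it to the raw reading. The grid values here are synthetic placeholders, not the production table.

```python
# Bilinear interpolation into a 2D lookup table, clamped to the grid edges.
import numpy as np

temps = np.array([-40.0, 0.0, 40.0, 85.0])     # grid axis 1: temperature (C)
positions = np.array([0.0, 10.0, 20.0, 30.0])  # grid axis 2: position (mm)
# correction[i, j] = offset to add at temps[i], positions[j]
correction = np.array([[0.8, 0.6, 0.5, 0.4],
                       [0.3, 0.2, 0.1, 0.1],
                       [0.0, 0.0, 0.0, 0.0],
                       [-0.2, -0.3, -0.3, -0.4]])

def compensate(raw, temp, pos):
    """Return the raw reading plus the bilinearly interpolated correction."""
    i = np.clip(np.searchsorted(temps, temp) - 1, 0, len(temps) - 2)
    j = np.clip(np.searchsorted(positions, pos) - 1, 0, len(positions) - 2)
    tt = np.clip((temp - temps[i]) / (temps[i + 1] - temps[i]), 0.0, 1.0)
    pp = np.clip((pos - positions[j]) / (positions[j + 1] - positions[j]), 0.0, 1.0)
    c = (correction[i, j] * (1 - tt) * (1 - pp)
         + correction[i + 1, j] * tt * (1 - pp)
         + correction[i, j + 1] * (1 - tt) * pp
         + correction[i + 1, j + 1] * tt * pp)
    return raw + c

print(compensate(raw=15.2, temp=25.0, pos=12.5))
```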
Confidential, Batavia, IL
Research Associate
Responsibilities:
- Wrote event reconstruction and physics analysis code in C++: used advanced mathematical models to construct probability distribution functions and developed a maximum likelihood method that predicts the global probability of an event in a particle physics detector from complex experimental signatures, measuring the charm-quark mass from background-subtracted events (a simplified sketch appears after this list).
- Used neural networks to identify signals in a sea of huge backgrounds in a very large dataset.
- Designed a Live Data Quality Monitoring System and wrote the initial code in C++: worked with a multi-talented, diverse team of engineers, computer scientists, and physicists to develop a live data quality monitoring system within the Data Acquisition (DAQ) system for the NOvA Experiment at Fermilab.
- Conceptualized, designed, and wrote initial code in C++ to analyze raw data coming from millions of channels in real time and export plots and numbers to the detector control room over a socket connection, so that shift crews could monitor and check data quality.
- Developed and wrote a novel algorithm in C++: a new method based on Bayesian approaches to measure the lifetime of B mesons from biased data collected with the CDF detector at Fermilab, correcting the bias using the data itself, in collaboration with physicists from Oxford University, UK.
- Worked on large-scale complex data: analyzed large-scale high-energy physics data using grid computing spread across the world and managed large numbers of input and output files using a combination of shell and Perl scripts.
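A toy Python sketch of the unbinned maximum-likelihood fitting described above: a Gaussian signal peak over an exponential background, fit by minimizing the negative log-likelihood. The data, mass window, and starting values are synthetic, not the actual CDF/NOvA analysis.

```python
# Unbinned maximum-likelihood fit of a Gaussian peak over an exponential
# background, on synthetic data in a toy mass window.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import expon, norm

rng = np.random.default_rng(1)
lo, hi = 1.7, 2.1  # toy mass window in GeV
signal = rng.normal(1.865, 0.01, size=500)        # toy signal peak
background = lo + rng.exponential(0.5, size=2000)
data = np.concatenate([signal, background])
data = data[(data > lo) & (data < hi)]

def nll(theta):
    """Negative log-likelihood: fraction f of Gaussian(mu, sigma) plus an
    exponential background truncated to the mass window."""
    f, mu, sigma, tau = theta
    sig_pdf = norm.pdf(data, mu, sigma)
    bkg_pdf = expon.pdf(data - lo, scale=tau) / expon.cdf(hi - lo, scale=tau)
    return -np.sum(np.log(f * sig_pdf + (1 - f) * bkg_pdf))

fit = minimize(nll, x0=[0.3, 1.86, 0.02, 0.4],
               bounds=[(0.01, 0.99), (lo, hi), (1e-3, 0.1), (0.05, 5.0)])
print("fitted peak mass:", fit.x[1])
```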