Data Scientist Resume
Irving, TX
SUMMARY
- Overall 6 years of IT experience, with 4+ years of experience in Statistics, Data Analysis, and Machine Learning using Python.
- Versatile, intuitive, and results-oriented data scientist with strong experience applying Machine Learning algorithms to statistical data.
- Ability to analyse the most complex projects at various levels.
- Experience in building data-intensive Big Data applications and products using open source frameworks like Hadoop, Pig, Hive, Sqoop, Apache Spark, Apache Kafka, Storm, Apache Mahout, and Revolution R software.
- Experience working in text understanding, classification, pattern recognition, recommendation systems, targeting systems, and ranking systems using Python.
- A deep understanding of Statistical Modelling, Multivariate Analysis, Big Data analytics, and Standard Procedures.
- Highly efficient in Dimensionality Reduction methods such as PCA (Principal Component Analysis), Factor Analysis, etc.
- Implemented machine learning methods such as Random Forests (Classification), K-Means Clustering, KNN (K-Nearest Neighbors), Naïve Bayes, SVM (Support Vector Machines), Decision Tree, BFS, and Linear and Logistic Regression.
- Worked with applications like R, SPSS, and Python to develop predictive models.
- Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, ANOVA, cross tabs, t-tests, and correlation techniques.
- Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and Scikit-learn).
- Experience with Natural Language Processing (NLP).
- Worked on Tableau and QlikView to create dashboards and visualizations.
- Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.X, R 3.0 (ggplot2), and Excel.
- Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS like SQL Server 2008.
- Knowledge of agile development techniques.
- Used version control tools like Git 2.X.
- Good knowledge of cloud computing platforms like AWS and Microsoft Azure.
- Good knowledge of Exploratory Data Analysis, Descriptive Statistics, and Predictive Modelling.
- Developed and performed text classification using methods such as logistic regression, decision trees, support vector machines, and maximum entropy classifiers.
- Created tables, sequences, synonyms, join functions and operators in Netezza database.
- Configured a big data processing platform using Apache Spark, created predictive models using MLlib, and deployed models for large-scale use.
TECHNICAL SKILLS
Programming Languages: Python, SQL, C, PL/SQL
Predictive Modeling Techniques: Supervised Learning (Decision trees, Naive Bayes classification, Ordinary Least Squares regression, Logistic regression, Neural networks, Support vector machines), Unsupervised Learning (Clustering algorithms), Reinforcement Learning
Statistical Methods: Z-test, t-test, Chi-squared and ANOVA testing, A/B Testing, Descriptive and Inferential Statistics, Hypothesis testing
Analytics: Python (NumPy, Pandas, SciPy, Scikit-learn, statsmodels; Visualization: Matplotlib, seaborn, scikit-image), Big Data (HDFS, Pig, Hive, HBase, Sqoop, Spark), Excel, TensorFlow
Reporting Tools: Tableau, Spotfire, IBM Watson, QlikView
Database: Hadoop, Spark, Postgres, Access, Oracle, SQL Server, NoSQL (MongoDB), Cassandra, HBase, Teradata, Netezza
Cloud Computing: Amazon AWS, Microsoft Azure, Apache CloudStack, Google Analytics, OpenShift
Version control: Git, GitHub
PROFESSIONAL EXPERIENCE
Confidential, Irving, TX
Data Scientist
Responsibilities:
- Analysed business requirements, developed applications and models, and used appropriate algorithms to arrive at the required insights.
- Used classification algorithms (Naïve Bayes, Logistic Regression, Neural Networks, SVM, Random Forest, Decision Tree) and the K-Means clustering algorithm.
- Conducted descriptive statistics, text analytics, exploratory analysis, and data visualization with Python.
- Worked on NoSQL databases like MongoDB.
- Worked on text mining and sentiment analysis to find the polarity of opinions, i.e. positive, negative, and neutral.
- Worked on data cleaning, data preparation, and feature engineering with Python 3.X.
- Worked on MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
- Extensively used multiple Python data science packages like Pandas, NumPy, Matplotlib, Seaborn, SciPy, Scikit-learn, and NLTK.
- Created Hive scripts to create external and internal tables in Hive. Worked on creating datasets to load data into Hive.
- Extensively used Apache Sqoop to efficiently transfer bulk data between Apache Hadoop and relational databases (Oracle, Mainframe, DB2) for product-level forecasting.
- Created dashboards and reports in Tableau to visualize the data in the required format.
- Wrote MapReduce programs to process data on HDFS and inserted the results into HBase for analysis.
- Worked on Apache Spark to analyze live streaming data.
Confidential, Kansas City
Data Science Analyst
Responsibilities:
- Communicated and coordinated with other departments to collect business requirements.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Performed data analysis by using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Used pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, and TensorFlow in Python for developing various machine learning algorithms.
- Worked on different data formats such as JSON and XML.
- Developed a MapReduce pipeline for feature extraction using Hive.
- Used statistical testing to evaluate model performance.
- Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
- Implemented machine learning models (logistic regression, XGBoost) with Python Scikit-learn.
- Designed rich data visualizations with Tableau and Python.
Confidential
Data Analyst/Python Developer
Responsibilities:
- Analysed customer Help data, contact volumes, and other operational data in MySQL to provide insights that enable improvements to Help content and customer experience.
- Brought in and implemented updated analytical methods such as regression modeling, classification trees, statistical tests, and data visualization techniques with Python.
- Deployed Machine Learning models built using Mahout on a Hadoop cluster.
- Maintained and updated existing automated solutions.
- Analyzed historical demand, filtered out outliers/exceptions, identified the most appropriate statistical forecasting algorithm, developed the base plan, analyzed variance, proposed improvement opportunities, incorporated demand signals into the forecast, and executed data visualization using the plotly package in Python.
- Improved data collection and distribution processes by using the pandas and NumPy packages in Python while enhancing reporting capabilities to provide a clear line of sight into key performance trends and metrics.
- Interacted with QA to develop test plans from high-level design documentation.
- Used Sqoop to load existing data from relational databases into HDFS.
Confidential
Jr. Data Analyst
Responsibilities:
- Gathered user requirements and created the business requirements documents.
- Documented the technical specification for the reports and tested the generated reports.
- Created and managed databases.
- Used the technical document to design tables.
- Performed data analysis, data profiling, data scrubbing, and data cleansing; generated data frequency reports.
- Generated SQL and PL/SQL scripts to install, create, and drop database objects including tables, views, primary keys, indexes, constraints, packages, sequences, grants and synonyms.
- Created database triggers to maintain the audit data in the tables.
- Optimized the SQL queries for improved performance.
- Prepared test plans for various modules.
- Prepared user manuals and technical support manuals.
