Data Scientist Resume
Maryland
SUMMARY
- 9.5+ years of strong IT experience in Machine Learning, Data Analytics, Data Mining, and ETL Development.
- As a Data Scientist, built a variety of statistical and predictive models in R and Python using Machine Learning techniques - Supervised Learning, Unsupervised Learning, Deep Learning (TensorFlow, Keras), and NLP.
- Hands-on expertise in Machine Learning, Deep Learning, Data Visualization, and Data Cleaning, as well as creating compelling data stories and providing actionable insights.
- Strong communication skills and strong working knowledge of structured, semi-structured, and unstructured data, large data warehouses, and multiple platforms including AWS, Linux, UNIX, and Mainframe.
- Transform business requirements into analytical models, design algorithms, and develop data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Proficient in the use of Statistical Modeling and Machine Learning techniques - Linear & Logistic Regression, Decision Trees, Random Forests, Clustering, SVM, Principal Component Analysis, XGBoost, KNN & Neural Networks (TensorFlow-Keras, PyTorch).
- Proficient in managing the project life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Statistical Modeling, Exploratory Data Analysis, and Data Visualization (a brief illustrative sketch follows this summary).
- Managed GitHub repositories and permissions, including branching and tagging.
- Good knowledge of NLP using libraries such as Stanford NLP, NLTK, Scikit-Learn, and spaCy.
- Extensive experience in Text Analytics and Forecasting, developing Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using R and Python.
- Experienced in all stages of the software development life cycle (Waterfall & Agile models).
- Hands-on experience with Classification, Regression, Clustering, Collaborative Filtering, and Dimensionality Reduction techniques.
- Hands-on experience with Informatica Designer tools, Workflow Manager tools, Repository Manager, and the Admin Console.
- Involved in troubleshooting bottlenecks, performance tuning, and implementing pushdown optimization.
- Hands-on experience in UNIX shell scripting for automation of batch jobs.
- Worked with Crontab, AutoSys, and CA-7 (Mainframe) schedulers.
- Strong knowledge of RDBMS concepts and extensive experience in creating and maintaining database objects and in PL/SQL programming (Stored Procedures, Packages, Synonyms, Functions, and Cursors).
- Performed performance tuning on large queries to avoid bottlenecks.
- Experienced with Teradata utilities such as FastLoad, MultiLoad, TPT, and BTEQ scripts.
- Created code review checklists, Technical Design documents, and Requirement Traceability Matrices.
- Ability to drive initiatives, grasp and expand on ideas, and follow assignments through in fast-paced, changing environments.
- Independent yet team-oriented, with excellent analytical, problem-solving, multitasking, and interpersonal skills.
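A minimal, illustrative sketch of that project life cycle in Python (pandas, Scikit-Learn, matplotlib); the input file, column names, and model choice are hypothetical placeholders rather than details from a specific engagement.

```python
# Illustrative only: acquisition -> preparation -> exploration -> modeling.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("customers.csv")                    # data acquisition (placeholder file)
df = df.dropna(subset=["target"]).fillna(0)          # basic data preparation

df["target"].value_counts().plot(kind="bar")         # quick exploratory view of the label
plt.tight_layout()
plt.savefig("target_distribution.png")

X = df.select_dtypes("number").drop(columns=["target"], errors="ignore")
y = df["target"]

# Cross-validated baseline model as the statistical-modeling step.
scores = cross_val_score(GradientBoostingClassifier(), X, y, cv=5, scoring="roc_auc")
print("Mean CV AUC:", scores.mean())
```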
TECHNICAL SKILLS
Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Ensemble Models (Random Forest), Association Rule Mining (Market Basket Analysis), KNN, PCA, Factor Analysis, Clustering (K-Means, Hierarchical), Gradient Descent, XGBoost, SVM (Support Vector Machines), Deep Learning (ANN, CNN, RNN) using TensorFlow (Keras), Text Analytics (NLP)
Programming Languages: R, Python (Scikit-Learn), Spark-PySpark, SQL, PL/SQL, C
Databases: Oracle 12c, MS SQL Server 2005, Amazon Redshift, Teradata, Veeva Salesforce, Hive
ETL Tools: Informatica PowerCenter, AWS Glue, AWS SageMaker
Scripting Languages: Shell scripting, Teradata Macros, BTEQ scripts, cURL scripting
Cloud Technologies: AWS Cloud, Informatica Cloud
BigData Technologies: Hadoop, Hive, HBase, Pig, HDFS, Sqoop
Versioning Tools: SVN, GitHub
Operating systems: Windows, UNIX/Linux
PROFESSIONAL EXPERIENCE
Confidential, Maryland
Data Scientist
Responsibilities:
- Conducted statistical analysis to determine the key factors driving total fraud loss, using predictive analytics and machine learning algorithms.
- Used GitHub as the hosting service for Git, providing a convenient place to store multiple versions of files; managed GitHub repositories and permissions, including branching and tagging.
- Drove the end-to-end analytical process - requirements formulation, data acquisition, identification of the right analytical methods, model creation and validation, and business-friendly summarization of results - following the traditional CRISP-DM (Cross-Industry Standard Process for Data Mining) model to deliver analytical solutions.
- Analyzed and identified needs for data, information, and analysis/modeling.
- Built models with deep learning frameworks such as TensorFlow, PyTorch, and Keras.
- Drew meaningful insights from data using machine learning techniques and statistics.
- Used an ensemble of detection models to estimate the fraud risk of a transaction in real time (see the illustrative sketch after this entry).
Techniques used - Logistic Regression, Decision Trees, Random Forest, SVM, KNN, ANN (TensorFlow-Keras, PyTorch).
Environment: R, Python, Spark, PySpark, Flask, Ambari-Hive, GitHub, Docker.
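A minimal sketch of the ensemble-based, real-time risk scoring mentioned above, assuming a soft-voting ensemble in Scikit-Learn; the input file, feature set, and is_fraud label are hypothetical placeholders.

```python
# Illustrative only: soft-voting ensemble that scores transaction fraud risk.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("transactions.csv")                 # placeholder input data
X = df.drop(columns=["is_fraud"])                    # assumed numeric feature columns
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
    ],
    voting="soft",                                   # average predicted probabilities
)
ensemble.fit(X_train, y_train)

risk = ensemble.predict_proba(X_test)[:, 1]          # fraud-risk score per transaction
print("Holdout AUC:", roc_auc_score(y_test, risk))
```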
Confidential, Missouri
Data Scientist
Responsibilities:
- Coordinating & interacting with business on requirements.
- Involved in data analysis and mining from company databases to drive marketing techniques and business strategies.
- Used the AWS SageMaker machine learning service to transform data by creating and using SageMaker notebooks.
- Knowledgeable in all four key stages (build, train, tune, and deploy) of publishing models to SageMaker endpoints for real-time predictions (see the illustrative sketch after this entry).
- Used AWS Glue to extract, transform, and load data from S3 buckets into a Redshift database.
- Assess the effectiveness and accuracy of new data sources and data gathering techniques.
- Develop and manage relationships across the client base, discussing benefits.
- Drive key meetings and workshops to achieve the outcomes within the deadline.
Techniques used - Linear Regression, SVM, Decision Trees & Random Forest.
Environment: R, Python, Spark, PySpark, AWS Glue, AWS SageMaker, GitHub, Docker, Flask, Redshift, Ambari-Hive.
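A minimal sketch of calling a deployed SageMaker endpoint for a real-time prediction with boto3; the region, endpoint name, and CSV payload are hypothetical placeholders.

```python
# Illustrative only: real-time inference against a SageMaker endpoint.
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")  # assumed region

payload = "34.5,1200.0,0,3"                          # hypothetical CSV feature row
response = runtime.invoke_endpoint(
    EndpointName="marketing-model-endpoint",         # placeholder endpoint name
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))       # model prediction
```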
Confidential, New Jersey
ETL Lead Developer & Data Analytics
Responsibilities:
- Requirement gathering and Requirement Analysis.
- Worked on performance improvements.
- Testing and documentation of ETL Mappings and workflows.
- Built shell scripts, BTEQ scripts, and Teradata macros (see the illustrative sketch after this entry).
- Designed and developed ETL mappings and workflows in Informatica 9.x.
- Analysis of the specifications provided by the client for change requests/Enhancements.
- Analyzed and debugged ETL code to resolve defects and production issues received through incidents and service requests.
- Independently handled 15+ applications and mentored team members.
- Coordinating & interacting with business on new requirements.
- Delegated work to the offshore team and mentored them on technical challenges.
- Production Deployment and Post deployment support.
- Reviewing the work done by offshore team.
Environment: Informatica PowerCenter, Oracle, Teradata, Veeva Salesforce
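A minimal sketch of automating a Teradata BTEQ batch step from Python, as could be wired into a Crontab or AutoSys job; it assumes the bteq client is on the PATH, and the script name is a placeholder (logon details live inside the BTEQ script).

```python
# Illustrative only: run a BTEQ script as a batch step and propagate its exit code.
import subprocess
import sys

def run_bteq(script_path: str) -> int:
    """Feed a BTEQ script to the bteq client over stdin and return its exit code."""
    with open(script_path, "r") as script:
        result = subprocess.run(
            ["bteq"],                 # bteq reads its commands from standard input
            stdin=script,
            capture_output=True,
            text=True,
        )
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_bteq("load_sales_daily.btq"))       # placeholder script name
```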