
Data Scientist Resume


Maryland

SUMMARY

  • Over 9.5 years of strong IT experience in Machine Learning, Data Analytics, Data Mining, and ETL Development.
  • As a Data Scientist, I have built a variety of statistical and predictive models using R and Python, employing various machine learning techniques - Supervised Learning, Unsupervised Learning, Deep Learning (TensorFlow, Keras), and NLP.
  • Hands-on expertise in Machine Learning, Deep Learning, Data Visualization, and Data Cleaning, creating compelling stories as well as providing actionable insights.
  • Strong communication skills and strong working knowledge of structured, semi-structured, and unstructured data, large data warehouses, and multiple platforms including AWS, Linux, Unix, and Mainframe.
  • Transform business requirements into analytical models, design algorithms, and develop data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Proficient in the use of Statistical Modeling and Machine Learning techniques - Linear & Logistic Regression, Decision Trees, Random Forests, Clustering, SVM, Principal Component Analysis, XGBoost, KNN & Neural Networks (TensorFlow-Keras, PyTorch).
  • Proficient in managing the project life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Statistical Modeling, Exploratory Data Analysis, and Data Visualization (a representative workflow is sketched after this list).
  • Managed GitHub repositories and permissions, including branching and tagging.
  • Good knowledge of NLP using libraries such as Stanford NLP, NLTK, Scikit-Learn, and spaCy.
  • Extensive experience in Text Analytics and Forecasting, developing statistical machine learning and data mining solutions to various business problems and generating data visualizations using R and Python.
  • Experienced in all stages of the software development life cycle (Waterfall & Agile models).
  • Hands-on experience with classification, regression, clustering, collaborative filtering, and dimensionality reduction techniques.
  • Hands-on experience with Informatica Designer tools, Workflow Manager tools, Repository Manager & Admin Console.
  • Involved in troubleshooting bottlenecks, performance tuning & implementing pushdown optimization.
  • Hands-on experience in UNIX shell scripting for automation of batch jobs.
  • Worked on Crontab, AutoSys and CA7 (Mainframe) schedulers.
  • Strong knowledge of RDBMS concepts and extensive experience in the creation and maintenance of database objects and PL/SQL (Stored Procedures, Packages, Synonyms, Functions and Cursors) programming.
  • Implemented performance tuning on large queries to avoid bottlenecks.
  • Experienced with Teradata utilities like FastLoad, MultiLoad, TPT and BTEQ scripts.
  • Created Code Review Checklist, Technical Design, and Requirement Traceability Matrix documents.
  • Ability to drive initiatives, grasp and expand on ideas, and tackle and follow through on assignments in a fast-paced, changing environment.
  • Independent yet team-oriented, with excellent analytical, problem-solving, multitasking and interpersonal skills.
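
The project life cycle bullet above can be illustrated with a minimal sketch of such a workflow in Python with scikit-learn; the input file, column names, and model choice are hypothetical and stand in only for the data preparation, modeling, and evaluation steps.

    # Minimal sketch of a supervised-learning workflow (illustrative only):
    # the file name, column names, and model choice are hypothetical examples.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    # Data acquisition / preparation: read a tabular extract and separate the target.
    df = pd.read_csv("customer_data.csv")          # hypothetical input file
    X = df.drop(columns=["target"])                # hypothetical feature columns
    y = df["target"]                               # hypothetical binary label

    # Hold out a test set for validation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Statistical modeling: scale features and fit a logistic regression.
    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)

    # Evaluation: report precision/recall/F1 on the held-out data.
    print(classification_report(y_test, model.predict(X_test)))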

TECHNICAL SKILLS

Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Ensemble Models (Random Forest), Association Rule Mining (Market Basket Analysis), KNN, PCA, Factor Analysis, Clustering (K-Means, Hierarchical), Gradient Descent, XGBoost, SVM (Support Vector Machines), Deep Learning (ANN, CNN, RNN) using TensorFlow (Keras), Text Analytics (NLP)

Programming Languages: R, Python (Scikit-Learn), Spark-PySpark, SQL, PL/SQL, C

Databases: Oracle 12c, MS SQL Server 2005, Amazon Redshift, Teradata, Veeva Salesforce, Hive

ETL Tools: Informatica PowerCenter, AWS Glue, AWS SageMaker

Scripting Languages: Shell scripting, Teradata Macros, BTEQ scripts, cURL scripting

Cloud Technologies: AWS Cloud, Informatica Cloud

Big Data Technologies: Hadoop, Hive, HBase, Pig, HDFS, Sqoop

Versioning Tools: SVN, GitHub

Operating systems: Windows, UNIX/Linux

PROFESSIONAL EXPERIENCE

Confidential, Maryland

Data Scientist

Responsibilities:

  • Involved in conducting statistical analysis to determine the key factors behind total fraud loss, using predictive analytics and applying machine learning algorithms.
  • Used GitHub as the hosting service for Git, providing a convenient place to store multiple versions of files.
  • Managed GitHub repositories and permissions, including branching and tagging.
  • Drove the end-to-end analytical process, from requirements formulation, data acquisition, and identification of the right analytical methods through creation/validation of models, to business-friendly summarization of results, following the traditional CRISP-DM (Cross Industry Standard Process for Data Mining) model to deliver analytical solutions.
  • Analyzed and identified needs for data, information, and analysis/modeling.
  • Experience in building models with deep learning frameworks like TensorFlow, PyTorch and Keras.
  • Drew meaningful insights from data using machine learning techniques and statistics.
  • Worked with an ensemble of detection models to estimate the risk/fraudulent behavior of a transaction in real time (a minimal scoring sketch follows the environment line below).

Techniques used - Logistic Regression, Decision Trees, Random Forest, SVM, KNN, ANN (TensorFlow-Keras, PyTorch).

Environment: R, Python, Spark, PySpark, Flask, Ambari-Hive, GitHub, Docker.
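
A minimal sketch of the real-time ensemble scoring pattern referenced above, assuming a soft-voting ensemble served through Flask; the training data, feature layout, threshold-free probability output, and /score route are hypothetical.

    # Minimal sketch: ensemble fraud-risk scoring served over Flask (illustrative only).
    import numpy as np
    from flask import Flask, request, jsonify
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression

    # Toy data standing in for historical transactions (amount, hour-of-day); 1 = fraudulent.
    X_train = np.array([[25, 10], [4000, 3], [60, 14], [7500, 2], [15, 9], [9800, 4]])
    y_train = np.array([0, 1, 0, 1, 0, 1])

    # Soft-voting ensemble: average predicted probabilities from both detectors.
    ensemble = VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ],
        voting="soft",
    )
    ensemble.fit(X_train, y_train)

    app = Flask(__name__)

    @app.route("/score", methods=["POST"])
    def score():
        # Expect JSON like {"amount": 1200.0, "hour": 3}; return the fraud probability.
        payload = request.get_json()
        features = np.array([[payload["amount"], payload["hour"]]])
        risk = float(ensemble.predict_proba(features)[0, 1])
        return jsonify({"fraud_risk": risk})

    if __name__ == "__main__":
        app.run(port=5000)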

Confidential, Missouri

Data Scientist

Responsibilities:

  • Coordinating & interacting with business on requirements.
  • Involved in Data analysis, Mining from company databases to drive marketing techniques and business strategies.
  • Used AWS- SageMaker machine learning service to transform data by creating and using SageMaker Notebooks.
  • Having knowledge on all the 4-key components (Build, Train, Tune & Deploy) to SageMaker end-points for real-time predictions.
  • Used AWS-Glue to Extract, transform and load data from S3 bucket and into Redshift Database.
  • Assess the effectiveness and accuracy of new data sources and data gathering techniques.
  • Develop and manage relationships across the client base, discussing benefits.
  • Drive key meetings and workshops to achieve the outcomes within the deadline.
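
A minimal sketch of the SageMaker build/train/deploy flow referenced above, assuming the SageMaker Python SDK v2 interface and a scikit-learn training script; the script name, S3 paths, feature layout, and instance types are hypothetical.

    # Minimal sketch of the SageMaker build/train/deploy flow (SDK v2 interface assumed).
    import sagemaker
    from sagemaker.sklearn.estimator import SKLearn

    session = sagemaker.Session()
    role = sagemaker.get_execution_role()  # assumes running inside a SageMaker notebook

    # Build & train: package a scikit-learn training script and run it on a managed instance.
    estimator = SKLearn(
        entry_point="train.py",              # hypothetical training script
        framework_version="1.2-1",
        instance_type="ml.m5.large",
        role=role,
        sagemaker_session=session,
    )
    estimator.fit({"train": "s3://example-bucket/prepared/train/"})  # hypothetical S3 path

    # Deploy: host the trained model behind a real-time endpoint.
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")

    # Real-time prediction against the endpoint (feature layout is illustrative).
    print(predictor.predict([[1200.0, 3]]))

    # Tear down the endpoint when finished to avoid ongoing charges.
    predictor.delete_endpoint()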

Techniques used - Linear Regression, SVM, Decision Trees & Random Forest.

Environment: R, Python, Spark, PySpark, AWS Glue, AWS SageMaker, GitHub, Docker, Flask, Redshift, Ambari-Hive.
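
A minimal sketch of the Glue job pattern referenced above (S3 to Redshift), assuming a Glue PySpark job with a pre-configured Redshift catalog connection; the bucket paths, column mappings, table names, and connection name are hypothetical.

    # Minimal AWS Glue (PySpark) job sketch: S3 -> transform -> Redshift (illustrative only).
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Extract: read raw CSV files from S3 into a DynamicFrame.
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/raw/transactions/"]},
        format="csv",
        format_options={"withHeader": True},
    )

    # Transform: rename/cast columns to match the Redshift target table (hypothetical names).
    mapped = ApplyMapping.apply(
        frame=raw,
        mappings=[
            ("txn_id", "string", "transaction_id", "string"),
            ("amt", "string", "amount", "double"),
        ],
    )

    # Load: write into Redshift through a pre-configured Glue catalog connection.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-conn",
        connection_options={"dbtable": "analytics.transactions", "database": "dev"},
        redshift_tmp_dir="s3://example-bucket/temp/",
    )

    job.commit()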

Confidential, New Jersey

ETL Lead Developer & Data Analytics

Responsibilities:

  • Requirement gathering and requirement analysis.
  • Worked on performance improvement.
  • Testing and documentation of ETL mappings and workflows.
  • Built shell scripts, BTEQ scripts, and Teradata macros.
  • Design and development of ETL mappings and workflows in Informatica 9.x.
  • Analysis of the specifications provided by the client for change requests/enhancements.
  • Analyzed and debugged ETL code to resolve defects and production issues received through incidents and service requests.
  • Independently handled 15+ applications and mentored team members.
  • Coordinated & interacted with the business on new requirements.
  • Offshored work and mentored the offshore team on technical challenges.
  • Production deployment and post-deployment support.
  • Reviewed the work done by the offshore team.

Environment: Informatica PowerCenter, Oracle, Teradata, Veeva Salesforce
