Data Scientist Resume
Maryland
SUMMARY
- 9.5+ years of strong IT experience in Machine Learning, Data Analytics, Data Mining, and ETL Development.
- As a Data Scientist, built a variety of statistical and predictive models in R and Python using Machine Learning techniques - Supervised Learning, Unsupervised Learning, Deep Learning (TensorFlow, Keras), and NLP.
- Hands-on expertise in Machine Learning, Deep Learning, Data Visualization, and Data Cleaning, as well as creating compelling data stories and providing actionable insights.
- Strong communication skills and strong working knowledge of structured, semi-structured, and unstructured data, large data warehouses, and multiple platforms including AWS, Linux, UNIX, and Mainframe.
- Transform business requirements into analytical models, design algorithms, and develop data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Proficient in the use of Statistical Modeling and Machine Learning techniques - Linear & Logistic Regression, Decision Trees, Random Forests, Clustering, SVM, Principal Component Analysis, XGBoost, KNN & Neural Networks (TensorFlow-Keras, PyTorch).
- Proficient in managing the project life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Statistical Modeling, Exploratory Data Analysis, and Data Visualization (a brief illustrative sketch follows this summary).
- Managed GitHub repositories and permissions, including branching and tagging.
- Good knowledge of NLP using libraries such as Stanford NLP, NLTK, Scikit-Learn, and spaCy.
- Extensive experience in Text Analytics and Forecasting, developing Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using R and Python.
- Experienced in all stages of the software development life cycle (Waterfall & Agile models).
- Hands-on experience with Classification, Regression, Clustering, Collaborative Filtering, and Dimensionality Reduction techniques.
- Hands-on experience with Informatica Designer tools, Workflow Manager tools, Repository Manager, and the Admin Console.
- Involved in troubleshooting bottlenecks, performance tuning, and implementing pushdown optimization.
- Hands-on experience in UNIX shell scripting for automation of batch jobs.
- Worked with Crontab, AutoSys, and CA-7 (Mainframe) schedulers.
- Strong knowledge of RDBMS concepts and extensive experience in creating and maintaining database objects and in PL/SQL programming (Stored Procedures, Packages, Synonyms, Functions, and Cursors).
- Performed performance tuning on large queries to avoid bottlenecks.
- Experienced with Teradata utilities such as FastLoad, MultiLoad, TPT, and BTEQ scripts.
- Created code review checklists, Technical Design documents, and Requirement Traceability Matrices.
- Ability to drive initiatives, grasp and expand on ideas, and follow assignments through in fast-paced, changing environments.
- Independent yet team-oriented, with excellent analytical, problem-solving, multitasking, and interpersonal skills.
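A minimal, illustrative sketch of that project life cycle in Python (pandas, Scikit-Learn, matplotlib); the input file, column names, and model choice are hypothetical placeholders rather than details from a specific engagement.

```python
# Illustrative only: acquisition -> preparation -> exploration -> modeling.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("customers.csv")                    # data acquisition (placeholder file)
df = df.dropna(subset=["target"]).fillna(0)          # basic data preparation

df["target"].value_counts().plot(kind="bar")         # quick exploratory view of the label
plt.tight_layout()
plt.savefig("target_distribution.png")

X = df.select_dtypes("number").drop(columns=["target"], errors="ignore")
y = df["target"]

# Cross-validated baseline model as the statistical-modeling step.
scores = cross_val_score(GradientBoostingClassifier(), X, y, cv=5, scoring="roc_auc")
print("Mean CV AUC:", scores.mean())
```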
TECHNICAL SKILLS
Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Ensemble Models (Random Forest), Association Rule Mining (Market Basket Analysis), KNN, PCA, Factor Analysis, Clustering (K-Means, Hierarchical), Gradient Descent, XGBoost, SVM (Support Vector Machines), Deep Learning (ANN, CNN, RNN) using TensorFlow (Keras), Text Analytics (NLP)
Programming Languages: R, Python (Scikit-Learn), Spark-PySpark, SQL, PL/SQL, C
Databases: Oracle 12c, MS SQL Server 2005, Amazon Redshift, Teradata, Veeva Salesforce, Hive
ETL Tools: Informatica PowerCenter, AWS Glue, AWS SageMaker
Scripting Languages: Shell scripting, Teradata Macros, BTEQ scripts, cURL scripting
Cloud Technologies: AWS Cloud, Informatica Cloud
BigData Technologies: Hadoop, Hive, HBase, Pig, HDFS, Sqoop
Versioning Tools: SVN, GitHub
Operating systems: Windows, UNIX/Linux
PROFESSIONAL EXPERIENCE
Confidential, Maryland
Data Scientist
Responsibilities:
- Conducted statistical analysis to determine the key factors driving total fraud loss, using predictive analytics and machine learning algorithms.
- Used GitHub as the hosting service for Git, providing a convenient place to store multiple versions of files; managed GitHub repositories and permissions, including branching and tagging.
- Drove the end-to-end analytical process - requirements formulation, data acquisition, identification of the right analytical methods, model creation and validation, and business-friendly summarization of results - following the traditional CRISP-DM (Cross-Industry Standard Process for Data Mining) model to deliver analytical solutions.
- Analyzed and identified needs for data, information, and analysis/modeling.
- Built models with deep learning frameworks such as TensorFlow, PyTorch, and Keras.
- Drew meaningful insights from data using machine learning techniques and statistics.
- Used an ensemble of detection models to estimate the fraud risk of a transaction in real time (see the illustrative sketch after this entry).
Techniques used - Logistic Regression, Decision Trees, Random Forest, SVM, KNN, ANN (TensorFlow-Keras, PyTorch).
Environment: R, Python, Spark, PySpark, Flask, Ambari-Hive, GitHub, Docker.
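A minimal sketch of the ensemble-based, real-time risk scoring mentioned above, assuming a soft-voting ensemble in Scikit-Learn; the input file, feature set, and is_fraud label are hypothetical placeholders.

```python
# Illustrative only: soft-voting ensemble that scores transaction fraud risk.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("transactions.csv")                 # placeholder input data
X = df.drop(columns=["is_fraud"])                    # assumed numeric feature columns
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
    ],
    voting="soft",                                   # average predicted probabilities
)
ensemble.fit(X_train, y_train)

risk = ensemble.predict_proba(X_test)[:, 1]          # fraud-risk score per transaction
print("Holdout AUC:", roc_auc_score(y_test, risk))
```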
Confidential, Missouri
Data Scientist
Responsibilities:
- Coordinating & interacting with business on requirements.
- Involved in data analysis and mining from company databases to drive marketing techniques and business strategies.
- Used the AWS SageMaker machine learning service to transform data by creating and using SageMaker notebooks.
- Knowledgeable in all four key stages (build, train, tune, and deploy) of publishing models to SageMaker endpoints for real-time predictions (see the illustrative sketch after this entry).
- Used AWS Glue to extract, transform, and load data from S3 buckets into a Redshift database.
- Assess the effectiveness and accuracy of new data sources and data gathering techniques.
- Develop and manage relationships across the client base, discussing benefits.
- Drive key meetings and workshops to achieve the outcomes within the deadline.
Techniques used - Linear Regression, SVM, Decision Trees & Random Forest.
Environment: R, Python, Spark, PySpark, AWS Glue, AWS SageMaker, GitHub, Docker, Flask, Redshift, Ambari-Hive.
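A minimal sketch of calling a deployed SageMaker endpoint for a real-time prediction with boto3; the region, endpoint name, and CSV payload are hypothetical placeholders.

```python
# Illustrative only: real-time inference against a SageMaker endpoint.
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")  # assumed region

payload = "34.5,1200.0,0,3"                          # hypothetical CSV feature row
response = runtime.invoke_endpoint(
    EndpointName="marketing-model-endpoint",         # placeholder endpoint name
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))       # model prediction
```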
Confidential, New Jersey
ETL Lead Developer & Data Analytics
Responsibilities:
- Requirement gathering and Requirement Analysis.
- Worked on performance improvements.
- Testing and documentation of ETL Mappings and workflows.
- Built shell scripts, BTEQ scripts, and Teradata macros (see the illustrative sketch after this entry).
- Designed and developed ETL mappings and workflows in Informatica 9.x.
- Analysis of the specifications provided by the client for change requests/Enhancements.
- Analyzed and debugged ETL code to resolve defects and production issues received through incidents and service requests.
- Independently handled 15+ applications and mentored team members.
- Coordinating & interacting with business on new requirements.
- Delegated work to the offshore team and mentored them on technical challenges.
- Production Deployment and Post deployment support.
- Reviewing the work done by offshore team.
Environment: Informatica PowerCenter, Oracle, Teradata, Veeva Salesforce
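A minimal sketch of automating a Teradata BTEQ batch step from Python, as could be wired into a Crontab or AutoSys job; it assumes the bteq client is on the PATH, and the script name is a placeholder (logon details live inside the BTEQ script).

```python
# Illustrative only: run a BTEQ script as a batch step and propagate its exit code.
import subprocess
import sys

def run_bteq(script_path: str) -> int:
    """Feed a BTEQ script to the bteq client over stdin and return its exit code."""
    with open(script_path, "r") as script:
        result = subprocess.run(
            ["bteq"],                 # bteq reads its commands from standard input
            stdin=script,
            capture_output=True,
            text=True,
        )
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_bteq("load_sales_daily.btq"))       # placeholder script name
```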