Data Scientist/Machine Learning Engineer Resume
Newport, NJ
SUMMARY
- Seasoned technology specialist with 16+ years of experience across all phases of the software development life cycle.
- Extensive experience creating medium-to-large enterprise distributed software solutions from incubation to production, with hands-on coding across data science, machine learning, deep learning, DevOps, big data, and cloud technologies.
TECHNICAL SKILLS
Technologies: Python, R, Scala, Java, Mule and C#
Scripting: Unix and Python Shell Scripting
Development Tools: Anaconda, Jupyter, Cloud Jupyter, Sublime Text, Vim, RStudio, PyCharm, IntelliJ, Visual Studio Code, Visual Studio 2017, Eclipse, Dev StatX, TOAD and SQL Developer
Business Intelligence: Big Data, Hadoop, Spark, PySpark, Alteryx, Trifacta, Grafana, Kafka, Elasticsearch (ELK), Hive, Pig and Kibana
Cloud Technologies: Microsoft Azure, Docker, Edge Nodes, Kubernetes and AWS
Database Technologies: HBase, Cassandra, MongoDB, MariaDB, MemSQL, SQL Server 2014, Oracle 12c and DB2
Reporting Tools: Tableau and Power BI
Source Control: Bitbucket, GitHub and TFS
Build Tools: Maven, Jenkins, Jules, Ansible, Netflix Zuul, Eureka, Apigee, TFS Team Build and WiX 3.0
Business Knowledge: Retail Banking, Wealth Management, Financial and Healthcare
SDLC Process: Agile, Scrum, Jira, Kanban and DevOps
PROFESSIONAL EXPERIENCE
Data Scientist/Machine Learning Engineer
Confidential, Newport, NJ
Responsibilities:
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Analyzed large data sets, applied machine learning techniques, and developed predictive models.
- Used Pandas, NumPy, Seaborn, SciPy, TensorFlow, PyTorch, Textacy, Keras, NLTK, Matplotlib, Scikit-learn and XGBoost in Python for developing various machine learning algorithms.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Applied various machine learning algorithms and statistical models such as XGBoost, Random Forest, decision trees, and neural networks to identify volume, using the scikit-learn package in Python.
- Evaluated models using cross-validation, F1 score, ROC curves, and log loss, and used AUC for feature selection.
- Worked on NoSQL databases such as Cassandra and MariaDB.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster.
- Built reliable and auditable CI/CD deployment pipelines using Jenkins.
- Developed Kafka and Hadoop integrations for data ingestion, data mapping, and data processing capabilities.
- Worked on distributed computing and microservices.
- Delivered code paths that are unit tested, integration tested, and defect free.
- Worked with cloud-based applications and container orchestration tools such as Docker and Kubernetes.
- Experience with Agile Development approach
- Experience working with GPUs to develop models
- Experience handling terabyte size datasets
- Wrote Python scripts to integrate and productionize the model.
- Wrote Spark code to productionize the model.
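The modeling and evaluation workflow described above (scikit-learn models scored with cross-validation, F1, ROC AUC, and log loss) can be sketched as follows. This is a minimal illustration on synthetic data; the dataset and variable names are hypothetical, not from the actual engagement.

```python
# Train a classifier and evaluate it with the metrics named above:
# cross-validation, F1 score, ROC AUC, and log loss.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import f1_score, roc_auc_score, log_loss

# Synthetic stand-in for the real data set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 5-fold cross-validation on the training split.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Held-out evaluation: F1 on hard predictions, AUC / log loss on probabilities.
proba = model.predict_proba(X_test)[:, 1]
print("CV accuracy:", cv_scores.mean())
print("F1:", f1_score(y_test, model.predict(X_test)))
print("ROC AUC:", roc_auc_score(y_test, proba))
print("Log loss:", log_loss(y_test, proba))
```

The same pattern extends to XGBoost or a neural network by swapping the estimator.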
Environment: Python, PySpark, AWS, S3, Cloudera, Hortonworks, Kibana, Elasticsearch, Logstash, Netflix Zuul, Eureka, Sidecar, Control-M, Gunicorn, Edge Node, Docker, Kubernetes, Kafka, Alteryx, Git, Jenkins, Bitbucket, DevOps, Scrum and Tableau
Lead Analyst
Confidential, Pennington, NJ
Responsibilities:
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, TensorFlow, NLP and NLTK in Python for developing various machine learning algorithms.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Applied various machine learning algorithms and statistical models such as linear regression, Naïve Bayes, Random Forest, decision trees, neural networks, SVM, K-means, and KNN clustering to identify volume, using the scikit-learn package in Python.
- Analyzed large data sets, applied machine learning techniques, and developed predictive models.
- Evaluated models using cross-validation, log loss, and ROC curves, and used AUC for feature selection.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Processed data to build a matrix-based collaborative filtering recommendation model via Spark (MLlib ALS) to drive the financial product recommendation web application.
- Worked with advanced NLP, clustering, classification, and graph analytics algorithms
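The collaborative-filtering work above used Spark MLlib's ALS on a cluster; the alternating-least-squares idea behind it can be sketched in plain NumPy. This toy version, with a made-up ratings matrix, alternates closed-form least-squares updates between user and item factors.

```python
# Simplified NumPy sketch of ALS matrix factorization (the production model
# used Spark MLlib ALS). R is a small dense users-x-items ratings matrix.
import numpy as np

def als(R, k=2, reg=0.1, iters=20, seed=0):
    """Factor R (users x items) into U @ V.T of rank k with L2 regularization."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(size=(n_users, k))
    V = rng.normal(size=(n_items, k))
    I = reg * np.eye(k)
    for _ in range(iters):
        # Fix V, solve the regularized least squares for the user factors.
        U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T
        # Fix U, solve the regularized least squares for the item factors.
        V = np.linalg.solve(U.T @ U + I, U.T @ R).T
    return U, V

# Hypothetical ratings: two users like products 1-2, one likes product 3.
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])
U, V = als(R)
pred = U @ V.T  # reconstructed scores used to rank product recommendations
```

Spark's ALS distributes exactly these per-user/per-item solves across the cluster and handles sparse, implicit-feedback data.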
Environment: Python, PySpark, HAAS, Azure, Java, C#, Hadoop, Tableau and Power BI
Lead Analyst
Confidential, Pennington, NJ
Responsibilities:
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, TensorFlow, NLP and NLTK in Python for developing various machine learning algorithms.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Applied various machine learning algorithms and statistical models such as linear regression, Naïve Bayes, Random Forest, decision trees, neural networks, SVM, K-means, and KNN clustering to identify volume, using the scikit-learn package in Python.
- Analyzed large data sets, applied machine learning techniques, and developed predictive models; developed and enhanced statistical models by leveraging best-in-class modeling techniques.
- Evaluated models using cross-validation, log loss, and ROC curves, and used AUC for feature selection.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Processed data to build a matrix-based collaborative filtering recommendation model via Spark (MLlib ALS) to drive the financial product recommendation web application.
- Worked with advanced NLP, clustering, classification, and graph analytics algorithms
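The NLP and clustering work mentioned above commonly follows the TF-IDF-plus-K-means pattern in scikit-learn; a hedged sketch with made-up documents:

```python
# Unsupervised grouping of short texts: TF-IDF features fed to K-means.
# The documents below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "mortgage rate refinance loan",
    "loan interest mortgage payment",
    "stock equity portfolio dividend",
    "dividend stock market portfolio",
]
X = TfidfVectorizer().fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # documents about the same topic share a cluster label
```

For classification the same TF-IDF features feed a supervised estimator instead of K-means.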
Environment: Python, PySpark, HAAS, Azure, Java, C#, Hadoop, Tableau and Power BI