Data Scientist / Machine Learning Engineer Resume
SUMMARY
- 6+ years of extensive experience in the IT industry as a Data Scientist/Machine Learning Engineer and Data Analyst, with proficiency in Machine Learning, NLP, Data Analysis and Visualization, Deep Learning, Big Data, Text Mining, Data Engineering, and Business Intelligence/ETL.
- Experience in text understanding, classification, pattern recognition, and recommendation systems using Python's NLTK library.
- Deep understanding of Statistical Modelling and Multivariate Analysis; highly proficient in dimensionality-reduction methods such as PCA (Principal Component Analysis).
- Knowledge of Hadoop core components (HDFS, MapReduce) and the Hadoop ecosystem (Sqoop, Hive, Pig).
- Good knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks, and Cloudera.
- Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Experience in foundational machine learning models and concepts: regression, random forests, boosting, GBMs, neural networks, HMMs, and deep learning.
- Comfortable with statistical concepts such as hypothesis testing, ANOVA, t-tests, correlation, A/B testing, experimental design, and time series analysis.
- Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
- Experience identifying and interpreting trends in datasets and developing multiple reports/dashboards (line charts, bar charts, donut charts, box plots, geo-maps, bubble charts, tree maps, etc.) to visualize data.
- Expertise in transforming business requirements into analytical models, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experience in data preprocessing: handling class imbalance, missing-value treatment, outlier treatment, and feature scaling (a minimal sketch follows this section).
- Experience with object-oriented programming (OOP) in Python; extensive SQL experience in querying, data extraction, and data transformation.
- Extracted and migrated data from various database sources such as Oracle, SQL Server, MySQL, and Teradata.
- Experienced with ETL process management, data modeling, data wrangling, and data warehouse architecture.
- Applied advanced statistical and predictive modeling techniques to build, maintain, and improve multiple real-time decision systems.
- Identified available and relevant data, including internal and external data sources, and leveraged new data collection processes.
- Worked closely with product managers, service development managers, and product development teams to productize the algorithms developed.
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
- Proven ability to work simultaneously on multiple projects, as a team player and as an individual contributor, with strong adaptability to new technologies.
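A minimal sketch of the preprocessing steps listed above (missing-value treatment, outlier-robust scaling, and class-imbalance handling), assuming scikit-learn; the dataset is synthetic and all names are illustrative:

```python
# Minimal sketch of the preprocessing steps above; the data is synthetic
# and stands in for a real tabular dataset.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[rng.random(X.shape) < 0.05] = np.nan   # inject missing values
y = (rng.random(500) < 0.1).astype(int)  # imbalanced labels, ~10% positive

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # missing-value treatment
    ("scale", RobustScaler()),                     # scaling robust to outliers
    # class_weight="balanced" reweights classes to counter the imbalance
    ("clf", LogisticRegression(class_weight="balanced")),
])
pipe.fit(X, y)
```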
TECHNICAL SKILLS
Coding: Python (NumPy, Pandas, scikit-learn, NLTK), MATLAB, SAS, MySQL, R, Minitab, etc.
Visualization: Tableau, Python (Matplotlib, Seaborn, Plotly, Cufflinks), R (ggplot), MS Excel, Microsoft Power BI.
Microsoft Office Suite: MS Excel (VBA, macros, pie charts, bar charts, pivot tables), MS Word, MS PowerPoint.
IDE: Anaconda, RStudio, Visual Studio Code, Jupyter Notebook, Azure Databricks, Amazon SageMaker.
Big Data: Spark (PySpark, Spark Streaming, Spark MLlib, Spark SQL), HDFS, Pig, Hive, HBase, Sqoop.
Graduate Coursework: Data Mining, Business Analytics, Project Management, Time Series Analysis, Applied Data Science, Game Theory.
Soft Skills: Leadership, teamwork, analytical thinking, attention to detail, and problem-solving.
PROFESSIONAL EXPERIENCE
Confidential
Data Scientist / Machine Learning Engineer
Responsibilities:
- Built customer segmentation and recommender system models using collaborative filtering.
- Used K-Means clustering and LightGBM models for segmentation problems.
- Used Spark DataFrames and Big Data technologies such as PySpark, Spark SQL, and Spark MLlib extensively, and developed ML algorithms using MLlib.
- Utilized the Amazon EMR Big Data platform to analyze large volumes of customer data, develop clusters, and find possible customer segments.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Databricks and Amazon SageMaker on AWS; implemented Python-based distributed clustering via PySpark Streaming.
- Used Amazon EMR with Hive, Pig, Spark, and MapReduce for batch analytics, scheduling daily/weekly/monthly jobs.
- Worked with Amazon Redshift, Athena, and Amazon EMR with Presto and Spark for interactive analytics.
- Worked on feature engineering and data preprocessing using PySpark functions
- Performed cross-validation and grid search on the model, which achieved 90% accuracy between recommendations and actual sales.
- Used a CI/CD pipeline with Git to deploy machine learning models.
- Enhanced data collection procedures to include information relevant for building analytic systems; processed, cleansed, and verified the integrity of data used for analysis.
- Performed thorough EDA, including univariate and bivariate analysis, to understand the intrinsic and combined effects of the features.
- Performed dimensionality reduction using near-zero-variance and correlation techniques.
- Used Tableau for data visualization, creating reports and dashboards for insights and business process improvement.
- Worked with technical and development teams to deploy models
- Built model performance reports and modeling technical documentation to support each of the models for the product line.
- Built a recommender system that utilizes previous transaction data and can be used in online/offline mode (a minimal sketch follows this list).
- Used RMSE, F-score, precision, recall, and A/B testing to evaluate the recommender's performance in both simulated and real-world environments.
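A minimal sketch of how such a collaborative-filtering recommender could be trained and evaluated with Spark MLlib's ALS (ALS is one common choice, not necessarily the exact model used here); the input path and column names are hypothetical:

```python
# Minimal sketch: collaborative filtering with Spark MLlib's ALS.
# The S3 path and the user_id/item_id/rating columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("recommender").getOrCreate()

# Assumed schema: one row per (user, item, rating) transaction.
ratings = spark.read.parquet("s3://bucket/transactions.parquet")
train, test = ratings.randomSplit([0.8, 0.2], seed=42)

als = ALS(
    userCol="user_id",
    itemCol="item_id",
    ratingCol="rating",
    coldStartStrategy="drop",  # drop NaN predictions for unseen users/items
    rank=10,
    regParam=0.1,
)
model = als.fit(train)

# Evaluate with RMSE, as in the bullet above.
predictions = model.transform(test)
rmse = RegressionEvaluator(
    metricName="rmse", labelCol="rating", predictionCol="prediction"
).evaluate(predictions)
print(f"RMSE: {rmse:.3f}")

# Top-10 item recommendations per user, for offline serving.
user_recs = model.recommendForAllUsers(10)
```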
Confidential, Springfield, MA
Data Science/ Data Engineer
Responsibilities:
- Worked collaboratively with senior management to identify potential machine learning use cases and to set up a server-side development environment.
- Improved the performance and optimization of existing algorithms in Hadoop using Spark, including SparkContext, Spark SQL, and pair RDDs.
- Worked on batch processing of data sources using Apache Spark and Elasticsearch.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Worked on migrating MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance.
- Mastered key facets of data investigation, including data wrangling, cleaning, sampling, management, exploratory analysis, regression and classification, prediction, and data communication.
- Performed text analytics and text mining; developed the entire application as a service with a REST API using Flask.
- Extensively used Python's and Spark's data science packages, including Pandas, NumPy, Matplotlib, SciPy, scikit-learn, and NLTK.
- Used similarity measures such as Jaro distance, Euclidean distance, and Manhattan distance.
- Performed entity tagging with the Stanford NER Tagger and used named-entity recognition packages such as spaCy.
- Used Principal Component Analysis for dimensionality reduction of features.
- Performed nested cross-validation to compare the performance of different models and used the results for model optimization (sketched after this list).
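A minimal sketch of the nested cross-validation described above, assuming scikit-learn; the dataset, model, and hyperparameter grid are illustrative stand-ins:

```python
# Minimal sketch of nested cross-validation with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: grid search selects hyperparameters.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=inner_cv,
)

# Outer loop: estimates the generalization error of the whole tuning
# procedure, so the reported score is not biased by the search itself.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(grid, X, y, cv=outer_cv)
print(f"Nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```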
Confidential
Data Analyst
Responsibilities:
- Analyzed data and turned it into actionable business insights and strategies.
- Developed complex SQL queries to analyze and understand data and to bring data together from various systems.
- Used joins such as inner joins and outer joins while creating tables from multiple tables (see the join sketch after this list).
- Worked with SQL and PL/SQL procedures, functions, stored procedures, and packages.
- Implemented indexes, collected statistics, and applied constraints while creating tables.
- Developed SQL queries for retrieving data, updating data, database testing and data analysis.
- Enhanced data collection procedures to include information relevant for building analytic systems; processed, cleansed, and verified the integrity of data used for analysis.
- Used advanced Microsoft Excel features to create pivot tables and used VLOOKUP and other Excel functions.
- Built and maintained on-demand custom (ad-hoc) reports and scheduled reports in response to internal and external users.
- Performed various data transformations, extracting, transforming, and loading (ETL) data using Informatica.
- Applied data wrangling techniques to convert unstructured data into structured formats.
- Developed reusable transformations and mapplets to avoid redundant logic.
- Worked with various transformations, including Router, Joiner, Update Strategy, Lookup, Rank, Expression, Aggregator, Sequence Generator, and Sorter transformations.
- Created ETL mappings using Informatica PowerCenter to move data from multiple sources, such as flat files and Oracle, into common target areas such as Data Marts and the Data Warehouse.
- Developed source-to-target data mappings for the ETL team, with physical naming standards, data types, volumetrics, domain definitions, and corporate metadata definitions.
- Developed SQL scripts to validate the data loaded into Data Warehouse and Data Mart tables by Informatica ETL.
- Performed daily data analysis and prepared reports on a daily, weekly, monthly, and quarterly basis.
- Guided new team members, explaining the process flow of the analysis and the standards and structural layout followed.
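A minimal sketch of the kind of multi-table join used in this work, with SQLite standing in for the production database; the tables and columns are hypothetical:

```python
# Minimal sketch of inner/outer joins using Python's built-in sqlite3;
# the tables and columns are hypothetical stand-ins for the real sources.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 75.5);
""")

# LEFT OUTER JOIN keeps customers with no orders (Globex shows 0 orders).
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), COALESCE(SUM(o.amount), 0)
    FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 2, 325.5), ('Globex', 0, 0)]
```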
Confidential
Python Developer
Responsibilities:
- Developed frontend and backend modules using Python on the Django web framework.
- Worked on designing, coding, and developing the application in Python using Django's MVT pattern.
- Wrote functional API test cases for testing REST APIs with Postman and integrated them with a Jenkins server through build scripts.
- Used the Python library Beautiful Soup for web scraping to extract data for building graphs (see the scraping sketch after this list).
- Performed troubleshooting and deployed many Python bug fixes for the two main applications that served as the main source of data for both customers and the internal customer service team.
- Created RESTful web services for Catalog and Pricing with Django MVT, MySQL, and MongoDB.
- Developed Python APIs to dump the array structures in the processor at the failure point for debugging.
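A minimal sketch of the Beautiful Soup scraping flow, with a placeholder URL and selector; it assumes a simple HTML table whose second column holds numeric values:

```python
# Minimal sketch of web scraping with requests + Beautiful Soup;
# the URL and the table layout are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/metrics", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")

# Extract (label, value) pairs from table rows for later charting;
# assumes each row has exactly two cells and a numeric second cell.
data = []
for row in soup.select("table tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 2:
        data.append((cells[0], float(cells[1])))

print(data)
```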