
Data Scientist/Machine Learning Engineer Resume


SUMMARY

  • 6+ years of extensive experience in the IT industry as a Data Scientist/Machine Learning Engineer and Data Analyst, with proficiency in Machine Learning, NLP, Data Analysis and Data Visualization, Deep Learning, Big Data, Text Mining, Data Engineering, and Business Intelligence/ETL.
  • Experience in text understanding, classification, pattern recognition, and recommendation systems using Python's NLTK library (see the classification sketch after this list).
  • Deep understanding of Statistical Modelling and Multivariate Analysis; highly proficient in dimensionality reduction methods such as PCA (Principal Component Analysis), as sketched after this list.
  • Knowledge of Hadoop core components (HDFS, MapReduce) and the Hadoop ecosystem (Sqoop, Hive, Pig).
  • Good knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks, and Cloudera.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Experience in foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, deep learning.
  • Comfortable with statistical concepts such as hypothesis testing, ANOVA, t-tests, correlation, A/B testing, experimental design, and time series analysis.
  • Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
  • Experience identifying and interpreting trends in datasets and developing multiple reports/dashboards (line charts, bar charts, donut charts, box plots, geo-maps, bubble charts, tree maps, etc.) to visualize data.
  • Expertise in transforming business requirements into analytical models, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Experience in data pre-processing: handling class imbalance, treating missing values, outlier treatment, and feature scaling.
  • Experience with object-oriented programming (OOP) concepts in Python; extensive SQL experience in querying, data extraction, and data transformation.
  • Extracted and migrated data from various database sources such as Oracle, SQL Server, MySQL, and Teradata.
  • Experienced with ETL process management, Data modeling, Data Wrangling and Data warehouse architecture
  • Applied advanced statistical and predictive modeling techniques to build, maintain, and improve multiple real-time decision systems.
  • Identified available and relevant data, including internal and external data sources, leveraging new data collection processes.
  • Worked closely with product managers, service development managers, and product development teams to productize the algorithms developed.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Proven ability to work simultaneously on multiple projects as a team player and as an individual contributor with a strong adaptability to new technologies
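
Below is a minimal sketch of the kind of NLTK-based text classification referenced above, using NLTK's Naive Bayes classifier; the labeled examples and bag-of-words features are illustrative placeholders, not code from any of the projects described here.

    # Bag-of-words text classification with NLTK's Naive Bayes classifier
    from nltk.classify import NaiveBayesClassifier

    def features(text):
        # Each lowercase token becomes a boolean feature
        return {word: True for word in text.lower().split()}

    # Tiny illustrative training set (placeholder data)
    train = [
        (features("great product fast shipping"), "pos"),
        (features("terrible quality very slow"), "neg"),
        (features("loved it works great"), "pos"),
        (features("broke after one day"), "neg"),
    ]

    classifier = NaiveBayesClassifier.train(train)
    print(classifier.classify(features("fast and great")))  # -> "pos"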
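
And a minimal sketch of PCA-based dimensionality reduction with scikit-learn; the random matrix stands in for a real feature matrix.

    # Project a 20-dimensional feature matrix onto its 5 strongest components
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.RandomState(0).normal(size=(100, 20))

    pca = PCA(n_components=5)
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape)                      # (100, 5)
    print(pca.explained_variance_ratio_.sum())  # share of variance retained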

TECHNICAL SKILLS

Coding: Python (NumPy, Pandas, scikit-learn, NLTK), MATLAB, SAS, MySQL, R, Minitab, etc.

Visualization: Tableau, Python (Matplotlib, Seaborn, Plotly, Cufflinks), R (ggplot), MS Excel, Microsoft Power BI.

Microsoft Office Suite: MS Excel (VBA, macros, pie charts, bar charts, pivot tables), MS Word, MS PowerPoint.

IDE: Anaconda, RStudio, Visual Studio Code, Jupyter Notebook, Azure Databricks, Amazon SageMaker

Big Data: PySpark, Spark Streaming, Spark MLlib, Spark SQL, HDFS, Pig, Hive, HBase, Sqoop

Graduate Coursework: Data Mining, Business Analytics, Project Management, Time Series Analysis, Applied Data Science, Game Theory.

Soft Skills: Leadership, teamwork, analytical thinking, attention to detail, and problem-solving.

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Built customer segmentation and recommender system models using collaborative filtering.
  • Used K-Means clustering and LightGBM models for segmentation problems (see the K-Means sketch after this list).
  • Used Spark DataFrames and Big Data technologies such as PySpark, Spark SQL, and Spark MLlib extensively, and developed ML algorithms using the MLlib libraries.
  • Utilized the Amazon EMR Big Data platform to analyze large volumes of customer data to develop clusters and find possible customer segments.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Databricks and AWS SageMaker; implemented Python-based distributed clustering via PySpark Streaming.
  • Used Amazon EMR with Hive, Pig, Spark, MapReduce for Batch Analytics and scheduling daily/weekly/monthly jobs.
  • Worked with Amazon Redshift, Athena, Amazon EMR with Presto and Spark for Interactive Analytics.
  • Worked on feature engineering and data preprocessing using PySpark functions
  • Performed cross-validation and grid search on the model, which achieved 90% accuracy between recommendations and actual sales (see the grid search sketch after this list).
  • Used a CI/CD pipeline with Git to deploy machine learning models.
  • Enhanced data collection procedures to include information relevant for building analytic systems; processed, cleansed, and verified the integrity of data used for analysis.
  • Performed thorough EDA, including univariate and bivariate analysis, to understand the intrinsic and combined effects of features.
  • Performed dimensionality reduction using near-zero-variance and correlation techniques.
  • Used Tableau for data visualization to create reports, dashboards for insights and business process improvement
  • Worked with technical and development teams to deploy models
  • Built model performance reports and modeling technical documentation to support each of the models for the product line.
  • Built a recommender system that utilizes previous transaction data and can be used in online/offline mode.
  • Used RMSE, F-score, precision, recall, and A/B testing to evaluate the recommender's performance in both simulated environments and the real world.
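
A minimal sketch of the K-Means segmentation workflow with Spark MLlib referenced above; the feature columns (recency, frequency, monetary) and toy rows are hypothetical stand-ins for real customer data.

    # Cluster customers into segments with Spark MLlib's K-Means
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler, StandardScaler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("segmentation").getOrCreate()
    df = spark.createDataFrame(
        [(10.0, 3.0, 250.0), (2.0, 15.0, 900.0), (30.0, 1.0, 40.0)],
        ["recency", "frequency", "monetary"],   # hypothetical features
    )

    # Assemble and scale the features before clustering
    assembled = VectorAssembler(inputCols=df.columns, outputCol="raw").transform(df)
    scaled = StandardScaler(inputCol="raw", outputCol="features").fit(assembled).transform(assembled)

    # Fit K-Means and attach a segment label to each customer
    model = KMeans(k=2, seed=42, featuresCol="features").fit(scaled)
    model.transform(scaled).select("recency", "frequency", "monetary", "prediction").show()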
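
And a minimal sketch of cross-validated grid search with scikit-learn; the synthetic dataset and parameter grid are illustrative only.

    # Tune a gradient-boosted model with 5-fold cross-validated grid search
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    search = GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [100, 200], "max_depth": [2, 3]},
        cv=5,
        scoring="accuracy",
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)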

Confidential, Springfield, MA

Data Scientist/Data Engineer

Responsibilities:

  • Worked collaboratively with senior management to identify potential Machine Learning use cases and to set up a server-side development environment.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark, including SparkContext, Spark SQL, and pair RDDs.
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Worked on migrating MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
  • Mastered key facets of data investigation, including data wrangling, cleaning, sampling, management, exploratory analysis, regression and classification, prediction, and data communication.
  • Performed text analytics and text mining; developed the entire application as a service with a REST API using Flask (see the Flask sketch after this list).
  • Extensively used Python and Spark data science packages such as Pandas, NumPy, Matplotlib, SciPy, scikit-learn, and NLTK.
  • Used similarity measures such as Jaro distance, Euclidean distance, and Manhattan distance.
  • Performed entity tagging with the Stanford NER Tagger and used named entity recognition packages such as spaCy (see the spaCy sketch after this list).
  • Used Principal Component Analysis for dimensionality reduction of features.
  • Performed nested cross-validation to compare the performance of different models and utilized the results for model optimization (see the nested cross-validation sketch after this list).
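
A minimal sketch of exposing a text-mining step as a REST service with Flask, as referenced above; the endpoint name and the tokenizing logic are illustrative placeholders.

    # Serve a text-processing step over a small Flask REST API
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/tokens", methods=["POST"])
    def tokens():
        text = request.get_json().get("text", "")
        # Placeholder for the real text-mining pipeline
        return jsonify({"tokens": text.split()})

    if __name__ == "__main__":
        app.run(port=5000)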
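
A minimal sketch of named entity recognition with spaCy, assuming the small English model is installed (python -m spacy download en_core_web_sm); the sentence is a made-up example.

    # Tag named entities (ORG, GPE, DATE, ...) in a sentence with spaCy
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apple is opening a new office in Springfield in 2021.")

    for ent in doc.ents:
        print(ent.text, ent.label_)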
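
And a minimal sketch of nested cross-validation with scikit-learn: the inner loop tunes hyperparameters while the outer loop gives a less biased estimate of generalization error; the dataset and grid are illustrative.

    # Nested CV: GridSearchCV inside, cross_val_score outside
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)

    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)  # hyperparameter search
    scores = cross_val_score(inner, X, y, cv=5)             # outer performance estimate
    print(scores.mean())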

Confidential

Data Analyst

Responsibilities:

  • Analyzed data and turned it into actionable business insights and strategies.
  • Developed complex SQL queries to analyze and understand data and to bring data together from various systems.
  • Used joins such as inner joins and outer joins when creating tables from multiple tables.
  • Worked with SQL and PL/SQL procedures, functions, stored procedures, and packages.
  • Implemented indexes, statistics collection, and constraints while creating tables.
  • Developed SQL queries for retrieving data, updating data, database testing and data analysis.
  • Enhanced data collection procedures to include information relevant for building analytic systems; processed, cleansed, and verified the integrity of data used for analysis.
  • Used advanced Microsoft Excel features to create pivot tables, and used VLOOKUP and other Excel functions.
  • Built and maintained on-demand custom reports (ad-hoc) and scheduled reports in response to internal and external users.
  • Performed data transformations, Extracting, Transforming, and Loading (ETL) data using Informatica.
  • Applied data wrangling techniques to convert unstructured data into a structured format (see the pandas sketch after this list).
  • Developed reusable transformations and mapplets wherever reusability was needed.
  • Worked with various transformations, including Router, Joiner, Update Strategy, Lookup, Rank, Expression, Aggregator, Sequence Generator, and Sorter transformations.
  • Created ETL mappings using Informatica PowerCenter to move data from multiple sources such as flat files and Oracle into a common target area such as Data Marts and the Data Warehouse.
  • Developed mappings for the ETL team with source-to-target data mapping, including physical naming standards, data types, volumetrics, domain definitions, and corporate metadata definitions.
  • Developed SQL scripts to validate the data loaded into Data warehouse and Data Mart tables using ETL Informatica.
  • Performed daily data analysis and prepared reports on a daily, weekly, monthly, and quarterly basis.
  • Guided new team members, explaining the analysis process flow and the standards and structural layout followed.
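
A minimal sketch of the kind of data wrangling mentioned above, flattening semi-structured records into a tabular format with pandas; the field names are illustrative.

    # Flatten nested records into a flat table with pandas
    import pandas as pd

    records = [
        {"id": 1, "name": "Acme", "contact": {"city": "Springfield", "state": "MA"}},
        {"id": 2, "name": "Globex", "contact": {"city": "Boston"}},
    ]

    # json_normalize expands nested dictionaries into dotted columns
    df = pd.json_normalize(records)
    print(df)   # columns: id, name, contact.city, contact.state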

Confidential

Python Developer

Responsibilities:

  • Developed frontend and backend modules using Python on Django Web Framework.
  • Worked on designing, coding, and developing the application in Python using Django's MVT pattern.
  • Wrote functional API test cases for testing REST APIs with Postman and integrated them with Jenkins build scripts.
  • Used the Python library Beautiful Soup for web scraping to extract data for building graphs (see the scraping sketch after this list).
  • Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a primary source of data for both customers and the internal customer service team.
  • Created RESTful web services for Catalog and Pricing with Django MVT, MySQL, and MongoDB.
  • Developed Python APIs to dump the array structures in the processor at the failure point for debugging.
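
A minimal sketch of web scraping with Beautiful Soup, as referenced above; the URL and the HTML structure it assumes are illustrative placeholders.

    # Scrape every table row from a page into a list of lists
    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.com/stats").text   # placeholder URL
    soup = BeautifulSoup(html, "html.parser")

    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all("td")]
        for tr in soup.find_all("tr")
    ]
    print(rows)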
