
Machine Learning Engineer / Data Scientist Resume


SUMMARY

  • Over 10 years in IT enterprise infrastructure and the software development life cycle for various large organizations, using big data tools and architectures and cloud computing technologies such as AWS (EC2, RDS, NoSQL databases, S3, etc.), Apache Spark, and Hadoop. Data science experience building accurate machine learning models in Python.
  • Designed and developed various machine learning frameworks using Python and MATLAB.
  • Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to perform data extraction from various sources. Experience across the entire data science project life cycle, including data extraction, data collection, data cleaning, statistical modeling, and data visualization, with large datasets of structured and unstructured data.
  • Hands-on experience with machine learning algorithms such as Linear Regression, SVM, KNN, Random Forest, XGBoost, K-Means Clustering, Hierarchical Clustering, PCA, Feature Selection, Collaborative Filtering, and Ensembles.
  • Developed information extraction pipelines using libraries such as NLTK, spaCy, and word2vec, applying tokenization and named entity recognition to emails, tweets, and other social media content for sentiment analysis (a brief illustrative sketch follows this summary).
  • Ingested data from various sources into Hadoop and Cassandra using Kafka. Experienced in automating, configuring, and deploying instances on AWS, Azure, and data center environments; familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS. Deployed machine learning models with AWS SageMaker.
  • Automated deployment of machine learning models with a continuous integration and continuous deployment (CI/CD) model, automated machine learning pipelines with tools such as TensorFlow and Kubeflow, and deployed models in a microservice architecture.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Proficient in the design and development of Tableau dashboards and reports using visualizations such as bar graphs, scatter plots, pie charts, and geographic maps, making use of actions, local and global filters, cascading filters, context filters, quick filters, and parameters according to end-user requirements.
  • Experience with Amazon Web Services: S3 storage, AWS RDS (both NoSQL and relational databases), and most of the AWS architecture, with solid proficiency in Python, SQL, and machine learning tools/packages. Experience in Apache Spark and Kafka for big data processing, and in-depth, hands-on knowledge of the Big Data/Hadoop ecosystem (MapReduce, HDFS, Hive, and Sqoop).
  • Working experience with version control tools such as Git 2.x to coordinate work on files with multiple team members.
  • Experience implementing machine learning solutions within production environments, and in actively and skillfully conceptualizing, applying, analyzing, synthesizing, and evaluating information gathered from observation, experience, reflection, reasoning, or communication as a guide to belief and action.
  • Employed various SDLC methodologies such as Agile and Scrum.
  • Knowledge of working with Proofs of Concept (POCs) and gap analysis, gathering necessary data for analysis from different sources, and preparing data for exploration using data mining.
  • Applied a variety of machine learning techniques to increase identification of payment integrity issues for clients, reduce the cost of auditing processes, and increase the quality of care and outcomes for clients' members.
  • Implemented and communicated machine learning solutions within production environments at scale. Applied statistical methodology to solve business problems and appropriately interpreted meaning from results. Strong ability to summarize and effectively communicate complex concepts and varied data sets to inform stakeholders, gain approval, or prompt action.
  • Ability to communicate and interact with multiple audiences, from analyst to executive level; skilled in oral and written communication and multimedia presentation.
  • Good team player and quick learner; highly self-motivated with good communication and interpersonal skills.
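
As an illustration of the kind of entity-extraction and sentiment pipeline described above, here is a minimal sketch assuming spaCy's small English model and NLTK's VADER lexicon are installed; it is not the original production code.

```python
# Minimal sketch: assumes `python -m spacy download en_core_web_sm`
# and `nltk.download("vader_lexicon")` have been run beforehand.
import spacy
from nltk.sentiment import SentimentIntensityAnalyzer

nlp = spacy.load("en_core_web_sm")   # tokenization + named entity recognition
sia = SentimentIntensityAnalyzer()   # rule-based sentiment scorer

def analyze(text: str) -> dict:
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]  # extracted named entities
    sentiment = sia.polarity_scores(text)                    # neg/neu/pos/compound scores
    return {"entities": entities, "sentiment": sentiment}

print(analyze("Acme support was terrible, but the new Globex app is great."))
```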

TECHNICAL SKILLS

  • Big data: Apache Hadoop/MapReduce on YARN, HDFS, Apache Sqoop, Spark 2.2 (Spark SQL, PySpark, PySpark-ML), Kafka, Scala, and AWS for computing over large volumes of data
  • AWS: microservices architecture, EC2, S3, IAM, VPC, Athena, CloudFormation, SQS, SNS, CloudWatch, Kinesis, Lambda, API Gateway, DynamoDB, ECS
  • Machine learning / deep learning: Random Forests, SVM, RNN, ANN, TensorFlow, Keras, Scikit-Learn; data collection, data pre-processing, feature engineering, and model development and deployment
  • Python 2.x/3.x libraries: NumPy, SciPy, Pandas, Matplotlib, Seaborn, Beautiful Soup, NLTK, spaCy
  • Tools and environments: Jupyter Notebook, Tableau, SQL, Linux, Git, Microsoft Excel

PROFESSIONAL EXPERIENCE

Machine Learning Engineer / Data Scientist

Confidential

Responsibilities:

  • Used Logistic Regression and Random Forest algorithms to reduce the attrition rate, determine retention drivers, and suggest increases in the components that incentivize retention.
  • Developed a predictive model for the business problem of turnover, basing the prediction on Logistic Regression and Random Forest algorithms, and created an end-to-end business intelligence solution to ease decision making.
  • Migrated the data from various data sources to AWS S3 and built tables in AWS Athena by scheduling ETL jobs using AWS Glue to replace multiple data sources connected to Tableau resulting in process improvement and efficiency.
  • Worked with the Data Science Team Lead, data science management, business operations, and product management to assess the potential value and risks of business problems that could be solved using machine learning and AI techniques.
  • Developed an exploratory data analysis approach with the team lead to verify the initial hypotheses associated with potential AI/ML use cases.
  • Trained a machine learning model based on XGBoost, a decision-tree ensemble framework, in Python to forecast the number of reservations likely to be cancelled, facilitating better supply chain planning (a brief sketch of this approach appears after this list).
  • Created and developed a model to improve rental pricing at each location using the XGBoost regressor, improving current forecast accuracy.
  • Developed and designed program methods, processes, and systems to consolidate and analyze unstructured data from multiple sources, generating actionable insights, datasets, graphics, and executive-level reporting for mass distribution. Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the Scikit-learn package in Python.
  • Worked on sentiment analysis of social media messages, including both clustering and classification; used word2vec and deep learning NLP algorithms such as RNNs to get the best sentiment-analysis results, and performed information extraction from customer reviews and feedback. Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms), and ensured that text classification and sentiment models for unstructured and semi-structured data maintained a low false positive rate.
  • Developed machine learning models using Logistic Regression, SVC, Linear Regression, Random Forest, XGBoost, and KNN. Performed data imputation using the scikit-learn package in Python and created interactive analytic dashboards using Tableau. Conducted analysis of customer consumption behavior and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and Hierarchical Clustering.
  • Implemented and tested the model on AWS EC2; collaborated with the development team to find the best algorithm and parameters.
  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit analytical requirements. Used Python (NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, Seaborn) and Spark (PySpark, MLlib) for model development, and improved model accuracy using bagging and boosting techniques to minimize variance and improve predictive accuracy.
  • Optimized and adapted existing algorithms to leverage newly arriving data. Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior. Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
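
A minimal sketch of the XGBoost-based cancellation forecast described above; the CSV path, feature names, and hyperparameters are illustrative assumptions, not the original model.

```python
# Minimal sketch: column names and the data file are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

df = pd.read_csv("reservations.csv")  # hypothetical historical booking data
X = df[["lead_time_days", "party_size", "price", "day_of_week"]]
y = df["cancellations"]               # target: number of cancelled reservations

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```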

Data Scientist/ Machine Learning Engineer

Confidential

Responsibilities:

  • Worked on data processing for very large datasets: handling missing values, creating dummy variables, and dealing with various kinds of noise in the data.
  • Solved business problems including segmenting customers by purchasing behavior and modeling customer profitability and lifetime value; managed behavior-score risk modeling for consumer and business credit card products. Utilized visualizations such as histograms, bar plots, pie charts, scatter plots, and box plots to assess the condition of the data.
  • Performed data pre-processing tasks such as merging, sorting, finding outliers, missing-value imputation, and data normalization to make the data ready for statistical analysis.
  • Implemented ridge regression and subset selection methods to choose the most statistically significant variables for analysis. Used machine learning algorithms such as Linear Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, KNN, Decision Tree Regressor, SVM, Bagged Decision Trees, Random Forest, and XGBoost. Built a classification model in Python to predict the probability of a customer defaulting on credit card payments and improved the model's accuracy.
  • Evaluated the performance of customer discount optimization across millions of customers, using F-score, AUC/ROC, confusion matrix, and RMSE to compare model performance. Performed data visualization and designed dashboards with Tableau, generating complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
  • Predicted potential credit card defaulters with 84% accuracy using Random Forest (a brief sketch of this kind of classifier appears after this list). Provided expertise and consultation on consumer and small business behavior-score modeling, giving advice and guidance to risk managers using the models in their strategies. Participated in strategically critical analytic initiatives around customer segmentation, channel preference, and targeting/propensity scoring.
  • In partnership with Marketing, utilized machine learning to improve customer retention and product deepening across all financial products, including mortgages, cards, and auto. Worked closely with model risk governance, credit bureaus, external consultants, and compliance and regulatory response teams to ensure proper development and installation of performance reporting and model tracking.
  • Leveraged a broad stack of technologies (Python, Docker, AWS, Airflow, and Spark) to reveal the insights hidden within huge volumes of numeric and textual data. Built machine learning processes that monitor predictive models already deployed and provide feedback on how and where to improve them.
  • Deployed models as Python packages, as APIs for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers.
  • Worked closely with business stakeholders, financial analysts, data engineers, data visualization specialists, and other team members to turn data into critical information and knowledge that can be used to make sound organizational decisions.
  • Used creative thinking and proposed innovative ways to look at problems by using data mining (the process of discovering new patterns from large datasets) across a wide variety of data assets. Presented findings to the business, exposing assumptions and validation to business teams. Established and maintained effective working relationships with team members, with an innate curiosity about business processes, business strategy, and strategic initiatives, to help drive incremental business value from enterprise data assets. Implemented data pipelines utilizing Glue, Lambda, Spark, and Python.
  • Worked with data engineers to determine how best to source data, including identification of potential proxy data sources, and designed business analytics solutions considering current and future needs, infrastructure and security requirements, load frequencies, etc.
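
A minimal sketch of the Random Forest default-prediction classifier and the evaluation metrics mentioned above; the data file, feature names, and settings are illustrative assumptions, and the 84% figure comes from the original work, not from this sketch.

```python
# Minimal sketch: column names and the data file are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix

df = pd.read_csv("credit_card_accounts.csv")  # hypothetical account history
X = df[["credit_limit", "utilization", "payment_delays", "age", "income"]]
y = df["defaulted"]                            # 1 = defaulted on a payment

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced")
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)[:, 1]
print("Accuracy:", accuracy_score(y_test, pred))
print("AUC:", roc_auc_score(y_test, proba))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))
```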

Data Engineer/Data Scientist

Confidential

Responsibilities:

  • Responsible for conducting data-driven strategic analyses and developing internal decision-making tools/products to enable retail team decision making. Constantly looked for opportunities to develop machine learning and statistical models to automate business decision making, and created highly impactful solutions to complex problems alongside a talented team in a cross-functional (e.g. product, merchandising, supply chain) environment.
  • Derived actionable insights from massive data sets with limited oversight. Automated quarterly insights across native/partner channels for external-facing analyst sessions as well as internal tracking.
  • Built analytics expertise in multiple channels including display, website personalization, search, email, social, digital marketing and cross-channel attribution.
  • Maintained all the failure logs in Hadoop along with the process and history of applied solutions. The proactive model uses the log history to assess the nature of the fault and then applies the corresponding solution from the history.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Loaded web server log data into HDFS using Apache Flume. Designed and implemented custom writables, custom input formats, custom partitioners, and custom comparators in MapReduce.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files. Implemented UDFs, UDAFs, and UDTFs in Java for Hive to process data that cannot be handled by Hive's built-in functions.
  • Effectively used Oozie to develop automated workflows of Sqoop, MapReduce, and Hive jobs. Created Hive tables to store processed results in tabular format. Wrote MapReduce code that takes log files as input, parses the logs, and structures them in tabular format to facilitate effective querying of the log data.
  • Created External Hive Table on top of parsed data. Involved in gathering the requirements, designing, development, and testing.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Held weekly meetings with technical collaborators and actively participated in ETL code review sessions with senior and junior developers. Loaded and analyzed logs generated by different web applications.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, sequence, XML, and JSON. Wrote multiple MapReduce programs to perform data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
  • Provided input into the refinement of existing data sources and the collection of new ones to improve the development of insights and predictive models.
  • Developed a predictive model using historical and current data to identify customers interested in an email campaign.
  • Experience with gradient boosting algorithms such as XGBoost.
  • Developed information extraction pipelines using libraries such as NLTK, spaCy, and word2vec, applying tokenization and named entity recognition to emails, tweets, and other social media content for sentiment analysis.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy. Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with Scikit-learn preprocessing (a brief sketch follows this list).
  • Used Python (NumPy, SciPy, Pandas, Scikit-Learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction. Implemented, tuned, and tested models on AWS EC2 to find the best algorithm and parameters, and set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure. Designed and developed machine learning models in Apache Spark (MLlib) and used NLTK in Python for various machine learning algorithms. Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis. Communicated results to the operations team to support decision making, and collected data needs and requirements by interacting with other departments.
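
A minimal sketch of the label encoding, normalization, and feature intersection steps mentioned above; the DataFrame contents are illustrative assumptions.

```python
# Minimal sketch: the example data is hypothetical.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.DataFrame({
    "channel": ["email", "social", "display", "email"],  # categorical feature
    "spend":   [120.0, 340.5, 80.0, 210.0],              # numeric features
    "clicks":  [30, 55, 12, 41],
})

# Label-encode the categorical column.
df["channel_enc"] = LabelEncoder().fit_transform(df["channel"])

# Normalize the numeric columns to [0, 1].
df[["spend", "clicks"]] = MinMaxScaler().fit_transform(df[["spend", "clicks"]])

# Feature intersection: a simple interaction term between two features.
df["spend_x_clicks"] = df["spend"] * df["clicks"]

print(df)
```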

AWS Big Data Architect

Confidential

Responsibilities:

  • Experienced in the big data stack and big data architecture, from data collection, high-volume data storage, data lakes and data warehousing, and NoSQL and relational databases through to big data analytics, with technology stacks such as Spark, Cassandra, Hive, Spark SQL, Spark Streaming, Kafka, Kinesis, Sqoop, EMR, AWS Data Pipeline, etc.
  • Assessed business requirements for each project phase and monitored progress to consistently meet deadlines, standards, and cost targets. Implemented cost-saving initiatives to reduce infrastructure costs by consolidating databases, established development standards, and automated various database and infrastructure tasks.
  • Designed and implemented an Enterprise Data Lake Platform (EDP) for storing vast quantities of data in different formats, and defined capacity for storing data across a multi-node cluster at various phases.
  • Experience in Hadoop cluster maintenance, including data and metadata backups, file system checks, commissioning and decommissioning nodes, and upgrades; hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, Hadoop MapReduce, YARN, Zookeeper, Sentry, Sqoop, Flume, Hive, HBase, Pig, and Oozie.
  • Architected environments for Hadoop, HDFS, MapReduce, Hive, Pig, Spark, and Scala.
  • Performed import and export of data into HDFS and Hive using Sqoop.
  • Determined data compression formats for different data transport scenarios, such as Avro, bzip2, and Parquet.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing metrics from the existing infrastructure.
  • Worked on Agile Methodology.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in managing and reviewing Hadoop log files.
  • Architected and designed data flow pipelines to load and transform large structured, unstructured, and semi-structured datasets.
  • Designed and implemented import and export of data between HDFS and Hive using Sqoop.
  • Good understanding of Kafka messaging architecture and Kinesis, as well as RabbitMQ and similar systems.
  • Architected and designed cloud services on Amazon Web Services (AWS), and was involved in ETL, data integration, and migration.
  • Worked with developers to transform Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala, importing data from different sources such as HDFS and HBase into Spark RDDs (a minimal PySpark sketch follows this list).
  • Worked with developers to create multiple Hive tables, implementing partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.
  • Excellent experience in ETL analysis, design, development, testing, and implementation of ETL processes, including performance tuning and query optimization of databases.
  • Implemented Spark using PySpark, with Spark Streaming for real-time processing and Spark SQL for faster testing and processing of data.
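
A minimal sketch of translating a Hive/SQL query into Spark, as described above; the Hive table, column names, and output path are illustrative assumptions.

```python
# Minimal sketch: the table "web_logs" and its columns are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-spark-sketch")
    .enableHiveSupport()  # lets Spark SQL read existing Hive tables
    .getOrCreate()
)

# The Hive/SQL form of the query, run through Spark SQL...
daily = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM web_logs
    GROUP BY event_date
""")

# ...and the equivalent expressed as DataFrame transformations.
daily_df = (
    spark.table("web_logs")
    .groupBy("event_date")
    .count()
    .withColumnRenamed("count", "events")
)

# Write partitioned output for efficient downstream access.
daily_df.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/daily_events")
```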

Technologies: HDFS, CDH4, Kafka, Kinesis, Spark, Spark SQL, Spark Streaming, DynamoDB, Cassandra, Hive, Pig, Oozie, MapReduce, Java, Python, Sqoop, Oracle, Avro, and the Hortonworks and Cloudera Hadoop distributions
