Data Engineer Resume

New York, NY

SUMMARY

  • Data Engineer and Data Scientist with 7+ years of experience deriving insights from large datasets through analysis; strong exposure to Machine Learning and NLP projects.
  • Experience with Data Analysis and Business Intelligence tools, including Tableau and Power BI; experienced with spatial data.
  • Experience working with databases such as MongoDB, MS SQL Server, and MS Access.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services in the AWS family.
  • Proficient in building regression and classification models and in applying predictive modeling, feature engineering, hypothesis testing, A/B testing, and data visualization to deliver insights and implement action-oriented solutions to complex business problems.
  • Efficient in optimizing, debugging, and testing SQL queries, views, and stored procedures in SQL Server.
  • Proficient in reporting tools such as Tableau and Power BI.
  • Extensively worked in the mortgage industry.
  • Hands-on experience in various phases of software development life cycle (SDLC) including Analysis, Design, Development, Testing & Production
  • Implemented project plans and used Agile methodology to manage projects across several phases.
  • Expertise in transforming business requirements into analytical models, developing data mining, designing algorithms, and reporting solutions that scale across the massive volume of structured and unstructured data.
  • Strong experience in text mining: cleaning and manipulating text and developing topic models using TF-IDF, Word2Vec, GloVe, lemmatization, stop-word removal, and n-grams (a minimal sketch follows this summary).
  • Skilled in data preparation, exploratory analysis, Feature engineering, and parameter fine-tuning in supervised Machine Learning models.
  • Experienced in implementing linear and logistic regression, classification modeling, decision trees, clustering, time series analysis, NLP, dimensionality reduction, CNN, ANN, Random Forest, XGBoost, Naive Bayes, SVM, and association rule mining using Python.
  • Strong knowledge of and skill in statistical methodologies such as experimental design, hypothesis testing, Z-test, T-test, Chi-square test of independence, and ANOVA.
  • Experienced with Python packages such as Pandas, NumPy, SciPy, Scikit-learn, Plotly, and Matplotlib.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs such as MySQL.
  • Developed data visualizations using Python and R, and created dashboards with tools such as Tableau and Power BI.
  • Excellent communication and interpersonal skills; able to work effectively both as a team member and individually.
  • Experienced in complex systems analysis and working with large transactional data sets, transcriptions, and data visualization.
  • Tackling complex business obstacles with data-driven, action-oriented solutions and strategic innovation.
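
By way of illustration, a minimal sketch of the TF-IDF topic-modeling workflow referenced in this summary (the toy corpus, topic count, and parameters are assumptions for demonstration, not project data):

    # Hedged sketch: TF-IDF features factored into latent topics with NMF.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    docs = [
        "loan application approved after income verification",
        "mortgage rate locked before the closing date",
        "customer disputed an escrow charge on the statement",
    ]

    # Tokenize, drop English stop words, keep unigrams and bigrams.
    tfidf = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = tfidf.fit_transform(docs)

    # Factor the TF-IDF matrix into 2 latent topics.
    nmf = NMF(n_components=2, random_state=42).fit(X)

    terms = tfidf.get_feature_names_out()
    for idx, topic in enumerate(nmf.components_):
        top = [terms[i] for i in topic.argsort()[-5:][::-1]]
        print(f"topic {idx}: {top}")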

TECHNICAL SKILLS

Languages: Python, R, SQL, MATLAB, Scala

Statistical Analysis: Python, R, MATLAB

ML Algorithms: Linear Regression, Logistic Regression, Principal Component Analysis (PCA), K-means, Random Forest, Decision Trees, SVM, KNN, Deep Learning (CNN, RNN), and Ensemble Methods

Python Packages: Scikit-learn, NumPy, Pandas, Plotly, Keras, NLTK, Matplotlib, Seaborn, SciPy

Databases & Streaming: MongoDB, MS SQL Server, MS Access, Kafka

BI Tools: Tableau, Power BI

Deep Learning: ktrain, Keras, TensorFlow, PyTorch

Version Control: Bitbucket, GitHub

OS: Windows, macOS, Linux

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Data Engineer

Responsibilities:

  • Designed and implemented data pipelines to extract, transform, and load data into Snowflake and MongoDB databases.
  • Designed and set up an Enterprise Data Lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
  • Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda.
  • Developed ETL workflows to automate data processing and analysis.
  • Designed and implemented database schemas and models for efficient data storage and retrieval.
  • Worked collaboratively with cross-functional teams to understand business requirements and deliver solutions that meet their needs.
  • Worked efficiently within Agile methodology, participating in daily Scrum and sprint ceremonies.
  • Handled Agile administrative tasks such as conducting meetings, facilitating collaboration, and removing hurdles affecting project progress.
  • Communicated with team members about evolving requirements and planning.
  • Implemented AWS Step Functions to automate and orchestrate SageMaker tasks such as publishing data to S3, training the ML model, and deploying it for prediction (see the sketch at the end of this section).
  • Conducted performance tuning to optimize data processing and query performance.
  • Reused existing Python modules and rewrote them to deliver data in the required formats.
  • Built database Models, APIs, and Views utilizing Python, to build an interactive web-based solution.
  • Worked on object-oriented programming (OOP) concepts using Python and Linux.
  • Implemented automated monitoring and alerting systems to detect issues and ensure data quality.
  • Utilized GitHub and JIRA to manage code and track progress on projects.
  • Participated in code reviews and provided feedback to other team members.

Environment: AWS EMR, S3, RDS, Redshift, Lambda, Python, SageMaker, R, SQL, MongoDB, SQL Server, JIRA, GitHub
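
A hedged sketch of the Step Functions orchestration described above: a state machine that runs a SageMaker training job and registers the resulting model. The ARNs, image URIs, and bucket names are placeholder assumptions, not the production definition; the service integrations (createTrainingJob.sync, createModel) are standard Step Functions patterns.

    # Assumption-laden sketch: train a SageMaker model, then register it.
    import json
    import boto3

    definition = {
        "StartAt": "TrainModel",
        "States": {
            "TrainModel": {
                "Type": "Task",
                # .sync waits for the training job to complete.
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    "TrainingJobName.$": "$.job_name",
                    "AlgorithmSpecification": {
                        "TrainingImage": "<training-image-uri>",  # placeholder
                        "TrainingInputMode": "File",
                    },
                    "RoleArn": "<sagemaker-role-arn>",            # placeholder
                    "OutputDataConfig": {"S3OutputPath": "s3://<bucket>/models/"},
                    "ResourceConfig": {
                        "InstanceType": "ml.m5.large",
                        "InstanceCount": 1,
                        "VolumeSizeInGB": 10,
                    },
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
                },
                # Keep the original input alongside the training result.
                "ResultPath": "$.training",
                "Next": "CreateModel",
            },
            "CreateModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createModel",
                "Parameters": {
                    "ModelName.$": "$.job_name",
                    "PrimaryContainer": {
                        "Image": "<inference-image-uri>",         # placeholder
                        "ModelDataUrl.$": "$.training.ModelArtifacts.S3ModelArtifacts",
                    },
                    "ExecutionRoleArn": "<sagemaker-role-arn>",   # placeholder
                },
                "End": True,
            },
        },
    }

    boto3.client("stepfunctions").create_state_machine(
        name="sagemaker-train-and-register",   # hypothetical name
        definition=json.dumps(definition),
        roleArn="<step-functions-role-arn>",   # placeholder
    )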

Confidential, Roseland, NJ

Lead Chatbot Python Developer / Machine Learning Engineer

Responsibilities:

  • Designed and maintained context variables in the MongoDB database for various uses, including the analytics dashboard and the chatbot model.
  • Collected data from various data warehouses, built a pipeline in Airflow to fetch the required data, and performed transformations such as aggregating different sources, filtering data, and mapping to the target database (see the DAG sketch at the end of this section).
  • Deployed and monitored scalable infrastructure on Amazon Web Services (AWS), managed servers and configuration with Ansible, created instances in AWS, and migrated data to AWS from the data center.
  • Used data processing pipeline to get more intuitive results, also prepared the data to facilitate other analyses as requested.
  • Designed and maintained Papermill workflows that pass hyper-parameters into a designated Jupyter notebook and execute it.
  • Designed and developed ETL processes with PySpark in AWS Glue to migrate data from S3 and generate reports; wrote and scheduled Databricks jobs using Airflow.
  • Maintained and designed data pipeline in Airflow and debugged failed tasks.
  • Presented analytics dashboard to stakeholders and explained metrics.
  • Reviewed, suggested, and designed context variables to include in the model by using feature engineering techniques, such as PCA.
  • Assisted and enhanced classifier to better identify user intent from user utterances.
  • Maintained and implemented UI changes in the analytics dashboard per request and aggregated and updated related data sources.
  • Implemented project plans and used Agile methodology to manage projects across several phases.

Environment: AWS, Python, PySpark, Airflow, SQL, Databricks, AWS S3, AWS Athena, and AWS EMR
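
A minimal Airflow sketch of the extract/transform/load pattern referenced in this section. The DAG id, schedule, and task bodies are illustrative assumptions; the real pipeline aggregated warehouse sources and mapped them to the target database.

    # Minimal sketch of the Airflow pipeline pattern (Airflow 2.x).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Pull rows from the upstream warehouse (stubbed here).
        return [{"user_id": 1, "utterance": "track my order"}]

    def transform(**context):
        rows = context["ti"].xcom_pull(task_ids="extract")
        # Aggregate sources, filter, and map fields to the target schema.
        return [r for r in rows if r["utterance"]]

    def load(**context):
        rows = context["ti"].xcom_pull(task_ids="transform")
        print(f"loading {len(rows)} rows into the target database")

    with DAG(
        dag_id="chatbot_context_pipeline",  # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2 >> t3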

Confidential, Gibbsboro, NJ

Data Scientist/Machine Learning

Responsibilities:

  • Queried data from multiple tables such as sales, promotions, and events; transformed the data to the required format; and delivered it to the data science group.
  • Assisted with target-audience analysis and a recommendation system generating top recommendations for potential clients.
  • Used data processing pipeline to get more intuitive results and achieve better control.
  • Led analysis of end-user requirements and preparation of data to facilitate analysis; collected data from the warehouse, built a pipeline to fetch the required data, and performed transformations such as aggregating different sources, filtering data, and mapping to the target database.
  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to extract data to fit analytical requirements.
  • Designed a collector for customer-related data and tweets, storing them in JSON files.
  • Prepared and stored data for machine learning and predictive analytics.
  • Performed data pre-processing, normalization, and feature scaling, removed duplicate rows, outliers, and aggregated tables, and stored them in Pandas Data frames.
  • Analyzed data using SQL, Python, R, and Scala, and presented analytical reports to technical teams.
  • Used text mining and LDA topic modeling to extract relevant features for analyzing user sentiment toward different products offered by organizations.
  • Built a text-classification model for sentiment analysis using classical machine learning models such as Logistic Regression, SVM, KNN, Random Forest, and ensemble methods (see the sketch at the end of this section).
  • Segmented the customers based on demographics using K-means Clustering.
  • Used Python and R for programming and continuous improvement of the model.
  • Used cross-validation to test the model on different batches of data and tuned hyper-parameters to find the best settings, which boosted model performance.
  • Worked in Motorola's lab to perform testing, development, and integration of software for mobile devices (Android tablets and phones).

Environment: Python, Git, NumPy, Pandas, Scikit-learn, SciPy, Matplotlib, Seaborn, ETL
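
A minimal sketch of the sentiment-classification approach referenced above: TF-IDF features feeding a classical classifier. The toy labeled data is an assumption for demonstration only.

    # Hedged sketch: TF-IDF + Logistic Regression for sentiment analysis.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = [
        "love this product, works great",
        "terrible quality, broke in a week",
        "excellent support and fast shipping",
        "worst purchase I have made",
    ]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    print(model.predict(["great value, very happy"]))  # expect [1]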

Confidential, Princeton, NJ

Data scientist/ Machine Learning

Responsibilities:

  • Developed computer vision models for visual inspection, anomaly detection, root cause analysis, and predictive maintenance to improve efficiency and reduce cost and safety risk across various medical device production stages.
  • Developed and optimized various ML/DL models to improve the manufacturing process and deliver high-quality products to customers.
  • Executed data cleaning, extraction, manipulation, and analysis on massive datasets of structured and unstructured data.
  • Optimized models and datasets to reduce bias in results.
  • Used libraries such as NumPy, Pandas, SciPy, and Scikit-learn across the full model and algorithm development cycle: train, tune, validate, and deploy (see the sketch at the end of this section).
  • Coordinated with different functional teams to implement models and monitor outcomes
  • Integrated complex data and transformations in the database.
  • Developed novel statistical models to analyze and interpret data generated by medical devices.
  • Built machine learning algorithms for clinical decision support using Python, R, and MATLAB; used time series modeling to provide medical outcome prediction and intervention solutions.
  • Resolved real-time failures in mobile devices by maintaining and optimizing internal Android packages (UI modules, etc.) and reducing running time, improving efficiency from 40% to 90% across carriers; submitted and merged more than 500 Python scripts.
  • Worked on OCR tasks for UI-related issues; improved accuracy with a spell checker that replaces incorrect words with correct ones during automated testing.
  • Interacted and collaborated with other teams at Motorola and with the carriers; ensured comprehensive test coverage by working closely with product and engineering teams to prioritize test execution, and reported on execution progress and results.
  • Participated in code reviews and performed sanity testing on different mobile platforms in Jenkins; helped pull requested logs for issues and worked with developers until they were resolved.

Environment: Python 3.x, NumPy, Scikit-learn, Pandas, Matplotlib, Seaborn, SciPy
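
A minimal scikit-learn sketch of the train/tune/validate cycle referenced in this section; the synthetic dataset and parameter grid are assumptions standing in for the proprietary manufacturing data.

    # Hedged sketch: tune with cross-validation, validate on held-out data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Tune hyper-parameters with 5-fold cross-validation.
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
        cv=5,
    )
    search.fit(X_train, y_train)

    # Validate the best model before deployment.
    print("best params:", search.best_params_)
    print("test accuracy:", search.score(X_test, y_test))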

Confidential, Jersey City, NJ

Data Scientist

Responsibilities:

  • Analyzed experiment results and provided insights and suggestions based on the analysis.
  • Used sentiment analysis to optimize individual pricing to maximize customer satisfaction.
  • Experimented with statistical methods such as A/B and multivariate testing to analyze the impact and correlation of price changes with users' behavioral patterns (see the sketch at the end of this section).
  • Worked closely with the marketing, design, product, and engineering teams to effectively communicate the information generated by various models and optimize the user experience.
  • Used and optimized various ML/DL models to predict key growth obstacles and provide actionable insights on targeting and personalization problems, including user uplift modeling, recommendations, personalized promotions, and targeting optimization.
  • Developed ML/DL models that identified hotspot areas by extracting news sources and weather-forecast maps.
  • Performed data cleaning, manipulation, and visualization on various structured and unstructured datasets using Python packages.
  • Developed and implemented ML/DL models using Logistic Regression, Naive Bayes, Random Forest, and KNN to gain insights into customer behavior.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Gathered all the data required from multiple data sources and created the datasets used in the analysis.
  • Extracted knowledge from notes using NLP (Python, NLTK, MLlib, PySpark).
  • Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data.

Environment: Python 3.x, NumPy, SciPy, Scikit-learn, Pandas, Matplotlib, Seaborn
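
A hedged sketch of the A/B-testing analysis referenced in this section, comparing conversion between control and treatment with a two-proportion z-test. The counts are made-up illustrations, not experiment results.

    # Two-proportion z-test on illustrative A/B conversion counts.
    import math
    from scipy.stats import norm

    conv_a, n_a = 180, 2000  # control: conversions, users (assumed)
    conv_b, n_b = 225, 2000  # treatment, e.g., a new price point (assumed)

    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided test
    print(f"lift={p_b - p_a:.3f}, z={z:.2f}, p={p_value:.4f}")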

Confidential, New York, NY

Data scientist

Responsibilities:

  • Actively developed predictive models and strategies for effective fraud detection in credit and customer banking activity using K-means clustering (see the sketch at the end of this section).
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forest, K-means, and KNN for data analysis.
  • Used Python's Matplotlib package to visualize and graphically analyze the data.
  • Performed data pre-processing and split the identified dataset into a training set and a test set.
  • Performed data wrangling to clean, transform, and reshape the data using the pandas library.
  • Performed data cleaning, wrangling, manipulation, and visualization; extracted data from relational databases and performed complex data manipulations; conducted extensive data checks to ensure data quality.
  • Used R and Python to graphically analyze the data and perform data mining; also built and analyzed datasets using Python, MATLAB, and R.
  • Handled importing data from various data sources and performed transformations; analyzed transaction data and developed analytical insights using statistical modeling in Python.
  • Analyzed the performance of image segmentation models and of recurrent neural networks on data over time.
  • Used Python's NumPy and Pandas packages to perform dataset manipulation.
  • Used data quality validation techniques to validate data and identify various anomalies.
  • Processed Steam game descriptions to discover latent features incorporated as item-side data.
  • Worked extensively with statistical analysis tools; adept at writing code in advanced Excel, R, MATLAB, and Python.
  • Detected and classified required container images using Python.
  • Used Python packages to train machine-learning models.
  • Performed preliminary research on big data tools such as databases and assessed their advantages and disadvantages.
  • Extensively used the open-source tools RStudio (R), Spyder (Python), and Jupyter Notebooks for statistical analysis and building machine learning models.

Environment: R, Python, Spyder, Jupyter Notebook, RStudio, Tableau, Pandas, SciPy, AWS (EC2, RDS, S3), Matplotlib, Scikit-learn, Hadoop
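
A minimal sketch of the clustering-based fraud screening referenced in this section: transactions far from their nearest K-means centroid are flagged for review. The synthetic features and the 99th-percentile threshold are assumptions.

    # Hedged sketch: K-means distance-to-centroid as an anomaly score.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Columns: amount, hour-of-day (illustrative features only).
    normal = rng.normal([50, 14], [20, 3], size=(500, 2))
    odd = np.array([[4000, 3], [3500, 4]])  # large, late-night outliers
    X = np.vstack([normal, odd])

    Xs = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Xs)

    # Distance from each point to its assigned centroid.
    dist = np.linalg.norm(Xs - km.cluster_centers_[km.labels_], axis=1)
    flagged = np.where(dist > np.percentile(dist, 99))[0]
    print("flagged rows:", flagged)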

Confidential, Monmouth Junction, NJ

Data Analyst

Responsibilities:

  • Enhanced data collection procedures to include information relevant for building analytic systems, and created value from data by applying advanced analytics and statistical techniques to deepen insights, improve solution architecture, efficiency, maintainability, and scalability, and generate predictions and recommendations.
  • Worked with the ETL team to document the transformation rules for data migrated from OLTP to the warehouse environment for reporting purposes.
  • Worked closely with a data architect to review all conceptual, logical, and physical database design models with respect to functions, definitions, and maintenance; supported data analysis, data quality, and the ETL design that feeds the logical data models.
  • Maintained and developed complex SQL queries, stored procedures, views, functions, and reports that meet customer requirements using Microsoft SQL Server.
  • Performed data mining using state-of-the-art methods.
  • Created automated anomaly detection systems and continuously tracked their performance (see the sketch at the end of this section).
  • Supported Sales and Engagement management's planning and decision-making on sales incentives.
  • Used statistical analysis, simulations, and predictive modeling to analyze information and develop practical solutions to business problems.
  • Developed several types of sub-reports, drill-down reports, summary reports, and parameterized reports.
  • Developed reports, visualizations, and dashboards based on the insights, mainly using Tableau, for the company's insight teams.

Environment: SQL Server, Tableau, R, Python, Spyder, Jupyter Notebook
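
A minimal pandas sketch of the automated anomaly-detection checks referenced above: flag points more than three rolling standard deviations from the rolling mean. The synthetic series and window size are assumptions.

    # Hedged sketch: rolling z-score anomaly flagging.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    s = pd.Series(rng.normal(100, 5, size=200))
    s.iloc[120] = 160  # injected anomaly

    roll = s.rolling(window=30, min_periods=30)
    z = (s - roll.mean()) / roll.std()

    anomalies = s[z.abs() > 3]
    print(anomalies)  # should surface the injected point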
