
Data Scientist/ Data Engineer Resume


New Jersey

PROFESSIONAL SUMMARY:

  • Over 6 years of experience in all phases of diverse technology projects, specializing in Data Science and Machine Learning.
  • Data Scientist, Data Analyst, and Machine Learning Engineer experienced in manipulating and deriving insights from large sets of structured, semi-structured, and unstructured data.
  • Good understanding of delivery processes such as Agile.
  • Experience in data analysis, data migration, and data modeling activities.
  • Design effective statistical/predictive models for diverse datasets to predict risk factors.
  • High-level knowledge of statistical machine learning algorithms (regression, decision trees, clustering). Able to develop solutions to complex data problems, use exploratory analysis to identify abnormalities in data, and apply the appropriate set of algorithms.
  • In-depth knowledge of Data Analysis, Data Validation, Data Cleaning, Data Verification, and identifying data mismatches. Expertise in Relational Data Modeling (3NF) and Dimensional Data Modeling.
  • Strong experience using MS Excel and MS Access to import and analyze data based on business needs.
  • Experience in Python development and scientific programming, using NumPy and Pandas for data manipulation.
  • Experience using Scikit-Learn and StatsModels in Python for Machine Learning and Data Mining.
  • Snowflake Core and AWS Cloud certified, with hands-on experience on the Snowflake data platform: Multi-Cluster Warehouses, building Snowpipe, loading data from the local file system and AWS S3 buckets, Data Sharing, Database, Schema, and Table structures, and Snowflake Clone and Time Travel.
  • In-depth understanding of Snowflake Multi-Cluster sizing and credit usage.
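
The Snowflake features above (Snowpipe, cloning, Time Travel) can be illustrated with a short SQL sketch; the stage, pipe, and table names here are hypothetical, not taken from any actual project:

```sql
-- Hypothetical names; an illustrative sketch of the Snowflake features described above.
CREATE STAGE my_s3_stage URL = 's3://my-bucket/data/';   -- external stage over an S3 bucket

CREATE PIPE my_pipe AUTO_INGEST = TRUE AS                -- Snowpipe: continuous data load
  COPY INTO raw.events FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');

CREATE TABLE events_clone CLONE raw.events;              -- zero-copy clone for testing

SELECT * FROM raw.events AT (OFFSET => -60*60*24);       -- Time Travel: the table 24 hours ago
```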

TECHNICAL SKILLS:

Expertise: Scikit-learn, NLTK, Spacy, NumPy, SciPy, OpenCV, Deep learning, NLP, Matplotlib, Microsoft Visual Studio, Microsoft Office.

Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, K-Means Clustering, Support Vector Machines, Gradient Boosting Machines & XGBoost, Neural Networks.

Supervised Models: Linear Models, Tree-based Models

Unsupervised Models: PCA, Clustering

Data Analysis Skills: Data Cleaning, Data Visualization, Feature Selection, Pandas.

Operating Systems: Windows, macOS, Linux, Unix.

Programming Languages & Frameworks: Python, HTML, CSS, SQL, R, MATLAB, Java, Apache Spark, Hadoop, Spark ML.

Other Programming Knowledge and Skills: Elasticsearch, Data Scraping, RESTful-API using Django Web Framework.

Tools: AWS, Tableau, Anaconda.

Hard Skills: RStudio, Python, Statistics, Microsoft Excel, Data Visualization (Matplotlib, Seaborn, Plotly, Pandas, NumPy), Base SAS

PROFESSIONAL EXPERIENCE

Confidential, New Jersey

Data Scientist/ Data Engineer

Responsibilities:

  • Working as a Data Scientist on projects that apply neural networks, machine learning, and deep learning algorithms to analyze customer data and improve company profit. Currently developing a reverse image search engine over more than 200k images and applying NLP techniques for product recommendation.
  • Tools applied: Python, TensorFlow, Keras, Tableau, R.
  • Extracted N-grams (bi-grams and tri-grams) and their corresponding sentiments using spaCy.
  • Analyzed product reviews at scale through an algorithm that identifies key themes, enabling buyers, suppliers, and quality inspectors to make actionable decisions around product quality risk mitigation and improved customer satisfaction.
  • Performed sentiment analysis on real K-Mart data as part of an applied research experience using Machine Learning and NLP.
  • Applied EDA, Machine Learning, and NLP techniques to successfully associate positive, neutral, and negative sentiments with each product.
  • The machine learning system learned from all positive, neutral, and negative reviews, and the algorithm was fine-tuned to avoid biased sentiments.
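
A minimal sketch of the bi-/tri-gram extraction described above. spaCy was used in the actual pipeline; the whitespace tokenizer and sample review here are simplified stand-ins.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(text, n, k=5):
    """Count the most frequent n-grams. Naive whitespace tokenization stands in
    for spaCy's tokenizer used in the real pipeline."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n)).most_common(k)

review = "very good quality very good price"
print(top_ngrams(review, 2))  # the bi-gram ('very', 'good') appears twice
```

In practice the extracted n-grams were then scored for sentiment; this sketch only covers the extraction step.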

Confidential, MA

AWS Snowflake Data Engineer

Responsibilities:

  • Played a key role in migrating Teradata objects into the Snowflake environment.
  • Experience with Snowflake Multi-Cluster Warehouses and Virtual Warehouses, and in building Snowpipe.
  • Created Snowpipe for continuous data load.
  • Used Temporary and Transient tables on different datasets. Worked with both Maximized and Auto-scale functionality.
  • Used COPY to bulk-load data. Involved in various data migration activities.
  • Created internal and external stages and transformed data during load.
  • Used the FLATTEN table function to produce a lateral view of VARIANT, OBJECT, and ARRAY columns.
  • Good understanding of and related experience with Hadoop stack internals, Hive, Pig, and MapReduce.
  • Design and implement disaster recovery for the PostgreSQL Database.
  • Expertise in analyzing data quality checks using shell scripts; upgraded, installed, and configured PostgreSQL servers.
  • PostgreSQL installation, configuration, migration, upgrades, and patches; server versions ranged from PostgreSQL 8.2 to 9.5.
  • Set up full CI/CD pipelines so that each commit a developer makes goes through the standard software lifecycle and is tested thoroughly before it can reach production.
  • Managed datasets using pandas DataFrames and MySQL. Ran database queries using the Python MySQL connector and retrieved information using MySQLdb.
  • Involved in developing a linear regression model to predict a continuous measurement, improving observation of wind turbine data; developed using Spark with the Scala API.
  • Used Spark and Spark SQL to read the Parquet data and create tables in Hive using the Scala API.
  • Generated various graphical capacity-planning reports using Python packages such as NumPy and Matplotlib.
  • Built various graphs for business decision-making using the Python Matplotlib library.
  • Used Python Pandas for data wrangling, cleansing, and enrichment.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote Ant build scripts (build.xml) to build the entire Java web service project.
  • Used XML parsers to parse and fetch information from XML templates.
  • Exported result sets from Hive to MySQL using shell scripts.
  • Used Git for version control. Developed financial reports and dashboards for forecasting, trending, and results analysis.
  • Develop MapReduce/Spark modules for machine learning & predictive analytics in Hadoop on AWS.
  • Strong SQL programming skills; developed stored procedures, triggers, functions, and packages.
  • Exploratory data analysis, handling missing data, data wrangling, feature scaling, outlier analysis, and development of algorithms in R.
  • Utilized Excel for data pre-processing (Pivot Tables, VLOOKUP), created ANOVA sheets and regressions, and performed hypothesis testing using the Data Analysis add-in in Excel.
  • Adept at using SAS Enterprise suite, Python, and Big Data related technologies including knowledge in Hadoop, Hive, Map-Reduce.
  • Experienced in the full Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Strong SQL Server and Python programming skills with experience in working with functions.
  • Efficient in developing Logical and Physical Data model and organizing data as per the business requirements using Sybase Power Designer, ER Studio in both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) applications.
  • Experience in designing star and snowflake schemas for Data Warehouse and Operational Data Store designs.
  • Experience in data integration validation and data quality controls for ETL processes and data warehousing using Informatica.
  • Work with and extract data from various database sources like Oracle, SQL Server, and DB2.
  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
  • Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
  • Experience in the design and development of Tableau visualization solutions.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end-users to understand the data on the fly using quick filters for on-demand information.
  • Prepared proof of concept for best practices in Tableau deployment.
  • Extensively used ETL to transfer and extract data from source files (Flat files and DB2) and load the data into the target database.
  • Responsible for developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Informatica PowerCenter.
  • Stayed focused on innovation projects: developed Python scripts to connect to and download files from team sites and to upload Excel data to Teradata (pyodbc, Pandas, and NumPy), presented file-handling utilities (upload, download, move, rename, remove, etc.) on GitHub, and continued learning on AWS and a Machine Learning specialization.
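
The linear regression work above ran on Spark with the Scala API; as a minimal stand-alone sketch of the same technique, ordinary least squares can be solved directly with NumPy (the feature matrix and coefficients below are toy values, not turbine data):

```python
import numpy as np

# Toy stand-in for the turbine features/target; the real model ran on Spark with Scala.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.0   # known coefficients, so the fit is checkable

# Add an intercept column and solve the ordinary least-squares problem.
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [1.0, 2.0, 0.5]: intercept, then the two slopes
```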

Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, Query Analyzer, AWS Cloud, XML, JSON, Scala, GitLab, Git version control, JIRA, Python 3.8, SciPy, Pandas, AWS, RStudio, Linux, MySQL, Tableau, GitHub, PostgreSQL, Informatica PowerCenter 9.x/8.x

Confidential, Washington D.C.

Data Engineer/Data Analyst/Data Scientist

Responsibilities:

  • Worked in areas of Data Sharing in Snowflake; in-depth knowledge of Snowflake Database, Schema, and Table structures.
  • Experience using Snowflake Clone and Time Travel. Used Temporary and Transient tables on different datasets. Cloned production data for code modifications and testing.
  • Used Time Travel to go back up to 56 days and recover missed data. Combined data from multiple datasets to provide a comprehensive picture and analysis of client usage and trends.
  • Involved in loading data from the edge node to HDFS using shell scripting.
  • Responsible for all backup, recovery, and upgrading of all of the PostgreSQL databases. Monitoring databases to optimize database performance and diagnosing any issues.
  • Extensive experience with Warm Standby (PostgreSQL 8.x and earlier), and Hot Standby (PostgreSQL 9.x and greater).
  • Setup and maintenance of Postgres master-slave clusters utilizing streaming replication
  • Installing and Configuring PostgreSQL from source or packages on Linux machines. Experience designing database structures, indexes, views, partitioning.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed gap analysis.
  • Compiled data from various sources to perform complex analysis for actionable results.
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Transformed data from various sources, organized data, and extracted features from raw and stored data.
  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Performed Exploratory Data Analysis and data visualizations using R, Python, and Tableau.
  • Created Tableau dashboards/reports for data visualization, Reporting, and Analysis and presented them to businesses.
  • Extracted data using SQL queries and then analyzed it using R. Performed exploratory data analysis, handled missing data, wrangled data, and scaled features.
  • Assisted the analytics project lifecycle - data extraction, design, and implementation of scalable machine learning algorithms and documentation of the results.
  • Created rates utilizing statistical analysis which helped determine peak and off-peak periods of sales. Designed and developed NoSQL solutions for all users
  • Managed and administered all NoSQL database systems. Suggested the latest upgrades and technologies for NoSQL databases.
  • Implemented various statistical modeling algorithms like decision trees, linear and logistic regression models, clustering (K means). Worked on various data formats.
  • Experimented with other algorithms like Random Forests and Principal Component Analysis.
  • Helped integrate the effort of both technical and non-technical resources across the business.
  • Worked with data warehousing methodologies and dimensional data modeling techniques such as Star/Snowflake schemas using ERwin 9.1.
  • Performed Exploratory Data analysis (EDA) to find and understand interactions between different fields in the dataset, for dimensionality reduction, to detect outliers, summarize main characteristics and extract important variables graphically.
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
  • Worked on writing and reading data in CSV and Excel file formats. Proficient in object-oriented programming, design patterns, algorithms, and data structures.
  • Implemented Python scripts to update content in databases and manipulate files.
  • Experience using Python libraries for machine learning (pandas, NumPy, Matplotlib, scikit-learn, SciPy) to load, summarize, and visualize datasets, evaluate algorithms, and make predictions.
  • Worked with Python modules such as urllib, urllib2, and requests for web crawling. Experience using ML techniques: clustering, regression, classification, and graphical models.
  • Carried out regression, K-means clustering, and decision trees, along with data visualization reports for management, using R.
  • Implemented classification algorithms such as Logistic Regression, KNN, and Random Forests to predict customer churn and customer interface.
  • Performed data visualization and designed dashboards using Tableau; generated reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
  • Involved in debugging and troubleshooting issues and fixed many bugs in two of the main applications.
  • Designed data warehouses on platforms such as AWS Redshift, Azure SQL Data Warehouse, and other high-performance platforms.
  • Knowledge of extracting and synthesizing data from Azure (Data Lake Storage (ADLS), Blob Storage, SQL DW, SQL Server) and legacy systems (Oracle and its companion data lake storage).
  • Implemented preprocessing procedures and deployment using AWS services, creating virtual machines with EC2.
  • Developed various clustering algorithms for market segmentation to analyze customer behavior patterns.
  • Good knowledge in Exploratory data analysis and performed data wrangling and data visualization.
  • Validated data to check for proper conversion, identified and cleaned unwanted data, and profiled data for accuracy, completeness, and consistency.
  • Prepared standard reports, charts, graphs, and tables from structured data sources by querying data repositories using Python and SQL.
  • Developed and produced dashboards and key performance indicators, and monitored organizational performance.
  • Working on improving existing machine learning algorithms to extract data points accurately. Completed customer segmentation of small and large businesses.
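
The customer segmentation above relied on clustering; a toy sketch of the underlying technique is Lloyd's k-means algorithm, shown here in pure Python on made-up "spend" coordinates (the data and k=2 are illustrative, not from the engagement):

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Minimal Lloyd's algorithm for k-means on 2-D points (toy sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0])**2 + (p[1] - centers[c][1])**2)
            clusters[j].append(p)
        # Recompute each center as the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    labels = [min(range(k),
                  key=lambda c: (p[0] - centers[c][0])**2 + (p[1] - centers[c][1])**2)
              for p in points]
    return centers, labels

# Two obvious "segments": low spenders near (1, 1), high spenders near (10, 10).
pts = [(1, 1), (1.5, 2), (0.5, 1.2), (10, 10), (9.5, 11), (10.5, 9.8)]
centers, labels = kmeans(pts)
```

In practice a library implementation (e.g. scikit-learn's KMeans, which the skills list covers) would be used instead of this hand-rolled loop.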

Environment: R Programming, Python, Jupyter, SQL Server 2014, SQL Server Management Studio, Web services, Tableau 8.4, Shell Scripting, PostgreSQL
