Data Scientist / Data Engineer Resume
SUMMARY
- Over 6 years of experience across all phases of diverse technology projects, specializing in Data Science and Machine Learning.
- Data Scientist, Data Analyst, and Machine Learning Engineer experienced in manipulating and deriving insights from large sets of structured, semi-structured, and unstructured data.
- Good understanding of delivery processes such as Agile.
- Experience in data analysis, data migration, and data modeling activities.
- Design effective statistical/predictive models for diverse datasets to predict risk factors.
- Strong knowledge of statistical machine learning algorithms. Able to develop solutions to complex data problems, use exploratory analysis to identify abnormalities in data, and implement the appropriate algorithms (regression, decision trees, clustering; see the sketch after this summary).
- In-depth knowledge of Data Analysis, Data Validation, Data Cleaning, Data Verification, and identifying data mismatches. Expertise in Relational Data Modeling (3NF) and Dimensional Data Modeling.
- Strong experience using MS Excel and MS Access to extract and analyze data based on business needs.
- Experience in Python development and scientific programming, including NumPy and Pandas for data manipulation.
- Experience using Scikit-learn and Statsmodels in Python for machine learning and data mining.
- SnowPro Core certified, with a good understanding of the Snowflake data platform. Experience with Snowflake Multi-Cluster Warehouses and building Snowpipe; loading data from local systems and AWS S3 buckets; in-depth knowledge of Data Sharing in Snowflake and of Snowflake database, schema, and table structures; and experience using Snowflake Clone and Time Travel.
- In-depth understanding of Snowflake cloud technology, including Multi-Cluster warehouse sizing and credit usage.
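
A minimal sketch of the regression and tree-based modeling summarized above, using Scikit-learn on synthetic data (the data, features, and parameter choices are illustrative assumptions, not taken from any specific project):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 500 samples, 4 features, linear signal plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a linear model and a tree-based model, then compare test R^2.
for model in (LinearRegression(), DecisionTreeRegressor(max_depth=5)):
    model.fit(X_train, y_train)
    print(type(model).__name__, round(r2_score(y_test, model.predict(X_test)), 3))
```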
TECHNICAL SKILLS
Expertise: Scikit-learn, NLTK, spaCy, NumPy, SciPy, OpenCV, deep learning, NLP, Matplotlib, Microsoft Visual Studio, Microsoft Office.
Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, K-Means Clustering, Support Vector Machines, Gradient Boosting Machines & XGBoost, Neural Networks.
Supervised Models: Linear Models, Tree-based Models
Unsupervised Models: PCA, Clustering
Data Analysis Skills: Data Cleaning, Data Visualization, Feature Selection, Pandas.
Operating Systems: Windows, macOS, Linux, Unix.
Programming Languages & Frameworks: Python, HTML, CSS, SQL, R, MATLAB, Java; Apache Spark, Hadoop, Spark ML.
Other Programming Knowledge and Skills: Elasticsearch, data scraping, RESTful APIs using the Django web framework.
Tools: AWS, Tableau, Anaconda.
PROFESSIONAL EXPERIENCE
Confidential
Data Scientist/ Data Engineer
Responsibilities:
- Working as a Data Scientist on projects that apply neural networks, machine learning, and deep learning algorithms to customer data in order to improve company profit. Currently developing a reverse image search engine over more than 200k images and applying NLP techniques for product recommendation.
- Tools: Python, TensorFlow, Keras, Tableau, R
- Platforms: Google Cloud, Anaconda, AWS SageMaker
- From Yelp reviews of TS Restaurants, located in Hawaii, the objective was to extract aspects and opinion indicators using the TextBlob, NLTK, and spaCy libraries.
- After data preprocessing, the polarity and subjectivity of each sentence were determined using TextBlob (see the first sketch after this list).
- Extracted n-grams (bigrams and trigrams) using spaCy, along with their corresponding sentiments.
- Analyzed product reviews at scale through an algorithm that identifies key themes, enabling buyers, suppliers, and quality inspectors to make actionable decisions around product quality risk mitigation and improved customer satisfaction.
- Performed sentiment analysis on real Kmart data as part of applied research experience using machine learning and NLP. Performed EDA and applied machine learning and NLP techniques; successfully associated positive, neutral, and negative sentiments with each product in Kmart's catalog.
- The machine learning system learned from all positive, neutral, and negative reviews, and the algorithm was fine-tuned to avoid biased sentiment.
- Applied time series, neural network, and machine learning tools such as LSTM, cluster analysis, and SARIMA/SARIMAX models to compare the TS restaurant chain with its competitors in terms of trend, seasonality, and forecast, improving the business and providing insights (see the SARIMAX sketch after this list).
- The data was obtained from social media websites, and sentiment analysis of the reviews was also performed. The work has been submitted for publication.
Tools: Python (TensorFlow, Keras), Tableau.
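
A minimal sketch of the review-mining workflow above, assuming the TextBlob and spaCy libraries with the small English model installed; the sample review text is invented for illustration:

```python
import spacy
from textblob import TextBlob

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

review = "The poke bowl was fresh and delicious, but the service was slow."

# Sentence-level polarity (-1..1) and subjectivity (0..1) via TextBlob.
blob = TextBlob(review)
print(blob.sentiment.polarity, blob.sentiment.subjectivity)

# Bigrams built from spaCy tokens (alphabetic tokens only), each scored
# with TextBlob to attach a sentiment indicator to the n-gram.
tokens = [t.text.lower() for t in nlp(review) if t.is_alpha]
for bigram in zip(tokens, tokens[1:]):
    phrase = " ".join(bigram)
    print(phrase, TextBlob(phrase).sentiment.polarity)
```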
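A minimal sketch of seasonal forecasting with statsmodels' SARIMAX, as referenced in the trend/seasonality bullet above; the monthly series is synthetic and the model orders are placeholders:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly sales: linear trend plus a yearly seasonal cycle.
idx = pd.date_range("2017-01-01", periods=48, freq="MS")
seasonal = 10 * np.sin(2 * np.pi * np.arange(48) / 12)
sales = pd.Series(100 + 0.5 * np.arange(48) + seasonal, index=idx)

# Placeholder (p, d, q)(P, D, Q, s) orders; in practice these would be
# chosen via AIC comparison and residual diagnostics.
model = SARIMAX(sales, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit(disp=False)

print(results.forecast(steps=12))  # 12-month-ahead forecast
```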
Confidential, Burlington MA
Snowflake Certified Data Analyst/Data Engineer/Data Scientist
Responsibilities:
- Played a key role in migrating Teradata objects into the Snowflake environment.
- Experience with Snowflake Multi-Cluster Warehouses.
- Experience with Snowflake Virtual Warehouses.
- Experience in building Snowpipe.
- Created Snowpipe for continuous data loading.
- Used the COPY command to bulk load data.
- Created internal and external stages and transformed data during load.
- Used the FLATTEN table function to produce a lateral view of VARIANT, OBJECT, and ARRAY columns (see the Snowflake sketch after this list).
- Worked with both Maximized and Auto-scale functionality.
- Good understanding of, and related experience with, Hadoop stack internals, Hive, Pig, and MapReduce.
- Designed and implemented disaster recovery for the PostgreSQL database.
- Expertise in analyzing data quality checks using shell scripts; upgraded, installed, and configured PostgreSQL servers.
- PostgreSQL installation, configuration, migration, upgrades, and patches; server versions ranged from PostgreSQL 8.2 to 9.5.
- Set up full CI/CD pipelines so that each commit a developer makes goes through the standard software lifecycle and is tested well enough before it reaches production.
- Managed datasets using Pandas DataFrames and MySQL. Queried the database with the Python MySQL connector and retrieved information using MySQLdb.
- Involved in developing a linear regression model to predict a continuous measurement on wind turbine data, implemented using Spark with the Scala API (a PySpark sketch follows this list).
- Used Spark and Spark SQL to read the parquet data and create tables in Hive using the Scala API.
- Generated various capacity planning reports (graphical) using Python packages such as NumPy and Matplotlib.
- Built various graphs for business decision-making using the Python Matplotlib library.
- Used Python and the Pandas library for data wrangling, cleansing, and enrichment (see the Pandas sketch after this list).
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Wrote Ant build scripts (build.xml) to build the entire Java web service project.
- Used XML parsers to parse and fetch information from XML templates.
- Exported result sets from Hive to MySQL using shell scripts.
- Used Git for version control. Developed financial reports and dashboards for forecasting, trending, and results analysis.
- Developed MapReduce/Spark modules for machine learning and predictive analytics in Hadoop on AWS.
- Strong SQL programming skills; developed stored procedures, triggers, functions, and packages.
- Exploratory data analysis, handling missing data, data wrangling, feature scaling, outlier analysis, and development of algorithms in R.
- Utilized Excel for data pre-processing (Pivot Tables, VLOOKUP), created ANOVA sheets and regressions, and performed hypothesis testing using the Data Analysis add-in in Excel.
- Adept at using the SAS Enterprise suite, Python, and Big Data technologies, including Hadoop, Hive, and MapReduce.
- Experience with the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
- Strong SQL Server and Python programming skills with experience in working with functions.
- Efficient in developing logical and physical data models and organizing data per business requirements using Sybase PowerDesigner and ER/Studio, in both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) applications.
- Experience in designing star and snowflake schemas for Data Warehouse and Operational Data Store applications.
- Experience in data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio, SSIS, SSAS, and SSRS.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, and DB2.
- Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
- Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
- Experience in the design and development of Tableau visualization solutions.
- Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly with quick filters for on-demand information.
- Prepared proof of concept for best practices in Tableau deployment.
- Worked on requirements analysis and physical and logical design development using Erwin, normalization, and SQL Server Enterprise Manager.
- Migrated DTS packages and converted them to SSIS packages in SQL Server 2008.
- Extensively used ETL to transfer and extract data from source files (Flat files and DB2) and load the data into the target database.
- Responsible for developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Informatica PowerCenter.
- Focused on innovation projects: developed Python scripts to connect to and download files from team sites and to upload Excel data to Teradata (using pyodbc, Pandas, and NumPy), and presented file-handling efficiency utilities (upload, download, move, rename, remove files, etc.) on GitHub. Continuing learning on AWS and the Machine Learning specialization.
Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, Query Analyzer, AWS Cloud, XML, JSON, Scala, GitLab, Git version control, JIRA, Python 3.8, SciPy, Pandas, AWS, RStudio, Linux, MySQL, Tableau, GitHub, PostgreSQL, Informatica PowerCenter 9.x/8.x
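
A minimal sketch of the Snowflake bulk-load and FLATTEN steps above, using the snowflake-connector-python package; the connection parameters, stage, table, and column names are hypothetical:

```python
import snowflake.connector

# Hypothetical connection parameters.
conn = snowflake.connector.connect(
    user="USER", password="PASSWORD", account="ACCOUNT",
    warehouse="WH", database="DB", schema="PUBLIC",
)
cur = conn.cursor()

# Bulk load staged JSON files into a table with a VARIANT column
# (the same COPY works for an internal or an external/S3 stage).
cur.execute("COPY INTO raw_events FROM @events_stage FILE_FORMAT = (TYPE = JSON)")

# FLATTEN produces a lateral view over a VARIANT array: one output row
# per array element, addressable through f.value.
cur.execute("""
    SELECT e.payload:id::number AS event_id,
           f.value:name::string AS item_name
    FROM raw_events e,
         LATERAL FLATTEN(input => e.payload:items) f
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```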
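The wind-turbine regression above was built with Spark's Scala API; the following is an equivalent PySpark sketch (a deliberate language swap so all examples stay in Python), with a hypothetical parquet path, table, and column names:

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("turbines").enableHiveSupport().getOrCreate()

# Read the raw parquet data and register it as a Hive table.
df = spark.read.parquet("/data/turbines.parquet")
df.write.mode("overwrite").saveAsTable("analytics.turbines")

# Assemble feature columns into a vector and fit a linear regression
# predicting a continuous power-output measurement.
assembler = VectorAssembler(
    inputCols=["wind_speed", "rotor_rpm", "ambient_temp"],
    outputCol="features",
)
lr = LinearRegression(featuresCol="features", labelCol="power_output")
model = lr.fit(assembler.transform(df))

print(model.coefficients, model.intercept)
```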
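A minimal sketch of the Pandas wrangling, cleansing, and enrichment steps above; the file paths and column names are hypothetical:

```python
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Cleansing: drop exact duplicates, normalize text, fill missing amounts.
orders = orders.drop_duplicates()
orders["region"] = orders["region"].str.strip().str.upper()
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Enrichment: join a customer lookup, then derive a monthly period column.
customers = pd.read_csv("customers.csv")
enriched = orders.merge(customers, on="customer_id", how="left")
enriched["order_month"] = enriched["order_date"].dt.to_period("M")

print(enriched.groupby("order_month")["amount"].sum().head())
```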
Confidential
Data Engineer/Data Analyst/Data Scientist
Responsibilities:
- In-depth knowledge of Data Sharing in Snowflake.
- In-depth knowledge of Snowflake database, schema, and table structures.
- Experience in using Snowflake Clone and Time Travel.
- Used temporary and transient tables on different datasets.
- Cloned Production data for code modifications and testing.
- Used Time Travel to go back up to 56 days to recover missed data (see the Snowflake sketch after this list).
- Combined data from multiple datasets to provide a comprehensive picture and analysis of client usage and trends.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Responsible for backup, recovery, and upgrades of all PostgreSQL databases; monitored databases to optimize performance and diagnose issues.
- Extensive experience with Warm Standby (PostgreSQL 8.x and earlier) and Hot Standby (PostgreSQL 9.x and later).
- Set up and maintained Postgres master-slave clusters using streaming replication.
- Installed and configured PostgreSQL from source or packages on Linux machines; experienced in designing database structures, indexes, views, and partitioning.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas)
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes (see the classification sketch after this list).
- Responsible for performing Machine-learning techniques regression/classification to predict the outcomes.
- Participated in all phases of data mining, data cleaning, data collection, developing models, validation, visualization, and performed Gap Analysis.
- Compiled data from various sources to perform complex analysis for actionable results.
- Utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Transformed data from various sources, organized data, and extracted features from raw and stored data.
- Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Performed exploratory data analysis and data visualization using R, Tableau, and Python.
- Published dashboard reports to Tableau Server so the developed dashboards could be navigated on the web.
- Created Tableau dashboards/reports for data visualization, Reporting, and Analysis and presented them to businesses.
- Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
- Extracted data using SQL queries and then performed data analysis using R.
Environment: R, Python, Jupyter, SQL Server 2014, SQL Server Management Studio, web services, Tableau 8.4, shell scripting, PostgreSQL
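
A minimal sketch of the Clone and Time Travel usage above via snowflake-connector-python; the names are hypothetical, and the 56-day lookback assumes the table's data retention period is configured at least that long:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    user="USER", password="PASSWORD", account="ACCOUNT",
    warehouse="WH", database="DB", schema="PUBLIC",
)
cur = conn.cursor()

# Zero-copy clone of production data for code changes and testing.
cur.execute("CREATE OR REPLACE TABLE sales_dev CLONE sales")

# Time Travel: query the table as it existed 56 days ago
# (OFFSET is in seconds; 56 days = 4838400 seconds).
cur.execute("SELECT COUNT(*) FROM sales AT(OFFSET => -4838400)")
print(cur.fetchone())

# Restore rows that existed 56 days ago but are missing from the live table.
cur.execute("""
    INSERT INTO sales
    SELECT * FROM sales AT(OFFSET => -4838400)
    EXCEPT
    SELECT * FROM sales
""")

cur.close()
conn.close()
```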
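A minimal sketch of the supervised classification comparison above (Logistic Regression, Decision Trees, KNN, Naive Bayes), using Scikit-learn on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "knn": KNeighborsClassifier(n_neighbors=7),
    "naive_bayes": GaussianNB(),
}
# 5-fold cross-validated accuracy for each model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(name, round(scores.mean(), 3))
```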
Confidential
Data Analyst/Data Scientist
Responsibilities:
- Assisted across the analytics project lifecycle: data extraction, design and implementation of scalable machine learning algorithms, and documentation of results.
- Created rates using statistical analysis, which helped determine peak and off-peak sales periods.
- Designed and developed NoSQL solutions for all users
- Managed and administered all NoSQL database systems
- Suggested the latest upgrades and technologies for NoSQL databases.
- Evaluated system performance and validated NoSQL solutions.
- Managed and maintained Oracle and NoSQL databases in the production domain
- Performed customer data analysis for further modifying and designing the rates.
- Identified root causes of problems and facilitated the implementation of cost-effective solutions with all levels of management.
- Implemented various statistical modeling algorithms such as decision trees, linear and logistic regression models, and clustering (K-means); worked with various data formats (see the K-means sketch after this list).
- Experimented with other algorithms such as Random Forests and Principal Component Analysis.
- Helped integrate the effort of both technical and non-technical resources across the business.
- Worked with data warehousing methodologies and dimensional data modeling techniques such as star/snowflake schemas using ERwin 9.1.
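
A minimal sketch of the K-means clustering referenced above, on synthetic data with an illustrative choice of k = 3:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic 2-D data drawn around three separated centers.
rng = np.random.default_rng(0)
usage = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in (0.0, 5.0, 10.0)])

# Standardize features, then cluster with k = 3.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(StandardScaler().fit_transform(usage))

print(np.bincount(labels))   # cluster sizes
print(km.cluster_centers_)   # centers in standardized feature space
```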