- 6+ years of experience in Data Mining, Machine Learning, and Spark development with large structured and unstructured datasets.
- Delivered scalable ML/AI solutions related to predictive analytics for large, complex datasets on cloud platforms such as Azure, GCP, and AWS.
- Developed commercially viable PoCs that required training, testing, and deploying analytical solutions hosted in hybrid, scalable cloud environments.
- Contributed to automating the Feature Engineering and Machine Learning components of an auto-ML product.
- Experienced in Software development and testing using Python and Java.
- Experienced in Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization. Proficient in statistical programming languages such as R and Python.
- Proficient in managing the entire data science project life cycle, with active involvement in all of its phases.
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Deep understanding of Statistical Modeling, Multivariate Analysis, model testing, problem analysis, model comparison, and validation.
- Skilled in data parsing, data manipulation, and data preparation, including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
- Experience in using various packages in R and libraries in Python.
- Working knowledge of Hadoop, Hive, and NoSQL databases such as Cassandra and HBase.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis; good knowledge of Recommender Systems.
- Good industry knowledge, analytical and problem-solving skills, and the ability to work well both within a team and individually.
- Highly creative, innovative, committed, intellectually curious, and business-savvy, with effective communication and interpersonal skills.
- Quick to learn and to adapt to a new work pace.
- Expertise: Scikit-learn, NLTK, spaCy, NumPy, SciPy, OpenCV, Deep Learning, NLP, RNN, CNN, TensorFlow, Keras, Matplotlib, Microsoft Visual Studio, Microsoft Office.
- Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, K-Means Clustering, Support Vector Machines, Gradient Boosting Machines & XGBoost, Neural Networks.
- Data Analysis Skills: Data Cleaning, Data Visualization, Feature Selection, Pandas.
- Operating Systems: Windows, macOS, Linux, and Unix.
- Programming Languages and Frameworks: Python, SQL, R, MATLAB, Torch, C, C++, Java, Octave, Apache Spark, Hadoop, Spark ML.
- Other Programming Knowledge and Skills: Elasticsearch, Data Scraping, RESTful APIs using the Django web framework.
- Tools: Toad, Erwin, AWS, Azure, D3, MuleSoft, Alteryx, Tableau, Shiny, Adobe Analytics, Anaconda.
Confidential, Irving, TX
Machine Learning Engineer
- Implemented K-means clustering, KNN, and Random Forest to analyze historical data, surface patterns of attacks, and incorporate preemptive actions into the model, achieving 97% accuracy.
- Trained machine learning and deep learning models to perform detection and segmentation with an accuracy of 60%.
- Built the company’s data pipelines and developed machine learning applications according to requirements.
- Developed an auto-ML product: a simplified Machine Learning platform that lets developers with limited ML experience train high-quality models.
- Reduced time spent on data pre-processing by 60-70% by developing an automatic Feature Engineering product that performs data pre-processing and cleansing.
- Reduced ML complexity for developers through the DMP (Digital Model Predictor) product, which automatically runs ML to return the top 5 best-fit models for the given data.
- Performed Data Ingestion to extract data from multiple sources using Kafka.
Environment: Machine learning, AWS, Spark, HDFS, Hive, Pig, Linux, MySQL, Eclipse, PL/SQL, SQL connector.
Confidential, New York City, NY
Machine Learning Engineer
- Analyzed the trading mechanism for real-time transactions and built collateral management tools.
- Compiled data from various sources to perform complex analysis for actionable results.
- Utilized machine learning algorithms such as Linear Regression, Multivariate Regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
- Measured the efficiency of the Hadoop/Hive environment to ensure SLAs were met.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Built Spark from source and ran Pig scripts on Spark rather than as MapReduce jobs for better performance.
- Analyzed the system for new enhancements and functionality, and performed impact analysis of the application for implementing ETL changes.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs. Used TensorFlow to train the model on thousands of examples drawn from the data.
- Designed, developed, and optimized SQL code (DDL/DML).
- Built performant, scalable ETL processes to load, cleanse, and validate data.
- Expertise in data archival, data migration, ad-hoc reporting, and coding with SAS in UNIX and Windows environments.
- Tested and debugged SAS programs against the test data.
- Processed the data in SAS for the given requirement using SAS programming concepts.
- Imported and exported data between SAS and Excel and delimited text files such as .TXT (tab-delimited) and .CSV (comma-delimited) using PROC IMPORT and PROC EXPORT, creating SAS datasets for analysis.
- Expertise in producing RTF, PDF, and HTML files using the SAS ODS facility.
- Provided support for data processes, including monitoring data, profiling database usage, troubleshooting, tuning, and ensuring data integrity.
- Participated in the full software development lifecycle, including requirements, solution design, development, QA implementation, and product support, using Scrum and other Agile methodologies.
- Collaborated with team members and stakeholders in the design and development of the data environment.
- Learned new tools and skill sets as needs arose.
- Prepared associated documentation for specifications, requirements, and testing.
- Optimized the TensorFlow model for efficiency.
- Used TensorFlow for text summarization.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed Kafka producer and consumers for message handling.
- Responsible for analyzing multi-platform applications using Python.
- Used Storm as an automated mechanism to analyze large volumes of non-unique data points with low latency and high throughput.
- Developed MapReduce jobs in Python for data cleaning and data processing.
Environment: Machine learning, AWS, MS Azure, Cassandra, SAS, Spark, HDFS, Hive, Pig, Linux, Anaconda Python, MySQL, Eclipse, PL/SQL, SQL connector, SparkML.
- Involved in all phases of the data science project life cycle, including data extraction, data cleaning, transformation, and visualization.
- Responsible for data identification, collection, exploration, and cleaning for modeling.
- Performed data cleaning, feature scaling, featurization, and feature engineering.
- Queried and aggregated data from SQL Server, Oracle 10g, and MySQL databases to get sample datasets.
- Performed exploratory analysis to gain insight into the data and spot anomalies using Pandas and Matplotlib.
- Ensured data accuracy and treated missing values using NumPy and Pandas.
- Used Principal Component Analysis to analyze high-dimensional data during feature engineering and to eliminate unrelated features.
- Utilized a variety of machine learning methods, including classification, regression, dimensionality reduction, and clustering techniques.
- Segmented customers with clustering algorithms based on their behavioral and geographical data to improve targeted marketing.
- Identified groups of customers being neglected by the pricing algorithm and used hierarchical clustering to improve customer segmentation.
- Designed and developed recommendation models to recommend products to customers using content-based and collaborative filtering.
- Developed NLP models for sentiment analysis of customer reviews.
- Performed text analytics on review data using the Natural Language Toolkit (NLTK).
- Wrote software to clean and investigate large, messy datasets of numerical and textual data.
- Addressed overfitting and underfitting by tuning the algorithm's hyperparameters and applying L1 and L2 regularization.
- Created various types of data visualizations using Matplotlib and Tableau to convey the results to other data and marketing teams.
- Used Predictive Analytics to analyze the shopping behavior of the customers.
- Responsible for establishing a detailed program specification through interaction with clients.
Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Spark MLlib, Tableau, SQL, Linux, Git, Microsoft Excel, PySpark, Spark SQL, Logistic Regression, Random Forests, Decision Trees, t-SNE, PCA, TensorFlow, K-Means, Natural Language Toolkit.