Data Scientist Resume
Newark, NJ
SUMMARY
- 7+ years of professional experience in Statistical Modeling, Machine Learning, Conversational AI, and Data Visualization.
- Expertise in transforming business resources and requirements into manageable data formats and analytical models; designing algorithms, building models, and developing data mining and reporting solutions that scale.
- Proficient in managing the entire data science project life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Statistical Modeling, Testing and Validation, and Visualization and Reporting.
- Proficient in machine learning algorithms such as Linear Regression, Ridge, Lasso, Elastic Net Regression, Decision Trees, and Random Forests, as well as more advanced algorithms (ANN, CNN, RNN) and ensemble methods (Bagging, Boosting, Stacking).
- Built chatbots using RASA for more than 4 years; worked with chatbots on several projects, with good experience in feature extraction and analysis methods.
- Excellent performance in model validation and model tuning, with model selection, K-fold cross-validation, the hold-out scheme, and hyperparameter tuning by Grid Search and HyperOpt (see the sketch after this list).
- Advanced experience with Python (2.x, 3.x) and its libraries such as NumPy, Pandas, Scikit-learn, XGBoost, LightGBM, Keras, Matplotlib, and Seaborn.
- Strong knowledge of statistical methodologies such as Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Monte Carlo Sampling, and Time Series Analysis.
- Strong experience using R to develop machine learning models and perform hypothesis testing.
- Basic knowledge of Hadoop and Spark, and experience with Big Data tools such as PySpark, Pig, and Hive.
- Experience building machine learning solutions using PySpark for large data sets on Hadoop systems.
- Experience using Amazon Web Services (AWS) cloud services, including EC2, S3, AWS Lambda, and EMR.
- Proficient at building and publishing interactive Tableau reports and dashboards with design customizations based on stakeholders' needs.
- Experienced in RDBMS such as SQL Server 2012 and Oracle 9i/10g, and NoSQL databases like MongoDB and DynamoDB.
- Expert in SQL, writing queries, temp tables, CTEs, stored procedures, user-defined functions, views, and indexes.
- Responsible for creating ETL packages, migrating data from flat files and MS Excel, cleaning data and backing up data files, and synchronizing daily transactions using SSIS.
- Quick learner in new business industries and software environments, delivering solutions adapted to new requirements and challenges.
- Knowledge of and experience with Git/GitHub version control tools.
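A minimal sketch of the validation-and-tuning workflow named above, using scikit-learn's GridSearchCV with 5-fold cross-validation; the synthetic dataset and the parameter grid are illustrative assumptions, not taken from any project listed here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data standing in for a real project dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Illustrative hyperparameter grid to search over
param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}

# 5-fold cross-validation combined with an exhaustive grid search
cv = KFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=cv, scoring="roc_auc", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```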
TECHNICAL SKILLS
Deep learning frameworks and libraries: TensorFlow, Keras, TFLearn, NumPy, Matplotlib, Seaborn, Pandas, Scikit-learn, NLTK, Kubeflow, Vertex AI; Conversational AI.
ML and DL algorithms: Linear and Logistic Regression, Support Vector Machines, K-Means, Decision Trees, Random Forests, single- and multi-layer perceptrons, CNNs, chatbots, ensemble models, YOLO, object detection, feature extraction, SSD, RASA, etc.
Programming languages and tools: Python, C++, MATLAB, MySQL, Visual Basic, Xcode, Jupyter, Spyder, OpenCV, Tableau, Google Cloud Platform.
Operating Systems: Microsoft Windows, Mac, Linux.
PROFESSIONAL EXPERIENCE
Confidential - Newark, NJ
Data Scientist
Responsibilities:
- Worked as a Data Scientist; developed and deployed predictive models for analyzing customer churn and retention.
- Performed data extraction, data manipulation, and data analysis on TBs of structured and unstructured data.
- Developed machine learning models using Logistic Regression, Naïve Bayes, Random Forest and KNN.
- Performed data imputation using Python's Scikit-learn package (see the sketch after this list).
- Created interactive analytic dashboards using Tableau
- Conducted analysis assessing customer consumption behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and Hierarchical Clustering (see the sketch after this list).
- Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Performed sentiment analysis, capturing customer sentiment and categorizing positive, negative, angry, and happy customers from feedback forms.
- Ensured the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Integrated SAS datasets into Excel using Dynamic Data Exchange; used SAS to analyze data and produce statistical tables, listings, and graphs for reports.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Used MLlib, Spark's machine learning library, to build and evaluate different models (see the sketch after this list).
- Performed data management tasks such as merging, concatenating, and interleaving SAS datasets using MERGE, UNION, and SET statements in the DATA step and PROC SQL.
- Used SAS to read, write, import, and export data in other file formats, including delimited files, spreadsheets, and Microsoft Excel and Access tables.
- Communicated with team members, leadership, and stakeholders on findings to ensure models were well understood and incorporated into business processes.
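A minimal sketch of the Scikit-learn imputation step mentioned above; the column names and the median strategy are illustrative assumptions:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame with missing values; hypothetical churn-model features
df = pd.DataFrame({"tenure": [1.0, None, 12.0, 7.0],
                   "monthly_spend": [29.9, 54.1, None, 40.0]})

# Replace missing numeric values with the column median
imputer = SimpleImputer(strategy="median")
df[["tenure", "monthly_spend"]] = imputer.fit_transform(df[["tenure", "monthly_spend"]])
print(df)
```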
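A minimal sketch of the RFM-based segmentation with K-Means described above; the toy RFM table and the choice of k=3 are illustrative assumptions:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical RFM table: recency (days), frequency (orders), monetary (total spend)
rfm = pd.DataFrame({"recency": [5, 40, 200, 10, 90],
                    "frequency": [20, 5, 1, 15, 3],
                    "monetary": [900.0, 250.0, 40.0, 700.0, 120.0]})

# Standardize so no single RFM dimension dominates the distance metric
scaled = StandardScaler().fit_transform(rfm)

# K-Means segmentation; in practice k would be chosen via the elbow method
rfm["segment"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)
print(rfm)
```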
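A minimal sketch of model building and evaluation with Spark's MLlib as mentioned above; the toy churn rows and feature columns are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("churn-sketch").getOrCreate()

# Toy churn rows; a real pipeline would read TBs from HDFS or S3
df = spark.createDataFrame(
    [(1.0, 29.9, 0.0), (12.0, 54.1, 0.0), (2.0, 80.0, 1.0), (3.0, 75.5, 1.0)],
    ["tenure", "monthly_spend", "churned"])

# Assemble feature columns into the single vector column MLlib expects
assembler = VectorAssembler(inputCols=["tenure", "monthly_spend"], outputCol="features")
train = assembler.transform(df)

model = LogisticRegression(featuresCol="features", labelCol="churned").fit(train)
auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(model.transform(train))
print(f"AUC: {auc:.3f}")
```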
Environment: Python, R, MS Excel, Perl, MS SQL Server, HIPAA, EDI, Power BI, Tableau, T-SQL, ETL, MS Access, XML, JSON, MS Office 2010, Outlook.
Confidential - Austin, TX
Data Analyst
Responsibilities:
- Performed statistical modeling with ML to derive insights from data under the guidance of the Principal Data Scientist.
- Performed data ingestion with Sqoop and Flume.
- Used SVN to commit changes into the main EMM application trunk.
- Understood and implemented text mining concepts, graph processing, and semi-structured and unstructured data processing.
- Worked with Ajax API calls to communicate with Hadoop through an Impala connection and SQL to render the required data; these API calls are similar to Microsoft Cognitive API calls.
- Ran MapReduce programs on nodes in the cluster.
- Developed multiple MapReduce jobs in Scala for data cleaning and pre-processing.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading data from MySQL to HBase using Sqoop where necessary.
- Developed Hive queries for analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to a database.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Gained hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
- Used Hive to partition and bucket data (see the sketch after this list).
- Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
- Wrote Pig scripts to perform ETL procedures on the data in HDFS.
- Created HBase tables to store data in various formats coming from different portfolios.
- Worked on improving the performance of existing Pig and Hive queries.
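A minimal sketch of the partitioning and bucketing described above, expressed with Spark's DataFrame writer against a Hive metastore rather than raw Hive DDL; the table and column names are illustrative assumptions:

```python
from pyspark.sql import SparkSession

# Hive-enabled session; assumes a Hive metastore is configured
spark = (SparkSession.builder.appName("hive-sketch")
         .enableHiveSupport().getOrCreate())

# Toy sales rows standing in for data landed via Sqoop/Flume
sales = spark.createDataFrame(
    [(1, "2016-01-01", 25.0), (2, "2016-01-01", 40.0), (1, "2016-01-02", 10.0)],
    ["customer_id", "sale_date", "amount"])

# Partition by date and bucket by customer_id, mirroring the Hive table layout
(sales.write.mode("overwrite")
      .partitionBy("sale_date")
      .bucketBy(8, "customer_id")
      .sortBy("customer_id")
      .saveAsTable("sales_bucketed"))

# Reporting metric over the partitioned table; the date filter prunes partitions
spark.sql("""
    SELECT sale_date, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales_bucketed
    WHERE sale_date >= '2016-01-01'
    GROUP BY sale_date
    ORDER BY sale_date
""").show()
```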
Confidential
SQL Developer
Responsibilities:
- Created and modified SQL stored procedures, triggers, user-defined functions, views, and cursors.
- Performed performance tuning of stored procedures using the Database Tuning Advisor and SQL Profiler.
- Produced ad hoc reports per clients' requests.
- Built and modified SSIS packages using BIDS 2012.
- Built complex queries using recursive CTEs (see the sketch after this list).
- Documented both business and application processes using Microsoft Visio.
- Built SSIS configuration files to adhere to deployment processes.
- Used SSIS Script Tasks extensively to load data from Excel files into database tables.
- Used SVN as the main source control repository.
- Modified the web interface to adhere to business logic pertaining to the leasing of company assets.
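A minimal sketch of a recursive CTE like those described above, run from Python via pyodbc; the connection string, table, and column names are hypothetical:

```python
import pyodbc

# Hypothetical connection string; adjust server, database, and credentials
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=localhost;DATABASE=Leasing;Trusted_Connection=yes;")

# Recursive CTE walking a hypothetical asset hierarchy: the anchor member
# selects root assets, the recursive member joins each child to its parent.
query = """
WITH AssetTree AS (
    SELECT AssetId, ParentAssetId, AssetName, 0 AS Depth
    FROM dbo.Assets
    WHERE ParentAssetId IS NULL
    UNION ALL
    SELECT a.AssetId, a.ParentAssetId, a.AssetName, t.Depth + 1
    FROM dbo.Assets AS a
    JOIN AssetTree AS t ON a.ParentAssetId = t.AssetId
)
SELECT AssetId, AssetName, Depth FROM AssetTree ORDER BY Depth;
"""

for row in conn.cursor().execute(query):
    print(row.AssetId, row.AssetName, row.Depth)
```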