
Data Scientist Resume


Newark, NJ

SUMMARY

  • 7+ years of professional experience in Statistical Modeling, Machine Learning, Conversational AI, and Data Visualization.
  • Expertise in transforming business resources and requirements into manageable data formats and analytical models; designing algorithms, building models, and developing data mining and reporting solutions that scale.
  • Proficient in managing the entire data science project life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Statistical Modeling, Testing and Validation, and Visualization and Reporting.
  • Proficient in machine learning algorithms such as Linear Regression, Ridge, Lasso, Elastic Net Regression, Decision Trees, and Random Forests, as well as more advanced algorithms such as ANN, CNN, and RNN, and ensemble methods like Bagging, Boosting, and Stacking.
  • Built chatbots using RASA for more than 4 years, worked with chatbots in several projects, and have good experience with feature extraction and analysis methods.
  • Excellent performance in model validation and model tuning, with model selection, k-fold cross-validation, the hold-out scheme, and hyperparameter tuning via grid search and HyperOpt (see the sketch at the end of this summary).
  • Advanced experience with Python (2.x, 3.x) and its libraries such as NumPy, Pandas, scikit-learn, XGBoost, LightGBM, Keras, Matplotlib, and Seaborn.
  • Strong knowledge of statistical methodologies such as Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Monte Carlo Sampling, and Time Series Analysis.
  • Strong experience with R for developing machine learning models and performing hypothesis testing.
  • Basic knowledge of Hadoop and Spark, and experience with Big Data tools such as PySpark, Pig, and Hive.
  • Experience in building machine learning solutions using PySpark for large data sets on Hadoop systems.
  • Experience using Amazon Web Services (AWS) cloud services, including EC2, S3, AWS Lambda, and EMR.
  • Proficient at building and publishing interactive Tableau reports and dashboards with design customizations based on stakeholders' needs.
  • Experienced in RDBMS such as SQL Server 2012 and Oracle 9i/10g, and NoSQL databases like MongoDB and DynamoDB.
  • Expert in SQL, writing queries, temp tables, CTEs, stored procedures, user-defined functions, views, and indexes.
  • Responsible for creating ETL packages, migrating data from flat files and MS Excel, cleaning data and backing up data files, and synchronizing daily transactions using SSIS.
  • Quick learner in new business domains and software environments, delivering solutions adapted to new requirements and challenges.
  • Knowledge and experience in GitHub/Git version control tools.
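
For illustration, a minimal sketch of the model-tuning workflow referenced above (k-fold cross-validation with grid search over a random forest) using scikit-learn; the toy dataset, parameter grid, and scoring metric are assumptions for the example, not taken from any specific project:

    # A minimal sketch: 5-fold cross-validation with grid search in
    # scikit-learn. The toy data, parameter grid, and ROC-AUC scoring
    # are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, KFold

    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}

    # Each parameter combination is scored on every held-out fold.
    cv = KFold(n_splits=5, shuffle=True, random_state=42)
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, cv=cv, scoring="roc_auc", n_jobs=-1)
    search.fit(X, y)

    print(search.best_params_, search.best_score_)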

TECHNICAL SKILLS

Deep Learning Frameworks and Libraries: TensorFlow, Keras, TFLearn, NumPy, Matplotlib, Seaborn, Pandas, scikit-learn, NLTK, Kubeflow, Vertex AI, Conversational AI.

ML and DL Algorithms: Linear and Logistic Regression, Support Vector Machines, K-Means, Decision Trees, Random Forests, single- and multi-layer perceptrons, CNNs, chatbots, ensemble models, YOLO, object detection, feature extraction, SSD, RASA, etc.

Programming Languages and Tools: Python, C++, MATLAB, MySQL, Visual Basic, Xcode, Jupyter, Spyder, OpenCV, Tableau, Google Cloud Platform.

Operating Systems: Microsoft Windows, Mac, Linux.

PROFESSIONAL EXPERIENCE

Confidential - Newark, NJ

Data Scientist

Responsibilities:

  • Worked as a Data Scientist; developed and deployed predictive models for analyzing customer churn and retention.
  • Performed data extraction, data manipulation, and data analysis on TBs of structured and unstructured data.
  • Developed machine learning models using Logistic Regression, Naïve Bayes, Random Forest and KNN.
  • Performed data imputation using Python's scikit-learn package.
  • Created interactive analytic dashboards using Tableau
  • Conducted analysis assessing customer consumption behaviors and discovering customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering (see the sketch after this list).
  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed sentiment analysis, capturing customer sentiments and categorizing positive, negative, angry, and happy customers from feedback forms.
  • Ensured that the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured launched instances with respect to specific applications.
  • Integrated SAS datasets into Excel using Dynamic Data Exchange; used SAS to analyze data and produce statistical tables, listings, and graphs for reports.
  • Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data management tasks such as merging, concatenating, and interleaving SAS datasets using MERGE, UNION, and SET statements in DATA steps and PROC SQL.
  • Used SAS to read, write, import, and export data across file formats, including delimited files, spreadsheets, and Microsoft Excel and Access tables.
  • Communicated with team members, leadership, and stakeholders on findings to ensure models were well understood and incorporated into business processes.
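
For illustration, a minimal sketch of the RFM segmentation approach referenced above, computing recency, frequency, and monetary features and clustering them with K-Means; the transaction table, column names, and cluster count are illustrative assumptions:

    # A minimal sketch of RFM-based customer segmentation with K-Means.
    # The toy transactions and column names are illustrative assumptions.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    txns = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 3, 3],
        "txn_date": pd.to_datetime(
            ["2023-01-05", "2023-03-01", "2023-02-10",
             "2023-01-20", "2023-02-25", "2023-03-10"]),
        "amount": [50.0, 20.0, 200.0, 35.0, 60.0, 45.0],
    })

    # Recency: days since last purchase; Frequency: purchase count;
    # Monetary: total spend per customer.
    snapshot = txns["txn_date"].max() + pd.Timedelta(days=1)
    rfm = txns.groupby("customer_id").agg(
        recency=("txn_date", lambda d: (snapshot - d.max()).days),
        frequency=("txn_date", "count"),
        monetary=("amount", "sum"),
    )

    # Standardize so no single RFM dimension dominates the distance metric.
    scaled = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=2, n_init=10,
                            random_state=42).fit_predict(scaled)
    print(rfm)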

Environment: Python, R, MS Excel, Perl, MS SQL Server, HIPAA, EDI, Power BI, Tableau, T-SQL, ETL, MS Access, XML, JSON, MS Office 2010, Outlook.

Confidential - Austin, TX

Data Analyst

Responsibilities:

  • Performed statistical modeling with ML to derive insights from data under the guidance of the Principal Data Scientist.
  • Performed data ingestion with Sqoop and Flume.
  • Used SVN to commit changes into the main EMM application trunk.
  • Understood and implemented text mining concepts, graph processing, and semi-structured and unstructured data processing.
  • Worked with Ajax API calls to communicate with Hadoop through an Impala connection and SQL to render the required data; these API calls are similar to Microsoft Cognitive API calls.
  • Ran MapReduce programs on nodes running on the cluster.
  • Developed multiple MapReduce jobs in Scala for data cleaning and pre-processing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase, where necessary, using Sqoop.
  • Developed Hive queries for analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to a database.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Gained hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
  • Used Hive to partition and bucket data (see the sketch after this list).
  • Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Wrote Pig scripts to perform ETL procedures on the data in HDFS.
  • Created HBase tables to store data in various formats coming from different portfolios.
  • Worked on improving the performance of existing Pig and Hive queries.
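
For illustration, a minimal PySpark sketch of writing a partitioned, bucketed Hive table and computing metrics over it, as referenced above; the table name, toy columns, and bucket count are illustrative assumptions, and the job assumes a Spark install with Hive support enabled:

    # A minimal sketch of partitioning and bucketing a Hive table with
    # PySpark. Table name, columns, and bucket count are illustrative.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partition-bucket-sketch")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.createDataFrame(
        [("2023-01-01", "east", 120.0),
         ("2023-01-01", "west", 75.5),
         ("2023-01-02", "east", 98.0)],
        ["txn_date", "region", "amount"])

    (df.write
       .partitionBy("txn_date")   # one directory per txn_date, as in Hive PARTITIONED BY
       .bucketBy(4, "region")     # hash region into 4 buckets, as in CLUSTERED BY ... INTO 4 BUCKETS
       .sortBy("region")
       .mode("overwrite")
       .saveAsTable("sales_bucketed"))

    # Compute a per-partition metric for reporting.
    spark.sql("""
        SELECT txn_date, region, SUM(amount) AS total_amount
        FROM sales_bucketed
        GROUP BY txn_date, region
    """).show()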

Confidential

SQL Developer

Responsibilities:

  • Created and modified SQL stored procedures, triggers, user-defined functions, views, and cursors.
  • Performed performance tuning of stored procedures utilizing Database Tuning Advisor and SQL Profiler.
  • Performed ad hoc reporting per clients' requests.
  • Built and modified SSIS packages utilizing BIDS 2012.
  • Built complex queries utilizing recursive CTEs (see the sketch after this list).
  • Documented both business and application processes utilizing Microsoft Visio.
  • Built SSIS configuration files to adhere to deployment processes.
  • Utilized SSIS script tasks extensively to load data from Excel files to database tables.
  • Utilized SVN as the main source control repository.
  • Modified a Web interface to adhere to business logic pertaining to the leasing of company assets.
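
For illustration, a minimal sketch of a recursive CTE of the kind referenced above; it uses Python's built-in sqlite3 so the example is self-contained (the actual work used SQL Server / T-SQL), and the org-chart schema is an illustrative assumption:

    # A minimal sketch of a recursive CTE, shown with sqlite3 for a
    # self-contained example; the org-chart schema is illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (id INTEGER PRIMARY KEY,
                                name TEXT, manager_id INTEGER);
        INSERT INTO employees VALUES
            (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cam', 2), (4, 'Dee', 1);
    """)

    # Walk the reporting chain from the root down, tracking depth.
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, depth) AS (
            SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
            UNION ALL
            SELECT e.id, e.name, c.depth + 1
            FROM employees e JOIN chain c ON e.manager_id = c.id
        )
        SELECT name, depth FROM chain ORDER BY depth, name
    """).fetchall()

    print(rows)  # [('Ada', 0), ('Ben', 1), ('Dee', 1), ('Cam', 2)]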
