
Data Science Engineer Resume


Princeton, NJ

SUMMARY:

  • 7+ years of experience building scalable SaaS products and solving practical business problems for clients ranging from startups to Fortune 500 companies.
  • Experience in creating Data Pipelines, performing Web Data Mining, Data Extraction, Data Transformation, Data Cleaning, Data Modeling, Data Wrangling, Statistical Modeling, Machine Learning, Data Visualization and Analytics.
  • Well versed in current technologies, changing technical demands and data architecture.
  • Good understanding of Data Models - designed and implemented multilevel corporate schemas from scratch for data products and self-contained implementations; experienced in moving messy data structures (in the TBs) into structured ones.
  • Well experienced in Web Mining / Data ETL (over half a million websites): extracting raw content in a distributed fashion, pipelining it to data lakes, applying Machine Learning at scale and helping other data scientists / ML engineers get relevant data for ML modeling.
  • Experience in building synchronous/asynchronous and distributed architectures on cloud infrastructure from scratch to reduce cost and time.
  • Experience with Google APIs - Geocoding, Translation, etc. - and well versed in designing algorithms that use paid APIs efficiently and profitably.
  • Experience with the major cloud platforms - AWS, GCP and Azure - as well as private on-premise cloud services.
  • Experience writing production-level data pipelining and modeling code with unit tests and checks for fault-tolerant, secure and scalable systems.
  • Experience designing and writing scalable implementations of ML algorithms, Machine Learning models and recommendation systems.
  • Experience in reproducing research and bringing the latest technological enhancements into practice. Active participant in research projects and publications.
  • Extensive experience with relational as well as non-relational (NoSQL) databases - MySQL 5.x-8.x, MongoDB, Cassandra, PostgreSQL, etc.
  • Experience maintaining servers - tracking logs, errors and faults, handling security and improvements - and helping SDEs with ELK data flows.
  • Experience in AWS, provisioning and maintaining resources such as EC2, EMR, S3, RDS, etc.
  • Good Knowledge of Data Warehouse Architecture and various schemas like Star Schema, Snowflake Schema.
  • Experienced in Data Analysis - creating presentation-ready business reports - and proficient in gathering business requirements and handling requirements management.
  • Experience in Big Data Technologies - Hadoop, HDFS, Hive, MapReduce, PySpark etc.
  • Experience in BI/ visualization tools like Tableau, Plotly etc.
  • Experience in version control - Git/GitHub.
  • Good communication skills and a believer in collaborative work.
  • Experienced in working independently as well as in a team.
  • Experienced communicator with clients on data productization requirements.

TECHNICAL SKILLS:

Programming Languages: Python, SQL, Java, Hive, R, PySpark, C, C++

Internet Technologies: JavaScript, Chart.js, D3.js, HTML5, CSS3, PHP, Bootstrap, Angular, REST APIs

Databases: MySQL, MongoDB, Cassandra, PostgreSQL

IDEs/ Development tools: Jupyter Notebook, Spring Boot, Tableau, Postman, IntelliJ, Eclipse (Java EE), GitHub, MongoDB Compass

Platform: macOS, Linux, Ubuntu, Windows

PROFESSIONAL EXPERIENCE:

Confidential, Princeton, NJ

Data Science Engineer

Technologies used: Python, Beautiful Soup, REST Web Services, AWS (S3, EC2, EMR), Big Data technologies, MS Azure, Flask, MySQL, Chart.js, D3.js, Java, Spring JDBC, Boto (S3), HTML, CSS, AngularJS, Salesforce APIs, Plotly, Tableau, Jupyter, Postman, IntelliJ, etc.

Responsibilities:

  • Performed asynchronous text scraping of thousands of websites (a sketch of the approach follows this list).
  • Implemented parallelized data-processing operations using the Dask framework to clean and filter text data.
  • Implemented ML algorithms to accurately extract the needed information at scale.
  • Designed and developed optimal API-call algorithms for the Google Geocoding and Translation services to produce readable English results with minimal cost and time (see the caching sketch below).
  • Built ML-based contact-sourcing optimizers to retrieve client-focused results and tag searches.
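
A minimal sketch of the asynchronous scraping approach above. It assumes aiohttp for the async HTTP client (the listed stack includes Requests and Selenium; aiohttp is an assumption), and the concurrency cap and parser choice are illustrative, not from the original pipeline.

    import asyncio
    import aiohttp
    from bs4 import BeautifulSoup

    SEM_LIMIT = 50  # illustrative cap on concurrent connections

    async def fetch_text(session, url, sem):
        # Fetch one page and strip it down to visible text.
        async with sem:
            try:
                async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
                    html = await resp.text()
            except Exception:
                return url, None  # failures would be logged/retried in a real pipeline
        return url, BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    async def scrape(urls):
        sem = asyncio.Semaphore(SEM_LIMIT)
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(*(fetch_text(session, u, sem) for u in urls))

    # texts = asyncio.run(scrape(list_of_urls))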

Technologies Used: Python, asyncio, Dask, BeautifulSoup, Requests, JSON, Selenium, Scrapy, Matplotlib, pandas, AWS, MongoDB, XGBoost, NLP, NER, PySpark
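
The paid-API bullet above comes down to never paying for the same lookup twice. A sketch of that idea, with normalization, dedup and a cache in front of the billable call; geocode_fn and the JSON cache file are hypothetical stand-ins.

    import json, os

    CACHE_PATH = "geocode_cache.json"  # hypothetical local cache; production would use a shared store

    def geocode_all(addresses, geocode_fn):
        # geocode_fn wraps the billable Google Geocoding request; only cache misses reach it.
        cache = json.load(open(CACHE_PATH)) if os.path.exists(CACHE_PATH) else {}
        for addr in {a.strip().lower() for a in addresses}:  # normalize and dedupe before paying
            if addr not in cache:
                cache[addr] = geocode_fn(addr)  # one billable request per unique address
        with open(CACHE_PATH, "w") as f:
            json.dump(cache, f)
        return cache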

Confidential

Software Engineer

Responsibilities:

  • Created a structured data pipeline with 40+ integrations of various data sources to filter, transform and validate the inflow of raw data.
  • Performed Data Cleaning and Preprocessing, transformations and predictive modeling.
  • Performed targeted analysis of sales and customer acquisition.
  • The goal was to find key insights and opportunities for leveraging the data intelligently, improving customer targeting and overall data value to increase sales.
  • Performed RFM analysis, customer-churn prediction, association rule mining, data enrichment and quality improvement, and built a recommendation system (see the sketches below).
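
A minimal sketch of the RFM analysis mentioned above, using pandas quartile scoring; the orders schema (customer_id, order_date, amount) is assumed for illustration.

    import pandas as pd

    def rfm_scores(orders, as_of):
        # Aggregate per customer: days since last order, order count, total spend.
        rfm = orders.groupby("customer_id").agg(
            recency=("order_date", lambda d: (as_of - d.max()).days),
            frequency=("order_date", "count"),
            monetary=("amount", "sum"),
        )
        # Quartile scores; rank first so tied values don't break the bin edges.
        # Low recency is good, so its labels run in reverse.
        rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 4, labels=[4, 3, 2, 1])
        rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4])
        rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 4, labels=[1, 2, 3, 4])
        rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
        return rfm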

Technologies used: Python, GraphLab, NumPy, pandas, scikit-learn, TensorFlow, Keras, Tableau, Chart.js, D3.js
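
And a sketch of the churn-prediction side with scikit-learn (listed in the stack above); the feature columns are hypothetical.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def train_churn_model(df):
        # Hypothetical columns: tenure_days, order_count, avg_basket, churned (0/1).
        X = df[["tenure_days", "order_count", "avg_basket"]]
        y = df["churned"]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
        model = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # evaluate on held-out customers
        return model, auc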

Confidential

Software Engineer

Responsibilities:

  • Developed robust machine learning models to predict the direction of cryptocurrency price movements.
  • Instrumental in creating the infrastructure for the project's complete pipeline.
  • Provided a framework for identifying key features for stacked models (a sketch follows this list).
  • Identified key directional-movement features useful for day traders.
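
A minimal sketch of the stacked direction model described above. scikit-learn's StackingClassifier is an assumed implementation choice (the stack below lists generative/discriminative models and neural networks), and the features and prediction horizon are illustrative.

    import pandas as pd
    from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression

    def make_features(prices):
        # prices: pd.Series of daily closes; lagged returns and volatility as illustrative features.
        df = pd.DataFrame({
            "ret_1d": prices.pct_change(),
            "ret_5d": prices.pct_change(5),
            "vol_10d": prices.pct_change().rolling(10).std(),
        })
        next_ret = prices.shift(-1) / prices - 1
        df["up_next_day"] = (next_ret > 0).astype(int)  # direction label
        return df[next_ret.notna()].dropna()  # drop warm-up rows and the unlabeled last row

    def train_direction_model(df):
        # Base learners feed a logistic meta-learner; real evaluation needs
        # walk-forward (time-ordered) validation, not the default shuffled CV.
        base = [("rf", RandomForestClassifier(n_estimators=200)),
                ("gb", GradientBoostingClassifier())]
        model = StackingClassifier(estimators=base, final_estimator=LogisticRegression())
        return model.fit(df.drop(columns="up_next_day"), df["up_next_day"])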

Technologies used: Generative & Discriminative Models, Python, MongoDB, Neural Networks, Bitcoin, Quandl

Confidential, Bridgewater, NJ

Software Engineer - Data Analytics

Responsibilities:

  • Designed and implemented scaled productization algorithms for 4G wireless systems using advanced Self-Organizing Network (SON) and machine learning techniques with Python and MATLAB.
  • Improved the accuracy and computational efficiency of the network.
  • Performed feature extraction, selection, analysis and optimization of algorithms using Python and MATLAB.
  • Applied machine learning / reinforcement learning algorithms to large datasets, utilizing GPUs to accelerate training.
  • Worked on creating Data Pipelines, strategizing and implementing a micro-service based data infrastructure.
  • Managed Cloud architecture, ensuring efficient data management and data governance.
  • Wrote robust Machine Learning models to learn and solve practical business problems.
  • Proactively improved and maintained data quality and identified data issues.
  • Worked on REST APIs, scraping and crawling large-scale web data, and building scalable SaaS products.
  • Researched recommendation engines to optimize the quality of the algorithms used.
  • Optimized ETL processes for query efficiency and quality.
  • Performed Data Cleaning, data pre-processing and visualizations, and implemented them on the data pipeline.
  • Worked on Big Data technologies - Hadoop ecosystem to ensure high performance on larger datasets.
  • Maintained data in a logical, consistent, accurate and sustainable form.
  • Created and analyzed data trend reports.
  • Scraped and crawled web data from multiple sources and APIs to store in a cloud data warehouse.
  • Scaled and created non-relational NoSQL databases for efficient data ingestion on MongoDB servers.
  • Optimized query efficiency on MySQL servers and Hive on the Hadoop ecosystem.
  • Identified business logic required to clean, normalize and model incongruent source data.
  • Performed descriptive and inferential statistical analysis of business data to find outliers and trends (a minimal outlier-detection sketch follows this list).
  • Created dashboards in Tableau, D3.js, Chart.js and Plotly for user-friendly interactive visualizations.
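
The outlier work above usually starts with something as simple as Tukey's IQR rule; a minimal pandas sketch, with the multiplier k at its conventional 1.5 default:

    import pandas as pd

    def iqr_outliers(s: pd.Series, k: float = 1.5):
        # Flag values lying more than k * IQR outside the quartiles (Tukey's rule).
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]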

Confidential

Software Engineer

Responsibilities:

  • Exploited TBs of textual data from various magazines and news articles.
  • Strategized tagging and implemented solutions for deploying models at scale.
  • Implemented multi-class classification of documents (in the millions) into categories and re-inspected manually labelled documents to reduce human-resource costs (see the classification sketch below).
  • Scaled implementable units of the tagging algorithms, increasing precision by 12% and recall by 15% through a pipelined structure of algorithms.
  • Developed Flask REST services to put tagged data into MongoDB and expose it to the SharePoint UI (see the service sketch below).
  • Developed data access points and helped SDEs create an ELK dashboard.
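
A minimal sketch of the document-tagging classifier described above: a linear model over sparse TF-IDF features, which stays tractable at millions of documents. scikit-learn is an assumed implementation choice, and the vocabulary size and n-gram range are illustrative.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.pipeline import Pipeline

    def build_tagger():
        # Sparse TF-IDF features plus SGD-trained logistic regression scale to
        # millions of documents without holding dense matrices in memory.
        return Pipeline([
            ("tfidf", TfidfVectorizer(max_features=200_000, ngram_range=(1, 2))),
            ("clf", SGDClassifier(loss="log_loss")),
        ])

    # tagger = build_tagger().fit(train_texts, train_labels)
    # predicted_categories = tagger.predict(new_texts)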

Technologies used: Hadoop, EMR, Spark, Multithreading, Multiprocessing, Asynchronous programming, Natural Language Processing, Elasticsearch, NER, Dask, Python, Flask, MongoDB, AWS S3, Redis, RabbitMQ
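
And a sketch of the Flask REST service that writes tagged documents into MongoDB, per the bullet above; the route, database and collection names are hypothetical.

    from flask import Flask, jsonify, request
    from pymongo import MongoClient

    app = Flask(__name__)
    # Hypothetical connection string and names; the real service ran on managed infrastructure.
    collection = MongoClient("mongodb://localhost:27017")["tagging"]["documents"]

    @app.route("/documents", methods=["POST"])
    def add_tagged_document():
        # Persist one tagged document and return its generated id.
        doc = request.get_json()
        result = collection.insert_one({"text": doc["text"], "tags": doc["tags"]})
        return jsonify({"id": str(result.inserted_id)}), 201

    if __name__ == "__main__":
        app.run()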
