Data Science Engineer Resume
Princeton, NJ
SUMMARY:
- Over 7 years of experience building scalable SaaS products and solving practical business problems for clients ranging from startups to Fortune 500 companies.
- Experience in creating Data Pipelines, performing Web Data Mining, Data Extraction, Data Transformation, Data Cleaning, Data Modeling, Data Wrangling, Statistical Modeling, Machine Learning, Data Visualization and Analytics.
- Well versed in current technology trends and data architecture, and quick to adapt to new stacks.
- Good understanding of data models: designed and implemented corporate multilevel schemas from scratch for data products and self-contained implementations; experienced in migrating messy data structures (in TBs) into structured ones.
- Well experienced in Web Mining / Data ETL (over half a million websites): extracting raw content in distributed fashion, pipelining it to data lakes, applying Machine Learning at scale, and helping other data scientists/ML engineers get relevant data for ML modeling.
- Experience in building synchronous/asynchronous and distributed architectures on cloud infrastructure from scratch to reduce cost and time.
- Experience with Google APIs (Geocoding, Translation, etc.) and with designing algorithms that use paid APIs efficiently and cost-effectively.
- Experience with the major cloud platforms (AWS, GCP and Azure) as well as private on-premises cloud services.
- Experience writing production-level data pipelining and modeling code with unit testing and checks for fault-tolerant, secure and scalable systems.
- Experience designing and writing scalable Machine Learning algorithm implementations, models and recommendation systems.
- Experience in reproducing research and bringing latest technological enhancements to practice. Active participant in research projects and publications.
- Extensive experience with relational as well as non-relational (NoSQL) databases: MySQL 5.x-8.x, MongoDB, Cassandra, PostgreSQL, etc.
- Experience maintaining servers: tracking logs, errors and faults, applying security fixes and improvements, and helping SDEs with ELK data flow.
- Experience in AWS with provisioning and maintaining AWS resources such as EC2, EMR, S3, RDS etc.
- Good Knowledge of Data Warehouse Architecture and various schemas like Star Schema, Snowflake Schema.
- Experienced in Data Analysis: creating business-presentation-ready reports; proficient in gathering business requirements and handling requirements management.
- Experience in Big Data Technologies - Hadoop, HDFS, Hive, MapReduce, PySpark etc.
- Experience in BI/ visualization tools like Tableau, Plotly etc.
- Experience in version control with Git and GitHub.
- Good communication skills; believe in collaborative work.
- Experienced in working independently as well as in teams.
- Experienced communicator with clients on data productization requirements.
TECHNICAL SKILLS:
Programming Languages: Python, SQL, Java, HiveQL, R, PySpark, C, C++
Internet Technologies: JavaScript, Chart.js, D3.js, HTML5, CSS3, PHP, Bootstrap, Angular, REST APIs
Databases: MySQL, MongoDB, Cassandra, PostgreSQL
IDEs/ Development tools: Jupyter Notebook, Spring Boot, Tableau, Postman, IntelliJ, Eclipse (Java EE), GitHub, MongoDB Compass
Platform: macOS, Linux (Ubuntu), Windows
PROFESSIONAL EXPERIENCE:
Confidential, Princeton, NJ
Data Science Engineer
Technologies used: Python, Beautiful Soup, REST Web Services, Big Data technologies, AWS (S3, EC2, EMR), MS Azure, Flask, MySQL, Chart.js, D3.js, Java, Spring JDBC, Boto (S3), HTML, CSS, AngularJS, Salesforce APIs, Plotly, Tableau, Jupyter, Postman, IntelliJ, etc.
Responsibilities:
- Asynchronously scraped text from thousands of websites.
- Implemented parallelized data processing operations using the Dask framework to clean and filter text data.
- Implemented ML algorithms to extract the required information accurately at scale.
- Designed and developed optimal API-call algorithms for the Google Geocoding and Translation services to produce readable English results at minimal cost and time.
- Built ML-based optimizers for contact sourcing to retrieve client-focused results and to tag searches.
Technologies used: Python, asyncio, Dask, BeautifulSoup, Requests, JSON, Selenium, Scrapy, matplotlib, pandas, AWS, MongoDB, XGBoost, NLP, NER, PySpark
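The asynchronous scraping pattern described above can be sketched as follows. This is a minimal sketch only: the `fetch` coroutine is a stub standing in for a real HTTP client call (e.g. via aiohttp), and the URL list is hypothetical.

```python
import asyncio

# Hypothetical URL list; a real pipeline would read seeds from a store.
URLS = [f"https://example.com/page{i}" for i in range(8)]

async def fetch(url: str) -> str:
    # Stub standing in for an HTTP GET; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"<html><title>{url}</title></html>"

async def scrape_all(urls, max_concurrency: int = 4):
    # A semaphore caps in-flight requests so thousands of sites
    # don't exhaust sockets or trip rate limits.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather runs the bounded tasks concurrently, preserving input order.
    return await asyncio.gather(*(bounded(u) for u in urls))

pages = asyncio.run(scrape_all(URLS))
print(len(pages))  # one document per URL
```

The same bounded-concurrency shape extends to real fetches by swapping the stub for a client-session request.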
Confidential
Software Engineer
Responsibilities:
- Creating structured data pipeline with 40+ integrations of various data sources to filter, transform and validate the inflow of raw data.
- Performed data cleaning, preprocessing, transformations and predictive modeling.
- Performed targeted analysis of sales and customer acquisitions.
- Goal was to find key insights and opportunities to leverage the data intelligently, improving customer targeting and overall data value to increase sales.
- Performed RFM analysis, customer-churn predictions, recommendation system, association rule mining, data enrichment and quality improvement.
Technologies used: Python, GraphLab, numpy, pandas, scikit-learn, tensorflow, keras, Tableau, Chart.js, D3.js
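The RFM (recency, frequency, monetary) analysis above can be sketched with pandas on a toy transaction log; the column names and data here are illustrative assumptions, not the client's schema.

```python
import pandas as pd

# Toy transaction log; real data flowed in from the source integrations.
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "date": pd.to_datetime(
        ["2024-01-05", "2024-03-01", "2024-02-10",
         "2024-01-20", "2024-02-15", "2024-03-10"]),
    "amount": [120.0, 80.0, 40.0, 200.0, 60.0, 90.0],
})

now = pd.Timestamp("2024-03-15")  # analysis reference date
rfm = tx.groupby("customer").agg(
    recency=("date", lambda d: (now - d.max()).days),  # days since last purchase
    frequency=("date", "count"),                       # number of purchases
    monetary=("amount", "sum"),                        # total spend
)
print(rfm)
```

Segments (e.g. high-value, at-risk) then fall out of ranking or binning each of the three columns.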
Confidential
Software Engineer
Responsibilities:
- Developed robust machine learning models to predict cryptocurrency price-movement direction.
- Instrumental in creating infrastructure for complete pipeline for the project.
- Provided a framework for identifying key features for stacked models.
- Identified key features of price-movement direction useful to day traders.
Technologies used: Generative & Discriminative Models, Python, MongoDB, Neural Network, Bitcoin, Quandl
Confidential, Bridgewater, NJ
Software Engineer - Data Analytics
Responsibilities:
- Designed and implemented scaled productization algorithms for 4G wireless systems using advanced self-organizing network (SON) and machine learning techniques with Python and MATLAB.
- Improved the accuracy and computational efficiency of the network.
- Performed feature extraction, selection, analysis and optimization of algorithms using Python and MATLAB.
- Applied machine learning/reinforcement learning algorithms to large datasets, utilizing GPUs to accelerate training.
- Worked on creating data pipelines, and on strategizing and implementing microservice-based data infrastructure.
- Managed cloud architecture, ensuring efficient data management and data governance.
- Wrote robust Machine Learning models to learn and solve practical business problems.
- Proactively improved and maintained data quality and identified data issues.
- Worked on REST APIs, scraping and crawling large web data, and building scalable SaaS products.
- Researched on recommendation engines to optimize the quality of algorithms used.
- Optimized ETL process for query efficiency and quality.
- Performed data cleaning, pre-processing and visualization, and implemented them in the data pipeline.
- Worked on Big Data technologies - Hadoop ecosystem to ensure high performance on larger datasets.
- Worked on data maintenance in a logical, consistent, accurate and sustainable form.
- Worked on creation and analysis of data trend reports.
- Scraped and crawled web data from multiple sources and APIs to store in a cloud data warehouse.
- Created and scaled non-relational (NoSQL) databases for efficient data ingestion on MongoDB servers.
- Optimized query efficiency on MySQL server and Hive on Hadoop ecosystem.
- Identified business logic required to clean, normalize and model incongruent source data.
- Performed descriptive and inferential statistical analysis of business data to find outliers and trends.
- Created dashboards on Tableau, D3.js, Chart.js and Plotly for user friendly interactive visualizations.
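The descriptive-statistics outlier detection mentioned above can be illustrated with the Tukey IQR fence, a common choice (an assumption here, not necessarily the exact method used), on a toy metric series.

```python
import numpy as np

# Toy metric series; production data came from the warehouse tables.
values = np.array([10, 12, 11, 13, 12, 11, 95, 10, 12, 11], dtype=float)

# Tukey fences: points beyond 1.5 * IQR from the quartiles are outliers.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lo) | (values > hi)]
print(outliers)  # the anomalous 95 reading
```

The same fences translate directly into dashboard reference bands for trend reports.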
Confidential
Software Engineer
Responsibilities:
- Processed TBs of textual data from various magazines and news articles.
- Strategized tagging and implemented solutions for deploying models at scale.
- Implemented multi-class classification of documents (in the millions) into categories and re-inspected manually labelled documents to reduce human-resource costs.
- Scaled implementable units of tagging algorithms; increased precision by 12% and recall by 15% through a pipelined structure of algorithms.
- Developed Flask REST services to put tagged data into MongoDB and expose it to the SharePoint UI.
- Developed data access points and helped SDEs create an ELK dashboard.
Technologies used: Hadoop, EMR, Spark, Multithreading, Multiprocessing, Asynchronous programming, Natural Language Processing, Elasticsearch, NER, Dask, Python, Flask, MongoDB, AWS S3, Redis, RabbitMQ
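The multi-class document classification step can be sketched with scikit-learn on a toy two-category corpus. The pipeline shown (TF-IDF features plus logistic regression) is an illustrative assumption; the production system ran on Spark/Dask-scale NLP tooling.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled corpus; the real system classified millions of documents.
docs = ["stocks rally on earnings", "team wins championship game",
        "market falls on rate fears", "striker scores twice in final",
        "bond yields climb higher", "coach praises young players"]
labels = ["finance", "sports", "finance", "sports", "finance", "sports"]

# TF-IDF turns text into sparse term weights; the classifier learns
# per-term coefficients for each category.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(docs, labels)
print(clf.predict(["market rally lifts stocks"]))
```

Swapping in more categories is just a matter of extending `labels`; the same fitted pipeline can also re-score manually labelled documents to flag likely labelling errors.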