Data Science Engineer Resume
Princeton, NJ
SUMMARY:
- Over 7 years of experience building scalable SaaS products and solving practical business problems for clients ranging from startups to Fortune 500 companies.
- Experience in creating Data Pipelines, performing Web Data Mining, Data Extraction, Data Transformation, Data Cleaning, Data Modeling, Data Wrangling, Statistical Modeling, Machine Learning, Data Visualization and Analytics.
- Well versed in current technology trends and data architecture, and quick to adapt to new stacks.
- Good understanding of data models: designed and implemented corporate multilevel schemas from scratch for data products and self-contained implementations; experienced in migrating messy data structures (in TBs) into structured ones.
- Well experienced in Web Mining / Data ETL (over half a million websites): extracting raw content in distributed fashion, pipelining it to data lakes, applying Machine Learning at scale, and helping other data scientists/ML engineers get relevant data for ML modeling.
- Experience in building synchronous/asynchronous and distributed architectures on cloud infrastructure from scratch to reduce cost and time.
- Experience with Google APIs (Geocoding, Translation, etc.) and with designing algorithms that use paid APIs efficiently and cost-effectively.
- Experience with the major cloud platforms (AWS, GCP and Azure) as well as private on-premises cloud services.
- Experience writing production-level data pipelining and modeling code with unit testing and checks for fault-tolerant, secure and scalable systems.
- Experience designing and writing scalable Machine Learning algorithm implementations, models and recommendation systems.
- Experience in reproducing research and bringing latest technological enhancements to practice. Active participant in research projects and publications.
- Extensive experience with relational as well as non-relational (NoSQL) databases: MySQL 5.x-8.x, MongoDB, Cassandra, PostgreSQL, etc.
- Experience maintaining servers: tracking logs, errors and faults, applying security fixes and improvements, and helping SDEs with ELK data flow.
- Experience in AWS with provisioning and maintaining AWS resources such as EC2, EMR, S3, RDS etc.
- Good Knowledge of Data Warehouse Architecture and various schemas like Star Schema, Snowflake Schema.
- Experienced in Data Analysis: creating business-presentation-ready reports; proficient in gathering business requirements and handling requirements management.
- Experience in Big Data Technologies - Hadoop, HDFS, Hive, MapReduce, PySpark etc.
- Experience in BI/ visualization tools like Tableau, Plotly etc.
- Experience in version control with Git and GitHub.
- Good communication skills; believe in collaborative work.
- Experienced in working independently as well as in teams.
- Experienced communicator with clients on data productization requirements.
TECHNICAL SKILLS:
Programming Languages: Python, SQL, Java, HiveQL, R, PySpark, C, C++
Internet Technologies: JavaScript, Chart.js, D3.js, HTML5, CSS3, PHP, Bootstrap, Angular, REST APIs
Databases: MySQL, MongoDB, Cassandra, PostgreSQL
IDEs/ Development tools: Jupyter Notebook, Spring Boot, Tableau, Postman, IntelliJ, Eclipse (Java EE), GitHub, MongoDB Compass
Platform: macOS, Linux (Ubuntu), Windows
PROFESSIONAL EXPERIENCE:
Confidential, Princeton, NJ
Data Science Engineer
Technologies used: Python, Beautiful Soup, REST Web Services, Big Data technologies, AWS (S3, EC2, EMR), MS Azure, Flask, MySQL, Chart.js, D3.js, Java, Spring JDBC, Boto (S3), HTML, CSS, AngularJS, Salesforce APIs, Plotly, Tableau, Jupyter, Postman, IntelliJ, etc.
Responsibilities:
- Asynchronously scraped text from thousands of websites.
- Implemented parallelized data processing operations using the Dask framework to clean and filter text data.
- Implemented ML algorithms to extract the required information accurately at scale.
- Designed and developed optimal API-call algorithms for the Google Geocoding and Translation services to produce readable English results at minimal cost and time.
- Built ML-based optimizers for contact sourcing to retrieve client-focused results and to tag searches.
Technologies used: Python, asyncio, Dask, BeautifulSoup, Requests, JSON, Selenium, Scrapy, matplotlib, pandas, AWS, MongoDB, XGBoost, NLP, NER, PySpark
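The asynchronous scraping pattern described above can be sketched as follows. This is a minimal sketch only: the `fetch` coroutine is a stub standing in for a real HTTP client call (e.g. via aiohttp), and the URL list is hypothetical.

```python
import asyncio

# Hypothetical URL list; a real pipeline would read seeds from a store.
URLS = [f"https://example.com/page{i}" for i in range(8)]

async def fetch(url: str) -> str:
    # Stub standing in for an HTTP GET; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"<html><title>{url}</title></html>"

async def scrape_all(urls, max_concurrency: int = 4):
    # A semaphore caps in-flight requests so thousands of sites
    # don't exhaust sockets or trip rate limits.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # gather runs the bounded tasks concurrently, preserving input order.
    return await asyncio.gather(*(bounded(u) for u in urls))

pages = asyncio.run(scrape_all(URLS))
print(len(pages))  # one document per URL
```

The same bounded-concurrency shape extends to real fetches by swapping the stub for a client-session request.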
Confidential
Software Engineer
Responsibilities:
- Creating structured data pipeline with 40+ integrations of various data sources to filter, transform and validate the inflow of raw data.
- Performed data cleaning, preprocessing, transformations and predictive modeling.
- Performed targeted analysis of sales and customer acquisitions.
- Goal was to find key insights and opportunities to leverage the data intelligently, improving customer targeting and overall data value to increase sales.
- Performed RFM analysis, customer-churn predictions, recommendation system, association rule mining, data enrichment and quality improvement.
Technologies used: Python, GraphLab, numpy, pandas, scikit-learn, tensorflow, keras, Tableau, Chart.js, D3.js
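The RFM (recency, frequency, monetary) analysis above can be sketched with pandas on a toy transaction log; the column names and data here are illustrative assumptions, not the client's schema.

```python
import pandas as pd

# Toy transaction log; real data flowed in from the source integrations.
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "date": pd.to_datetime(
        ["2024-01-05", "2024-03-01", "2024-02-10",
         "2024-01-20", "2024-02-15", "2024-03-10"]),
    "amount": [120.0, 80.0, 40.0, 200.0, 60.0, 90.0],
})

now = pd.Timestamp("2024-03-15")  # analysis reference date
rfm = tx.groupby("customer").agg(
    recency=("date", lambda d: (now - d.max()).days),  # days since last purchase
    frequency=("date", "count"),                       # number of purchases
    monetary=("amount", "sum"),                        # total spend
)
print(rfm)
```

Segments (e.g. high-value, at-risk) then fall out of ranking or binning each of the three columns.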
Confidential
Software Engineer
Responsibilities:
- Developed robust machine learning models to predict cryptocurrency price-movement direction.
- Instrumental in creating infrastructure for complete pipeline for the project.
- Provided a framework for identifying key features for stacked models.
- Identified key features of price-movement direction useful to day traders.
Technologies used: Generative & Discriminative Models, Python, MongoDB, Neural Network, Bitcoin, Quandl
Confidential, Bridgewater, NJ
Software Engineer - Data Analytics
Responsibilities:
- Designed and implemented scaled productization algorithms for 4G wireless systems using advanced self-organizing network (SON) and machine learning techniques with Python and MATLAB.
- Improved the accuracy and computational efficiency of the network.
- Performed feature extraction, selection, analysis and optimization of algorithms using Python and MATLAB.
- Applied machine learning/reinforcement learning algorithms to large datasets, utilizing GPUs to accelerate training.
- Worked on creating data pipelines, and on strategizing and implementing microservice-based data infrastructure.
- Managed cloud architecture, ensuring efficient data management and data governance.
- Wrote robust Machine Learning models to learn and solve practical business problems.
- Proactively improved and maintained data quality and identified data issues.
- Worked on REST APIs, scraping and crawling large web data, and building scalable SaaS products.
- Researched on recommendation engines to optimize the quality of algorithms used.
- Optimized ETL process for query efficiency and quality.
- Performed data cleaning, pre-processing and visualization, and implemented them in the data pipeline.
- Worked on Big Data technologies - Hadoop ecosystem to ensure high performance on larger datasets.
- Worked on data maintenance in a logical, consistent, accurate and sustainable form.
- Worked on creation and analysis of data trend reports.
- Scraped and crawled web data from multiple sources and APIs to store in a cloud data warehouse.
- Created and scaled non-relational (NoSQL) databases for efficient data ingestion on MongoDB servers.
- Optimized query efficiency on MySQL server and Hive on Hadoop ecosystem.
- Identified business logic required to clean, normalize and model incongruent source data.
- Performed descriptive and inferential statistical analysis of business data to find outliers and trends.
- Created dashboards on Tableau, D3.js, Chart.js and Plotly for user friendly interactive visualizations.
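The descriptive-statistics outlier detection mentioned above can be illustrated with the Tukey IQR fence, a common choice (an assumption here, not necessarily the exact method used), on a toy metric series.

```python
import numpy as np

# Toy metric series; production data came from the warehouse tables.
values = np.array([10, 12, 11, 13, 12, 11, 95, 10, 12, 11], dtype=float)

# Tukey fences: points beyond 1.5 * IQR from the quartiles are outliers.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lo) | (values > hi)]
print(outliers)  # the anomalous 95 reading
```

The same fences translate directly into dashboard reference bands for trend reports.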
Confidential
Software Engineer
Responsibilities:
- Processed TBs of textual data from various magazines and news articles.
- Strategized tagging and implemented solutions for deploying models at scale.
- Implemented multi-class classification of documents (in the millions) into categories and re-inspected manually labelled documents to reduce human-resource costs.
- Scaled implementable units of tagging algorithms; increased precision by 12% and recall by 15% through a pipelined structure of algorithms.
- Developed Flask REST services to put tagged data into MongoDB and expose it to the SharePoint UI.
- Developed data access points and helped SDEs create an ELK dashboard.
Technologies used: Hadoop, EMR, Spark, Multithreading, Multiprocessing, Asynchronous programming, Natural Language Processing, Elasticsearch, NER, Dask, Python, Flask, MongoDB, AWS S3, Redis, RabbitMQ
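The multi-class document classification step can be sketched with scikit-learn on a toy two-category corpus. The pipeline shown (TF-IDF features plus logistic regression) is an illustrative assumption; the production system ran on Spark/Dask-scale NLP tooling.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled corpus; the real system classified millions of documents.
docs = ["stocks rally on earnings", "team wins championship game",
        "market falls on rate fears", "striker scores twice in final",
        "bond yields climb higher", "coach praises young players"]
labels = ["finance", "sports", "finance", "sports", "finance", "sports"]

# TF-IDF turns text into sparse term weights; the classifier learns
# per-term coefficients for each category.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(docs, labels)
print(clf.predict(["market rally lifts stocks"]))
```

Swapping in more categories is just a matter of extending `labels`; the same fitted pipeline can also re-score manually labelled documents to flag likely labelling errors.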