
Sr. Data/ML Engineer Resume


San Francisco

SUMMARY:

  • Lead Data Engineer with over 8 years of experience in scalable, high-performance Big Data, Machine Learning, OLTP, and OLAP environments
  • Experience in the Hadoop ecosystem; implemented end-to-end solutions on all the major Hadoop distributions, including Hortonworks, MapR, and Cloudera
  • Experience in designing pipelines to bring external data, for example Confidential atmospheric chemistry data and weather forecast data, into Hadoop. Strong experience with both customer-facing and R&D projects
  • Deep, intuitive understanding of core statistical concepts, such as probability, randomness, correlation and sampling distributions
  • Proven experience in service development (REST APIs, microservices)
  • Deep understanding of modern machine learning methods for regression and classification
  • Expertise in architecting data pipelines to load weblog, clickstream, and impression data into Hadoop
  • Experience in building recommender systems using PredictionIO and collaborative filtering (a sketch follows this list)
  • Expertise in installing and creating ETL pipelines using the Confidential data integration tool
  • Experience in working with near real-time data, processing it with Flume and Kafka
  • Experience in data migration, cleaning, transformation and loading from legacy sources to DWH
  • Experience in implementing the Confidential Big Data Integration solution. Implemented solutions for Visa, AMEX, Aldo, Stanford Research, Match.com, Truecar, Home Depot, and TE Connectivity
  • Experience working with the reporting tools SAP Business Objects, SSRS, and Tableau
  • Experience in designing and implementing OLTP databases (RDBMS modeling), ETL, and reporting for big clients like US Mint, Forest Labs, Johnson & Johnson, Activision, and Samsung
  • Helped Confidential grow the business from 3 million to 10 million across various clients
  • Expertise in project management: project scoping, planning, estimating, scheduling, organizing, and budgeting
  • Expertise in defining roadmaps (MRD/PRD) based on product differentiation by target segments and customer requirements
  • Managed cross-functional teams and multi-disciplinary projects across different geographical locations
  • Expertise in negotiating deals of high complexity with creative win-win solutions for both parties
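
A minimal sketch of the item-based collaborative filtering referenced above, assuming a small pandas user-item interaction matrix; the users, items, and helper function are illustrative, not drawn from any client system:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative user-item interaction matrix (1 = viewed/purchased).
interactions = pd.DataFrame(
    [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 1]],
    index=["user_a", "user_b", "user_c"],
    columns=["item_1", "item_2", "item_3", "item_4"],
)

# Item-item cosine similarity over the interaction columns.
item_sim = pd.DataFrame(
    cosine_similarity(interactions.T),
    index=interactions.columns,
    columns=interactions.columns,
)

def recommend(user, k=2):
    """Score unseen items by their similarity to the user's seen items."""
    seen = interactions.loc[user]
    scores = item_sim[seen[seen > 0].index].sum(axis=1)
    return scores[seen == 0].nlargest(k)

print(recommend("user_a"))
```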

TECHNICAL SKILLS:

Big Data: Hadoop (Cloudera, Hortonworks, Pivotal, MapR distributions), Spark, Spark Streaming

Machine Learning: Apache PredictionIO, Convolutional Neural Networks, NLP, Google Cloud ML, IBM Watson, TensorFlow, Keras, SparkML

Database/MPP: Greenplum, Vertica, HAWQ, Hive, SQL Server, MySQL, Oracle, Spark SQL, MongoDB

Cloud: Google Cloud Platform, AWS

Languages: SQL, Shell scripting, Pig, Python, Scala

ETL Tools: Sqoop, SSIS, DataStage

Reporting Tools: SAP Business Objects, SSRS

SDLC Methodologies: Agile (Scrum), Waterfall

Project Management: MS Project, MS Office, Trello, JIRA

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco

Sr. Data/ML Engineer

Responsibilities:

  • Subject Matter Expert in Big Data, Machine Learning and Natural Language Processing
  • Developed the ETL pipeline and product recommendation algorithms for Macys.com. The recommendation system handles 90 GB of search, view, and user purchase data daily and is projected to bring in $70 million of revenue for 2017
  • Implemented use cases such as boosted product search, price boosting, the prop card model, personalized deals, and new arrivals on PredictionIO
  • Set up PredictionIO in the production environment on a technology stack of HBase, Spark, and Scala, based on the Cross-Occurrence algorithm
  • Working on tuning the existing product recommendation models for better performance.
  • Used the Universal Recommender template for PredictionIO to implement personalized recommendations, “viewed this, bought that”, item-based cross-action recommendations, and complementary purchases based on the product category hypothesis (a sketch of the event and query flow follows this list)
  • Implemented Flume to capture event data in Hadoop
  • Supporting data from various sources: Coremetrics data, product catalog data, store transaction data, user data, Pinterest data, likes data, events data for reporting, and marketing and email assembly data. Source systems are Hadoop, DB2, and Oracle
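
A hedged sketch of how behavioral events and recommendation queries flow through a PredictionIO deployment like the one described above, using the official predictionio Python SDK; the access key, URLs, and entity IDs are placeholders, not production values:

```python
import predictionio

# Send a behavioral event (e.g., a product view) to the Event Server.
event_client = predictionio.EventClient(
    access_key="YOUR_ACCESS_KEY",   # placeholder
    url="http://localhost:7070",    # Event Server endpoint
)
event_client.create_event(
    event="view",
    entity_type="user",
    entity_id="u123",
    target_entity_type="item",
    target_entity_id="sku-456",
)

# Query the deployed recommender engine for personalized results.
engine_client = predictionio.EngineClient(url="http://localhost:8000")
result = engine_client.send_query({"user": "u123", "num": 4})
print(result)
```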

Confidential, San Francisco

Big Data Engineer (Consultant)

Responsibilities:

  • Architected and led the technical solution for QA of the Dredge data platform, involving Pig and Spark
  • Gathered detailed business and technical requirements and participated in the definitions of business rules and data standards.
  • Worked with product stakeholders to create the product roadmap. Coordinated with multiple teams to document use cases for clickstream and impression data
  • Performed sentiment analysis and topic/content classification and categorization with deep learning
  • Processed clickstream data in Hadoop and moved the aggregated data to Vertica (a sketch follows this list)
  • Processed the impressions data in Vertica.
  • Designed the experimentation ETLs using Spark
  • Benchmarked the Hadoop cluster for better performance
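
A minimal PySpark sketch of the clickstream aggregation step described above; the input path, event schema, table name, and Vertica connection details are assumptions, and the Vertica JDBC driver would need to be on the Spark classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-agg").getOrCreate()

# Assumed layout: one JSON click event per line with user_id, page, ts fields.
clicks = spark.read.json("hdfs:///data/clickstream/2017/05/")

# Daily page-view counts per user -- the kind of aggregate moved to Vertica.
daily = (
    clicks
    .withColumn("day", F.to_date("ts"))
    .groupBy("day", "user_id", "page")
    .agg(F.count("*").alias("views"))
)

# Write the aggregate to Vertica over JDBC (connection details are placeholders).
(daily.write
    .format("jdbc")
    .option("url", "jdbc:vertica://vertica-host:5433/analytics")
    .option("dbtable", "clickstream_daily")
    .option("user", "etl_user")
    .option("password", "REDACTED")
    .mode("append")
    .save())
```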

Confidential, San Ramon, CA

Big Data Evangelist

Responsibilities:

  • Involved in planning, estimation, scheduling, and budgeting of the external data project for the data lake platform
  • Led the team of ETL engineers and architected the data pipelines using Confidential
  • Designed the pipeline to bring external data from the NWS source into the HAWQ database using the Confidential REST client, storing the data with Parquet compression for the windmills (a sketch follows this list)
  • Provided Confidential training to 30 developers
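
A hedged sketch of the external-data pull described above: fetching a forecast from the public NWS REST API and landing it as Parquet, which HAWQ can then read. The coordinates and output path are illustrative, pandas with pyarrow is assumed for the Parquet write, and the production pipeline used the Confidential REST client rather than raw requests:

```python
import pandas as pd
import requests

# NWS API: resolve a point to its forecast endpoint, then fetch the forecast.
# Coordinates and output path are illustrative placeholders.
point = requests.get(
    "https://api.weather.gov/points/39.74,-104.99",
    headers={"User-Agent": "external-data-pipeline"},
    timeout=30,
).json()

forecast = requests.get(
    point["properties"]["forecast"],
    headers={"User-Agent": "external-data-pipeline"},
    timeout=30,
).json()

# Flatten the forecast periods and land them as Parquet for HAWQ to read.
periods = pd.json_normalize(forecast["properties"]["periods"])
periods.to_parquet("/data/external/nws_forecast.parquet", compression="snappy")
```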

Confidential, Redwood City, CA

Data Engineering Manager

Responsibilities:

  • Worked closely with the presales team to deliver demos and POCs. Delivered training and implemented ETL solutions
  • Actively worked with the R&D team to create components in the core Confidential product
