
Data Science Engineer Resume


Plano, TX

SUMMARY:

  • 5+ years of experience working on various IT systems and applications using open-source technologies, covering analysis, design, coding, testing, implementation, and training; excellent skills in state-of-the-art client-server computing with a good understanding of Big Data technologies and Machine Learning.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn in Python to develop machine learning models such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis (a minimal sketch follows this list).
  • Experience with Lucene/index-based search in Elasticsearch and the ELK log-analytics stack (Elasticsearch, Logstash, Kibana).
  • Designed a NoSQL database schema to help migrate a legacy application's datastore to Elasticsearch.
  • Designed Elasticsearch, Kibana, and Logstash based logs and metrics pipelines and performed KPI-based cloud monitoring.
  • Experienced in performing in-memory data processing for batch, real-time, and advanced analytics using Apache Spark (Spark SQL & Spark shell).
  • Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
  • Aggregated data through Kafka, HDFS, Hive, Scala, and Spark Streaming on Amazon AWS.
  • Worked on Big Data analytics with the Hadoop ecosystem (Hadoop, Hive) and Spark, including integration with R.
  • Extensive knowledge of implementing machine learning programs in Python and R.
  • Experience developing solutions with the Hadoop ecosystem (Hadoop, Spark, Hive, Sqoop, ZooKeeper, Kafka) and NoSQL databases such as HBase.
  • Experience working on cloud infrastructure such as Amazon Web Services (AWS).
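
A minimal, hedged sketch of the kind of Scikit-learn regression workflow referenced above; the CSV path and column names are illustrative assumptions, not taken from any specific project.

```python
# Hedged sketch: train and evaluate a linear regression model with Scikit-learn.
# The file name and feature/target columns below are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

df = pd.read_csv("data.csv")                       # hypothetical dataset
X = df[["feature_1", "feature_2", "feature_3"]]    # hypothetical predictors
y = df["target"]                                   # hypothetical target column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```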

TECHNICAL SKILLS:

Roles: Data Science Engineer, Big Data Engineer, Spark Developer, Data Analyst, Project Engineer

Programming: Python, R, C, SQL (familiar with Scala, SAS)

Tools: Spyder, IPython Notebook/Jupyter, Spark Notebook, Zeppelin Notebook (familiar with Git, Docker)

Cloud: AWS/EMR/EC2/S3 (also Hadoop directly on EC2)

Big Data: ELK Stack, Spark, Hadoop, Hive, Pig, Sqoop (familiar with Cloudera Search)

DB Languages: SQL, PL/SQL, Oracle, Hive, Spark SQL, MemSQL

Domain: Big Data, Data Mining, Data Analytics, Machine Learning, Natural Language Processing

PROFESSIONAL EXPERIENCE:

Data Science Engineer

Confidential, Plano, TX

Responsibilities:

  • Served as the technical point of contact for upper management, business analysts, project management, and other groups on the proactive monitoring project.
  • Analyzed and solved the anomaly detection problem for root-cause events behind set-top box (STB) failures; chose and prototyped the ELK stack.
  • Core developer of the Elasticsearch solution with X-Pack machine learning for proactive monitoring, anomaly detection, and alert generation.
  • Created a real-time dashboard providing KPI reporting, performance monitoring, geo-based error display, and historical search, used by IHD / Command Center reps to support customers' STB troubleshooting and diagnostics.
  • Created machine learning jobs in the Kibana X-Pack ML component to detect anomalies, with watchers on those ML jobs for root-cause analysis to predict KPI metrics and errors; set up anomaly alerting so the team is notified when such an occurrence happens (a hedged sketch of job creation follows this list).
  • Generated scheduled reports for Kibana dashboards and visualizations.
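
A hedged sketch of creating an X-Pack anomaly detection job through the Elasticsearch ML REST API; the job id, field names, and host below are illustrative assumptions, and the endpoint prefix varies with the Elasticsearch version.

```python
# Hedged sketch: create an anomaly detection job via the Elasticsearch ML REST API.
# Job id, fields, and host are hypothetical placeholders, not the project's actual names.
import requests

ES_HOST = "http://localhost:9200"            # assumed cluster address
JOB_ID = "stb_error_rate_anomalies"          # hypothetical job id

job_config = {
    "description": "Anomaly detection on STB error counts",
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {"function": "high_count", "partition_field_name": "error_code"}
        ],
        "influencers": ["error_code", "region"],
    },
    "data_description": {"time_field": "@timestamp"},
}

# PUT _ml/anomaly_detectors/<job_id>  (7.x+ endpoint; older X-Pack releases
# exposed it under _xpack/ml/anomaly_detectors instead)
resp = requests.put(f"{ES_HOST}/_ml/anomaly_detectors/{JOB_ID}", json=job_config)
resp.raise_for_status()
print(resp.json())
```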

Technologies used: Elasticsearch, Logstash, Kibana, Kafka, Machine Learning, Python

Big Data Engineer

Confidential, Warren, NJ

Responsibilities:

  • Designed the architecture and rewrote the DMAT application from scratch using the ELK Stack, with integration to other applications.
  • Strategy to improve business KPIs: analyzed existing products and KPIs and recommended short-term and long-term ideas to improve various KPIs via DMAT.
  • Built an ETL process for continuously bulk-importing DMAT data from SQL Server into Elasticsearch.
  • Designed and implemented large-scale pub-sub message queues using Apache Kafka.
  • Configured ZooKeeper, Kafka, and Logstash clusters for data ingestion, worked on Elasticsearch performance and optimization, and used Kafka for live streaming of data.
  • Indexed and searched/queried a substantial number of documents (~400 million) in Elasticsearch; created a Kibana dashboard for sanity-checking the data and worked with a Kibana dashboard showing overall build status with drill-down features.
  • Set up and optimized the ELK (Elasticsearch, Logstash, Kibana) Stack and integrated Apache Kafka for data ingestion.
  • Created geo-mapping visualizations in Kibana to show data points on a US-based map and utilized Kibana reporting.
  • Developed a Spark job that loads large volumes of data from HDFS, applies transformations and pre-processing on the fly, and loads the data into Elasticsearch (see the sketch after this list).
  • Migrated data into Elasticsearch through ES-Spark integration and created mappings and indexing in Elasticsearch for quick retrieval.
  • Created data discovery views, visualizations, and dashboards in Kibana for quick data analysis.
  • POC work on replacing SQL Server-backed retrieval of data points with Elasticsearch, resulting in a thousand-fold speedup.
  • Designed and developed data import, aggregation, and advanced analytics on top of MemSQL for quick POCs in the initial stages of the product.
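
A hedged sketch of the HDFS-to-Spark-to-Elasticsearch flow described above, assuming the elasticsearch-hadoop (ES-Spark) connector is on the Spark classpath; the HDFS path, index name, and column names are illustrative assumptions.

```python
# Hedged sketch: load data from HDFS, transform it on the fly, and write it to
# Elasticsearch via the ES-Spark integration. Paths, columns, and the index name
# are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dmat-hdfs-to-es").getOrCreate()

# Load raw records from HDFS and apply on-the-fly cleanup/transformations
raw = spark.read.json("hdfs:///data/dmat/raw/")            # hypothetical path
clean = (
    raw.filter(F.col("event_type").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_time"))
       .select("device_id", "event_type", "event_ts", "kpi_value")
)

# Write into Elasticsearch through the elasticsearch-hadoop connector
(clean.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "es-host")            # assumed Elasticsearch host
      .option("es.port", "9200")
      .option("es.resource", "dmat-events")     # hypothetical index
      .mode("append")
      .save())
```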

Environment: Python, Spark, Elasticsearch, Hive, HDFS, Kafka, Logstash, Kibana, Jupyter, IntelliJ, MemSQL

Data Analytics

Confidential

Responsibilities:

  • Conducted exploratory analysis on the data and studied different imputation methods that could be applied to it.
  • Performed ad-hoc data visualizations using ggplot2 in R to evaluate existing models.
  • Implemented Spark jobs in Python, using Spark SQL to access Hive tables in Spark for faster data processing (see the sketch after this list).
  • Programmed in Python (Pandas, NumPy, scikit-learn, Matplotlib) and R (ggplot2).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Used various Spark transformations such as map, reduceByKey, and filter to clean the input data.
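
A hedged sketch of reading a Hive table through Spark SQL and cleaning it with RDD transformations (filter, map, reduceByKey); the table and column names are illustrative assumptions.

```python
# Hedged sketch: pull a Hive table into Spark via Spark SQL, then clean and
# aggregate it with RDD transformations. Table and columns are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-spark-cleanup")
    .enableHiveSupport()          # lets spark.sql() see Hive tables
    .getOrCreate()
)

# Access the Hive table in Spark for faster processing
df = spark.sql("SELECT customer_id, amount FROM sales.transactions")  # hypothetical table

# Clean the input data and aggregate with RDD transformations
totals = (
    df.rdd
      .filter(lambda row: row.amount is not None and row.amount > 0)   # drop bad rows
      .map(lambda row: (row.customer_id, float(row.amount)))           # key by customer
      .reduceByKey(lambda a, b: a + b)                                 # total per customer
)

for customer_id, total in totals.take(10):
    print(customer_id, total)
```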

Environment: Python, R, Machine learning, AWS, Apache Hadoop, HDFS, Hive, Pig, Apache Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Git.

Confidential

Project Engineer

Responsibilities:

  • Configured essential parameters before deploying the Elasticsearch cluster to production.
  • Built an ETL process for continuously bulk-importing Teamcenter data from SQL Server into Elasticsearch.
  • Set up Logstash for centralizing and analyzing Teamcenter data management and exchange operations.
  • Used the Kibana interface to filter and visualize log messages gathered by the ELK stack.
  • Generated histograms and date histograms (histograms over time) in Elasticsearch, giving it an interval that buckets the data into two-week periods (a hedged query sketch follows this list).
  • Learned to index and search/query millions of documents inside Elasticsearch.
  • Analyzed Electrolux sales data, wrote the results to HDFS in Avro as well as to Elasticsearch, and created a Kibana dashboard for sanity-checking the data.
  • Involved in data migration from one cluster to another.
  • Handled importing data from various sources and performed transformations using Hive (external tables, partitioning).
  • Configured, researched, and developed various use cases; aided in the use and operation of BI tools, predictive data modeling, data analytics, and integration platform software such as Talend.
  • Worked on Teamcenter implementation and data migration projects, including Teamcenter Engineering and Teamcenter Manufacturing, to suit the requirements of AB Electrolux.
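
A hedged sketch of a two-week date histogram aggregation using the official elasticsearch Python client; the index and field names are illustrative assumptions, and the exact interval parameter name depends on the Elasticsearch version (fixed_interval in 7.x+, interval in earlier releases).

```python
# Hedged sketch: bucket documents into two-week intervals with a date_histogram
# aggregation. Index and field names are hypothetical placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])   # assumed cluster address

resp = es.search(
    index="teamcenter-ops",                     # hypothetical index
    body={
        "size": 0,
        "aggs": {
            "ops_over_time": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": "14d"     # bucket the data into two weeks
                }
            }
        },
    },
)

for bucket in resp["aggregations"]["ops_over_time"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```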

Technologies used: Hadoop, MapReduce, HBase, Sqoop, Elasticsearch, Talend, Teamcenter
