Data Science Engineer Resume
Plano, TX
SUMMARY:
- 5+ years of experience on various IT systems and applications built with open-source technologies, covering analysis, design, coding, testing, implementation, and training; strong client-server computing skills and a good understanding of big data technologies and machine learning.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python to develop machine learning workflows, applying algorithms such as linear regression, multivariate regression, Naive Bayes, random forests, k-means, and KNN for data analysis (see the sketch after this list).
- Experience with Lucene index-based search in Elasticsearch and with the ELK log-analytics stack (Elasticsearch, Logstash, Kibana).
- Designed NoSQL database schemas to help migrate a legacy application's datastore to Elasticsearch.
- Designed Elasticsearch/Logstash/Kibana-based logs and metrics pipelines and performed KPI-based cloud monitoring.
- Experienced in in-memory data processing for batch, real-time, and advanced analytics using Apache Spark (Spark SQL and spark-shell).
- Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
- Aggregated data through Kafka, HDFS, Hive, Scala, and Spark Streaming on Amazon AWS.
- Worked on big data analytics with the Hadoop ecosystem (Hadoop, Hive) and Spark, including integration with R.
- Extensive knowledge of implementing machine learning programs in Python and R.
- Experience using and developing solutions with the Hadoop ecosystem (Hadoop, Spark, Hive, Sqoop, ZooKeeper, Kafka) and NoSQL databases such as HBase.
- Experience working with cloud infrastructure such as Amazon Web Services (AWS).
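A minimal sketch of the kind of scikit-learn workflow referenced above; the input file, column names, and model settings are illustrative placeholders, not taken from a specific project.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error

    # Hypothetical dataset: feature columns plus a numeric "target" column.
    df = pd.read_csv("measurements.csv")
    X = df.drop(columns=["target"])
    y = df["target"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit and compare two of the model families named above.
    for model in (LinearRegression(), RandomForestRegressor(n_estimators=100, random_state=42)):
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        print(type(model).__name__, mean_squared_error(y_test, preds))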
TECHNICAL SKILLS:
Roles: Data Science Engineer, Big Data Engineer, Spark Developer, Data Analyst, Project Engineer
Programming: Python, R, C, SQL (familiar with Scala, SAS)
Tools: Spyder, IPython Notebook/Jupyter, Spark Notebook, Zeppelin Notebook (familiar with Git, Docker)
Cloud: AWS (EMR, EC2, S3; also Hadoop run directly on EC2)
Big Data: ELK Stack, Spark, Hadoop, Hive, Pig, Sqoop (familiar with Cloudera Search)
DB Languages: SQL, PL/SQL, Oracle, Hive, Spark SQL, MemSQL
Domain: Big Data, Data Mining, Data Analytics, Machine Learning, Natural Language Processing
PROFESSIONAL EXPERIENCE:
Data Science Engineer
Confidential, Plano, TX
Responsibilities:
- Served as the technical point of contact for upper management, business analysts, project management, and other groups on the proactive monitoring project.
- Analyzed and solved an anomaly-detection problem for root-cause events behind set-top box (STB) failures; selected and prototyped the ELK stack.
- Core developer of the Elasticsearch and X-Pack machine learning solution for proactive monitoring, anomaly detection, and alert generation.
- Created real-time dashboards (KPI reporting, performance monitoring, geo-based error display, historical search) used by IHD and Command Center reps to support customers' STB troubleshooting and diagnostics.
- Created machine learning jobs in the Kibana X-Pack ML component for anomaly detection, with watchers on those jobs for root-cause analysis to predict KPI metrics and errors; configured anomaly alerting so the team is notified when an anomaly occurs (a minimal job-creation sketch follows this list).
- Generated scheduled reports for Kibana dashboards and visualizations.
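A hedged sketch of creating an X-Pack ML anomaly-detection job of the kind described above, issued over the REST API from Python. The cluster URL, field names, job id, and credentials are placeholders, and the endpoint prefix depends on the Elasticsearch version (_xpack/ml on 5.x/6.x, _ml on 7.x+).

    import requests

    ES_URL = "http://localhost:9200"  # assumed cluster address

    job_config = {
        "description": "High error-count anomalies per STB model",
        "analysis_config": {
            "bucket_span": "15m",
            "detectors": [
                {"function": "high_count", "partition_field_name": "stb_model"}
            ],
        },
        "data_description": {"time_field": "@timestamp"},
    }

    # Create the anomaly-detection job (5.x/6.x-style endpoint shown).
    resp = requests.put(
        f"{ES_URL}/_xpack/ml/anomaly_detectors/stb-error-anomalies",
        json=job_config,
        auth=("elastic", "changeme"),  # placeholder credentials
    )
    resp.raise_for_status()
    print(resp.json())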
Technologies used: Elasticsearch, Logstash, Kibana, Kafka, Machine Learning, Python
Big Data Engineer
Confidential, Warren, NJ
Responsibilities:
- Designed the architecture and rewrote the DMAT application from scratch using the ELK stack, integrating it with other applications.
- Strategy to improve business KPIs: analyzed existing products and KPIs and recommended short-term and long-term ideas to improve them via DMAT.
- Built an ETL process for continuously bulk-importing DMAT data from SQL Server into Elasticsearch.
- Designed and implemented large-scale pub-sub message queues using Apache Kafka (see the Kafka sketch after this list).
- Configured ZooKeeper, Kafka, and Logstash clusters for data ingestion, tuned Elasticsearch performance, and used Kafka for live streaming of data.
- Indexed and queried a substantial number of documents (~400 million) in Elasticsearch; created a Kibana dashboard for sanity-checking the data and another showing overall build status with drill-down features.
- Set up and optimized the ELK (Elasticsearch, Logstash, Kibana) stack and integrated Apache Kafka for data ingestion.
- Created geo-mapping visualizations in Kibana to show data points on a US map and used Kibana reporting.
- Developed a Spark job that loads large volumes of data from HDFS, applies transformations and pre-processing on the fly, and loads the data into Elasticsearch (a PySpark sketch follows this list).
- Migrated data into Elasticsearch through the ES-Spark integration and created mappings and indices in Elasticsearch for quick retrieval.
- Created data discovery views, visualizations, and dashboards in Kibana for quick analysis of the data.
- Proof-of-concept work on replacing SQL Server-backed data-point retrieval with Elasticsearch, resulting in a thousand-fold speedup.
- Designed and developed data import, aggregation, and advanced analytics on top of MemSQL as quick POCs in the initial stages of the product.
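A minimal sketch of the Kafka pub-sub pattern mentioned above, using the kafka-python client; the broker address, topic name, and message fields are illustrative placeholders.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Publish a JSON event to a topic.
    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("dmat-events", {"device_id": "abc123", "status": "ERROR"})
    producer.flush()

    # Consume events from the same topic as part of the ingestion pipeline.
    consumer = KafkaConsumer(
        "dmat-events",
        bootstrap_servers="broker:9092",
        group_id="dmat-ingest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)  # hand off downstream (e.g. to Logstash/Elasticsearch)
        break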
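A hedged sketch of the HDFS-to-Elasticsearch Spark job described above, assuming the elasticsearch-hadoop connector jar is on the Spark classpath; paths, index name, and column names are placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("dmat-es-load")
             .config("spark.es.nodes", "es-host")
             .config("spark.es.port", "9200")
             .getOrCreate())

    # Load raw JSON records from HDFS.
    raw = spark.read.json("hdfs:///data/dmat/raw/")

    # Light pre-processing on the fly: drop incomplete records, parse timestamps.
    cleaned = (raw
               .filter(F.col("status").isNotNull())
               .withColumn("event_ts", F.to_timestamp("event_time")))

    # Write into Elasticsearch via the ES-Spark integration.
    (cleaned.write
        .format("org.elasticsearch.spark.sql")
        .option("es.resource", "dmat-events/_doc")
        .mode("append")
        .save())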
Environment: Python, Spark, Elasticsearch, Hive, HDFS, Kafka, Logstash, Kibana, Jupyter, IntelliJ, MemSQL
Data Analyst
Confidential
Responsibilities:
- Conducted exploratory analysis of the data and studied different imputation methods that could be applied to it.
- Performed ad-hoc data visualizations using ggplot2 in R to evaluate existing models.
- Implemented Spark jobs in Python, using Spark SQL to access Hive tables in Spark for faster data processing.
- Programmed in Python (Pandas, NumPy, scikit-learn, Matplotlib) and R (ggplot2).
- Converted Hive/SQL queries into Spark transformations using Spark RDDs.
- Used Spark transformations such as map, reduceByKey, and filter to clean the input data (see the sketch after this list).
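A minimal sketch of the Hive-to-Spark flow and RDD-style cleanup described above; the table and column names are illustrative placeholders.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-cleanup")
             .enableHiveSupport()  # lets Spark SQL read existing Hive tables
             .getOrCreate())

    # Pull a Hive table into Spark and drop down to the RDD API.
    rows = spark.sql("SELECT device_id, status FROM ops.device_events").rdd

    # Clean and aggregate with classic RDD transformations.
    counts = (rows
              .filter(lambda r: r.status is not None)   # discard bad records
              .map(lambda r: (r.device_id, 1))          # key by device
              .reduceByKey(lambda a, b: a + b))         # events per device

    print(counts.take(5))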
Environment: Python, R, Machine learning, AWS, Apache Hadoop, HDFS, Hive, Pig, Apache Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Git.
Project Engineer
Confidential
Responsibilities:
- Configured essential parameters before deploying the Elasticsearch cluster to production.
- Built an ETL process for continuously bulk-importing Teamcenter data from SQL Server into Elasticsearch (see the bulk-import sketch after this list).
- Set up Logstash to centralize and analyze Teamcenter data-management and exchange operations.
- Used the Kibana interface to filter and visualize log messages gathered by the ELK stack.
- Generated histograms and date histograms (histograms over time) in Elasticsearch by giving the aggregation an interval that buckets the data into two-week windows (an aggregation sketch also follows this list).
- Learned to index and search/query millions of documents inside Elasticsearch.
- Analyzed Electrolux sales data, wrote the results to HDFS in Avro as well as to Elasticsearch, and created a Kibana dashboard for sanity-checking the data.
- Involved in data migration from one cluster to another.
- Handled importing data from various data sources and performed transformations using Hive (external tables, partitioning).
- Configured, researched, and developed various use cases; supported the use and operation of BI tools, predictive data modeling, data analytics, and integration platform software such as Talend.
- Worked on Teamcenter implementation and data migration projects, including Teamcenter Engineering and Teamcenter Manufacturing, to meet the requirements of AB Electrolux.
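A hedged sketch of the continuous bulk-import ETL described above: read rows from SQL Server with pyodbc and index them with the Elasticsearch bulk helper. The connection string, table, and index name are placeholders.

    import pyodbc
    from elasticsearch import Elasticsearch, helpers

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=teamcenter;"
        "UID=etl_user;PWD=secret"  # placeholder connection string
    )
    es = Elasticsearch(["http://localhost:9200"])

    def row_actions():
        # Stream rows out of SQL Server as bulk index actions.
        cursor = conn.cursor()
        cursor.execute("SELECT item_id, item_name, last_modified FROM tc_items")
        columns = [c[0] for c in cursor.description]
        for row in cursor:
            yield {"_index": "teamcenter-items", "_source": dict(zip(columns, row))}

    # Index in batches; rerun on a schedule (or filter by last_modified) to stay current.
    helpers.bulk(es, row_actions())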
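A hedged sketch of the two-week date-histogram aggregation mentioned above, sent through the official Python client; the index and timestamp field are placeholders, and newer Elasticsearch versions spell the parameter fixed_interval instead of interval.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    query = {
        "size": 0,
        "aggs": {
            "ops_over_time": {
                "date_histogram": {
                    "field": "@timestamp",
                    "interval": "14d"  # bucket the data into two-week windows
                }
            }
        }
    }

    resp = es.search(index="teamcenter-logs", body=query)
    for bucket in resp["aggregations"]["ops_over_time"]["buckets"]:
        print(bucket["key_as_string"], bucket["doc_count"])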
Technologies used: Hadoop, MapReduce, HBase, Sqoop, Elasticsearch, Talend, Teamcenter