Application Engineer Resume
SUMMARY
- Qualified, proficient and thorough Data Engineer with 6+ years of data engineering experience and 8+ years of overall experience in software development, applied to cloud computing, data mining, data storage and predictive modeling with various forms of data.
- Worked closely with departmental managers to produce precise reports that contributed significantly to business growth. Experienced with report preparation, project management, resource management and effective delivery of results.
- Experience in machine learning techniques, predictive modeling, data visualization, statistical analysis of large-scale datasets, hypothesis testing and anomaly detection.
- Experience in cloud architecture and in the design and implementation of highly available/scalable solutions. Worked on data security issues while creating AWS cloud services.
- Good understanding and extensive hands-on experience of various paradigms for high-throughput, high-performance computing, and version control with CVS/SVN/GitHub.
- Parallel processing of data on the Hadoop framework, MapReduce and Hortonworks HDP. Worked on several Hadoop ecosystem products such as Pig, Hive, Scala, Sqoop and Spark.
- Used several development environments and editors, including Anaconda, Jupyter, Spyder and RStudio.
- Natural orientation towards taking up fresh challenges and dealing with complex problems.
TECHNICAL SKILLS
- Python, Conda, PySpark, Databricks, notebooks, AWS S3, AWS Lambda, AWS CloudWatch
- AWS, EMR, S3, CI/CD build pipelines, Redshift, Elasticsearch, VPN, VPC, IPsec
- R, Java, J2EE, web services, Hadoop, Hadoop ecosystem, Hortonworks HDP/HDF
- MapReduce, Spark, Kafka, Scala, Pig, Sqoop/Cassandra/Hive
- Oracle DB, NoSQL, MongoDB, MySQL, QlikView
- Cloud services, MS Azure, ISSD, Blob, VM, Container, AWS, EMR, S3, Pipeline, Redshift, Elasticsearch
- Linear/Logistic Regression, Naïve Bayes, Decision Tree, Apriori, Nearest Neighbors, Fusion, ARCH, AR(I)MA time series
- Microservices, Open source, Big Data Lake, Agile background
PROFESSIONAL EXPERIENCE
Confidential
Application Engineer
Responsibilities:
- Provided data for models by running multiple data pipelines that extract data from multiple sources, then combine and export the required data.
- Refactored and optimized multiple legacy Databricks notebooks as part of this project by creating a PySpark-based project template.
- Ensured code reusability by creating a utils module packaged as a ready-to-consume library (a minimal sketch follows this list).
- Created a validation framework for input and output data validation.
- Configured validation alerts for data-sanity failures.
- Designed a summary email showing the overall trend on key parameters.
- Databricks job configuration
- Refactoring of ETL Databricks notebooks
- Databricks dbutils usage and mounting to S3 (sketched below, after the technologies line)
- Standalone spark-submit job deployment
- JVM memory tuning for Spark jobs
- Reusable Python library for job validations and alerts
- Data pipeline configuration using Databricks clusters
- Fully sanitized and automated data pipeline providing data for different type-ahead models
- Developed various automation, application and monitoring tools.
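The reusable validation and alerting library described above might look roughly like the following PySpark sketch; the module layout, function names and thresholds are illustrative assumptions rather than the actual project code.

```python
# Minimal sketch of a reusable PySpark validation helper for input/output
# data sanity checks; names and thresholds are hypothetical.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def validate_non_empty(df: DataFrame, name: str) -> None:
    """Fail fast if an input/output DataFrame is empty."""
    if df.limit(1).count() == 0:
        raise ValueError(f"Data sanity check failed: {name} is empty")


def null_ratio(df: DataFrame, column: str) -> float:
    """Return the fraction of NULL values in a column."""
    total = df.count()
    nulls = df.filter(F.col(column).isNull()).count()
    return nulls / total if total else 1.0


def validate_null_threshold(df: DataFrame, column: str, max_ratio: float = 0.05) -> None:
    """Raise if the NULL ratio of a key column exceeds the allowed threshold."""
    ratio = null_ratio(df, column)
    if ratio > max_ratio:
        raise ValueError(
            f"Data sanity check failed: {column} has {ratio:.1%} NULLs "
            f"(allowed {max_ratio:.1%})"
        )


if __name__ == "__main__":
    # Tiny demo with an in-memory DataFrame.
    spark = SparkSession.builder.appName("validation-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "value"])
    validate_non_empty(df, "demo_input")
    validate_null_threshold(df, "value", max_ratio=0.6)
```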
Technologies: Python, Conda, PySpark, Databricks, notebooks, AWS S3, AWS Lambda, AWS CloudWatch
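For the dbutils/S3 mounting noted in the list above, a minimal Databricks notebook sketch is shown below; the bucket name, mount point and IAM/instance-profile setup are assumptions, and `dbutils` and `spark` are only available inside a Databricks notebook.

```python
# Minimal Databricks notebook sketch: mount an S3 bucket once and read from it.
# Bucket and mount point are placeholders; access is assumed to come from an
# instance profile already attached to the cluster.

bucket = "example-model-data-bucket"   # hypothetical bucket name
mount_point = "/mnt/model-data"        # hypothetical mount point

# Mount only if it is not already mounted in this workspace.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source=f"s3a://{bucket}",
        mount_point=mount_point,
    )

# Read the mounted data and hand it to downstream ETL/validation steps.
df = spark.read.parquet(f"{mount_point}/exports/latest/")
print(df.count(), "rows available for the type-ahead pipeline")
```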
Confidential
Solution Architect /Data Engineer
Responsibilities:
- Built a connected platform that integrated customer data, retail data, client data and other ecosystem data, which could be consumed by any app development effort.
- Client requirement gathering
- POC for new requirements
- Coordination with offshore team
- Distributed architecture setup and deployment for Jupiter Diagnostic System
- Configuration of new devices and interfaces
- Code customization requirements and testing
- Optimization of existing middle layer components
- Design and architecture of a real-time data processing and monitoring system via Spark/Kafka streaming
- Design and implementation of a real-time data processing system using Cassandra, Kafka, Spark and Hadoop
- Data ingestion of Oracle database tables into a Hadoop data lake with Spark and Kafka via Scala/Python programming.
- Worked on data security by creating EC2 instances running in an AWS Virtual Private Cloud (VPC), then connecting to a local network via an Internet Protocol Security (IPsec) VPN tunnel between the two networks with appropriate firewall rules.
- ETL of health-care insurance data by developing an ingestion model using PySpark
- Wrote several Pig and Hive scripts for data mining, and used Sqoop to transfer data to and from the Hadoop data lake and AWS Redshift.
- Developed code in the Spark framework using Scala and PySpark for UI and dashboard analysis of LTE OSS data. Beyond the Spark core APIs, used Spark Streaming, Spark SQL, Spark MLlib and Spark Catalog.
- Developed code in Python using core APIs, pandas, matplotlib, scikit-learn and Keras.
- Studied data collected on handsets for various events and developed a model to classify the data points as indoor or outdoor.
- Tested and developed several machine learning predictive models to resolve the long-standing issue of indoor vs. outdoor data classification, using techniques such as decision trees, artificial neural networks, naïve Bayes probability models and regression models for classification and prediction (a minimal sketch follows this list).
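As a rough illustration of the indoor-vs-outdoor classification work, the sketch below trains a scikit-learn decision tree on synthetic handset-style features; the feature names (signal strength, GPS SNR, visible cells) and the generated data are assumptions, not the actual dataset.

```python
# Sketch: indoor vs. outdoor classification with a decision tree on synthetic
# handset-style features. Feature definitions and distributions are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)

# Hypothetical features per sample: [rsrp_dbm, gps_snr_db, visible_cells]
n = 1000
indoor = np.column_stack(
    [rng.normal(-105, 8, n), rng.normal(12, 5, n), rng.poisson(3, n)])
outdoor = np.column_stack(
    [rng.normal(-85, 8, n), rng.normal(30, 5, n), rng.poisson(7, n)])

X = np.vstack([indoor, outdoor])
y = np.array([0] * n + [1] * n)  # 0 = indoor, 1 = outdoor

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = DecisionTreeClassifier(max_depth=5, random_state=0)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test),
                            target_names=["indoor", "outdoor"]))
```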
Confidential
Ph.D. Research Scientist
Responsibilities:
- Wrote binary classification tools in core C++/Java/Python/R for data mining and analytics of health-care data.
- Extensively used Python for development of the core SAM engine, using requests, SQLAlchemy, NumPy, SciPy, PyGTK, pywin, NLTK, nose and many other Python APIs.
- Developed ARCH and AR(I)MA time-series models using statistical methods and machine learning techniques (a minimal sketch follows this list).
- Submitted batch data-processing jobs on a Hadoop framework spanning many hundreds of servers across dozens of computing clusters in the US and EU.
- Submitted data analysis jobs to clusters of CPUs using the MapReduce and Spark frameworks.
- Wrote Pig/Hive scripts to interact with the SAM database to filter, map and query the data.
- Developed code in Scala/PySpark for data analysis and KPI calculations.
- Analyzed big scientific-data records (~20 PB), which resulted in several high-quality publications in reputed journals.
- Extensive use of generalized linear regression models for SMT and LM radiation-damage prediction studies; the prediction results were successfully used in upgrade plans.
- Used SDLC best practices and agile development for code optimization for object storage.
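The AR(I)MA time-series modeling mentioned in this list can be sketched with statsmodels as follows; the synthetic series, the (1, 1, 1) order and the forecast horizon are illustrative assumptions rather than the research data.

```python
# Sketch: fitting an AR(I)MA model on a synthetic series with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)

# Synthetic trending series with noise, standing in for a monitored metric.
t = np.arange(200)
series = pd.Series(0.5 * t + rng.normal(0, 5, size=t.size))

# Fit ARIMA(p=1, d=1, q=1) and forecast the next 10 steps.
model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()

print(fitted.summary())
print(fitted.forecast(steps=10))
```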