Application Engineer Resume
SUMMARY
- Qualified, proficient and thorough Data Engineer with 6+ years of data engineering experience and 8+ years of overall experience in software development, applied to cloud computing, data mining, data storage and predictive modeling with various forms of data.
- Worked closely with departmental managers to produce precise reports that contributed significantly to business growth. Experienced with report preparation, project management, resource management and effective delivery of results.
- Experience in machine learning techniques, predictive modeling, data visualization, statistical analysis of large-scale datasets, hypothesis testing and anomaly detection.
- Experience in cloud architecture and in the design and implementation of highly available/scalable solutions. Worked on data security issues while creating AWS cloud services.
- Good understanding and extensive hands-on experience of various paradigms for high-throughput, high-performance computing, and version control with CVS/SVN/GitHub.
- Parallel processing of data on the Hadoop framework, MapReduce and Hortonworks HDP. Worked on several Hadoop ecosystem products such as Pig, Hive, Scala, Sqoop and Spark.
- Used several development environments and editors, including Anaconda, Jupyter, Spyder and RStudio.
- Natural orientation towards taking up fresh challenges and dealing with complex problems.
TECHNICAL SKILLS
- Python, Conda, PySpark, Databricks, notebooks, AWS S3, AWS Lambda, AWS CloudWatch
- AWS, EMR, S3, CI/CD build pipelines, Redshift, Elasticsearch, VPN, VPC, IPsec
- R, Java, J2EE, web services, Hadoop, Hadoop ecosystem, Hortonworks HDP/HDF
- MapReduce, Spark, Kafka, Scala, Pig, Sqoop/Cassandra/Hive
- Oracle DB, NoSQL, MongoDB, MySQL, QlikView
- Cloud services, MS Azure, ISSD, Blob, VM, Container, AWS, EMR, S3, Pipeline, Redshift, Elasticsearch
- Linear/Logistic Regression, Naïve Bayes, Decision Tree, Apriori, Nearest Neighbors, Fusion, ARCH, AR(I)MA time series
- Microservices, Open source, Big Data Lake, Agile background
PROFESSIONAL EXPERIENCE
Confidential
Application Engineer
Responsibilities:
- Provided data for models by running multiple data pipelines that extract data from multiple sources, then combine and export the required data.
- Refactored and optimized multiple legacy Databricks notebooks as part of this project by creating a PySpark-based project template.
- Ensured code reusability by creating a utils module packaged as a ready-to-consume library (a minimal sketch follows this list).
- Created a validation framework for input and output data validation.
- Configured validation alerts for data-sanity failures.
- Designed a summary email showing the overall trend on key parameters.
- Databricks job configuration
- Refactoring of ETL Databricks notebooks
- Databricks dbutils usage and mounting to S3 (sketched below, after the technologies line)
- Standalone spark-submit job deployment
- JVM memory tuning for Spark jobs
- Reusable Python library for job validations and alerts
- Data pipeline configuration using Databricks clusters
- Fully sanitized and automated data pipeline providing data for different type-ahead models
- Developed various automation, application and monitoring tools.
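The reusable validation and alerting library described above might look roughly like the following PySpark sketch; the module layout, function names and thresholds are illustrative assumptions rather than the actual project code.

```python
# Minimal sketch of a reusable PySpark validation helper for input/output
# data sanity checks; names and thresholds are hypothetical.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def validate_non_empty(df: DataFrame, name: str) -> None:
    """Fail fast if an input/output DataFrame is empty."""
    if df.limit(1).count() == 0:
        raise ValueError(f"Data sanity check failed: {name} is empty")


def null_ratio(df: DataFrame, column: str) -> float:
    """Return the fraction of NULL values in a column."""
    total = df.count()
    nulls = df.filter(F.col(column).isNull()).count()
    return nulls / total if total else 1.0


def validate_null_threshold(df: DataFrame, column: str, max_ratio: float = 0.05) -> None:
    """Raise if the NULL ratio of a key column exceeds the allowed threshold."""
    ratio = null_ratio(df, column)
    if ratio > max_ratio:
        raise ValueError(
            f"Data sanity check failed: {column} has {ratio:.1%} NULLs "
            f"(allowed {max_ratio:.1%})"
        )


if __name__ == "__main__":
    # Tiny demo with an in-memory DataFrame.
    spark = SparkSession.builder.appName("validation-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "value"])
    validate_non_empty(df, "demo_input")
    validate_null_threshold(df, "value", max_ratio=0.6)
```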
Technologies: Python, Conda, PySpark, Databricks, notebooks, AWS S3, AWS Lambda, AWS CloudWatch
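For the dbutils/S3 mounting noted in the list above, a minimal Databricks notebook sketch is shown below; the bucket name, mount point and IAM/instance-profile setup are assumptions, and `dbutils` and `spark` are only available inside a Databricks notebook.

```python
# Minimal Databricks notebook sketch: mount an S3 bucket once and read from it.
# Bucket and mount point are placeholders; access is assumed to come from an
# instance profile already attached to the cluster.

bucket = "example-model-data-bucket"   # hypothetical bucket name
mount_point = "/mnt/model-data"        # hypothetical mount point

# Mount only if it is not already mounted in this workspace.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source=f"s3a://{bucket}",
        mount_point=mount_point,
    )

# Read the mounted data and hand it to downstream ETL/validation steps.
df = spark.read.parquet(f"{mount_point}/exports/latest/")
print(df.count(), "rows available for the type-ahead pipeline")
```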
Confidential
Solution Architect /Data Engineer
Responsibilities:
- Built a connected platform that integrated customer data, retail data, client data and other ecosystem data, which could be consumed by any app development effort.
- Client requirement gathering
- POC for new requirements
- Coordination with offshore team
- Distributed architecture setup and deployment for Jupiter Diagnostic System
- Configuration of new devices and interfaces
- Code customization requirements and testing
- Optimization of existing middle layer components
- Design and architecture of a real-time data processing and monitoring system via Spark/Kafka streaming
- Design and implementation of a real-time data processing system using Cassandra, Kafka, Spark and Hadoop
- Data ingestion of Oracle database tables into a Hadoop data lake with Spark and Kafka via Scala/Python programming.
- Worked on data security by creating EC2 instances running in an AWS Virtual Private Cloud (VPC), then connecting to a local network via an Internet Protocol Security (IPsec) VPN tunnel between the two networks with appropriate firewall rules.
- ETL of health-care insurance data by developing an ingestion model using PySpark
- Wrote several Pig and Hive scripts for data mining, and used Sqoop to transfer data to and from the Hadoop data lake and AWS Redshift.
- Developed code in the Spark framework using Scala and PySpark for UI and dashboard analysis of LTE OSS data. Beyond the Spark core APIs, used Spark Streaming, Spark SQL, Spark MLlib and Spark Catalog.
- Developed code in Python using core APIs, pandas, matplotlib, scikit-learn and Keras.
- Studied data collected on handsets for various events and developed a model to classify the data points as indoor or outdoor.
- Tested and developed several machine learning predictive models to resolve the long-standing issue of indoor vs. outdoor data classification, using techniques such as decision trees, artificial neural networks, naïve Bayes probability models and regression models for classification and prediction (a minimal sketch follows this list).
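As a rough illustration of the indoor-vs-outdoor classification work, the sketch below trains a scikit-learn decision tree on synthetic handset-style features; the feature names (signal strength, GPS SNR, visible cells) and the generated data are assumptions, not the actual dataset.

```python
# Sketch: indoor vs. outdoor classification with a decision tree on synthetic
# handset-style features. Feature definitions and distributions are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)

# Hypothetical features per sample: [rsrp_dbm, gps_snr_db, visible_cells]
n = 1000
indoor = np.column_stack(
    [rng.normal(-105, 8, n), rng.normal(12, 5, n), rng.poisson(3, n)])
outdoor = np.column_stack(
    [rng.normal(-85, 8, n), rng.normal(30, 5, n), rng.poisson(7, n)])

X = np.vstack([indoor, outdoor])
y = np.array([0] * n + [1] * n)  # 0 = indoor, 1 = outdoor

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = DecisionTreeClassifier(max_depth=5, random_state=0)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test),
                            target_names=["indoor", "outdoor"]))
```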
Confidential
Ph.D. Research Scientist
Responsibilities:
- Wrote binary classification tools in core C++/Java/Python/R for data mining and analytics of health-care data.
- Extensively used Python for development of the core SAM engine, using requests, SQLAlchemy, NumPy, SciPy, PyGTK, pywin, NLTK, nose and many other Python APIs.
- Developed ARCH and AR(I)MA time-series models using statistical methods and machine learning techniques (a minimal sketch follows this list).
- Submitted batch data-processing jobs on a Hadoop framework spanning many hundreds of servers across dozens of computing clusters in the US and EU.
- Submitted data analysis jobs to clusters of CPUs using the MapReduce and Spark frameworks.
- Wrote Pig/Hive scripts to interact with the SAM database to filter, map and query the data.
- Developed code in Scala/PySpark for data analysis and KPI calculations.
- Analyzed big scientific-data records (~20 PB), which resulted in several high-quality publications in reputed journals.
- Extensive use of generalized linear regression models for SMT and LM radiation-damage prediction studies; the prediction results were successfully used in upgrade plans.
- Used SDLC best practices and agile development for code optimization for object storage.
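The AR(I)MA time-series modeling mentioned in this list can be sketched with statsmodels as follows; the synthetic series, the (1, 1, 1) order and the forecast horizon are illustrative assumptions rather than the research data.

```python
# Sketch: fitting an AR(I)MA model on a synthetic series with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)

# Synthetic trending series with noise, standing in for a monitored metric.
t = np.arange(200)
series = pd.Series(0.5 * t + rng.normal(0, 5, size=t.size))

# Fit ARIMA(p=1, d=1, q=1) and forecast the next 10 steps.
model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()

print(fitted.summary())
print(fitted.forecast(steps=10))
```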