Big Data / AWS Solution Architect / Lead Data Engineer Resume Hartford, CT - Hire IT People

SUMMARY

14 years of professional experience in application design and development, including 6+ years in Big Data technologies, with strong knowledge of the Spark framework and a specialty in performance tuning.
Led a team of geographically distributed data engineers to deliver big data and cloud data product services across lines of business and implement standards in the data engineering Center of Practice
Reengineered existing heavy ETL Spark jobs and achieved close to 5X performance improvement by creating dynamic partition keys on datasets using Spark ML statistical libraries
Created efficient custom frameworks to convert on-prem Spark jobs to a cyclic micro-batch execution model that runs in parallel at the partition level, solving the shuffle spill problem and using cluster capacity optimally without impacting other applications that share the cluster
Extensively worked on Designing Resilient, Cost Optimized and Operationally Excellent solutions in AWS Stack to migrate On-Prem Spark Applications
Recently Developed and Implemented a dynamic data-ingestion framework in Spark / Shell Script to ingest the on-prem Hadoop / Hive datasets into AWS S3
Recently developed and implemented a dynamic framework for data quality detection in Spark, which provides regression-testing capabilities on the strategic data products to detect attribute-level data quality issues across the entire workflow; also developed multiple accelerators that combine statistical techniques and data engineering for various automation tasks.
Collaborated with Data Science community to deliver new underwriting model using cloud resources.
Developed an unsupervised anomaly detection model with LSTM Deep Neural Networks for earlier detection of data quality issues on the strategic data products ETL workflow
Implemented real-time anomaly detection on streaming data using SageMaker, together with the Data Science community, for enterprise-level Innovation Jam initiatives
Collaborated with Data Science Community to perform feature engineering for Image Detection and Image Classification Modelling and Inference

TECHNICAL SKILLS

EMR, EC2, Glue ETL, Lambda, Redshift, Athena, S3, RDS, Glue Metadata Services, Step Functions, SNS, SQS, IAM, Secrets Manager, KMS, SageMaker, Textract, Comprehend and CloudWatch Logs / Events
Databricks
Spark (Python and Scala), Hive/HiveQL, SQL, Sqoop, Flume, Shell Scripting
Python - Core Python, Pandas, NumPy, scikit-learn, Flask, Matplotlib, Seaborn, NLTK and various statistical and machine learning packages
Teradata, Hive, Oracle, SQL Server, RDS, Redshift, Snowflake

PROFESSIONAL EXPERIENCE

Confidential, Hartford, CT

Big Data / AWS Solution Architect / Lead Data Engineer

Responsibilities:

Key contributor to the development and execution of BI&A and Data Engineering strategy in AWS Stack
Responsible for architecting and planning the migration of on-prem Spark applications to the AWS and Databricks platforms
Responsible for the design, development and delivery of core analytic data products to support the BI R&D function, Actuarial, Product Management and business analytic consumers
Responsible for enabling batch and real time data ingestion patterns of customer data.
Responsible for design, development and implementation of resilient, cost effective, highly available robust applications in AWS Stack
Led the solution design and overall ETL design for processing RDF messaging data using Spark
Built reusable transformation rules and repeatable data conversion models for similar solutions, reducing development effort by 20%.
Developed a Python framework to parse complex XML and JSON messages for ingestion into the data lake
Implemented a data lake pipeline that orchestrates almost 20 data sources, applies ETL, and creates a single Hive table with close to 1,200 attributes and more than a billion rows in less than 2 hours.
Established best practices (standards, principles, guidelines, frameworks, and knowledge management) in the big data space
Worked on deep neural networks such as ANNs, along with natural language processing algorithms, for text mining processes
Implemented Random Cut Forest, Isolation Forest and Deep Autoencoder models for anomaly detection in both batch and streaming data to capture outliers and improve the overall quality of strategic data products
Developed Spark programs for data ingestion/transformation from DB2, Teradata and JSON files.
Extensively worked on converting SAS to Spark/Hive modules to create a single entity for data scientists’ exploration
Extensively worked on performance tuning of Spark/Hive components
Extensively worked on Kafka - Spark Streaming for Real time data pipelines
Worked on AWS tools such as EMR and Lambda for specific requirements where the source data was placed in S3 by the vendors
Use Kanban, Git, and GitHub for project management and version control as project lead on Workers Compensation lines’ data products.
Provide data models and POCs to ingest data from various sources and in various formats.
Write bash scripts for multithreading and automation.
Deploy and manage cloud infrastructure using Jenkins and Terraform.
Designed and deployed a dynamic framework to connect to, read, and parse MongoDB data and store it in Hive
Play a technical leadership and mentoring role for onshore and offshore teams

Confidential

Lead Data Engineer / Data Analyst

Responsibilities:

Extensively worked on solution architecture and design for Data lake analytics implementations using Spark and Hive
Designed and deployed a data lake with Big Data ecosystem tools including Spark, Kafka, Python, NiFi, Hive, Oozie, and Sqoop on the Hortonworks distribution.
Developed Spark code using Scala and Spark SQL on large data sets for both streaming and batch processing.
Expertise in using various Spark connectors to load and process data between Cassandra, Elasticsearch, and Kafka
Extensively worked on loading data into HIVE tables using Spark.
Extensively worked with Kafka and Spark Streaming on unbounded API data to perform various transformations and joins and load the results into Elasticsearch for low-latency reporting and analytics.
Leveraged NiFi to configure streaming and batch sources for pipelining into Kafka / HDFS sinks
Involved in converting Hive/HQL queries into Spark transformations using Spark RDD, Scala and Python.
Developed a Sqoop import utility to load data from various RDBMS sources for history loads
Developed data pipeline using Flume and Spark to store data into HDFS.
Good knowledge of Cloudera/Hortonworks distributions and of Amazon Simple Storage Service (Amazon S3), Amazon EC2, and Amazon EMR, with a very good understanding of Microsoft Azure and Google Cloud Dataflow big data and machine learning tools.
Extensively worked on loading data into Hive tables, raw HDFS storage, Cassandra and Elasticsearch using Spark jobs
Implemented web log analytics using Splunk, Elasticsearch / Kibana and Grafana.
Good experience in performing data analytics using Spark with both the Scala and Python APIs.
