Sr. Data Engineer Resume

Houston, TX

SUMMARY

  • 12+ years of development experience with Java.
  • 9+ years of experience with big data and cloud-native technologies.
  • Experience designing and implementing real-time data pipelines that process more than 1.5 billion messages per day using Apache Kafka.
  • Extensively worked with the Kafka Producer/Consumer, Streams, and Connect APIs.
  • Experience designing and implementing efficient batch-processing pipelines.
  • Used Apache Spark extensively to process historical data spanning 200+ billion records and terabytes in size.
  • Extensively used Spark Streaming to persist data from Kafka to blob stores (see the streaming sketch after this summary).
  • Experienced with the latest features of the Databricks Spark platform, including optimal Delta table persistence, SQL extensions, Jobs/Tasks, and the Databricks CLI.
  • Used ADLS Gen2 and Blob Storage to persist Databricks Delta tables of very large datasets and ran Spark SQL queries for aggregation.
  • Experience designing efficient and cost-effective object data stores in Azure Blob Storage and Azure Data Lake Storage Gen2.
  • Extensively used Avro and Parquet data formats for real-time data and archival data, respectively.
  • Hands-on experience with ELT workflows in Snowflake on Azure.
  • Good understanding of time-travel, caching, and data-sharing concepts in Snowflake.
  • Extensively worked with geospatial and time-series datasets.
  • Experience with optimal Spark job configurations.
  • Developed DAGs in Airflow to schedule and orchestrate multiple Spark jobs.
  • Configured failure and status alerts for long-running jobs in Airflow.
  • Developed ETL pipelines in AWS Glue that process new data as it becomes available in S3 buckets.
  • Configured triggers in AWS Glue to initiate ETL jobs and persist results to object stores and RDBMS.
  • Extensively performed analytics on large geospatial and time-series datasets with Spark using open-source geospatial libraries like Uber H3 and GeoMesa (see the H3 sketch after this summary).
  • Experienced in creating complex SSIS packages using proper control and data flow elements with error handling.
  • Experience recreating SSIS ETL workflows in Azure Data Factory.
  • Expert-level knowledge of Azure Data Factory.
  • Developed data processing workflows using Alteryx, Talend and Azure Data Factory.
  • Experience publishing data layers to GeoServer and rendering the layers on a map.
  • Expertise in RDBMS solutions like Postgres/PostGIS, SQL Server, and TimescaleDB.
  • Excellent SQL knowledge and expertise in writing Triggers, Stored Procedures, Views, Materialized views, and tuning query performance with Indexes and query plan analysis.
  • Experience with NoSQL solutions like HBase, Cassandra, and MongoDB.
  • Experience with Kubernetes ecosystem and containerization.
  • Experienced with implementing event-driven architecture.
  • Experience leading a team responsible for the design and development of real-time and batch-processing data pipelines.
  • Developed microservices using Quarkus (Kubernetes Native Java stack).
  • Extensively used Kubernetes Deployments, StatefulSets, CronJobs, ConfigMaps, Secrets management, and Persistent Volumes for stateful and stateless microservices.
  • Implemented advanced Kubernetes concepts like Horizontal Pod Autoscaling, health/readiness checks, taints and tolerations, and cluster autoscaling.
  • Experienced leveraging Docker for container-native microservices.
  • Experienced with managing the lifecycle of Docker images using Container Registries.
  • Experience with Kubernetes ecosystem tools like Helm, Istio, Rancher, Lens, and kubectl.
  • Experience with metrics monitoring and alerting with Prometheus and Grafana.
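
The streaming sketch referenced above: a minimal PySpark Structured Streaming job that decodes Avro messages from a Kafka topic and appends them to a Delta table on ADLS Gen2. The topic, record schema, and storage paths are illustrative, and it assumes the spark-avro and Delta Lake packages are available (as on Databricks); Confluent wire-format messages would additionally need their 5-byte schema-registry header stripped.

```python
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Hypothetical record schema for the example
avro_schema = """
{"type": "record", "name": "Position", "fields": [
  {"name": "assetId", "type": "string"},
  {"name": "lat",     "type": "double"},
  {"name": "lon",     "type": "double"},
  {"name": "ts",      "type": "long"}
]}
"""

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "positions")                  # placeholder topic
       .load())

# Kafka delivers the payload as binary; decode it with the Avro schema
decoded = raw.select(from_avro(raw.value, avro_schema).alias("msg")).select("msg.*")

(decoded.writeStream
 .format("delta")
 .option("checkpointLocation", "abfss://chk@account.dfs.core.windows.net/positions")
 .outputMode("append")
 .start("abfss://data@account.dfs.core.windows.net/delta/positions"))
```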
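
The H3 sketch referenced above: hexagonal binning of point data with Uber's H3 library (h3-py v3 API names; the coordinates and resolution are made up for the example). The same indexing step scales out as a Spark UDF over large datasets.

```python
import collections

import h3  # h3-py, v3 API

points = [(29.7604, -95.3698), (29.7610, -95.3701), (30.2672, -97.7431)]
resolution = 7  # roughly 1.2 km hexagons

# Index each point to its H3 cell, then count points per cell
counts = collections.Counter(
    h3.geo_to_h3(lat, lon, resolution) for lat, lon in points
)
for cell, n in counts.most_common():
    print(cell, n)
```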

TECHNICAL SKILLS

Cloud Native technologies: Kubernetes, Docker, Helm, OpenFaaS, Istio, Prometheus, Grafana, Rancher

Stream Processing Technologies: Kafka, KSQL, Spark Streaming

Cloud technologies: Azure Data Factory, Azure Databricks Service, Azure Kubernetes Service, Azure Blob Storage, Azure Data Lake Storage Gen2, Azure HDInsight for HBase, Azure Data Lake Analytics/U-SQL, Azure SQL Data Warehouse, Azure Container Registry, Azure Synapse, Snowflake, Azure Key Vault, AWS S3, AWS EMR for HBase, AWS EC2, Cloudera, Hortonworks

ETL technologies: Azure Data Factory, Alteryx, Talend, Pentaho Kettle, SSIS

Big Data Technologies: Databricks Spark, HBase, Kafka, Cassandra, Impala, Hadoop, Hive, Sqoop, Airflow, MongoDB, PySpark, AWS Glue

Programming Languages: Java, Scala, Python

DBMS Packages: Postgres, PostGIS, SQL Server, MySQL, Netezza, TimescaleDB

GIS technologies: GeoServer, QGIS, PostGIS, SQL Server Spatial, ArcGIS

Technologies/Tools: Jupyter, Hibernate, RabbitMQ, JTS, WebSockets, JIRA, Power BI, JUnit, Maven, SourceTree, Bitbucket, GitHub, IntelliJ, VS Code, Eclipse

DevOps: Rancher pipelines, Codefresh pipelines, Azure DevOps

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Houston, TX

Responsibilities:

  • Developed decoders, REST services, streaming services, fleet management services, feed monitoring services, filters, persistence services, and GeoJSON WebSocket feed distribution services using Java, Hibernate, Maven, JTS, Restlet, Postgres, PostGIS, and Cassandra.
  • Led the team researching and developing NoSQL GIS solutions with efficient spatio-temporal indexes and GeoMesa, and integrating them with Spark.
  • Led a team of 5 engineers developing real-time and batch-processing ETL pipelines.
  • Developed multistage real-time data processing to ingest data from external sources, then process, filter, and enrich it using the Kafka Streams API.
  • Implemented Kafka connectors to Postgres and MongoDB.
  • Responsible for the design and implementation of a monolith-to-microservices, event-driven architecture.
  • Developed multiple streaming services using the Quarkus framework.
  • Worked on building Docker images for applications and base images.
  • Worked on persisting and tagging Docker images in Azure Container Registry.
  • Worked on Kubernetes Deployment and StatefulSet YAML manifests for stateful and stateless services.
  • Implemented ConfigMaps, Secrets, HPA, Cluster Autoscaler, pod resource configs, and sidecars in the Kubernetes cluster.
  • Configured Persistent Volumes and Persistent Volume Claims for stateful Kubernetes deployments.
  • Responsible for modeling the master database in the RDBMS.
  • Converted existing SSIS ETL workflows to Azure Data Factory pipelines.
  • Designed, developed and maintained ETL pipelines using SSIS.
  • Troubleshot existing SSIS packages to resolve issues.
  • Performance-tuned existing scripts and processes.
  • Involved in creating SSIS jobs to automate report generation and cube-refresh packages.
  • Developed ETL workflows to import data from legacy systems using Azure Data Factory and SSIS.
  • Worked on design and development of complex pipelines in Azure Data Factory.
  • Created Datasets, Linked services, Triggers in Azure Data Factory pipelines.
  • Ingested large historical datasets into Snowflake for analytics.
  • Used Snowflake’s secure data sharing to share tables and views with other teams.
  • Managed virtual warehouses and applied Snowflake's different caching options (see the Snowflake sketch after this list).
  • Applied control flows to the ADF pipelines and set up monitoring with email notifications.
  • Developed Alteryx workflows using In/Out, Preparation, Spatial, Join tools for analyzing the data.
  • Developed Talend workflows to automate the process of loading external datasets into master DB.
  • Designed and implemented several Spark Jobs for data preprocessing, enrichment, filtering and running complex GIS analysis on the historical data.
  • Responsible for design and implementation of temporal and spatial partitioning of data archive in Object datastores.
  • Responsible for standing up a Databricks Spark environment in Azure and setting up notebooks, clusters, scheduled Jobs/Tasks, and SQL extensions.
  • Implemented a Spark Streaming application to append Avro messages from a Kafka topic to a Delta table in Azure Data Lake Storage Gen2.
  • Implemented batch-processing data pipelines in AWS using PySpark, Airflow, Glue, and S3 (see the Airflow sketch after this list).
  • Implemented scheduled jobs that run periodically to append data from Kafka to the archive and run optimization tasks to merge small files into larger files on the object store.
  • Developed custom Avro SerDes.
  • Published GIS data from Postgres, HBase, and Kafka through GeoServer.
  • Responsible for creating Rancher CI/CD pipelines and a private image repository.
  • Extensively used Kubernetes tools like kubectl, kubens, kubectx, and Lens for deployments and troubleshooting.
  • Developed Grafana dashboards for monitoring and alerting on key metrics.
  • Design/development/implementation of procedures, triggers and functions in Postgres/TimescaleDB.
  • Created Hypertables in TimescaleDB and configured optimal chunk sizes and vacuum policies.
  • Collaborated with diverse programming teams to prototype and solve complex business problems.
  • Wrote build scripts using Maven software project management tool.
  • Assisted teammates in setting up Hibernate framework, configured mapping.xml files, wrote POJO classes and PL/SQL stored procedures.
  • Used JIRA to support Agile/ Scrum development methodology.
  • Involved in code reviews and ensured code quality across the project.
  • Suggested architectural improvements to break down Monolithic application to microservices.
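
The Snowflake sketch referenced above: a hedged example of one ELT step driven from Python with the official snowflake-connector-python. The account, credentials, stage, and table names are placeholders.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount.azure",  # placeholder Azure-hosted account
    user="etl_user",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# Load staged files into a raw table, then transform inside the warehouse
# (the "T" of ELT happens in Snowflake, not before loading).
cur.execute("COPY INTO raw_events FROM @landing_stage "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
cur.execute("""
    CREATE OR REPLACE TABLE CURATED.DAILY_COUNTS AS
    SELECT event_date, COUNT(*) AS n FROM raw_events GROUP BY event_date
""")

cur.close()
conn.close()
```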
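
The Airflow sketch referenced above: a minimal DAG with two dependent spark-submit style tasks and failure e-mail alerts, along the lines of the orchestration described in the bullets. The operator choice, schedule, script names, and alert address are illustrative.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "email": ["oncall@example.com"],  # hypothetical alert address
    "email_on_failure": True,
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_enrichment",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    preprocess = BashOperator(task_id="preprocess",
                              bash_command="spark-submit preprocess.py")
    enrich = BashOperator(task_id="enrich",
                          bash_command="spark-submit enrich.py")
    preprocess >> enrich  # enrich runs only after preprocess succeeds
```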

Confidential

Responsibilities:

  • Developed multi-threaded application to perform decoding, enrichment and filtering operations on data collected from multiple data sources.
  • Developed services that use RabbitMQ for messaging.
  • Responsible for design and maintenance of storage on Cassandra for historical data.
  • Responsible for data modeling on Postgres RDBMS.
  • Responsible for creating views, triggers and stored procedures on Postgres/PostGIS.
  • Responsible for creating Hibernate models and mapping files in the Java application.
  • Developed modules to process live feeds from TCP sockets.
  • Developed feed monitoring services to alert when there is an outage.
  • Responsible for maintenance of partitioned tables in a SQL Server data warehouse solution.
  • Worked on complex stored procedures to migrate and transform the historical data from legacy systems to a new schema.
  • Design and development of an efficient NoSQL solution on Cassandra for time-series data.
  • Extensively used Java Topology Suite (JTS) for application development.
  • Used spatial reduction algorithms to provide reduced navigation history (see the track-reduction sketch after this list).
  • Implemented a Java WebSocket server to relay real-time feeds to multiple clients.
  • Design/development/implementation of procedures, triggers and functions in SQL Server and Postgres.
  • Responsible for table partitioning and automating file/filegroup creation on SQL Server.
  • Wrote build scripts using Maven software project management tool.
  • Used JIRA to support Agile/ Scrum development methodology.
  • Involved in code reviews and ensured code quality across the project.
  • Translated business requirements into system designs.
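
The track-reduction sketch referenced above. The bullet does not name the exact algorithm used; one common choice is Douglas-Peucker simplification, shown here with Shapely on a made-up track.

```python
from shapely.geometry import LineString

# Made-up GPS track (lon, lat pairs)
track = LineString([
    (-95.3698, 29.7604), (-95.3695, 29.7605), (-95.3690, 29.7605),
    (-95.3684, 29.7610), (-95.3680, 29.7620), (-95.3675, 29.7640),
])

# Tolerance is in coordinate units (degrees here); larger => fewer points kept
reduced = track.simplify(0.001, preserve_topology=False)
print(len(track.coords), "->", len(reduced.coords))
```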

Sr. Software Developer

Confidential, Houston, TX

Responsibilities:

  • Developed a Java-based HBase data visualization tool.
  • Design, development and maintenance of REST web services.
  • Developed unit tests using tools like Mockito to verify functional correctness.
  • Used the HBase shell to create/delete/scan tables and applied the filtering techniques it supports.
  • Involved in installation and configuration of the Hadoop cluster.
  • Used Apache Sqoop to import data from a traditional data store called INSITE.
  • Involved in adapting and redesigning a traditional RDBMS schema into an HBase schema.
  • Designed HBase tables with appropriate row keys to take advantage of the partial key scanning ability of the HBase architecture, and designed ‘salting’ strategies to avoid hotspotting (see the row-key sketch after this list).
  • Evaluated different row-key encoding schemes to reduce duplication and improve performance.
  • Used Snappy and Gzip compression techniques to optimize disk space.
  • Used Pentaho Kettle to import data into HDFS.
  • Profiled Impala queries for performance and resource utilization.
  • Carried out benchmarking experiments to choose between denormalized and star schemas.
  • Compared the performance and disk space of Parquet, RCFile, SequenceFile, and text-based file formats.
  • Partitioned tables based on timestamp and pump ID.
  • Tuned Impala for performance by experimentally evaluating performance of various schemas, file formats and partitioning techniques.
  • Developed a Highcharts plugin for Pentaho User console.
  • Worked closely with the Pentaho development team to customize downsampling algorithms for Confidential’s time-series data, which required extreme downsampling.
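
The row-key sketch referenced above: the 'salting' idea in a few lines of Python. Prefixing a deterministic bucket spreads monotonically increasing keys (timestamps) across regions to avoid hotspotting, while zero-padding preserves lexicographic scan order within a bucket. The bucket count and key layout are illustrative.

```python
import hashlib

NUM_BUCKETS = 16  # illustrative; typically sized to the region count

def salted_row_key(device_id: str, ts_millis: int) -> bytes:
    """Build an HBase row key of the form '<salt>|<device>|<timestamp>'."""
    bucket = int(hashlib.md5(device_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}|{device_id}|{ts_millis:013d}".encode()

# The salt is deterministic per device, so a scan for one device is still a
# cheap prefix scan within its bucket (partial key scanning).
print(salted_row_key("pump-42", 1620000000000))
```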

Confidential

Responsibilities:

  • Used Java technologies to build web applications for client-server environments.
  • Design, development and maintenance of REST web services.
  • Extensively used the Builder, Adapter, Factory, Singleton, and Facade design patterns.
  • Developed advanced wellbore search functionality.
  • Responsible for Wellbore drilling data modeling.
  • Provided backend support for Teradata, Greenplum, and Netezza.
  • Developed custom DDL, DML, and dialect services to generate SQL for Teradata, Greenplum, or Netezza (see the dialect sketch after this list).
  • Developed an Aggregation service for quick access to aggregate information related to wellbore logs.
  • Managed and analyzed log files to find information and facilitate problem resolution.
  • Contributed to design planning meetings.
  • Provided production support, including analyzing and fixing defects.
  • Standardized project and department by integrating with Maven, SVN, Jira, and Confluence.
  • Involved in training and clustering modules, the results of which are reported graphically using Highcharts.
  • Used Kettle (Pentaho Data Integration Community Edition) to integrate and transform data from various data sources into OLAP cubes.
  • Used JUnit 4, Mockito, and Derby for unit testing.
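
The dialect sketch referenced above: a Factory-pattern illustration of a dialect service that emits backend-specific SQL for Teradata, Greenplum, or Netezza. The single clause shown is a simplified example, not full dialect coverage.

```python
from abc import ABC, abstractmethod

class Dialect(ABC):
    @abstractmethod
    def top_n(self, table: str, n: int) -> str: ...

class TeradataDialect(Dialect):
    def top_n(self, table, n):
        return f"SELECT TOP {n} * FROM {table}"  # Teradata uses TOP

class GreenplumDialect(Dialect):
    def top_n(self, table, n):
        return f"SELECT * FROM {table} LIMIT {n}"  # Postgres-style LIMIT

class NetezzaDialect(Dialect):
    def top_n(self, table, n):
        return f"SELECT * FROM {table} LIMIT {n}"

def dialect_for(backend: str) -> Dialect:
    """Factory: pick the dialect implementation by backend name."""
    return {"teradata": TeradataDialect,
            "greenplum": GreenplumDialect,
            "netezza": NetezzaDialect}[backend.lower()]()

print(dialect_for("teradata").top_n("wellbore_logs", 10))
```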

Software Developer

Confidential

Responsibilities:

  • Built J2EE web services consumed by customer-facing applications, developed using Java, Restlet, Hibernate, Maven, and SQL Server.
  • Responsible for schema design and data modeling.
  • Responsible for design, development & maintenance of REST services.
  • Wrote interfaces and test clients to facilitate testing of scheduled jobs.
  • Worked on setting up the Hibernate framework and configured mapping.xml files.
  • Design, development, implementation, and tuning of stored procedures in SQL Server as per requirements.
  • Contributed to design planning meetings.
  • Responsible for JVM tuning.
  • Built extensive test coverage for all new services.
  • Worked within an agile team.
  • Responsible for implementation of Ant scripts.
  • Managed and analyzed log files to find information and facilitate problem resolution.
  • Developed unit tests using tools like Mockito to verify functional correctness (see the mocking sketch after this list).
  • Managed source code using Tortoise SVN.
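
The mocking sketch referenced above. The original tests used Mockito in Java; this shows the same idea in Python with unittest.mock: stub the collaborator so a scheduled job's logic can be verified in isolation. The job and client here are hypothetical.

```python
import unittest
from unittest.mock import Mock

def run_scheduled_job(client):
    """Toy job under test: archive every pending row, return the count."""
    rows = client.fetch_pending()
    for row in rows:
        client.archive(row)
    return len(rows)

class ScheduledJobTest(unittest.TestCase):
    def test_archives_every_pending_row(self):
        client = Mock()
        client.fetch_pending.return_value = ["a", "b", "c"]
        self.assertEqual(run_scheduled_job(client), 3)
        self.assertEqual(client.archive.call_count, 3)

if __name__ == "__main__":
    unittest.main()
```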
