
Senior Data Engineer Resume


Fargo, ND

SUMMARY

  • Over 7 years of working experience as a Sr. Data Engineer, providing support in the areas of project coordination, data analysis, and metrics reporting.
  • Experience working with varied forms of data infrastructure, including SQL-based relational databases such as MySQL and distributed platforms such as Hadoop and Spark.
  • Good experience with data warehousing methodologies and concepts, including star schemas, snowflake schemas, ETL processes, dimensional modeling, and reporting tools.
  • Experience in designing interactive dashboards and reports, performing ad-hoc analysis, and building visualizations using Tableau, Power BI, Arcadia, and Matplotlib.
  • Excellent experience with Big Data and Hadoop ecosystems in Data Science, BI, and DW projects.
  • Experience in implementing Lakehouse solutions on Azure Cloud using Azure Data Lake and Databricks Delta Lake.
  • Experience in developing Spark and Hive jobs to summarize and transform data.
  • Experienced in ETL processes and data warehousing methodologies and concepts, including star schemas, snowflake schemas, dimensional modeling, reporting tools, Operational Data Store concepts, Data Marts, and OLAP technologies.
  • Experienced in harvesting metadata from Snowflake, AWS S3, and MySQL using Talend Data Catalog, Talend ETL, and Collibra.
  • Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
  • Expertise in developing streaming applications in Scala using Kafka and Spark Structured Streaming.
  • Hands-on experience with GCP components such as BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Data Fusion, Composer, Cloud Storage, Cloud SQL, Cloud SDK, and gcloud commands.
  • Expert in writing T-SQL and working on SSIS, SSRS, SSAS, and data migration projects.
  • Expertise in data extraction, transformation, and loading (ETL) between different systems using SQL tools (SSIS, DTS, Bulk Insert, and BCP).
  • Good experience in developing desktop and web applications using Java, Spring, JDBC, Eclipse, and React.
  • Experience working with Databricks, Azure ML service, Azure Synapse Analytics, Azure Analysis Services, and machine learning frameworks such as PyTorch and TensorFlow.
  • Experienced in designing and developing data models for OLTP databases, the Operational Data Store (ODS), the data warehouse (OLAP), and federated databases to support the client's enterprise Information Management Strategy.
  • Experience in developing under Scrum methodology and in a CI/CD environment using Jenkins.
  • Hands-on experience developing blockchain solutions based on Rust and Substrate.
  • Experience working with the Azure cloud platform (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, SQL DWH).
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal PySpark sketch follows this list).
  • Experience in using AWS Glue crawlers to create tables on raw data in AWS S3.
  • Expert-level experience in designing, building, and managing applications to process large amounts of data in a Hadoop/DevOps (GCP) ecosystem.
  • Good experience with data modeling techniques, normalization, and data warehouse concepts using star schema and snowflake schema modeling.
  • Experience in implementing CRUD operations using NoSQL REST APIs.
  • Strong communication and analytical skills, including conceptual thinking, requirements interpretation, solution creation, and problem-solving abilities.
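
To illustrate the Databricks/Spark SQL bullet above, here is a minimal PySpark sketch that reads the same data from multiple file formats, registers it as a temporary view, and aggregates usage with Spark SQL. The paths, schema, and column names are hypothetical placeholders, not taken from the actual projects.

```python
# Minimal PySpark sketch: extract from multiple file formats and aggregate with Spark SQL.
# Paths, columns, and view names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage-insights").getOrCreate()

# Extract: read the same logical data from different formats.
events_parquet = spark.read.parquet("/mnt/raw/events_parquet/")
events_json = spark.read.json("/mnt/raw/events_json/")

# Transform: align schemas and union the sources.
events = events_parquet.unionByName(events_json, allowMissingColumns=True)
events.createOrReplaceTempView("events")

# Aggregate with Spark SQL to summarize customer usage patterns.
usage_summary = spark.sql("""
    SELECT customer_id,
           date_trunc('day', event_time) AS event_day,
           count(*)                      AS event_count
    FROM events
    GROUP BY customer_id, date_trunc('day', event_time)
""")

usage_summary.write.mode("overwrite").parquet("/mnt/curated/usage_summary/")
```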

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Confidential - Fargo, ND

RESPONSIBILITIES:

  • Involved in various sectors of business, with in-depth knowledge of the SDLC (System Development Life Cycle) and all phases of Agile/Scrum and Waterfall.
  • Used version control tools like GitHub to share code among team members.
  • Worked with complex SQL, NoSQL, stored procedures, triggers, and packages in large databases across various servers.
  • Proven track record in the areas of data science, data mining, advanced analytics, consulting, ETL, and BI.
  • Worked on different Java technologies like Hibernate, Spring, JSP, and Servlets, and developed both server-side and client-side code for the web application.
  • Developed ETL jobs using Spark and Scala to migrate data from Oracle to new Cassandra tables.
  • Worked on implementing CRUD operations using NoSQL REST APIs.
  • Proficiency in data warehousing, including dimensional modeling concepts, and in scripting languages like Python, Scala, and JavaScript.
  • Created interactive data charts in the web application using the Highcharts JavaScript library, with data coming from Apache Cassandra, along with the NumPy and Pandas libraries.
  • Used Dgraph and GraphQL to arrange massive multilingual dictionaries in the graph database to predict the semantic domain of input text, increasing the efficiency of machine translation models.
  • Reduced access time by refactoring data models, optimizing queries, and implementing a Redis cache to support Snowflake.
  • Used the GCP environment for the following: Cloud Functions for event-based triggering, Cloud Monitoring and Alerting, Pub/Sub for real-time messaging, and Cloud Data Catalog build-out.
  • Served as a Spark developer in big data application development using the frameworks Hadoop, Spark, Hive, Sqoop, Flume, and Airflow.
  • Used the Cloud Shell SDK in GCP to configure the services Dataproc, Cloud Storage, and BigQuery.
  • Created Windows services that retrieve call record details and text message data, then sent the data to RabbitMQ for consumption by Quantitative Strategy.
  • Utilized Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
  • Worked with the DevOps team to set up serverless pipelines for Lambda code deployment.
  • Developed an IAM application using Spring, Java EE, Oracle, Okta, Redis, and Postman with a microservices architecture and REST services.
  • Implemented various data flows from sources, through transformations, to targets, including enterprise data and IoT data, using Java, Spring, and Spring Cloud Stream with RabbitMQ and Kafka messaging.
  • Upgraded and converted the GraphQL implementation to Graphene to provide session authentication and user permissions.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
  • Analyzed and planned data migration from NoSQL databases including MongoDB, Cassandra, and Neo4j.
  • Implemented RabbitMQ and Kafka systems for high-throughput, scalable data flows.
  • Built a React Native application to be used by delivery drivers on the road.
  • Wrote all functions and stored procedures powering Everett’s custom RESTful partner API.
  • Developed a Lakehouse architecture with the help of Databricks Delta (Delta Lake).
  • Improved a Python-based tool that queried a Vertica cluster to generate millions of hashes that were pipelined via twemproxy into Redis.
  • Skilled in ingesting data into HDFS from various relational databases like MySQL, Oracle, DB2, Teradata, and Postgres using Sqoop.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS (see the streaming sketch after this list).
  • Created Spark Streaming jobs using Python to read messages from Kafka and download JSON files from AWS S3 buckets.
  • Designed ETL processes in Databricks to handle data volume and complexity, including incremental loading and data aggregation/filtering for reduction.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python, and to NoSQL databases such as HBase and Cassandra.
  • Moved data in and out of the Hadoop File System using Talend Big Data components. Involved in creating Hive internal tables.
  • Migrated data extract and transform jobs from Matillion to Airflow, in Python, to design and load into GCP BigQuery targets.
  • Created a React client web app backed by serverless AWS Lambda functions to interact with an AWS SageMaker endpoint.
  • Integrated with RESTful APIs to create ServiceNow incidents when there is a process failure within the batch job.
  • Involved in software development, data warehousing, analytics, and data engineering projects using Hadoop, MapReduce, Pig, Hive, and other open-source tools/technologies.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (a simplified validation sketch follows this list).
  • Used AWS Lambda to run code without provisioning servers; issued queries from Python using the Python MySQL connector and MySQL database package (see the connector sketch after this list).
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.
  • Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Used Dataprep for data analysis of on-prem data in Cloud Storage and ingested it into BigQuery for analysis by the data science team.
  • Strong understanding of the principles of data warehousing, fact tables, dimension tables, and star and snowflake schema modeling.
  • Created several Databricks Spark job clusters with PySpark to perform table-to-table operations.
  • Implemented data engineering and ETL solutions leveraging CI/CD software including Pentaho Kettle, S3, EC2, Jenkins, Maven, GitHub, Artifactory, etc.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Involved in all phases of the data warehouse project life cycle. Designed and developed ETL architecture to load data from various sources like DB2 UDB, Oracle, flat files, XML files, and SFDC.
  • Developed new components and updates to the frontend application using React.
  • Prepared queries for Databricks using Python, Spark, SQL, and NoSQL.
  • Developed and configured build and release (CI/CD) processes using Azure DevOps, along with managing application code using Azure Git with the required security standards for .NET and Java applications.
  • Designed and built DevOps pipelines to provide CI/CD using TeamCity, Git, and Maven to reduce build and deployment times, increase automation, and efficiently analyze code and security vulnerabilities prior to production.
  • Used MySQL as the backend database and the MySQLdb Python package as the database connector to interact with MS SQL Server.
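
The streaming sketch referenced in the bullets above: a minimal PySpark Structured Streaming job that consumes a Kafka feed and lands it in HDFS as Parquet. The broker address, topic name, schema, and paths are hypothetical placeholders used only for illustration.

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and land it
# in HDFS as Parquet. Requires the spark-sql-kafka connector on the classpath.
# Broker, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

event_schema = StructType([
    StructField("call_id", StringType()),
    StructField("caller", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "call-records")
       .load())

# Kafka delivers bytes; parse the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/call_records/")
         .option("checkpointLocation", "hdfs:///checkpoints/call_records/")
         .outputMode("append")
         .start())

query.awaitTermination()
```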
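The validation sketch referenced above, simplified: the actual work used Apache Beam on Cloud Dataflow, but the core row-count comparison between a raw source file and a BigQuery table can be shown with the google-cloud-bigquery client alone. The CSV path, project, and table names are hypothetical.

```python
# Simplified row-count validation between a raw source file and a BigQuery table.
# The real pipeline ran on Apache Beam / Cloud Dataflow; this sketch only shows the
# comparison idea. The CSV path and table name are hypothetical placeholders.
import csv

from google.cloud import bigquery


def count_csv_rows(path: str) -> int:
    """Count data rows in the raw source file (excluding the header)."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1


def count_bigquery_rows(client: bigquery.Client, table: str) -> int:
    """Count rows currently loaded in the BigQuery target table."""
    query = f"SELECT COUNT(*) AS n FROM `{table}`"
    return next(iter(client.query(query).result())).n


if __name__ == "__main__":
    client = bigquery.Client(project="my-gcp-project")      # hypothetical project
    source_count = count_csv_rows("/data/raw/orders.csv")   # hypothetical file
    target_count = count_bigquery_rows(client, "my-gcp-project.analytics.orders")

    if source_count != target_count:
        raise SystemExit(f"Validation failed: {source_count} source rows "
                         f"vs {target_count} loaded rows")
    print(f"Validation passed: {source_count} rows match")
```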
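The connector sketch referenced above: a minimal example of querying MySQL from Python with the mysql-connector-python package. The host, credentials, database, table, and query are hypothetical placeholders.

```python
# Minimal sketch: query MySQL from Python with mysql-connector-python.
# Host, credentials, database, and query are hypothetical placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="db.example.com",
    user="etl_user",
    password="********",
    database="telemetry",
)

try:
    cursor = conn.cursor(dictionary=True)
    # Parameterized query to avoid SQL injection.
    cursor.execute(
        "SELECT call_id, caller, duration_sec FROM call_records WHERE call_date = %s",
        ("2021-06-01",),
    )
    for row in cursor.fetchall():
        print(row["call_id"], row["duration_sec"])
finally:
    conn.close()
```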

Environment: Hadoop, Alteryx, Hive, HDFS, PySpark, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

Data Engineer

Confidential - Chicago, IL

RESPONSIBILITIES:

  • Involved in the design, build, and management of large-scale data structures and pipelines and efficient Extract/Transform/Load (ETL) workflows/feeds.
  • Implemented the big data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
  • Worked with Cloud Pub/Sub to replicate data in real time from the source system to GCP BigQuery.
  • Developed the automated DevOps CI/CD solution using GitHub, Jenkins, Python, and Informatica PowerCenter.
  • Skilled in data warehousing/ETL programming and fulfillment of data warehouse project tasks such as data extraction, cleansing, aggregation, validation, transformation, and loading.
  • Knowledgeable in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Worked on microservices and a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
  • Containerized Spring Boot data flow microservices to be deployed on Kubernetes, connecting to Kafka/RabbitMQ.
  • Developed a full-text search platform using NoSQL, Logstash, and the Elasticsearch engine, allowing for much faster, more scalable, and more intuitive user searches.
  • Developed Satori data processing, consuming from the Kafka data bus and publishing to Elasticsearch/Cassandra.
  • Designed pipelines with Apache Beam, Kubeflow, and Dataflow, and orchestrated jobs in GCP.
  • Architected and implemented ETL/ELT for data movement using Azure Data Factory (ADF) and SSIS.
  • Mentored other members of the data engineering team on how to most efficiently utilize available services in GCP.
  • Used Google Cloud Platform (GCP) services to process and manage data from streaming sources.
  • Used REST APIs with Python to ingest data from external sites into BigQuery (a minimal ingestion sketch follows this list).
  • Wrote a Spark SQL job in a Java Spring Boot service using the DataFrame API to convert SQL that users type into the React frontend into a backend Spark SQL job.
  • Developed an API to query Elasticsearch and Cassandra databases for time series data and aggregation.
  • Created logging for ETL loads at the package level using event handlers and created a process to log the number of records processed by each package using SSIS.
  • Worked with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very large datasets via HBase.
  • Created data quality pipelines for moving the data lake and data warehouse from the Hadoop/Teradata environment to Google Cloud (GCP)/BigQuery.
  • Worked on Apache Spark with Python to increase the performance of MySQL queries.
  • Created Python/SQL scripts and Databricks notebooks to transform data from Redshift tables into S3 buckets for Snowflake.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
  • Created Spark notebooks and used Application Insights for accessing logs and real-time analytics.
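
The ingestion sketch referenced above: a minimal Python example that pulls JSON records from a REST endpoint with requests and streams them into BigQuery with insert_rows_json. The endpoint URL, project, and table name are hypothetical placeholders.

```python
# Minimal sketch: pull records from a REST API and stream them into BigQuery.
# The endpoint URL, project, and table name are hypothetical placeholders.
import requests
from google.cloud import bigquery

API_URL = "https://api.example.com/v1/records"       # hypothetical endpoint
TABLE_ID = "my-gcp-project.staging.api_records"      # hypothetical table


def fetch_records(url: str) -> list[dict]:
    """Call the REST API and return its JSON payload as a list of rows."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def load_to_bigquery(rows: list[dict]) -> None:
    """Stream rows into BigQuery; raise if any row is rejected."""
    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected rows: {errors}")


if __name__ == "__main__":
    load_to_bigquery(fetch_records(API_URL))
```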

Environment: Python, GCP, Airflow, SAP ECC, Spark ML, SQL, Agile.

Software Developer

Confidential - Dallas, TX

RESPONSIBILITIES:

  • Involved in using the ETL tool Informatica to populate the database and transform data from the old database to the new database using Oracle and SQL.
  • Created ETL applications using Python to load data from source to destination cloud database storage.
  • Used Docker for building and testing containers locally and deploying to AWS ECS.
  • Worked with streaming data through a webhook (using Cloud Functions) for various data sources (a minimal webhook sketch follows this list).
  • Used SSIS to create ETL packages to validate, extract, transform, and load data into the data warehouse and data marts.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Used Git for version control and Bitbucket for repository code management.
  • Created automatic Jenkins builds when pushing code in Git to a newly created branch.
  • Extensively created, documented, and maintained logical and physical database models.
  • Delivered data solutions in report/presentation format according to customer specifications and timelines.
  • Wrote complex T-SQL and PL/SQL queries, stored procedures, views, triggers, functions, table variables, and cursors as per business requirements.
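
The webhook sketch referenced above: a minimal HTTP-triggered Cloud Function in Python that accepts webhook POSTs. Forwarding the payload into a Pub/Sub topic is only an assumed downstream sink, and the project and topic names are hypothetical placeholders.

```python
# Minimal sketch: an HTTP-triggered Cloud Function that receives webhook POSTs.
# Publishing to Pub/Sub is an assumed downstream sink, not taken from the resume;
# the project and topic names are hypothetical placeholders.
import json

import functions_framework
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
TOPIC_PATH = publisher.topic_path("my-gcp-project", "webhook-events")  # hypothetical


@functions_framework.http
def handle_webhook(request):
    """Accept a JSON webhook payload and publish it for downstream processing."""
    payload = request.get_json(silent=True)
    if payload is None:
        return ("Expected a JSON body", 400)

    # Publish the raw event; Pub/Sub expects bytes.
    publisher.publish(TOPIC_PATH, json.dumps(payload).encode("utf-8")).result()
    return ("OK", 200)
```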

Environment: Agile, Redshift, Hive, HBase, MySQL, Sqoop, Excel, Oozie, SSRS, ETL, Business Objects.
