
Senior Data Engineer Resume


Fargo, ND

SUMMARY

  • Over 7 years of working experience as a Sr. Data Engineer, providing support in the areas of project coordination, data analysis, and metrics reporting.
  • Experience working with varied forms of data infrastructure, including SQL-based relational databases such as MySQL and distributed platforms such as Hadoop and Spark.
  • Good experience with data warehousing methodologies and concepts, including star schemas, snowflake schemas, ETL processes, dimensional modeling, and reporting tools.
  • Experience in designing interactive dashboards and reports, performing ad-hoc analysis, and building visualizations using Tableau, Power BI, Arcadia, and Matplotlib.
  • Excellent experience with Big Data and Hadoop ecosystems in Data Science, BI, and DW projects.
  • Experience in implementing Lakehouse solutions on Azure Cloud using Azure Data Lake and Databricks Delta Lake.
  • Experience in developing Spark and Hive jobs to summarize and transform data.
  • Experienced in ETL processes and data warehousing methodologies and concepts, including star schemas, snowflake schemas, dimensional modeling, reporting tools, Operational Data Store concepts, Data Marts, and OLAP technologies.
  • Experienced in harvesting metadata from Snowflake, AWS S3, and MySQL using Talend Data Catalog, Talend ETL, and Collibra.
  • Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
  • Expertise in developing streaming applications in Scala using Kafka and Spark Structured Streaming.
  • Hands-on experience with GCP components such as BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Data Fusion, Composer, Cloud Storage, Cloud SQL, Cloud SDK, and gcloud commands.
  • Expert in writing T-SQL and working on SSIS, SSRS, SSAS, and data migration projects.
  • Expertise in data extraction, transformation, and loading (ETL) between different systems using SQL tools (SSIS, DTS, Bulk Insert, and BCP).
  • Good experience in developing desktop and web applications using Java, Spring, JDBC, Eclipse, and React.
  • Experience working with Databricks, Azure ML service, Azure Synapse Analytics, Azure Analysis Services, and machine learning frameworks such as PyTorch and TensorFlow.
  • Experienced in designing and developing data models for OLTP databases, the Operational Data Store (ODS), the data warehouse (OLAP), and federated databases to support the client's enterprise Information Management Strategy.
  • Experience in developing under Scrum methodology and in a CI/CD environment using Jenkins.
  • Hands-on experience developing blockchain solutions based on Rust and Substrate.
  • Experience working with the Azure cloud platform (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, SQL DWH).
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal PySpark sketch follows this list).
  • Experience in using AWS Glue crawlers to create tables on raw data in AWS S3.
  • Expert-level experience in designing, building, and managing applications to process large amounts of data in a Hadoop/DevOps (GCP) ecosystem.
  • Good experience with data modeling techniques, normalization, and data warehouse concepts using star schema and snowflake schema modeling.
  • Experience in implementing CRUD operations using NoSQL REST APIs.
  • Strong communication and analytical skills, including conceptual thinking, requirements interpretation, solution creation, and problem-solving abilities.
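
To illustrate the Databricks/Spark SQL bullet above, here is a minimal PySpark sketch that reads the same data from multiple file formats, registers it as a temporary view, and aggregates usage with Spark SQL. The paths, schema, and column names are hypothetical placeholders, not taken from the actual projects.

```python
# Minimal PySpark sketch: extract from multiple file formats and aggregate with Spark SQL.
# Paths, columns, and view names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage-insights").getOrCreate()

# Extract: read the same logical data from different formats.
events_parquet = spark.read.parquet("/mnt/raw/events_parquet/")
events_json = spark.read.json("/mnt/raw/events_json/")

# Transform: align schemas and union the sources.
events = events_parquet.unionByName(events_json, allowMissingColumns=True)
events.createOrReplaceTempView("events")

# Aggregate with Spark SQL to summarize customer usage patterns.
usage_summary = spark.sql("""
    SELECT customer_id,
           date_trunc('day', event_time) AS event_day,
           count(*)                      AS event_count
    FROM events
    GROUP BY customer_id, date_trunc('day', event_time)
""")

usage_summary.write.mode("overwrite").parquet("/mnt/curated/usage_summary/")
```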

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Confidential - Fargo, ND

RESPONSIBILITIES:

  • Involved in various sectors of business, with in-depth knowledge of the SDLC (System Development Life Cycle) and all phases of Agile/Scrum and Waterfall.
  • Used version control tools like GitHub to share code among team members.
  • Worked with complex SQL, NoSQL, stored procedures, triggers, and packages in large databases across various servers.
  • Proven track record in the areas of data science, data mining, advanced analytics, consulting, ETL, and BI.
  • Worked on different Java technologies like Hibernate, Spring, JSP, and Servlets, and developed both server-side and client-side code for the web application.
  • Developed ETL jobs using Spark and Scala to migrate data from Oracle to new Cassandra tables.
  • Worked on implementing CRUD operations using NoSQL REST APIs.
  • Proficiency in data warehousing, including dimensional modeling concepts, and in scripting languages like Python, Scala, and JavaScript.
  • Created interactive data charts in the web application using the Highcharts JavaScript library, with data coming from Apache Cassandra, along with the NumPy and Pandas libraries.
  • Used Dgraph and GraphQL to arrange massive multilingual dictionaries in the graph database to predict the semantic domain of input text, increasing the efficiency of machine translation models.
  • Reduced access time by refactoring data models, optimizing queries, and implementing a Redis cache to support Snowflake.
  • Used the GCP environment for the following: Cloud Functions for event-based triggering, Cloud Monitoring and Alerting, Pub/Sub for real-time messaging, and Cloud Data Catalog build-out.
  • Served as a Spark developer in big data application development using the frameworks Hadoop, Spark, Hive, Sqoop, Flume, and Airflow.
  • Used the Cloud Shell SDK in GCP to configure the services Dataproc, Cloud Storage, and BigQuery.
  • Created Windows services that retrieve call record details and text message data, then sent the data to RabbitMQ for consumption by Quantitative Strategy.
  • Utilized Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
  • Worked with the DevOps team to set up serverless pipelines for Lambda code deployment.
  • Developed an IAM application using Spring, Java EE, Oracle, Okta, Redis, and Postman with a microservices architecture and REST services.
  • Implemented various data flows from sources, through transformations, to targets, including enterprise data and IoT data, using Java, Spring, and Spring Cloud Stream with RabbitMQ and Kafka messaging.
  • Upgraded and converted the GraphQL implementation to Graphene to provide session authentication and user permissions.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
  • Analyzed and planned data migration from NoSQL databases including MongoDB, Cassandra, and Neo4j.
  • Implemented RabbitMQ and Kafka systems for high-throughput, scalable data flows.
  • Built a React Native application to be used by delivery drivers on the road.
  • Wrote all functions and stored procedures powering Everett’s custom RESTful partner API.
  • Developed a Lakehouse architecture with the help of Databricks Delta (Delta Lake).
  • Improved a Python-based tool that queried a Vertica cluster to generate millions of hashes that were pipelined via twemproxy into Redis.
  • Skilled in ingesting data into HDFS from various relational databases like MySQL, Oracle, DB2, Teradata, and Postgres using Sqoop.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS (see the streaming sketch after this list).
  • Created Spark Streaming jobs using Python to read messages from Kafka and download JSON files from AWS S3 buckets.
  • Designed ETL processes in Databricks to handle data volume and complexity, including incremental loading and data aggregation/filtering for reduction.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python, and to NoSQL databases such as HBase and Cassandra.
  • Moved data in and out of the Hadoop File System using Talend Big Data components. Involved in creating Hive internal tables.
  • Migrated data extract and transform jobs from Matillion to Airflow, in Python, to design and load into GCP BigQuery targets.
  • Created a React client web app backed by serverless AWS Lambda functions to interact with an AWS SageMaker endpoint.
  • Integrated with RESTful APIs to create ServiceNow incidents when there is a process failure within the batch job.
  • Involved in software development, data warehousing, analytics, and data engineering projects using Hadoop, MapReduce, Pig, Hive, and other open-source tools/technologies.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (a simplified validation sketch follows this list).
  • Used AWS Lambda to run code without provisioning servers; issued queries from Python using the Python MySQL connector and MySQL database package (see the connector sketch after this list).
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.
  • Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Used Dataprep for data analysis of on-prem data in Cloud Storage and ingested it into BigQuery for analysis by the data science team.
  • Strong understanding of the principles of data warehousing, fact tables, dimension tables, and star and snowflake schema modeling.
  • Created several Databricks Spark job clusters with PySpark to perform table-to-table operations.
  • Implemented data engineering and ETL solutions leveraging CI/CD software including Pentaho Kettle, S3, EC2, Jenkins, Maven, GitHub, Artifactory, etc.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Involved in all phases of the data warehouse project life cycle. Designed and developed ETL architecture to load data from various sources like DB2 UDB, Oracle, flat files, XML files, and SFDC.
  • Developed new components and updates to the frontend application using React.
  • Prepared queries for Databricks using Python, Spark, SQL, and NoSQL.
  • Developed and configured build and release (CI/CD) processes using Azure DevOps, along with managing application code using Azure Git with the required security standards for .NET and Java applications.
  • Designed and built DevOps pipelines to provide CI/CD using TeamCity, Git, and Maven to reduce build and deployment times, increase automation, and efficiently analyze code and security vulnerabilities prior to production.
  • Used MySQL as the backend database and the MySQLdb Python package as the database connector to interact with MS SQL Server.
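
The streaming sketch referenced in the bullets above: a minimal PySpark Structured Streaming job that consumes a Kafka feed and lands it in HDFS as Parquet. The broker address, topic name, schema, and paths are hypothetical placeholders used only for illustration.

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and land it
# in HDFS as Parquet. Requires the spark-sql-kafka connector on the classpath.
# Broker, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

event_schema = StructType([
    StructField("call_id", StringType()),
    StructField("caller", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "call-records")
       .load())

# Kafka delivers bytes; parse the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/call_records/")
         .option("checkpointLocation", "hdfs:///checkpoints/call_records/")
         .outputMode("append")
         .start())

query.awaitTermination()
```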
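The validation sketch referenced above, simplified: the actual work used Apache Beam on Cloud Dataflow, but the core row-count comparison between a raw source file and a BigQuery table can be shown with the google-cloud-bigquery client alone. The CSV path, project, and table names are hypothetical.

```python
# Simplified row-count validation between a raw source file and a BigQuery table.
# The real pipeline ran on Apache Beam / Cloud Dataflow; this sketch only shows the
# comparison idea. The CSV path and table name are hypothetical placeholders.
import csv

from google.cloud import bigquery


def count_csv_rows(path: str) -> int:
    """Count data rows in the raw source file (excluding the header)."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1


def count_bigquery_rows(client: bigquery.Client, table: str) -> int:
    """Count rows currently loaded in the BigQuery target table."""
    query = f"SELECT COUNT(*) AS n FROM `{table}`"
    return next(iter(client.query(query).result())).n


if __name__ == "__main__":
    client = bigquery.Client(project="my-gcp-project")      # hypothetical project
    source_count = count_csv_rows("/data/raw/orders.csv")   # hypothetical file
    target_count = count_bigquery_rows(client, "my-gcp-project.analytics.orders")

    if source_count != target_count:
        raise SystemExit(f"Validation failed: {source_count} source rows "
                         f"vs {target_count} loaded rows")
    print(f"Validation passed: {source_count} rows match")
```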
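The connector sketch referenced above: a minimal example of querying MySQL from Python with the mysql-connector-python package. The host, credentials, database, table, and query are hypothetical placeholders.

```python
# Minimal sketch: query MySQL from Python with mysql-connector-python.
# Host, credentials, database, and query are hypothetical placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="db.example.com",
    user="etl_user",
    password="********",
    database="telemetry",
)

try:
    cursor = conn.cursor(dictionary=True)
    # Parameterized query to avoid SQL injection.
    cursor.execute(
        "SELECT call_id, caller, duration_sec FROM call_records WHERE call_date = %s",
        ("2021-06-01",),
    )
    for row in cursor.fetchall():
        print(row["call_id"], row["duration_sec"])
finally:
    conn.close()
```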

Environment: Hadoop, Alteryx, Hive, HDFS, PySpark, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

Data Engineer

Confidential - Chicago, IL

RESPONSIBILITIES:

  • Involved in the design, build, and management of large-scale data structures and pipelines and efficient Extract/Transform/Load (ETL) workflows/feeds.
  • Implemented the big data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
  • Worked with Cloud Pub/Sub to replicate data in real time from the source system to GCP BigQuery.
  • Developed the automated DevOps CI/CD solution using GitHub, Jenkins, Python, and Informatica PowerCenter.
  • Skilled in data warehousing/ETL programming and fulfillment of data warehouse project tasks such as data extraction, cleansing, aggregation, validation, transformation, and loading.
  • Knowledgeable in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Worked on microservices and a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
  • Containerized Spring Boot data flow microservices to be deployed on Kubernetes, connecting to Kafka/RabbitMQ.
  • Developed a full-text search platform using NoSQL, Logstash, and the Elasticsearch engine, allowing for much faster, more scalable, and more intuitive user searches.
  • Developed Satori data processing, consuming from the Kafka data bus and publishing to Elasticsearch/Cassandra.
  • Designed pipelines with Apache Beam, Kubeflow, and Dataflow, and orchestrated jobs in GCP.
  • Architected and implemented ETL/ELT for data movement using Azure Data Factory (ADF) and SSIS.
  • Mentored other members of the data engineering team on how to most efficiently utilize available services in GCP.
  • Used Google Cloud Platform (GCP) services to process and manage data from streaming sources.
  • Used REST APIs with Python to ingest data from external sites into BigQuery (a minimal ingestion sketch follows this list).
  • Wrote a Spark SQL job in a Java Spring Boot service using the DataFrame API to convert SQL that users type into the React frontend into a backend Spark SQL job.
  • Developed an API to query Elasticsearch and Cassandra databases for time series data and aggregation.
  • Created logging for ETL loads at the package level using event handlers and created a process to log the number of records processed by each package using SSIS.
  • Worked with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very large datasets via HBase.
  • Created data quality pipelines for moving the data lake and data warehouse from the Hadoop/Teradata environment to Google Cloud (GCP)/BigQuery.
  • Worked on Apache Spark with Python to increase the performance of MySQL queries.
  • Created Python/SQL scripts and Databricks notebooks to transform data from Redshift tables into S3 buckets for Snowflake.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
  • Created Spark notebooks and used Application Insights for accessing logs and real-time analytics.
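
The ingestion sketch referenced above: a minimal Python example that pulls JSON records from a REST endpoint with requests and streams them into BigQuery with insert_rows_json. The endpoint URL, project, and table name are hypothetical placeholders.

```python
# Minimal sketch: pull records from a REST API and stream them into BigQuery.
# The endpoint URL, project, and table name are hypothetical placeholders.
import requests
from google.cloud import bigquery

API_URL = "https://api.example.com/v1/records"       # hypothetical endpoint
TABLE_ID = "my-gcp-project.staging.api_records"      # hypothetical table


def fetch_records(url: str) -> list[dict]:
    """Call the REST API and return its JSON payload as a list of rows."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def load_to_bigquery(rows: list[dict]) -> None:
    """Stream rows into BigQuery; raise if any row is rejected."""
    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected rows: {errors}")


if __name__ == "__main__":
    load_to_bigquery(fetch_records(API_URL))
```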

Environment: Python, GCP, Airflow, SAP ECC, Spark ML, SQL, Agile.

Software Developer

Confidential - Dallas, TX

RESPONSIBILITIES:

  • Involved in using the ETL tool Informatica to populate the database and transform data from the old database to the new database using Oracle and SQL.
  • Created ETL applications using Python to load data from source to destination cloud database storage.
  • Used Docker for building and testing containers locally and deploying to AWS ECS.
  • Worked with streaming data through a webhook (using Cloud Functions) for various data sources (a minimal webhook sketch follows this list).
  • Used SSIS to create ETL packages to validate, extract, transform, and load data into the data warehouse and data marts.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Used Git for version control and Bitbucket for repository code management.
  • Created automatic Jenkins builds when pushing code in Git to a newly created branch.
  • Extensively created, documented, and maintained logical and physical database models.
  • Delivered data solutions in report/presentation format according to customer specifications and timelines.
  • Wrote complex T-SQL and PL/SQL queries, stored procedures, views, triggers, functions, table variables, and cursors as per business requirements.
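
The webhook sketch referenced above: a minimal HTTP-triggered Cloud Function in Python that accepts webhook POSTs. Forwarding the payload into a Pub/Sub topic is only an assumed downstream sink, and the project and topic names are hypothetical placeholders.

```python
# Minimal sketch: an HTTP-triggered Cloud Function that receives webhook POSTs.
# Publishing to Pub/Sub is an assumed downstream sink, not taken from the resume;
# the project and topic names are hypothetical placeholders.
import json

import functions_framework
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
TOPIC_PATH = publisher.topic_path("my-gcp-project", "webhook-events")  # hypothetical


@functions_framework.http
def handle_webhook(request):
    """Accept a JSON webhook payload and publish it for downstream processing."""
    payload = request.get_json(silent=True)
    if payload is None:
        return ("Expected a JSON body", 400)

    # Publish the raw event; Pub/Sub expects bytes.
    publisher.publish(TOPIC_PATH, json.dumps(payload).encode("utf-8")).result()
    return ("OK", 200)
```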

Environment: Agile, Redshift, Hive, HBase, MySQL, Sqoop, Excel, Oozie, SSRS, ETL, Business Objects.
