Senior Data Engineer Resume
Fargo, ND
SUMMARY
- Over 7 years of working experience as a Sr. Data Engineer, providing support in the areas of project coordination, data analysis, and metrics reporting.
- Experience working with varied forms of data infrastructure, including relational databases such as MySQL, distributed processing frameworks such as Hadoop and Spark, and column-oriented data stores.
- Good experience with Data Warehousing methodologies and concepts, including star schemas, snowflake schemas, ETL processes, dimensional modeling, and reporting tools.
- Experience in designing interactive dashboards, reports, performing ad-hoc analysis and visualizations using Tableau, Power BI, Arcadia & Matplotlib.
- Excellent experience with Big Data and Hadoop ecosystems in Data Science, BI, and DW projects.
- Experience in implementing Lakehouse solutions on Azure Cloud using Azure Data Lake and Databricks Delta Lake.
- Experience in developing Spark jobs and Hive jobs to summarize and transform data.
- Experienced in ETL processes, data warehousing methodologies and concepts including star schemas, snowflake schemas, dimensional modeling and reporting tools, Operational Data Store concepts, Data Marts, and OLAP technologies.
- Experienced in harvesting metadata from Snowflake, AWS S3, and MySQL using Talend Data Catalog, Talend ETL, and Collibra.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and in coordinating tasks among the team.
- Expertise in developing streaming applications in Scala using Kafka and Spark Structured Streaming.
- Hands-on experience with GCP components like BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Data Fusion, Composer, Cloud Storage, Cloud SQL, Cloud SDK, and gcloud commands.
- Expert in writing T-SQL and working on SSIS, SSRS, SSAS, and data migration projects.
- Expertise in Data Extraction, Transformation, and Loading (ETL) between different systems using SQL tools (SSIS, DTS, Bulk Insert, and BCP).
- Good experience in developing desktop and web applications using Java, Spring, JDBC, Eclipse, and React.
- Experience in working with Databricks, Azure ML service, Azure Synapse Analytics, Azure Analysis Services, and various machine learning frameworks like PyTorch and TensorFlow.
- Experienced in designing and developing data models for OLTP databases, the Operational Data Store (ODS), the data warehouse (OLAP), and federated databases to support the client's enterprise Information Management strategy.
- Experience in developing under Scrum methodology and in a CI/CD environment using Jenkins.
- Hands-on experience in developing blockchain solutions based on Rust and Substrate.
- Experience in working with the Azure cloud platform (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, SQL DWH).
- Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal PySpark sketch of this pattern follows this summary).
- Experience in using AWS Glue crawlers to create tables on raw data in AWS S3.
- Expert-level experience in designing, building, and managing applications to process large amounts of data in a Hadoop/DevOps (GCP) ecosystem.
- Good experience with data modeling techniques, normalization, and data warehouse concepts using star schema and snowflake schema modeling.
- Experience in implementing CRUD operations using NoSQL REST APIs.
- Strong communication and analytical skills, including conceptual, requirements interpretation, solution creation and problem-solving abilities.
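The Databricks/Spark SQL bullet above maps onto a pattern like the following. This is a minimal PySpark sketch and not the candidate's actual code: the paths, column names, and the daily-usage metric are hypothetical placeholders.

```python
# Minimal PySpark sketch: read multiple file formats, align them, and aggregate
# usage per customer per day. Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Ingest raw data stored in different formats (illustrative locations)
events_json = spark.read.json("/mnt/raw/events_json/")
events_parquet = spark.read.parquet("/mnt/raw/events_parquet/")

# Align the schemas and union the two sources
events = events_json.select("customer_id", "event_type", "event_ts") \
    .unionByName(events_parquet.select("customer_id", "event_type", "event_ts"))

# Aggregate event counts per customer per day
daily_usage = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Persist the curated result as a Delta table (Databricks-style sink)
daily_usage.write.mode("overwrite").format("delta").save("/mnt/curated/daily_usage")
```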
PROFESSIONAL EXPERIENCE
Senior Data Engineer
Confidential - Fargo, ND
RESPONSIBILITIES:
- Involved in various sectors of business, with in-depth knowledge of the SDLC (System Development Life Cycle) across all phases of Agile (Scrum) and Waterfall.
- Used version control tools like GitHub to share code snippets among team members.
- Worked with complex SQL, NoSQL, stored procedures, triggers, and packages in large databases across various servers.
- Proven track record in the areas of data science, data mining, advanced analytics, consulting, ETL, and BI.
- Worked on different Java technologies like Hibernate, Spring, JSP, and Servlets, and developed both server-side and client-side code for our web application.
- Developed ETL jobs using Spark/Scala to migrate data from Oracle to new Cassandra tables.
- Worked on implementing CRUD operations using NoSQL REST APIs.
- Proficiency in data warehousing, including dimensional modeling concepts, and in scripting languages like Python, Scala, and JavaScript.
- Created interactive data charts in the web application using the Highcharts JavaScript library, with data coming from Apache Cassandra and processed with the NumPy and Pandas libraries.
- Used Dgraph and GraphQL to arrange massive multilingual dictionaries in the graph database to predict the semantic domain of input text, increasing the efficiency of machine translation models.
- Reduced access time by refactoring data models, optimizing queries, and implementing a Redis cache to support Snowflake.
- Used the GCP environment for the following: Cloud Functions for event-based triggering, Cloud Monitoring and alerting, Pub/Sub for real-time messaging, and Cloud Data Catalog build-out.
- Spark developer in big data application development using the frameworks Hadoop, Spark, Hive, Sqoop, Flume, and Airflow.
- Used the Cloud Shell SDK in GCP to configure the services Dataproc, Cloud Storage, and BigQuery.
- Created Windows services that retrieve call record details and text message data, then send the data to RabbitMQ for consumption by Quantitative Strategy.
- Utilized Hive partitioning and bucketing, and performed various kinds of joins on Hive tables.
- Worked with the DevOps team to set up serverless pipelines for Lambda code deployment.
- Developed an IAM application using Spring, Java EE, Oracle, Okta, Redis, and Postman with a microservices architecture and REST services.
- Implemented various data flows from sources, through transformations, and to targets, covering enterprise data and IoT data, using Java, Spring, and Spring Cloud Stream over RabbitMQ and Kafka messaging.
- Upgraded and converted the GraphQL implementation to Graphene to provide session authentication and user permissions.
- Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra (a minimal streaming sketch of this pattern appears at the end of this role).
- Analyzed and planned data migration from NoSQL databases including MongoDB, Cassandra, and Neo4j.
- Implemented RabbitMQ and Kafka systems for high-throughput, scalable data flows.
- Built a React Native application to be used by delivery drivers on the road.
- Wrote all functions and stored procedures powering Everett’s custom RESTful partner API.
- Developed a Lakehouse architecture with the help of Databricks Delta (Delta Lake).
- Improved a Python-based tool that queried a Vertica cluster to generate millions of hashes, which were pipelined via twemproxy into Redis.
- Skilled in ingesting data into HDFS from various relational databases like MySQL, Oracle, DB2, Teradata, and Postgres using Sqoop.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS.
- Created Spark Streaming jobs using Python to read messages from Kafka and download JSON files from AWS S3 buckets.
- Designed ETL processes in Databricks to handle data volume and complexity, including incremental loading and data aggregation/filtering for reduction.
- Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python, as well as in NoSQL databases such as HBase and Cassandra.
- Moved data in and out of the Hadoop File System using Talend Big Data components; involved in creating Hive internal tables.
- Migrated data extract and transform jobs from Matillion to Airflow, in Python, to design and load into GCP BigQuery targets.
- Created a React client web app backed by serverless AWS Lambda functions to interact with an AWS SageMaker endpoint.
- Integrated with RESTful APIs to create ServiceNow incidents when there is a process failure within the batch job.
- Involved in software development, data warehousing, analytics, and data engineering projects using Hadoop, MapReduce, Pig, Hive, and other open-source tools/technologies.
- Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
- Used AWS Lambda to run code without provisioning servers, and ran queries from Python using the Python MySQL connector and MySQL database package.
- Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
- Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
- Used Dataprep for data analysis of on-prem data in Cloud Storage and ingested it into BigQuery for analysis by the data science team.
- Strong understanding of the principles of data warehousing, fact tables, dimension tables, and star and snowflake schema modeling.
- Created several Databricks Spark job clusters with PySpark to perform table-to-table operations.
- Implemented data engineering and ETL solutions leveraging CI/CD software including Pentaho Kettle, S3, EC2, Jenkins, Maven, GitHub, Artifactory, etc.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Involved in all phases of the data warehouse project life cycle. Designed and developed ETL architecture to load data from various sources like DB2 UDB, Oracle, flat files, XML files, and SFDC.
- Developed new components and updates to the frontend application using React.
- Prepared queries for Databricks using Python, Spark, SQL, and NoSQL.
- Developed and configured build and release (CI/CD) processes using Azure DevOps, along with managing application code using Azure Git with the required security standards for .NET and Java applications.
- Designed and built DevOps pipelines to provide CI/CD using TeamCity, Git, and Maven to reduce build and deployment times, increase automation, and efficiently analyze code and security vulnerabilities prior to production.
- Used MySQL as the backend database and Python's MySQLdb module as the database connector to interact with the SQL server.
Environment: Hadoop, Alteryx, Hive, HDFS, PySpark, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
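The Kafka-to-Cassandra streaming bullet in this role follows a pattern like the sketch below. This is a minimal illustration, not the project's code: the broker address, topic name, schema, and paths are hypothetical, and the sink is shown as Parquet on HDFS rather than Cassandra for brevity.

```python
# Minimal Structured Streaming sketch: Kafka -> parse JSON -> Parquet on HDFS.
# Broker address, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Schema of the incoming learner events (illustrative fields)
schema = StructType([
    StructField("learner_id", StringType()),
    StructField("activity", StringType()),
    StructField("event_ts", TimestampType()),
])

# Subscribe to the Kafka topic
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "learner-events")
    .load()
)

# Decode the Kafka value and flatten the parsed JSON into columns
parsed = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", schema).alias("data"))
    .select("data.*")
)

# Append the stream to HDFS with checkpointing for fault tolerance
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/learner_events")
    .option("checkpointLocation", "hdfs:///checkpoints/learner_events")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```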
Data Engineer
Confidential - Chicago, IL
RESPONSIBILITIES:
- Involved in the design, build, and management of large-scale data structures and pipelines and efficient Extract/Transform/Load (ETL) workflows/feeds.
- Implemented the big data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Worked with Cloud Pub/Sub to replicate data in real time from the source system to GCP BigQuery.
- Developed the automated DevOps CI/CD solution using GitHub, Jenkins, Python, and Informatica PowerCenter.
- Skilled in data warehousing/ETL programming and fulfillment of data warehouse project tasks such as data extraction, cleansing, aggregation, validation, transformation, and loading.
- Knowledgeable in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Worked on microservices and a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
- Containerized Spring Boot data flow microservices to be deployed on Kubernetes, connecting to Kafka/RabbitMQ.
- Developed a full-text search platform using NoSQL, Logstash, and the Elasticsearch engine, allowing for much faster, more scalable, and more intuitive user searches.
- Developed Satori data processing, consuming from the Kafka data bus and publishing to Elasticsearch/Cassandra.
- Designed pipelines with Apache Beam, Kubeflow, and Dataflow, and orchestrated jobs in GCP (a minimal Beam sketch of this pattern appears at the end of this role).
- Architected and implemented ETL/ELT for data movement using Azure Data Factory (ADF) and SSIS.
- Mentored other members of the data engineering team on how to most efficiently utilize available services in GCP.
- Used Google Cloud Platform (GCP) services to process and manage data from streaming sources.
- Used REST APIs with Python to ingest data from various sites into BigQuery.
- Wrote a Spark SQL job in a Java Spring Boot service using the DataFrame API to convert frontend SQL, typed by the user into the React app, into a backend Spark SQL job.
- Developed an API to query Elasticsearch and Cassandra databases for time series data and aggregations.
- Created logging for ETL loads at the package level using event handlers and created a process to log the number of records processed by each package using SSIS.
- Worked with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very large datasets via HBase.
- Created data quality pipelines for moving the data lake and data warehouse from the Hadoop/Teradata environment to Google Cloud (GCP)/BigQuery.
- Worked on Apache Spark with Python to increase the performance of MySQL queries.
- Created Python/SQL scripts in Databricks notebooks to transform data from Redshift tables into Snowflake via S3 buckets.
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Created Spark notebooks and used Application Insights for accessing logs and real-time analytics.
Environment: Python, GCP, Airflow, SAP ECC, Spark ML, SQL, Agile.
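The Beam/Dataflow pipeline bullet in this role corresponds to a pattern like the sketch below. This is a minimal, hypothetical illustration: the project, subscription, table, and schema names are placeholders and do not come from the resume.

```python
# Minimal Apache Beam sketch: read JSON messages from Pub/Sub and stream them
# into BigQuery. Project, subscription, table, and schema are hypothetical.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        # Pull raw bytes from the Pub/Sub subscription
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/orders-sub")
        # Decode and parse each message into a dict matching the BQ schema
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append the rows to the destination BigQuery table
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.orders",
            schema="order_id:STRING,amount:FLOAT,event_ts:TIMESTAMP",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```

Run on Dataflow by passing the usual runner options (e.g. `--runner=DataflowRunner --project=... --region=... --temp_location=gs://...`) to the pipeline.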
Software Developer
Confidential - Dallas, TX
RESPONSIBILITIES:
- Involved in using the ETL tool Informatica to populate the database and perform data transformation from the old database to the new database using Oracle and SQL.
- Created ETL applications using Python to load data from source to destination (cloud database storage); see the ETL sketch at the end of this role.
- Used Docker for building and testing the containers locally and deploying to AWS ECS.
- Worked with streaming of data through a webhook (using Cloud Functions) for various data sources.
- Used SSIS to create ETL packages to validate, extract, transform, and load data into the data warehouse and data marts.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
- Used Git for version control and Bitbucket for repository and code management.
- Created automatic Jenkins builds triggered when pushing code in Git to a newly created branch.
- Extensively created, documented, and maintained logical and physical database models.
- Delivered data solutions in report/presentation format according to customer specifications and timelines.
- Wrote complex T-SQL and PL/SQL queries, stored procedures, views, triggers, functions, table variables, and cursors as per business requirements.
Environment: Agile, Redshift, Hive, HBase, MySQL, Sqoop, Excel, Oozie, SSRS, ETL, Business Objects.
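The Python ETL bullet in this role fits a pattern like the sketch below. This is a minimal, generic illustration: the CSV path, connection string, table name, and cleaning rules are hypothetical placeholders, not details from the resume.

```python
# Minimal Python ETL sketch: extract from a CSV source, clean it, and load it
# into a relational destination. All names and paths are hypothetical.
import pandas as pd
from sqlalchemy import create_engine


def extract(path: str) -> pd.DataFrame:
    """Read the raw source file into a DataFrame."""
    return pd.read_csv(path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows and normalize column names."""
    df = df.dropna(subset=["customer_id"])
    df.columns = [c.strip().lower() for c in df.columns]
    return df


def load(df: pd.DataFrame, conn_str: str, table: str) -> None:
    """Append the cleaned rows to the destination table."""
    engine = create_engine(conn_str)
    df.to_sql(table, engine, if_exists="append", index=False)


if __name__ == "__main__":
    data = transform(extract("customers.csv"))
    load(data, "mysql+pymysql://user:password@host/dbname", "customers")
```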