
Sr. Data Engineer Resume


Bentonville, AR

SUMMARY:

  • Senior Data Engineer with 7+ years of experience implementing highly scalable, massively parallel processing data pipelines, data lakes, and analytics platforms using AWS, Presto, Apache Hudi, Apache Spark, and other open-source Hadoop frameworks.
  • Experience with Informatica ETL (Informatica PowerCenter/Developer) and data warehousing. Expertise in data warehousing and business intelligence technologies, along with 5 years of data visualization using Tableau.
  • Experience in deploying and managing multi-node Hadoop clusters with different Hadoop distributions such as Cloudera Manager, HDP, and AWS.
  • Experience in design and development of web forms using Spring MVC, JavaScript, JSON, and jqPlot.
  • Experience in analyzing data using Python, SQL, PySpark, and Spark SQL for data mining, data cleansing, and machine learning (see the PySpark sketch following this summary).
  • Hands-on experience in migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer.
  • Experience on project/product development teams using Agile Scrum/Kanban frameworks as an Agile practitioner/Scrum Master, including Large-Scale Scrum (LeSS) execution through Scrum ceremonies.
  • Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil and bq command-line utilities, Dataproc, and Stackdriver.
  • Experience in creating configuration files for SSIS project deployments and jobs to automate the ETL load, and in performance tuning of SSIS and T-SQL code.
  • Experienced in creating dashboards and datasets in Power BI and Grafana and generating scheduled reports using SSRS.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Python, Zeppelin, and Airflow.
  • Hands-on experience designing and building data models and data pipelines with a data warehouse and data lake focus.
  • Experienced with the full software development life cycle (SDLC), architecting scalable platforms, object-oriented programming (OOP), database design, and agile methodologies.
  • Experience using ETL and data visualization tools, testing software compatibility, and writing and processing SQL scripts.
  • Expertise in designing and deploying Hadoop clusters and various big data analytic tools, including Pig, Hive, and Apache Spark, with the Cloudera distribution.
  • Excellent knowledge of Agile methodologies, with Scrum story and sprint experience in a Python-based environment, along with data analytics and data wrangling.
  • Extensive experience developing single-page applications (SPAs) using JavaScript frameworks and tooling such as Sass, AngularJS, Backbone.js, Node.js, Vue.js, and Express.js.
  • Experience in Azure with Databricks: Databricks Workspace for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.
  • Good hands-on experience with NoSQL databases such as MongoDB, Cassandra, and HBase.
  • Experience working with Snowflake for running data pipelines that carry huge data volumes.
  • Working experience with version control tools such as SVN, Git, GitHub, and Bitbucket.
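
The following is a minimal, illustrative PySpark data-cleansing sketch of the kind referenced in the summary above; the S3 paths and column names are hypothetical placeholders rather than details from any specific engagement.

    # Minimal PySpark data-cleansing sketch; paths and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("customer-cleansing").getOrCreate()

    # Read raw CSV data, then drop duplicates and rows missing the primary key.
    raw = spark.read.option("header", True).csv("s3://example-bucket/raw/customers/")
    clean = (
        raw.dropDuplicates(["customer_id"])
           .filter(F.col("customer_id").isNotNull())
           .withColumn("email", F.lower(F.trim(F.col("email"))))
           .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
    )

    # Write the cleansed data out in a columnar format for downstream analytics.
    clean.write.mode("overwrite").parquet("s3://example-bucket/clean/customers/")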

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig 0.17, Hive 2.3, Sqoop 1.4, Apache NiFi, Apache Impala 3.0, Oozie 4.3, YARN, Apache Flume 1.8, Angular 2.0, Kafka 1.1, Zookeeper 3.4

Hadoop Distributions: Cloudera, Hortonworks, MapR

Cloud: AWS, GCP, Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.

Programming Language: Scala 2.12, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0

Web/Application Server: Apache Tomcat 9.0.7, JBoss, Web Logic, Web Sphere

SDLC Methodologies: Agile, Waterfall

Databases: Oracle 12c/11g, SQL Database

Tools: TOAD, SQL PLUS, SQL

Operating Systems: Linux, Unix, Windows 10/8/7

NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB

Version Control: GIT, SVN, CVS

PROFESSIONAL EXPERIENCE:

Sr. Data Engineer

Confidential, Bentonville, AR

Responsibilities:

  • As a Data Engineer on the Data Platform team, I am responsible for designing, architecting, and implementing secure batch and real-time applications using Snowflake, Python, Airflow, dbt, and Kubernetes (see the Airflow sketch following this list).
  • Involved in migrating the ETL application from the development environment to the testing environment.
  • Involved in migration of datasets and ETL workloads with Python from on-premises systems to AWS cloud services.
  • Developed simple to complex MapReduce jobs in the Java programming language, implemented with Hive and Pig.
  • Designed and developed the keyword-driven automation framework with Selenium WebDriver and Java.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Developed Spark/Scala and Python code for regular-expression projects in the Hadoop/Hive environment.
  • Created React/D3 data visualization dashboard apps for an AI monitoring and management application.
  • Designed SSIS packages, performed multidimensional analysis by creating cubes in SSAS, and reported on them in SSRS as part of a project.
  • Designed and developed the Page Object Model automation framework using Selenium, C#, and the .NET Framework.
  • Designed and developed REST-based microservices using Spring Boot so the application could seamlessly integrate with supporting subsystems.
  • Developed PySpark and Spark SQL code to process the data in Apache Spark on Amazon EMR and perform the necessary transformations based on the STMs (see the PySpark sketch at the end of this role section).
  • Developed REST microservices, exposed as APIs for home automation, that also keep data synchronized between two database services.
  • Developed data preprocessing pipelines using Python, R, and Linux scripts on an on-premises high-performance cluster and GCP cloud VMs.
  • Worked as a Big Data Engineer to import and export data between different databases.
  • Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
  • Implemented an ETL pipeline with Python, Docker Compose, Google Cloud Functions, and Cloud Storage for CSV ingestion and Google BigQuery data warehouse management.
  • Used the DataStage Director and its run-time engine to run the solution, test and debug its components, and monitor the resulting executable versions.
  • Programmer analyst with expertise in Tableau Server, ETL, Teradata, and other EDW data integrations and development.
  • Set up databases using RDS, configured instance backups to an S3 bucket, and configured the Docker runtime environment.
  • Implemented timestamp-based CDC and SCD Type 2 and Type 3 to capture deltas using Matillion.
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Migrated table data from DB2, Oracle, and T-SQL databases to BigQuery using GCP Data Fusion pipelines, scheduled in Cloud Composer.
  • Created and deployed several reports on the salesforce.com platform. Involved in system analysis, design, development, and implementation of web-based and client/server applications using HTML, CSS, JavaScript, Angular.js, Python, and Django.
  • Led the testing efforts in support of projects/programs across a large landscape of technologies (Unix, AngularJS, AWS, Sauce Labs, Cucumber-JVM, MongoDB).
  • Created development and test environments for different applications by provisioning Kubernetes clusters on AWS using Docker, Ansible, and Terraform.
  • Developed RESTful microservices using Flask and Django and deployed them on AWS using EBS and EC2.
  • Worked on creating Kafka producers using the Kafka Java Producer API to connect to an external REST live-stream application and produce messages to a Kafka topic. Created datasets and reports using Power BI and SSRS to give a detailed overview of equipment usage metrics and to create alerts.
  • Wrote Spark applications using Scala to interact with the T-SQL database through Spark SQLContext and accessed Hive tables using HiveContext.
  • Performed data comparison between SDP (Streaming Data Platform) real-time data, AWS S3 data, and Snowflake data using Databricks, Spark SQL, and Python.
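
The batch-pipeline bullet above references the following sketch: a minimal Airflow DAG that chains a Snowflake load with dbt transformations and tests. The DAG id, schedule, paths, and commands are hypothetical and shown only to illustrate the orchestration pattern.

    # Minimal Airflow DAG sketch for a daily Snowflake batch load driven by dbt.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_snowflake_batch",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Stage raw files into Snowflake (e.g., via a COPY INTO wrapper script).
        load_raw = BashOperator(
            task_id="load_raw",
            bash_command="python /opt/pipelines/load_raw_to_snowflake.py",
        )
        # Transform the staged data with dbt models.
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /opt/dbt/warehouse",
        )
        # Validate the transformed models with dbt tests.
        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command="dbt test --project-dir /opt/dbt/warehouse",
        )

        load_raw >> dbt_run >> dbt_test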

Environment: Teradata, Tableau, NoSQL, GCP, Airflow, Kubernetes, Spring, MongoDB, Python, Databricks, Snowflake, Redshift, Splunk, DB2, Matillion, Hadoop, Couch space, Microservices, Ansible, Machine Learning, Data Lakes, Linux, JSON, ETL, JavaScript, AWS, Spark, Quantum, HTML, CI/CD.
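
The PySpark-on-EMR bullet in the responsibilities above references this sketch: a hedged example of a Spark SQL transformation applying a simple source-to-target mapping. The database, table, column, and bucket names are hypothetical stand-ins for the actual STMs.

    # Sketch of a PySpark/Spark SQL transformation of the kind run on Amazon EMR.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("emr-stm-transform")
             .enableHiveSupport()
             .getOrCreate())

    # Apply a simple source-to-target mapping with Spark SQL: rename, cast, filter.
    orders = spark.sql("""
        SELECT order_id                       AS order_key,
               CAST(amount AS DECIMAL(12, 2)) AS order_amount,
               upper(status)                  AS order_status,
               to_date(created_at)            AS order_date
        FROM   staging.orders
        WHERE  status IS NOT NULL
    """)

    # Persist the conformed output to the curated zone in S3 as partitioned Parquet.
    orders.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-curated-bucket/orders/"
    )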

Data Engineer

Confidential, Whippany NJ

Responsibilities:

  • Involved in gathering requirements for the reports, dashboards, and OLAP cubes to be built on the DWH using SSRS/SSAS.
  • Involved in optimization at all levels, whether in SSIS packages, SQL queries, SSRS reports/dashboards, or SSAS cubes.
  • Involved in moving the SSIS ETL packages from the development server to QA and then to production smoothly.
  • Involved in data migration, data profiling, and data integration with Snowflake, Matillion, SQL Server Integration Services (SSIS/SSDT), Informatica, etc.
  • Developed batch and streaming pipelines using GCP services, Docker, Kubernetes, and Cloud Composer (Airflow).
  • Developed features on an enterprise React/Java (Spring Boot) web application.
  • Involved in creating and deploying REST APIs and microservices in Java/J2EE using Spring Boot.
  • Designed, developed, implemented, and executed marketing campaigns for US card customers using Unica Affinium Campaign, Snowflake, AWS S3, Spark, and Databricks.
  • Developed data orchestration and data transformation jobs and performed data quality checks using Matillion.
  • Strong understanding of the Hadoop ecosystem, including HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop Streaming, Sqoop, Oozie, and Hive.
  • Consumed REST-based microservices with RestTemplate based on RESTful APIs, and designed, developed, and tested HTML, CSS, jQuery, and React.js code that meets accessibility and web browser standards.
  • Designed and implemented Apache streaming apps to process user data from upstream systems as per the design.
  • Worked on GCP services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
  • Designed and implemented data pipelines that handle large volumes of streaming data.
  • Integrated Spring DAO for data access using Hibernate; used HQL and SQL for querying databases.
  • Used the DataStage Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data mart database.
  • Used Bash and Python (including Boto3) to supplement automation provided by Ansible and Terraform for tasks such as encrypting EBS volumes, backing up AMIs, and scheduling Lambda functions for routine AWS tasks (see the Boto3 sketch following this list).
  • Designed and developed SSIS code for improving data quality and metadata management.
  • Maintained applications and services written in Spring Boot, Spring Framework, Java, Angular, RxJS, and JavaScript deployed on the AWS cloud.
  • Used JavaScript and AngularJS in conjunction with HTML5 and CSS3 standards alongside the front-end UI team; used JSTL, custom tags, and HTML/DHTML in JSPs; used Jenkins for CI and deployment.
  • Analyzed events from Kafka in real time using Spark to create streaming data pipelines on Amazon EMR (see the streaming sketch at the end of this role section).
  • Automated repetitive tasks and cron jobs using the Apache Airflow scheduler and monitored workflows.
  • Developed bulk-load scripts using HQL/Scala/Python that extract, transform, and load high-volume, high-variety data to and from disparate persistence layers.
  • Deployed SSIS code with parameters to catalogs for the business to execute packages for ad hoc analysis.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
  • Used GCP components such as BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil and bq command-line utilities, Dataproc, Composer, Cloud Storage, Cloud SQL, the Cloud SDK, and gcloud commands.
  • Proficient in writing complex T-SQL queries and PL/SQL for stored procedures, functions, and triggers.
  • Used and configured multiple AWS services such as Redshift, EMR, EC2, and S3 to maintain compliance with organization standards.
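
The Boto3 bullet above references this sketch: a minimal example of the routine AWS automation described (backing up an instance as an AMI). The region, instance id, and tag names are hypothetical.

    # Minimal Boto3 sketch: create a date-stamped AMI backup of an EC2 instance.
    import datetime
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def backup_instance_as_ami(instance_id: str) -> str:
        """Create a date-stamped AMI of an EC2 instance without rebooting it."""
        stamp = datetime.date.today().isoformat()
        response = ec2.create_image(
            InstanceId=instance_id,
            Name=f"nightly-backup-{instance_id}-{stamp}",
            NoReboot=True,
        )
        image_id = response["ImageId"]
        # Tag the AMI so lifecycle cleanup jobs can find and expire old backups.
        ec2.create_tags(Resources=[image_id], Tags=[{"Key": "backup", "Value": stamp}])
        return image_id

    if __name__ == "__main__":
        print(backup_instance_as_ami("i-0123456789abcdef0"))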

Environment: Python, PySpark, ETL, Glue, Lambda, NoSQL, CI/CD, Docker, Kubernetes, MongoDB, AWS, Linux, Ansible, Machine Learning, Couch space, Microservices, Hadoop, Data Lakes, JSON, Dataproc, Kafka, JavaScript, DB2, Teradata, Snowflake, Agile, Tableau, Airflow.
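
The Kafka/Spark bullet in the responsibilities above references this sketch: a hedged PySpark Structured Streaming job that reads Kafka events and lands them in S3, illustrating the streaming-pipeline pattern. The broker addresses, topic, and bucket paths are hypothetical, and the job assumes the spark-sql-kafka connector is available on the cluster.

    # Sketch of a Spark Structured Streaming job reading Kafka events into S3.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-events-stream").getOrCreate()

    # Subscribe to a Kafka topic; each record's value arrives as bytes.
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
             .option("subscribe", "clickstream")
             .load()
             .select(F.col("value").cast("string").alias("payload"),
                     F.col("timestamp"))
    )

    # Write micro-batches to S3 as Parquet, with checkpointing for fault tolerance.
    query = (
        events.writeStream.format("parquet")
              .option("path", "s3://example-stream-bucket/clickstream/")
              .option("checkpointLocation", "s3://example-stream-bucket/checkpoints/clickstream/")
              .trigger(processingTime="1 minute")
              .start()
    )
    query.awaitTermination()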

Data Engineer

Confidential, Mason, OH

Responsibilities:

  • Involved in building and deploying cloud-based data pipelines and BI applications using AWS and GCP services.
  • Implemented data ingestion strategies and scalable pipeline, data warehouse, and data mart structures on the Snowflake data platform.
  • Developed T-SQL scripts and shell scripts to move data from source systems to staging and from staging to the data warehouse in batch processing mode, and worked across Google Cloud Platform and AWS to retrieve the data.
  • Developed batch and streaming pipelines using GCP services, Docker, Kubernetes, and Cloud Composer (Airflow).
  • Developed features on an enterprise React/Java (Spring Boot) web application.
  • Created data pipelines in GCP using different Airflow operators for ETL-related jobs.
  • Worked on ingesting data from SQL Server to S3 using Sqoop within AWS EMR.
  • Used the AWS Glue Catalog with a crawler to get the data from S3 and performed SQL query operations using AWS Athena.
  • Wrote PySpark jobs in AWS Glue to merge data from multiple tables and utilized crawlers to populate the AWS Glue Data Catalog with metadata table definitions (see the Glue sketch following this list).
  • Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries.
  • Managed model deployment by building a Flask app and packaging it in a Docker container.
  • Used AWS EMR to transform and move large amounts of data into and out of AWS S3.
  • Created monitors, alarms, notifications, and logs for Lambda functions and Glue jobs using CloudWatch.
  • Explored and performed POCs on Google Cloud Platform (including Cloud Machine Learning, Cloud Datastore, Bigtable, BigQuery, Datalab, and Data Studio).
  • Automated Unix shell scripts to verify the count of records added each day by the incremental data load for a few of the base tables to check for consistency.
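
The AWS Glue bullet above references this sketch: a minimal Glue PySpark job that joins two Data Catalog tables and writes the merged result to S3 for querying with Athena. The database, table, column, and bucket names are hypothetical.

    # Sketch of an AWS Glue PySpark job merging two catalog tables into S3 Parquet.
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read both tables from the Glue Data Catalog (populated by a crawler).
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="orders").toDF()
    customers = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="customers").toDF()

    # Merge on the shared key and keep a trimmed set of columns.
    merged = (orders.join(customers, "customer_id", "inner")
                    .select("order_id", "customer_id", "customer_name", "amount"))

    # Write the merged data back to S3 as Parquet for querying with Athena.
    glue_context.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(merged, glue_context, "merged"),
        connection_type="s3",
        connection_options={"path": "s3://example-analytics-bucket/merged_orders/"},
        format="parquet",
    )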

Environment: NoSQL, Hadoop, Python, Mainframes, Oracle, GCP, Tableau, SQL, AWS, Linux, Shell Scripting, CI/CD, Airflow.

Software Developer

Confidential, Burlington, MA

Responsibilities:

  • Imported data using Sqoop to load data from MySQL and Oracle into HDFS.
  • Involved in developing Class diagrams, Sequence Diagrams using UML.
  • Designed various interactive front-end web pages using HTML, CSS, jQuery &Bootstrap.
  • Have Implemented Spring Batch module for achieving batch Transactions.
  • Used Spring MVC and JSF MVC for implementing the web layer and web application, respectively.
  • Performed unit/module testing of software to find errors to make sure programs meet specifications.
  • Documented the work we did and the solutions we built.
  • Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
  • Worked in small scrum teams in an agile development environment.
  • Used Jira as the issue-tracking system and handled Jira setup.

Environment: MySQL, Python, Mainframes, Oracle, Tableau, SQL, AWS, Linux, Shell Scripting, CI/CD, Airflow.
