
Sr. Data Engineer Resume


SUMMARY:

  • 8+ years of experience as a Data Engineer and Data Analyst covering data integration, big data, logical and physical data modeling, and implementation of business applications using the Oracle Relational Database Management System (RDBMS).
  • Strong experience working with Oracle 12c/11g/10g/9i/8i, SQL, SQL*Loader, and Open Interface to analyze, design, develop, test, and implement client/server database applications.
  • Knowledge of database conversion from Oracle and SQL Server to PostgreSQL and MySQL.
  • Worked on projects involving client/server technology and customer implementations covering GUI design, relational database management systems (RDBMS), and rapid application development methodology.
  • Practical knowledge of PL/SQL for creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.
  • Understanding and analysis of AWS monitoring and auditing tools such as CloudWatch and CloudTrail to gain in-depth knowledge of AWS.
  • In-depth familiarity with AWS DNS services via Route 53, including the different routing policies: simple, weighted, latency, failover, and geolocation.
  • Expertise with Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, and Flume, including installing, configuring, monitoring, and using them.
  • Proficient with Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Experience handling, organizing, and operating relational databases such as MySQL and NoSQL databases such as MongoDB and Cassandra.
  • Sound knowledge of AWS CloudFormation templates and of transmitting information through the SQS service via the Java API.
  • Experience with Snowflake on AWS, generating separate virtual data warehouses in different size classes (see the Snowflake sketch after this list).
  • Worked on Teiid and Spark data virtualization, RDF graph data, Solr search, and fuzzy algorithms.
  • Thoro
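
The Snowflake bullet above mentions generating separate virtual data warehouses in different size classes. The sketch below shows one way this might look with the snowflake-connector-python driver; the account, credentials, and warehouse names are placeholders rather than details from any actual engagement.

    # Illustrative sketch: create separate Snowflake virtual warehouses in
    # different size classes. Account, credentials, and warehouse names are
    # placeholders (assumptions), not values from a real environment.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345.us-east-1",   # placeholder account locator
        user="ETL_USER",               # placeholder user
        password="***",                # supply via a secrets manager in practice
        role="SYSADMIN",
    )

    # Separate warehouses sized for different workloads.
    warehouse_sizes = {
        "LOADING_WH": "XSMALL",    # lightweight ingestion
        "TRANSFORM_WH": "MEDIUM",  # heavier ELT transformations
        "REPORTING_WH": "LARGE",   # BI and ad-hoc analytics
    }

    cur = conn.cursor()
    try:
        for name, size in warehouse_sizes.items():
            cur.execute(
                f"CREATE WAREHOUSE IF NOT EXISTS {name} "
                f"WITH WAREHOUSE_SIZE = '{size}' "
                "AUTO_SUSPEND = 60 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE"
            )
    finally:
        cur.close()
        conn.close()

Keeping each virtual warehouse auto-suspended and sized to its workload is a common way to separate loading, transformation, and reporting without them competing for compute.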

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Data Engineer

Responsibilities:

  • Involved in identifying requirements, developing models according to the customer's specifications, and drafting detailed documentation.
  • Implemented new drill-to-detail dimensions in the data pipeline based on the business requirements.
  • Worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the Datamart (SQL Server) and the Credit Edge server.
  • Experience managing large-scale, geographically distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) systems.
  • Designed and implemented Kafka topics in a new Kafka cluster across all environments.
  • Created best practices and standards for data pipelining and integration with Snowflake data warehouses.
  • Worked on both external and managed Hive tables for optimized performance.
  • Involved in the requirements and design phases to implement a streaming Lambda architecture for real-time streaming using Spark and Kafka (see the streaming sketch after this list).
  • Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Loaded data using ETL tools such as SQL*Loader and external tables from the data warehouse and other databases such as SQL Server and DB2.
  • Implemented a Composite server for data virtualization needs and created multiple views for restricted data access through a REST API.
  • Configured Visual Studio to work with AWS, enabling a suitable environment for writing code.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using predictive analytics.
  • Developed Python scripts to automate the ETL process using Apache Airflow as well as cron scripts on the UNIX operating system (see the Airflow sketch after this list).
  • Created an automated, event-driven notification service utilizing SNS, SQS, Lambda, and CloudWatch.
  • Managed Amazon Web Services such as EC2, S3 buckets, ELB, Auto Scaling, DynamoDB, and Elasticsearch.
  • Implemented a data lake to consolidate data from multiple source databases such as Exadata and Teradata using the Hadoop stack technologies Sqoop and Hive/HQL.
  • Used AWS SQS to send the processed data on to the next working teams for further processing.
  • Hands-on with Redshift (ETL data pipelines from the AWS Aurora MySQL engine to Redshift).
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources.
  • Migrated a SQL Server database to MySQL using Data Transformation Services.
  • Worked with complex SQL queries involving joins, sub-queries, and nested queries.
  • Worked with SQL*Loader and control files to load data into different schema tables.
  • Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR and perform the necessary transformations based on the STMs developed.
  • Used an S3 bucket to store the JARs and input datasets, and used DynamoDB to store the processed output from the input data set.
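
The streaming bullets above describe collecting portal data through Kafka and processing it with Spark. Below is a minimal, hypothetical PySpark Structured Streaming sketch of that pattern; the broker addresses, topic name, and S3 paths are assumptions, not details from the actual project, and the job needs the spark-sql-kafka connector package on the Spark classpath.

    # Minimal sketch: consume a Kafka topic with Spark Structured Streaming
    # and land the events on S3. Brokers, topic, and paths are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("kafka-stream-sketch")
             .getOrCreate())

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
           .option("subscribe", "portal_events")        # hypothetical topic
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers key/value as binary; cast the value to a string for parsing.
    events = raw.selectExpr("CAST(value AS STRING) AS json_value", "timestamp")

    query = (events.writeStream
             .format("parquet")
             .option("path", "s3://example-bucket/streaming/events/")
             .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
             .outputMode("append")
             .start())

    query.awaitTermination()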
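
The Airflow bullet above mentions automating the ETL process with Python scripts. A minimal sketch of such a DAG is shown below, assuming Airflow 2.x; the DAG id, schedule, and task callables are illustrative placeholders only.

    # Illustrative Airflow DAG automating a simple extract/transform/load flow.
    # DAG id, schedule, and the task functions are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract():
        # e.g. pull the source tables into a staging area (details omitted)
        pass


    def transform():
        # e.g. compute the calculated ratio data with pandas (details omitted)
        pass


    def load():
        # e.g. write the results to the Datamart (details omitted)
        pass


    with DAG(
        dag_id="ratio_etl_sketch",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_transform >> t_load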

Confidential

Data Engineer

Responsibilities:

  • Committed to identifying requirements, developing models based on client specifications, and drafting full documentation.
  • Added new drill-down dimensions to the data flow based on the business requirements.
  • Developed an ETL pipeline to source these datasets and transmit calculated ratio data from AWS to the Datamart (SQL Server) and Credit Edge.
  • Team leader for large-scale, widely distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) databases.
  • Designed and implemented Kafka topics in a new Kafka cluster across all scenarios.
  • Developed and maintained best practices and standards for data pipelining and Snowflake data warehouse integration.
  • Streamlined the performance of both external and managed Hive tables.
  • Worked mostly on the requirements and technical phases of the streaming Lambda architecture that uses Spark and Kafka to provide real-time streaming.
  • Created and developed a system that uses Kafka to collect data from multiple portals and then processes it using Spark.
  • Loaded information from the data warehouse and other systems such as SQL Server and DB2 using ETL tools such as SQL*Loader and external tables.
  • Implemented a Composite server for data isolation and generated multiple views for restricted data access through a REST API.
  • Configured Visual Studio to integrate with AWS, providing a convenient environment for writing code.
  • Employed Python's pandas and NumPy libraries to clean data, scale features, and engineer features, as well as predictive analytics to create models.
  • Applied Apache Airflow and cron scripts on the UNIX operating system to develop Python scripts that automate the ETL process.
  • Managed Amazon Web Services such as EC2, S3, ELB, Auto Scaling, DynamoDB, and Elasticsearch.
  • Implemented a data lake to consolidate data from multiple source databases such as Exadata and Teradata using the Hadoop stack technologies Sqoop and Hive/HQL.
  • Used AWS SQS to transmit the processed data to the next working teams for processing (see the SQS sketch after this list).
  • Designed and installed secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources, including ETL data pipelines from the AWS Aurora MySQL engine to Redshift.
  • Used Data Transformation Services to convert a SQL Server database to MySQL.
  • Developed complicated SQL queries that included joins, sub-queries, and nested queries.
  • Loaded data into different schema tables using SQL*Loader and control files.
  • Created PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR and conduct the required transformations based on the STMs developed (see the EMR sketch after this list).
  • Stored the JARs and input datasets in an S3 bucket and the processed output from the input data set in DynamoDB.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, and used the Python matplotlib toolkit to produce graphs for correlation analysis (see the plotting sketch after this list).
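
The SQS bullet above describes handing processed data to the next teams. A small boto3 sketch of that hand-off is below; the region, queue URL, and message fields are placeholders.

    # Sketch: push processed records to an SQS queue for a downstream team.
    # The queue URL and payload fields are placeholders (assumptions).
    import json

    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/processed-ratios"


    def publish_record(record: dict) -> None:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(record),
            MessageAttributes={
                "source": {"DataType": "String", "StringValue": "ratio-etl"},
            },
        )


    publish_record({"account_id": "A-001", "avg_ratio": 0.42})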
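
The EMR bullets above mention PySpark and Spark SQL transformations with input data in S3 and processed output in DynamoDB. The sketch below shows one plausible shape for that flow; the bucket, table, and column names are assumptions.

    # Sketch: PySpark job on EMR reading input from S3, aggregating, and writing
    # the processed output to DynamoDB with boto3. Bucket, table, and column
    # names are placeholders (assumptions).
    import boto3
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("emr-transform-sketch").getOrCreate()

    df = (spark.read
          .option("header", "true")
          .csv("s3://example-bucket/input/ratios/"))    # placeholder input path

    agg = (df.groupBy("account_id")
             .agg(F.avg(F.col("ratio").cast("double")).alias("avg_ratio")))


    def write_partition(rows):
        # One DynamoDB handle per partition; the table name is a placeholder.
        table = boto3.resource("dynamodb", region_name="us-east-1").Table("processed_ratios")
        with table.batch_writer() as batch:
            for row in rows:
                batch.put_item(Item={
                    "account_id": row["account_id"],
                    "avg_ratio": str(row["avg_ratio"]),  # DynamoDB has no float type
                })


    agg.foreachPartition(write_partition)
    spark.stop()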
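
The last bullet above mentions using matplotlib for correlation graphs. A short, generic sketch of a correlation heatmap built with pandas and matplotlib follows; the input file and its columns are placeholders.

    # Sketch: correlation heatmap for a numeric dataset using pandas + matplotlib.
    # The CSV path and column set are placeholders (assumptions).
    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.read_csv("ratios.csv")             # placeholder input
    corr = df.corr(numeric_only=True)          # pairwise Pearson correlations

    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_yticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label="correlation")
    fig.tight_layout()
    fig.savefig("correlation_heatmap.png")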
