Sr. Data Engineer Resume

SUMMARY:

  • Over 8 years of experience as a Data Engineer and Data Analyst covering data integration, big data, logical and physical data modeling, and implementation of business applications using the Oracle Relational Database Management System (RDBMS).
  • Strong experience working with Oracle 12c/11g/10g/9i/8i, SQL, SQL*Loader, and Open Interface to analyze, design, develop, test, and implement database applications in client/server environments.
  • Knowledge of database conversion from Oracle and SQL Server to PostgreSQL and MySQL.
  • Worked on client/server projects and customer implementations involving GUI design, relational database management systems (RDBMS), and rapid application development methodology.
  • Practical knowledge of PL/SQL for creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.
  • Understanding and analysis of AWS monitoring and auditing tools such as CloudWatch and CloudTrail (a brief Boto3 sketch follows this list).
  • In-depth familiarity with AWS DNS services via Route 53, including the routing policies: simple, weighted, latency, failover, and geolocation.
  • Hands-on expertise with Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, and Flume, including installing, configuring, monitoring, and using them.
  • Proficient with Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Handled, organized, and operated relational databases such as MySQL as well as NoSQL databases such as MongoDB and Cassandra.
  • Sound knowledge of AWS and Azure cloud infrastructure templates and of transmitting information to the SQS service via the Java API.
  • Experience with Snowflake on AWS, generating separate virtual data warehouses with different size classes.
  • Worked on Teiid and Spark data virtualization, RDF graph data, Solr search, and fuzzy algorithms.
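
A minimal Boto3 sketch of the CloudWatch/CloudTrail monitoring and auditing skills listed above; the region, instance ID, and look-back windows are illustrative assumptions, not details from any specific engagement.

# pip install boto3
import boto3
from datetime import datetime, timedelta

# Region and instance ID below are placeholders.
session = boto3.Session(region_name="us-east-1")

# CloudTrail: audit recent console sign-in events.
cloudtrail = session.client("cloudtrail")
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}],
    StartTime=datetime.utcnow() - timedelta(hours=24),
    MaxResults=50,
)
for event in events["Events"]:
    print(event["EventTime"], event.get("Username", "-"), event["EventName"])

# CloudWatch: pull average CPU utilization for a hypothetical EC2 instance.
cloudwatch = session.client("cloudwatch")
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))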

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Data Engineer

Responsibilities:

  • Identified requirements, developed models according to the customer's specifications, and drafted detailed documentation.
  • Implemented new drill-to-detail dimensions in the data pipeline based on business requirements; built an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the Datamart (SQL Server) and the Credit Edge server.
  • Managed large-scale, geographically distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) systems.
  • Designed and implemented Kafka by configuring topics in a new Kafka cluster in all environments.
  • Created best practices and standards for data pipelining and integration with Snowflake data warehouses.
  • Worked on both external and managed Hive tables, tuning them for optimized performance.
  • Involved in the requirement and design phases to implement a streaming Lambda architecture for real-time processing with Spark and Kafka.
  • Designed and developed a system to collect data from multiple portals using Kafka and process it with Spark.
  • Loaded data using ETL tools such as SQL*Loader and external tables from the data warehouse and other databases such as SQL Server and DB2.
  • Implemented a Composite server for data virtualization needs and created multiple views for restricted data access through a REST API.
  • Configured Visual Studio to work with AWS, providing a suitable environment for writing code.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using predictive analytics.
  • Developed Python scripts to automate the ETL process using Apache Airflow and CRON scripts on UNIX (a minimal Airflow DAG sketch follows this list).
  • Created an automated, event-driven notification service using SNS, SQS, Lambda, and CloudWatch.
  • Migrated a SQL Server database into MySQL using Data Transformation Services.
  • Wrote complex SQL queries with joins, sub-queries, and nested queries; worked with SQL*Loader and control files to load data into different schema tables.
  • Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the STMs developed.
  • Used an S3 bucket to store the JARs and input datasets, and DynamoDB to store the processed output from the input data set.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, and built graphs for business decision-making using Python's matplotlib library.
  • Responsible for Master Data Management (MDM) and Data Lake design and architecture; the Data Lake was built on Cloudera Hadoop.
  • Integrated Apache Kafka for data ingestion; Hive tables were created on HDFS to store the data processed by Apache Spark on the Cloudera Hadoop cluster in Parquet format. Used AWS services such as EC2 and S3 for small data sets.
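
As referenced in the Airflow bullet above, this is a minimal sketch of how such an ETL automation DAG could be laid out with Airflow 2.x; the DAG id, schedule, and task bodies are placeholders and do not reproduce the actual pipeline.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull the source extract (e.g. files landed in S3 or a source database).
    print("extracting source data")


def transform(**context):
    # Apply the drill-to-detail / ratio calculations here.
    print("transforming data")


def load(**context):
    # Write the calculated ratios to the SQL Server Datamart.
    print("loading to Datamart")


default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="ratio_datamart_pipeline",   # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task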

Confidential

Data Engineer

Responsibilities:

  • Identified requirements, developed models based on client specifications, and drafted full documentation.
  • Added new drill-down dimensions to the data flow based on business requirements; developed an ETL pipeline to source these datasets and transmit the calculated ratio data from Azure to the Datamart (SQL Server) and Credit Edge.
  • Led work on large-scale, widely distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) databases.
  • Designed and implemented Kafka by configuring topics in a new Kafka cluster across all environments.
  • Developed and maintained best practices and standards for data pipelining and Snowflake data warehouse integration.
  • Tuned both external and managed Hive tables for performance.
  • Worked on the requirements and technical phases of the streaming Lambda architecture that uses Spark and Kafka to provide real-time streaming.
  • Created and developed a system that uses Kafka to collect data from multiple portals and then processes it with Spark.
  • Loaded data from the data warehouse and other systems such as SQL Server and DB2 using ETL tools such as SQL*Loader and external tables.
  • Implemented a Composite server for data isolation and generated multiple views for restricted data access through a REST API.
  • Developed Spark scripts in Python on Azure HDInsight for data aggregation and validation, and verified their performance against MapReduce jobs (a PySpark sketch follows this list).
  • Employed Python's pandas and NumPy libraries to clean data, scale features, and engineer features, and used predictive analytics to create models.
  • Applied Apache Airflow and CRON scripts on UNIX to develop Python scripts that automate the ETL process.
  • Worked in the Azure environment on the development and deployment of custom Hadoop applications.
  • Using the Hadoop stack technologies Sqoop and Hive/HQL, implemented a Data Lake to consolidate data from multiple source databases such as Exadata and Teradata.
  • Extracted and loaded data into the Data Lake environment (MS Azure) using Sqoop; the data was accessed by business users.
  • Used Data Transformation Services to convert a SQL Server database to MySQL.
  • Developed complicated SQL queries that included joins, sub-queries, and nested queries.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
  • Created PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR and conduct the required transformations based on the STMs developed.
  • Stored the JARs and input datasets in an S3 bucket and the processed output in DynamoDB.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure.
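
A PySpark sketch in the spirit of the aggregation and validation jobs described above; the storage paths, column names, and aggregation grain are assumptions made only for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("portal-aggregation").getOrCreate()

# Read raw portal data (the Parquet path is a placeholder).
raw = spark.read.parquet("abfss://raw@examplestorage.dfs.core.windows.net/portal_events/")

# Basic validation: drop rows missing keys and flag negative amounts.
validated = (
    raw.dropna(subset=["account_id", "event_date"])
       .withColumn("is_valid_amount", F.col("amount") >= 0)
)

# Aggregate to the grain the downstream Datamart expects.
daily_summary = (
    validated.filter(F.col("is_valid_amount"))
             .groupBy("account_id", "event_date")
             .agg(
                 F.count("*").alias("event_count"),
                 F.sum("amount").alias("total_amount"),
             )
)

daily_summary.write.mode("overwrite").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/daily_summary/"
)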

Confidential

Data Engineer

Responsibilities:

  • Defined the relevant tables from the data mart and established the universe links to create new universes in Business Objects according to user needs.
  • Converted existing ETL pipelines to Hadoop-based systems by designing, developing, and documenting the new architecture and development process.
  • Used geographically distributed MongoDB replica sets across several data centers to provide high availability.
  • Developed monitoring and tuning scripts for PostgreSQL and EDB Postgres Advanced Server databases.
  • Used Git, GitHub, Amazon EC2, and Heroku for deployment; used extracted data for analysis and carried out various mathematical operations for calculation purposes using the NumPy and SciPy Python libraries.
  • Created multiple proofs of concept using PySpark and deployed them on the YARN cluster, comparing Spark's performance to that of Hive and SQL/Teradata.
  • Created and deployed Lambda functions in AWS using pre-built AWS Lambda libraries, as well as Lambda functions in Scala using custom libraries.
  • Developed an effective pipeline that gathers data from both streaming web sources and RDBMS sources.
  • Worked on creating customized data quality plans for data harmonization, cleansing, and profiling using the Analyst tool.
  • Used Data Analysis Expressions (DAX) to develop complex calculated measures.
  • Used Spark Streaming APIs to generate common transformations and actions on the fly.
  • Built a learner data model that pulls data from Kafka in near-real time and persists it in Cassandra (a consumer sketch follows this list).
  • Experience with MongoDB database fundamentals such as locking, transactions, indexes, sharding, replication, and schema design.
  • Trained and guided development colleagues on translating and writing NoSQL queries in comparison to legacy RDBMS queries.
  • Implemented the Data Lake concept by working with various data formats such as JSON, CSV, XML, and DAT.
  • Used DataStage as an ETL tool to extract data from several sources and load it into an Oracle database.
  • Used Hive queries and Pig scripts to analyze data in order to better understand client behavior.
  • Applied Hive, MapReduce, and HDFS to perform transformations, cleaning, and filtering on imported data, then loaded the finalized data into HDFS.
  • Assessed the SQL scripts and devised a PySpark-based solution.
  • Collaborated with the Data Governance team to implement the rules and create a physical data model in the data lake's Hive layer.
  • Solid knowledge of performance-tuning techniques for NoSQL, Kafka, Storm, and SQL technologies.
  • Implemented the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
  • Created tables in NoSQL databases such as HBase to load vast quantities of semi-structured data from source systems.
  • Experience with CI/CD tools such as Repos, CodeDeploy, CodePipeline, and GitHub to build a DevOps culture.
  • Used Erwin extensively for data modeling and Erwin Dimensional Data Modeling.
  • Fine-tuned SQL queries using EXPLAIN PLAN and TKPROF.
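
A compact sketch of the Kafka-to-Cassandra pattern mentioned above (near-real-time consumption persisted to Cassandra), using the kafka-python and cassandra-driver packages; the topic, keyspace, table, and column names are hypothetical.

# pip install kafka-python cassandra-driver
import json
from kafka import KafkaConsumer
from cassandra.cluster import Cluster

# Hypothetical topic, keyspace, and table names used only for illustration.
consumer = KafkaConsumer(
    "learner-events",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("learning")
insert = session.prepare(
    "INSERT INTO learner_events (learner_id, event_time, event_type, payload) "
    "VALUES (?, ?, ?, ?)"
)

# Consume messages in near-real time and persist each record to Cassandra.
for message in consumer:
    record = message.value
    session.execute(
        insert,
        (
            record["learner_id"],
            record["event_time"],
            record["event_type"],
            json.dumps(record.get("payload", {})),
        ),
    )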

Confidential

Data Engineer

Responsibilities:

  • Identified the appropriate tables from the data mart and defined the universe links to create new universes in Business Objects based on user needs.
  • Created reports based on SQL queries using Business Objects; executive dashboard reports provided the most recent financial data for the company, broken down by business unit and product.
  • Conducted data analysis and mapping, database normalization, performance tuning, query optimization, data extraction, transformation, and loading (ETL), and cleanup.
  • Developed reports, interactive drill charts, balanced scorecards, and dynamic dashboards using Teradata RDBMS analysis with Business Objects.
  • Responsible for gathering requirements, status reporting, and developing various KPIs and project deliverables.
  • In charge of maintaining a highly available, high-performance, and scalable MongoDB environment.
  • Created a NoSQL database in MongoDB using CRUD, indexing, replication, and sharding (a PyMongo sketch follows this list).
  • Assisted with the migration of the warehouse database from Oracle 9i to Oracle 10g.
  • Assessed and implemented new Oracle 10g features in existing Oracle 9i applications, such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT.
  • Improved report performance by rewriting SQL statements and utilizing Oracle's new built-in functions.
  • Used Erwin extensively for data modeling and Erwin Dimensional Data Modeling.
  • Tuned SQL queries with EXPLAIN PLAN and TKPROF.
  • Created BO full-client reports, Web Intelligence reports in 6.5 and XI R2, and universes with contexts and loops in 6.5 and XI R2.
  • Worked with Informatica as an ETL tool, along with Oracle Database, PL/SQL, Python, and shell scripts.
  • Built HBase tables to load enormous amounts of structured, semi-structured, and unstructured data from UNIX, NoSQL, and several portfolios.
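
A small PyMongo sketch of the MongoDB CRUD and indexing work noted above; the connection string, database, collection, and field names are invented for illustration.

# pip install pymongo
from pymongo import MongoClient, ASCENDING

# Hypothetical connection string, database, and collection names.
client = MongoClient("mongodb://localhost:27017/")
db = client["portfolio"]
claims = db["claims"]

# Create: insert a document.
claims.insert_one({"claim_id": "C-1001", "member_id": "M-42", "amount": 125.50, "status": "open"})

# Read: query by a field the index below supports.
open_claims = list(claims.find({"status": "open"}))

# Update: adjust a single document.
claims.update_one({"claim_id": "C-1001"}, {"$set": {"status": "reviewed"}})

# Delete: remove closed claims.
claims.delete_many({"status": "closed"})

# Indexing: a compound index to support the common lookup pattern.
claims.create_index([("member_id", ASCENDING), ("status", ASCENDING)])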

Environment: Quality Center, QuickTest Professional 8.2, SQL Server, J2EE, UNIX, .NET, Python, NoSQL, MS Project, Oracle, WebLogic, shell script, JavaScript, HTML, Microsoft Office Suite 2010, Excel

Confidential

ETL Developer / Data Modeler

Responsibilities:

  • Attended requirement-gathering sessions with business users and sponsors to better understand and document the business requirements.
  • Worked with large corporations to assess the financial impact of health-care initiatives.
  • Created logical and physical models in ERwin based on requirements.
  • Created the physical model and database objects in collaboration with the DBA.
  • Identified the primary key and foreign key relationships between entities and subject areas.
  • Worked closely with the ETL team on importing and mapping the data.
  • Developed SSIS packages to automate ETL processes, including metadata on record count, file size, and execution time (a Python sketch of capturing such run metadata follows this list).
  • Created an ETL process using Pentaho PDI to extract data from an Oracle database.
  • Calculated and analyzed claims data for provider incentive and supplemental benefit analyses using Microsoft Access and Oracle SQL.
  • Created a source-to-target (S2T) mapping document as part of data analysis.
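
The SSIS and Pentaho extracts above tracked record count, file size, and execution time; below is a hedged Python equivalent using cx_Oracle and pandas showing how that run metadata could be captured. The credentials, DSN, table, and output path are placeholders, not details from the project.

# pip install cx_Oracle pandas
import os
import time
import cx_Oracle
import pandas as pd

# Hypothetical credentials, DSN, query, and output path used only for illustration.
start = time.time()
connection = cx_Oracle.connect(user="etl_user", password="secret", dsn="ORCLPDB1")

claims = pd.read_sql("SELECT * FROM provider_claims", con=connection)
output_path = "provider_claims_extract.csv"
claims.to_csv(output_path, index=False)

# Capture the same run metadata the SSIS packages tracked: record count,
# file size, and execution time.
metadata = {
    "record_count": len(claims),
    "file_size_bytes": os.path.getsize(output_path),
    "execution_seconds": round(time.time() - start, 2),
}
print(metadata)
connection.close()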

Environment: CA ERwin, Oracle 10g, MS Excel, SQL Server, SSIS.
