
Sr. Data Engineer Resume


New York, NY


  • Over 8 years of experience as a Data Engineer and Data Analyst in data integration, Big Data, logical and physical data modeling, and implementation of business applications using the Oracle Relational Database Management System (RDBMS).
  • Strong experience working with Oracle 12c/11g/10g/9i/8i, SQL, SQL*Loader, and Open Interface to analyze, design, develop, test, and implement database applications using client/server architecture.
  • Knowledge in database conversion from Oracle and SQL Server to PostgreSQL and MySQL.
  • Worked on projects that involved client/server technology and customer implementations involving GUI design, Relational Database Management Systems (RDBMS), and Rapid Application Development methodology.
  • Practical knowledge of PL/SQL for creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.
  • Understanding and analysis of monitoring/auditing tools such as CloudWatch and CloudTrail to gain in-depth knowledge of AWS.
  • Extensive experience in using Talend features such as context variables, triggers, connectors for Database and flat files.
  • In-depth familiarity with AWS DNS services via Route 53, including the different routing policies: simple, weighted, latency, failover, and geolocation.
  • Strong expertise with Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Hortonworks, and Flume, including installing, configuring, monitoring, and using them.
  • Professional experience with Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Handled, organized, and operated relational databases such as MySQL and NoSQL databases such as MongoDB and Cassandra.
  • Sound knowledge of AWS CloudFormation templates and of transmitting information using the SQS service via the Java API.
  • Experience with Snowflake on AWS, generating separate virtual data warehouses with different size classes.
  • Worked on Teiid and Spark Data Virtualization, RDF graph Data, Solr Search, and Fuzzy Algorithm.
  • Thorough understanding of MPP databases, wherein data is partitioned across multiple servers or nodes, with each server/node having memory and processors to interpret data locally.
  • Data modeling and database development for OLTP and OLAP systems (Star Schema, Snowflake Schema, Data Warehouses, Data Marts, Multi-Dimensional Modeling, and Cube design), Business Intelligence, and data mining.
  • Used SQL, NumPy, Pandas, Scikit-learn, Spark, and Hive extensively for data analysis and pattern classification.
  • Established and maintained a number of existing BI dashboards, reports, and content packs.
  • Customized Power BI visualizations and dashboards in line with the client's needs.
  • Working expertise with Amazon Web Services databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
  • Productized and constructed a Data Lake employing Hadoop and its ecosystem components.
  • Extensive hands-on work with real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka, as well as developing Spark DataFrames in Python.
  • Built an API to manage servers and run code in AWS using AWS Lambda.
  • Rich experience programming Python scripts to implement workflows, and experience with ETL workflow management technologies such as Apache Airflow.
  • Adequate knowledge of databases such as MongoDB, MySQL, and Cassandra.
  • Working grasp of SQL Trace, TKPROF, Explain Plan, and SQL*Loader for performance tuning and database optimization.
  • Provided asynchronous replication, including Amazon EC2 and RDS, for regional MySQL database deployments and fault-tolerant servers (with solutions tailored for managing RDS).
  • Extensive knowledge of dynamic SQL, records, arrays, and exception handling, as well as data sharing, data caching, and data pipelines; used nested arrays and collections for complex processing.
  • Hands-on experience integrating databases such as MongoDB and MySQL with web technologies such as HTML, PHP, and CSS to update, insert, delete, and retrieve data using simple ad-hoc queries.
  • Developed heavy-workload Spark batch processing on top of Hadoop for massive parallelism.
  • Expertise in Extraction, Transformation, and Loading (ETL) processes, including UNIX shell scripting, SQL, PL/SQL, and SQL*Loader.
  • Created both Spark RDD and Spark DataFrame APIs for distributed processing.
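The staged-load pattern behind the SQL*Loader and PL/SQL bullets above can be sketched with Python's built-in sqlite3 module standing in for the Oracle tooling; the table names and sample rows are purely illustrative:

```python
import sqlite3

# In-memory database stands in for the Oracle RDBMS described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging table, analogous to a SQL*Loader staging/external table.
cur.execute("CREATE TABLE stg_orders (order_id INTEGER, amount TEXT)")
cur.executemany(
    "INSERT INTO stg_orders VALUES (?, ?)",
    [(1, "10.50"), (2, "20.00"), (3, "bad")],  # raw feed with one malformed row
)

# Target table plus a cleansing load: the role a PL/SQL procedure
# with exception handling plays in the workflow described above.
cur.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
for order_id, amount in cur.execute(
    "SELECT order_id, amount FROM stg_orders"
).fetchall():
    try:
        cur.execute("INSERT INTO orders VALUES (?, ?)", (order_id, float(amount)))
    except ValueError:
        pass  # reject malformed rows, like a bad file in SQL*Loader

loaded = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(loaded)  # 2 clean rows loaded, 1 rejected
```

The same stage-validate-load shape carries over directly to external tables and PL/SQL exception handlers on Oracle.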


Hadoop Technologies: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, AWS, EC2


Web Technologies: HTML, CSS, JavaScript, AJAX, Servlets, JSP, DOM, XML, XSLT.

Languages: C, Java, SQL, PL/SQL, Scala, Shell Scripts

Operating Systems: Linux, UNIX, Windows

Databases: NoSQL, Oracle, DB2, MySQL, SQL Server, MS Access, HBase

Application Servers: WebLogic, WebSphere, Apache Tomcat, JBOSS

IDEs: Eclipse, NetBeans, JDeveloper, IntelliJ IDEA.

Version Control: CVS, SVN, Git

Reporting Tools: Jaspersoft, Qlik Sense, Tableau, JUnit


Sr. Data Engineer

Confidential, New York, NY


  • Involved in identifying requirements, developing models according to the customer's specifications, and drafting detailed documentation.
  • Implemented new drill-to-detail dimensions in the data pipeline based on the business requirements.
  • Worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the Datamart (SQL Server) and Credit Edge server.
  • Experience in managing large-scale, geographically-distributed database systems, including relational (Oracle, SQL server) and NoSQL (MongoDB, Cassandra) systems.
  • Designed and implemented topic configuration in the new Kafka cluster across all environments.
  • Created best practices and standards for data pipelining and integration with Snowflake data warehouses.
  • Worked on both external and managed Hive tables for optimized performance.
  • Involved in requirement and design phase to implement Streaming Lambda Architecture to use real time streaming using Spark and Kafka.
  • Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Used ETL tools such as SQL*Loader and external tables to load data from the data warehouse and various other databases such as SQL Server and DB2.
  • Participated in data loading into the data warehouse using Big Data Hadoop Talend ETL components, AWS S3 buckets, and AWS services for the Redshift database.
  • Implemented Composite server for the data virtualization needs and created multiples views for restricted data access using a REST API.
  • Configured Visual Studio to work with AWS enabling a suitable environment for writing code.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using predictive analytics.
  • Developed Python scripts to automate the ETL process using Apache Airflow, as well as CRON scripts in the UNIX operating system.
  • Created an automated, event-driven notification service utilizing SNS, SQS, Lambda, and CloudWatch.
  • Managed Amazon Web Services like EC2, S3 buckets, ELB, Auto Scaling, DynamoDB, and Elasticsearch.
  • Implemented a Data Lake to consolidate data from multiple source databases such as Exadata and Teradata using the Hadoop stack technologies Sqoop and Hive/HQL.
  • Used AWS SQS to send the processed data on to the next teams for further processing.
  • Hands-on with the Redshift database (ETL data pipelines from the AWS Aurora MySQL engine to Redshift).
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources.
  • Migrated a SQL Server database into MySQL using Data Transformation Services.
  • Worked with complex SQL queries involving joins, sub-queries, and nested queries.
  • Worked with SQL*Loader and control files to load the data into different schema tables.
  • Developed PySpark and Spark SQL code to process the data in Apache Spark on Amazon EMR and perform the necessary transformations based on the STMs developed.
  • Used an S3 bucket to store the JARs and input datasets, and used DynamoDB to store the processed output from the input data set.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, and built various graphs for business decision-making using Python's matplotlib library.
  • Responsible for Master Data Management (MDM) and Data Lake design and architecture; the Data Lake is built using Cloudera Hadoop.
  • Integrated Apache Kafka for data ingestion.
  • Created Hive tables on HDFS to store, in Parquet format, the data processed by Apache Spark on the Cloudera Hadoop cluster.
  • Used AWS services like EC2 and S3 for small data sets.
  • Involved in data integration by identifying the information needs within and across functional areas for an enterprise database upgrade, and in scripting/data migration with the SQL Server Export Utility.
  • Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
  • Good experience in writing Spark applications using Python and Scala.
  • Experience with cloud-hosted version control platforms such as GitHub.
  • Designed and developed automation test scripts using Python.
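The Kafka-to-Spark streaming bullets above boil down to slicing an unbounded feed into micro-batches and merging each batch's aggregates into running state. A framework-free sketch in pure Python (the click events and batch size are invented for illustration):

```python
from collections import Counter
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield fixed-size batches, the way Spark Streaming slices a Kafka feed."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch

# Hypothetical portal events, standing in for Kafka messages.
events = ["login", "click", "click", "logout", "click", "login"]

running = Counter()
for batch in micro_batches(events, 2):
    running.update(batch)  # per-batch aggregation merged into running state

print(running["click"])  # 3
```

In the real pipeline, Spark Streaming performs the batching and the running state lives in checkpointed stateful operators rather than a local Counter.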

Environment: Big Data Tools, Hadoop, Hive, HBase, Spark, Oozie, Kafka, MySQL, Jenkins, API, Snowflake, PowerShell, GitHub, AWS, Oracle Database 12c/11g, DataStage, SQL Server 2017/2016/2012/2008, RDBMS, PostgreSQL, Power BI, MongoDB, ETL, Data Pipelining, NoSQL, SDLC, CI/CD, SQS, Python, Waterfall, Agile methodologies.

Data Engineer

Confidential, Santa Clara, CA


  • Committed to identifying requirements, developing models based on client specifications, and drafting detailed documentation.
  • Added new drill-down dimensions to the data flow based on the business requirements.
  • Developed an ETL pipeline to source these datasets and transmit calculated ratio data from AWS to Datamart (SQL Server) and Credit Edge.
  • Served as team leader for large-scale, widely distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) databases.
  • Designed and implemented topic configuration in a new Kafka cluster across all environments.
  • Developed and maintained best practices and standards for data pipelining and Snowflake data warehouse integration.
  • Streamlined the performance of both external and managed Hive tables.
  • Worked mostly on the requirements and technical phases of the Streaming Lambda Architecture, which uses Spark and Kafka to provide real-time streaming.
  • Created and developed a system that uses Kafka to collect data from multiple portals and then processes it using Spark.
  • Designed jobs using Big Data Talend to pick files from AWS S3 buckets and load them into the AWS Redshift database.
  • Loaded information from the data warehouse and other systems such as SQL Server and DB2 using ETL tools such as SQL loader and external tables.
  • Using a REST API, implemented Composite server for data virtualization needs and generated multiple views for restricted data access.
  • Configured Visual Studio to integrate with AWS, providing a suitable environment for writing code.
  • Employed Python's pandas and NumPy libraries to clean data, scale features, and engineer features, as well as predictive analytics to create models.
  • Applied Apache Airflow and CRON scripts in the UNIX operating system to develop Python scripts to automate the ETL process.
  • Managed Amazon Web Services like EC2, S3, ELB, Auto-Scaling, Dynamo DB, and Elastic Search
  • Using the Hadoop stack technologies Sqoop and Hive/HQL, implemented a Data Lake to consolidate data from multiple source databases such as Exadata and Teradata.
  • Used AWS SQS to transmit the processed data to the next teams for processing.
  • Designed and installed secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources using the Redshift database (ETL data pipelines from the AWS Aurora MySQL engine to Redshift).
  • Used Data Transformation Services to convert a SQL Server database to MySQL.
  • Developed complicated SQL queries that included joins, sub-queries, and nested queries.
  • Loaded data into different schema tables using SQL*Loader and control files.
  • Created PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR and conduct the required transformations depending on the STMs developed.
  • Stored the JARs and input datasets in an S3 bucket, and stored the processed output from the input data set in DynamoDB.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, as well as Python's matplotlib library to build various graphs for corporate decision-making.
  • Participated in the design and architecture of Master Data Management (MDM) and Data Lakes; the Data Lake is created using Cloudera Hadoop.
  • Handled data ingestion via Apache Kafka.
  • Built Hive tables on HDFS to store the Parquet-formatted data processed by Apache Spark on the Cloudera Hadoop cluster.
  • Used AWS services like EC2 and S3 for small data sets.
  • Participated in Data Integration by defining information needs within and across functional domains of an enterprise database update and scripting/data migration using SQL Server Export Utility.
  • Implemented and integrated several NoSQL databases such as HBase and Cassandra.
  • Clear knowledge of Python and Scala for creating Spark applications.
  • Knowledge of cloud version control systems such as GitHub.
  • Using Python, formulated and constructed automation test scripts.
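The SQS hand-off described above (processed data queued for the next team to consume) can be sketched with the standard-library queue module as a framework-free stand-in; the record contents and sentinel convention are illustrative:

```python
import queue
import threading

q = queue.Queue()  # stands in for an AWS SQS queue

def producer():
    # Processed records handed off downstream, as with SQS above.
    for record in ({"id": 1, "ok": True}, {"id": 2, "ok": True}):
        q.put(record)
    q.put(None)  # sentinel: no more messages

received = []

def consumer():
    # Next team's worker draining the queue in FIFO order.
    while (msg := q.get()) is not None:
        received.append(msg)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

print(len(received))  # 2 messages delivered
```

Real SQS adds durability, visibility timeouts, and at-least-once delivery on top of this basic producer/consumer shape.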

Environment: Kafka, Spark, Hive, Scala, HBase, Snowflake, Pig, AWS, CI/CD, API, DataStage, SQS, Git, Oracle Database 11g, Power BI, Oracle HTTP Server 11g, PostgreSQL, Windows 2007 Enterprise, RDBMS, Data Pipelining, NoSQL, MongoDB, DynamoDB, Python, ETL, SDLC, Waterfall, Agile methodologies, SOX Compliance.

Data Engineer

Confidential, Texas


  • Defined the relevant tables from the Data Mart and established the universe links to create new universes in Business Objects according to user needs.
  • Converted existing ETL pipelines to Hadoop-based systems by designing, developing, and describing the new architecture and development process.
  • Used geographically distributed MongoDB replica sets across several data centers to provide high availability.
  • Developed monitoring and tuning scripts for PostgreSQL and EDB Postgres Advanced Server databases.
  • Used Git, GitHub, Amazon EC2, and Heroku for deployment, and used extracted data for analysis and carried out various mathematical operations for calculation purposes using the NumPy and SciPy Python libraries.
  • Created multiple proofs of concept using PySpark and deployed them on the YARN cluster, comparing Spark's performance to that of Hive and SQL/Teradata.
  • Created and deployed Lambda functions in AWS using pre-built AWS Lambda Libraries, as well as Lambda functions in Scala using custom libraries.
  • Developed an effective pipeline that can gather data from both streaming web data and RDBMS sources.
  • Created and ran MapReduce tasks on YARN and Hadoop clusters to produce daily and monthly reports in accordance with customer requirements.
  • Conducted data analysis and mapping, as well as database normalization, performance tuning, query optimization, ETL (data extraction, transfer, and loading), and cleanup.
  • Developed PL/SQL procedures, functions, and packages, and loaded data into the database using SQL*Loader.
  • Used the Data Analysis Expressions (DAX) tool to develop complex calculated measures.
  • Used Spark Streaming APIs to generate common transformations and actions on the fly.
  • Built a learner data model that pulls data from Kafka in near-real time and persists it in Cassandra.
  • Experience with MongoDB database fundamentals like locking, transactions, indexes, sharding, replication, and schema design.
  • Trained and guided development colleagues on how to translate and write NoSQL queries in comparison to legacy RDBMS queries.
  • Implemented Data Lake idea by working with various data streams such as JSON, CSV, XML, and DAT.
  • DataStage was used as an ETL tool to extract data from several sources and load it into an ORACLE database.
  • Used Hive queries and Pig scripts to analyze data in order to better understand client behavior.
  • Applied Hive, MapReduce, and HDFS to perform transformations, cleaning, and filtering on imported data, then loaded the finalized data into HDFS.
  • Assessed the SQL scripts and devised a PySpark-based solution.
  • Collaborate with the Data Governance team to implement the rules and create a physical data model in the data lake's hive.
  • Solid knowledge of performance tuning techniques for NoSQL, Kafka, Storm, and SQL technologies.
  • Implemented the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
  • Created tables using NoSQL databases like HBase to load vast quantities of semi-structured data from source systems.
  • Experience with CI/CD tools such as Repos, Code Deploy, Code Pipeline, and GitHub to build DevOps culture.
  • Used ERwin extensively for data modeling, including ERwin Dimensional Data Modeling.
  • Used EXPLAIN PLAN and TKPROF to fine-tune SQL queries.
  • Developed Shell and Python scripts to automate Pig programs and provide control flow; imported data into HDFS from the Linux file system.
  • Experience writing Python programs that communicate with middleware/back-end services.
  • Assisting with the migration from PostgreSQL to MySQL.
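The MapReduce reporting jobs described above reduce to a map phase emitting key/value pairs, a shuffle that groups values by key, and a reduce phase aggregating per key. A minimal stdlib sketch of the same shape (the daily log records are invented):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical daily log records: (date, bytes_served) pairs,
# i.e. the output of the map phase.
mapped = [("2020-01-01", 120), ("2020-01-02", 300),
          ("2020-01-01", 80),  ("2020-01-02", 100)]

# Shuffle: sort by key so each key's values become adjacent,
# as Hadoop does between map and reduce.
mapped.sort(key=itemgetter(0))

# Reduce: aggregate values per key, as a daily-report reducer would.
report = {day: sum(v for _, v in group)
          for day, group in groupby(mapped, key=itemgetter(0))}

print(report["2020-01-01"])  # 200
```

On a real cluster the shuffle is distributed and the reducers run in parallel per key partition, but the per-key aggregation logic is the same.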

Environment: Spark, Hive, Pig, Oozie, Flume, Kafka, HBase, AWS, SQL Server, PostgreSQL, J2EE, UNIX, MS Project, Oracle, WebLogic, JavaScript, RDBMS, Git, HTML, NoSQL, Microsoft Office Suite 2010, Excel, Oracle Database 11g, Python, Windows 2007 Enterprise, TOAD, ETL, SDLC, Waterfall, Agile methodologies.

Data Engineer



  • Identify the appropriate tables from the Data mart and define the universe links to create new universes in Business Objects based on user needs.
  • Created reports based on SQL queries using Business Objects; executive dashboard reports provide the most recent financial data from the company, broken down by business unit and product.
  • Conducted data analysis and mapping, as well as database normalization, performance tuning, query optimization, ETL (data extraction, transfer, and loading), and cleanup.
  • Developed reports, interactive drill charts, balanced scorecards, and dynamic Dashboards using Teradata RDBMS analysis with Business Objects.
  • Responsibilities included gathering requirements, status reporting, developing various KPIs, and delivering project deliverables.
  • In charge of maintaining a high-availability, high-performance, scalable MongoDB environment.
  • Created a NoSQL database in MongoDB using CRUD, indexing, replication, and sharding.
  • Assisting with the migration of the warehouse database from Oracle 9i to Oracle 10g.
  • Worked on assessing and implementing new Oracle 10g features in existing Oracle 9i applications, such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT.
  • Improved report performance by rewriting SQL statements and utilizing Oracle's new built-in functions.
  • Used Erwin extensively for data modelling and ERWIN's Dimensional Data Modeling.
  • Tuning SQL queries with EXPLAIN PLAN and TKPROF.
  • Created BO full client reports, Web intelligence reports in 6.5 and XI R2, and universes with context and loops in 6.5 and XI R2.
  • Worked with Informatica as an ETL tool, along with Oracle Database, PL/SQL, Python, and Shell scripts.
  • Built HBase tables to load enormous amounts of structured, semi-structured, and unstructured data from UNIX, NoSQL, and several portfolios.

Environment: Quality Center, QuickTest Professional 8.2, SQL Server, J2EE, UNIX, .NET, Python, NoSQL, MS Project, Oracle, WebLogic, Shell script, JavaScript, HTML, Microsoft Office Suite 2010, Excel

ETL Developer / Data Modeler



  • Attended a requirement gathering session with business users and sponsors in order to better understand and document the business requirements.
  • Worked with large corporations to assess the financial impact of health-care initiatives. Based on requirements, created a logical and physical model in ERwin.
  • Created physical model and database objects in collaboration with DBA.
  • Discovered the primary key and foreign key relationships between entities and subject areas; worked closely with the ETL team on importing and mapping the data.
  • Developed SSIS packages to automate ETL processes, including metadata on record count, file size, and execution time; created an ETL process using Pentaho PDI to extract data from the Oracle database.
  • Calculated and analyzed claims data for provider incentive and supplemental benefit analyses using Microsoft Access and Oracle SQL; created a source-to-target (S2T) mapping document as part of data analysis.
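A source-to-target (S2T) mapping document like the one described above is, at its core, a table of source-column-to-target-column rules applied record by record. A minimal sketch, with invented column names standing in for the actual claims schema:

```python
# Illustrative S2T mapping: source column -> target column,
# as a mapping document would specify row by row.
s2t = {
    "CLM_ID":   "claim_id",
    "PROV_NM":  "provider_name",
    "PAID_AMT": "paid_amount",
}

# Hypothetical source extract (e.g. from Oracle or Access).
source_rows = [
    {"CLM_ID": "A1", "PROV_NM": "Acme Clinic", "PAID_AMT": "125.00"},
]

# Transform each source row into the target shape per the mapping.
target_rows = [{tgt: row[src] for src, tgt in s2t.items()}
               for row in source_rows]

print(target_rows[0]["claim_id"])  # A1
```

In an SSIS or Pentaho PDI job the same mapping lives in the transformation step's column metadata rather than in code.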

Environment: CA ERwin, Oracle 10g, MS Excel, SQL Server, SSIS.
