
Sr. Data Engineer Resume

Charlotte, NC

SUMMARY

  • Over 8 years of experience as a Data Engineer and Data Analyst in data integration, Big Data, logical and physical data modeling, and implementation of business applications using the Oracle Relational Database Management System (RDBMS).
  • Strong experience in analysis, design, development, testing, and implementation of database applications in client/server environments using Oracle 12c/11g/10g/9i/8i, SQL, SQL*Loader, and Open Interface.
  • Experienced in database conversion from Oracle and SQL Server to PostgreSQL and MySQL.
  • Extensive knowledge of client/server technology, GUI design, relational database management systems (RDBMS), and Rapid Application Development methodology.
  • Worked extensively in PL/SQL, creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.
  • In-depth understanding of AWS monitoring and auditing tools such as CloudWatch and CloudTrail.
  • Expert understanding of AWS DNS services through Route 53, including Simple, Weighted, Latency, Failover, and Geolocation routing policies.
  • Hands-on experience installing, configuring, monitoring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, and Flume.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Experience in handling, configuring, and administering relational databases such as MySQL and NoSQL databases such as MongoDB and Cassandra.
  • Good knowledge of AWS CloudFormation templates; configured the SQS service through the Java API to send and receive data.
  • Experience creating separate virtual warehouses of different size classes in Snowflake on AWS.
  • Worked on data virtualization using Teiid and Spark, RDF graph data, Solr search, and fuzzy algorithms.
  • Strong knowledge of Massively Parallel Processing (MPP) databases, in which data is partitioned across multiple servers or nodes and each server/node has its own memory and processors to process data locally.
  • Data modeling, database design, and development for OLTP and OLAP systems (star schema, snowflake schema, data warehouses, data marts, multidimensional modeling, and cube design), business intelligence, and data mining.
  • Extensively used SQL, NumPy, pandas, scikit-learn, Spark, and Hive for data analysis and model building.
  • Developed and maintained multiple Power BI dashboards, reports, and content packs.
  • Created Power BI visualizations and dashboards per requirements.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
  • Responsible for designing and building a data lake using Hadoop and its ecosystem components.
  • Experience creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka, and building Spark DataFrames using Python.
  • Used AWS Lambda to develop APIs to manage servers and run code in AWS.
  • Experience with ETL workflow management tools such as Apache Airflow, and significant experience writing Python scripts to implement workflows.
  • Experience in working with databases like MongoDB, MySQL and Cassandra.
  • Working knowledge of SQL Trace, TKPROF, EXPLAIN PLAN, and SQL*Loader for performance tuning and database optimization.
  • Provided regional MySQL database migrations and hot standby servers via asynchronous replication, including on Amazon EC2 and RDS (with solutions tailored for managing RDS).
  • Extensive experience in dynamic SQL, records, arrays, exception handling, data sharing, data caching, and data pipelining, including complex processing using nested arrays and collections.
  • Experience integrating databases such as MongoDB and MySQL with web pages built with HTML, PHP, and CSS to update, insert, delete, and retrieve data with simple ad hoc queries.
  • Developed heavy-load Spark batch processing on top of Hadoop for massively parallel computing.
  • Strong knowledge of Extraction, Transformation, and Loading (ETL) processes using UNIX shell scripting, SQL, PL/SQL, and SQL*Loader.
  • Developed Spark applications using the RDD and DataFrame APIs for distributed data processing.

TECHNICAL SKILLS

Big Data Ecosystem: MapReduce, HDFS, HBase, Spark, Kafka, Scala, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, MongoDB, Flume

Cloud Ecosystem: Amazon Web Services (EC2, EMR, and S3)

ETL Tools: SQL Server Integration Services (SSIS), AWS Data Pipeline, Informatica PowerCenter 9.x/10.x, Talend

Languages: SQL, PL/SQL, T-SQL, C, C#.NET, VB, ASP.NET, HTML, XML, XSLT

Database: Oracle 12c, 11g, 10g, 9i, 8i, 7, SQL Server 2017/2016/2012/2008

Special Tools: SQL*Plus, SQL*Loader, Toad, SQL Developer, Enterprise Manager, FTP, WinSCP, Rational ClearCase and ClearQuest, SQL Server Management Studio, MS Visual Studio 2008/2005, Team Foundation Server (TFS)

Data Migration Tools: SQL*Loader, Export/Import, SSIS

Methodologies: Agile, RUP, Waterfall Model

Packages: MS Office (Word, Excel, Project, Access, Visio, PowerPoint)

Operating Systems: Linux, UNIX, AIX, Windows NT/2000/2003/XP/7

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Charlotte, NC

Responsibilities:

  • Worked closely with Business Analysts to gather requirements and design reliable and scalable data pipelines.
  • Wrote complex queries, subqueries, and joins to validate loaded and imported data.
  • Used PowerShell and UNIX shell scripts for file transfers, emailing, and other file-related tasks.
  • Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
  • Worked on data transformation and retrieval from mainframes to Oracle using SQL*Loader and control files.
  • Created Tableau visualizations by connecting to Hadoop on Amazon Elastic MapReduce (EMR).
  • Developed a custom ETL solution, batch processing, and a real-time data ingestion pipeline to move data in and out of Hadoop using Python and shell scripts.
  • Built integrations between applications, primarily Salesforce.
  • Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the STMs developed.
  • Developed data integration strategies for data flow between disparate source systems and the Big Data-enabled enterprise data lake.
  • Built a serverless ETL in AWS Lambda to process new files in the S3 bucket so they are cataloged immediately.
  • Used AWS SQS to consume data from S3 buckets.
  • Used Spark (Python) and Teiid for data virtualization, i.e., extracting data from various relational and NoSQL databases, Hive, and file systems into one logical layer.
  • Heavily involved in testing Snowflake to determine the best way to use cloud resources.
  • Worked with relational SQL and NoSQL data stores, including PostgreSQL and Hadoop.
  • Overcame challenges such as migrating data from MySQL to MongoDB.
  • The objective of this project was to build a cloud-based data lake in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Worked extensively with Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
  • Performed data cleaning and reshaping, and generated segmented subsets using NumPy and pandas in Python.
  • Configured Jenkins with Git for release builds and scheduled automated builds.
  • Developed and deployed to production multiple projects in the CI/CD pipeline for real-time data distribution, storage, and analytics, with persistence to S3, HDFS, and Postgres.
  • Ingested real-time data from the sources as file streams into Spark Streaming and saved it in HDFS and Hive.
  • Used RESTful web services for Salesforce integration and to retrieve contacts from the Oracle database.
  • Configured CloudWatch, Lambda, SQS, and SNS to send alert notifications.
  • Created S3 buckets, managed their bucket policies, and used S3 and Glacier for storage and backup on AWS.
  • Designed a data flow to pull data via a REST API from a third-party vendor using OAuth authentication.
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Designed and implemented a fully operational, production-grade, large-scale data solution on the Snowflake data warehouse.
  • Worked on Amazon EMR, which processes data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2).
  • Used the Salesforce Excel Connector with MS Excel for data loading and migration.
  • Designed and implemented MapReduce jobs for distributed and parallel processing.
  • Experience in server infrastructure development with Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch, and Virtual Private Cloud (VPC).
  • Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
  • Deployed the Big Data Hadoop application using Talend on AWS (Amazon Web Services).
  • Worked in the Snowflake environment to remove redundancy, and loaded real-time data from various data sources into HDFS using Kafka.
  • Optimized Oracle/Aurora queries.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Designed and developed ETL jobs to extract data from the Salesforce replica and load it into a data mart in Redshift.

Environment: Big Data Tools, Hadoop, Hive, HBase, Spark, Oozie, Kafka, MySQL, Jenkins, API, Snowflake, PowerShell, GitHub, AWS, Oracle Database 12c/11g, DataStage, SQL Server 2017/2016/2012/2008, RDBMS, PostgreSQL, Power BI, MongoDB, ETL, Data Pipelining, NoSQL, SDLC, CI/CD, SQS, Python, Waterfall, Agile methodologies.

Data Engineer

Confidential, Indianapolis, IN

Responsibilities:

  • Identified requirements, developed models according to the customer's specifications, and drafted detailed documentation.
  • Implemented new drill-to-detail dimensions in the data pipeline based on business requirements.
  • Worked on an ETL pipeline to source the required tables and deliver the calculated ratio data from AWS to the data mart (SQL Server) and the Credit Edge server.
  • Experience managing large-scale, geographically distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) systems.
  • Designed and configured topics in a new Kafka cluster across all environments.
  • Created best practices and standards for data pipelining and integration with Snowflake data warehouses.
  • Worked with both external and managed Hive tables for optimized performance.
  • Involved in the requirements and design phases to implement a streaming Lambda Architecture for real-time processing using Spark and Kafka.
  • Designed and developed a system to collect data from multiple portals using Kafka and process it using Spark.
  • Loaded data using ETL tools such as SQL*Loader and external tables from the data warehouse and other databases such as SQL Server and DB2.
  • Implemented Composite Server for data virtualization needs and created multiple views for restricted data access via a REST API.
  • Configured Visual Studio to work with AWS, providing a suitable environment for writing code.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built predictive analytics models.
  • Developed Python scripts to automate the ETL process using Apache Airflow, as well as cron jobs on UNIX.
  • Created an automated, event-driven notification service using SNS, SQS, Lambda, and CloudWatch.
  • Managed AWS services such as EC2, S3, ELB, Auto Scaling, DynamoDB, and Elasticsearch.
  • Implemented a data lake to consolidate data from multiple source databases, such as Exadata and Teradata, using Hadoop stack technologies (Sqoop and Hive/HQL).
  • Used AWS SQS to send the processed data to downstream teams for further processing.
  • Hands-on experience with the Redshift database (ETL data pipelines from AWS Aurora MySQL to Redshift).
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources.
  • Migrated a SQL Server database to MySQL using Data Transformation Services.
  • Wrote complex SQL queries with joins, subqueries, and nested queries.
  • Used SQL*Loader and control files to load data into tables in different schemas.
  • Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the STMs developed.
  • Used S3 buckets to store JARs and input datasets, and used DynamoDB to store the processed output.
  • Used AWS Data Pipeline for extraction, transformation, and loading from homogeneous and heterogeneous data sources, and built various graphs for business decision-making using the Python Matplotlib library.
  • Responsible for Master Data Management (MDM) and data lake design and architecture; the data lake was built on Cloudera Hadoop.
  • Integrated Apache Kafka for data ingestion.
  • Created Hive tables on HDFS to store, in Parquet format, the data processed by Apache Spark on the Cloudera Hadoop cluster.
  • Used AWS services such as EC2 and S3 for small datasets.
  • Involved in data integration by identifying information needs within and across functional areas of the enterprise during a database upgrade, and in scripting and data migration with the SQL Server Export utility.
  • Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
  • Wrote Spark applications using Python and Scala.
  • Experience with version control using GitHub.
  • Designed and developed automation test scripts using Python.

Environment: Kafka, Spark, Hive, Scala, HBase, Snowflake, Pig, AWS, CI/CD, API, DataStage, SQS, Git, Oracle Database 11g, Power BI, Oracle HTTP Server 11g, PostgreSQL, Windows 2007 Enterprise, RDBMS, Data Pipelining, NoSQL, MongoDB, DynamoDB, Python, ETL, SDLC, Waterfall, Agile methodologies, SOX Compliance.

Data Engineer

Confidential, Fort Worth, TX

Responsibilities:

  • Built new universes in Business Objects per user requirements by identifying the required tables from the data mart and defining the universe connections.
  • Designed, developed, and documented the new architecture and development process to convert the existing ETL pipeline into Hadoop-based systems.
  • Configured high availability using geographically distributed MongoDB replica sets across multiple data centers.
  • Developed monitoring and tuning scripts for PostgreSQL and EDB Postgres Advanced Server databases.
  • Used Git, GitHub, and Amazon EC2, deployed using Heroku, and analyzed extracted data, carrying out various mathematical operations with the Python libraries NumPy and SciPy.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Developed and deployed various AWS Lambda functions using built-in AWS Lambda libraries, and deployed Lambda functions in Scala with custom libraries.
  • Designed a data pipeline to capture both streaming web data and RDBMS source data.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per users' needs.
  • Performed data analysis and mapping, database normalization, performance tuning, query optimization, data extraction, transfer, and loading (ETL), and cleanup.
  • Developed PL/SQL procedures, functions, and packages, and used SQL*Loader to load data into the database.
  • Developed complex calculated measures using Data Analysis Expressions (DAX).
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
  • Trained and mentored development colleagues in translating legacy RDBMS queries into NoSQL and writing NoSQL queries.
  • Worked with data feeds in different formats such as JSON, CSV, XML, and DAT, and implemented the data lake concept.
  • Used DataStage as an ETL tool to extract data from source systems and load it into the Oracle database.
  • Analyzed data using Hive queries and Pig scripts to understand customer behavior.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Worked with the Data Governance team to implement the rules and build the physical data model on Hive in the data lake.
  • Good understanding of performance tuning with NoSQL, Kafka, Storm, and SQL technologies.
  • Implemented solutions on the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data from source systems.
  • Expertise in implementing a DevOps culture through CI/CD tools such as Repos, CodeDeploy, CodePipeline, and GitHub.
  • Extensively used Erwin for data modeling and dimensional data modeling.
  • Used EXPLAIN PLAN and TKPROF to tune SQL queries.
  • Developed shell and Python scripts to automate and provide control flow for Pig scripts; imported data from the Linux file system into HDFS.
  • Designed Python scripts to interact with middleware/back-end services.
  • Provided guidance on migrating from PostgreSQL to MySQL.

Environment: Spark, Hive, Pig, Oozie, Flume, Kafka, HBase, AWS, SQL Server, PostgreSQL, J2EE, UNIX, MS Project, Oracle, WebLogic, JavaScript, RDBMS, Git, HTML, NoSQL, Microsoft Office Suite 2010, Excel, Oracle Database 11g, Python, Windows 2007 Enterprise, TOAD, ETL, SDLC, Waterfall, Agile methodologies.

Data Engineer

Confidential

Responsibilities:

  • Built new universes in Business Objects per user requirements by identifying the required tables from the data mart and defining the universe connections.
  • Used Business Objects to create reports based on SQL queries. Generated executive dashboard reports with the latest company financial data by business unit and by product.
  • Performed data analysis and mapping, database normalization, performance tuning, query optimization, data extraction, transfer, and loading (ETL), and cleanup.
  • Implemented Teradata RDBMS analysis with Business Objects to develop reports, interactive drill charts, balanced scorecards, and dynamic dashboards.
  • Responsible for requirements gathering, status reporting, creating various metrics, and project deliverables.
  • Managed the MongoDB environment from high-availability, performance, and scalability perspectives.
  • Developed a NoSQL database using CRUD, indexing, replication, and sharding in MongoDB.
  • Involved in migrating the warehouse database from Oracle 9i to Oracle 10g.
  • Analyzed and added new Oracle 10g features such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT_BY_ROOT to the existing Oracle 9i application.
  • Tuned report performance by exploiting Oracle's new built-in functions and rewriting SQL statements.
  • Extensively used Erwin for data modeling and dimensional data modeling.
  • Used EXPLAIN PLAN and TKPROF to tune SQL queries.
  • Developed BO full-client reports and Web Intelligence reports in 6.5 and XI R2, and universes with contexts and loops.
  • Worked with the ETL tool Informatica, Oracle Database, PL/SQL, Python, and shell scripts.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data from UNIX, NoSQL, and a variety of portfolios.

Environment: Quality Center, QuickTest Professional 8.2, SQL Server, J2EE, UNIX, .NET, Python, NoSQL, MS Project, Oracle, WebLogic, Shell script, JavaScript, HTML, Microsoft Office Suite 2010, Excel

Data Analyst

Confidential

Responsibilities:

  • Worked with the demand generation and interactive marketing teams to build descriptive and actionable insights.
  • Created various views in Tableau (tree maps, heat maps, scatter plots).
  • Created action filters, parameters, calculated fields, sets, and table calculations for preparing dashboards and worksheets in Tableau.
  • Worked on multiple Tableau visualizations such as area charts, line charts, heat maps, tree maps, bar charts, stacked bar charts, waterfall charts, and many custom charts.
  • Designed data-driven B2B demand generation solutions to improve ROI on media spend.
  • Hands-on experience extracting and manipulating data and building complex formulas in Tableau for various business calculations.
  • Gathered business, system, and functional requirements by conducting detailed interviews with business users, stakeholders, and Subject Matter Experts (SMEs). Defined the project scope, financial projections, and cost/benefit analysis.
  • Responsible for revelation, engagement, and churn analysis.
  • Implemented engagement-score-driven content personalization and optimization.
  • Generated complete digital insights reporting using online and offline data sources.
  • Responsible for increasing user engagement and conversion and for enabling data-driven product development.
