Senior Data Engineer Resume
Evansville, IN
SUMMARY
- 8+ years of experience as a Data Engineer and Data Analyst spanning data integration, big data, logical and physical data modeling, and implementation of business applications using the Oracle Relational Database Management System (RDBMS).
- Strong experience working with Oracle 12c/11g/10g/9i/8i, SQL, SQL*Loader, and Open Interface to analyze, design, develop, test, and implement database applications in client/server environments.
- Knowledge in database conversion from Oracle and SQL Server to PostgreSQL and MySQL.
- Worked on projects involving client/server technology and customer implementations covering GUI design, relational database management systems (RDBMS), and rapid application development methodology.
- Practical knowledge of PL/SQL for creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.
- Analyzed AWS monitoring and auditing tools such as CloudWatch and CloudTrail to gain in-depth knowledge of them.
- In-depth familiarity with AWS DNS services via Route 53, including simple, weighted, latency-based, failover, and geolocation routing policies.
- Hands-on expertise installing, configuring, monitoring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, and Flume.
- Proficient with Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
- Experienced in handling, organizing, and operating relational databases such as MySQL and NoSQL databases such as MongoDB and Cassandra.
- Sound knowledge of AWS and Azure cloud deployment templates and of transmitting messages through the SQS service via the Java API.
- Experience with Snowflake on AWS, generating separate virtual data warehouses in different size classes.
- Worked on Teiid and Spark Data Virtualization, RDF graph Data, Solr Search, and Fuzzy Algorithm.
- Thorough understanding of MPP databases, wherein data is partitioned across multiple servers or nodes, with each server/node having memory and processors to interpret data locally.
- Data modeling and database development for OLTP and OLAP systems (Star Schema, Snowflake Schema, Data Warehouse, Data Marts, Multi-Dimensional Modeling, and Cube design), Business Intelligence, and data mining.
- Used SQL, NumPy, Pandas, scikit-learn, Spark, and Hive extensively for data analysis and pattern classification.
- Established and maintained a number of existing BI dashboards, reports, and content packs.
- Customized Power BI visualizations and dashboards in line with the client's needs.
- Working expertise with Amazon Web Services databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
- Built and productionized a data lake using Hadoop and its ecosystem components.
- Extensive hands-on work on real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka, including developing Spark DataFrames in Python.
- Developed an API to manage servers and run code in AWS using AWS Lambda.
- Rich experience programming Python scripts to implement workflows, with experience in ETL workflow management technologies such as Apache Airflow (see the sketch after this list).
- Adequate knowledge of databases such as MongoDB, MySQL, and Cassandra.
- Working grasp of SQL Trace, TKPROF, EXPLAIN PLAN, and SQL*Loader for performance tuning and database optimization.
- Experience in working with data warehouses and data marts using Informatica PowerCenter (Designer, Repository Manager, Workflow Manager, and Workflow Monitor).
- Understanding & Working knowledge of Informatica CDC (Change Data Capture).
- Provided asynchronous replication for regional MySQL database deployments and fault-tolerant servers on Amazon EC2 and RDS (with solutions tailored for managing RDS).
- Extensive knowledge of dynamic SQL, records, arrays, and exception handling, as well as data sharing, data caching, and data pipelines; used nested arrays and collections for complex processing.
- Hands-on experience integrating databases such as MongoDB and MySQL with web pages built in HTML, PHP, and CSS to update, insert, delete, and retrieve data using simple ad-hoc queries.
- Developed heavy-workload Spark batch processing on top of Hadoop for massive parallelism.
- Expertise in Extraction, Transformation, and Loading (ETL) processes, including UNIX shell scripting, SQL, PL/SQL, and SQL*Loader.
- Used both the Spark RDD and Spark DataFrame APIs for distributed processing.
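The Airflow-based workflow automation referenced in this summary can be illustrated with a minimal sketch; the DAG name, schedule, and task body below are hypothetical placeholders, not an actual production pipeline.

```python
# Minimal Airflow DAG sketch for automating a daily ETL step.
# The DAG id, schedule, and task body are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load(**context):
    # Placeholder for the real extract/transform/load logic.
    print(f"Running ETL for {context['ds']}")


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_etl_pipeline",      # hypothetical name
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",    # daily at 02:00, cron-style
    catchup=False,
) as dag:
    PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )
```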
TECHNICAL SKILLS
Big Data: Cloudera Distribution, HDFS, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, Spark, Storm, Scala, Impala
Operating System: UNIX AIX 5.3, OS/390 z/OS 1.6, Windows 95/98/NT/ME/00/XP, UNIX, MS-DOS, Sun Solaris 5.8, Linux 8x
Languages: Visual Basic 6.0/5.0, SQL, PL/SQL, Transact-SQL, and Python
Databases: Snowflake(cloud), Teradata, IBM DB2, Oracle, SQL Server, MySQL, NoSQL
Web Technologies: HTML, XML
Version Tools: GIT, CVS
Packages: SQL*Plus, Toad 7.x, SQL*Loader, Erwin 7.0
Tools: TOAD, SQL Developer, ANT, Log4J
Web Services: WSDL, SOAP.
ETL/Reporting: Ab Initio GDE 3.0, Co>Op 2.15/3.0.3, Informatica, Tableau
Web/App Server: UNIX server, Apache Tomcat
PROFESSIONAL EXPERIENCE
Senior Data Engineer
Confidential, Evansville, IN
Responsibilities:
- Worked on AWS Data pipeline to configure data loads from S3 to Redshift.
- Extracted, transformed, and loaded data between various heterogeneous data sources and destinations using AWS Redshift.
- Created Tables, Stored Procedures, and extracted data using T-SQL for business users whenever required.
- Performed data analysis and design, and created and maintained large, complex logical and physical data models and metadata repositories using ERWIN and MB MDR.
- Wrote shell scripts to trigger DataStage jobs.
- Assisted service developers in finding relevant content in the existing reference models.
- Handled heterogeneous sources such as Access, Excel, CSV, Oracle, and flat files using connectors, tasks, and transformations provided by AWS Data Pipeline.
- Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
- Developed PySpark scripts to protect raw data by applying hashing algorithms to client-specified columns (see the sketch after this list).
- Responsible for Design, Development, and testing of the database and Developed Stored Procedures, Views, and Triggers
- Developed Python-based API (RESTful Web Service) to track revenue and perform revenue analysis.
- Compiled and validated data from all departments and presented it to the Director of Operations.
- Built a KPI calculator sheet and maintained it within SharePoint.
- Created Tableau reports with complex calculations and worked on Ad-hoc reporting using PowerBI.
- Created a data model that correlates all the metrics and yields valuable output.
- Worked on the tuning of SQL Queries to bring down run time by working on Indexes and Execution Plan.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, PostgreSQL, DataFrames, OpenShift, Talend, and pair RDDs.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
- Developed and validated machine learning models including Ridge and Lasso regression for predicting the total amount of trade.
- Boosted the performance of regression models by applying polynomial transformation and feature selection and used those methods to select stocks.
- Generated report on predictive analytics using Python and Tableau including visualizing model performance and prediction results.
- Implemented Copy activity and custom Azure Data Factory pipeline activities.
- Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
- Architected and implemented medium to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
- Developed a detailed project plan and helped manage the data conversion migration from the legacy system to the target Snowflake database.
- Design, develop, and test dimensional data models using Star and Snowflake schema methodologies under the Kimball method.
- Implement ad-hoc analysis solutions using Azure Data Lake Analytics/Store, HDInsight
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Ensured deliverables (daily, weekly, and monthly MIS reports) were prepared to satisfy project requirements, cost, and schedule.
- Worked with DirectQuery in Power BI to compare legacy data with current data and generated reports and dashboards.
- Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS cubes (OLAP).
- Created and formatted Cross-Tab, Conditional, Drill-down, Top N, Summary, Form, OLAP, sub-reports, ad-hoc reports, parameterized reports, interactive reports, and custom reports in SQL Server Reporting Services (SSRS).
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using PowerBI
- Developed visualizations and dashboards using PowerBI
- Adhered to the ANSI SQL language specification wherever possible, providing context about similar functionality in other industry-standard engines (e.g., referencing PostgreSQL function documentation).
- Used ETL to implement Slowly Changing Dimension transformations to maintain historical data in the data warehouse.
- Performed ETL testing activities such as running jobs, extracting data from the database with the necessary queries, transforming it, and loading it into the data warehouse servers.
- Created dashboards for analyzing POS data using Power BI
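The PySpark column-hashing work noted above can be sketched as follows; the column names and S3 paths are hypothetical, and SHA-256 via pyspark.sql.functions.sha2 stands in for whichever hashing algorithm the client specified.

```python
# Sketch: mask client-specified columns with SHA-256 hashes in PySpark.
# Column names and paths are assumptions for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("column-hashing").getOrCreate()

SENSITIVE_COLUMNS = ["ssn", "email"]                      # assumed columns
raw_df = spark.read.parquet("s3://example-bucket/raw/")   # hypothetical path

# Replace each sensitive column with its SHA-256 digest.
hashed_df = raw_df
for col_name in SENSITIVE_COLUMNS:
    hashed_df = hashed_df.withColumn(col_name, F.sha2(F.col(col_name), 256))

hashed_df.write.mode("overwrite").parquet("s3://example-bucket/masked/")
```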
Environment: MS SQL Server 2016, T-SQL, Oracle, Hive, Advanced Excel (creating formulas, pivot tables, HLOOKUP, VLOOKUP, macros), Spark, MongoDB, SSAS, SSRS, OLAP, Python, ETL, Power BI, Tableau, Hive/Hadoop, Snowflake, AWS Data Pipeline, Cognos Report Studio 10.1
Data Engineer
Confidential, Columbus, Indiana
Responsibilities:
- Committed to identifying requirements, developing models based on client specifications, and drafting complete documentation.
- Based on the business requirements, added new drill-down dimensions to the data flow.
- Developed an ETL pipeline to source these datasets and transmit calculated ratio data from Azure to Datamart (SQL Server) and Credit Edge.
- Led a team working with large-scale, widely distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) databases.
- Designed and implemented messaging scenarios by configuring topics in a new Kafka cluster.
- Developing and maintaining best practices and standards for data pipelining and Snowflake data warehouse integration.
- Improved the performance of both external and managed Hive tables.
- Worked primarily on the requirements and technical phases of the streaming Lambda architecture, which uses Spark and Kafka to provide real-time streaming.
- Designed and developed a system that uses Kafka to collect data from multiple portals and then processes it using Spark (see the sketch after this list).
- Loaded information from the data warehouse and other systems such as SQL Server and DB2 using ETL tools such as SQL loader and external tables.
- Using a REST API, implemented Composite server for data isolation and generated multiple views for restricted data access.
- Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MR jobs.
- Employed Python's pandas and NumPy libraries to clean data, scale features, and engineer features, and applied predictive analytics to create models.
- Developed Python scripts and used Apache Airflow and cron scripts on UNIX to automate the ETL process.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications
- Using Hadoop stack technologies SQOOP and HIVE/HQL, implemented Data Lake to consolidate data from multiple source databases such as Exadata and Teradata.
- Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
- Used Data Transformation Services to convert a SQL server database to MySQL.
- Developed complicated SQL queries that included joins, subqueries, and nested queries.
- Used windows Azure SQL reporting services to create reports with tables, charts and maps.
- Created PySpark and SparkSQL code to process data in Apache Spark on Amazon EMR and conduct the required transformations depending on the STMs developed.
- The jars and input datasets were stored in an S3 bucket, and the processed output from the input data set was stored in DynamoDB.
- Extract Transform and Load data from Sources Systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL Azure Data Lake Analytics. Data Ingestion to one or more Azure Services - (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing the data in Azure Databricks.
- Participate in the design and architecture of Master Data Management (MDM) and Data Lakes. Cloudera Hadoop is used to create Data Lake.
- Data intake is handled via Apache Kafka.
- On HDFS, Hive tables were built to store the Parquet-formatted data processed by Apache Spark on the Cloudera Hadoop Cluster.
- Architected and implemented medium to large-scale BI solutions on Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Using Python, formulated and constructed automation test scripts.
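The Kafka-to-Spark portal pipeline described above could look roughly like the sketch below; the broker address, topic name, and event schema are assumptions, and a console sink stands in for the real destination.

```python
# Sketch: consume portal events from Kafka with Spark Structured Streaming
# and aggregate them in 5-minute windows. Brokers, topic, and schema are
# hypothetical placeholders; requires the spark-sql-kafka connector package.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-portal-stream").getOrCreate()

event_schema = StructType([
    StructField("portal_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker
    .option("subscribe", "portal-events")                # assumed topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Count events per portal per 5-minute window and print to the console sink.
counts = events.groupBy(F.window("event_time", "5 minutes"), "portal_id").count()

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```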
Environment: Kafka, Spark, Hive, Scala, HBase, Snowflake, Pig, Azure, CI/CD, API, DataStage, SQS, Git, Oracle Database 11g, Power BI, Oracle HTTP Server 11g, PostgreSQL, Windows 2007 Enterprise, RDBMS, Data Pipelining, NoSQL, MongoDB, DynamoDB, Python, ETL, SDLC, Waterfall, Agile methodologies, SOX Compliance.
Big Data Engineer
Confidential, Branchburg, NJ
Responsibilities:
- Migrating data from FS to Snowflake within the organization
- Imported Legacy data from SQL Server and Teradata into Amazon S3.
- Created consumption views on top of metrics to reduce the running time for complex queries.
- Loaded data from different files in Amazon S3 into Snowflake by creating staging tables (see the sketch after this list).
- Compared data at a leaf level across the various databases whenever data transformation or loading took place, and analyzed data quality after these loads to check for any data loss or corruption.
- As part of the data migration, wrote many SQL scripts to reconcile data mismatches and worked on loading the history data from Teradata into Snowflake.
- Created Metric tables, End-user views in Snowflake to feed data for Tableau refresh.
- Generated Custom SQL to verify the dependency for the daily, Weekly, Monthly jobs.
- Using Nebula Metadata, registered Business and Technical Datasets for corresponding SQL scripts
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Closely involved in scheduling Daily, Monthly jobs with Precondition/Postcondition based on the requirement.
- Monitor the Daily, Weekly, Monthly jobs and provide support in case of failures/issues.
- Experienced in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, Hive.
- Working experience with data streaming process with Kafka, Apache Spark, Hive.
- Worked with various HDFS file formats such as Avro, SequenceFile, NiFi, and JSON, and various compression formats such as Snappy and bzip2.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the data received from Kafka.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
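The staging-table load from Amazon S3 into Snowflake mentioned above can be sketched with the snowflake-connector-python package; the account, credentials, table, stage, and file format below are hypothetical placeholders.

```python
# Sketch: create a staging table and bulk-load S3 files into Snowflake via
# COPY INTO. Credentials, table, and stage names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",    # hypothetical account identifier
    user="etl_user",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # Staging table mirroring the layout of the S3 files.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS STG_ORDERS (
            ORDER_ID NUMBER,
            ORDER_TS TIMESTAMP_NTZ,
            AMOUNT   NUMBER(12, 2)
        )
    """)
    # Bulk load from an external stage that points at the S3 bucket.
    cur.execute("""
        COPY INTO STG_ORDERS
        FROM @S3_ORDERS_STAGE
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
finally:
    conn.close()
```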
Environment: Snowflake, AWS S3, GitHub, ServiceNow, HP Service Manager, EMR, Nebula, Kafka, Jira, Confluence, Shell/Perl Scripting, Python, Avro, ZooKeeper, Teradata, SQL Server, Apache Spark, Sqoop.
Data Engineer
Confidential
Responsibilities:
- Identify the appropriate tables from the Data mart and define the universe links to create new universes in Business Objects based on user needs.
- Created reports based on SQL queries using Business Objects; executive dashboard reports provided the company's most recent financial data, broken down by business unit and product.
- Conducted data analysis and mapping, as well as database normalization, performance tuning, query optimization, data extraction, transfer, and loading ETL, and clean up.
- Developed reports, interactive drill charts, balanced scorecards, and dynamic Dashboards using Teradata RDBMS analysis with Business Objects.
- Responsible for gathering requirements, status reporting, developing various KPIs, and project deliverables.
- In charge of maintaining a high-availability, high-performance, and scalable MongoDB environment.
- Created a NoSQL database in MongoDB using CRUD operations, indexing, replication, and sharding (see the sketch after this list).
- Assisting with the migration of the warehouse database from Oracle 9i to Oracle 10g.
- Worked on assessing and implementing new Oracle 10g features, such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT, in existing Oracle 9i applications.
- Improved report performance by rewriting SQL statements and utilizing Oracle's new built-in functions.
- Used Erwin extensively for data modeling, including Erwin's dimensional data modeling.
- Tuning SQL queries with EXPLAIN PLAN and TKPROF.
- Created BO full client reports, Web intelligence reports in 6.5 and XI R2, and universes with context and loops in 6.5 and XI R2.
- Worked with Informatica as an ETL tool, along with Oracle Database, PL/SQL, Python, and shell scripts.
- Built HBase tables to load enormous amounts of structured, semi-structured, and unstructured data from UNIX, NoSQL, and several portfolios.
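The MongoDB CRUD and indexing work referenced above can be illustrated with a short pymongo sketch; the connection URI, database, collection, and fields are hypothetical.

```python
# Sketch: basic MongoDB CRUD plus a unique index, using pymongo.
# The URI, database, collection, and fields are hypothetical placeholders.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017/")   # assumed connection URI
accounts = client["reporting"]["accounts"]

# Index the lookup key so reads stay fast as the collection grows.
accounts.create_index([("account_id", ASCENDING)], unique=True)

# Create: insert a document.
accounts.insert_one({"account_id": 1001, "region": "NA", "balance": 2500.0})

# Read: query by the indexed field.
doc = accounts.find_one({"account_id": 1001})

# Update: adjust a field in place.
accounts.update_one({"account_id": 1001}, {"$set": {"balance": 3000.0}})

# Delete: remove the document.
accounts.delete_one({"account_id": 1001})
```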
Environment: Quality Center, QuickTest Professional 8.2, SQL Server, J2EE, UNIX, .NET, Python, NoSQL, MS Project, Oracle 10g, CA Erwin, SSIS, WebLogic, shell script, JavaScript, HTML, Microsoft Office Suite 2010, MS Excel.
Data Analyst
Confidential
Responsibilities:
- Collaborated with Business Analysts, SMEs across departments to gather business requirements, and identify workable items for further development.
- Partnered with ETL developers to ensure data was well cleaned and the data warehouse stayed up to date for reporting purposes, using Pig.
- Selected and generated data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and stored the data in AWS Redshift.
- Performed simple statistical data-profiling analysis, such as cancel rate, variance, skew, and kurtosis of trades and runs for each stock daily, grouped into 1-, 5-, and 15-minute intervals.
- Used PySpark and Pandas to calculate the moving average and RSI score of the stocks and loaded the results into the data warehouse (see the sketch after this list).
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, PostgreSQL, DataFrames, OpenShift, Talend, and pair RDDs.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Developed complex SQL statements to extract the Data and packaging/encrypting Data for delivery to customers.
- Provided business intelligence analysis to decision-makers using an interactive OLAP tool
- Created T-SQL statements (SELECT, INSERT, UPDATE, DELETE) and stored procedures.
- Defined Data requirements and elements used in XML transactions.
- Created Informatica mappings using various Transformations like Joiner, Aggregate, Expression, Filter, and Update Strategy.
- Performed Tableau Server administration using Tableau admin commands.
- Involved in defining the source to target Data mappings, business rules, and data definitions.
- Ensured the compliance of the extracts to the Data Quality Center initiatives
- Performed metrics reporting, data mining, and trend analysis in a helpdesk environment using Access.
- Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple heterogeneous information sources
- Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
- Developed and validated machine learning models including Ridge and Lasso regression for predicting the total amount of trade.
- Boosted the performance of regression models by applying polynomial transformation and feature selection and used those methods to select stocks.
- Generated report on predictive analytics using Python and Tableau including visualizing model performance and prediction results.
- Utilized Agile and Scrum methodology for team and project management.
- Used Git for version control with colleagues.
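The moving-average and RSI calculations mentioned above can be sketched in pandas; the DataFrame layout (a 'close' price column on a daily index) and the 14-period window are assumptions.

```python
# Sketch: simple moving average and RSI over a price series with pandas.
# The column name and window length are assumed for illustration.
import pandas as pd


def add_indicators(prices: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    out = prices.copy()
    # Simple moving average of the closing price.
    out["sma"] = out["close"].rolling(window=window).mean()

    # RSI: ratio of average gain to average loss over the window.
    delta = out["close"].diff()
    avg_gain = delta.clip(lower=0).rolling(window=window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window=window).mean()
    out["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    return out


# Tiny synthetic example series.
sample = pd.DataFrame(
    {"close": [100, 101, 99, 102, 103, 101, 104, 105, 103, 106,
               107, 108, 106, 109, 110]},
    index=pd.date_range("2023-01-02", periods=15, freq="D"),
)
print(add_indicators(sample).tail())
```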
Environment: Spark, AWS Redshift, Python, Tableau, Informatica, Pandas, Pig, Pyspark, SQL Server, T-SQL, XML, Git.