Enterprise Data Engineer Resume
Middleton, WI
SUMMARY
- Around 8 years of IT experience as a Big Data (Redshift, Hadoop & Spark) developer and Python developer.
- Hands-on experience with Hadoop ecosystem tools including HDFS, MapReduce, Hive, Sqoop, Oozie, Spark, Flume, and Kafka.
- Experienced in building Apache Airflow data pipelines in Python that extract and process data from APIs using DAGs (see the Airflow sketch after this summary).
- Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
- Configured various performance metrics using AWS CloudWatch and CloudTrail.
- Configured cross-account deployments using AWS CodePipeline, CodeBuild, and CodeDeploy by creating cross-account policies and roles in IAM.
- Experience in the analysis, design, development, and implementation of operational database systems (OLTP) and data warehouse systems (OLAP).
- Expertise in the full ETL (Extraction, Transformation, and Loading) life cycle using Informatica PowerCenter and Apache Airflow.
- Proficient with AWS developer tools such as the AWS CLI, CloudFormation templates, and workflows.
- Expertise in utilizing AWS services such as EC2, RDS, S3, EFS, Glacier, Storage Gateway, DynamoDB, ElastiCache, Redshift, VPC, CloudFront, Route53, Direct Connect, API Gateway, EBS, AMI, SNS, CloudWatch, ELB, Auto Scaling, IAM.
- Experience in automating code deployment, support, and administrative tasks across multiple cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud.
- Good experience in SQL and PL/SQL; developed simple and complex stored procedures in Confidential Netezza. Expertise in developing UNIX shell scripts.
- Good understanding of relational database management systems such as Confidential, Confidential Netezza, DB2, and SQL Server; worked on data integration using Informatica for extraction, transformation, and loading of data from various source systems.
- Hands-on experience with the RDD architecture and implementing Spark operations on RDDs.
- Knowledge of Spark Streaming for ingesting data from multiple sources into HDFS (a streaming sketch also follows this summary).
- Skillful hands-on experience with stream processing, including Storm and Spark Streaming.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Understanding of data architecture and data integration layers while implementing data tools to support analytics and data systems.
- Project planning and development, risk and dependency management, communication management at all levels of the organization, requirements/scope development and management, and project integration management and implementation/deployment.
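The Airflow pipelines mentioned above can be illustrated with a minimal sketch, assuming a hypothetical REST endpoint, DAG name, and task logic (none of these names come from an actual project):

```python
# Minimal Airflow 2.x DAG sketch: pull records from a (hypothetical) REST API,
# filter them, and hand them to a downstream task via XCom.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def _extract(**context):
    # Hypothetical endpoint; replace with a real API in practice.
    resp = requests.get("https://api.example.com/v1/orders", timeout=30)
    resp.raise_for_status()
    context["ti"].xcom_push(key="orders", value=resp.json())


def _transform(**context):
    # Keep only completed orders before any downstream load step.
    orders = context["ti"].xcom_pull(key="orders", task_ids="extract") or []
    completed = [o for o in orders if o.get("status") == "completed"]
    print(f"{len(completed)} completed orders ready to load")


with DAG(
    dag_id="orders_api_pipeline",       # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=_extract)
    transform = PythonOperator(task_id="transform", python_callable=_transform)

    extract >> transform                # run extract before transform
```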
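The Spark Streaming ingestion into HDFS noted above could look roughly like the following Structured Streaming sketch; the Kafka broker, topic, schema, and HDFS paths are assumptions, and the job would need the Spark-Kafka connector package available:

```python
# PySpark Structured Streaming sketch: read JSON events from Kafka and
# append them to HDFS as Parquet. Broker, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("stream-ingest").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
       .option("subscribe", "events")                      # hypothetical topic
       .load())

parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")                    # hypothetical path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```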
TECHNICAL SKILLS
Databases: Confidential, MS SQL Server, Amazon RDS, Confidential Netezza, HBase, MongoDB, Cassandra
Data Warehouse: AWS Redshift, Cloudera, Spark
ETL Tools: Informatica PowerCenter, SSIS, SnapLogic
Query Tools: Aginity, SQL Developer, SQL Navigator, SQL*Plus, PL/SQL Developer, data modeling
Cloud Ecosystem: Amazon Web Services (S3, EC2, RDS)
Hadoop Distributions: Cloudera, MapReduce, Hortonworks
Operating Systems: UNIX, Windows 95/98/NT/2000/XP/10, MS-DOS
Programming/Scripting Languages: UNIX shell, SQL, PL/SQL, Python, C, C++, Java, Scala
PROFESSIONAL EXPERIENCE
Enterprise Data Engineer
Confidential, Middleton WI
Responsibilities:
- Performed batch and real-time processing of data using Hadoop components such as Hive and Spark.
- Used Spark Streaming with PySpark to process streaming data and analyze continuous datasets.
- Resolved complex Azure Databricks and HDInsight issues reported by Azure end customers.
- Used Apache Airflow to author workflows as directed acyclic graphs (DAGs) in Python, visualizing batch pipelines and handling API-driven real-time data pipelines running in production.
- Integrated new big data management technologies with Spark (Scala), Hadoop, and software engineering tools when designing the data architecture and data integration layers.
- Experienced in handling big data with Hadoop and Apache Spark, using NumPy and pandas with PySpark in Databricks and Jupyter Notebook.
- Worked with a variety of relational databases (RDBMS) such as PostgreSQL, MS SQL Server, MySQL, and Netezza, as well as NoSQL stores and Athena.
- Experience in the ETL process, consisting of data sourcing, transformation, mapping, conversion, and loading; developed complex mappings.
- Used deployment tools such as Docker and Kubernetes for containerization, combining them with the workflows to keep them lightweight.
- Experience in creating and maintaining reporting infrastructure for visual representation of manufacturing data by running Lambda against S3, Redshift, RDS, and MongoDB/DynamoDB ecosystems.
- Implemented data extraction tools that integrate a variety of data sources and formats; solved big data problems with algorithmic solutions using Pig, Sqoop, and Informatica PowerCenter.
- Constructed product-usage SDK and Siebel data aggregations using Python 3, AWS EMR, PySpark, Scala, Spark SQL, and Hive context, stored in partitioned Hive external tables maintained in AWS S3 for reporting, data science dashboarding, and ad hoc analyses (see the aggregation sketch after this section).
- Experience installing Python libraries with pip and extensive use of the PEP 8 coding convention.
- Reduced server crashes under heavy website traffic by spinning up EC2 instances integrated with ELB and Auto Scaling.
- Discovered data across many different systems, sources, and types, such as Clarity, Unity, and Epic, collecting and storing it as big data.
- Good knowledge of Linux and shell scripting, SQL Server, UNIX, and the GitHub version control platform.
- Developed scheduling jobs using API triggers, Python, R, SQL, PROC SQL, SAS macros, Informatica XML, and more; developed and supported an application with more than 1,100 jobs.
- Worked in an Agile methodology, interacted directly with the entire team to give and receive design feedback, suggested and implemented optimal solutions, and tailored the application to meet business requirements while following standards.
Environment: Python, Apache Airflow, PySpark, Hadoop, MapReduce, HDFS, Sqoop, Oozie, WinSCP, UNIX shell scripting, Hive, Impala, Cloudera (Hadoop distribution), AWS, Docker, JIRA, etc.
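A minimal sketch of the partitioned Hive-on-S3 aggregation described in the SDK-usage bullet above, assuming hypothetical table names, columns, and S3 bucket:

```python
# PySpark sketch: aggregate raw usage events and persist the result as a
# partitioned Hive external table backed by S3. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sdk-usage-agg")
         .enableHiveSupport()
         .getOrCreate())

usage = spark.table("raw.sdk_usage_events")   # hypothetical source table

daily = (usage
         .groupBy("product_id", F.to_date("event_ts").alias("event_date"))
         .agg(F.countDistinct("account_id").alias("active_accounts"),
              F.count("*").alias("event_count")))

# Writing with an explicit path makes the Hive table external; the target
# database ("curated") and bucket are assumed to exist already.
(daily.write
 .mode("overwrite")
 .partitionBy("event_date")
 .option("path", "s3://example-bucket/curated/sdk_usage_daily/")
 .saveAsTable("curated.sdk_usage_daily"))
```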
Data Warehouse Developer
Confidential, Madison, WI
Responsibilities:
- Developed Informatica PowerCenter jobs and served as a point of contact for failures or issues that occurred during the extraction load; nearly 800 jobs bring data from Confidential, SQL Server, XML, CSV, and Netezza to stage nearly 3,000 objects across all three environments.
- Involved in creating tables, partitioned tables, join conditions, correlated subqueries, nested queries, views, sequences, and synonyms for business application development.
- Extensively involved in writing SQL queries (subqueries and join conditions) and PL/SQL programming.
- Involved in designing, implementing, and developing scheduling jobs using CA Workload Automation in conjunction with Informatica PowerCenter, SAS Enterprise Guide, Confidential Netezza, Anaconda for Python 3, and A2B data.
- Experience developing machine learning code using Spark MLlib; used Spark SQL for pre-processing, cleaning, and joining very large datasets (a minimal MLlib sketch follows this section).
- Designed and developed logical and physical data models of the schema; wrote PL/SQL code for data conversion in the Clearance Strategy project.
- Created indexes for faster retrieval of customer information and enhanced database performance.
- Extensively used advanced PL/SQL features such as collections, nested tables, arrays, ref cursors, materialized views, and dynamic SQL.
- Modernized the data analytics environment using CA Workload Automation, a cloud-based Hadoop platform, and Splunk.
- Involved in data analysis for source and target systems, with a good understanding of data warehousing concepts: staging tables, dimensions, facts, star schema, and snowflake schema.
- Involved in data extraction from Confidential and flat files using SQL*Loader; designed and developed mappings using Informatica PowerCenter.
- Extensively worked on data cleansing, data analysis, and monitoring using Data Quality.
- Involved in fixing invalid mappings, performance tuning, testing stored procedures and functions, and testing Informatica sessions, batches, and target data.
- Troubleshot data processing and regular data loads; integrated new data technologies and tools across the enterprise.
- Carried out defect analysis and fixed bugs raised by users.
- Assisted with data-related technical issues; performed unit testing, integration testing, system testing, and data validation for developed Informatica mappings.
- Extensively used ETL to load data from flat files (both fixed-width and delimited) and from the relational database, Confidential 11g.
Environment: Python, Jupyter Notebook, Hadoop, MapReduce, HDFS, Sqoop, UNIX shell scripting, Informatica PowerCenter, Confidential Netezza, CA Workload Automation, Redshift, AWS Data Pipeline, S3, SQL Server Integration Services, SQL Server 2014, AWS Database Migration Service, etc.
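A minimal Spark MLlib sketch in the spirit of the machine-learning bullet above; the input path, feature columns, and label are assumptions for illustration only:

```python
# PySpark MLlib sketch: clean a dataset with Spark SQL operations, then fit a
# simple logistic-regression pipeline. Paths and column names are hypothetical.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Pre-processing: drop duplicates and rows missing the label.
df = (spark.read.parquet("hdfs:///data/claims_features")   # hypothetical path
      .dropDuplicates()
      .na.drop(subset=["label"]))

assembler = VectorAssembler(
    inputCols=["claim_amount", "days_open", "prior_claims"],  # hypothetical features
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="label")

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)
model.transform(test).select("label", "prediction").show(5)
```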
SQL Developer
Confidential
Responsibilities:
- Worked as a PL/SQL developer on the WC Confidential project, developing and managing the database for collateral management and hedging.
- Involved in requirements gathering, designing, developing, testing, and deploying backend procedures, functions, and packages in PL/SQL, particularly for maintaining payments.
- Worked on optimizing existing procedures and functions.
- Used Confidential materialized views / snapshots, clusters and partitions for faster query performance.
- Improved query performance through bitmap and function-based indexes, EXPLAIN PLAN, ANALYZE, hints, and joins.
- Wrote triggers, stored procedures, and functions; defined user-defined data types; wrote SQL; created and maintained physical structures (see the stored-procedure call sketch after this section).
- Worked with dynamic management views and system views for indexing and other performance problems.
- Created and managed table indexes; performed query tuning, performance tuning, and deadlock management.
- Tracked and closed defects using HP Quality Center.
- Designed, created, and deployed SSIS packages; configured SSIS packages for different environments using DTS configurations.
- Produced and consumed CSV files as part of the ETL process.
- Implemented event handling, logging, and batch checks for data accuracy.
- Handled code deployments, configurations, and configuration management in various environments.
- Implemented security, roles, logins, and permissions.
- Extracted, transformed, and loaded data from various sources into Confidential and SQL Server 2006 normalized databases.
- Maintained users, roles, permissions, and security.
- Enhanced existing applications to meet new business requirements.
- Enhanced existing stored procedures, views, functions, and database objects to meet additional requirements of the latest releases; handled database capacity planning and disk requirements.
Environment: Confidential 9i/10g, Windows XP, PL/SQL, SQL*Plus, SQL Developer, SAP Crystal Reports, data modeling, SSIS, SQL Server 2006
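As an illustration of calling the stored procedures mentioned in this role from Python, the sketch below invokes a hypothetical packaged procedure that returns a ref cursor via cx_Oracle; the connection details and procedure name are assumptions, not taken from the project:

```python
# Python/cx_Oracle sketch: call a (hypothetical) stored procedure that returns
# open payments through a ref cursor. Credentials and names are placeholders.
import cx_Oracle

conn = cx_Oracle.connect("app_user", "app_password", "db-host/ORCLPDB1")
cur = conn.cursor()

ref_cursor = conn.cursor()                     # receives the OUT ref cursor
cur.callproc("pkg_payments.get_open_payments", [ref_cursor])

for row in ref_cursor:
    print(row)                                 # each row is a tuple of columns

ref_cursor.close()
cur.close()
conn.close()
```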