Sr. Data Engineer Resume
Chicago, IL
SUMMARY
- Around 9 years of professional IT experience, including 8+ years focused on Big Data technologies and the design and development of Scala-based enterprise applications.
- 8+ years of IT experience in Data Engineer/Analyst roles, with analytical programming in SQL and Python on Snowflake and AWS.
- Experience designing and building solutions covering data ingestion, integration, consumption, and delivery, as well as reporting, analytics, and system-to-system integration.
- Proficient in Big Data environments, with hands-on experience using Hadoop ecosystem components for large-scale processing of structured and semi-structured data.
- Strong experience across all project phases, including requirement analysis, design, coding, testing, and support.
- Extensive experience with Azure cloud components like Azure Data Lake Storage, Azure Data Factory, Azure SQL, Azure Synapse, Azure Analysis Services, and Azure Databricks.
- Solid knowledge of AWS services such as EMR, Redshift, S3, and EC2, including configuring servers for auto-scaling and elastic load balancing.
- Involved in file movement between HDFS and AWS S3 and worked extensively on different use cases leveraging S3 buckets.
- Experience in designing dashboards and reports, performing ad-hoc analysis, and building visualizations using Tableau and Power BI.
- Experience in job workflow scheduling and monitoring tools like Oozie.
- Experience with application development using Python, RDBMS, NoSQL, and ETL solutions.
- Experience developing ETL processes with Matillion for large-scale, complex datasets.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Experience monitoring Hadoop- and Spark-based services to control applications and analyze their operation and performance.
- Experienced in Python data manipulation for loading and extraction, and in Python libraries such as Pandas for data engineering pipelines.
- Good knowledge and experience with NoSQL databases like HBase, Cassandra, and MongoDB and SQL databases like Teradata, Oracle, PostgreSQL, and SQL Server.
- Experience in the development and design of various scalable systems using Hadoop technologies in various environments and analyzing data using MapReduce, Hive, and Pig.
- Hands-on use of Spark and Scala to compare the performance of Spark with Hive and SQL, and to take advantage of Spark SQL for different use cases.
- Strong knowledge in working with ETL methods for data extraction, transformation, and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
- Hands-on experience in designing and implementing data engineering pipelines and analyzing data using Hadoop ecosystem tools like HDFS, Spark, Sqoop, Hive, Flume, Kafka, Impala, PySpark, Oozie, and HBase.
- Experience with traditional ETL tools like SSIS and Informatica, and with reporting environments like SQL Server Reporting Services and Business Objects.
- Experience in deploying applications and scripting using Unix/Linux shell, Python, and PowerShell.
- Solid knowledge of Data Marts, Operational Data Stores, OLAP, and dimensional data modeling with Star and Snowflake schemas for dimension tables.
- Extensive experience with various databases like Teradata, MongoDB, Cassandra DB, MySQL, Oracle, and SQL Server.
- Developed data ingestion modules using AWS Step Functions, AWS Glue, and Python modules.
- Experience creating Teradata BTEQ scripts using different OLAP and aggregate functions.
- Strong experience working with databases like Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Worked on Continuous Integration and Continuous Deployment using Docker for containerization and Jenkins for build automation.
- Excellent working experience in Agile/Scrum development and Waterfall project execution methodologies.
- Experience in using various version control systems like Git and SVN.
- Strong analytical skills with the ability to collect, organize, and analyze large amounts of information with attention to detail and accuracy.
- Possess good interpersonal, analytical, and presentation skills, with the ability to work in self-managed and team environments.
TECHNICAL SKILLS
Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, Sqoop, Oozie, Hive, Impala, Apache Flume, Cloudera, HBase, Pig
Programming Languages: Python, PL/SQL, SQL, Scala, PowerShell, C, C++, T-SQL
Cloud Services: Azure Data Lake Storage Gen 2, Azure Data Factory, Blob storage, Azure SQL DB, Databricks, Azure Event Hubs, AWS RDS, Amazon SQS, Amazon S3, AWS EMR, Lambda, AWS SNS
Databases: MySQL, SQL Server, Oracle, MS Access, Teradata, and Snowflake
NoSQL Databases: MongoDB, Cassandra DB, HBase
Workflow Scheduling & Monitoring tools: Apache Airflow
Visualization & ETL tools: Tableau, Informatica, Talend, SSIS
CI/CD & Version Control tools: Jenkins, Git, and SVN
Operating Systems: Unix, Linux, Windows, Mac OS
PROFESSIONAL EXPERIENCE
Confidential - Chicago, IL
Sr. Data Engineer
Responsibilities:
- Developed data pipelines using Sqoop, Spark, MapReduce, and Hive to ingest, transform, and analyze customer behavioral data.
- Implemented Spark jobs using Python and Spark SQL for faster data processing and real-time analysis in Spark.
- Experience using Splunk and Apache Flume for collecting, aggregating, and moving large amounts of data from application servers.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Experienced in implementing real-time streaming and analytics using technologies such as Spark Streaming and Kafka (see the streaming sketch below).
- Designed and developed batch and streaming data pipelines to load data into Data Lake and Data Warehouse.
- Used AWS services including S3, Redshift, Glue, EMR, Lambda, Step Functions, IAM, QuickSight, RDS, CloudWatch Events, and CloudTrail.
- Developed APIs to access the enterprise metadata platform for registering datasets and migrated all data pipeline applications from the legacy platform to the new one.
- Developed and modified all of our data applications using Python.
- Extensively used PySpark and Python for building and modifying existing pipelines.
- Worked on creating pandas DataFrames and Spark DataFrames where needed in order to optimize performance.
- Created and modified data pipeline applications using Spark, Python, PySpark, AWS EMR, S3, Lambda, pandas, Spark SQL, Glue, Presto, and Snowflake.
- Extensively worked on EMR and Spark jobs and created IAM roles and S3 bucket policies based on enterprise requests.
- Created complex SQL queries using Spark SQL for all data transformations when loading data into Snowflake.
- Utilized an enterprise scanning tool to identify sensitive data in all S3 buckets we own and encrypted that data before downstream consumption.
- Deployed applications to Docker containers using Jenkins and triggered jobs to load data from S3 into Snowflake tables, or from Snowflake back to S3, based on user requirements (see the Snowflake load sketch below).
- Collaborated with Data Analysts to understand the data and their end requirements, transformed the data using PySpark and Spark SQL, and created data frames.
- Extensively worked with file formats such as Parquet, Avro, Confidential, and JSON.
- Experience using Postman to test endpoints for various applications and to validate schemas, data types, file formats, etc.
- Monitored applications using PagerDuty and Splunk logs for daily, weekly, and monthly data loads.
- Experience working with Docker images and updating them as necessary.
- Worked on Databricks, created managed clusters, and performed data transformations.
Environment: Hadoop, MapReduce, Spark, Hive, Pig, Python, Docker, Databricks, AVRO, PySpark, SparkSQL, Snowflake, Kafka, Bash/Shell Scripting, Flume, EMR, S3, Glue, Presto.
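A minimal sketch of the kind of Kafka-to-S3 Spark Structured Streaming job described above; the broker addresses, topic name, event schema, and bucket paths are hypothetical placeholders, not details from the actual project.

```python
# Sketch: Spark Structured Streaming job reading Kafka events and landing them
# on S3 as Parquet. All names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("customer-events-stream").getOrCreate()

# Assumed schema for the customer behavioral events.
event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical brokers
       .option("subscribe", "customer-events")              # hypothetical topic
       .option("startingOffsets", "latest")
       .load())

# Kafka values arrive as bytes; parse the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write micro-batches to S3 as Parquet with checkpointing for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/landing/customer_events/")          # hypothetical bucket
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")  # hypothetical path
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```

The checkpoint location is what lets the stream recover from failures without duplicating micro-batches.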
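A minimal sketch, assuming an external S3 stage already exists, of a trigger-style load from S3 into Snowflake using the snowflake-connector-python package; the account, warehouse, stage, and table names are hypothetical.

```python
# Sketch: load Parquet files from an S3 stage into a Snowflake table.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical account identifier
    user="etl_user",
    password="***",              # in practice sourced from a secrets manager
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

copy_sql = """
    COPY INTO STAGING.CUSTOMER_EVENTS
    FROM @STAGING.S3_LANDING_STAGE/customer_events/   -- hypothetical external stage over S3
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
"""

try:
    cur = conn.cursor()
    cur.execute(copy_sql)
    print("COPY completed, query id:", cur.sfqid)
finally:
    conn.close()
```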
Confidential - Chicago, IL
Big Data Developer/Engineer
Responsibilities:
- Wrote T-SQL queries to develop and modify stored procedures, functions, triggers, views, and other SQL objects (see the T-SQL execution sketch below).
- Expert in writing complex T-SQL queries, dynamic queries, sub-queries, and complex joins.
- Worked with different source systems such as Oracle, flat files, and SQL Server to load data into the ODS through SSIS.
- Used Oracle Data Integrator for data loading and validation from source to staging and target environments, as well as for data cleansing efforts.
- Created different types of variables, such as global and local variables, and used refresh and evaluate variables according to different requirements.
- Performed SQL tuning for the business by creating and rebuilding indexes and by timely refreshes of materialized views.
- Developed UNIX wrapper scripts with parameters to execute the latest package scenarios as jobs.
- Worked with the ODI administrator on ODI installation and configuration, creating master and work repositories, and setting up topology, data stores, and projects.
- Developed prototype reports for end users in Business Objects to obtain sign-off before final design and presentation.
- Worked with containerization tools; implemented the transition to Docker and developed distributed cloud systems using Kubernetes and the Helm package manager for Kubernetes.
- Deployed a Windows Kubernetes cluster with Azure Container Service (ACS) from the Azure CLI and utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
- Scheduled reports as requested by users on daily, weekly, monthly, quarterly, and yearly bases.
- Converted ODI mappings from the Oracle system to SQL Server using SSIS packages to ensure an apples-to-apples match in the new system.
- Identified and analyzed facts from the source system and business requirements for use in the data warehouse.
- Developed, deployed, and monitored SSIS Packages executions.
- Participated in requirements meetings and data mapping sessions to understand business needs.
- Coordinated with the offshore team and trained them on the process while balancing the workload.
- Experience creating dashboards, drill-down, drill-through, and linked reports using SSRS (SQL Server Reporting Services) 2008 R2/2012/2016.
- Worked in an Agile model.
Environment: Python, SQL, Oracle, AWS RDS, Amazon SQS, Spark SQL, Amazon S3, AWS EMR, AWS Lambda, AWS SNS, MapReduce, Scala, ETL Data pipeline, SSIS, SQL Server, Cloudera, Hadoop, Apache Airflow, Jenkins, GIT, Hive, Linux, Spark, RDD, HDFS, AWS Kinesis, NoSQL, Tableau, Agile.
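A minimal sketch of executing T-SQL from Python via pyodbc, standing in for the stored-procedure and validation work described above; the server, database, procedure, and table names are hypothetical.

```python
# Sketch: call a parameterized T-SQL stored procedure and run a validation query.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql-ods.example.com;"   # hypothetical ODS server
    "DATABASE=ODS;"
    "Trusted_Connection=yes;"
)

conn = pyodbc.connect(conn_str)
try:
    cur = conn.cursor()

    # Execute a hypothetical nightly-refresh stored procedure with a parameter.
    cur.execute("EXEC dbo.usp_RefreshCustomerDim @LoadDate = ?", "2023-01-31")
    conn.commit()

    # Simple validation comparing staging and target row counts.
    cur.execute(
        "SELECT (SELECT COUNT(*) FROM stg.Customer)    AS staged_rows, "
        "       (SELECT COUNT(*) FROM dbo.DimCustomer) AS target_rows"
    )
    staged, target = cur.fetchone()
    print(f"staged={staged}, target={target}")
finally:
    conn.close()
```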
Confidential - St Louis, MO
Data Engineer
Responsibilities:
- Utilized analytical, statistical, and programming skills to collect, analyze, and interpret large data sets and develop data-driven technical solutions to difficult business problems using tools such as SQL and Python.
- Worked on designing AWS EC2 instance architecture to meet high-availability and security requirements.
- Created AWS S3 buckets, managed bucket policies, and utilized S3 and Glacier for storage and backup.
- Worked on Hadoop cluster and data querying tools to store and retrieve data from the stored databases.
- Worked with file formats such as Parquet, accessed data through Impala and PySpark, and performed Spark Streaming with RDDs and DataFrames.
- Aggregated log data from different servers using Apache Kafka and fed it to downstream systems for analytics.
- Worked on designing and developing the SSIS Packages to import and export data from MS Excel, SQL Server, and Flat files.
- Worked on Data Integration for extracting, transforming, and loading processes for the designed packages.
- Designed and deployed automated ETL workflows using AWS Lambda, organized and cleansed the data in S3 buckets using AWS Glue, and processed the data using Amazon Redshift.
- Worked on ETL architecture enhancements to increase performance using the query optimizer.
- Processed the extracted data using Spark and Hive and stored large data sets in HDFS.
- Worked on streaming data transfers from different data sources into HDFS and NoSQL databases.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into the target database.
- Worked on scripting with Python in Spark to transform data from various files such as text files, Confidential, and JSON.
- Worked on processing and testing data using Spark SQL and on real-time processing with Spark Streaming and Kafka using Python.
- Used Kubernetes to manage containerized applications through nodes, ConfigMaps, selectors, and services, and deployed application containers as Pods.
- Scripted in Python and PowerShell to set up baselines, branching, merging, and automation across the process using Git.
- Implemented ETL architecture enhancements and optimized workflows by building DAGs in Apache Airflow to schedule the ETL jobs (see the DAG sketch below), using additional Airflow components such as pools, executors, and multi-node functionality.
- Used various transformations in SSIS Data Flow and Control Flow, including For Loop containers and Fuzzy Lookup.
- Worked on creating SSIS packages for data conversion using the Data Conversion transformation and produced advanced, extensible reports using SQL Server Reporting Services.
Environment: Python, SQL, AWS EC2, AWS S3 buckets, Hadoop, PySpark, AWS Lambda, AWS Glue, Amazon Redshift, Spark Streaming, Apache Kafka, SSIS, Informatica, ETL, Hive, HDFS, NoSQL, Talend, MySQL, Teradata, Sqoop, PowerShell, GIT, Apache Airflow.
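A minimal sketch of an Airflow DAG scheduling a daily ETL step, along the lines of the DAGs described above; the DAG id, callable, and ETL logic are hypothetical placeholders.

```python
# Sketch: a daily ETL DAG in Apache Airflow (2.x style).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load(**context):
    # Placeholder for the actual extract/transform/load logic
    # (e.g. pull from a source system and write to HDFS or a NoSQL store).
    print("running ETL for", context["ds"])


with DAG(
    dag_id="daily_etl_pipeline",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1},
) as dag:
    etl_task = PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )
```

Setting catchup=False keeps Airflow from backfilling every missed interval when the DAG is first enabled.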
Confidential
Big Data Developer/ETL Developer
Responsibilities:
- Participated in the complete life cycle in creating SSIS packages, building, deploying, and executing the packages in development and production environments.
- Designed and developed various SSIS packages for ETL operations to extract and transform data and scheduled the SSIS Packages.
- Worked on cleaning up ETL packages to determine the optimal data process and integrated the data when designing and developing jobs.
- Created ETL metadata reports using SSRS; the reports included execution times for the SSIS packages and failure reports with error descriptions.
- Worked on developing SSIS packages to Extract, Transform and Load (ETL) data into the data warehouse database from heterogeneous databases and various other data sources.
- Identified source tables using SSIS and created processes to capture, store, and process transaction data in real time.
- Created OLAP applications with OLAP services in SQL Server and built cubes with many dimensions using both Star and Snowflake schemas.
- Extracted and transformed data from OLTP databases to the database designed for OLAP services (was involved in the creation of all objects for that database) during off-peak hours.
- Developed SQL queries/scripts to validate the data, such as checking for duplicates, null values, and truncated values, and ensuring correct data aggregations.
- Developed Stored Procedures, Triggers, Functions, and T-SQL Queries to capture updated and deleted data.
- Effectively handled data errors during the modification of existing reports and the creation of new reports using Tableau.
- Created ETL jobs in AWS Glue using Python and queried DataFrames using PySpark syntax and functions (see the Glue job sketch below).
- Performed ETL operations using Python and SQL on many data sets to obtain metrics.
- Prepared data according to analyst requirements using the Pandas and NumPy modules in Python.
- Worked on exporting and importing data from Confidential files, Text files, and Excel Spreadsheets by creating SSIS Packages and participated in the development process following the Agile methodology.
Environment: ETL operations, SSIS, Metadata, SSRS, OLAP, SQL Server, Star, and Snowflake schemas, OLTP, SQL, T-SQL, Tableau, Confidential files, Text files, Excel
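A minimal sketch of an AWS Glue ETL job in Python, reading from the Glue Data Catalog and writing Parquet to S3; the database, table, column, and bucket names are hypothetical.

```python
# Sketch: Glue job reading a catalog table, remapping columns, writing Parquet to S3.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a source table registered in the Glue Data Catalog (hypothetical names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Rename/cast columns with ApplyMapping before loading downstream.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("order_total", "string", "order_total", "double"),
    ],
)

# Write the result to S3 as Parquet for downstream analytics.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```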
Confidential
SQL Developer / Data Analyst
Responsibilities:
- Integrated data from various data sources such as MS SQL Server, DB2, Oracle, Netezza, and Teradata using Informatica to perform extraction, transformation, and loading (ETL). Worked on ETL development and data migration using SSIS, SQL*Loader, and PL/SQL (see the validation sketch below).
- Created entity/relationship diagrams, grouped and created the tables, validated the data, and identified primary keys for lookup tables.
- Designed and developed logical and physical data models and metadata to support the requirements using ERwin.
- Used the ETL tool Informatica to populate the database and transform data from the old database to the new Oracle database.
- Designed the database tables and reviewed new report standards to ensure optimized performance under the new reporting service SSRS.
- Planned, designed, and documented the optimum usage of space requirements and distribution of space for the Data Warehouse.
- Worked with MS SQL Server and managed the programs using MS SQL Server Setup.
- Designed and developed packages for data warehousing and data migration projects using Integration Services (SSIS) in MS SQL Server.
- Extracted data from Oracle and flat files, transformed it and implemented the required business logic, and loaded it into the target data warehouse using SSIS.
- Created OLAP cubes on top of the data warehouse based on various fact and dimension tables for analysis purposes using SQL Server Analysis Services.
- Worked on the setup and implementation of the reporting servers, wrote T-SQL queries and stored procedures, and used them to build packages.
- Worked on modifying a variety of parameterized, drill-down, click-through, chart, matrix, and sub-reports in SSRS using data from a variety of sources.
- Scheduled reports to run daily across different servers based on capacity and sent the results to business users in the required format using Tableau.
- Designed and implemented Stored Procedures and Triggers for automating tasks.
- Managed all indexing, debugging, optimization, and performance tuning using T-SQL.
- Worked on creating and modifying SQL Joins, sub-queries, and other T-SQL and PL/SQL code to implement business rules.
Environment: T-SQL, PL/SQL, Data Warehouse, MS SQL Server, Oracle, Flat Files, SSIS, OLAP, SQL Server Analysis Services, SSRS, Tableau, SAP.
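A minimal sketch of a post-migration validation comparing row counts between a source Oracle system and the target SQL Server database; the connection URLs and table names are hypothetical.

```python
# Sketch: compare source vs. target row counts after a data migration.
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection URLs for the source Oracle system and target SQL Server.
source_engine = create_engine("oracle+cx_oracle://user:pwd@source-host:1521/?service_name=ORCL")
target_engine = create_engine("mssql+pyodbc://user:pwd@target_dsn")

tables = ["CUSTOMERS", "ORDERS", "ORDER_ITEMS"]  # hypothetical migrated tables

for table in tables:
    # iloc[0, 0] avoids relying on how each driver cases the column alias.
    src_count = pd.read_sql(f"SELECT COUNT(*) AS cnt FROM {table}", source_engine).iloc[0, 0]
    tgt_count = pd.read_sql(f"SELECT COUNT(*) AS cnt FROM dbo.{table}", target_engine).iloc[0, 0]
    status = "OK" if src_count == tgt_count else "MISMATCH"
    print(f"{table}: source={src_count} target={tgt_count} {status}")
```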