AWS Data Engineer Resume
CA
SUMMARY
- 7+ years of professional IT experience spanning data collection, extraction, cleaning, aggregation, mining, verification, analysis, reporting, and data warehousing environments.
- Experienced in utilizing Google Cloud Platform (GCP) technologies including BigQuery, Cloud Storage, Cloud Dataflow, and Cloud Dataproc to improve data processing and storage.
- Proficient in creating efficient data pipelines and conducting data analysis using GCP Data Studio.
- Experienced in optimizing BigQuery for high performance and cost efficiency.
- Expert in implementing data security measures and performing regular performance monitoring.
- Proven track record of successfully migrating legacy systems to GCP and improving data-driven decision-making.
- Proficient in data visualization using Tableau and business intelligence using Power BI.
- Strong experience in web development using Django, Flask, and Pyramid frameworks.
- Knowledge of database management and SQL, including ORM, DDL, DML, and stored procedures.
- Experienced in creating complex data reports using PROC REPORT, PROC TABULATE, and T-SQL.
- Skilled in data manipulation, database optimization, and database performance tuning.
- Developed and maintained database triggers and stored procedures to ensure data accuracy and consistency.
- Collaborated with cross-functional teams and stakeholders to design, develop, and implement efficient and scalable database solutions.
- Proficient in utilizing AWS technologies, including CloudFormation, AWS CLI, and Amazon QuickSight for cloud infrastructure management and data analysis.
- Experienced in designing and implementing data pipelines using AWS services such as S3, Glue, and Redshift.
- Knowledgeable in utilizing AWS Step Functions and Lambda functions to automate data processing workflows.
- Skilled in ad-hoc data analysis using AWS Glue and Athena.
- Experience in Hadoop, Hive, Spark, Sqoop, Docker, Ansible, Kafka, Flume.
- Adept at applying AWS security features and complying with healthcare regulations such as HIPAA to ensure data privacy and security.
- Proficient in designing and implementing complex data integration solutions using SSIS.
- Experienced in leveraging Azure Synapse Analytics to analyze and process large amounts of data.
- Knowledgeable in storing and managing data in Azure Data Lake, Blob storage, and Azure SQL.
- Skilled in version control and collaboration using Git and Bitbucket.
- Adept in automating data processing and management tasks using Azure Data Factory.
- Experienced in optimizing performance and cost for data storage and processing on Azure.
- Proficient in designing and implementing ETL workflows, including extracting, transforming, and loading data from various sources.
- Knowledgeable in utilizing Azure's security features to ensure data privacy and security.
- Proficient in Java and J2EE technologies, including Servlets, JSP, Hibernate, and Struts.
- Hands-on experience with Web Services, including SOAP and REST, and related technologies like HTML, XHTML, CSS, JSTL, and JavaScript.
- Experienced in developing and deploying Java applications using Eclipse, JMS, and MySQL.
- Knowledgeable in data exchange technologies such as XSD, XSLT, and LINUX.
- Adept at developing and deploying applications on WebLogic using ANT, with version control in CVS.
- Expertise in testing and debugging using JUnit and Log4j.
- Experienced in utilizing Jenkins for continuous integration and delivery in various software development projects.
- Skilled in automating infrastructure management using Terraform, including provisioning, updates, and scaling.
- Proficient in deploying and managing Docker containers in various environments, including on-premises and cloud.
- Well-versed in deploying and managing microservices using Kubernetes, including scaling, network, and storage management.
- Experienced in working with data warehousing technologies, including Snowflake and Databricks, for large-scale data processing and analysis.
TECHNICAL SKILLS
Programming Languages: Python, Spark, Scala, and Java
Database: SQL, MySQL, SQL-Server, and MongoDB
Data Analytics Tools: MS Excel, Databricks, Domino Data Lab, and Jupyter Notebook
Data Engineering: ETL Pipelines, Change Data Capture (CDC), and Data Migration
AWS Services: CloudFormation, AWS CLI, Amazon QuickSight, AWS Step Functions, Lambda, AWS Glue, Athena (ad-hoc analysis), S3, Redshift, Amazon SageMaker
Hadoop/Big Data Technologies: Hadoop, MapReduce, HDFS, YARN, Oozie, Pig, Kafka, Hive, Sqoop, Spark, NiFi, ZooKeeper, Cloudera Manager, and Hortonworks
Azure Cloud Services: Azure Data Factory, Azure Synapse Analytics, Data Lake, Blob Storage, Azure Databricks, Azure Data Analytics, Azure Functions
GCP Cloud Services: BigQuery, Cloud Storage, Cloud Dataflow, Cloud Dataproc, GCP Data Studio
Java Technologies: J2EE, Servlets, JSP, Hibernate, Struts, HTML, XHTML, CSS, JavaScript, JMS, SOAP, XSD, XSLT, JUnit, ANT, CVS
PROFESSIONAL EXPERIENCE
Confidential, CA
AWS Data Engineer
Responsibilities:
- Play a lead role in gathering requirements, analyzing the entire system, and providing estimates for development and testing efforts.
- Design different system components, including Sqoop imports, Hadoop processing with MapReduce and Hive, Spark jobs, and FTP integration with downstream systems.
- Write optimized Hive and Spark queries, using techniques such as window functions and tuning of Hadoop shuffle and sort parameters.
- Develop ETL jobs in PySpark using both the DataFrame API and the Spark SQL API (see the PySpark sketch after this list).
- Apply Spark transformations and actions, saving the final results to HDFS and then loading them into the target Snowflake database.
- Migrated an existing on-premises application to AWS, using services such as EC2 and S3 to process and store small data sets; maintain the Hadoop cluster on Amazon EMR.
- Develop data pipelines using AWS services such as S3, Glue, and Redshift to collect, process, and store large amounts of data.
- Manage AWS infrastructure using tools such as CloudFormation and the AWS CLI.
- Strong experience with real-time data analytics using Spark Streaming, Kafka, and Flume.
- Configure Spark Streaming to consume ongoing data from Kafka and persist the stream to HDFS (see the streaming sketch after this list).
- Design and develop ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into Amazon Redshift.
- Use various Spark transformations and actions to cleanse the input data.
- Use Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.
- Enforce standards and best practices around data cataloging and data governance efforts.
- Create DataStage jobs using stages such as Transformer, Aggregator, Sort, Join, Merge, Lookup, Data Set, Funnel, Remove Duplicates, Copy, Modify, Filter, Change Data Capture, Change Apply, Sample, Surrogate Key, Column Generator, and Row Generator.
- Create, debug, schedule, and monitor Airflow jobs for ETL batch processing that loads data into Snowflake for analytical workloads (see the Airflow DAG sketch after this list).
- Build ETL pipelines for data ingestion, transformation, and validation on AWS, working alongside data stewards to meet data compliance requirements.
- Schedule jobs with Python-based Airflow scripts, adding tasks to DAGs and triggering AWS Lambda functions where needed.
- Use PySpark to extract, filter, and transform data within data pipelines.
- Monitor servers using Nagios, CloudWatch, and the ELK stack (Elasticsearch and Kibana).
- Use dbt (data build tool) for transformations in the ETL process, along with AWS Lambda and Amazon SQS.
- Define dependencies between Airflow tasks within each DAG to orchestrate end-to-end job execution.
- Develop Spark applications using Spark SQL in Databricks to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Databricks Spark cluster.
- Create Unix shell scripts to automate data load processes into the target data warehouse.
- Responsible for implementing monitoring solutions in Ansible, Terraform, Docker, and Jenkins.
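The PySpark ETL bullets above follow a read-transform-write pattern. Below is a minimal sketch of that flow, assuming hypothetical HDFS paths, column names, and an aggregation chosen purely for illustration; the production jobs and the Snowflake connector configuration differ.

```python
# Minimal PySpark ETL sketch; paths and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("campaign_etl").getOrCreate()

# DataFrame API: read raw ORC data from HDFS and cleanse it
raw = spark.read.orc("hdfs:///data/raw/campaigns/")            # hypothetical path
clean = (raw.dropDuplicates(["campaign_id", "event_ts"])       # hypothetical columns
            .filter(F.col("event_ts").isNotNull()))

# Window function, as used for optimized Hive/Spark queries
w = Window.partitionBy("campaign_id").orderBy(F.col("event_ts").desc())
latest = clean.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")

# Spark SQL API: aggregate via a temporary view
latest.createOrReplaceTempView("campaign_latest")
summary = spark.sql("""
    SELECT campaign_id, COUNT(*) AS events, MAX(event_ts) AS last_event
    FROM campaign_latest
    GROUP BY campaign_id
""")

# Persist results to HDFS; downstream, the Snowflake Spark connector loads them
summary.write.mode("overwrite").parquet("hdfs:///data/curated/campaign_summary/")
```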
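For the Kafka-to-HDFS ingestion described above, a minimal Spark Structured Streaming sketch follows; the broker address, topic name, and output paths are hypothetical, and the production job parses a concrete schema rather than raw strings.

```python
# Minimal Kafka-to-HDFS streaming sketch; brokers, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

# Consume the ongoing event stream from Kafka
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
          .option("subscribe", "campaign_events")              # hypothetical topic
          .option("startingOffsets", "latest")
          .load())

# Kafka values arrive as bytes; cast to string before persisting
parsed = events.select(F.col("value").cast("string").alias("payload"),
                       F.col("timestamp"))

# Persist the stream to HDFS with checkpointing for fault tolerance
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/campaign_events/")
         .option("checkpointLocation", "hdfs:///checkpoints/campaign_events/")
         .outputMode("append")
         .start())

query.awaitTermination()
```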
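The Airflow orchestration bullets above reduce to a DAG with explicit task dependencies. The sketch below is a minimal illustration with hypothetical task callables and schedule, not the actual production DAGs.

```python
# Minimal Airflow DAG sketch; DAG id, schedule, and task bodies are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    # Placeholder: pull the day's files from S3 (e.g., via boto3)
    pass


def transform_with_spark(**context):
    # Placeholder: submit the PySpark transformation job
    pass


def load_to_snowflake(**context):
    # Placeholder: copy curated data into Snowflake (e.g., a COPY INTO statement)
    pass


with DAG(
    dag_id="campaign_etl_daily",        # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    transform = PythonOperator(task_id="transform_with_spark", python_callable=transform_with_spark)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    # Task dependencies define the end-to-end batch flow
    extract >> transform >> load
```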
Environment: Red Hat Enterprise Linux, HDP, Hadoop, MapReduce, HDFS, Hive, Shell Script, Sqoop, Python, PostgreSQL, Spark, Airflow, Snowflake, Redshift, Glue, S3.
Confidential
Azure Data Engineer
Responsibilities:
- Created pipelines in ADF using linked services, datasets, and pipeline activities to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
- Extensive experience creating pipeline jobs, scheduling triggers, and mapping data flows in Azure Data Factory (V2), using Key Vault to store credentials.
- Worked on migrating data from an on-premises SQL Server to cloud databases (Azure Synapse Analytics and Azure SQL DB).
- Good experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (SQL DW) (see the PySpark load sketch after this list).
- Designed SSIS packages in Business Intelligence Development Studio to transfer data from flat files and Excel into SQL Server. Migrated on-premises databases to Snowflake via a lift-and-shift load approach in ADF.
- Expertise in data preparation, including blending multiple data connections and creating joins across the same and different data sources.
- Created Power BI reports, dashboards, and custom visualizations and published them to the Power BI service.
- Created databases and schema objects including tables, indexes, and constraints; connected various applications to the database; and wrote functions, stored procedures, and triggers.
- Used joins, procedures, cursors, and triggers to extract data from different tables for reporting; set up folders and environments for package creation and worked with Azure Databricks.
- Created automated processes, such as database backups and sequentially executed SSIS packages, using SQL Server Agent jobs.
- Worked on maintenance and development of bug fixes for existing components.
- Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
- Expertise in data transformations such as adding calculated columns, managing relationships, creating measures, merging and appending queries, replacing values, splitting columns, grouping, and handling date and time columns.
- Created alerts on data integration events (success/failure) and proactively monitored them.
- Built a continuous integration (CI) and continuous deployment (CD) pipeline to accelerate application development and the release lifecycle.
- Wrote complex SQL queries, stored procedures, triggers, views, cursors, and user-defined functions to implement business logic.
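As referenced above, loading Blob/Data Lake data into Azure Synapse Analytics from Databricks typically goes through the Synapse (SQL DW) Spark connector. The sketch below is a minimal illustration with hypothetical storage paths, JDBC URL, and table names, not the production pipelines.

```python
# Minimal Databricks-to-Synapse load sketch; all paths, URLs, and names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls_to_synapse").getOrCreate()

# Read source files landed in Azure Data Lake Storage (hypothetical container/path)
sales = spark.read.parquet("abfss://raw@mystorageacct.dfs.core.windows.net/sales/")

# Basic cleanup before the warehouse load
sales_clean = sales.dropDuplicates(["order_id"]).na.drop(subset=["order_date"])

# Write into a Synapse dedicated SQL pool via the Databricks Synapse connector
(sales_clean.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;database=dw")  # hypothetical
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.Sales")                               # hypothetical target table
    .option("tempDir", "abfss://tmp@mystorageacct.dfs.core.windows.net/synapse-staging/")
    .mode("overwrite")
    .save())
```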
Environment: SSIS, Azure Synapse Analytics, Azure Data Lake & Blob Storage, Azure SQL, Azure Data Factory, Python, Git, Bitbucket.
Confidential
GCP Data Engineer
Responsibilities:
- Led the migration of legacy systems to Google Cloud Platform (GCP) and BigQuery.
- Conducted data analysis and created a migration plan to ensure minimal disruption to business operations.
- Implemented a data pipeline architecture to move data into BigQuery efficiently (see the BigQuery load sketch after this list).
- Utilized GCP technologies such as Cloud Storage, Cloud Dataflow, and Cloud Dataproc to improve data processing.
- Developed data transformation scripts to convert legacy data into a modern data format.
- Conducted data quality checks to ensure data accuracy and consistency throughout the migration process.
- Worked with stakeholders to understand their data needs and developed a solution to meet those needs.
- Configured and optimized BigQuery for high performance and cost efficiency.
- Developed and executed data migration tests to validate the accuracy and completeness of the migrated data.
- Collaborated with internal IT teams to integrate new GCP-based systems with existing legacy systems.
- Provided training and support to end-users to ensure a seamless transition to the new platform.
- Improved data security by implementing GCP security features such as encryption and access control.
- Created dashboards and reports using GCP's Data Studio to provide business insights.
- Utilized GCP's machine learning capabilities to improve data-driven decision-making.
- Worked with DevOps teams to automate the data migration process for scalability and efficiency.
- Optimized data storage costs by utilizing GCP's cost-effective storage options.
- Utilized GCP's APIs and integrations to allow for easy integration with other data sources and systems.
- Improved data backup and disaster recovery by utilizing GCP's robust disaster recovery features.
- Collaborated with the data privacy and security team to ensure data privacy and security standards were met.
- Conducted regular performance monitoring and tuning to ensure the GCP platform is running at optimal performance.
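The BigQuery load sketch referenced above: a minimal example using the google-cloud-bigquery Python client, with hypothetical bucket, dataset, and table names, and a simple row-count check standing in for the fuller data quality validation.

```python
# Minimal Cloud Storage -> BigQuery load sketch; bucket and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # uses the active GCP project/credentials

# Load newline-delimited JSON files from Cloud Storage into a BigQuery table
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://legacy-export/orders/*.json",      # hypothetical bucket/path
    "analytics.orders",                      # hypothetical dataset.table
    job_config=job_config,
)
load_job.result()  # wait for the load job to complete

# Validate the migrated data with a quick row-count query
query = "SELECT COUNT(*) AS row_count FROM `analytics.orders`"
for row in client.query(query).result():
    print(f"Loaded rows: {row.row_count}")
```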
Environment: Google Cloud Platform (GCP), BigQuery, Cloud Storage, Cloud Dataflow, Cloud Dataproc, GCP Data Studio
Confidential
Data Analyst
Responsibilities:
- Captured functional requirements from business clients alongside IT requirements analysts by asking targeted questions, and analyzed the requirements in collaboration with the team and system architects using standard templates.
- Extensive Tableau experience in an enterprise environment, including Tableau administration.
- Successfully upgraded Tableau platforms in clustered environments and performed content upgrades.
- Experienced with popular Python frameworks such as Django, Flask, and Pyramid.
- Knowledge of object-relational mapping (ORM).
- Performed quality review and validation of SAS programs generated by other SAS programmers.
- Followed good programming practices and adequately documented programs.
- Produced quality customized reports using PROC REPORT and PROC TABULATE, and summary and descriptive statistics using PROC MEANS, PROC FREQ, and PROC UNIVARIATE.
- Provided production support to client benefit and product service users by writing custom SQL and SAS programs.
- Experienced with T-SQL, DDL, and DML scripts; established relationships between tables using primary and foreign keys.
- Extensive knowledge of creating joins and sub-queries for complex queries involving multiple tables (see the query sketch after this list).
- Hands-on experience in Using DDL and DML for writing Triggers, Stored Procedures, and Data manipulation.
- Worked with star schema, snowflake schema dimensions, and SSRS to support large Reporting needs.
- Experienced in using SQL Profiler and Windows Performance Monitor to resolve deadlocks, long-running queries, and slow server performance.
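The query sketch referenced above: a minimal example of running a multi-table T-SQL join with a sub-query from Python via pyodbc. The connection string, tables, and columns are hypothetical placeholders for the benefit/product reporting queries.

```python
# Minimal pyodbc + T-SQL reporting sketch; connection details and schema are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=reporting-db;DATABASE=Benefits;Trusted_Connection=yes;"  # hypothetical
)

# Join plus sub-query of the kind used for benefit/product reporting
sql = """
SELECT m.MemberID, m.MemberName, p.ProductName, c.TotalClaims
FROM dbo.Members AS m
JOIN dbo.Products AS p ON p.ProductID = m.ProductID
JOIN (
    SELECT MemberID, COUNT(*) AS TotalClaims
    FROM dbo.Claims
    GROUP BY MemberID
) AS c ON c.MemberID = m.MemberID
WHERE c.TotalClaims > 5
ORDER BY c.TotalClaims DESC;
"""

cursor = conn.cursor()
for member_id, name, product, total in cursor.execute(sql):
    print(member_id, name, product, total)
conn.close()
```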
Environment: Tableau, Django, Flask, Pyramid, ORM, PROC REPORT, PROC TABULATE, T-SQL, DDL, DML, triggers, stored procedures, and data manipulation.
Confidential
Java Developer
Responsibilities:
- Involved in the entire development life cycle of the application; reviewed and analyzed data models to develop the presentation layer and value objects.
- Designed table-less layouts, gradient effects, page layouts, navigation, and icons using CSS and appropriate HTML tags as per W3C standards.
- Involved in writing client-side scripts using JavaScript, and jQuery.
- Extensively used Hibernate in the data access layer to access and update information in the database for registration.
- Used JSPs and Servlets to implement the UI within the MVC framework.
- Validated the XML documents with XSD validation and transformed them to XHTML using XSLT.
- Implemented the Struts framework based on the MVC design pattern.
- Involved in writing the struts-config files and implementing the Struts Tag library.
- Used Struts for web tier development and created Struts Action to handle the requests.
- Worked on JMS components for asynchronous messaging between clients and pharmacists.
- Used JReport to generate application reports.
- Developed framework using Java, MySQL and web server technologies.
- Developed web services using SOAP, WSDL, and XML with the CXF framework and Apache Commons.
- Wrote stored procedures and complex queries for IBM DB2 and implemented a SOAP-based web services architecture.
- Used WebLogic for application deployment and Log4j for logging and debugging.
- Used JUnit for unit testing and checking API performance.
- Used CVS version controlling tool and project build tool using ANT.
Environment: Java, JSP, Servlet, Hibernate, Struts, HTML, XHTML, CSS, JavaScript, JMS, MySQL, SOAP, XSD, XSLT, WebLogic, Log4J, JUnit, ANT, CVS