AWS Data Engineer Resume
NY
SUMMARY
- Over 7 years of experience as a Data Engineer, with highly proficient knowledge of data analysis and data warehousing.
- Experienced in Big Data work on Hadoop, Spark, PySpark, Hive, HDFS, and NoSQL platforms.
- Data Engineer with working experience in AWS, Spark, the Hadoop ecosystem, Python for data science, data pipelines, and Tableau.
- Strong experience in AWS S3, EMR, EC2, Glue, Lambda, IAM, Kinesis, RDS, Route 53, VPC, CodeBuild, CodePipeline, CloudWatch, and CloudFormation.
- Experienced in Python algorithms and programming, with hands-on experience using the Django framework for analytics.
- Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
- Hands-on experience with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS, and others.
- Hands-on experience with Google Cloud services such as BigQuery, GCS buckets, and Cloud Functions.
- Experienced in Informatica ILM (Information Lifecycle Management) and its tools.
- Excellent knowledge of distributed storage architectures and parallel computing frameworks for handling large datasets in the data engineering space.
- Efficient in all phases of the development lifecycle, including data cleansing, data conversion, data profiling, data mapping, performance tuning, and system testing.
- Good knowledge of SQL queries and of creating database objects such as stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.
- Supported ad-hoc business requests, developed stored procedures and triggers, and extensively used Quest tools such as TOAD.
- Good understanding and exposure to Python programming.
- Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
- Extensive experience working with business users/SMEs as well as senior management.
- Experience in the Hadoop Big Data ecosystem for ingestion, storage, querying, processing, and analysis of big data.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, JSON, XML, Parquet, and Avro.
- Experienced in technical consulting, data modeling, data governance, and the design, development, and implementation of solutions.
- Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Experience with Airflow and workflow schedulers to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flow (see the sketch following this summary).
- Proficient in Normalization/De-normalization techniques in relational/dimensional database environments and have done normalizations up to 3NF.
- Good understanding of Ralph Kimball (dimensional) and Bill Inmon (relational) modeling methodologies.
- Strong experience in using MS Excel and MS Access to dump the data and analyze based on business needs.
- Good experience in data analysis; proficient in gathering business requirements and handling requirements management.
- Experience in migrating data between HDFS/Hive and relational database systems using Sqoop, according to client requirements.
- Experienced in using Spark to increase performance and optimize algorithms with Spark Context, Spark SQL, Spark Streaming, and more.
- Experienced in data warehousing, including creating SCD (Slowly Changing Dimension) tables, migration services, and more.
- Experienced in BI reporting tools like Tableau, Power BI and Cognos.
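A minimal sketch of the kind of Airflow DAG referenced above. This is a hedged, illustrative example rather than code from any listed project; the DAG id, schedule, and script paths are assumptions.

```python
# Hedged sketch: an Airflow DAG that chains two Hadoop/Hive steps with a control-flow edge.
# The DAG id, schedule, and script paths are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_hive_load",            # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw",
        bash_command="spark-submit /opt/jobs/ingest_raw.py",   # hypothetical job script
    )
    transform = BashOperator(
        task_id="hive_transform",
        bash_command="hive -f /opt/jobs/transform.hql",        # hypothetical HQL script
    )
    ingest >> transform   # DAG edge: transform runs only after ingest succeeds
```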
TECHNICAL SKILLS
Tools: Informatica PowerCenter 10.4.1, Informatica PowerExchange, IICS, Informatica Data Quality 10.2.2, Informatica BDM, Talend
Languages: HTML, C, UNIX Shell Scripting, Python 3.7, PowerShell, XML.
Databases: Oracle, SQL Server, Teradata, MySQL, Postgres, Hadoop, ANSI SQL, PL/SQL, T-SQL.
Reporting Tools: Tableau, Power BI, SSRS
Big Data Tools: HDFS, Hive, Spark, Airflow, Oozie, Sqoop, Kafka.
Cloud: AWS EMR, S3, Lambda, SageMaker, Azure, GCP, BigQuery.
Other Tools: SQL*Loader, SQL*Plus, Query Analyzer, PuTTY, MS Office, MS Word.
PROFESSIONAL EXPERIENCE
AWS Data Engineer
Confidential, NY
Responsibilities:
- Worked as a Data Engineer driving projects using Spark, SQL, and the AWS cloud environment.
- Designed, implemented, and maintained all AWS infrastructure and services within a managed service environment.
- Worked on data governance to provide operational structure to previously ungoverned data environments.
- Involved in ingestion, transformation, manipulation, and computation of data using Kinesis, SQL, AWS Glue, and Spark.
- Participated in the requirement gathering sessions to understand the expectations and worked with system analysts to understand the format and patterns of the upstream source data.
- Performed data migration from an RDBMS to a NoSQL database, providing a complete picture of the data deployed across various data systems.
- Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in AWS.
- Migrated datasets and ETL workloads with Python from on-premises systems to AWS cloud services.
- Developed PySpark-based pipelines using Spark DataFrame operations to load data to the Enterprise Data Lake (EDL), with EMR for job execution and AWS S3 as the storage layer (see the PySpark sketch at the end of this section).
- Designed and set up the Enterprise Data Lake to enable a variety of use cases covering analytics, processing, storage, and reporting on large amounts of data.
- Collaborated with clients and the solution architect to maintain quality data points at the source by carrying out activities such as cleansing, transformation, and maintaining integrity in a relational environment.
- Created and set up a self-hosted integration runtime on virtual machines to access private networks.
- Working on building visuals and dashboards using Power BI reporting tool.
- Built Apache Airflow workflows on AWS to orchestrate multi-stage machine learning processes with Amazon SageMaker tasks.
- Developed streaming pipelines using Apache Spark with Python.
- Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda (see the Lambda sketch at the end of this section).
- Used AWS EMR to move large volumes of data into other platforms such as AWS data stores, Amazon S3, and Amazon DynamoDB.
- Developed AWS Lambdas using Python and Step Functions to orchestrate data pipelines.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Wrote UDFs in Scala and PySpark to meet specific business requirements.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked on Informatica tools such as PowerCenter, MDM, Repository Manager, and Workflow Monitor.
- Created Kibana dashboards and combined several source and target systems into Elasticsearch for real-time analysis and end-to-end transaction tracking.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for Parallel data processing and better performances.
- Used Python for web scraping to extract data.
- Conducted numerous training sessions, demonstration sessions on Big Data.
- Built campaigns in UNICA, to generate custom offers.
Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Kafka, Hive, Sqoop, MapReduce, Apache Pig, Python, Tableau, UNICA, Kibana, Informatica.
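Illustrative PySpark sketch for the EDL pipeline bullet above. It is a hedged example of a DataFrame pipeline run on EMR with S3 as the storage layer; bucket names, paths, and column names are assumptions.

```python
# Hedged sketch of a PySpark DataFrame pipeline run on EMR with S3 as the storage layer.
# Bucket names, paths, and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("edl_load").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/events/")        # hypothetical source path

curated = (
    raw.filter(F.col("event_type").isNotNull())                 # basic cleansing
       .withColumn("event_date", F.to_date("event_ts"))         # derive a partition column
       .dropDuplicates(["event_id"])                            # de-duplicate on a key
)

(curated.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-curated-bucket/events/"))        # hypothetical target path

spark.stop()
```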
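Illustrative Python Lambda sketch for the fine-grained S3 access bullet above. It shows one possible approach (short-lived presigned URLs), not the project's actual framework; the bucket name, event field, and expiry are assumptions.

```python
# Hedged sketch: a Lambda handler that grants temporary, object-level access to S3
# by returning a short-lived presigned URL. Bucket, event field, and expiry are assumptions.
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    key = event["object_key"]                        # hypothetical event field
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-secure-bucket", "Key": key},
        ExpiresIn=300,                               # 5-minute access window
    )
    return {"statusCode": 200, "url": url}
```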
AWS Data Engineer
Confidential, NJ
Responsibilities:
- As a Data Engineer, assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
- Led the architecture and design of data processing, warehousing, and analytics initiatives.
- Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Performed detailed analysis of business problems and technical environments, and used this analysis to design the solution and maintain the data architecture.
- Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the Beam sketch at the end of this section).
- Built the data pipelines that will enable faster, better, data-informed decision-making within the business.
- Used REST APIs with Python to ingest data from external sources into BigQuery.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Optimized Hive queries to extract the customer information from HDFS.
- Worked extensively with Elastic MapReduce (EMR) and set up environments on Amazon EC2 instances.
- Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
- Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console in GCP.
- Developed Spark scripts using Python and Bash shell commands as per requirements.
- Worked on a POC to evaluate various cloud offerings, including Google Cloud Platform (GCP).
- Developed a POC for project migration from an on-premises MapR Hadoop system to GCP.
- Compared self-hosted Hadoop against GCP's Dataproc and explored Bigtable (managed HBase) use cases and performance evaluation.
- Wrote a Python program to maintain raw file archival in a GCS bucket (see the GCS sketch at the end of this section).
- Implemented business logic by writing UDFs and configuring cron jobs.
- Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake.
- Implemented scripts that load Google BigQuery data and run queries to export data.
Environment: Hadoop 3.3, Spark 3.1, Python, GCP, Data Lake, GCS, HBase, Oozie, Hive, CI/CD, BigQuery, REST API, Agile Methodology, Code Cloud, AWS.
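Illustrative Apache Beam sketch for the Dataflow data-validation bullet above. It is a hedged example that only compares row counts between a raw GCS file and a BigQuery table; project, bucket, and table names are assumptions.

```python
# Hedged sketch: row-count validation between a raw GCS file and a BigQuery table,
# runnable on Dataflow. Project, bucket, and table names are illustrative assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    opts = PipelineOptions(
        flags=[],
        runner="DataflowRunner",
        project="example-project",                    # hypothetical project
        region="us-central1",
        temp_location="gs://example-tmp/bq",          # hypothetical temp bucket
    )
    with beam.Pipeline(options=opts) as p:
        file_count = (
            p
            | "ReadRaw" >> beam.io.ReadFromText("gs://example-raw/extract.csv",
                                                skip_header_lines=1)
            | "CountFile" >> beam.combiners.Count.Globally()
        )
        bq_count = (
            p
            | "ReadBQ" >> beam.io.ReadFromBigQuery(table="example-project:staging.extract")
            | "CountBQ" >> beam.combiners.Count.Globally()
        )
        (
            (file_count, bq_count)
            | "Pair" >> beam.Flatten()
            | "Collect" >> beam.combiners.ToList()
            | "Compare" >> beam.Map(
                lambda counts: "MATCH" if len(set(counts)) == 1 else f"MISMATCH {counts}")
            | "Log" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```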
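Illustrative sketch for the GCS raw-file archival bullet above; bucket and prefix names are assumptions.

```python
# Hedged sketch: archive landed raw files under a dated archive/ prefix, then delete
# the originals. Bucket and prefix names are illustrative assumptions.
from datetime import date

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-raw-bucket")                 # hypothetical bucket

archive_prefix = f"archive/{date.today():%Y/%m/%d}/"

for blob in client.list_blobs("example-raw-bucket", prefix="landing/"):
    # Copy each landed object under the dated archive prefix, then remove the original.
    target_name = archive_prefix + blob.name.split("/")[-1]
    bucket.copy_blob(blob, bucket, target_name)
    blob.delete()
```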
Data Engineer
Confidential
Responsibilities:
- Worked as a Data Engineer, collaborating with other Product Engineering team members to develop, test, and support data-related initiatives.
- Developed understanding of key business, product and user questions.
- Followed Agile methodology for the entire project.
- Defined the business objectives comprehensively through discussions with business stakeholders, functional analysts and participating in requirement collection sessions.
- Provided a summary of the project's goals, the specific expectations business users had of BI, and how they aligned with the project goals.
- Led estimation, reviewed estimates, identified complexities, and communicated them to all stakeholders.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Migrated the on-premises environment to the cloud using MS Azure.
- Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
- Performed data flow transformation using the data flow activity.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks (see the Databricks sketch at the end of this section).
- Developed mapping document to map columns from source to target.
- Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob Storage.
- Performed ETL using Azure Databricks.
- Wrote UNIX shell scripts to support and automate the ETL process.
- Worked on Python scripting to automate script generation; performed data curation using Azure Databricks.
- Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
- Worked with Kafka to bring data from source systems into HDFS for filtering (see the streaming sketch at the end of this section).
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Built visuals and dashboards using the Power BI reporting tool.
- Providing 24/7 On-call Production Support for various applications.
Environment: Hadoop, Spark, Kafka, Azure Databricks, ADF, Python, PySpark, HDFS, ETL, Agile & Scrum.
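Illustrative Databricks PySpark sketch for the ADF/Databricks table-to-table bullets above; database, table, and column names are assumptions.

```python
# Hedged sketch: a Databricks PySpark job that reads a staged table, aggregates it,
# and writes a curated table. Database, table, and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()        # provided automatically on Databricks

orders = spark.table("staging.orders")            # hypothetical source table

daily_totals = (
    orders.filter(F.col("status") == "COMPLETE")
          .groupBy("order_date", "region")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("order_count"))
)

(daily_totals.write
             .mode("overwrite")
             .saveAsTable("curated.daily_order_totals"))   # hypothetical target table
```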
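Illustrative streaming sketch for the Kafka-to-HDFS bullet above, using Spark Structured Streaming as one possible approach; the broker address, topic name, and HDFS paths are assumptions.

```python
# Hedged sketch: land Kafka messages in HDFS as Parquet for downstream filtering.
# Broker address, topic name, and HDFS paths are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
               .option("subscribe", "source-events")                # hypothetical topic
               .load()
               .selectExpr("CAST(value AS STRING) AS payload"))

query = (events.writeStream
               .format("parquet")
               .option("path", "hdfs:///data/landing/source_events")        # hypothetical path
               .option("checkpointLocation", "hdfs:///checkpoints/source_events")
               .start())

query.awaitTermination()
```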