
Senior AWS Data Engineer Resume


Charlotte, NC

SUMMARY

  • Data Engineering professional with solid foundational skills and a proven track record of implementation across a variety of data platforms. Self-motivated, with a strong sense of personal accountability in both individual and team settings.
  • Over 8 years of experience in Data Engineering, Data Pipeline Design, Development and Implementation as a Sr. Data Engineer/Data Developer and Data Modeler.
  • Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Strong experience in writing scripts using the Python, PySpark, and Spark APIs for analyzing data (a minimal PySpark sketch follows this list).
  • Extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, pyexcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
  • Wrote SQL queries against Snowflake and developed Unix and Python scripts to extract, load, and transform data (see the Snowflake loading sketch after this list).
  • Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
  • Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.
  • Experience in developing MapReduce programs on Apache Hadoop for analyzing big data per requirements.
  • Hands-on with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Experience in working with Flume and NiFi for loading log files into Hadoop.
  • Experience in working with NoSQL databases like HBase and Cassandra.
  • Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto HDFS.
  • Good Experience in implementing and orchestrating data pipelines using Oozie and Airflow.
  • Provide production support for existing products that include SSIS, SQL Server, stored procedures, interim data marts, AWS, and Snowflake.
  • Expert in developing SSIS/DTS packages to extract, transform, and load (ETL) data into data warehouses/data marts from heterogeneous sources.
  • Good working knowledge of the Amazon Web Services (AWS) Cloud Platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Athena, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
  • Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata Management, Master Data Management and Configuration Management.
  • Experience in developing custom UDFs in Python to extend Hive and Pig Latin functionality.
  • Expertise in designing complex mappings, performance tuning, and slowly changing dimension tables and fact tables.
  • Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
  • Experienced in building automated regression scripts for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python.
  • Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
  • Developed SQL queries with SnowSQL and Snowpipe and applied big data modeling techniques using Python.
  • Expert in building enterprise data warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
  • Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Good knowledge of Data Marts, OLAP, and Dimensional Data Modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services.
  • Strong analytical and problem-solving skills and the ability to follow projects through from inception to completion.
  • Ability to work effectively in cross-functional team environments, excellent communication, and interpersonal skills.
  • Oracle Agile PLM ACP and Data Migration Expert.
  • Oracle Agile PLM installation and upgrade expert.
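
A minimal PySpark sketch of the scripting and Python UDF work listed above: read a raw extract, apply a Python UDF, and write the result out as Parquet. The bucket paths, column name, and UDF logic are hypothetical placeholders rather than project code.

```python
# Minimal PySpark sketch: read raw data, apply a Python UDF, write Parquet.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("sample-etl").getOrCreate()

# Load a delimited extract into a DataFrame
df = spark.read.option("header", "true").csv("s3://example-bucket/raw/customers.csv")

# A simple Python UDF; the same function could be registered for Spark SQL use
@udf(returnType=StringType())
def normalize_state(value):
    return value.strip().upper() if value else None

cleaned = df.withColumn("state", normalize_state(col("state")))

# Persist the transformed data for downstream consumption
cleaned.write.mode("overwrite").parquet("s3://example-bucket/curated/customers/")
```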
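
A hedged sketch of loading Snowflake from a Python script, matching the Snowflake ETL scripting bullet above. The account, credentials, file path, and table name are placeholders; the pattern shown is simply a PUT to a table stage followed by COPY INTO.

```python
# Hedged sketch: load a staged file into Snowflake with snowflake-connector-python.
# Account, credentials, file path, and table name are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Stage a local extract on the table stage, then COPY it into the table
    cur.execute("PUT file:///tmp/daily_extract.csv @%CUSTOMERS OVERWRITE = TRUE")
    cur.execute("COPY INTO CUSTOMERS FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
    # Quick row-count validation of the load
    cur.execute("SELECT COUNT(*) FROM CUSTOMERS")
    print(cur.fetchone()[0])
finally:
    conn.close()
```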

TECHNICAL SKILLS

Big Data Tools: Hadoop Ecosystem, MapReduce, Spark 2.3, Airflow 1.10.8, NiFi 2, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0

Programming Languages: Python, Scala, SQL, MVS, TSO/ISPF, VB, VTAM, Korn shell scripting

Cloud Technologies: AWS, Azure, GCP

Databases: Oracle, SQL Server, MySQL, NoSQL, PostgreSQL, Microsoft Access, Oracle PL/SQL

Data Warehouses: Snowflake, BigQuery, Netezza

Version Tools: Git, SVN

ETL/Reporting: Informatica, Tableau, Power BI

PROFESSIONAL EXPERIENCE

Senior AWS Data Engineer

Confidential, Charlotte, NC

Responsibilities:

  • Responsible for sessions with the business, project managers, business analysts, and other key stakeholders to understand the business needs and propose a solution from a warehouse standpoint.
  • Designed the ER diagrams, the logical model (relationships, cardinality, attributes, and candidate keys), and the physical database (capacity planning, object creation, and aggregation strategies) for Oracle and Teradata per business requirements using ER Studio.
  • Imported data using Sqoop from various source systems such as mainframes, Oracle, MySQL, and DB2 into the data lake raw zone.
  • Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in Amazon EMR; worked with cloud-based technologies such as Redshift, S3, and EC2, and extracted data from Oracle Financials and the Redshift database.
  • Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into HDFS, delivering high success metrics.
  • Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in a DynamoDB table and to load the transformed data into another data store (see the Lambda sketch after this list).
  • Worked on Amazon Redshift and AWS Kinesis data; created data models and extracted metadata from Amazon Redshift and Elasticsearch using SQL queries to create reports. Developed SQL queries with SnowSQL and Snowpipe and big data modeling techniques using Python.
  • Migrated the existing architecture to Amazon Web Services, utilizing technologies such as Kinesis, Redshift, AWS Lambda, and CloudWatch metrics; queried the alerts landing in S3 buckets with Amazon Athena to compare alert generation between the Kafka cluster and the Kinesis cluster (see the Athena sketch after this list).
  • Experienced in fact dimensional modeling (Star schema, Snowflake schema), transactional modeling and SCD (Slowly changing dimension).
  • Worked on importing and exporting data from Snowflake, Oracle, and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
  • Worked on implementing microservices on a Kubernetes cluster and configured Operators for Kubernetes applications and all their components, such as Deployments, ConfigMaps, Secrets, and Services.
  • Used filters, quick filters, sets, parameters and calculated fields on Tableau and Power BI reports.
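
A minimal sketch of the Lambda pattern referenced above: a handler triggered by a DynamoDB stream that filters change events and lands the transformed payload in S3. The bucket name, key layout, and attribute handling are hypothetical assumptions, not the project's actual code.

```python
# Hedged sketch of a Lambda handler on a DynamoDB stream: filter each change
# record and write the transformed item to S3. Bucket and keys are placeholders.
import json
import boto3

s3 = boto3.client("s3")
TARGET_BUCKET = "example-curated-bucket"  # hypothetical bucket name

def lambda_handler(event, context):
    records = event.get("Records", [])
    for record in records:
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # ignore deletes in this pipeline
        new_image = record["dynamodb"].get("NewImage", {})
        # Naive flattening of the DynamoDB attribute-value map into plain JSON
        item = {k: list(v.values())[0] for k, v in new_image.items()}
        key = f"events/{item.get('id', record['eventID'])}.json"
        s3.put_object(Bucket=TARGET_BUCKET, Key=key, Body=json.dumps(item))
    return {"processed": len(records)}
```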
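
A hedged boto3 sketch of the Athena querying of S3-landed alerts described above; the database, table, query, and result location are placeholders.

```python
# Hedged sketch: run an Athena query over alerts stored in S3 via boto3.
# Database, table, and output location are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT source_cluster, COUNT(*) AS alert_count
        FROM alerts_db.alerts
        GROUP BY source_cluster
    """,
    QueryExecutionContext={"Database": "alerts_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution for completion
```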

Senior Azure Data Engineer

Confidential, Michigan

Responsibilities:

  • Experienced with cloud service providers such as Azure.
  • Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience in developing Spark applications using PySpark and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage, consumption patterns, and behavior.
  • Skilled in dimensional modeling and forecasting on large-scale datasets (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).
  • Developed scripts to transfer data from an FTP server to the ingestion layer using Azure CLI commands.
  • Responsibilities included, but were not limited to: data cleaning/scrubbing; importing large amounts of data for historical purposes; updating employee information in each individual client's database; merging duplicate employees based on demographic information and audiometric threshold similarities; moving data from one company to another due to acquisition; converting sensitive data such as SSNs into unique identifiers; utilizing and creating stored procedures for SSN conversion; creating SSIS packages for automatic processing of large client files; and soliciting and receiving waivers to request historical data from other vendors on behalf of the client. Utilized Access and SQL to process HIPAA-protected data.
  • Created Azure HDInsight clusters using PowerShell scripts to automate the process.
  • Used the Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
  • Used Azure Data Lake Storage Gen2 to store Excel and Parquet files and retrieved user data using the Blob API.
  • Worked on Azure Databricks, PySpark, Spark SQL, Azure SQL Data Warehouse, and Hive to load and transform data.
  • Collaborated with internal teams and respective stakeholders to understand user requirements and implement technical solutions.
  • Used Terraform to deploy Azure infrastructure via an Azure DevOps pipeline.
  • The ability to deploy, destroy, and redeploy is made very simple by the 'tfstate' file, which lets Terraform know the state as of the last deployment and apply only the changes implied by a code update.
  • Used Azure Data Lake and Azure Blob storage for storage and performed analytics in Azure Synapse Analytics.
  • 1+ years of experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), and Databricks.
  • Experience in designing Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure.
  • Ingested data from RDBMS sources, performed data transformations, and then exported the transformed data to Cassandra per the business requirements.
  • Managed the data lake's data movements involving Hadoop and NoSQL databases such as HBase and Cassandra.
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from numerous file formats, including XML, JSON, CSV, and other compressed file formats.
  • Developed automated processes for flattening the upstream data from Cassandra, which arrives in JSON format (see the PySpark flattening sketch after this list).
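
A hedged PySpark sketch of the JSON-flattening step mentioned in the last bullet: nested struct fields are promoted to top-level columns and array elements are exploded into rows. The mount paths, schema, and field names are hypothetical.

```python
# Hedged sketch: flatten nested JSON (e.g. upstream Cassandra exports) with PySpark.
# Paths and field names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten-json").getOrCreate()

raw = spark.read.json("/mnt/raw/cassandra_export/*.json")

# Promote nested struct fields to columns and explode array elements into rows
flat = (
    raw.select(
        col("id"),
        col("profile.name").alias("name"),
        col("profile.email").alias("email"),
        explode(col("orders")).alias("order"),
    )
    .select("id", "name", "email", col("order.order_id"), col("order.amount"))
)

flat.write.mode("overwrite").parquet("/mnt/curated/customer_orders/")
```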

Data Engineer

Confidential, Bothell, WA

Responsibilities:

  • Experienced in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS and Spark.
  • Leveraged cloud and GPU computing technologies, such as AWS, for automated machine learning and analytics pipelines.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; performed gap analysis and provided feedback to the business team to improve software delivery.
  • Data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization on provider, member, claims, and service fund data.
  • Involved in developing RESTful APIs (microservices) using the Python Flask framework, packaged in Docker and deployed to Kubernetes using Jenkins pipelines.
  • Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in PySpark.
  • Created reusable REST APIs that exposed data blended from a variety of data sources by reliably gathering requirements directly from the business.
  • Worked on the development of a Data Warehouse, a Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
  • Responsible for full data loads from production to AWS Redshift staging environment and worked on migrating EDW to AWS using EMR and various other technologies.
  • Experience in Creating, Scheduling, and Debugging Spark jobs using Python. Performed Data Analysis, Data Migration, Transformation, Integration, Data Import, and Data Export through Python.
  • Gathered and processed raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, and writing applications).
  • Created reusable Python scripts to ensure data integrity between the source (Teradata/Oracle) and target systems (Snowflake/Redshift).
  • Migrated on-premise database structure to Confidential Redshift data warehouse.
  • Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into HDFS, delivering high success metrics.
  • Implemented authoring, scheduling, and monitoring of data pipelines using Scala and Spark.
  • Experience in building Snowpipe; in-depth knowledge of data sharing in Snowflake and of database, schema, and table structures.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation (see the Airflow DAG sketch after this list).
  • Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake.
  • Designed and developed a system to collect data from multiple platforms using Kafka and then process it using Spark (see the structured streaming sketch after this list).
  • Created modules for streaming data into the data lake using Spark Streaming, worked with different data feeds such as JSON, CSV, and XML, and implemented the data lake concept.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet business requirements, and developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive.
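
A minimal Airflow DAG sketch (1.10.x-style imports, matching the Airflow version listed under Technical Skills) for the DAG work mentioned above; the DAG id, schedule, and task callables are placeholders.

```python
# Hedged sketch of an Airflow 1.10.x DAG wiring an extract task ahead of a load task.
# DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract(**context):
    print("pull data from the source system")  # placeholder extract step

def load(**context):
    print("write data to the warehouse")  # placeholder load step

default_args = {"owner": "data-eng", "retries": 1}

with DAG(
    dag_id="example_daily_etl",
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(
        task_id="extract", python_callable=extract, provide_context=True
    )
    load_task = PythonOperator(
        task_id="load", python_callable=load, provide_context=True
    )

    # Downstream dependency: load runs only after extract succeeds
    extract_task >> load_task
```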
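
A hedged Spark Structured Streaming sketch for the Kafka-to-Spark collection described above; the broker address, topic, schema, and output paths are assumptions.

```python
# Hedged sketch: consume a Kafka topic with Spark Structured Streaming and write
# parsed events to a data lake path. Requires the spark-sql-kafka connector package.
# Broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-datalake/events/")
    .option("checkpointLocation", "s3a://example-datalake/checkpoints/events/")
    .start()
)
query.awaitTermination()
```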

Hadoop Developer

Confidential

Responsibilities:

  • Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop; responsible for building scalable, distributed data solutions using Hadoop; installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Handled resource management of the Hadoop cluster, including adding/removing cluster nodes for maintenance and capacity needs; involved in loading data from the UNIX file system to HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts.
  • Implemented test scripts to support test driven development and continuous integration.
  • Responsible for managing data coming from different sources.
  • Installed and configured Hive and wrote Hive UDFs (see the streaming-transform sketch after this list).
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Experience in managing and reviewing Hadoop log files.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
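
Hive is commonly extended from Python via a streaming transform script rather than a compiled UDF; the sketch below shows that style, matching the Hive UDF bullet above. The tab-separated column layout is hypothetical.

```python
# Hedged sketch: a Python script used as a Hive streaming transform.
# Hive pipes tab-separated rows through stdin; the column layout here is hypothetical.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        continue  # skip malformed rows
    user_id, raw_state = fields[0], fields[1]
    # Emit the cleaned row back to Hive, tab-separated
    print("\t".join([user_id, raw_state.strip().upper()]))
```

From Hive, a script like this would typically be registered with ADD FILE and invoked through a SELECT TRANSFORM ... USING clause.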

Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, Pig, MySQL, Ubuntu, ZooKeeper, Amazon EC2, Solr
