Data Engineer - AWS Resume
SUMMARY
- Over 6 years of diversified IT experience in Data Modeling and Data Analysis, proficient in gathering business requirements and handling requirements management.
- Worked for 4 years with AWS Big Data and Hadoop ecosystems building a data lake.
- Hands-on experience with the Hadoop framework and its ecosystem components such as HDFS, MapReduce, Pig, Hive, Flume, Sqoop and Spark.
- Excellent ability to understand business needs and data in order to deliver end-to-end business solutions.
- Strong in systems analysis, ER and dimensional modeling, database design, and implementing RDBMS-specific features.
- Experience with various RDBMS including Oracle 9i/10g/11g, SQL Server 2005/2008, DB2 UDB and Teradata.
- Extensive experience developing T-SQL and Oracle PL/SQL scripts, stored procedures and triggers for business logic implementation.
- Proficient in HiveQL and experienced in Hive performance optimization using static partitioning, dynamic partitioning, bucketing and parallel execution (see the partitioning sketch after this list).
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR and other services of the AWS family.
- Wrote Python scripts to manage AWS resources through API calls using the Boto SDK, and also worked with the AWS CLI (see the boto3 sketch after this list).
- Performed data analysis and data profiling using SQL on various systems, including SQL Server 2008.
- Extensive experience creating real-time data streaming solutions using PySpark/Spark Streaming and Kafka.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- In-depth knowledge of AWS cloud services across compute, networking, storage, and identity and access management.
- Hands-on experience configuring network architecture on AWS with VPCs, subnets and internet gateways.
- Worked with Jira to create projects, assign permissions to users and groups, and set up mail handlers and notification schemes.
- Used Jira for ticket tracking, change management and as an Agile/Scrum tool.
- Solid working experience with UNIX/Linux commands, scripting and application deployment.
- Experienced working in fast-paced Agile teams, with exposure to testing in Scrum teams and TDD.
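A minimal PySpark sketch of the Hive partitioning, bucketing and dynamic-partition settings mentioned above; the table, columns and bucket count are hypothetical rather than taken from any actual project.

```python
# Illustrative only: dynamic partitioning + bucketing in Hive via PySpark.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Session settings for dynamic partitioning and parallel execution
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("SET hive.exec.parallel = true")

# Partition by load date, bucket by customer_id to speed up joins and sampling
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_curated (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert: Hive derives load_date from the last selected column
spark.sql("""
    INSERT OVERWRITE TABLE sales_curated PARTITION (load_date)
    SELECT customer_id, amount, load_date FROM sales_raw
""")
```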
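A minimal boto3 sketch of the kind of AWS resource management scripting referenced above; bucket names and tags are placeholders.

```python
# Illustrative only: managing S3 and EC2 resources with the boto3 SDK.
import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

# List objects under a prefix in an S3 bucket (placeholder bucket name)
response = s3.list_objects_v2(Bucket="my-datalake-bucket", Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Stop dev-tagged EC2 instances, e.g. outside business hours
reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:Environment", "Values": ["dev"]}]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
```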
TECHNICAL SKILLS
Big Data Technologies: AWS EMR, S3, EC2 Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig, Hive, Apache Spark, Spark SQL, Informatica PowerCenter 9.6.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN, NiFi, Impala, Sqoop, Solr, Oozie.
Databases: Cloudera Hadoop CDH 15.x, Hortonworks HDP, Oracle 10g/11g, Teradata, DB2, Microsoft SQL Server, MySQL, NoSQL databases.
Platforms (OS): Red Hat Linux, Ubuntu, Windows NT/2000/XP
Programming languages: Java, Scala, SQL, UNIX shell script, JDBC, Python.
Security Management: Hortonworks Ambari, Cloudera Manager, Apache Knox, XA Secure, Kerberos.
Web Technologies: DHTML, HTML, XHTML, XML, XSL (XSLT, XPath), XSD, CSS, JavaScript, SOAP, RESTful, Agile, Design Patterns.
Data Warehousing: Informatica PowerCenter/PowerMart/Data Quality/Big Data, Pentaho, ETL development, Amazon Redshift, IDQ.
Database Tools: JDBC, Hadoop, Hive, NoSQL, SQL Navigator, SQL Developer, TOAD, SQL*Plus, SAP BusinessObjects.
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer- AWS
Responsibilities:
- Responsible for the execution of Big Data Analytics, predictive analysis and machine learning initiatives.
- Implemented a proof of concept deploying the product on AWS S3 and Snowflake.
- Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability and performance, and to provide meaningful information for better decision making.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, writing results back to the S3 bucket.
- Experience in data cleansing and data mining.
- Used Spark Streaming to divide streaming data into micro-batches as input to Spark for batch processing (see the streaming sketch after this list).
- Wrote Spark applications for data validation, cleansing, transformation and custom aggregation, and used the Spark engine and Spark SQL for data analysis before passing results to data scientists for further analysis.
- Prepared scripts in Python and Scala to automate ingestion from various sources such as APIs, AWS S3, Teradata and Snowflake.
- Worked on Snowflake schemas and data warehousing, and built batch and streaming data load pipelines from the Confidential AWS S3 data lake using Snowpipe and Matillion.
- Profiled structured, unstructured and semi-structured data across various sources to identify patterns, and implemented data quality metrics using SQL queries and Python scripts appropriate to each source.
- Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs.
- Created DAGs using the Email operator, Bash operator and Spark Livy operator to execute tasks on EC2 (see the DAG sketch after this list).
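A minimal PySpark Structured Streaming sketch of the Kafka-to-S3 micro-batch processing described in this role; broker, topic, schema and bucket names are placeholders.

```python
# Illustrative only: read events from Kafka in micro-batches, validate, write to S3.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

schema = (StructType()
          .add("event_id", StringType())
          .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .filter(col("amount").isNotNull()))               # basic validation

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://my-datalake-bucket/curated/events/")
         .option("checkpointLocation", "s3a://my-datalake-bucket/checkpoints/events/")
         .trigger(processingTime="1 minute")                # micro-batch interval
         .start())

query.awaitTermination()
```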
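A minimal Airflow DAG sketch wiring together the Bash, Spark-via-Livy and Email operators mentioned above; connection IDs, paths, schedule and addresses are placeholders, and the Livy operator is assumed to come from the apache-airflow-providers-apache-livy package.

```python
# Illustrative only: Bash staging step -> Spark job via Livy -> email notification.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator
from airflow.providers.apache.livy.operators.livy import LivyOperator

with DAG(
    dag_id="s3_to_snowflake_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    stage_files = BashOperator(
        task_id="stage_files",
        bash_command="aws s3 sync s3://landing-bucket/raw/ s3://datalake-bucket/stage/",
    )

    run_spark_job = LivyOperator(
        task_id="run_spark_job",
        livy_conn_id="livy_default",                 # placeholder connection
        file="s3://code-bucket/jobs/transform.py",   # placeholder job file
    )

    notify = EmailOperator(
        task_id="notify",
        to="data-team@example.com",
        subject="Daily load complete",
        html_content="The S3-to-Snowflake pipeline finished.",
    )

    stage_files >> run_spark_job >> notify
```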
Environment: Agile Scrum, MapReduce, Snowflake, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Codecloud, AWS.
Confidential
Data Modeler / Data Analyst
Responsibilities:
- Working as part of a data management/ digital execution team responsible for establishing an engineering data lake with both raw and modelled data.
- Predominantly involved in understanding the requirements from various business partners and collaborating with data owners and subject matter experts.
- Designing and developing code, scripts and data pipelines that leverage structured and unstructured data integrated from multiple sources.
- Creating data views per business requirements and building the corresponding reports (see the view sketch after this list).
- Involved in moving the manufacturing data of the organisation from different sources into a Snowflake environment
- Interfacing with business professionals, application developers and technical staff in an Agile process and environment.
- Developing technical standards, procedures and guidelines used to govern data models.
- Creating and maintaining technical documentation, architecture designs and data flow diagrams.
- Working in an agile environment to complete multiple project tasks during the same sprint
- Converting SAP data fields into business-friendly field names in the Snowflake environment.
- Extensively using DBeaver, Snowflake and Azure DevOps on a daily basis.
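A minimal sketch of exposing technical SAP field names under business-friendly columns through a Snowflake view, driven from Python with the snowflake-connector-python package; the account details, table and field names are placeholders.

```python
# Illustrative only: expose SAP technical fields under business-friendly names.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",          # placeholder credentials
    user="my_user",
    password="***",
    warehouse="ANALYTICS_WH",
    database="MANUFACTURING",
    schema="CURATED",
)

create_view_sql = """
CREATE OR REPLACE VIEW V_PRODUCTION_ORDERS AS
SELECT
    AUFNR AS production_order_number,   -- SAP technical name -> business name
    WERKS AS plant_code,
    GAMNG AS order_quantity
FROM RAW.SAP_ORDER_HEADERS
"""

with conn.cursor() as cur:
    cur.execute(create_view_sql)

conn.close()
```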
Environment: Snowflake, Oracle 12c/19c, SAP, Teradata, SQL, Azure DevOps, DBeaver, MS Excel, MS Word, Power BI
Confidential
ETL Developer
Responsibilities:
- Completed analysis, requirements gathering and functional design document creation.
- Collaborated with the application developers in data modeling and E-R design of the systems
- Created and managed schema objects such as tables, views, indexes and referential integrity constraints based on user requirements.
- Used DDL and DML commands to write triggers and stored procedures and to perform data manipulation (see the sketch after this list).
- Designed and built high-performance ETL packages with SSIS to migrate and manipulate data from MS Excel and SQL Server.
- Involved in tuning system stored procedures for better performance
- Developed reports using SSRS and MS Excel to meet business requirements.
- Created and managed report subscriptions and schedules for SSRS reports.
- Worked with application developers and provided the necessary scripts using SQL and PL/SQL.
- Developed and deployed SSIS packages and configuration files, and scheduled jobs to run the packages and generate data in CSV files.
- Generated periodic reports based on statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Responsible for managing database performance, backups, replication, capacity and security.
- Experience creating multidimensional cubes using SSAS.
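The reporting and data manipulation in this role was done with SSIS/SSRS and T-SQL; the sketch below only illustrates the kind of DML and stored-procedure calls involved, driven from Python via pyodbc for consistency with the other sketches. Server, database, procedure and table names are placeholders.

```python
# Illustrative only: run T-SQL DML and a stored procedure, export the result to CSV.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver01;DATABASE=SalesDB;Trusted_Connection=yes;"  # placeholders
)
cursor = conn.cursor()

# DML: flag stale orders
cursor.execute(
    "UPDATE dbo.Orders SET Status = 'STALE' WHERE OrderDate < ?",
    "2014-01-01",
)
conn.commit()

# Call a (hypothetical) reporting stored procedure and export the result set
cursor.execute("EXEC dbo.usp_MonthlySalesSummary @Year = ?, @Month = ?", 2014, 6)
rows = cursor.fetchall()

with open("monthly_sales_summary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(rows)

conn.close()
```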
Environment: MS SQL Server 2012/2014, T-SQL, SQL Server Management Studio (SSMS), SQL Profiler, Visual Studio 2005, SSIS, SSRS, Windows XP, TFS, TWS