
Data Engineer - AWS Resume


SUMMARY

  • Over 6 years of diversified IT experience in data modelling and data analysis, proficient in gathering business requirements and handling requirements management.
  • 4 years of experience with AWS big data and Hadoop ecosystems, building data lakes.
  • Hands-on experience with the Hadoop framework and its ecosystem, including HDFS, MapReduce, Pig, Hive, Flume, Sqoop and Spark.
  • Excellent ability to understand business needs and data to generate end-to-end business solutions.
  • Good in systems analysis, ER and dimensional modelling, database design and implementing RDBMS-specific features.
  • Experience with various RDBMSs such as Oracle 9i/10g/11g, SQL Server 2005/2008, DB2 UDB and Teradata.
  • Extensive experience developing T-SQL and Oracle PL/SQL scripts, stored procedures and triggers for business logic implementation.
  • Proficient in Hive Query Language and experienced in Hive performance optimization using static partitioning, dynamic partitioning, bucketing and parallel execution (see the partitioning sketch after this list).
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR and other services of the AWS family.
  • Wrote Python scripts to manage AWS resources through API calls using the Boto SDK, and also worked with the AWS CLI (a boto3 sketch follows this list).
  • Performed data analysis and data profiling using SQL on various systems including SQL Server 2008.
  • Strong experience creating real-time data streaming solutions using PySpark/Spark Streaming and Kafka.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • In-depth knowledge of AWS cloud services: compute, network, storage, and identity and access management.
  • Hands-on experience configuring network architecture on AWS with VPCs, subnets and internet gateways.
  • Worked with JIRA to create projects, assign permissions to users and groups, and set up mail handlers and notification schemes.
  • Used JIRA for ticket tracking and change management and as an Agile/Scrum tool.
  • Good working experience with UNIX/Linux commands, scripting and deploying applications.
  • Experienced working as part of fast-paced agile teams, with exposure to testing in Scrum teams and TDD.
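
A minimal sketch of the dynamic-partitioning approach mentioned in the summary, expressed as PySpark calls against a Hive-enabled session. The table and column names (sales_raw, sales_part, load_date) are hypothetical placeholders; bucketing (CLUSTERED BY ... INTO n BUCKETS) and parallel execution (hive.exec.parallel) are applied with the same SET/DDL pattern.

```python
from pyspark.sql import SparkSession

# Hive-enabled Spark session; on a real cluster the metastore settings
# would come from hive-site.xml.
spark = (SparkSession.builder
         .appName("hive-dynamic-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Dynamic partitioning lets partition values come from the data itself
# instead of being hard-coded per INSERT (static partitioning).
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Hypothetical target table, partitioned by load_date.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_part (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
""")

# Each distinct load_date in the source lands in its own partition, so
# queries filtering on load_date prune partitions rather than scanning
# the whole table.
spark.sql("""
    INSERT OVERWRITE TABLE sales_part PARTITION (load_date)
    SELECT customer_id, amount, load_date
    FROM sales_raw
""")
```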
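
A short boto3 sketch of the kind of AWS resource-management scripting described above; the region, tag filter and instance states are hypothetical examples.

```python
import boto3

# Hypothetical region; credentials come from the usual AWS config chain.
session = boto3.session.Session(region_name="us-east-1")

# List the S3 buckets the account owns.
s3 = session.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

# Stop any running EC2 instances carrying a given tag.
ec2 = session.client("ec2")
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Environment", "Values": ["dev"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [
    inst["InstanceId"]
    for res in reservations
    for inst in res["Instances"]
]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
```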

TECHNICAL SKILLS

Big Data Technologies: AWS EMR, S3, EC2 Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig, Hive, Apache Spark, Spark SQL, Informatica PowerCenter 9.6.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN, NiFi, Impala, Sqoop, Solr, Oozie.

Databases: Cloudera Hadoop CDH 5.x, Hortonworks HDP, Oracle 10g/11g, Teradata, DB2, Microsoft SQL Server, MySQL, NoSQL databases.

Platforms (O/S): Red Hat Linux, Ubuntu, Windows NT/2000/XP

Programming languages: Java, Scala, SQL, UNIX shell script, JDBC, Python.

Security Management: Hortonworks Ambari, Cloudera Manager, Apache Knox, XA Secure, Kerberos.

Web technologies: DHTML, HTML, XHTML, XML, XSL (XSLT, XPath), XSD, CSS, JavaScript, SOAP, RESTful, Agile, Design Patterns.

Data warehousing: Informatica PowerCenter/PowerMart/Data Quality/Big Data, Pentaho, ETL development, Amazon Redshift, IDQ.

Database tools: JDBC, Hadoop, Hive, NoSQL, SQL Navigator, SQL Developer, TOAD, SQL*Plus, SAP BusinessObjects.

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer- AWS

Responsibilities:

  • Responsible for the execution of Big Data Analytics, predictive analysis and machine learning initiatives.
  • Implemented a proof of concept deploying the product in an AWS S3 bucket and Snowflake.
  • Utilized AWS services, with a focus on big data architecture, analytics, enterprise data warehouse and business intelligence solutions, to ensure optimal architecture, scalability, flexibility, availability and performance, and to provide meaningful and valuable information for better decision making.
  • Developed Scala scripts and UDFs, using both DataFrames/Spark SQL and RDDs in Spark, for data aggregation, queries and writing results back into S3 buckets.
  • Experience in data cleansing and data mining.
  • Used Spark Streaming to divide streaming data into micro-batches as input to Spark for batch processing (see the streaming sketch after this list).
  • Wrote Spark applications for data validation, cleansing, transformation and custom aggregation, and used the Spark engine and Spark SQL for data analysis, passing the results to data scientists for further analysis.
  • Prepared scripts in Python and Scala to automate the ingestion process from various sources such as APIs, AWS S3, Teradata and Snowflake.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the Confidential data lake on AWS S3.
  • Profiled structured, unstructured and semi-structured data across various sources to identify patterns, and implemented data quality metrics using the necessary queries and Python scripts depending on the source.
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs.
  • Created DAGs using the EmailOperator, BashOperator and Spark Livy operator to execute tasks on EC2 (an example DAG follows this list).
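
A minimal PySpark Structured Streaming sketch of the Kafka-to-S3 micro-batch flow referenced above. The broker address, topic, event schema and S3 paths are hypothetical, and the job assumes the spark-sql-kafka connector package is available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType

spark = SparkSession.builder.appName("kafka-to-s3-stream").getOrCreate()

# Hypothetical event schema for the JSON messages on the topic.
schema = (StructType()
          .add("event_id", StringType())
          .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), schema).alias("e"))
             .select("e.*"))

# Each trigger interval becomes one micro-batch written to S3 as Parquet.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/streams/events/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```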
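
An illustrative Airflow 2 style DAG along the lines described above, chaining a BashOperator and an EmailOperator. The DAG id, schedule, spark-submit command and addresses are placeholders; a LivyOperator (from the apache-livy provider package) could submit the Spark batch instead of running spark-submit directly on the EC2 host.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator

with DAG(
    dag_id="s3_to_snowflake_load",          # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    run_spark_job = BashOperator(
        task_id="run_spark_job",
        # Placeholder spark-submit call executed on the worker/EC2 host.
        bash_command="spark-submit --master yarn /opt/jobs/load_snowflake.py",
    )

    notify = EmailOperator(
        task_id="notify_team",
        to="data-team@example.com",
        subject="S3 to Snowflake load finished",
        html_content="The daily load DAG completed.",
    )

    # Send the notification only after the Spark job task succeeds.
    run_spark_job >> notify
```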

Environment: Agile/Scrum, MapReduce, Snowflake, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Codecloud, AWS.

Confidential

Data Modeler/ Data Analyst

Responsibilities:

  • Working as part of a data management/ digital execution team responsible for establishing an engineering data lake with both raw and modelled data.
  • Predominantly involved in understanding the requirements from various business partners and collaborating with data owners and subject matter experts.
  • Designing and developing code, scripts and data pipelines that leverage structured and unstructured data integrated from multiple sources.
  • Creating data views as per the business requirements and building the corresponding reports.
  • Involved in moving the manufacturing data of the organisation from different sources into a Snowflake environment
  • Interface with business professionals, application developers and technical staff working in an agile process and environment
  • Develop technical standards, procedures and guidelines used to govern data models
  • Create and maintain technical documentation, architecture designs and data flow diagrams
  • Working in an agile environment to complete multiple project tasks during the same sprint
  • Converting SAP data fields into business-friendly fields in the Snowflake environment.
  • Extensively used DBeaver, Snowflake and Azure DevOps on a daily basis.

Environment: Snowflake, Oracle 12c/19c, SAP, Teradata, SQL, Azure DevOps, DBeaver, MS Excel, MS Word, Power BI

Confidential

ETL Developer

Responsibilities:

  • Completed analysis, requirements gathering and functional design document creation.
  • Collaborated with the application developers in data modeling and E-R design of the systems
  • Created and managed schema objects such as tables, views, indexes and referential integrity constraints based on user requirements.
  • Used DDL and DML commands to write triggers and stored procedures and to perform data manipulation (a sketch follows this list).
  • Designed and built high-performance ETL packages to migrate and manipulate data from MS Excel and SQL Server using SSIS.
  • Involved in tuning system stored procedures for better performance
  • Developed reports using SSRS and MS Excel to meet business requirements.
  • Created and managed report subscriptions and schedules for SSRS reports.
  • Worked with application developers and provided the necessary scripts using SQL and PL/SQL.
  • Developed and deployed SSIS packages and configuration files, and scheduled jobs to run the packages and generate data in CSV files.
  • Generated periodic reports based on statistical analysis of the data using SQL Server Reporting Services (SSRS).
  • Responsible for managing database performance, backups, replication, capacity and security.
  • Experience creating multidimensional cubes using SSAS.
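
A small pyodbc sketch of the kind of DDL/DML work described above; the connection string, table, trigger and stored procedure names are hypothetical.

```python
import pyodbc

# Hypothetical connection string; the real server, database and driver
# depend on the environment.
conn = pyodbc.connect(
    "DRIVER={SQL Server Native Client 11.0};"
    "SERVER=sqlhost;DATABASE=SalesDB;Trusted_Connection=yes;"
)
cur = conn.cursor()

# DDL: an audit table plus an AFTER INSERT trigger that populates it
# whenever rows land in the (hypothetical) dbo.Orders table.
cur.execute("""
    CREATE TABLE dbo.OrderAudit (
        OrderID   INT,
        AuditedAt DATETIME DEFAULT GETDATE()
    )
""")
cur.execute("""
    CREATE TRIGGER dbo.trg_OrderInsert ON dbo.Orders
    AFTER INSERT AS
        INSERT INTO dbo.OrderAudit (OrderID)
        SELECT OrderID FROM inserted
""")

# DML: call a (hypothetical) stored procedure that loads one day's orders.
cur.execute("EXEC dbo.usp_LoadDailyOrders @LoadDate = ?", "2014-06-01")
conn.commit()
conn.close()
```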

Environment: MS SQL Server 2012/2014, T-SQL, SQL Server Management Studio (SSMS), SQL Profiler, Visual Studio 2005, SSIS, SSRS, Windows XP, TFS, TWS
