Data Engineer - AWS Resume
SUMMARY
- Over 6 years of diversified IT experience in Data Modeling and Data Analysis, proficient in gathering business requirements and handling requirements management.
- Worked for 4 years with AWS Big Data and Hadoop ecosystems building a data lake.
- Hands-on experience with the Hadoop framework and its ecosystem components such as HDFS, MapReduce, Pig, Hive, Flume, Sqoop and Spark.
- Excellent ability to understand business needs and data in order to deliver end-to-end business solutions.
- Strong in systems analysis, ER and dimensional modeling, database design, and implementing RDBMS-specific features.
- Experience with various RDBMS including Oracle 9i/10g/11g, SQL Server 2005/2008, DB2 UDB and Teradata.
- Extensive experience developing T-SQL and Oracle PL/SQL scripts, stored procedures and triggers for business logic implementation.
- Proficient in HiveQL and experienced in Hive performance optimization using static partitioning, dynamic partitioning, bucketing and parallel execution (see the partitioning sketch after this list).
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR and other services of the AWS family.
- Wrote Python scripts to manage AWS resources through API calls using the Boto SDK, and also worked with the AWS CLI (see the boto3 sketch after this list).
- Performed data analysis and data profiling using SQL on various systems, including SQL Server 2008.
- Extensive experience creating real-time data streaming solutions using PySpark/Spark Streaming and Kafka.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- In-depth knowledge of AWS cloud services across compute, networking, storage, and identity and access management.
- Hands-on experience configuring network architecture on AWS with VPCs, subnets and internet gateways.
- Worked with Jira to create projects, assign permissions to users and groups, and set up mail handlers and notification schemes.
- Used Jira for ticket tracking, change management and as an Agile/Scrum tool.
- Solid working experience with UNIX/Linux commands, scripting and application deployment.
- Experienced working in fast-paced Agile teams, with exposure to testing in Scrum teams and TDD.
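A minimal PySpark sketch of the Hive partitioning, bucketing and dynamic-partition settings mentioned above; the table, columns and bucket count are hypothetical rather than taken from any actual project.

```python
# Illustrative only: dynamic partitioning + bucketing in Hive via PySpark.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Session settings for dynamic partitioning and parallel execution
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("SET hive.exec.parallel = true")

# Partition by load date, bucket by customer_id to speed up joins and sampling
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_curated (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert: Hive derives load_date from the last selected column
spark.sql("""
    INSERT OVERWRITE TABLE sales_curated PARTITION (load_date)
    SELECT customer_id, amount, load_date FROM sales_raw
""")
```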
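A minimal boto3 sketch of the kind of AWS resource management scripting referenced above; bucket names and tags are placeholders.

```python
# Illustrative only: managing S3 and EC2 resources with the boto3 SDK.
import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

# List objects under a prefix in an S3 bucket (placeholder bucket name)
response = s3.list_objects_v2(Bucket="my-datalake-bucket", Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Stop dev-tagged EC2 instances, e.g. outside business hours
reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:Environment", "Values": ["dev"]}]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
```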
TECHNICAL SKILLS
Big Data Technologies: AWS EMR, S3, EC2 Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig, Hive, Apache Spark, Spark SQL, Informatica PowerCenter 9.6.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN, NiFi, Impala, Sqoop, Solr, Oozie.
Databases: Cloudera Hadoop CDH 15.x, Hortonworks HDP, Oracle 10g/11g, Teradata, DB2, Microsoft SQL Server, MySQL, NoSQL databases.
Platforms (OS): Red Hat Linux, Ubuntu, Windows NT/2000/XP
Programming languages: Java, Scala, SQL, UNIX shell script, JDBC, Python.
Security Management: Hortonworks Ambari, Cloudera Manager, Apache Knox, XA Secure, Kerberos.
Web Technologies: DHTML, HTML, XHTML, XML, XSL (XSLT, XPath), XSD, CSS, JavaScript, SOAP, RESTful, Agile, Design Patterns.
Data Warehousing: Informatica PowerCenter/PowerMart/Data Quality/Big Data, Pentaho, ETL development, Amazon Redshift, IDQ.
Database Tools: JDBC, Hadoop, Hive, NoSQL, SQL Navigator, SQL Developer, TOAD, SQL*Plus, SAP BusinessObjects.
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer- AWS
Responsibilities:
- Responsible for the execution of Big Data Analytics, predictive analysis and machine learning initiatives.
- Implemented a proof of concept deploying the product on AWS S3 and Snowflake.
- Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability and performance, and to provide meaningful information for better decision making.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, writing results back to the S3 bucket.
- Experience in data cleansing and data mining.
- Used Spark Streaming to divide streaming data into micro-batches as input to Spark for batch processing (see the streaming sketch after this list).
- Wrote Spark applications for data validation, cleansing, transformation and custom aggregation, and used the Spark engine and Spark SQL for data analysis before passing results to data scientists for further analysis.
- Prepared scripts in Python and Scala to automate ingestion from various sources such as APIs, AWS S3, Teradata and Snowflake.
- Worked on Snowflake schemas and data warehousing, and built batch and streaming data load pipelines from the Confidential AWS S3 data lake using Snowpipe and Matillion.
- Profiled structured, unstructured and semi-structured data across various sources to identify patterns, and implemented data quality metrics using SQL queries and Python scripts appropriate to each source.
- Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs.
- Created DAGs using the Email operator, Bash operator and Spark Livy operator to execute tasks on EC2 (see the DAG sketch after this list).
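A minimal PySpark Structured Streaming sketch of the Kafka-to-S3 micro-batch processing described in this role; broker, topic, schema and bucket names are placeholders.

```python
# Illustrative only: read events from Kafka in micro-batches, validate, write to S3.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

schema = (StructType()
          .add("event_id", StringType())
          .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .filter(col("amount").isNotNull()))               # basic validation

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://my-datalake-bucket/curated/events/")
         .option("checkpointLocation", "s3a://my-datalake-bucket/checkpoints/events/")
         .trigger(processingTime="1 minute")                # micro-batch interval
         .start())

query.awaitTermination()
```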
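A minimal Airflow DAG sketch wiring together the Bash, Spark-via-Livy and Email operators mentioned above; connection IDs, paths, schedule and addresses are placeholders, and the Livy operator is assumed to come from the apache-airflow-providers-apache-livy package.

```python
# Illustrative only: Bash staging step -> Spark job via Livy -> email notification.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator
from airflow.providers.apache.livy.operators.livy import LivyOperator

with DAG(
    dag_id="s3_to_snowflake_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    stage_files = BashOperator(
        task_id="stage_files",
        bash_command="aws s3 sync s3://landing-bucket/raw/ s3://datalake-bucket/stage/",
    )

    run_spark_job = LivyOperator(
        task_id="run_spark_job",
        livy_conn_id="livy_default",                 # placeholder connection
        file="s3://code-bucket/jobs/transform.py",   # placeholder job file
    )

    notify = EmailOperator(
        task_id="notify",
        to="data-team@example.com",
        subject="Daily load complete",
        html_content="The S3-to-Snowflake pipeline finished.",
    )

    stage_files >> run_spark_job >> notify
```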
Environment: Agile Scrum, MapReduce, Snowflake, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Codecloud, AWS.
Confidential
Data Modeler / Data Analyst
Responsibilities:
- Working as part of a data management/ digital execution team responsible for establishing an engineering data lake with both raw and modelled data.
- Predominantly involved in understanding the requirements from various business partners and collaborating with data owners and subject matter experts.
- Designing and developing code, scripts and data pipelines that leverage structured and unstructured data integrated from multiple sources.
- Creating data views per business requirements and building the corresponding reports (see the view sketch after this list).
- Involved in moving the manufacturing data of the organisation from different sources into a Snowflake environment
- Interfacing with business professionals, application developers and technical staff in an Agile process and environment.
- Developing technical standards, procedures and guidelines used to govern data models.
- Creating and maintaining technical documentation, architecture designs and data flow diagrams.
- Working in an agile environment to complete multiple project tasks during the same sprint
- Converting SAP data fields into business-friendly field names in the Snowflake environment.
- Extensively using DBeaver, Snowflake and Azure DevOps on a daily basis.
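A minimal sketch of exposing technical SAP field names under business-friendly columns through a Snowflake view, driven from Python with the snowflake-connector-python package; the account details, table and field names are placeholders.

```python
# Illustrative only: expose SAP technical fields under business-friendly names.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",          # placeholder credentials
    user="my_user",
    password="***",
    warehouse="ANALYTICS_WH",
    database="MANUFACTURING",
    schema="CURATED",
)

create_view_sql = """
CREATE OR REPLACE VIEW V_PRODUCTION_ORDERS AS
SELECT
    AUFNR AS production_order_number,   -- SAP technical name -> business name
    WERKS AS plant_code,
    GAMNG AS order_quantity
FROM RAW.SAP_ORDER_HEADERS
"""

with conn.cursor() as cur:
    cur.execute(create_view_sql)

conn.close()
```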
Environment: Snowflake, Oracle 12c/19c, SAP, Teradata, SQL, Azure DevOps, DBeaver, MS Excel, MS Word, Power BI
Confidential
ETL Developer
Responsibilities:
- Completed analysis, requirements gathering and functional design document creation.
- Collaborated with the application developers in data modeling and E-R design of the systems
- Created and managed schema objects such as tables, views, indexes and referential integrity constraints based on user requirements.
- Used DDL and DML commands to write triggers and stored procedures and to perform data manipulation (see the sketch after this list).
- Designed and built high-performance ETL packages with SSIS to migrate and manipulate data from MS Excel and SQL Server.
- Involved in tuning system stored procedures for better performance
- Developed reports using SSRS and MS Excel to meet business requirements.
- Created and managed report subscriptions and schedules for SSRS reports.
- Worked with application developers and provided the necessary scripts using SQL and PL/SQL.
- Developed and deployed SSIS packages and configuration files, and scheduled jobs to run the packages and generate data in CSV files.
- Generated periodic reports based on statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Responsible for managing database performance, backups, replication, capacity and security.
- Experience creating multidimensional cubes using SSAS.
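The reporting and data manipulation in this role was done with SSIS/SSRS and T-SQL; the sketch below only illustrates the kind of DML and stored-procedure calls involved, driven from Python via pyodbc for consistency with the other sketches. Server, database, procedure and table names are placeholders.

```python
# Illustrative only: run T-SQL DML and a stored procedure, export the result to CSV.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver01;DATABASE=SalesDB;Trusted_Connection=yes;"  # placeholders
)
cursor = conn.cursor()

# DML: flag stale orders
cursor.execute(
    "UPDATE dbo.Orders SET Status = 'STALE' WHERE OrderDate < ?",
    "2014-01-01",
)
conn.commit()

# Call a (hypothetical) reporting stored procedure and export the result set
cursor.execute("EXEC dbo.usp_MonthlySalesSummary @Year = ?, @Month = ?", 2014, 6)
rows = cursor.fetchall()

with open("monthly_sales_summary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(rows)

conn.close()
```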
Environment: MS SQL Server 2012/2014, T-SQL, SQL Server Management Studio (SSMS), SQL Profiler, Visual Studio 2005, SSIS, SSRS, Windows XP, TFS, TWS