
Data Engineer Resume


Plano, TX

SUMMARY

  • Over 8 years of experience as a Data Engineer/SQL Developer in design, development, implementation and analysis.
  • Experience in Agile Software development process, Test Driven Development and Scrum.
  • Strong working experience on Amazon Web Services including S3, EC2 and EMR.
  • Experience in front-end UI development using HTML5, CSS3 and JavaScript.
  • Hands-on experience configuring and deploying applications using infrastructure-as-code and CI/CD tools such as Terraform and CircleCI.
  • Worked with source code version control systems like GIT to provide a common platform for all developers.
  • Experience in Data Mapping to meet the client requirement.
  • Designed, developed, and implemented customized temporary tables, queries, and reports utilizing SQL.
  • Extensive experience with big data components such as PySpark, Airflow and Hive.
  • Experience in Cloud data migration using AWS and Snowflake.
  • Strong Linux/Unix based experience for process management.
  • Solid experience in writing SQL queries and procedures to extract data from various source tables.
  • Knowledge on developing complex Tableau reports and dashboards.
  • Technical knowledge on Tableau Desktop
  • Experience in Data Analysis using Snow SQL on Snowflake.
  • Experience in process automation through Airflow.
  • Experience with core Airflow functionality such as DAGs, Operators, Tasks and Workflows.
  • Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
  • Knowledge of AWS Glue for preparing data for analysis through automated Extract, Transform and Load (ETL) processes.
  • Knowledge of the AWS Redshift data warehouse.
  • Extensive experience in UNIX performance monitoring and Load balancing to ensure stable performance.
  • Attended performance optimization training delivered by the Snowflake team.
  • Working experience on creating DNS Server names through Terraform.
  • Implemented AWS Lambda functions integrated with SNS email notifications to pull data from the Sprinkle API (a minimal sketch follows this list).
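
A minimal sketch of the Lambda-to-SNS notification pattern mentioned above; the source API endpoint and topic ARN are hypothetical values supplied through environment variables.

    # Hypothetical Lambda handler: pulls a payload from the source API and
    # publishes an SNS notification (the topic fans out to email subscribers).
    # SOURCE_API_URL and NOTIFY_TOPIC_ARN are assumed environment variables.
    import json
    import os
    import urllib.request

    import boto3

    sns = boto3.client("sns")

    def handler(event, context):
        # Fetch the payload from the source API (endpoint is a placeholder).
        with urllib.request.urlopen(os.environ["SOURCE_API_URL"], timeout=30) as resp:
            payload = json.loads(resp.read())

        # Notify subscribers that new data has arrived.
        sns.publish(
            TopicArn=os.environ["NOTIFY_TOPIC_ARN"],
            Subject="New data pulled from source API",
            Message=json.dumps({"records": len(payload)}),
        )
        return {"statusCode": 200, "records": len(payload)}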

TECHNICAL SKILLS

Languages: Python, SQL, PL/SQL, Snow-SQL, Shell, Bash, Terraform

Databases: SQL Server, MySQL, Snowflake, MongoDB, PostgreSQL, Oracle, Teradata.

Web Technologies: XML, HTML, CSS, XSLT, JSF, JavaScript, jQuery

Web Servers: Tomcat, WebLogic

Database Tools: MySQL Workbench, Snowflake

Operating Systems: Windows, Unix and Linux.

Development Tools: Eclipse Neon2, IntelliJ IDEA, STS, Visual Studio

Version Controls: GIT & CVS.

SDLC Methodologies: Agile and Waterfall

Cloud Technologies: AWS, Microsoft Azure, GCP

Visualization Tools: Power BI, Grafana and Tableau

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, Plano, TX

Responsibilities:

  • Experience with full development cycle of a Data Warehouse, including requirements gathering, design, implementation, and maintenance.
  • Worked with management, developers, quality engineers, and product managers to gather requirements and define workflow for a new project, then implement in JIRA.
  • Improved performance of existing ETL processes and SQL queries for weekly CRM summary data.
  • Utilized the Jitterbit tool to get data from Salesforce objects to S3.
  • Worked on consumer data, processed it using PySpark data frames and loaded the data to S3.
  • Used the enterprise Snowflake data warehouse to populate data for the reporting team.
  • Responsible for creating roles and assigning them to individuals for Snowflake access.
  • Used Apache Airflow extensively for orchestration purposes.
  • Created Hive tables and Okera views and applied data masking to sensitive attributes in the Okera views.
  • Created and maintained Airflow DAGs to automate loading data into Snowflake and Hive (see the sketch after this job entry).
  • Extensive experience with different file types including JSON, CSV, text and Parquet.
  • Created file formats and external stages in Snowflake.
  • Used Cerberus safe deposit boxes as the key management tool.
  • Used CloudFormation to provision all the s3 bucket policies.
  • Basic experience in Salesforce viewing objects and fields in the Lightning console.
  • Used Bitbucket for version control.
  • Configured Jenkins with Bitbucket for deployment purposes.

Environment: JIRA, ETL, Apache Airflow, Snowflake, JSON, Hive, Okera, CSV, Parquet, HBase, Pig, Cerberus, Bitbucket, Jitterbit, SQL, CRM.
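
A minimal sketch, assuming Airflow 2.x and the Snowflake Python connector, of the S3-to-Snowflake load automated by the DAGs above; the account, credentials, external stage, file format and table names are placeholders.

    from datetime import datetime

    import snowflake.connector
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def copy_into_snowflake():
        # COPY INTO from an external stage that points at the S3 landing prefix.
        conn = snowflake.connector.connect(
            account="my_account",     # placeholder
            user="etl_user",          # placeholder
            password="***",           # normally retrieved from a secrets manager
            warehouse="ETL_WH",
            database="ANALYTICS",
            schema="RAW",
        )
        try:
            conn.cursor().execute(
                "COPY INTO RAW.CONSUMER_EVENTS "
                "FROM @CONSUMER_S3_STAGE "
                "FILE_FORMAT = (FORMAT_NAME = 'PARQUET_FF')"
            )
        finally:
            conn.close()

    with DAG(
        dag_id="consumer_s3_to_snowflake",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_snowflake = PythonOperator(
            task_id="copy_into_snowflake",
            python_callable=copy_into_snowflake,
        )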

AWS Data Engineer

Confidential, Charlotte, NC

Responsibilities:

  • Designed and developed ETL processes in AWS Glue to migrate data from an Oracle data warehouse to AWS Redshift (see the sketch after this job entry).
  • Working experience creating external schemas and tables in AWS Redshift on top of S3 files in Parquet format.
  • Data Extraction, aggregations and consolidation of data within AWS Glue using PySpark.
  • Responsible for the creation and maintenance of Role based authentication for different users with respective accesses in Redshift.
  • Created stored procedures in AWS Redshift with multiple steps that ultimately load data from the Raw layer into target tables in the Curated layer.
  • Analyze, design and build modern data solutions using Azure PaaS service to support visualization of data.
  • Assigned work to and coordinated with the offshore team daily.
  • Used Perforce as a code repository and version management tool.
  • Created source-to-target mappings, analyzed the keys needed to pull data from several tables, and documented them.
  • Responsible for analyzing and resolving QA defects/bugs.
  • Designed a data integration strategy/mapping document describing all connections and mappings between source and target.
  • Created Job Chain definitions and established the dependencies between the tasks.
  • Used Jenkins for builds and deployments.
  • Implemented a data extract from the Salesforce database to AWS S3 for a proof of concept and populated the data in Athena using AWS Glue crawlers.

Environment: AWS Glue, AWS Redshift, AWS Airflow, Azure Storage, AWS SQL, AWS DW, Jenkins.
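
A sketch of the kind of Glue PySpark job used in the Oracle-to-Redshift migration above; the catalog database, table names and the Redshift catalog connection are placeholders.

    import sys

    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the source table registered in the Glue Data Catalog (placeholder names).
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="oracle_dw",
        table_name="orders",
    ).toDF()

    # Aggregate and consolidate with Spark SQL functions.
    daily = (
        orders.groupBy("order_date")
              .agg(F.count("*").alias("order_count"),
                   F.sum("amount").alias("total_amount"))
    )

    # Write to Redshift through a Glue catalog connection, staged via the S3 TempDir.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=DynamicFrame.fromDF(daily, glue_context, "daily"),
        catalog_connection="redshift-conn",   # placeholder connection name
        connection_options={"dbtable": "curated.daily_orders", "database": "analytics"},
        redshift_tmp_dir=args["TempDir"],
    )
    job.commit()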

Data Engineer

Confidential, Phoenix, AZ

Responsibilities:

  • Followed Agile Scrum Methodology to analyze, design and implement applications to support functional and business requirements.
  • Worked in a cross functional environment and played a significant role in ingesting and collecting data from multiple sources.
  • Managed AWS EC2 instances utilizing Fleet Management and Elastic Load Balancing for our Prod and QA environments.
  • Assembling large, complex data sets that meet functional business requirements.
  • Implementing and maintaining optimal data pipeline architecture
  • Responsible for loading data into S3 Buckets from Internal Server and Snowflake Data warehouse.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources.
  • Creating and writing the aggregation logic on tables in Snowflake Datawarehouse.
  • Used CircleCI as a continuous integration tool for integration, build and test purposes and for cloud-based deployments.
  • Involved in data migration to Snowflake using AWS S3 buckets.
  • Recreated existing Access database objects in Snowflake and maintained them.
  • Converted Existing SQL Mapping logic to Snow SQL.
  • Created Snowflake external stages to load and unload the data from/to the AWS S3 bucket.
  • Provided specific access to the respective roles for the Tables and Stages.
  • Wrote complex Snow SQL queries for data validation and analytical purposes.
  • Performed Data mapping for extraction and loading. Involved in Disaster Recovery, backup restore and database optimization.
  • Creation of File formats in Snowflake.
  • Performed tests, validated data flows and prepared ETL processes according to business requirements.
  • Created and integrated the API with a DynamoDB backend for posting user details dynamically.
  • Consuming high volumes of data from multiple sources (such as Hive, S3 files, Snowflake tables, xls) and performing transformations using PySpark (see the sketch after this job entry).
  • Created PySpark scripts to perform data analysis and aggregation, load results into data frames and eventually write to S3 as part of the migration process.
  • Apache Spark Data frames were used to apply business transformations and utilized Hive Context objects to perform read/write operations.
  • Used Jira and VersionOne as ticket tracking tools.
  • Created DAGs to automate processes by scheduling jobs in Airflow using Python.
  • Used GitHub and Bitbucket as source control tools.

Environment: AWS Glue, Bitbucket, DAGS, Jira, PySpark, Snowflake, ETL, DynamoDB, API, Snow SQL, AWS S3, AWS EC2, Jenkins.
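
A minimal PySpark sketch of the multi-source consumption and aggregation described above; the Hive table name and S3 paths are placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("consumer_aggregation")
        .enableHiveSupport()   # lets spark.table() read Hive tables
        .getOrCreate()
    )

    customers = spark.table("raw.customers")                    # Hive table (placeholder)
    events = spark.read.parquet("s3://landing-bucket/events/")  # S3 files (placeholder)

    # Join, aggregate, and land the summary back in S3 as Parquet.
    summary = (
        events.join(customers, "customer_id")
              .groupBy("customer_id", "event_date")
              .agg(F.count("*").alias("event_count"))
    )

    summary.write.mode("overwrite").parquet("s3://curated-bucket/customer_event_summary/")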

Database Developer

Confidential, Cincinnati, OHIO

Responsibilities:

  • Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics).
  • Ingested data to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform and load data between sources such as Azure SQL, Blob Storage and Azure SQL Data Warehouse, including write-back.
  • Implemented batch processing of jobs using the Spark Scala API (a Python sketch of the pattern follows this entry).
  • Responsible for estimating cluster size, monitoring and troubleshooting of the Spark Databricks cluster.
  • Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity.
  • Worked with a team to improve the performance and optimization of existing algorithms in Hadoop using Spark, Spark SQL and DataFrames.
  • Addressing the issues occurring due to the huge volume of data and transitions.
  • Documented operational problems by following standards and procedures, using JIRA.

Environment: Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake, Spark, Azure SQL, JSON, Hadoop, Data Frame, JIRA.
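
A Python/Spark SQL sketch of the Databricks batch-processing pattern described above (the original work used the Scala API); the ADLS paths are placeholders and storage authentication is assumed to be configured on the cluster.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adls_batch").getOrCreate()

    # Read raw JSON from an Azure Data Lake Storage container (placeholder path).
    raw = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/sales/")

    # Apply the business transformation with Spark SQL.
    raw.createOrReplaceTempView("sales_raw")
    curated = spark.sql("""
        SELECT region,
               CAST(sale_date AS DATE) AS sale_date,
               SUM(amount)             AS total_amount
        FROM sales_raw
        GROUP BY region, CAST(sale_date AS DATE)
    """)

    # Write the curated output back to the lake as Parquet (placeholder path).
    curated.write.mode("overwrite").parquet(
        "abfss://curated@mydatalake.dfs.core.windows.net/sales_daily/"
    )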

Data Analyst

Confidential

Responsibilities:

  • Wrote and analyzed SQL queries using T-SQL to obtain critical data.
  • Created and built business reports with SSRS, Microsoft Access and Excel.
  • Generated data from primary and secondary data sources and maintained databases and data systems.
  • Scheduled jobs and alerts using SQL Server Agent and configured Database Mail for job failure notifications.
  • Developed reports for business end users using Report Builder and updated statistics.
  • Managed backup and recovery procedures on SQL Server databases for Confidential Inc.
  • Created ETL packages with different data sources (SQL Server, Oracle, flat files, Excel, DB2 and Teradata) and loaded the data into target tables by performing different kinds of transformations using SSIS.
  • Maintained the data integrity during extraction, manipulation, processing, analysis and storage.
  • Wrote complex queries to retrieve data and ad-hoc reports from multiple tables within the SQL Server database (a sketch follows this entry).

Environment: ETL, SQL Server, Oracle, flat files, DB2, Teradata, SSRS, T-SQL.
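
For illustration only, the kind of ad-hoc T-SQL reporting query described above, run from Python via pyodbc; the server, database and table names are placeholders.

    import pyodbc

    # Connect to the reporting SQL Server instance (placeholder server/database).
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=reporting-sql;DATABASE=SalesDB;Trusted_Connection=yes;"
    )

    # Ad-hoc summary across multiple tables (hypothetical schema).
    query = """
        SELECT c.Region, COUNT(o.OrderID) AS Orders, SUM(o.Amount) AS Revenue
        FROM dbo.Orders o
        JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
        WHERE o.OrderDate >= DATEADD(MONTH, -1, GETDATE())
        GROUP BY c.Region
        ORDER BY Revenue DESC;
    """

    for region, orders, revenue in conn.cursor().execute(query):
        print(region, orders, revenue)

    conn.close()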
