
Data Engineer Resume


OH

SUMMARY

  • Around 6 years of experience working with big data, primarily using the Hadoop framework and PySpark for data analysis, transformation, deployment, and ingestion. Knowledgeable about AWS Data Pipeline, data structures, and processing systems. Uses PySpark, Python, SQL, and Hive for data mining, cleaning, and munging.
  • Extensive experience optimizing Spark performance with techniques such as persist, cache, broadcast variables, and efficient joins (a brief sketch follows this list).
  • Experience improving performance and optimizing existing algorithms in Hadoop by working with Spark SQL, SparkContext, pair RDDs, DataFrames, YARN, and in-memory processing through Spark transformations and Spark Streaming.
  • Hands-on experience writing PySpark scripts to process streaming data from data lakes with Spark Streaming, and building PySpark pipelines to process big data.
  • Experience handling different file formats such as CSV, XML, log, ORC, Avro, Parquet, SequenceFile, MapFile, and RCFile.
  • Experience running Spark with Python on clusters for computational analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
  • Experience with the Hive data warehouse tool, including the creation of tables, partitioning and bucketing of data, and the development and optimization of HiveQL queries.
  • Hands-on experience using AWS Kinesis, Lambda, and DynamoDB to implement real-time data streaming pipelines, and deploying AWS Lambda code from AWS S3 buckets.
  • Extensive experience with data wrangling and numerical computation tools such as Pandas and Numpy.
  • Experience creating, dropping, and altering tables at run time without blocking updates and queries, using Spark and Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experienced with cloud platforms such as AWS (Amazon Web Services), Azure, and Databricks (on both Azure and AWS).
  • Data modeling experience with the Star schema, Snowflake schema, and transactional modeling.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Orchestration experience using Azure Data Factory and Airflow on multiple cloud platforms, including leveraging Airflow operators.
  • Scheduled Airflow DAGs to run multiple Hive jobs, which run independently based on time and data availability.
  • Hands-on experience with data analytics services such as Athena, Glue Data Catalog, and QuickSight.
  • Addressing complex POCs according to business requirements from the technical end.
  • Active Agile team player in Production support, Hotfix deployment, Code Reviews, System Design & Review, Test cases, Sprint planning and Demos.
  • Effectively communicate with business units and stakeholders and provide strategic solutions according to the client’s requirements.
  • Well versed in Agile with Scrum, the Waterfall model, and Test-Driven Development (TDD) methodologies.
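
A minimal PySpark sketch of the cache and broadcast-join optimizations mentioned above; the paths, tables, and column names are illustrative assumptions, not taken from a specific project:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("join-optimization-sketch").getOrCreate()

    # Large fact table and a small dimension table (illustrative S3 paths).
    orders = spark.read.parquet("s3://example-bucket/orders/")
    regions = spark.read.parquet("s3://example-bucket/regions/")

    # Cache the fact table because several downstream aggregations reuse it.
    orders.cache()

    # Broadcast the small dimension table so the join avoids a full shuffle.
    enriched = orders.join(broadcast(regions), on="region_id", how="left")

    daily_totals = enriched.groupBy("order_date").sum("amount")
    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/reports/daily_totals/")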

TECHNICAL SKILLS

Programming Languages: Python, SQL, PLSQL

Big data tools: Spark, Hive, Sqoop, Kafka, YARN, HBase

Technical Databases: Oracle, Teradata, SQL Data Warehouse, Azure, Databricks

BI Visualization Tools: Tableau, Power BI, Birst BI

ETL Tools: ADF, Informatica, SSIS

Cloud platforms: Azure, AWS

Scheduling Tools: Airflow, Oozie

IDEs: PyCharm, Jupyter, IntelliJ, Visual Studio

PROFESSIONAL EXPERIENCE

Confidential, OH

Data Engineer

Responsibilities:

  • Created AWS S3 buckets and managed policies for S3 buckets and Glacier for storage and backup on AWS.
  • Collected data from edge device databases, exported it in CSV format, and stored it in AWS S3 buckets.
  • Using PySpark, created data processing tasks like reading data from external sources, merging the obtained data, performing data enrichment, and loading into data warehouses.
  • Using PySpark, performed transformations and actions on imported data from AWS S3.
  • Created Lambda functions to create ad-hoc tables in S3 to add schema and structure to data, and performed data validation, filtering, sorting, and transformations for every data change in a DynamoDB table, before loading the transformed data into PostgreSQL (a brief sketch follows the Environment line for this role).
  • Worked on Python API calls and landed data in S3 from external sources.
  • Scheduled Airflow DAGs to export data to AWS S3 buckets by triggering an AWS Lambda function (a brief sketch follows this list).
  • Developed robust and scalable data integration pipelines to transfer data from an S3 bucket to a Redshift database using Python and AWS Glue.
  • Developed Spark-based real-time data ingestion and real-time analytics, as well as AWS Lambda functions to power the system's real-time monitoring dashboards.
  • Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
  • Implemented a data warehouse solution consisting of ETLs and an on-premises-to-cloud migration; built and deployed batch and streaming data pipelines in a cloud environment.
  • Used Tableau for visualization charts and regularly communicated findings with product owners.
  • Worked on Tableau visualization charts and daily status dashboards.
  • Demonstrated strong communication and storytelling skills during sprint demos to leadership and stakeholders.
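
A short Airflow sketch of the kind of DAG described above, using a PythonOperator and boto3 to trigger an export Lambda; the function name, region, DAG id, and payload are illustrative assumptions:

    import json
    from datetime import datetime

    import boto3
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def trigger_export_lambda(**context):
        # Invoke the (hypothetical) Lambda that exports data to S3 for the run date.
        client = boto3.client("lambda", region_name="us-east-1")
        client.invoke(
            FunctionName="export-to-s3",        # illustrative function name
            InvocationType="Event",             # asynchronous, fire-and-forget
            Payload=json.dumps({"run_date": context["ds"]}),
        )

    with DAG(
        dag_id="export_to_s3_daily",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="trigger_export_lambda",
            python_callable=trigger_export_lambda,
        )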

Environment: PySpark, Hive, Python, AWS, S3, Airflow, SQL, Excel, Python 3, Spark SQL, Redshift, ETL/ELT, AWS Glue, Tableau
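
And a minimal sketch of a Lambda handler reacting to DynamoDB changes and loading validated rows into PostgreSQL, as described above; the table, columns, and environment variables are illustrative, and psycopg2 is assumed to be packaged with the function (for example via a Lambda layer):

    import os
    import psycopg2  # assumed available to the function, e.g. via a Lambda layer

    def handler(event, context):
        """Triggered by a DynamoDB stream: validate changed items and load them into PostgreSQL."""
        conn = psycopg2.connect(
            host=os.environ["PG_HOST"],
            dbname=os.environ["PG_DB"],
            user=os.environ["PG_USER"],
            password=os.environ["PG_PASSWORD"],
        )
        with conn, conn.cursor() as cur:
            for record in event.get("Records", []):
                if record["eventName"] not in ("INSERT", "MODIFY"):
                    continue
                # DynamoDB stream images are typed maps, e.g. {"order_id": {"S": "123"}}.
                image = record["dynamodb"]["NewImage"]
                order_id = image["order_id"]["S"]
                amount = float(image["amount"]["N"])
                if amount < 0:  # simple validation before loading
                    continue
                cur.execute(
                    "INSERT INTO orders (order_id, amount) VALUES (%s, %s) "
                    "ON CONFLICT (order_id) DO UPDATE SET amount = EXCLUDED.amount",
                    (order_id, amount),
                )
        conn.close()
        return {"status": "ok"}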

Confidential

Data Engineer

Responsibilities:

  • Worked with extensive big data sets to uncover patterns and problems and unlock value for the enterprise.
  • Worked with internal and external data sources to improve data accuracy and coverage, and generated recommendations on the process flow to accomplish the goal.
  • Developed PySpark Scripts to process streaming data from data lakes using Spark Streaming.
  • Using PySpark, created data processing pipelines that read data from external sources, merge the obtained data, perform data enrichment, and load into data warehouses; used PySpark user-defined functions (UDFs) to extend DataFrame capabilities.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory and Spark SQL; ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and processed it in Azure Databricks.
  • Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
  • Consumed real-time events from Kafka streams and persisted them to HDFS as Parquet (a brief sketch follows this list).
  • Using the Spark Data Frame API, performed transformations, cleaning, and filtering on the imported data before loading it into Hive.
  • Implemented and created Hive scripts for transformations such as aggregation, evaluation, and filtering.
  • Developed sophisticated Hive queries to extract key performance indicators by joining various tables, and performed data analysis on Hive tables using HiveQL.
  • Worked extensively with Hive to analyze the data and create reports on data quality.
  • Wrote Hive queries for data analysis to meet the business requirements, and designed and developed user-defined functions (UDFs).
  • Involved in creating Hive tables (managed and external), and loading and analyzing data using Hive queries (a brief sketch follows the Environment line for this role).
  • Improved Spark job performance by using broadcast joins and reducing shuffling.
  • Handled production incidents assigned to our workgroup promptly, fixed bugs or routed them to the respective teams, and kept within SLAs.
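
A minimal Spark Structured Streaming sketch of the Kafka-to-HDFS flow referenced above; the broker, topic, and paths are illustrative assumptions, and the spark-sql-kafka connector package is assumed to be on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

    # Read events from a Kafka topic (broker and topic names are illustrative).
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka delivers the value as binary; cast it to a string for downstream parsing.
    parsed = events.select(col("value").cast("string").alias("raw_event"), col("timestamp"))

    # Persist the stream to HDFS as Parquet, with a checkpoint for fault-tolerant output.
    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/events/parquet")
        .option("checkpointLocation", "hdfs:///checkpoints/events")
        .outputMode("append")
        .start()
    )

    query.awaitTermination()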

Environment: SparkSQL, PySpark, Python, SQL, Kafka, Hive, Hadoop, HDFS, Tableau, MapReduce, Sqoop, Azure
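
And a short Spark SQL sketch of the managed/external Hive table work mentioned above; the database, table, columns, and HDFS location are illustrative assumptions:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-table-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # External table over data already landed in HDFS, partitioned by load date.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
            event_id STRING,
            event_type STRING,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///warehouse/events'
    """)

    # Register partitions already present on disk, then analyze with HiveQL.
    spark.sql("MSCK REPAIR TABLE analytics.events")
    daily_kpis = spark.sql("""
        SELECT load_date, event_type, COUNT(*) AS events, SUM(amount) AS total_amount
        FROM analytics.events
        GROUP BY load_date, event_type
    """)
    daily_kpis.show()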

Confidential

MS SQL/MSBI Developer

Responsibilities:

  • Analyzed business requirements, facilitating the planning and implementation phases of the OLAP model in team meetings.
  • Participated in Team meetings to ensure a mutual understanding with business, development and test teams.
  • Encapsulated frequently executed SQL statements into stored procedures to reduce the query execution times.
  • Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS (OLAP) cubes.
  • Created SSIS packages to implement error/failure handling with event handlers, row redirects, and logging.
  • Used SQL Server Reporting Services (SSRS) to create and format cross-tab, conditional, drill-down, top-N, summary, form, OLAP, sub-report, ad-hoc, parameterized, interactive, and custom reports.
  • Managed packages in the SSISDB catalog with environments; automated deployment and execution with SQL Agent jobs.
  • Involved in the design of the data warehouse using the star schema methodology, and converted data from various sources to SQL tables.
  • Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.
  • Designed complex, data-intensive reports in Power BI using graph features such as gauges and funnels.

Environment: Power BI, Birst BI, ETL, SQL, SSIS.

Confidential

Software Engineer

Responsibilities:

  • Reviewed system requirements and attended requirements meetings with analysts and users.
  • Involved in the project life cycle from documentation to unit testing, with development as the priority.
  • Actively involved in testing after creating reports.
  • Resolved reporting issues by identifying whether they were report-related or source-related.
  • Applied hotfixes in production environments.
  • Interacted with the clients to make enhancements in the reports.
  • Published dashboards on the Power BI service.
  • Worked on extracts and their scheduling on the Power BI service.
  • Managed access of reports, dashboards and data for individual users using roles.
  • Worked on issues within the given SLA time and ensured it was not breached.
  • Provided development support for System Testing, Product Testing, User Acceptance Testing, Data Conversion Testing, Load Testing, and Production.

Environment: Birst BI Tool, Microsoft Power BI, Amazon Redshift Database, agile methodologies.
