
Data Engineer Resume


Beaverton, OR

SUMMARY

  • Around 7 years of experience in the IT industry, with over 4 years of experience in the Big Data ecosystem.
  • Worked with RDBMS such as MySQL and Oracle and cloud data warehouses such as Snowflake.
  • Worked extensively with SQL queries and optimizations in databases like MySQL and Oracle.
  • Expertise in coding in multiple languages, including Python, Java, C++, and Unix shell scripting.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Hands-on experience with Hadoop, MapReduce, Hive, Sqoop, Spark, NiFi, AWS EMR clusters.
  • Expert level knowledge in HDFS and MapReduce architectural components like Name Node, Data Node, Job Tracker, Task Tracker, Mappers and Reducers.
  • Created managed and external tables in Hive to optimize performance.
  • Experience in process improvement, normalization/denormalization, data extraction, data cleansing, and data manipulation in Hive.
  • Experience in loading data files from HDFS to Hive for reporting.
  • Hands-on experience with RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions using PySpark.
  • Handled large volumes of data efficiently using partitions, in-memory capabilities, broadcasts, joins, and transformations in Spark.
  • Developed Spark workflows and programs using Spark DataFrames and SparkSQL (a brief sketch follows this summary).
  • Experience in using different file formats such as flat files, SequenceFiles, Avro, ORC, and Parquet.
  • Hands-on experience with ETL (Extract, Transform, Load) pipeline development in Hive.
  • Good knowledge of cloud technologies: AWS (EMR, S3, EC2, DynamoDB), Azure (HDInsight, Blob Storage, DocumentDB), and GCP Dataproc.
  • Experience using GitHub and Git Bash for version control.
  • Worked in Agile methodology and SDLC development models.
  • Involved in daily Scrum meetings to discuss development progress and was active in making Scrum meetings more productive.
  • Good knowledge of data science libraries like NumPy, Scikit, PyMySQL, Quandl, and SQLAlchemy used in statistical modelling and machine learning.
  • Worked closely with business teams to gather and fully understand business requirements.
  • Strong background in mathematics with strong analytical and problem-solving skills.
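
Illustrative sketch (not from any specific project): a minimal PySpark example of the DataFrame/SparkSQL work summarized above, using a broadcast join and a partitioned Parquet write; the paths, table names, and columns are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: enrich a large fact table with a small lookup table.
    spark = SparkSession.builder.appName("order-enrichment").getOrCreate()

    orders = spark.read.parquet("/data/raw/orders")        # large dataset
    regions = spark.read.parquet("/data/lookup/regions")   # small lookup table

    # Broadcast the small table to avoid shuffling the large side of the join.
    enriched = orders.join(F.broadcast(regions), on="region_id", how="left")

    # Example transformation and aggregation using the DataFrame API.
    daily_totals = (enriched
                    .withColumn("order_date", F.to_date("order_ts"))
                    .groupBy("order_date", "region_name")
                    .agg(F.sum("amount").alias("total_amount")))

    # Write partitioned Parquet so downstream Hive/Spark queries can prune partitions.
    daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
        "/data/curated/daily_totals")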

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Hive, Sqoop, Spark

Programming languages & Scripting: Python, Java, C++, Linux shell scripts

RDBMS & Data Warehouses: MySQL, Oracle, T-SQL, Snowflake

NoSQL Databases: Cassandra, HBase

Cloud Platforms: AWS EC2, AWS S3, AWS EMR, GCP

Web Technologies: HTML, CSS, PHP, XML, JavaScript, AJAX

Data Visualization: Power BI, Tableau

Data Science libraries: NumPy, Scikit, Pandas, Matplotlib, Seaborn, PyMySQL, SQLAlchemy, Quandl, Pandas-datareader

IDEs: Eclipse, Visual Studio, PyCharm, Spyder

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential | Beaverton, OR

Responsibilities:

  • Designed, developed, and enhanced PySpark scripts for several data implementation pipelines
  • Handled remediation work in backend processes to support new Confidential user-centric consumer changes
  • Performed Hive and Spark tuning using partitioning concepts on Parquet files
  • Executed several testing scenarios using SparkSQL, Hive, and Snowflake
  • Scheduled Spark jobs on EMR clusters using Airflow (see the sketch at the end of this section)
  • Worked on materialized views, tables, and test scenarios in Snowflake
  • Experienced in handling data on different platforms such as HDFS, S3, Snowflake, and Hive, in multiple file formats including CSV, text, Parquet, and ORC
  • Evaluated and analyzed business data across several factors relevant to pipeline development
  • Worked closely with analysts and other data engineers on requirement gathering and on analyzing data inconsistencies and anomalies, and actively took part in providing solutions for any issues
  • Worked in a large team setup using Agile methodology and participated in stand-up calls, planning, and refinement activities

Environment: Spark, Python, Hive, SQL, Snowflake, HDFS, Airflow, Agile
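
Illustrative sketch of the Airflow scheduling mentioned above, assuming Airflow 2.x; a BashOperator wrapping spark-submit stands in for whatever EMR submission mechanism the project actually used, and the DAG id, schedule, and script path are hypothetical.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical DAG: runs a PySpark pipeline once a day via spark-submit.
    default_args = {"owner": "data-eng", "retries": 1,
                    "retry_delay": timedelta(minutes=10)}

    with DAG(
        dag_id="daily_snowflake_load",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:

        run_spark_job = BashOperator(
            task_id="run_pyspark_pipeline",
            bash_command=(
                "spark-submit --deploy-mode cluster "
                "--num-executors 10 --executor-memory 4g "
                "/opt/pipelines/load_daily_orders.py --run-date {{ ds }}"
            ),
        )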

Data Engineer

Confidential | Las Vegas, NV

Responsibilities:

  • Designed and developed data integration workflows on big data technologies and platforms - Hadoop, Spark, MapReduce, Hive, HBase
  • Performed Hive and Spark tuning with partitioning and bucketing on Parquet files and tuned executor/driver memory
  • Developed Hive queries and used Sqoop to move data from RDBMS sources to the Hadoop staging area
  • Handled importing of data from various data sources, performed transformations using Hive, and loaded data into the data lake
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective & efficient joins, transformations, and other operations
  • Processed data stored in the data lake, created external tables using Hive, and developed reusable scripts to ingest and repair tables across the project (see the sketch at the end of this section)
  • Developed dataflows and processes using SparkSQL & Spark Data Frames
  • Worked on Hive Metastore backups and on partitioning and bucketing techniques in Hive to improve performance
  • Scheduled Spark jobs using Airflow workflows on the Hadoop cluster and generated detailed design documentation for the source-to-target transformations
  • Worked briefly on a GCP POC during the project, migrating our tools from AWS to GCP
  • Worked in Agile methodology and actively participated in stand-up calls, PI planning, and work reporting
  • Involved in requirement gathering and prepared design documents

Environment: Spark, Python, Sqoop, Hive, Hadoop, SQL, HBase, MapReduce, HDFS, Airflow, Agile
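
Illustrative sketch of registering and repairing an external Hive table over staged data, shown here through PySpark's Hive support; the database, table, columns, and HDFS location are hypothetical.

    from pyspark.sql import SparkSession

    # Hypothetical sketch: register an external Hive table over data landed in
    # the staging area, then repair partitions so new ingests become queryable.
    spark = (SparkSession.builder
             .appName("staging-external-table")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DECIMAL(12,2)
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION '/data/staging/orders'
    """)

    # Pick up any partitions written directly to HDFS by the ingestion job.
    spark.sql("MSCK REPAIR TABLE staging.orders")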

Hadoop Developer

Confidential

Responsibilities:

  • Responsible for managing data coming from different RDBMS source systems like Oracle, MySQL, and Teradata, and involved in maintaining structured data within HDFS in various file formats such as Parquet, Avro, and ORC for optimized storage patterns.
  • Improved data processing and storage throughput by using the Cloudera Hadoop framework for distributed computing across a cluster of up to seventeen nodes.
  • Analyzed and transformed stored data by writing Spark jobs (using window functions such as rank, row_number, lead, and lag) to allow downstream reporting and analytics based on business requirements (see the sketch at the end of this section).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Optimized Hive queries using various file formats such as Parquet, JSON, and Avro.
  • Worked with various compression formats in HDFS like Snappy.
  • Executed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Used PySpark for extracting, cleaning, transforming, and loading data into Hive data warehouse.
  • Experienced with partitioning and bucketing concepts in Hive, designed using managed and external tables.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective & efficient joins and transformations during the ingestion process.
  • Experience writing Sqoop jobs to move data from various RDBMS into HDFS and vice versa.
  • Worked with Oozie workflow engine to schedule time-based jobs to perform multiple actions.
  • Developed ETL pipelines to source data to Business intelligence teams to build visualizations.
  • Involved in unit testing, interface testing, system testing and user acceptance testing of the workflow.
  • Involved in Agile methodologies, daily Scrum meetings, and sprint planning.

Environment: Oracle, MySQL, Teradata, Cloudera, Hive, Spark, Sqoop, PySpark, Oozie.
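
Illustrative sketch of the Spark window functions mentioned above (rank, row_number, lead, lag), written in PySpark; the table and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Hypothetical sketch: window functions used to prepare data for reporting.
    spark = (SparkSession.builder
             .appName("window-functions")
             .enableHiveSupport()
             .getOrCreate())

    txns = spark.table("warehouse.transactions")   # assumed Hive table

    w = Window.partitionBy("account_id").orderBy(F.col("txn_ts"))

    ranked = (txns
              .withColumn("row_num", F.row_number().over(w))   # sequence per account
              .withColumn("rnk", F.rank().over(w))             # rank with ties
              .withColumn("prev_amount", F.lag("amount", 1).over(w))
              .withColumn("next_amount", F.lead("amount", 1).over(w)))

    # Persist the enriched dataset for downstream reporting and analytics.
    ranked.write.mode("overwrite").saveAsTable("warehouse.transactions_enriched")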

Java Developer

Confidential

Responsibilities:

  • Created the database, user, environment, activity, and class diagrams for the project (UML).
  • Designed and developed front end websites and applications using Java, JavaScript, HTML, DHTML, CSS.
  • Participated in reviews of design documents, functional specifications, and code developed by other team members.
  • Implemented the database using the Oracle database engine.
  • Involved in designing the database tables in Oracle.
  • Wrote Oracle stored procedures (PL/SQL) and called them using JDBC.
  • Involved in unit and integration testing, bug fixing, and writing test cases.
  • Fixed the bugs reported in User Testing and deployed the changes to the server.
  • Designed, implemented, and maintained Java application code within all phases of the Software Development Life Cycle (SDLC).

Environment: Java, JavaScript, HTML, Oracle.
