
Data Engineer Resume


Houston, TX

SUMMARY

  • 7+ years of professional software development experience with specialization in Big Data engineering and analytics and Java projects.
  • Hands-on experience working with Spark and Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Hive, Kafka, Oozie, YARN, Impala, Pig, Flume, and NoSQL databases like HBase.
  • Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
  • Strong experience working with both batch and streaming processing using the Spark framework.
  • Good experience working with Kafka clusters for storing real-time streaming data and writing custom Kafka producers and Spark streaming consumers (a minimal consumer sketch appears after this summary).
  • Experience in installation, configuration, and monitoring of Hadoop clusters, both on-premises and in the cloud.
  • Strong experience building data lakes in the AWS cloud using services such as S3, EMR, Glue Metastore, Athena, Redshift, and Step Functions.
  • Strong experience and knowledge of real-time data analytics using Kafka and Spark Streaming.
  • Expertise in developing production-ready Spark applications using the Spark RDD, DataFrame, Spark SQL, and Spark Streaming APIs.
  • Strong hands-on knowledge of Scala and Python for developing Spark applications.
  • Good experience troubleshooting data pipeline failures and identifying bottlenecks in long-running pipelines.
  • Good experience productionizing and automating end-to-end data pipelines and enabling downstream applications to consume data from data lakes in the most optimized fashion.
  • Strong experience working with various file formats such as Parquet, ORC, Avro, and JSON.
  • Strong experience using Hive features such as managed and external tables, partitioning, and bucketing (a DDL sketch appears after this summary).
  • Extended Hive core functionality by writing custom UDFs for data analysis.
  • Responsible for developing multiple Kafka producers and consumers from scratch per the software requirement specifications.
  • Proficient in importing/exporting data between RDBMS and HDFS using Sqoop.
  • Hands-on experience with Apache NiFi and Apache Airflow.
  • Created workflows and ran DAGs using Apache Airflow.
  • Hands-on experience creating Docker containers for microservice REST applications.
  • Strong experience working with Core Java and Spring Boot for developing REST APIs, along with JDBC, JEE technologies, and Servlets.
  • Experience with version control systems such as SVN and Git/GitHub and issue tracking tools like Jira.
  • Extensive experience working with relational databases such as PostgreSQL, Teradata, and MySQL.
  • Worked in Agile/Scrum software development environments.
  • Ability to meet deadlines and handle pressure while coordinating multiple tasks in the work environment.
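
As a representative sketch of the Kafka-plus-Spark-Streaming pattern above, the following minimal PySpark Structured Streaming consumer reads JSON events from a Kafka topic and lands them as Parquet; the broker, topic, schema, and S3 paths are hypothetical placeholders, not details of any specific project.

    # Minimal consumer sketch: Kafka -> Structured Streaming -> Parquet on S3.
    # Broker, topic, schema, and paths below are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StringType, TimestampType

    spark = SparkSession.builder.appName("events-consumer").getOrCreate()

    schema = (StructType()
              .add("user_id", StringType())
              .add("event_type", StringType())
              .add("event_ts", TimestampType()))

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
           .option("subscribe", "user-events")                  # placeholder topic
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers raw bytes; parse the JSON payload into typed columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
                 .select(from_json(col("json"), schema).alias("e"))
                 .select("e.*"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "s3://example-datalake/events/")                # placeholder path
             .option("checkpointLocation", "s3://example-datalake/chk/events/")
             .outputMode("append")
             .start())
    query.awaitTermination()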
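
Similarly, a minimal sketch of the Hive table features called out above (external tables, partitioning, bucketing), issued here through a Hive-enabled SparkSession; the database, table, columns, and location are assumptions, and the same DDL could be run from the Hive CLI or Beeline instead.

    # Hive DDL sketch: external table with partitioning and bucketing.
    # Names and the S3 location are illustrative placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-ddl")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.user_events (
            user_id    STRING,
            event_type STRING,
            event_ts   TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS PARQUET
        LOCATION 's3://example-datalake/warehouse/user_events/'
    """)

    # Register any partition directories already present under the table location.
    spark.sql("MSCK REPAIR TABLE analytics.user_events")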

TECHNICAL SKILLS

Big Data Tools: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper, Ambari, Storm, Spark, and Kafka

NoSQL: HBase, Cassandra, MongoDB

Build and Deployment Tools: Maven, sbt, Git, SVN, Jenkins

Programming and Scripting: Java, Scala, Python, SQL, Shell Scripting, Pig Latin, HiveQL

Databases: Teradata, Redshift, Oracle, MySQL, PostgreSQL

Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript

AWS Services: EC2, EMR, S3, Redshift, Lambda, Glue, Simple Workflow, Athena

PROFESSIONAL EXPERIENCE

Confidential, Houston, TX

Data Engineer

Responsibilities:

  • Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
  • Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
  • Developed Scala-based Spark applications for performing data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed by the machine learning and reporting teams.
  • Worked on troubleshooting Spark applications to make them more fault tolerant.
  • Worked on fine-tuning Spark applications to improve overall pipeline processing time.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the producer sketch after this section).
  • Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to Snowflake.
  • Experienced in handling large datasets using Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other features.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
  • Experience working with EMR clusters in the AWS cloud and with S3, Redshift, and Snowflake.
  • Involved in creating Hive tables and loading and analyzing data using Hive scripts.
  • Implemented Partitioning, Dynamic Partitions, Bucketing in Hive.
  • Good experience with continuous integration of applications using Bamboo.
  • Used reporting tools like Tableau, connected to Impala, for generating daily data reports.
  • Collaborated with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
  • Documented and tracked operational problems following standards and procedures using JIRA.

Environment: Hadoop, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, Amazon EMR, YARN, JIRA, AWS, Shell Scripting, SBT, GitHub, Maven.
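
A minimal sketch of the REST-to-Kafka producer pattern from this role, using the kafka-python client; the endpoint, broker, topic, and polling interval are hypothetical placeholders, not the actual project specifics.

    # Producer sketch: poll an external REST API and publish each record to Kafka.
    import json
    import time

    import requests
    from kafka import KafkaProducer  # kafka-python client

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],                        # placeholder broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # JSON-encode values
    )

    while True:
        resp = requests.get("https://api.example.com/v1/events")   # placeholder endpoint
        resp.raise_for_status()
        for record in resp.json():
            producer.send("user-events", value=record)             # placeholder topic
        producer.flush()
        time.sleep(30)                                              # simple polling interval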

Confidential, Richmond, VA

Big Data Developer

Responsibilities:

  • Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations.
  • Used Spark for implementing the transformations on the historic data.
  • Used PySpark with Python scripting and Spark libraries for data analysis and aggregation; utilized DataFrames and the Spark SQL API for processing data.
  • Used the Spark programming API over an EMR cluster (Hadoop YARN) to meet various data processing requirements.
  • Ran DAGs using Apache Airflow to structure batch jobs efficiently (see the DAG sketch after this section).
  • Developed Spark Scala applications using RDDs, DataFrames, and Spark SQL for data aggregation and queries, writing data back into an OLTP system via Spark JDBC (illustrated after this section).
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Configured Spark Streaming to receive real-time data from Kafka and store the processed stream data back to Kafka.
  • Experienced in writing real-time processing applications using Spark Streaming with Kafka.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Worked extensively with S3 buckets in AWS.

Environment: Spark, Spark-Streaming, Spark SQL, AWS EMR, S3, Hive, Apache Kafka, Java, Scala, Shell scripting, Jenkins, Eclipse, Git, Tableau, MySQL and Agile Methodologies.
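
A minimal sketch of the kind of Airflow DAG referenced in this role, chaining an ingest step and a spark-submit step with BashOperator; the DAG id, connection string, paths, and schedule are illustrative assumptions rather than the project's actual workflow.

    # DAG sketch: ingest from an RDBMS, then run a Spark aggregation job.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

    with DAG(
        dag_id="daily_events_batch",          # placeholder DAG name
        default_args=default_args,
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        ingest = BashOperator(
            task_id="ingest_from_rdbms",
            bash_command=("sqoop import --connect jdbc:mysql://db-host:3306/sales "
                          "--table orders --target-dir /data/raw/orders"),  # placeholder source
        )

        aggregate = BashOperator(
            task_id="spark_aggregate",
            bash_command="spark-submit --master yarn /jobs/aggregate_orders.py",  # placeholder job
        )

        ingest >> aggregate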
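
And a minimal PySpark sketch of the aggregation-and-JDBC-writeback pattern mentioned above; the input path, grouping columns, and OLTP connection details are placeholders.

    # Aggregation sketch: read curated data, aggregate, write back to an OLTP database.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-aggregation").getOrCreate()

    orders = spark.read.parquet("s3://example-datalake/curated/orders/")  # placeholder path

    daily_totals = (orders
                    .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
                    .agg(F.sum("amount").alias("total_amount"),
                         F.count("*").alias("order_count")))

    (daily_totals.write
     .format("jdbc")
     .option("url", "jdbc:postgresql://oltp-host:5432/reporting")  # placeholder connection
     .option("dbtable", "daily_customer_totals")
     .option("user", "etl_user")
     .option("password", "********")
     .mode("append")
     .save())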

Confidential, Jersey City, NJ

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Cloudera distribution.
  • Converted raw data into serialized formats such as Avro and Parquet to reduce data processing time and increase data transfer efficiency across the network (see the sketch after this section).
  • Worked on building end-to-end data pipelines on Hadoop data platforms.
  • Worked on normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Designed, developed, and tested Extract, Transform, Load (ETL) applications with different types of sources.
  • Created files and tuned SQL queries in Hive using Hue; implemented MapReduce jobs through Hive by querying the available data.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Used PySpark with Python scripting and Spark libraries for data analysis.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Worked on building custom ETL workflows using Spark/Hive to perform data cleaning and mapping.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Supported the cluster and topics via Kafka Manager; performed CloudFormation scripting, security, and resource automation.

Environment: Python, Cloudera, HDFS, MapReduce, Flume, Kafka, Zookeeper, Pig, Hive, HQL, HBase, Spark, ETL, REST Services.
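
A minimal PySpark sketch of the raw-to-columnar conversion described in this role; the input format, paths, and partition column are assumptions for illustration.

    # Conversion sketch: delimited raw files -> partitioned Parquet (Avro is analogous).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("raw-to-parquet").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("hdfs:///data/raw/transactions/"))       # placeholder input path

    (raw.write
     .mode("overwrite")
     .partitionBy("txn_date")                            # assumes a txn_date column exists
     .parquet("hdfs:///data/processed/transactions/"))   # placeholder output path

    # Avro variant (needs the spark-avro package on the classpath):
    # raw.write.format("avro").save("hdfs:///data/processed/transactions_avro/")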

Confidential

Hadoop/Java Developer

Responsibilities:

  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Developed Java MapReduce programs for the analysis of sample log files stored in the cluster (a streaming-style sketch appears after this section).
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Developed industry-specific user defined functions (UDFs).
  • Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Migrated ETL processes from RDBMS to Hive to test easier data manipulation.
  • Developed Hive queries to process the data for downstream data analysis.

Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie and SQL.
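
The log-analysis MapReduce jobs in this role were written in Java; as an illustrative Python stand-in, the following Hadoop Streaming mapper and reducer count lines per log level. The log line layout (level in the third whitespace-separated field) and the submit command are assumptions.

    # mapper.py -- emit (log_level, 1) for each log line
    import sys

    for line in sys.stdin:
        parts = line.split()
        if len(parts) >= 3:
            print(f"{parts[2]}\t1")  # assumes lines like "2015-05-01 12:00:01 ERROR ..."

    # reducer.py -- sum counts per log level (input arrives sorted by key)
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

    # Submitted roughly as (jar path varies by Hadoop install):
    #   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    #     -files mapper.py,reducer.py -mapper "python mapper.py" -reducer "python reducer.py" \
    #     -input /logs/sample -output /logs/level-counts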

Confidential

Java Developer

Responsibilities:

  • Designed and developed applications using the Spring MVC framework with Agile methodology.
  • Developed JSP and HTML pages using CSS and JavaScript as part of the presentation layer.
  • Used the Hibernate framework in the persistence layer to map the object-oriented domain model to the database.
  • Developed database schema and SQL queries for querying, inserting, and managing database.
  • Implemented various design patterns in the project such as Data Transfer Object, Data Access Object and Singleton.
  • Used Git for Source Code Management.
  • Used Maven scripts to fetch, build, and deploy the application to the development environment.
  • Created a RESTful web service interface to a Java-based runtime engine.
  • Used Apache Tomcat for deploying the application.
  • Used JUnit for functional and unit testing of code.

Environment: Eclipse IDE, Java/J2EE, Spring, Hibernate, JSP, HTML, CSS, JavaScript, Maven, RESTful Web services, Apache Tomcat, Oracle, JUnit, Git
