
Data Engineer Resume


Bentonville, AR

PROFESSIONAL SUMMARY:

  • More than 8 years of IT experience in the Software Development Life Cycle (Analysis, Design, Development, Testing, Deployment and Support) using Waterfall and Agile methodologies.
  • 4+ years of experience in data analysis using Hadoop ecosystem components (Spark, HDFS, MapReduce, Sqoop, Hive) in the Retail, Financial and Healthcare sectors.
  • Experience with NoSQL databases like HBase and Cassandra.
  • Hands-on experience with SequenceFile, RCFile, Avro and Parquet file formats.
  • Experience running Hive scripts and writing Unix/Linux shell scripts.
  • Designed Hive queries for data analysis and data transfer, and designed tables to load data into the Hadoop environment.
  • Implemented Sqoop for transferring large datasets between RDBMS and HDFS in both directions.
  • Experience with the Oozie data workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Hands-on experience in designing and developing applications in Spark using Scala and Python.
  • Experience in developing Scala scripts to run on Spark clusters.
  • Created partitions and buckets when creating Hive tables and used columnar formats such as Parquet and ORC to store the data.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Strong experience working with Spark Dataframes, Spark SQL, Spark ML and Spark Streaming APIs.
  • Developed Kafka producers to publish real-time streaming feeds into Kafka topics.
  • Developed Spark Streaming applications to consume JSON messages from Kafka topics and write them to HBase (see the sketch after this list).
  • Extensive knowledge of streaming and ingestion platforms such as Flume and Kafka.
  • Strong experience troubleshooting failures in Spark applications and fine-tuning them for better performance.
  • Profound experience working with Cloudera and Hortonworks Hadoop distributions on multi-node clusters.
  • Used Qlik Sense Cloud to create interactive reports and dashboards with charts and graphs.
  • Involved in Agile methodologies, daily scrum meetings, sprint planning.
  • Experience in using SQL Server 2012/2014/2016, MySQL, PostgreSQL, SQLite3 and Oracle.
  • Experience in using IDEs like Eclipse, IntelliJ.
  • Hands-on experience writing queries, stored procedures, functions and triggers in SQL.
  • Proficient with Git, Jenkins and Maven.
  • Enthusiastic and quick to learn new applications and tools, and willing to take on individual responsibilities; a good team player with a strong ability to learn and adapt to new skills.
  • Good analytical, communication and problem-solving skills, and enjoy learning new technical and functional skills.
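
As one illustration of the Kafka-to-HBase pattern referenced above, here is a minimal PySpark Structured Streaming sketch. The broker address, topic, schema, HBase host, table and column-family names are illustrative assumptions, and it presumes the spark-sql-kafka connector on the classpath and an HBase Thrift server reachable by happybase:

    # Hedged sketch: Kafka JSON feed -> Spark Structured Streaming -> HBase.
    # Broker, topic, table and column-family names below are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("kafka-to-hbase").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("payload", StringType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "consumer-events")
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON payload
    events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    def write_partition(rows):
        import happybase
        conn = happybase.Connection("hbase-thrift-host")  # hypothetical host
        table = conn.table("events")
        for r in rows:
            table.put(r.event_id.encode("utf-8"),
                      {b"d:payload": (r.payload or "").encode("utf-8")})
        conn.close()

    def write_batch(batch_df, batch_id):
        # foreachBatch hands us a normal DataFrame per micro-batch
        batch_df.foreachPartition(write_partition)

    query = events.writeStream.foreachBatch(write_batch).start()
    query.awaitTermination()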

AREAS OF EXPERTISE:

Big Data Ecosystem: HDFS, MapReduce, Hive, Impala, YARN, HUE, Oozie, Zookeeper, Solr, Apache Spark, Apache Kafka, Sqoop, Flume.

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: Scala, Python, HiveQL, Ruby on Rails, C, C++, Java.

Scripting Languages: Shell Scripting, JavaScript

BI Tools: Qlik Sense Cloud, Power BI.

Databases: SQL Server, Oracle, Teradata, DB2, PostgreSQL, MySQL, SQLite3

Cluster Management: Hortonworks, Cloudera Manager

Operating Systems: Windows, Mac, Unix, Linux

Version Control Tools: SVN, GitHub, Bitbucket, GitLab.

PROFESSIONAL EXPERIENCE:

Data Engineer

Confidential, Bentonville, AR

Responsibilities:

  • Developed TDCH (Teradata Connector for Hadoop) scripts for importing and exporting data between Teradata and HDFS/Hive.
  • Used the Fair Scheduler to allocate resources in YARN.
  • Responsible for managing data coming from different sources.
  • Scheduled automated jobs using Cron scheduler.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Read ORC files and created DataFrames for use in Spark (see the sketch after this list).
  • Performed data transformations and analytics on large datasets using Spark.
  • Experienced working with Spark Core and Spark SQL using Python (PySpark).
  • Performed performance optimizations on Spark/Python jobs.
  • Experience in developing Python scripts to run on the Spark cluster.
  • Used the Python collections framework to store and process complex consumer information.
  • Integrated Spark jobs with the MLP platform.
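
A minimal PySpark sketch of the ORC-to-DataFrame flow referenced above; the HDFS paths, view name and column names are illustrative assumptions:

    # Hedged sketch: read ORC files into a DataFrame and transform with Spark SQL.
    # Paths, view and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("orc-analytics")
             .enableHiveSupport()
             .getOrCreate())

    # Read ORC files from HDFS into a DataFrame
    sales = spark.read.orc("hdfs:///data/sales/orc/")

    # Register a temp view so transformations can be expressed in Spark SQL
    sales.createOrReplaceTempView("sales")
    daily = spark.sql("""
        SELECT sale_date, store_id, SUM(amount) AS total_amount
        FROM sales
        GROUP BY sale_date, store_id
    """)

    # Write the aggregate back out, partitioned by date
    (daily.write.mode("overwrite")
          .partitionBy("sale_date")
          .orc("hdfs:///data/sales/daily/"))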

Environment: Hadoop, HDFS, Hive, Spark, Python, Oozie, Cron, Teradata, YARN, Unix, Hortonworks, TDCH, Spark SQL.

Hadoop/ Spark Developer

Confidential, Russellville, AR

Responsibilities:

  • Extracted the data from RDBMS into HDFS using Sqoop.
  • Developed UDFs for Hive and wrote complex Hive queries for data analysis.
  • Created tables in Cassandra to store data arriving in varying formats from different portfolios (see the sketch after this list).
  • Used ETL processes to load data from flat files into the target database, applying business logic in the transformation mappings to insert and update records during the load.
  • Imported data from different sources like HDFS/Hive into Spark RDDs.
  • Experienced working with Spark Core and Spark SQL using Scala, and developed Scala scripts to run on the Spark cluster.
  • Used the Scala collections framework to store and process complex consumer information.
  • Used Scala to implement a fault-tolerance mechanism that handles various types of error messages and reprocesses them without concurrency issues.
  • Worked with different file formats: Avro, RC and ORC.
  • Created and worked on Sqoop jobs with incremental load to populate Hive External tables.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Performed data transformations and analytics on large dataset using Spark.
  • Integrated BI tools like Qlik Sense Cloud with Impala and analyzed the data.
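
A hedged sketch of the Cassandra table design mentioned above, using the DataStax Python driver; the contact point, keyspace, table and column names are illustrative assumptions:

    # Hedged sketch: Cassandra keyspace/table for variable-format portfolio data.
    # Contact point, keyspace and column names are hypothetical.
    import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(["cassandra-host"])
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS portfolios
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS portfolios.positions (
            portfolio_id text,
            as_of_date   date,
            fmt          text,   -- source format, e.g. 'avro', 'rc', 'orc'
            payload      text,   -- raw record as delivered by the portfolio feed
            PRIMARY KEY (portfolio_id, as_of_date)
        )
    """)

    # Insert one record; the driver binds positional parameters with %s
    session.execute(
        "INSERT INTO portfolios.positions (portfolio_id, as_of_date, fmt, payload) "
        "VALUES (%s, %s, %s, %s)",
        ("pf-001", datetime.date(2017, 6, 1), "json", '{"ticker": "ABC", "qty": 100}'),
    )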

Environment: Hadoop, HDFS, Sqoop, Hive, Cassandra, Scala, Spark, Kafka, Linux, Qlik Sense Cloud, SQL.

Hadoop Developer

Confidential, Dallas, TX

Responsibilities:

  • Implemented real time data pipelines using Kafka and Spark Streaming.
  • Configured Flume to transport web server logs into HDFS.
  • Developed Spark applications to perform data preparation and other analytics on data.
  • Worked extensively with Databricks cloud platform over AWS.
  • Developed multiple Kafka producers and consumers as per the specifications (see the sketch after this list).
  • Configured Spark Streaming to receive real-time data and store the streamed data in S3.
  • Explored Spark to improve the performance and optimize existing algorithms in Hadoop.
  • Experienced with Spark Core, Spark SQL, DataFrames, RDDs and YARN.
  • Designed and developed Hive tables to store staging and historical data.
  • Created Hive internal and external tables as per requirements, defined with appropriate static and dynamic partitions for efficiency.
  • Experience in using the Parquet file format with Snappy compression for optimized storage of Hive tables.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process. Designed and implemented Java MapReduce programs to support distributed data processing.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Developed Sqoop jobs with incremental load to populate Hive External tables.
  • Worked with Amazon Web Services (AWS) cloud services such as EC2, EMR, S3 and Redshift.
  • Involved in setting up and managing sessions; responsible for mentoring peers and leading technical design.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
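
A minimal kafka-python sketch of the producer/consumer pair referenced above; the broker address and topic name are illustrative assumptions:

    # Hedged sketch: JSON producer and consumer with kafka-python.
    # Broker and topic names are hypothetical.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("clickstream", {"user_id": "u42", "page": "/home"})
    producer.flush()

    consumer = KafkaConsumer(
        "clickstream",
        bootstrap_servers=["broker1:9092"],
        auto_offset_reset="earliest",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)  # e.g. {'user_id': 'u42', 'page': '/home'}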

Environment: Databricks, Hadoop, S3, Hive, Pig, Spark, Scala, Sqoop, Flume, HBase, YARN, RDBMS, Oozie.

Java/ Hadoop Developer

Confidential, Herndon, VA

Responsibilities:

  • Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
  • Installed Cloudera Manager on the clusters.
  • Used a 15-node cluster on Amazon EC2.
  • Developed ad-clicks based data analytics, for keyword analysis and insights.
  • Crawled public posts from Facebook and tweets from Twitter.
  • Used Solr search engine to search multiple sites and return recommendations.
  • Used Flume and Kafka to get the streaming data from Twitter and Facebook.
  • Used MongoDB to capture streaming data.
  • Worked on MongoDB using CRUD (Create, Read, Update and Delete) operations and its indexing, replication and sharding features (see the sketch after this list).
  • Wrote MapReduce jobs with the Data Science team to analyze this data.
  • Converted the output to structured data and imported it into Informatica with the analytics team.
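
A short pymongo sketch of the MongoDB CRUD and indexing work referenced above; the connection string, database and collection names are illustrative assumptions:

    # Hedged sketch: MongoDB CRUD operations plus an index, via pymongo.
    # Connection string, database and collection names are hypothetical.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    posts = client.social.posts

    # Create
    posts.insert_one({"source": "twitter", "user": "u42", "text": "hello"})
    # Read
    doc = posts.find_one({"user": "u42"})
    # Update
    posts.update_one({"user": "u42"}, {"$set": {"text": "hello world"}})
    # Delete
    posts.delete_one({"user": "u42"})

    # Index frequent lookups by user to keep reads fast as the stream grows
    posts.create_index("user")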

Environment: Hadoop, MongoDB, HDFS, MapReduce, Flume, Java, Informatica, Cloudera Manager, Amazon EC2, Solr.

Java Developer

Confidential

Responsibilities:

  • Gathered requirements from end users and created functional requirements.
  • Contributed to process flow analysis of the functional requirements.
  • Developed the graphical user interface for the user self-service screen.
  • Implemented the four-eyes principle and created a quality-check process reusable across all workflows at the overall platform level.
  • Developed UI models using HTML, JSP, JavaScript, Web Link and CSS.
  • Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
  • Supported end users, testing and documentation.
  • Implemented backing beans for handling UI components and storing their state in a scope.
  • Worked on implementing EJB stateless session beans for communicating with the controller.
  • Implemented database integration using Hibernate, and utilized Spring with Hibernate for mapping to the Oracle database.
  • Worked on Oracle PL/SQL queries to select, update and delete data.
  • Worked with Maven for build automation. Used Git for version control.

Environment: Java, J2EE, JSP, Maven, Linux, CSS, Git, Oracle, XML, SAX, Rational Rose, UML.
