
Data Engineer Resume


Honolulu, HI

SUMMARY

  • Certified AWS and Spark developer with more than 5 years of software development experience.
  • Thorough understanding of the Hadoop ecosystem components and how they work together (MapReduce, Pig, Hive, HBase, Flume, ZooKeeper, and Oozie).
  • Experience supporting and monitoring Hadoop clusters on the Hortonworks distribution (HDP 2.x).
  • Experience importing and exporting data between relational databases and Hive tables using Sqoop.
  • Worked in the Spark shell, writing Python scripts to transform data to business requirements (see the PySpark sketch after this list).
  • Created HBase tables and loaded large data sets from relational and Hive tables.
  • Used Hive and Pig to analyze data.
  • Created custom UDFs in Python/Java to extend Pig Latin and HiveQL processing.
  • Troubleshot errors in MapR, the HBase shell, Hive, and other ecosystem components.
  • Experience with Oozie for task scheduling and ZooKeeper for coordinating cluster resources.
  • Good knowledge of Spark and Scala.
  • Prepared visual reports and interpreted and analyzed cleaned data using Tableau.
  • Hands-on experience with Spark Core, Spark SQL queries, and Kafka with Spark Streaming.
  • Well versed in writing SQL queries, stored procedures, functions, cursors, indexes, triggers, and packages.
  • Worked with file formats such as text, Parquet, XML, and JSON, and have extensive experience with SOAP and REST web services.
  • Knowledge of the Spark Machine Learning (ML) libraries.
  • Worked with the Python pandas library for data analysis.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using the Cloudera and Hortonworks managers.
  • Good knowledge of all aspects of the Software Development Life Cycle (analysis, system design, development, testing, and maintenance) using Waterfall and Agile methodologies.
  • Quick learner who masters new technologies, with excellent communication and interpersonal skills.
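
As referenced above, here is a minimal PySpark sketch of the kind of shell-driven transformation described in the Spark bullet; the paths, column names, and filter rule are hypothetical placeholders rather than details from this resume.

    # Hypothetical PySpark cleanup job; paths and column names are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("order-cleanup").getOrCreate()

    # Read raw JSON records exported from a relational source.
    orders = spark.read.json("hdfs:///data/raw/orders/")

    # Business rule: keep completed orders and normalize the amount column.
    cleaned = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .withColumn("amount_usd", F.col("amount").cast("double"))
        .select("order_id", "customer_id", "amount_usd", "order_date")
    )

    # Persist as Parquet for downstream Hive and Spark SQL queries.
    cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/orders/")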

TECHNICAL SKILLS

Big Data/Hadoop: HDFS, MapReduce, YARN, Solr, Hive, Impala, Pig, Sqoop, Oozie, Flume, ZooKeeper, HBase, Kafka

Apache Spark: Spark Core, Spark SQL, Spark Streaming

AWS Services: EC2, S3, EMR, Route 53

Hadoop Distributions: Cloudera and Hortonworks

Java/J2EE Technologies: Java, J2EE, Servlets, JDBC, XML, AJAX, REST

Frameworks: Hibernate, Spring

Programming Languages: Java, C, C++, Python, Linux shell scripts

NoSQL DB Technologies: HBase, Cassandra, MongoDB

Database: Oracle, MySQL, Hive

Web Technologies: HTML5, CSS, XML, JavaScript

Operating Systems: Ubuntu (Linux), Windows, Mac OS, CentOS

PROFESSIONAL EXPERIENCE

Confidential, Honolulu, HI

Data Engineer

Responsibilities:

  • Involved in building an enterprise data lake on HDP 2.x in an Agile development environment.
  • Migrated the existing architecture to Spark Streaming to process live streaming data.
  • Managed nodes and monitored Hadoop cluster job performance using the Hortonworks manager.
  • Executed Hadoop/Spark jobs on AWS EMR, with programs and data stored in S3 buckets.
  • Wrote Spark applications in Python to process semi-structured and unstructured data such as XML, JSON, and Avro files and sequence files of logs.
  • Wrote Spark code in Python for Spark Streaming and Spark SQL for faster data processing; experienced with both streaming and batch processing in Spark.
  • Configured Kafka sources and sinks to ingest real-time data.
  • Built a record-based storage layer using HBase that enables fast random reads and writes.
  • Developed Pig Latin and HiveQL scripts for data analysis and created user-defined functions (UDFs) to extend the default data-processing functionality.
  • Fetched data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created DataFrames in Spark SQL.
  • Performed SQL joins among Hive tables to produce input for Spark batch processing.
  • Used the Pig loader to load tables from Hadoop into different clusters.
  • Serialized complex objects using the Avro, Parquet, JSON, and CSV formats.
  • Ported HiveQL to Impala to minimize query response time.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked with them using HiveQL.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions, building a common learner data model that consumes data from Kafka in near real time and persists it to Cassandra (see the streaming sketch after this list).
  • Prepared Spark scripts in Python and Scala shell commands as required.
  • Loaded data from external sources such as Amazon EC2, S3, and MySQL databases.
  • Scheduled Oozie to import data at set intervals across the Hadoop stack.
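
The Kafka-to-Cassandra bullet above maps onto a standard Spark Structured Streaming pattern. The sketch below is illustrative only: the broker address, topic, keyspace, and table names are hypothetical, and it assumes the Spark Kafka source and the DataStax spark-cassandra-connector are on the classpath.

    # Hypothetical streaming job; all names and addresses are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("learner-stream").getOrCreate()

    # Consume JSON events from Kafka in near real time.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "learner-events")
        .load()
        .selectExpr("CAST(value AS STRING) AS json")
        .select(F.json_tuple("json", "learner_id", "event", "ts")
                .alias("learner_id", "event", "ts"))
    )

    # Persist each micro-batch to Cassandra through the connector.
    def write_batch(batch_df, batch_id):
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="analytics", table="learner_events")
         .mode("append")
         .save())

    query = (events.writeStream
             .option("checkpointLocation", "hdfs:///tmp/chk/learner-stream")
             .foreachBatch(write_batch)
             .start())
    query.awaitTermination()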

Environment: Hadoop, Scala, MapReduce, HDFS, Spark, AWS, AWS EMR, Hive, Cassandra, Talend, Maven, Jenkins, Pig, UNIX, Python, C#, MRUnit, Git.

Confidential

Graduate Research Assistant

Responsibilities:

  • Performed a variety of administrative tasks, including preparing PowerPoint presentations, entering and analyzing research data, and collecting marketing materials.
  • Facilitated office projects with other students and served on short-term college project teams.
  • Assisted faculty members with their research projects through literature searches and academic research in business, statistics, and economics, and helped faculty develop new material for their courses.
  • Taught Python for big data to undergraduate students.

Environment: Python, Hadoop, AWS, Spark, Java (JDK 1.8).

Confidential, Foster City, CA

Hadoop Developer

Responsibilities:

  • Managed and reviewed Hadoop log files.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop, Hive, and Pig.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Loaded cached data into HBase using Sqoop.
  • Set up 2-node and 5-node clusters with the HDP distribution.
  • Performed predictive analysis of the data using the K-Means algorithm (see the clustering sketch after this list).
  • Participated in Apache Spark POCs analyzing sales data against several business factors.
  • Coordinated discussions with the customer and functional teams as required to gather inputs.
  • Prepared technical design documents.
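
The K-Means bullet above corresponds to a common Spark ML clustering workflow. A minimal sketch follows; the input path, feature columns, and choice of k are hypothetical.

    # Hypothetical K-Means clustering of sales records; columns are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("sales-kmeans").getOrCreate()
    sales = spark.read.parquet("hdfs:///data/curated/sales/")

    # Assemble the numeric business factors into a single feature vector.
    assembler = VectorAssembler(
        inputCols=["revenue", "units", "discount"], outputCol="features")
    features = assembler.transform(sales)

    # Fit K-Means and label each record with its cluster.
    model = KMeans(k=3, seed=42, featuresCol="features").fit(features)
    clustered = model.transform(features)  # adds a "prediction" column
    clustered.groupBy("prediction").count().show()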

Environment: HDP 2.x, Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Spark, Spark SQL, YARN, Python, Apache Kafka, Maven, Jenkins, Java (JDK 1.8).

Confidential

System Engineer

Responsibilities:

  • Worked with different teams across various phases of the Software Development Life Cycle (SDLC), including design, development, and unit testing.
  • Worked on developing the application front end using JavaScript and CSS.
  • Managed SQL databases and worked on stored procedures and PL/SQL.
  • Fixed bugs and maintained and extended an existing application.
  • Built scalable distributed data solutions using the Hadoop ecosystem.
  • Wrote MapReduce mapper and reducer code in Java and deployed JARs for execution (see the mapper/reducer sketch after this list).
  • Imported and exported data into HDFS using Sqoop and analyzed that data.
  • Monitored and maintained the Hadoop cluster through Cloudera Manager.
  • Participated in daily Agile meetings, code reviews, and discussions to resolve technical issues.
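
The MapReduce work above was done in Java; as a compact illustration of the same mapper/reducer pattern, the sketch below uses Python with Hadoop Streaming instead (a deliberate substitution, not the original code). The word-count task and script names are hypothetical.

    # mapper.py - emits one (word, 1) pair per token read from stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(word + "\t1")

    # reducer.py - Hadoop Streaming delivers keys sorted, so a running
    # total per key is enough to sum the counts.
    import sys

    current_word, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(current_word + "\t" + str(total))
            current_word, total = word, 0
        total += int(count)
    if current_word is not None:
        print(current_word + "\t" + str(total))

Both scripts would be supplied to the hadoop-streaming JAR via its -mapper and -reducer options.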

Environment: Java (JDK 1.6), Hadoop, HDFS, Sqoop, Eclipse, MySQL, Ubuntu, Maven, Jenkins.
