
Big Data Engineer Resume


San Jose, CA

SUMMARY

  • Hadoop developer with 8+ years of overall software development experience, including 4+ years in the Big Data ecosystem covering ingestion, storage, querying, processing, and analysis of large datasets.
  • Worked with Apache Hadoop components including HDFS, MapReduce, Hive, HBase, Pig, Sqoop, and Oozie.
  • Strong understanding of Hadoop architecture and its daemons: NameNode, DataNode, ResourceManager, NodeManager, JobTracker, and TaskTracker.
  • Sound knowledge of programming Spark 2.0 in Scala.
  • Experienced in developing end-to-end ETL pipelines.
  • Experienced in writing Hive queries to process and analyze large volumes of data.
  • Experienced in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experienced in writing Java MapReduce programs.
  • Experienced with NoSQL databases (HBase, Cassandra, MongoDB); familiar with storing and querying data using Apache Drill, Phoenix, and Presto.
  • Designed real-world systems on the Hadoop ecosystem, installing and operating Hadoop clusters with the Hortonworks (Ambari UI) and Cloudera distributions.
  • Managed clusters with YARN, ZooKeeper, and Hue; familiar with Mesos, Oozie, and Zeppelin.
  • Handled real-time streaming data with Kafka, Flume, and Spark Streaming; familiar with Flink and Storm.
  • Good understanding of data mining and machine learning techniques, using tools such as R and R Shiny.
  • More than 4 years of experience with Java, J2EE, SOAP, HTML, and XML technologies, demonstrating strong analytical, problem-solving, and technical skills and the ability to follow projects through from inception to completion.
  • Developed applications using Java, RDBMS, and Linux shell scripting.
  • Experienced in the complete project life cycle of client-server and web applications.
  • Good interpersonal and communication skills, strong problem-solving abilities, quick to learn and adopt new technologies, and a strong team player.

TECHNICAL SKILLS

Languages: C, C++, Python, Java, Scala, Swift.

Hadoop Ecosystem: HDFS, MapReduce, HBase, YARN, Hive, Pig, Apache Tez, Apache Spark, Sqoop, Oozie, Flume, Kafka, Mesos, Hue.

J2EE Technologies: JavaBeans, Servlets, JSP, JDBC, SOAP, HTML, XML.

Data Science Technologies: R, R Shiny.

NoSQL Databases: HBase, Cassandra, MongoDB.

Platforms: Hortonworks, Cloudera.

DBMS/RDBMS: SQL Server, Oracle, MySQL.

IDEs and Tools: Eclipse, NetBeans IDE, Xcode, Visual Studio, Tableau BI.

Operating Systems: Windows 2000/XP/Vista/7, Unix, Red Hat Linux, Ubuntu, macOS.

Other Technologies: Maven, Docker, HQL, CQL, ZooKeeper, PuTTY, GitHub.

PROFESSIONAL EXPERIENCE

Confidential, San Jose, CA

Big Data Engineer

Responsibilities:

  • Involved in the design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology.
  • Analyzed data on the Hadoop cluster using big data analytics tools including Pig, Hive, and MapReduce.
  • Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used Pig to validate data ingested through Sqoop and Flume, and pushed the cleansed dataset into HBase.
  • Participated in the development and implementation of the Cloudera Hadoop environment.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with ZooKeeper, Oozie, and data pipeline operational services to coordinate the cluster and schedule workflows.
  • Designed and built the reporting application, which uses Spark SQL to fetch HBase table data and generate reports.
  • Extracted the required data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into them, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Developed Hive queries and Pig scripts to analyze large datasets.
  • Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.
  • Involved in generating ad hoc reports using Pig and Hive queries.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for dashboard reporting.
  • Provided operational support for Hadoop databases.
  • Developed Oozie job flows to automate Pig and Hive workflows.
  • Loaded the aggregated data from the Hadoop environment into Oracle using Sqoop for dashboard reporting.

Environment: Red Hat Linux, HDFS, MapReduce, Hive, Java JDK, Pig, Sqoop, Flume, ZooKeeper, Oozie, HBase.

Confidential, San Francisco, CA

Big Data Engineer

Responsibilities:

  • Involved in end-to-end data processing, including ingestion, processing, quality checks, and splitting.
  • Streamed data in real time using Spark Streaming with Kafka.
  • Developed Spark scripts in Scala according to requirements.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Performed different types of transformations and actions on the RDD to meet the business requirements.
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
  • Also analyzed data on the Hadoop cluster using big data tools including Pig, HBase, and Sqoop.
  • Involved in loading data from the UNIX file system into HDFS.
  • Created HBase tables to store data coming from different portfolios.
  • Responsible for managing data coming from various sources.
  • Installed and configured Hive and wrote Hive UDFs.
  • Provided cluster coordination services through ZooKeeper.
  • Imported and exported data between HDFS and relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large datasets to determine the optimal way to aggregate and report on them.
  • Installed and configured Hadoop MapReduce and HDFS.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in managing and reviewing Hadoop log files.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Created Hive tables and wrote HiveQL queries against them for data analysis, meeting the specified business requirements and replicating MapReduce functionality.

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase, Kafka, Spark, Scala, Linux OS.

Confidential, San Francisco, CA

Data Engineer

Responsibilities:

  • Processing very large XML files by parsing and loading the data into HDFS.
  • Writing Spark jobs for the historical load of data, processing the required fields from HDFS.
  • Writing Spark Streaming jobs to manage the data flow between Flume and HDFS.
  • Creating Hive tables and performing a historical load of data into them for analysis by data scientists.
  • Writing Pig scripts for transforming the data into desired formats.
  • Loading data from HDFS into other relational databases.
  • Extensively used StreamSets to design pipelines that transfer and transform data to and from HDFS, Hive tables, and various other output systems.
  • Managing data loads from HDFS to HBase and vice versa.

Environment: Hadoop, Java, MapReduce, Spark, Hive, HDFS, Pig, Kafka, Flume, YARN, Maven, HBase, Unix/Linux, StreamSets.

Confidential, Woodlands, TX

Data Engineer

Responsibilities:

  • Understanding the build and environment of the BIM server and Apache Tomcat.
  • Installing a compatible BIM server on Apache Tomcat in Ubuntu.
  • Installing Docker and using containers to run various software and databases.
  • Setting up a Cassandra database cluster in the local environment and configuring it to store model files such as IFC files.
  • Modifying the BIM server's Java source files to replace the Berkeley DB backend with Cassandra.
  • Analyzing and presenting the efficiency of the BIM server backed by Cassandra, compared with the BIM server backed by Berkeley DB, when processing large sets of IFC and other modeling files.
  • Running the BIM server using CQL.

Environment: Cassandra, Java, Docker, CQL, BIM server, BIMsurfer, Apache Tomcat, Ubuntu, EMF modeling files, IFC files.

Confidential, Sunnyvale, CA

Data Engineer

Responsibilities:

  • Used Sqoop to import data from Teradata and SQL databases into HDFS.
  • Developed programs for sorting and analyzing data in HDFS.
  • Created HBase tables to load structured, semi-structured, and unstructured data coming from the UNIX file system.
  • Wrote MapReduce, Pig, and UDF (Java) jobs to perform data cleaning, transformations, and joins.
  • Developed MapReduce programs to analyze the data, populate staging tables, and store the refined data in partitioned tables in the Enterprise Data Warehouse (EDW).
  • Created Hive external tables, loaded data into them, and queried the data using HQL, which helped market analysts look for emerging trends by comparing fresh data with EDW reference tables.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS. Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Developed Pig UDFs to preprocess the data for analysis.
  • Used Sqoop to import and export data between HDFS and RDBMS for visualization and to generate reports.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

Environment: Hadoop, Hive, Flume, Sqoop, Pig, Shell Scripting, Oozie, MapReduce.

Confidential

Application Developer

Responsibilities:

  • Developed a web application to make spectral analysis functionality user-friendly.
  • Worked on the functionality of the MALDIquant algorithm.
  • Installed and set up the environment to run R code.
  • Set up a compatible environment for combining the features of R and R Shiny.
  • Converted the data generated by laser analysis into CSV format and loaded it into the application.
  • Automated the functionality of the MALDIquant algorithm for spectral analysis of microbes.
  • Added features to manipulate the data and obtain the required results based on detected peak values.
  • Generated the analyzed data in clustered form as output for further analysis.

Environment: R Shiny, R, CSV.

Confidential

Application Developer

Responsibilities:

  • Created technical specification documents from the given functional specifications.
  • Actively participated in the team status meetings.
  • Involved in researching and creating multiple online reports using Java.
  • Developed JSPs for building the user interface.
  • Developed modules to handle escalations and track the turnaround time of each task assigned to each employee.
  • Designed, developed, and configured server-side J2EE components.
  • Implemented the business logic.
  • Designed and maintained the documenting module of the application.
  • Worked on the build and release of the entire application.
  • Wrote stored procedures and functions for the database.
  • Strictly followed the three-tier architecture.
  • Involved in writing test cases, code reviews, debugging, unit testing, system testing, and integration testing.

Environment: Eclipse IDE, Java, JDBC, JSP, XML, HTML, SQL, JavaScript.
