
Big Data/ Hadoop Engineer Resume


Bentonville, AR

SUMMARY

  • 7+ years of professional IT work experience in analysis, design, administration, development, deployment and maintenance of critical software in the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/J2EE technologies and tools, using industry-accepted methodologies and procedures.
  • Hands on experience in using MapReduce programming model for Batch processing of data stored in HDFS.
  • Experience with Containerization/ Orchestration Services like Docker, Kubernetes and OpenShift.
  • Handled Data Movement, ETL, ELT, Data Transformation, Analysis and Visualization across the Data Lake by integrating it with various tools.
  • Expertise in Planning, Installing and Configuring Hadoop Cluster based on the business needs.
  • Installed and Configured multiple Hadoop clusters of different sizes and with ecosystem Components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
  • Hadoop Development: Extensively worked on Hadoop tools including Pig, Hive, Oozie, Sqoop, Spark, DataFrames, Spark Streaming, HBase and MapReduce programming. Applied partitioning and bucketing in Hive and designed both managed and external Hive tables to optimize performance (see the sketch after this list).
  • Developed Spark applications using Scala to ease the transition of existing Hadoop workloads. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Developed Spark code and Spark SQL/Streaming jobs for fast data processing.
  • Data Ingestion into Hadoop (HDFS): Ingested data into Hadoop from various data sources such as Oracle and MySQL. Created Sqoop jobs with incremental load to populate Hive external tables. Imported real-time data into Hadoop using Kafka and worked on Flume. Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • File Formats: Ran Hadoop Streaming jobs to process terabytes of text data (a Hadoop Streaming sketch follows this list). Worked with different file formats such as Text, SequenceFile, Avro, ORC and Parquet.
  • Scripting and Reporting: Created scripts for data analysis with Pig, Hive and Impala. Generated report extracts and statistics on the data distributed across the Hadoop cluster. Developed Java APIs for retrieval and analysis of data in NoSQL databases such as HBase and Cassandra.
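
As a minimal sketch of the Hive partitioning and bucketing approach described above, using PySpark's Hive support; the table names, columns and locations are hypothetical, not taken from the actual projects:

```python
from pyspark.sql import SparkSession

# Hive support is needed so the DDL below is registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table: dropping the table leaves the data at the HDFS location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id    BIGINT,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/warehouse/sales_ext'
""")

# Managed table: Hive owns both the metadata and the data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_managed (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")
```

Partitioning prunes whole directories at query time, while bucketing spreads rows across a fixed number of files to speed up joins and sampling on the bucketed column.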
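
The Hadoop Streaming jobs mentioned above pair a mapper and a reducer that read records over stdin; below is a minimal Python sketch in which the input paths and the tab-delimited key field are illustrative assumptions:

```python
#!/usr/bin/env python3
# streaming_count.py -- acts as both mapper and reducer for a Hadoop Streaming job.
# Illustrative submission (paths are assumptions, not from the original cluster):
#   hadoop jar hadoop-streaming.jar \
#       -input /data/raw/events -output /data/out/event_counts \
#       -mapper "streaming_count.py map" -reducer "streaming_count.py reduce" \
#       -file streaming_count.py
import sys

def mapper():
    # Emit "key<TAB>1" for the first tab-separated field of each input line.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")

def reducer():
    # Input arrives sorted by key; sum the counts per key.
    current_key, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0
        total += int(value or 0)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    reducer() if len(sys.argv) > 1 and sys.argv[1] == "reduce" else mapper()
```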

TECHNICAL SKILLS

Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS

Big Data Technologies: Hadoop/HDFS, MapReduce, Hive, HBase, Scala, Spark, Apache Pig, Sqoop, Oozie, Kafka, Cassandra, MongoDB, Ambari

Programming Languages: C, C++, Python, Java, J2EE, Scala, PL/SQL, JavaScript

Databases: MongoDB, Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012

SQL Server Tools: Enterprise Manager, SQL Profiler, Query Analyzer, SQL Server 2008, SQL Server 2005 Management Studio, DTS, SSIS, SSRS, SSAS

Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML

Architecture: Relational DBMS, Client-Server Architecture, OLAP, OLTP

Development Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential

Big Data/ Hadoop Engineer

Responsibilities:

  • Developed batch processes using Spring Batch to read data from Hive, transform it to FHIR (Fast Healthcare Interoperability Resources) resources and write them into MongoDB.
  • Developed REST APIs using the HAPI FHIR open-source library and its FHIR RESTful server functionality to expose data residing in MongoDB to clients, deployed as a Spring Boot application (a sample client call is sketched after this list).
  • Wrote optimized Hive queries to fetch data from the data lake.
  • Provided the codebase and Testing approach for Functional Testing using Cucumber framework.
  • Used DevOps technologies such as Jenkins, Docker, Kubernetes and OpenShift.
  • Created Core Java applications with database access and continuous connectivity using JDBC, JSP, Spring and Hibernate.
  • Created Grafana dashboards, Functional and Performance Testing suites using HTML, CSS and JavaScript along with validation techniques.
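
As a hedged illustration of how a client might consume the FHIR REST endpoints exposed by such a HAPI FHIR/Spring Boot server: the base URL below is hypothetical, but the search interaction itself follows the FHIR specification:

```python
import requests

# Hypothetical base URL of the Spring Boot / HAPI FHIR server described above.
FHIR_BASE = "http://localhost:8080/fhir"

# FHIR defines a standard RESTful search: GET [base]/[resourceType]?[parameters].
resp = requests.get(
    f"{FHIR_BASE}/Patient",
    params={"family": "Smith", "_count": 10},
    headers={"Accept": "application/fhir+json"},
    timeout=30,
)
resp.raise_for_status()

bundle = resp.json()  # Search results are returned as a FHIR Bundle resource.
for entry in bundle.get("entry", []):
    patient = entry["resource"]
    print(patient["id"], patient.get("name", []))
```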

Confidential, Bentonville, AR

Big Data/ Hadoop Engineer

Responsibilities:

  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
  • Developed end-to-end data processing pipelines that receive data from the distributed messaging system Kafka and persist it into HBase (a Spark Structured Streaming sketch follows this list).
  • Developed a QA application using Scala, Spark and DataFrames to read data from Hive tables on the YARN framework.
  • Evaluated and improved application performance with Spark and developed PySpark scripts.
  • Developed Spark applications in Python using PySpark on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables (see the CSV-to-ORC sketch after this list).
  • Worked on reading and writing multiple data formats like JSON, ORC, Parquet on HDFS using PySpark.
  • Analyzed existing SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (cloud) using Sqoop and Flume.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Azure cloud.
  • Used the NoSQL databases HBase and MongoDB. Exported result sets from Hive to MySQL using shell scripts.
  • Processed the spend and goals data in Alteryx so that it is suitable for reporting.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Automated HQL generation, Hive table creation and data loading into Hive tables using Apache NiFi and Oozie.
  • Migrated an existing Java application to microservices using Spring Boot and Spring Cloud.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce their run-time.
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools.
  • Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
  • Configured Kafka to read and write messages from external programs and handle real time data.
  • Used Oozie operational services for batch processing, dynamic workflow scheduling and end-to-end data pipeline automation.
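
A minimal sketch of the Kafka-fed pipeline described above, using Spark Structured Streaming in Python. The broker, topic and message schema are assumptions, the spark-sql-kafka package is assumed to be on the classpath, and the sketch writes Parquet to HDFS rather than to the project's HBase sink so it stays self-contained:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Illustrative schema for the JSON messages on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical brokers
       .option("subscribe", "customer-events")             # hypothetical topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# The original pipeline persisted into HBase via a connector; Parquet on HDFS
# is used here only to keep the sketch runnable without extra dependencies.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streams/customer_events")
         .option("checkpointLocation", "/data/streams/_chk/customer_events")
         .outputMode("append")
         .start())

query.awaitTermination()
```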
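
A sketch of loading CSV files with differing schemas into a single Hive ORC table with PySpark; the landing directories, target table and column list are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("csv-to-hive-orc-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical landing directories, each holding CSVs with a slightly different layout.
sources = ["/data/landing/feed_a", "/data/landing/feed_b"]

# Columns the target Hive ORC table expects (illustrative).
target_cols = ["customer_id", "txn_date", "amount", "channel"]

frames = []
for path in sources:
    df = spark.read.option("header", "true").csv(path)  # columns arrive as strings
    # Add any target columns missing from this feed as nulls so the schemas line up.
    for col in target_cols:
        if col not in df.columns:
            df = df.withColumn(col, F.lit(None).cast("string"))
    frames.append(df.select([F.col(c).cast("string") for c in target_cols]))

# Union the normalized frames and append them into a Hive table stored as ORC.
combined = frames[0]
for df in frames[1:]:
    combined = combined.unionByName(df)

combined.write.format("orc").mode("append").saveAsTable("analytics.transactions_orc")
```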

Confidential

Hadoop Developer

Responsibilities:

  • Developed data pipelines using Spark, Hive, Pig, Python, Impala and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
  • Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS (sketched after this list).
  • Applied data warehouse concepts and testing methodologies to big data testing and analysis in Hadoop and Hive.
  • Used Pig to do transformations, event joins and some Pre-Aggregations before storing the data onto HDFS.
  • Involved in using Sqoop for importing and exporting data between RDBMS, HDFS and Impala.
  • Used Hive to analyze the Partitioned and Bucketed data and compute various metrics for reporting.
  • Performed Hive performance tuning using map joins, cost-based optimization and column-level statistics.
  • Developed Hive DDL to create, alter and drop Hive tables, and loaded data from the Linux file system to HDFS.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Extensively worked on NoSQL databases such as HBase, as well as Postgres.
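
A hedged sketch of the near-real-time S3 ingestion described above, expressed with Spark Structured Streaming's file source (the original work may have used the older DStream API); the bucket path, schema and output locations are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("s3-file-stream-sketch").getOrCreate()

# File streams need an explicit schema; this one is illustrative.
schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Treat new JSON files landing under the S3 prefix as a stream.
stream = (spark.readStream
          .schema(schema)
          .json("s3a://example-bucket/learner-events/"))

# Aggregate events per learner per hour on the fly.
hourly = (stream
          .withWatermark("event_time", "1 hour")
          .groupBy(F.window("event_time", "1 hour"), "learner_id")
          .count())

# Persist the aggregated model data to HDFS.
query = (hourly.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "/data/learner_model/hourly_counts")
         .option("checkpointLocation", "/data/learner_model/_chk")
         .start())

query.awaitTermination()
```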

Confidential

Jr. Hadoop Developer

Responsibilities:

  • Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Ingested data using Sqoop and the HDFS put/copyFromLocal commands, and ran MapReduce jobs.
  • Used Pig for transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
  • Worked on ETL Data Ingestion, In-Stream data processing, Batch Analytics and Data Persistence Strategy.
  • Implemented Hadoop-based data warehouses, integrated Hadoop with enterprise data warehouse systems, and worked on the OLAP modeling process.
  • Worked on Designing and Developing ETL Workflows using Java for processing data in HDFS/HBase using Oozie.
  • Extensively used tools in Hadoop Ecosystem including Hive, HDFS, Map Reduce, Yarn, Oozie, and Zookeeper.
  • Integrated the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (a producer/consumer sketch follows this list).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive DDL to create, alter and drop Hive tables, and loaded data from the UNIX file system to HDFS.
  • Computed various user-experience metrics using Java MapReduce.
  • Developed a data pipeline using Flume, Sqoop, Postgres and Pig to extract data from weblogs and store it in HDFS.
  • Extracted and updated data using the mongoimport and mongoexport CLIs.
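
The Kafka producers and consumers above were presumably written against the Java client API; the sketch below shows an equivalent flow with the kafka-python library, using hypothetical broker and topic names:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092"]   # hypothetical broker list
TOPIC = "user-activity"      # hypothetical topic

# Producer: serialize dicts to JSON and publish them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"user_id": "u123", "action": "login"})
producer.flush()

# Consumer: read messages from the beginning of the topic as part of a group.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="activity-readers",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.offset, message.value)
```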

Confidential

Jr Software Engineer

Responsibilities:

  • Gathered requirements from the client, performed analysis and prepared the requirement specification document.
  • Developed web-based applications using JSP, Ajax, jQuery and CSS to enhance functionality and user experience.
  • Extensively used databases such as Oracle, SQL Server and MySQL, and wrote SQL stored procedures.
  • Analyzed the MVC architecture and the Struts framework in the context of the application workflow and development.
  • Developed multi-threaded components and used connection pooling to manage concurrency, along with synchronized methods and variables.
  • Developed the front end using HTML, CSS and JSP, with client-side validation in JavaScript.
  • Worked on creating and updating the Oracle 9i database.
  • Developed JUnit Test cases for the system.
  • Developed the User Interface using JSP/HTML and used CSS for style setting of the Web Pages.
  • Designed XML schema for the system and developed documentation for the system.
  • Created UML diagrams, forms and services.
