Big Data Engineer Resume
SUMMARY
- 6+ years of professional experience in information technology with expertise in Big Data, Hadoop, Spark, Hive, Oozie, Sqoop, SQL tuning, ETL development, report development, database development, and data modeling.
- Experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil and bq command-line utilities, Composer, and Airflow.
- Extensive experience in the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, MapReduce, Hive, HBase, Oozie, Sqoop, and Spark, including building data pipelines and performing data analysis and processing with Hive SQL and Spark.
- Excellent understanding of Hadoop architecture, its daemons, and its components, such as HDFS, YARN, ResourceManager, NodeManager, NameNode, DataNode, and the MapReduce programming paradigm.
- Worked on recent Cloudera and Hortonworks Hadoop distributions.
- Experience with complex data processing pipelines, including ETL and data ingestion dealing with unstructured and semi-structured data.
- Expertise in writing MapReduce jobs using Java and Hive for data processing.
- Imported and exported data with the ETL tool Sqoop between HDFS and SQL Server, Teradata, DB2, and mainframe GDG files.
- Wrote Hive queries for data analysis to meet business requirements.
- Hands-on experience in Spark, creating DataFrames and applying transformations.
- Experience in performance tuning using partitioning and bucketing (see the sketch after this list).
- Created Hive tables and processed data using HiveQL.
- Handled different file formats including Parquet, SequenceFiles, and flat text files.
- Strong understanding of NoSQL databases like HBase.
- Contributed to open-source projects under the Apache Software Foundation.
- Experience in job/workflow scheduling and monitoring tools like Apache Oozie.
- Excellent Java development skills using Spring, Servlets, JUnit, JSP, JDBC.
- Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
- Worked with Maven and Gradle for building projects.
- Built data pipelines and used Alteryx for data visualization and analysis.
- Worked on migration from on-premises environments to GCP.
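A minimal PySpark sketch of the DataFrame transformation and partitioning/bucketing tuning summarized above; the database, table, and column names (staging.sales_raw, analytics.daily_sales, region, amount) are hypothetical placeholders, not code from a specific engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical table and column names, for illustration only.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read a raw Hive table into a DataFrame and apply transformations.
sales = spark.table("staging.sales_raw")
daily = (sales
         .filter(F.col("amount") > 0)
         .withColumn("sale_date", F.to_date("sale_ts"))
         .groupBy("region", "sale_date")
         .agg(F.sum("amount").alias("total_amount")))

# Write back as a Hive table partitioned by date and bucketed by region,
# so queries prune partitions and joins on region shuffle less data.
(daily.write
      .mode("overwrite")
      .partitionBy("sale_date")
      .bucketBy(8, "region")
      .sortBy("region")
      .saveAsTable("analytics.daily_sales"))
```

Partitioning on the date column limits scans to the relevant days, while bucketing on a frequent join key reduces shuffle in downstream joins.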
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer
Responsibilities:
- Worked on the Hortonworks Hadoop distribution, which managed services including HDFS, Hive, HBase, Sqoop, PySpark, Ambari, ZooKeeper, and Oozie.
- Built data lakes using Apache Sqoop, HDFS, Hive, and PySpark.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Involved in extracting the data from various sources into HDFS for processing.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on Oozie to schedule the jobs involving multiple actions.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Performed research and data analysis and used Alteryx for data visualization and data analytics.
- Worked on Kerberos authentication for all the services in the Hadoop cluster.
- Analyzed SAS code and derived equivalent SQL queries.
- Hashed PII and transferred files across systems using PGP encryption.
- Worked with Apache Ambari.
- Worked on GCP: migrated data from Hadoop to GCP using BigQuery, Data Studio, ML, Cloud Functions, Composer, Airflow, and Dataflow (see the BigQuery load sketch below).
Environment: HDFS, Hive, Oozie, Sqoop, PySpark, SQL Server, Teradata SQL Assistant, Apache Ambari, GCP
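A hedged sketch of the Hadoop-to-BigQuery load step described above, using the google-cloud-bigquery Python client; the project, bucket, dataset, and table names are placeholders, and it assumes the Parquet extract has already been copied from HDFS to GCS.

```python
from google.cloud import bigquery

# Placeholder project, bucket, dataset, and table names for illustration only.
client = bigquery.Client(project="my-gcp-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Load a Parquet extract (previously copied from HDFS to GCS) into BigQuery.
load_job = client.load_table_from_uri(
    "gs://my-landing-bucket/sales/*.parquet",
    "my-gcp-project.analytics.daily_sales",
    job_config=job_config,
)
load_job.result()  # Wait for the load job to complete.

table = client.get_table("my-gcp-project.analytics.daily_sales")
print(f"Loaded {table.num_rows} rows into BigQuery")
```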
Confidential
Analyst
Responsibilities:
- Worked on the Hortonworks Hadoop distribution (HDP 2.6.0.2.2), which managed services including HDFS, MapReduce2, Tez, Hive, HBase, Sqoop, Spark, Ambari Metrics, ZooKeeper, and Oozie.
- Worked on SQL Server and designed complex queries for data analysis.
- Designed and built solutions for both real-time and batch data ingestion using Sqoop (see the ingestion sketch after the environment line).
- Analyzed system failures and identified root causes.
- Exported data from HDFS into relational databases with Sqoop.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on analyzing Hadoop cluster and different big data analytic tools including HBase database and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed Oozie workflow engine to run multiple Hive jobs.
- Used the MVC pattern for building the project.
Environment: HDFS, Hive, Oozie, Sqoop, Apache Hadoop, Spark, Oracle.
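A small Python wrapper sketch around the Sqoop CLI for the batch import/export work described above; the JDBC URL, credentials file, table names, and HDFS paths are placeholders.

```python
import subprocess

# Placeholder connection string, credentials file, tables, and paths.
JDBC_URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
PASSWORD_FILE = "/user/etl/.sqoop_pwd"


def sqoop_import(table: str, target_dir: str) -> None:
    """Batch-import one RDBMS table into HDFS with Sqoop."""
    subprocess.run(
        [
            "sqoop", "import",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", PASSWORD_FILE,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
        ],
        check=True,
    )


def sqoop_export(table: str, export_dir: str) -> None:
    """Push processed HDFS data back into a relational table with Sqoop."""
    subprocess.run(
        [
            "sqoop", "export",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", PASSWORD_FILE,
            "--table", table,
            "--export-dir", export_dir,
        ],
        check=True,
    )


if __name__ == "__main__":
    sqoop_import("CUSTOMERS", "/data/raw/customers")
    sqoop_export("CUSTOMER_SUMMARY", "/data/curated/customer_summary")
```

In practice these commands would typically be scheduled through an Oozie workflow rather than run ad hoc.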
Confidential
Big Data Analyst
Responsibilities:
- Worked with data scientists and business analysts to understand and gather specific requirements and extract business relevant stories.
- Worked on Hive and designed complex queries for data analysis.
- Developed MapReduce jobs to parse semi-structured data from various sources (see the Hadoop Streaming sketch after the environment line).
- Knowledge of job/workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Worked on various optimization techniques to manage the processing and storage of Big Data in Hadoop.
- End-to-end ownership of the process, ensuring big data stack best practices.
- Involved in project/task estimation for smooth sprint execution under Agile methodology.
- Responsible for project documentation and maintenance after delivery.
Environment: Hive, Spark, Sqoop, Oozie, Java, Python, Scala, Impala, Shell scripting, HBase
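A minimal Hadoop Streaming sketch in Python of the kind of MapReduce job described above for parsing semi-structured records; the JSON field names and HDFS paths are illustrative only.

```python
#!/usr/bin/env python
"""Hadoop Streaming sketch: parse semi-structured JSON lines and count
events per type. Field names and paths are illustrative only. Run roughly as:
  hadoop jar hadoop-streaming.jar -files mr_events.py \
    -mapper "mr_events.py map" -reducer "mr_events.py reduce" \
    -input /data/raw/events -output /data/out/event_counts
"""
import json
import sys


def mapper():
    # Emit one (event_type, 1) pair per well-formed JSON record.
    for line in sys.stdin:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip malformed records
        print(f"{event.get('event_type', 'unknown')}\t1")


def reducer():
    # Sum counts for each key; input arrives sorted by key.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```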
Confidential
System Analyst
Responsibilities:
- Involved throughout the design, development, and test phases of the project.
- Provided product support for the application.
- Responsible for Unit and Integration Testing.
- Performed design reviews, code reviews, and test plan and test case reviews for the development and QA teams.
- Worked on Oracle Identity Manager and designed complex queries for data analysis.
Environment: SoapUI, Oracle, OIM, Java, Spring, LDAP
Confidential
Big Data Engineer
Responsibilities:
- Worked as a Hive team member on the design of High Availability (HA) for HiveServer, the single point of failure in the Hive data warehouse solution for querying and analyzing large Big Data sets; involved in the design review of the Hive HA feature.
- Worked as an HBase team member on the column-oriented database built on top of HDFS; collaborated with Apache HBase committers and mentored team members (see the HBase client sketch after the environment line).
- Involved in requirement analysis and in the design, automation, and execution of unit test cases for HDFS, Hive, MapReduce, and HBase in JUnit.
- Created, automated, and executed test cases on CI.
- Integrated the JCarder tool and the EMMA code coverage tool into CI.
- Proficient in Hadoop cluster setup, monitoring, and administration.
- Resolved all the customer queries related to installation, configuration, administration, etc.
- Experience with non-functional testing tools such as heap dump analyzers, thread dump analyzers, GC log analyzers, and profilers.
- Designed and developed automation frameworks and suites using Java, JUnit, and Ant.
- Worked on improving performance across multiple Huawei Hadoop versions.
- Good knowledge of Linux commands and scripting.
- Contributed patches for major bugs to the Apache open-source HBase component.
- Participated in product functional reviews, test specifications, document reviews.
- Executed MapReduce jobs and built data lakes.
- Worked on ZooKeeper, BookKeeper, and data analytics.
Environment: Hive, Hadoop, HBase, Zookeeper, Bookkeeper, MapReduce.
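The HBase work above used the Java client API and JUnit; purely as an illustration of the basic operations involved, here is a minimal Python sketch using the happybase Thrift client, with placeholder host, table, and column family names.

```python
import happybase

# Placeholder Thrift host, table, and column family names for illustration only.
connection = happybase.Connection("hbase-thrift-host", port=9090)

# Create a table with one column family if it does not already exist.
if b"user_events" not in connection.tables():
    connection.create_table("user_events", {"d": dict(max_versions=3)})

table = connection.table("user_events")

# Put a row: cells are keyed by "family:qualifier".
table.put(b"user42#2016-01-01", {b"d:event": b"login", b"d:source": b"web"})

# Scan all rows for a single user by row-key prefix.
for key, data in table.scan(row_prefix=b"user42#"):
    print(key, data[b"d:event"])

connection.close()
```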