Big Data Engineer Resume
SUMMARY
- 6+ years of professional experience in information technology with expertise in Big Data, Hadoop, Spark, Hive, Oozie, Sqoop, SQL tuning, ETL development, report development, database development, and data modeling.
- Experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil and bq command-line utilities, Composer, and Airflow.
- Extensive experience in the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, MapReduce, Hive, HBase, Oozie, Sqoop, and Spark, including building data pipelines and performing data analysis and processing with Hive SQL and Spark.
- Excellent understanding of Hadoop architecture, its daemons, and its components, such as HDFS, YARN, ResourceManager, NodeManager, NameNode, DataNode, and the MapReduce programming paradigm.
- Worked on recent Cloudera and Hortonworks Hadoop distributions.
- Experience with complex data processing pipelines, including ETL and data ingestion dealing with unstructured and semi-structured data.
- Expertise in writing MapReduce jobs using Java and Hive for data processing.
- Imported and exported data with the ETL tool Sqoop between HDFS and SQL Server, Teradata, DB2, and mainframe GDG files.
- Wrote Hive queries for data analysis to meet business requirements.
- Hands-on experience in Spark, creating DataFrames and applying transformations.
- Experience in performance tuning using partitioning and bucketing (see the sketch after this list).
- Created Hive tables and processed data using HiveQL.
- Handled different file formats including Parquet, SequenceFiles, and flat text files.
- Strong understanding of NoSQL databases like HBase.
- Contributed to open-source projects under the Apache Software Foundation.
- Experience in job/workflow scheduling and monitoring tools like Apache Oozie.
- Excellent Java development skills using Spring, Servlets, JUnit, JSP, JDBC.
- Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
- Worked with Maven and Gradle for building projects.
- Built data pipelines and used Alteryx for data visualization and analysis.
- Worked on migration from on-premises environments to GCP.
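A minimal PySpark sketch of the DataFrame transformation and partitioning/bucketing tuning summarized above; the database, table, and column names (staging.sales_raw, analytics.daily_sales, region, amount) are hypothetical placeholders, not code from a specific engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical table and column names, for illustration only.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read a raw Hive table into a DataFrame and apply transformations.
sales = spark.table("staging.sales_raw")
daily = (sales
         .filter(F.col("amount") > 0)
         .withColumn("sale_date", F.to_date("sale_ts"))
         .groupBy("region", "sale_date")
         .agg(F.sum("amount").alias("total_amount")))

# Write back as a Hive table partitioned by date and bucketed by region,
# so queries prune partitions and joins on region shuffle less data.
(daily.write
      .mode("overwrite")
      .partitionBy("sale_date")
      .bucketBy(8, "region")
      .sortBy("region")
      .saveAsTable("analytics.daily_sales"))
```

Partitioning on the date column limits scans to the relevant days, while bucketing on a frequent join key reduces shuffle in downstream joins.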
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer
Responsibilities:
- Worked on the Hortonworks Hadoop distribution, which managed services including HDFS, Hive, HBase, Sqoop, PySpark, Ambari, ZooKeeper, and Oozie.
- Built data lakes using Apache Sqoop, HDFS, Hive, and PySpark.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Involved in extracting the data from various sources into HDFS for processing.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on Oozie to schedule the jobs involving multiple actions.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Performed research and data analysis and used Alteryx for data visualization and data analytics.
- Worked on Kerberos authentication for all the services in the Hadoop cluster.
- Analyzed SAS code and derived equivalent SQL queries.
- Hashed PII and transferred files across systems using PGP encryption.
- Worked with Apache Ambari.
- Worked on GCP: migrated data from Hadoop to GCP using BigQuery, Data Studio, ML, Cloud Functions, Composer, Airflow, and Dataflow (see the BigQuery load sketch below).
Environment: HDFS, Hive, Oozie, Sqoop, PySpark, SQL Server, Teradata SQL Assistant, Apache Ambari, GCP
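A hedged sketch of the Hadoop-to-BigQuery load step described above, using the google-cloud-bigquery Python client; the project, bucket, dataset, and table names are placeholders, and it assumes the Parquet extract has already been copied from HDFS to GCS.

```python
from google.cloud import bigquery

# Placeholder project, bucket, dataset, and table names for illustration only.
client = bigquery.Client(project="my-gcp-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Load a Parquet extract (previously copied from HDFS to GCS) into BigQuery.
load_job = client.load_table_from_uri(
    "gs://my-landing-bucket/sales/*.parquet",
    "my-gcp-project.analytics.daily_sales",
    job_config=job_config,
)
load_job.result()  # Wait for the load job to complete.

table = client.get_table("my-gcp-project.analytics.daily_sales")
print(f"Loaded {table.num_rows} rows into BigQuery")
```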
Confidential
Analyst
Responsibilities:
- Worked on the Hortonworks Hadoop distribution (HDP 2.6.0.2.2), which managed services including HDFS, MapReduce2, Tez, Hive, HBase, Sqoop, Spark, Ambari Metrics, ZooKeeper, and Oozie.
- Worked on SQL Server and designed complex queries for data analysis.
- Designed and built solutions for both real-time and batch data ingestion using Sqoop (see the ingestion sketch after the environment line).
- Analyzed system failures and identified root causes.
- Exported data from HDFS into relational databases with Sqoop.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on analyzing Hadoop cluster and different big data analytic tools including HBase database and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed Oozie workflow engine to run multiple Hive jobs.
- Used the MVC pattern for building the project.
Environment: HDFS, Hive, Oozie, Sqoop, Apache Hadoop, Spark, Oracle.
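A small Python wrapper sketch around the Sqoop CLI for the batch import/export work described above; the JDBC URL, credentials file, table names, and HDFS paths are placeholders.

```python
import subprocess

# Placeholder connection string, credentials file, tables, and paths.
JDBC_URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
PASSWORD_FILE = "/user/etl/.sqoop_pwd"


def sqoop_import(table: str, target_dir: str) -> None:
    """Batch-import one RDBMS table into HDFS with Sqoop."""
    subprocess.run(
        [
            "sqoop", "import",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", PASSWORD_FILE,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
        ],
        check=True,
    )


def sqoop_export(table: str, export_dir: str) -> None:
    """Push processed HDFS data back into a relational table with Sqoop."""
    subprocess.run(
        [
            "sqoop", "export",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", PASSWORD_FILE,
            "--table", table,
            "--export-dir", export_dir,
        ],
        check=True,
    )


if __name__ == "__main__":
    sqoop_import("CUSTOMERS", "/data/raw/customers")
    sqoop_export("CUSTOMER_SUMMARY", "/data/curated/customer_summary")
```

In practice these commands would typically be scheduled through an Oozie workflow rather than run ad hoc.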
Confidential
Big Data Analyst
Responsibilities:
- Worked with data scientists and business analysts to understand and gather specific requirements and extract business relevant stories.
- Worked on Hive and designed complex queries for data analysis.
- Developed MapReduce jobs to parse semi-structured data from various sources (see the Hadoop Streaming sketch after the environment line).
- Knowledge of job/workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Worked on various optimization techniques to manage the processing and storage of Big Data in Hadoop.
- End-to-end ownership of the process, ensuring big data stack best practices.
- Involved in project/task estimation for smooth sprint execution under Agile methodology.
- Responsible for project documentation and maintenance after delivery.
Environment: Hive, Spark, Sqoop, Oozie, Java, Python, Scala, Impala, Shell scripting, HBase
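A minimal Hadoop Streaming sketch in Python of the kind of MapReduce job described above for parsing semi-structured records; the JSON field names and HDFS paths are illustrative only.

```python
#!/usr/bin/env python
"""Hadoop Streaming sketch: parse semi-structured JSON lines and count
events per type. Field names and paths are illustrative only. Run roughly as:
  hadoop jar hadoop-streaming.jar -files mr_events.py \
    -mapper "mr_events.py map" -reducer "mr_events.py reduce" \
    -input /data/raw/events -output /data/out/event_counts
"""
import json
import sys


def mapper():
    # Emit one (event_type, 1) pair per well-formed JSON record.
    for line in sys.stdin:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip malformed records
        print(f"{event.get('event_type', 'unknown')}\t1")


def reducer():
    # Sum counts for each key; input arrives sorted by key.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```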
Confidential
System Analyst
Responsibilities:
- Involved throughout the design, development, and test phases of the project.
- Provided product support for the application.
- Responsible for Unit and Integration Testing.
- Performed design reviews, code reviews, and test plan and test case reviews for the development and QA teams.
- Worked on Oracle Identity Manager and designed complex queries for data analysis.
Environment: SoapUI, Oracle, OIM, Java, Spring, LDAP
Confidential
Big Data Engineer
Responsibilities:
- Worked as a Hive team member on the design of High Availability (HA) for HiveServer, the single point of failure in the Hive data warehouse solution for querying and analyzing large Big Data sets; involved in the design review of the Hive HA feature.
- Worked as an HBase team member on the column-oriented database built on top of HDFS; collaborated with Apache HBase committers and mentored team members (see the HBase client sketch after the environment line).
- Involved in requirement analysis and in the design, automation, and execution of unit test cases for HDFS, Hive, MapReduce, and HBase in JUnit.
- Created, automated, and executed test cases on CI.
- Integrated the JCarder tool and the EMMA code coverage tool into CI.
- Proficient in Hadoop cluster setup, monitoring, and administration.
- Resolved all the customer queries related to installation, configuration, administration, etc.
- Experience with non-functional testing tools such as heap dump analyzers, thread dump analyzers, GC log analyzers, and profilers.
- Designed and developed automation frameworks and suites using Java, JUnit, and Ant.
- Worked on improving performance across multiple Huawei Hadoop versions.
- Good knowledge of Linux commands and scripting.
- Contributed patches for major bugs to the Apache open-source HBase component.
- Participated in product functional reviews, test specifications, document reviews.
- Executed MapReduce jobs and built data lakes.
- Worked on ZooKeeper, BookKeeper, and data analytics.
Environment: Hive, Hadoop, HBase, Zookeeper, Bookkeeper, MapReduce.
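The HBase work above used the Java client API and JUnit; purely as an illustration of the basic operations involved, here is a minimal Python sketch using the happybase Thrift client, with placeholder host, table, and column family names.

```python
import happybase

# Placeholder Thrift host, table, and column family names for illustration only.
connection = happybase.Connection("hbase-thrift-host", port=9090)

# Create a table with one column family if it does not already exist.
if b"user_events" not in connection.tables():
    connection.create_table("user_events", {"d": dict(max_versions=3)})

table = connection.table("user_events")

# Put a row: cells are keyed by "family:qualifier".
table.put(b"user42#2016-01-01", {b"d:event": b"login", b"d:source": b"web"})

# Scan all rows for a single user by row-key prefix.
for key, data in table.scan(row_prefix=b"user42#"):
    print(key, data[b"d:event"])

connection.close()
```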