Big Data Engineer Resume

SUMMARY:

  • 5+ years of experience in the IT industry, including Big Data technologies such as Apache Hadoop, Spark, Hive, and Oozie; cloud technologies such as Microsoft Azure and Google Cloud Platform; Java and web technologies; and strong programming experience with Java, Linux, Python, and SQL.
  • Experience working in multiple domains, such as Financial and Retail.
  • Experience working with Cloudera, Google Cloud Platform, and Microsoft Azure HDInsight distributions.
  • In-depth knowledge of distributed systems architecture and parallel processing frameworks.
  • Extensive knowledge of Apache Hadoop, Apache Spark and Google Cloud Platform Architecture.
  • Experience building Dataflow pipelines with the Apache Beam API to migrate data from various sources to BigQuery.
  • Worked on partitioning and clustering of datasets in BigQuery.
  • Created Apache Airflow pipelines in Python to schedule daily/weekly jobs (a minimal DAG sketch follows this list).
  • Hands-on experience with Hadoop ecosystem components such as HDFS, YARN, Tez, Hive, Spark, Sqoop, MapReduce, Pig, Oozie, HBase, and Airflow.
  • Good experience in creating complex data ingestion pipelines, data transformations, data management and data governance at an Enterprise level.
  • Experience in analyzing data using HiveQL, writing custom UDFs in Hive, and writing custom MapReduce programs in Java.
  • Wrote Hive queries in HiveQL to join data from various sources and build target datasets for analytical purposes.
  • Strong knowledge of troubleshooting and tuning Spark applications and Hive scripts for optimal performance.
  • Knowledge of and experience with batch and real-time data processing technologies such as Spark.
  • Developed Spark applications to perform various transformations and actions using Spark SQL in Scala.
  • Strong expertise in Unix shell script programming.
  • Expertise in creating Shell - Scripts, Regular Expressions, etc.
  • Experience in loading and transforming large data sets.
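
As a concrete illustration of the Airflow scheduling noted above, the following is a minimal DAG sketch, assuming Airflow 2.x; the DAG id, owner, and the command it runs are hypothetical placeholders rather than details from the actual projects.

```python
# Minimal Airflow DAG sketch (hypothetical names); schedules a daily batch job.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",            # hypothetical owner
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_ingestion_job",   # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",     # a weekly job would use "@weekly"
    catchup=False,
) as dag:
    # Placeholder task; a real pipeline would trigger Dataflow, Hive, or Spark steps here.
    run_ingestion = BashOperator(
        task_id="run_ingestion",
        bash_command="echo 'run ingestion step'",
    )
```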

SKILLS:

Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Zookeeper, Ambari, Avro, Apache Spark, PySpark, Kafka, Cassandra, NoSQL, ETL, Informatica, real-time data processing, data pipelines

Data Analysis, Database Analysis, Database Modeling, Database Design, JDBC, MySQL, Oracle, PL/SQL, RDBMS

GCP, Amazon Web Services (AWS), Java, Python, Shell Scripting, Unix/Linux, .NET, Microsoft Visual Studio, Eclipse, Jenkins, JIRA, Version Control

HTML, CSS, JavaScript, XML, JSON, Application Development, QA Testing, Test Cases

EXPERIENCE

Confidential

Big Data Engineer

Responsibilities:

  • Worked on Google Cloud Dataflow to migrate data from SQL Server, ERP systems, and FTP sources to BigQuery in GCP.
  • Created Dataflow pipelines using the Apache Beam API in Java (an illustrative Beam sketch follows this list).
  • Created BigQuery partitioned and clustered datasets.
  • Created sharded queries to process the data in parallel.
  • Created Apache Airflow pipelines using Python to schedule the daily/weekly jobs.
  • Experience running existing Spark applications on Dataproc clusters.
  • Ran Spark applications on Kubernetes-managed clusters, which deploy application images into containers within pods.
  • Experience in working with Google Cloud Shell to deploy various applications.
  • Used Kubernetes kubectl commands to perform various operations.
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Hive.
  • Worked on Hive DDL and Hive Query Language (HQL) for analytical processing.
  • Experience with Hive partitioning, bucketing, and joins on Hive tables, utilizing Hive SerDes such as Regex, JSON, and Avro.
  • Implemented dynamic partitioning, bucketing, and compression on Hive external tables and optimized query performance (see the Hive partitioning sketch after this list).
  • Developed RDDs and DataFrames in Spark using Scala and applied several transformations to various datasets.
  • Developed Spark code using Scala and Spark - SQL for faster processing of data.
  • Handled data from various sources and loaded it into SQL databases using stored procedures.
  • Worked on ingesting incremental updates from structured ERP systems (SAP) residing on a Microsoft SQL Server database onto the Hadoop data platform using Sqoop.
  • Used Oozie to define a workflow to coordinate the execution of Spark, Hive and Sqoop jobs.
  • Monitored Hadoop jobs and reviewed the logs of failed jobs to debug issues based on the errors.
  • Worked on writing stored procedures in SQL to ingest the data from various data sources.
  • Modified ETL scripts for data load into HDFS, scheduled automated jobs and resolved production issues.
  • Integrated Hadoop with Tableau to understand and visualize data. Documented operational problems and procedures using JIRA.
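
The Dataflow work above was done with the Apache Beam Java SDK; purely for illustration, the sketch below shows the same pipeline shape in the Beam Python SDK, reading exported CSV files from a Cloud Storage bucket and appending rows to a BigQuery table. The project id, bucket, table, columns, and schema are hypothetical placeholders, not details of the actual migration.

```python
# Minimal Apache Beam sketch (Python SDK shown for illustration; the project used the Java SDK).
# All project, bucket, table, and column names below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv_line(line):
    """Turn one exported CSV line into a dict matching the (hypothetical) BigQuery schema."""
    order_id, amount, order_date = line.split(",")
    return {"order_id": order_id, "amount": float(amount), "order_date": order_date}


options = PipelineOptions(
    runner="DataflowRunner",              # run on Cloud Dataflow; DirectRunner works locally
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadExports" >> beam.io.ReadFromText("gs://my-bucket/exports/orders-*.csv")
        | "ParseRows" >> beam.Map(parse_csv_line)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-gcp-project:analytics.orders",
            schema="order_id:STRING,amount:FLOAT,order_date:DATE",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```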
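And for the Hive dynamic-partitioning work above, here is a minimal sketch of the DDL and load issued through a Hive-enabled Spark session; the table, columns, source table, and warehouse path are hypothetical placeholders, and bucketing is only noted in a comment.

```python
# Sketch of a dynamically partitioned Hive external table loaded via Spark SQL.
# Table, column, and path names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow dynamic-partition inserts.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# External table partitioned by load date; bucketing would be declared with an
# additional CLUSTERED BY (customer_id) INTO n BUCKETS clause.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id    STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION '/data/warehouse/sales_ext'
""")

# Dynamic-partition insert: the load_date partition value comes from the source rows.
spark.sql("""
    INSERT OVERWRITE TABLE sales_ext PARTITION (load_date)
    SELECT order_id, customer_id, amount, load_date
    FROM staging_sales
""")
```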

Confidential

Big Data Engineer

Responsibilities:

  • Migrated the legacy Informatica environment to a Java application and deployed it on a Kubernetes/Docker platform.
  • Worked on a POC to import data from S3, apply Informatica transformations and publish data to Kafka.
  • Loaded Cassandra tables with data from Kafka, then directed it to MySQL for downstream applications.
  • Worked on reimplementing Informatica components in Java applications.
  • Worked on building a CI/CD pipeline to deploy an application.
  • Hands-on experience with Apache NiFi and its custom processors.
  • Published JSON data to Kafka topics (see the producer sketch at the end of this section).
  • Used Oozie as the job scheduler for most production applications.
  • Experienced in writing test cases (JUnit, Mockito) and tracking code coverage with SonarQube.
  • Created Sqoop Scripts to import/export user profile and other lookup data from RDBMS to S3 data store.
  • Developed various Spark applications using Python (PySpark) to perform cleansing, transformation, and enrichment of clickstream data (see the PySpark sketch at the end of this section).
  • Troubleshot Spark applications to improve fault tolerance and reliability.
  • Experienced in working with S3 in AWS cloud.
  • Implemented partitioning (dynamic and static) and bucketing in Hive.
  • Involved in continuous integration of the application using Jenkins.

Environment: Hadoop, Ambari, Kubernetes, Docker, Cassandra, Java, Sqoop, Spark, S3 buckets, Kafka, MySQL, Apache NiFi, JIRA, JUnit, Mockito, Pig, Oozie, SonarQube, Shell Scripting, Hive, HDFS, HBase.
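
The JSON publishing above was done from the Java application; purely as an illustration, the following is a minimal producer sketch using the kafka-python client, with a placeholder broker address, topic name, and record fields.

```python
# Minimal JSON-to-Kafka producer sketch (kafka-python shown for illustration;
# the project itself published from a Java application). Names are placeholders.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],                                 # placeholder broker
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),   # serialize dicts as JSON
)

event = {"user_id": "u123", "action": "page_view", "ts": "2021-01-01T00:00:00Z"}
producer.send("clickstream-events", value=event)  # publish one JSON record to the topic
producer.flush()
```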
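Likewise, a minimal PySpark sketch of the clickstream cleansing and enrichment mentioned above, assuming the raw events land in S3 as JSON; the paths and column names are hypothetical placeholders.

```python
# PySpark sketch of clickstream cleansing/enrichment (paths and columns are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-cleansing-sketch").getOrCreate()

raw = spark.read.json("s3a://my-bucket/clickstream/raw/")      # placeholder S3 path

cleaned = (
    raw
    .dropDuplicates(["event_id"])                              # drop duplicate events
    .filter(F.col("user_id").isNotNull())                      # drop records missing a user id
    .withColumn("event_ts", F.to_timestamp("event_time"))      # normalize the timestamp
    .withColumn("event_date", F.to_date("event_ts"))           # enrich with a partition date
)

cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://my-bucket/clickstream/cleaned/"
)
```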

Confidential

Software Engineer

Responsibilities:

  • Involved in various SDLC phases like Requirements gathering and analysis, Design, Development and Testing.
  • Developed SQL scripts and created tables based on the business requirement.
  • Connected SQL databases with Tableau for reporting purposes.
  • Developed Java classes, SQL queries and stored procedures to retrieve and transform the data from backend Oracle database using JDBC.
  • Used Maven for building and deploying the applications.
  • Used Tableau to automate the reporting without any manual effort.
  • Developed the business methods as per the IBM Rational Rose UML Model.
  • Developed web pages using HTML, CSS and JavaScript.
  • Extensively used Core Java to create applications based on the requirement.
  • Developed the business object methods in Java, integrating the activity diagrams.
  • Used Rational Application Developer as the development environment.
  • Used JUnit for unit testing.
  • Designed error logging flow and error handling flow.
  • Used Apache log4j Logging framework for logging.
  • Designed and implemented the architecture for the project using OOAD, UML design patterns.
  • Followed Scrum development cycle for streamline processing with iterative and incremental development.
  • Worked with the testing team in creating new test cases and created the use cases for the module before the testing phase.
  • Provided support to resolve performance testing issues, profiling, and caching mechanisms.
  • Provided daily development status reports, weekly status reports, and a weekly development summary and defects report.
  • Performed code reviews to ensure consistency with style standards and code quality.

Confidential

Web Developer Intern

Responsibilities:

  • Assisted in developing a webpage on Data Science based farmers support system for sustainable crop production.
  • Used HTML, CSS, JavaScript for the development of the app's homepage.
  • Worked with the development team for the debugging processes.
  • Coordinated with QA testing team to fix QA issues.
  • Used SQL to store farmers' crop production data in the form of tables.

Confidential

Jr. Java Developer Intern

Responsibilities:

  • Involved in the creation of dynamic web pages with the use of JSP and HTML.
  • Responsible for daily activities including installation, configuration, and debugging of the Apache web server and WebSphere, and Oracle database maintenance.
  • Configured and set up Java Workspace which included setting up server, theme installation and configuration.
  • Became familiar with using and navigating through Java Workspace (Eclipse).
  • Tested many components of web application and documented my observations.
  • Used JDBC to persist Java Objects into the database.
  • Maintained data related to staffing and classes.
  • Maintained the standards set by the organization for data security.
  • Performed relational query operations for tasks assigned by supervisors.
  • Interacted with analysts and business owners to determine the best design features for users.
