
Data Engineer Resume


Virginia

PROFESSIONAL SUMMARY:

  • Over 9 years of IT experience in Hadoop and in JAVA/J2EE application enhancement, development, and implementation.
  • Extensive experience across the Software Development Life Cycle (SDLC), including implementation, development, and enhancement, using methodologies such as Waterfall and Agile/Scrum.
  • Extensive experience working with the Hadoop ecosystem and its distributed file system, using HDFS, Sqoop, Hive, Pig, HBase, Spark, and MapReduce programs.
  • Strong knowledge in assessing business rules.
  • Strong knowledge of Hadoop architecture, including installing and configuring the Hadoop ecosystem and its components for data processing and maintenance using HDFS, Sqoop, Hive, Spark, Zookeeper, and MapReduce programs.
  • Experience working with the Hortonworks and Cloudera (CDH4 and CDH5) distributions for executing scripts.
  • Experience building ETL pipelines using NiFi.
  • Experience importing and exporting large volumes of data into and out of HDFS using Sqoop jobs and processing the data with HBase and Hive.
  • Experience collecting data from various databases, processing it into HDFS, performing analytics using Hive, and storing the results in relational databases.
  • Strong database knowledge, including data validation, extraction, and modification, and creating views and tables, altering tables, and inserting data.
  • Experience processing and verifying structured, semi-structured, and unstructured data, and importing data sets into SQL databases.
  • Strong knowledge of databases such as Oracle, MySQL, MS SQL, Cassandra, and MongoDB for the validation, implementation, and processing of data into HDFS and relational databases, and vice versa.
  • Experience working with Amazon Web Services (AWS), using EC2 and S3 for storage.
  • Strong knowledge of processing various data files, including delimited text, CSV, Avro, JSON, and XML files.
  • Strong knowledge of creating Hive tables, loading them with data from HDFS, and sorting and storing the data according to business guidelines.
  • Knowledge of User Defined Functions (UDFs) for ETL data processing with HiveQL, extending the built-in functionality for specific processing needs (a brief sketch follows this list).
  • Familiar with the Oozie workflow engine for running multiple Hive and Pig jobs that trigger independently on time and data availability.
  • Conversant in Agile ceremonies such as backlog refinement, sprint planning, sprint review, and sprint retrospective.
  • Knowledge of writing MapReduce code in Java, including combiners, and building it with Maven.
  • Prior experience in Java development using JAVA/J2EE technologies such as JSP (Java Server Pages), JDBC, and Servlets.
  • Experience with core Java concepts such as objects, classes, methods, parameters, packages, and constructors, using Object Oriented Programming (OOP) principles.
  • Experience with version control tools such as Bitbucket, SVN, and SourceTree.
  • Worked in Windows and UNIX/Linux environments for application development.
  • Experience with development tools such as IntelliJ, Eclipse, and Cmder.
  • Good communication, analytical, organizational, and decision-making skills, with the ability to work within teams and a high level of self-motivation.
  • Hands-on experience with unified data analytics on Databricks, including the Databricks workspace user interface and workspace management.
  • Expert in using Databricks with Azure Data Factory (ADF) to process data.
  • Experienced with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Strong knowledge of Amazon EC2 as part of a complete solution for computing, query processing, and storage across a wide range of applications.
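Illustrative sketch of the UDF-style processing mentioned above: a minimal PySpark example that registers a user-defined function so it can be called from SQL/HiveQL-style queries. The function, column, and table names are hypothetical.

```python
# Minimal sketch (hypothetical names): registering a UDF so it can be used
# from Spark SQL / HiveQL-style queries for ETL-style cleanup.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").enableHiveSupport().getOrCreate()

def normalize_state(code):
    # Hypothetical cleanup rule: trim and upper-case a state code.
    return code.strip().upper() if code else None

spark.udf.register("normalize_state", normalize_state, StringType())

# The registered UDF can now be called from a SQL/HiveQL query.
spark.sql("SELECT normalize_state(state) AS state, COUNT(*) AS cnt "
          "FROM customers GROUP BY normalize_state(state)").show()
```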

TECHNICAL SKILLS:

Big Data Technologies: HDFS, Hadoop, Sqoop, Hive, HBase, Pig, Avro, Zookeeper, Oozie, MapReduce, Spark (Spark Core and Spark SQL), Tez, Kafka, NiFi.

Scripting and Programming Languages: Java, CSH, HTML, Scala, Hive, SQL, Python, Shell Scripting.

Databases: Oracle, MS SQL Server, MySQL, Cassandra (NoSQL).

Monitoring Tools: Cloudera Manager, Ambari.

Reporting/ Bug Tracking tools: Jira, Confluence.

Operating Systems: Windows 7/10, Linux

Data Access Languages: SQL

Version Control Tools: Bitbucket, SVN, SourceTree.

Application Tools: Eclipse, IntelliJ, Jupyter.

PROFESSIONAL EXPERIENCE:

Confidential, Virginia

Data Engineer

Responsibilities:

  • Worked with Hive and HDFS implemented on a Hadoop cluster running CentOS; assisted with performance tuning and monitoring.
  • Implemented solutions using advanced AWS components (EMR, EC2, etc.) integrated with big data/Hadoop frameworks such as Zookeeper, YARN, Spark, Scala, and NiFi.
  • Gained good experience with the Apache NiFi ecosystem.
  • Reviewed business requirements documents and technical and functional specifications for the data processing and provided new implementation guidelines to the BAs for further use.
  • Worked in an Agile/Scrum methodology for the analysis, design, implementation, and enhancement of the data pipelines.
  • Participated in data collection, data cleaning, data mining, and developing models and visualizations.
  • Imported data from RDBMS sources into HDFS using Sqoop with partitions and mappers at regular intervals, and exported Sqoop data back to the RDBMS (see the ingestion sketch after this list).
  • Created various Hive tables to load large sets of structured, semi-structured, and unstructured data coming from Oracle, MySQL, and a variety of portfolios.
  • Imported data into Hive from HDFS, performed analytics on Hive tables, and stored the results in the RDBMS; imported all table data, performed sorting operations, and loaded specific data into tables for the BI analysis specified by the analysts.
  • Developed code to ingest data into HDFS using Sqoop, Spark, and Scala.
  • Worked with AWS for data migration: migrated data into HDFS and AWS using Sqoop and, once the data was loaded onto HDFS, performed data analysis using Spark SQL.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Used HiveQL for data processing and analysis and created structures in various file formats such as Avro, text, and CSV.
  • Worked with Spark RDDs and Datasets, including converting RDDs to Datasets.
  • Developed a data pipeline using Spark and Hive to ingest, transform, and analyze customer behavioral data.
  • Supported Spark application code and validated data in Scala using Spark Core and Spark SQL for data analysis and export into the RDBMS.
  • Worked in a Linux environment to validate Sqoop jobs and the data they processed.
  • Validated SQL Server data before processing and implementation onto HDFS to address data inconsistency concerns raised by BI analysts.
  • Created and maintained technical documentation for launching Hadoop clusters, listing Sqoop job information, and creating and executing Hive tables using HiveQL.
  • Created and exported HBase data into Hive tables as part of the transition from HBase to Hive for smoother, easier operations.
  • Consolidated Sqoop record definitions into one file using Sqoop codegen for analytics purposes.
  • Participated in design, code, and test inspections throughout the life cycle to identify issues, and explained technical considerations at related meetings, including those with internal clients.
  • Used Atlassian Bitbucket for code repository and code review purposes.
  • Documented changes in Confluence pages for further reference.
  • Worked closely with BAs on changes to mapping and specification documents.
  • Gained hands-on experience with PySpark applications for running Spark jobs on the cluster.
  • Created Spark clusters and configured high-concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
  • Developed an automated process in Azure Cloud to ingest data daily from a web service and load it into Azure SQL DB.
  • Analyzed data where it lives by mounting Azure Data Lake and Blob storage to Databricks.
  • Gained extensive experience working with the AWS cloud platform (EC2, S3, EMR).
  • Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for data set processing and storage.
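Illustrative sketch of the ingestion and Spark SQL analysis described in the bullets above. The project used Sqoop for the RDBMS import; Spark's JDBC reader is shown here as a rough equivalent, and all connection details, table names, and paths are hypothetical.

```python
# Minimal sketch (hypothetical connection details, tables, and paths):
# pull a table from an RDBMS, land it on HDFS partitioned by load date,
# then analyze it with Spark SQL. The project performed the import with
# Sqoop; Spark's JDBC reader is used here as an illustrative stand-in.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-sketch").enableHiveSupport().getOrCreate()

orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")  # hypothetical
          .option("dbtable", "orders")                       # hypothetical
          .option("user", "etl_user")
          .option("password", "********")
          .option("numPartitions", 8)          # parallel readers, similar to Sqoop mappers
          .option("partitionColumn", "order_id")
          .option("lowerBound", 1)
          .option("upperBound", 10000000)
          .load())

# Land the raw data on HDFS, partitioned by load date (hypothetical path).
(orders.withColumn("load_date", F.current_date())
       .write.mode("append")
       .partitionBy("load_date")
       .parquet("hdfs:///data/raw/orders"))

# Analyze with Spark SQL and hand the aggregate back for BI reporting.
orders.createOrReplaceTempView("orders")
daily = spark.sql("""
    SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")
daily.show()
```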

Environment: AWS, Azure, Cloudera, EMR, Hadoop, HDFS, Sqoop, Hive, MySQL, MS SQL, Linux, Spark Core, Spark SQL, Scala, Bitbucket, Jira, Confluence, Kafka, MS Office, Python, Eclipse, Zookeeper, Cassandra.

Confidential, Virginia

Data Engineer and Hadoop Developer

Responsibilities:

  • Worked on Hadoop ecosystem components including Hive, HBase, Oozie, Zookeeper, and Spark Streaming; retrieved Spark Streaming inputs from online services such as searches, requests, applications, and clicks to understand visitor behavior, with all data stored, transformed, and analyzed on the Hadoop platform.
  • Responsible for developing Spark Streaming jobs and MapReduce jobs; involved in installations and upgrades, providing a large-scale data lake to store, analyze, and transform the data as required by the business.
  • Created Hive tables using HiveQL that helped analysts by comparing fresh data with EDW tables and historical metrics.
  • Created Hive tables and loaded data into them incrementally and dynamically using partitioning and bucketing, working with Avro and CSV files (see the sketch after this list).
  • Applied bucketing and partitioning to Hive tables for data extraction.
  • Involved in developing Spark Streaming jobs by writing RDDs in Scala and building DataFrames with Spark SQL as needed.
  • Performed various SQL operations for the creation of Hive tables, such as joining, filtering, and sorting data across tables, and retrieved the required information into the tables.
  • Worked with Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and Hive.
  • Reviewed and modified business requirements and technical specification documents to reflect the new implementation process for further use.
  • Worked in an Agile/Scrum methodology for the analysis, design, implementation, and enhancement of the application.
  • Performed Linux operations on the HDFS server for data lookups, altered jobs when commits were disabled, and rescheduled jobs for data storage.
  • Imported and exported data between RDBMS and HDFS using Sqoop and created Hive tables on top of the Sqoop-loaded data.
  • Worked with Cloudera Manager to monitor and manage jobs in Hadoop clusters.
  • Developed Hive UDFs to improve the process of loading data into Hive for analytics and storing the data in Hadoop clusters.
  • Tested and validated database tables using SQL queries and stored procedures and performed data validation and data integration.
  • Monitored Avro and CSV data files.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Worked with Sqoop to transfer data between HDFS and relational databases such as MySQL and vice versa, and used Talend for the same purpose; worked with S3 buckets, managed S3 bucket policies, and utilized S3 for storage and backup on AWS.
  • Worked on the MS SQL database and performed backend validation, including post-validation using SQL queries with filters, sorting, aggregate functions, and joins to verify data accuracy.
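Illustrative sketch of the partitioned and bucketed Hive loading described above, issued through Spark SQL with Hive support; database, table, and column names are hypothetical, and Parquet stands in for the Avro/CSV formats used on the project.

```python
# Minimal sketch (hypothetical database/table/column names): a partitioned
# Hive table with a dynamic-partition insert, issued through Spark SQL with
# Hive support enabled. Parquet is used here for simplicity; the same
# pattern applies to Avro-backed tables.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow inserts to create partitions dynamically from the data itself.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.page_events (
        user_id  STRING,
        page     STRING,
        duration INT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Incrementally load new rows from a staging table into the right partitions.
spark.sql("""
    INSERT INTO TABLE analytics.page_events PARTITION (event_date)
    SELECT user_id, page, duration, event_date
    FROM staging.page_events_raw
""")

# Bucketing: Spark's native bucketBy is shown here; the equivalent Hive DDL
# would use CLUSTERED BY ... INTO n BUCKETS.
(spark.table("analytics.page_events")
      .write.mode("overwrite")
      .bucketBy(16, "user_id")
      .sortBy("user_id")
      .saveAsTable("analytics.page_events_bucketed"))
```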

Environment: AWS, Hadoop, HDFS, MapReduce, Hive, HBase, Zookeeper, Java, Oracle, SQL Server, Scala, Spark, Sqoop, Python, Informatica, Talend, Cassandra, Hue, IntelliJ, Hortonworks.

Confidential, CA

Data Engineer

Responsibilities:

  • Experienced in writing Spark applications in Scala and Python (PySpark).
  • Imported Avro files using Apache Kafka and performed analytics on them using Spark with Scala.
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra (see the streaming sketch after this list).
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it to Cassandra.
  • Used sbt to build Scala Spark projects and executed them using spark-submit.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Built Cassandra nodes on AWS and set up the Cassandra cluster using Ansible automation tools.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS, and VPC.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs to create data pipelines.
  • Developed Hive queries to analyze the data and generate the end reports used by business users.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Designed solutions for various system components using Microsoft Azure.
  • Configured Azure cloud services for endpoint deployment.
  • Wrote a generic, extensive data quality check framework for use by the application, using Impala.
  • Gained experience with NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
  • Provided guidance to the development team working on PySpark as an ETL platform.
  • Ran PySpark jobs on a Kubernetes cluster for faster data processing.
  • Involved in Cassandra data modeling and building efficient data structures.
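Illustrative sketch of the Kafka-to-Cassandra streaming flow described above. The project used the older DStream API; a Structured Streaming equivalent is shown here, with hypothetical topic, keyspace, and table names, and it assumes the DataStax spark-cassandra-connector package is on the classpath.

```python
# Minimal Structured Streaming sketch (hypothetical broker, topic, keyspace,
# and table names). The DStream API described in the bullets is replaced
# here by the structured API as an illustrative equivalent. Writing to
# Cassandra assumes the spark-cassandra-connector is available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-cassandra-sketch").getOrCreate()

schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical
          .option("subscribe", "clickstream")                 # hypothetical topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_cassandra(batch_df, batch_id):
    # Each micro-batch is appended to a Cassandra table via the connector.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
             .mode("append")
             .option("keyspace", "analytics")                 # hypothetical
             .option("table", "learner_events")               # hypothetical
             .save())

query = (events.writeStream
         .option("checkpointLocation", "hdfs:///tmp/chk/learner_events")  # hypothetical path
         .foreachBatch(write_to_cassandra)
         .start())
query.awaitTermination()
```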

Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, Map Reduce, GIT, HDFS, Cassandra, Apache Kafka, Storm, Linux, Solr, Confluence, Jenkins.

Confidential, CA

Data Engineer

Responsibilities:

  • Worked in an Agile/Scrum methodology for the analysis, design, and implementation of the application.
  • Installed and configured Hadoop clusters for application development along with Hadoop tools such as Hive, Pig, Sqoop, HBase, and Zookeeper.
  • Worked on developing ETL processes to load data into HDFS using Sqoop and export the results back to the RDBMS.
  • Handled importing of data from various data sources, performed transformations using Hive, and loaded data into HDFS.
  • Moved data between third-party systems and the Hadoop Distributed File System (HDFS) using Sqoop commands hosted on an AWS cluster.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created Hive queries that helped analysts by comparing fresh data with EDW tables and historical metrics (see the validation sketch after this list).
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Extensively worked with all kinds of joins, filtering, sorting, aggregate functions, and operators to fetch data from multiple tables using SQL queries; wrote Hive validation scripts used in the validation framework for daily analysis, presented to business users through graphs.
  • Worked with Sqoop jobs to retrieve data from relational databases at regular intervals.
  • Tested and validated database tables using SQL queries and stored procedures and performed data validation and data integration.
  • Developed automated job flows that ran through Oozie daily and on demand, running MapReduce jobs internally.
  • Worked on the Oracle SQL database and performed backend testing with SQL queries, including database testing and data validation.
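Illustrative sketch of the kind of validation query described above, comparing a freshly loaded Hive table against an EDW table; all table and column names are hypothetical.

```python
# Minimal sketch (hypothetical table and column names): compare a freshly
# loaded Hive table against an EDW table to flag row-count drift per day,
# similar to the validation queries used for the daily reports.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edw-validation-sketch").enableHiveSupport().getOrCreate()

drift = spark.sql("""
    SELECT f.load_date,
           f.row_cnt             AS fresh_cnt,
           e.row_cnt             AS edw_cnt,
           f.row_cnt - e.row_cnt AS diff
    FROM (SELECT load_date, COUNT(*) AS row_cnt
          FROM staging.orders_fresh GROUP BY load_date) f
    LEFT JOIN (SELECT load_date, COUNT(*) AS row_cnt
               FROM edw.orders GROUP BY load_date) e
      ON f.load_date = e.load_date
""")

drift.show()
```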

Environment: AWS, Hadoop, Hive, Sqoop, Oracle SQL, Linux, Java, Scala, Windows 7/10, Jira, XML, Avro, Parquet, MySQL, Informatica, Spark Core, Spark SQL, HDFS, Talend, HBase, Pig.

Confidential

Java Developer

Responsibilities:

  • Attended daily stand-up meetings in Agile/Scrum to update status and attended sprint planning at the start of each sprint.
  • Created and managed all hosted and local repositories through SourceTree's Git client interface and collaborated using Git command-line tools.
  • Developed and implemented business requirements using the MVC framework; implemented cross-cutting concerns such as logging, authentication, and system performance using Spring AOP.
  • Used the Mozilla Firefox extension Firebug to view and debug HTML, the DOM, and JavaScript.
  • Developed the presentation layer using JSP, HTML, and CSS, with client-side validations in JavaScript.
  • Responsible for working with the client on establishing deliverables and timelines and managing project scope and resources.
  • Debugged and troubleshot technical issues while implementing the applications.
  • Performed backend testing of the database by writing SQL queries and PL/SQL scripts to test the integrity of the application and Oracle databases.
  • Performed backend testing by writing and executing SQL queries to validate that data was populated in the appropriate tables and manually verified the correctness of the data against frontend values.
  • Produced reports and documentation for all testing efforts, results, activities, data, logging, and tracking.
  • Tracked bugs and prepared bug reports using the JIRA defect management tool and interacted with the QA team to discuss technical issues.

Environment: Windows 7, MySQL, MS SQL Server, Agile, Java, JSP, CSS, Eclipse, HTML, MS Office.

EDUCATIONAL DETAILS: Bachelor's in CSE, Gitam University, Hyderabad, India, 2012.
