
Data Engineer Resume


Baton Rouge, LA

SUMMARY

  • Around 5 years of professional experience in enterprise software development using open-source Hadoop technologies and cloud-based solutions.
  • Handled client-facing roles and worked on both development and support projects in an Agile model with bi-weekly sprints.
  • Developed views in Denodo by connecting to Hive and comparing data against other sources.
  • Created views in Denodo from Hive and web services; the views were exposed to BI teams and also loaded into Oracle.
  • Hands-on with Denodo administration tasks, with 17 sources integrated.
  • Implemented a NiFi - Spark Streaming - HBase pipeline and integrated the results into Denodo.
  • Hands-on experience with Amazon Web Services components such as EC2, EMR, S3, CodePipeline, and CloudWatch.
  • Converted existing MapReduce code into Spark SQL and integrated it with Spark Streaming at a later stage of the project to build the data lake.
  • Developed NiFi and Kafka integration per client requirements, optimizing to maximize cluster load without compromising performance.
  • Effectively used Oozie to develop automated workflows of Sqoop and Hive jobs; built parallel and sequential execution workflows and implemented encryption of user details.
  • Experienced in improving the performance of existing Hadoop algorithms with Spark, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
  • Proficient in performance tuning of Hive and Spark SQL scripts over large data volumes.
  • Flexible with Agile methodologies and Scrum user stories; involved in sprint planning and code reviews.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Documents (FSD). Strong knowledge of the Software Development Life Cycle (SDLC).
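
The following is a minimal Scala sketch, for illustration only, of the kind of MapReduce-to-Spark conversion referenced above: the same word-count aggregation expressed with pair RDDs and then as Spark SQL over a DataFrame. The input path, app name, and column names are placeholder assumptions, not project artifacts.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object MapReduceToSparkSql {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("mapreduce-to-spark-sql")   // placeholder app name
          .getOrCreate()
        import spark.implicits._

        // Classic MapReduce-style word count expressed with pair RDDs.
        val lines = spark.sparkContext.textFile("/data/input/events.txt")
        val rddCounts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        rddCounts.take(5).foreach(println)

        // The same logic as Spark SQL over a DataFrame, letting the Catalyst
        // optimizer plan the shuffle and aggregation.
        val dfCounts = lines.toDF("line")
          .select(explode(split($"line", "\\s+")).as("word"))
          .groupBy("word")
          .count()
        dfCounts.orderBy(desc("count")).show(20)

        spark.stop()
      }
    }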

TECHNICAL SKILLS

Primary Languages: Scala

Databases: IBM DB2, MySQL, Oracle, SQL Server, Hive, Denodo

NoSQL: HBase

Virtualization: Denodo

Cloud Components: AWS EC2, EMR, S3, CloudWatch, CodePipeline

IDEs: Eclipse, IntelliJ

Domain Knowledge: Insurance, Telecom, Retail, Manufacturing

Frameworks: Spark, ScalaTest

Distributions: Hortonworks, Cloudera, AWS

PROFESSIONAL EXPERIENCE

Confidential - Baton Rouge, LA

Environment: Amazon S3, EMR, EC2, Hive, NiFi, Spark, Linux, Git, HDP.

Data Engineer - Customer Data Analysis

Responsibilities:

  • Worked on Spark SQL to extract data from log files and cross-check it against the rules database maintained in Oracle.
  • Developed a streaming pipeline with NiFi, Kafka, and Spark components to ingest streaming log data for multi-player environments.
  • Implemented performance improvements to the NiFi flows.
  • Coded asynchronous streaming consumption from Kafka, since log file frequency varies across games.
  • Developed data sanity-check code for incoming logs to validate the utilization factor of the log files (a sketch follows this list).
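
A minimal Scala sketch, using Spark Structured Streaming, of the kind of Kafka log sanity check described above; the broker address, topic name, delimiter, and window sizes are illustrative assumptions rather than project values.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object LogSanityCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("game-log-sanity-check")   // placeholder app name
          .getOrCreate()
        import spark.implicits._

        // Read raw game logs from Kafka (broker and topic are placeholders).
        val rawLogs = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "game-logs")
          .load()
          .selectExpr("CAST(value AS STRING) AS line", "timestamp")

        // Sanity check: drop empty or malformed lines, then count usable
        // records per game per minute to gauge log utilization.
        val usable = rawLogs
          .filter(length(trim($"line")) > 0 && $"line".contains("|"))
          .withColumn("game_id", split($"line", "\\|").getItem(0))

        val utilization = usable
          .withWatermark("timestamp", "5 minutes")
          .groupBy(window($"timestamp", "1 minute"), $"game_id")
          .count()

        utilization.writeStream
          .outputMode("update")
          .format("console")
          .start()
          .awaitTermination()
      }
    }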

Confidential, Baton Rouge, LA

Environment: Amazon S3, EMR, CloudWatch, Hive, Oozie, Sqoop, Linux, Git, HDP.

Data Engineer - Data Lake Design

Responsibilities:

  • Created design documents covering the project requirements, the Hortonworks cluster, and the AWS ecosystem.
  • Deployed the scripts on an AWS EMR instance and made performance optimizations.
  • Developed Sqoop scripts to transfer data between Hive/HDFS and the Oracle database.
  • Created external and managed Hive tables with optimized partitioning and bucketing, and was involved in developing the Hive reports.
  • Worked on Spark SQL for faster execution of Hive queries using the Spark SQL context.
  • Invoked HQL from Spark SQL and stored the results in ORC as the storage format (see the sketch after this list).
  • Migrated data from the MySQL server to Hadoop using Sqoop for processing.
  • Implemented performance improvements to the Sqoop scripts.
  • Implemented password encryption in the Oozie scripts.
  • Implemented compression codecs when saving data to HDFS and Hive tables.
  • Scheduled the Oozie workflows through a coordinator job.
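
A minimal Scala sketch of the HQL-from-Spark-SQL pattern with compressed ORC output mentioned above; the database, table, columns, codec, and output path are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object HiveToOrc {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark SQL run HQL directly against the warehouse.
        val spark = SparkSession.builder()
          .appName("hive-to-orc")              // placeholder app name
          .enableHiveSupport()
          .getOrCreate()

        // Run an HQL aggregation against a placeholder Hive table.
        val report = spark.sql(
          """SELECT region, COUNT(*) AS order_count
            |FROM datalake.orders
            |GROUP BY region""".stripMargin)

        // Store the result as compressed ORC, partitioned for downstream reads.
        report.write
          .mode("overwrite")
          .option("compression", "zlib")
          .partitionBy("region")
          .orc("/data/reports/orders_by_region")

        spark.stop()
      }
    }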

Confidential

Environment: HDP, Pig, Hive, MapReduce, Apache NiFi, HBase, Kafka, Apache Spark, shell scripting, MySQL, EDW, Denodo, SVN, PuTTY.

Senior Software Engineer - Supply Chain Data Analytics

Responsibilities:

  • Integrated the Denodo virtualization tool with Hive and SAP to build views presented in Tableau reports.
  • Designed the NiFi workflow to convert XML data into JSON and publish it to Kafka.
  • Pushed the JSON data to HBase after processing it with Spark Streaming code as the first data pipeline.
  • Converted MapReduce jobs into Spark SQL; for newer data, created Spark Streaming code and pushed the results to Hive as a second data pipeline.
  • Monitored all NiFi flows to receive notifications when no data passes through a flow for longer than a specified time.
  • Created NiFi workflows to trigger Spark jobs, with email notifications sent for any failures.
  • Integrated Kafka with Spark Streaming to process the JSON files and apply daily aggregations on top of them (a sketch follows this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Wrote Apache Pig scripts to process log files and flat files generated by the mainframe server and move them into HDFS/Hive.
  • Created Avro tables to store the processed JSON data so that schema-resolution issues could be handled in the future.
  • Processed all log files (POS, TLog) generated from various sources into HDFS using Apache NiFi.
  • Implemented Kafka brokers to read TLog data from a JMS queue and process it using an Apache Spark parser.
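
A minimal Scala sketch of the Kafka-to-Spark daily aggregation into Hive described above, written with Structured Streaming; the topic, event schema, table name, and checkpoint path are placeholder assumptions.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object DailyJsonAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-json-daily-aggregation")   // placeholder app name
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Expected shape of the JSON events (fields are illustrative).
        val eventSchema = new StructType()
          .add("item_id", StringType)
          .add("quantity", IntegerType)
          .add("event_time", TimestampType)

        // Read the JSON events converted from XML upstream (placeholders).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "supply-chain-events")
          .load()
          .select(from_json($"value".cast("string"), eventSchema).as("e"))
          .select("e.*")

        // Daily aggregation per item.
        val daily = events
          .withWatermark("event_time", "1 day")
          .groupBy(window($"event_time", "1 day"), $"item_id")
          .agg(sum($"quantity").as("total_quantity"))

        // Append each finalized daily window into a placeholder Hive table.
        val writeBatch: (DataFrame, Long) => Unit = (batch, _) =>
          batch.write.mode("append").saveAsTable("analytics.daily_item_totals")

        daily.writeStream
          .outputMode("append")
          .option("checkpointLocation", "/checkpoints/daily_item_totals")
          .foreachBatch(writeBatch)
          .start()
          .awaitTermination()
      }
    }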

Confidential

Environment: HDP, Pig, Hive, Apache NiFi, HBase, Kafka, Apache Spark, shell scripting, MySQL, EDW, Denodo, SVN, PuTTY.

Software Engineer - Evidence Management System / Classic Gateway

Responsibilities:

  • Created datasets from the streaming pipeline to publish to Tableau reports.
  • Designed the NiFi/HBase pipeline to collect the processed customer data into HBase tables.
  • Collected and aggregated large amounts of log data using Apache Sqoop, staging the data in HDFS for further analysis.
  • Worked on incremental load logic for maintaining data in the cluster.
  • Used the Avro file format compressed with Snappy for the JSON-derived tables for faster processing of data (see the sketch after this list).
  • Implemented Pig scripts to split the log files into structured ORC files. Created Hive tables to import large data sets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
  • Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
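
A minimal Scala sketch of writing the JSON-derived data as Snappy-compressed Avro, as referenced above; it assumes the spark-avro module is on the classpath, and the input and output paths are placeholders.

    import org.apache.spark.sql.SparkSession

    object JsonToAvro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-to-snappy-avro")   // placeholder app name
          .getOrCreate()

        // Read the staged JSON records (path is a placeholder).
        val records = spark.read.json("/data/staging/customer_json")

        // Write them as Avro compressed with Snappy; Avro stores the schema
        // with the data, which eases later schema evolution.
        records.write
          .mode("overwrite")
          .format("avro")
          .option("compression", "snappy")
          .save("/data/warehouse/customer_avro")

        spark.stop()
      }
    }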

Confidential

Environment: HDP, Hive, Pig, MySQL, EDW, SVN, PuTTY.

Software Intern - SQL Developer

Responsibilities:

  • Generated data through Pig from the Airtel log servers.
  • Created Hive internal tables to develop the Hive warehouse.
  • Developed tables to meet the requirements of the business analyst teams.
