
Lead Hadoop/Big Data Engineer Resume


SUMMARY

  • 11+ years of experience in Software Analysis, Design, Development, Testing, Implementation, Documentation and Support.
  • 4+ years of experience in Big Data/Hadoop and Analytics.
  • Worked as Lead Engineer developing Big Data Lake platform services that enable application teams to configure and execute their ETL workflows.
  • Hands-on experience in Big Data technologies: HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Spark (with Scala), Spark Streaming, Storm, Kafka.
  • Proficient in programming with Java and Scala.
  • Designed and Developed Batch & Streaming ETL workflows.
  • Worked on NoSQL databases HBase, MongoDB, Cassandra.
  • Proficient in statistical modelling, descriptive and predictive analytics, regression modelling, time series analysis, and machine learning using R, Spark MLlib, KNIME, and H2O Sparkling Water.
  • Worked on Big Data platforms: Cloudera CDH and Hortonworks.
  • Experienced in containerizing applications with Docker and Kubernetes.
  • Experienced in performance tuning of Spark applications (a minimal configuration sketch follows this list).
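As a minimal, illustrative sketch of that kind of tuning (the configuration keys are standard Spark settings, but every value here is a placeholder rather than a setting from an actual job):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative session-level tuning; all values are placeholders.
val spark = SparkSession.builder()
  .appName("tuned-etl-job")
  // Right-size shuffle parallelism to the cluster instead of the default of 200.
  .config("spark.sql.shuffle.partitions", "400")
  // Kryo gives faster, more compact serialization of shuffled records.
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Raise the broadcast-join threshold so small dimension tables avoid shuffles.
  .config("spark.sql.autoBroadcastJoinThreshold", (64 * 1024 * 1024).toString)
  .getOrCreate()

// Caching a dataset that is reused across several actions avoids recomputation.
// val usage = spark.read.parquet("/data/usage").cache()
```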

TECHNICAL SKILLS

Languages/Technologies: Java 8, J2EE, JDBC, Servlets, JSP, Hibernate, Spring, Spring Boot, Android Programming, Scala

Big Data Technologies/Predictive Analytics: HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Spark, Spark SQL, Spark GraphX, Storm, Kafka, Cloudera CDH 5.13, R, Spark MLlib, KNIME, H2O

NoSQL Databases: HBase, MongoDB, Cassandra

Containerization: Docker with Kubernetes

SOA/BPM Technologies: Web Services, JBoss FSW 6.1, JBPM 6, Apache Camel

Web/Application Server: Apache Tomcat 7.0, JBoss EAP 6.1

Database: MySQL 5.5, ApacheDS LDAP, Oracle 11g and PostgreSQL

Scripting Languages: JavaScript, Unix Shell Scripting

IDE: Eclipse 3.5, IntelliJ IDEA and JBoss Developer Studio 8.1

Source Control: Rational ClearCase, SVN, Git

Defect Tracking Tool: Jira, Citrix Clarify

Build Tools: Ant, Maven, SBT

Testing Tools: JUnit, Mockito, SOAP UI, JMeter

PROFESSIONAL EXPERIENCE

Lead Hadoop/Big Data Engineer

Confidential

Responsibilities:

  • Served as design and implementation lead for the Data Lake Platform (EPS), which provides services to several application teams, enabling them to configure and execute their ETL data workflows.
  • Developed the Model Registry API in Scala, which allows application users to register metadata about data (a schema registry for ingested and enriched datasets) as well as enrichment rules.
  • Worked on Spark GraphX, used for representing the model graph (see the sketch after this list).
  • Developed Model Registration services (REST) with Swagger UI.
  • Worked on containerizing EPS services with Docker and Kubernetes.
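A minimal sketch of how a model graph could be represented with Spark GraphX, where registered datasets are vertices and enrichment rules are edges. The DatasetMeta case class and the sample vertices, edges, and lineage query are hypothetical illustrations, not the actual EPS Model Registry schema:

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

// Hypothetical metadata record; kind is "ingested" or "enriched".
case class DatasetMeta(name: String, kind: String)

object ModelGraphSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("model-graph").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Vertices: registered datasets. Edges: enrichment rules deriving one dataset from another.
    val vertices = sc.parallelize(Seq(
      (1L, DatasetMeta("raw_orders", "ingested")),
      (2L, DatasetMeta("orders_enriched", "enriched"))
    ))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "rule: join with customer reference data")
    ))
    val modelGraph = Graph(vertices, edges)

    // Lineage query: which datasets feed each enriched dataset, and via which rule?
    modelGraph.triplets
      .map(t => s"${t.srcAttr.name} --[${t.attr}]--> ${t.dstAttr.name}")
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```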

Environment: Languages/Technologies: Scala, Spark, Spark SQL, Kafka, Spark GraphX, Hive, Impala, Spring Boot, REST services, IntelliJ, Git repository, Hadoop Framework CDH 5.13, Maven, UNIX Shell Scripting, Docker, Kubernetes.

Big Data Engineer

Confidential

Responsibilities:

  • Designed and developed Spark Streaming jobs for processing the credit usage data reported by several credit checking systems.
  • Designed the HBase table schema for persisting the raw as well as aggregated credit usage data.
  • Developed Spark Streaming jobs for performing aggregations on the credit usage data (a minimal sketch follows this list).
  • Designed the Impala table schema (aggregated data is loaded into Impala tables for consumption by reporting tools such as Tableau).
  • Designed and developed Spark ETL batch jobs for processing historical credit usage data stored in HBase.
  • Developed Spark jobs for extracting statistical knowledge about credit usage from historical data (descriptive analytics).
  • Worked on performance tuning of Spark jobs.
  • Created Kafka topics and ingested raw credit usage data onto Kafka.
  • Worked on a POC to create a regression-based predictive model for estimating expected turnaround time and predicting milestone breaches for all ETL jobs across Credit Risk (a minimal sketch follows the environment listing below).
  • Worked on creating predictive models using Spark MLlib.
  • Worked on a POC to create a predictive model using Sparkling Water (Spark-H2O integration).
  • Developed Spark SQL tables and queries to perform ad hoc data analytics for the analyst team.
  • Monitored Spark clusters.
  • Used Autosys (JIL scripting) for scheduling the batch jobs.
  • Worked with the Avro, Parquet, and SequenceFile file formats.
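A minimal sketch of the windowed aggregation pattern from the streaming jobs, assuming a hypothetical credit-usage Kafka topic carrying simple CSV-encoded records; the topic name, record layout, and window sizes are assumptions, and the real jobs persisted aggregates to HBase/Impala rather than printing them:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object CreditUsageStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("credit-usage-stream").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/credit-usage-checkpoint") // fault tolerance for windowed state

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "credit-usage-agg",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("credit-usage"), kafkaParams)
    )

    // Sum usage per reporting system over a 5-minute window, sliding every 10 seconds.
    stream
      .map(record => record.value.split(","))   // assumed layout: "systemId,amount"
      .map(fields => (fields(0), fields(1).toDouble))
      .reduceByKeyAndWindow(_ + _, Seconds(300), Seconds(10))
      .print()                                   // the real jobs wrote to HBase/Impala

    ssc.start()
    ssc.awaitTermination()
  }
}
```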

Environment: Languages/Technologies: Scala, Spark, Spark SQL, Spark Streaming, Impala, HBase, Spring Boot, REST services, GemFire cache, Hadoop Framework (CDH 5.10), Maven, UNIX Shell Scripting, Spark ML, Sparkling Water (Spark-H2O Integration)
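A minimal sketch of the regression POC idea using Spark ML's LinearRegression, estimating ETL job turnaround time from historical run metrics; the feature columns and training rows are hypothetical stand-ins:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object TurnaroundRegressionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("turnaround-poc").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical history: (input rows processed, degree of parallelism, minutes taken).
    val history = Seq(
      (1000000.0, 8.0, 22.0),
      (2500000.0, 8.0, 51.0),
      (2500000.0, 16.0, 29.0)
    ).toDF("rows", "parallelism", "minutes")

    // Assemble the raw columns into the single feature vector Spark ML expects.
    val assembler = new VectorAssembler()
      .setInputCols(Array("rows", "parallelism"))
      .setOutputCol("features")
    val training = assembler.transform(history)

    val model = new LinearRegression().setLabelCol("minutes").fit(training)

    // Predicted minutes can then be compared against the SLA to flag likely breaches.
    model.transform(training).select("minutes", "prediction").show()
    spark.stop()
  }
}
```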

Hadoop/Big Data Engineer

Confidential

Responsibilities:

  • Designed, developed, and integration tested a Storm-based data ingestion layer for preprocessing subscriber data.
  • Performance tuned Storm topologies and message buffers.
  • Worked on CEI Content Pack related aggregations using Hive queries.
  • Worked on a predictive model for subscriber churn, based on customer care insights and customer experience index data, using the Random Forest algorithm on the KNIME platform.
  • Designed and developed POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive.
  • Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala (see the sketch after this list).
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
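A minimal sketch of the Hive-to-Spark migration pattern, shown here with the SparkSession entry point (the original work used SQLContext); the table and column names are hypothetical stand-ins for the CEI content-pack datasets:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport() // read the same metastore tables the original HiveQL used
      .getOrCreate()

    // Original HiveQL, runnable unchanged through Spark SQL:
    val viaSql = spark.sql(
      "SELECT subscriber_id, AVG(experience_index) AS avg_cei FROM cei_scores GROUP BY subscriber_id")

    // Equivalent DataFrame transformation: easier to compose and unit test.
    val viaDf = spark.table("cei_scores")
      .groupBy(col("subscriber_id"))
      .agg(avg(col("experience_index")).alias("avg_cei"))

    viaSql.show()
    viaDf.show()
    spark.stop()
  }
}
```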

Environment: CDH, Storm, Kafka, Hive, JBoss FSW, JBPM, HBase, Intellij, Java, Shell Script, XML, JIRA, GitHub, Jenkins.

Senior Java Developer

Confidential

Responsibilities:

  • Worked on several Telecom projects as a Java backend developer.

Environment: Java, JBoss FSW, JBPM, Apache Camel, JMS, Web Services (REST and SOAP), PostgreSQL, Hibernate, Oracle, Shell Script, XML, JIRA, GitHub, Jenkins
