
Data Engineer Resume


Portland, OR

SUMMARY

  • 8+ years of experience in the Information Technology industry, including 4+ years as a Hadoop/Spark developer working with big data technologies across the Hadoop and Spark ecosystems.
  • Experience in Hadoop Ecosystem components like MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie and Zookeeper.
  • Good Knowledge in writing Spark Applications in Scala and Java.
  • Involved in ingesting data into HDFS using Apache NiFi. Developed and deployed Apache NiFi flows across various environments, optimized NiFi data flows, and wrote QA scripts in Python for tracking missing files.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch after this list).
  • Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
  • Good knowledge of collecting and storing streaming data, such as log data, in HDFS using Apache Flume.
  • Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
  • Experience in data modeling and working with Cassandra Query Language (CQL).
  • Hands-on expertise in designing row keys and schemas for NoSQL databases like MongoDB.
  • Experience in extracting data from MongoDB through Sqoop, placing it in HDFS, and processing it.
  • Experienced with performing CRUD operations using the HBase Java client API.
  • Created dataflow between SQL Server and Hadoop clusters using Apache Nifi.
  • Involved in developing Impala scripts for ad-hoc queries.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Good knowledge of real-time processing using Spark Streaming with Kafka.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Experience in cluster coordination using Zookeeper.
  • Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
  • Supported various reporting teams and have experience with the data visualization tool Tableau.
  • Designed and created ETL jobs through Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
  • Expertise in developing Hive generic UDFs to implement complex business logic and incorporate it into HiveQL.
  • Expertise in implementing ad-hoc queries using HiveQL and good knowledge of creating Hive tables and loading and analyzing data using Hive queries.
  • Strong technical skills and working knowledge of Core Java.
  • Experience in various phases of software development, including analysis, design, development, and deployment of applications using Servlets, JSP, the Spring Framework, and JDBC.
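
Below is a minimal Scala sketch of the Hive-to-Spark conversion pattern referenced above: the same aggregation expressed first as HiveQL and then as DataFrame transformations. The table and column names are illustrative assumptions, not drawn from any specific project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkExample")
          .enableHiveSupport()            // lets Spark read existing Hive tables
          .getOrCreate()

        // Original HiveQL (illustrative):
        //   SELECT customer_id, SUM(amount) AS total_amount
        //   FROM sales.transactions
        //   GROUP BY customer_id;

        // Equivalent DataFrame transformations:
        val totals = spark.table("sales.transactions")
          .groupBy("customer_id")
          .agg(sum("amount").alias("total_amount"))

        totals.write.mode("overwrite").saveAsTable("sales.customer_totals")

        spark.stop()
      }
    }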

TECHNICAL SKILLS

Operating System: Windows, Linux distributions like Ubuntu, CentOS

Hadoop Distribution: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Languages: Java, Scala, Python

Data stores: MySQL, SQL Server

Big data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, HBase, Sqoop, Spark, NiFi and Kafka

Amazon Stacks: AWS EMR, S3, EC2, Lambda, Route 53, EBS, CloudFront

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

ETL: Talend and Informatica

Data Flow Tools: Apache NiFi

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, and JSON

Development/Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

NoSQL Databases: Cassandra, MongoDB, HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Spring MVC

PROFESSIONAL EXPERIENCE

Confidential, Portland, OR

Data Engineer

Responsibilities:

  • Contributed to the Data Integration track for the development of a new Cost Accounting & Profitability Analytical Domain on Azure Data Factory and Azure HDInsight.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Developed Spark/Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Designed, developed, and implemented data masking methods and techniques, including Blake2, to safeguard sensitive data such as PHI/PII in accordance with HIPAA laws, using secure keys retrieved from Azure Key Vault.
  • Worked on transforming queries written in Hive into Spark applications. Worked on Apache NiFi to decompress and move JSON files from the local file system to HDFS.
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
  • Created reports in Tableau for visualization of the data sets created, and tested native Drill, Impala, and Spark connectors.
  • Analyzed the SQL scripts and designed the solution for implementation using Scala.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Responsible for loading bulk amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
  • Worked on Solr configuration and customizations based on requirements.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Designed and developed SSIS (ETL) packages to validate, extract, transform, and load data from the OLTP system to the data warehouse and reporting data mart.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Ingested streaming data with Apache NiFi into Kafka.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Responsible for developing a data pipeline by implementing Kafka producers and consumers (a minimal producer sketch follows this section).
  • Worked on the ETL scripts and fixed issues at the time of data load from various data sources.
  • Performed data analysis wif HBase using Apache Phoenix.
  • Used Apache NiFi to copy data from the local file system to HDFS.
  • Managing and reviewing Hadoop Log files to resolve any configuration issues.
  • Developed a program to extract named entities from OCR files.
  • Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects, and identified the source of defects.
  • Used GIT for version control.

Environment: Hadoop, HDFS, Spark, SQL, HQL, Kafka, Hive, Apache NiFi, YARN, Ambari, Tez, ZooKeeper, Scala, Python, Sqoop, Shell Scripting, Microsoft Azure, Hortonworks Data Platform, Agile Methodology, SAFe®, Oracle, Teradata, SQL Server, Visual Studio, Jupyter, Git, Azure HDInsight, ADLS, Blob Storage, ADF, Kerberos, LDAP, Jenkins.
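
A minimal Scala sketch of the kind of Kafka producer used in the ingestion pipeline above; the broker address, topic name, record key, and input file are illustrative assumptions rather than the actual production configuration.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

    object WeblogProducer {
      def main(args: Array[String]): Unit = {
        // Broker address and serializer settings; the values here are placeholders.
        val props = new Properties()
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put(ProducerConfig.ACKS_CONFIG, "all") // wait for full replication before acking

        val producer = new KafkaProducer[String, String](props)
        try {
          // Each line of an input log becomes one record keyed by source system.
          scala.io.Source.fromFile("weblog.txt").getLines().foreach { line =>
            producer.send(new ProducerRecord[String, String]("weblogs", "web", line))
          }
        } finally {
          producer.flush()
          producer.close()
        }
      }
    }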

Confidential, WA

Hadoop/Spark Developer

Responsibilities:

  • Worked with Hadoop ecosystem components like Cassandra, Sqoop, Flume, Oozie, Hive, and Pig.
  • Developed Pig and Hive UDFs in Java for extended use of Pig and Hive, and wrote Pig scripts for sorting, joining, filtering, and grouping the data.
  • Developed Spark programs using Scala, involved in creating Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Developed Oozie workflows with Sqoop actions to migrate data from relational databases like Oracle and Teradata to HDFS.
  • Developed Hive queries to analyze the data and to generate the end reports used by business users.
  • Implemented Apache Nifi flow topologies to perform cleansing operations before moving data into HDFS.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Developed a data pipeline using Kafka, Cassandra, and Hive to ingest, transform, and analyze customer behavioral data.
  • Great familiarity with Hive joins; used HQL for querying the databases, eventually leading to complex Hive UDFs.
  • Responsible for migrating iterative MapReduce programs into Spark transformations using Spark and Scala.
  • Used Scala to write the code for all the use cases in Spark and Spark SQL.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Analyzed the SQL scripts and designed the solution for implementation using Scala.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
  • Worked with Apache NiFi for data ingestion; triggered shell scripts and scheduled them using NiFi.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this section).
  • Worked and learned a great deal from AWS Cloud services like EC2, S3, EMR and RDS.

Environment: Hadoop YARN, Spark Core, Spark Streaming, Apache NiFi, Spark SQL, Scala, Kafka, Hive, Cassandra, Sqoop, Amazon AWS, Tableau, Oozie, Cloudera, Oracle, Linux.
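
As referenced in the partitioning bullet above, here is a minimal Scala sketch of loading a partitioned Hive table from Spark, with one partition created per distinct value of the partition column. The database, table, path, and column names are illustrative assumptions.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object PartitionedHiveLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PartitionedHiveLoad")
          .enableHiveSupport()               // needed to write managed Hive tables
          .getOrCreate()

        // Source data in Parquet; the path and schema are placeholders.
        val orders = spark.read.parquet("/data/staging/orders")

        // One Hive partition is created per distinct order_date value.
        orders.write
          .mode(SaveMode.Append)
          .format("parquet")
          .partitionBy("order_date")
          .saveAsTable("analytics.orders_partitioned")

        spark.stop()
      }
    }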

Confidential, WI

Hadoop/Spark Developer

Responsibilities:

  • Experienced in writing Spark Applications in Scala and Python (PySpark).
  • Imported Avro files using Apache Kafka and did some analytics using Spark with Scala.
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra (see the sketch after this section).
  • Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Used sbt to develop Scala-coded Spark projects and executed them using spark-submit.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Built the Cassandra nodes using AWS and set up the Cassandra cluster using the Ansible automation tool.
  • Worked and learned a great deal from Amazon Web Services(AWS) cloud services like EC2, S3, EMR, EBS, RDS and VPC.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into an RDBMS through Sqoop.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Developed Oozie Bundles to Schedule Pig, Sqoop and Hive jobs to create data pipelines.
  • Developed Hive queries to analyze the data and to generate the end reports used by business users.
  • Used Spark and Spark SQL to read the Parquet data and create the tables in Hive using the Scala API.
  • Designed solutions for various system components using Microsoft Azure.
  • Configured Azure cloud services for endpoint deployment.
  • Wrote a generic, extensive data quality check framework, based on Impala, to be used by the application.
  • Experience in NoSQL column-oriented databases like Cassandra and their integration with the Hadoop cluster.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
  • Involved in the process of Cassandra data modeling and building efficient data structures.

Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, MapReduce, Git, HDFS, Cassandra, Apache Kafka, Storm, Linux, Solr, Confluence, Jenkins.
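
A minimal sketch of the Kafka-to-Cassandra streaming pattern described in this section, using the Spark DStream API with the DataStax Spark-Cassandra connector; the topic, keyspace, table, record format, and host names are illustrative assumptions.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
    import org.apache.kafka.common.serialization.StringDeserializer
    import com.datastax.spark.connector._

    object KafkaToCassandra {
      // Simple event shape; a real schema would match the Cassandra table definition.
      case class Event(userId: String, action: String, ts: Long)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("KafkaToCassandra")
          .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host

        val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "learner-model",
          "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("user-events"), kafkaParams)
        )

        // Parse each CSV-style record and persist every micro-batch to Cassandra.
        stream.map(_.value.split(","))
          .filter(_.length == 3)
          .map(f => Event(f(0), f(1), f(2).toLong))
          .foreachRDD { rdd =>
            rdd.saveToCassandra("analytics", "user_events",
              SomeColumns("user_id", "action", "ts"))
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }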

Confidential

Java Developer

Responsibilities:

  • Involved in Analysis, design and coding on JAVA/JSP Front End Environment.
  • Responsible for developing use cases and class and sequence diagrams for the modules using UML and Rational Rose Enterprise Edition as a feature owner.
  • Developed the application using Spring, Servlets, JSP, and EJB.
  • Implemented MVC (Model View Controller) architecture.
  • Designed the application flow using Rational Rose.
  • Used web servers like Apache Tomcat.
  • Implemented Application prototype using HTML, CSS and JavaScript.
  • Developed the user interfaces with the Spring tag libraries.
  • Developed build and deployment scripts using Apache Ant to customize WAR, EAR, and EJB JAR files.
  • Prepared field validation and scenario test cases using JUnit, and tested the module in three phases: unit testing, system testing, and regression testing.
  • Code and unit test according to client standards.
  • Used Oracle Database for data storage and coding stored procedures, functions and triggers.
  • Wrote DB queries using SQL for interacting wif database.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Created components using Java, Spring, and JNDI.
  • Prepared Spring deployment descriptors using XML.
  • Developed the entire application using Eclipse and deployed it on WebSphere Application Server.
  • Problem Management during QA, Implementation and Post- Production Support.
  • Developed a logging component using Apache Log4j to log messages and errors, and wrote test cases to verify the code for different conditions using JUnit.

Environment: Java, HTML, Spring, JSP, Servlets, DBMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, Log4j, SQL, PL/SQL, CSS.

Confidential

Java Developer

Responsibilities:

  • Developed rules based on different state policies using Spring MVC, iBatis ORM, Spring Web Flow, JSP, JSTL, Oracle, MS SQL, SOA, XML, XSD, JSON, AJAX, and Log4j.
  • Gathered requirements, developed, implemented, tested and deployed enterprise integration patterns (EIP) based applications using Apache Camel, JBoss Fuse.
  • Developed service classes, domain/DAOs, and controllers using JAVA/J2EE technologies.
  • Designed and developed services using the Apache CXF web service framework.
  • Worked on the ActiveMQ messaging service for integration.
  • Worked with SQL queries to store and retrieve the data in MS SQL Server.
  • Performed unit testing using JUnit.
  • Developed the front end using JSTL, JSP, HTML, and JavaScript.
  • Worked on continuous integration using Jenkins/Hudson.
  • Participated in all phases of development life cycle including analysis, design, development, testing, code reviews and documentations as needed.
  • Used ECLIPSE as IDE, MAVEN for build management, JIRA for issue tracking, CONFLUENCE for documentation purpose, GIT for version control, ARC (Advanced Rest Client) for endpoint testing, CRUCIBLE for code review and SQL Developer as DB client.

Environment: Spring Framework, Spring MVC, Spring Web Flow, JSP, JSTL, Oracle 11g, XML, JSON, AJAX, HTML, CSS, RAD with sub-eclipse, Jenkins, Maven, SOA, Log4j, Java, JUnit.
