Data Engineer Resume
Portland, OR
SUMMARY
- 8+ years of experience in the Information Technology industry, including 4+ years as a Hadoop/Spark developer working with big data technologies across the Hadoop and Spark ecosystems.
- Experience with Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and ZooKeeper.
- Good knowledge of writing Spark applications in Scala and Java.
- Ingested data into HDFS using Apache NiFi; developed and deployed NiFi flows across various environments, optimized NiFi data flows, and wrote QA scripts in Python to track missing files.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (a minimal sketch follows this summary).
- Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
- Good knowledge of collecting and storing streaming data, such as log data, in HDFS using Apache Flume.
- Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
- Experience in data modeling and working with Cassandra Query Language (CQL).
- Hands-on expertise in row-key and schema design with NoSQL databases such as MongoDB.
- Experience extracting data from MongoDB through Sqoop, landing it in HDFS, and processing it.
- Experienced in performing CRUD operations using the HBase Java client API.
- Created data flows between SQL Server and Hadoop clusters using Apache NiFi.
- Developed Impala scripts for ad-hoc queries.
- Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) for storage.
- Good knowledge of real-time processing using Spark Streaming with Kafka.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Experience in cluster coordination using ZooKeeper.
- Used Oozie and ZooKeeper operational services for cluster coordination and workflow scheduling.
- Supported various reporting teams; experienced with the data visualization tool Tableau.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Expertise in developing Hive generic UDFs to implement complex business logic and incorporate it into HiveQL.
- Expertise in implementing ad-hoc queries using HiveQL; good knowledge of creating Hive tables and loading and analyzing data with Hive queries.
- Strong technical skills in Core Java.
- Experience in various phases of software development, including analysis, design, development, and deployment of applications using Servlets, JSP, the Spring Framework, and JDBC.
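For the Hive-to-Spark conversions referenced above, a minimal illustrative sketch in Scala follows; the orders table, its columns, and the aggregation are hypothetical stand-ins rather than an actual project schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkSketch")
      .enableHiveSupport()            // read existing Hive tables through the metastore
      .getOrCreate()

    // Original HiveQL, for reference:
    //   SELECT customer_id, SUM(amount) AS total_amount
    //   FROM orders
    //   WHERE order_date >= '2017-01-01'
    //   GROUP BY customer_id;

    // Equivalent DataFrame transformations plus an action
    val totals = spark.table("orders")                      // hypothetical Hive table
      .filter(col("order_date") >= lit("2017-01-01"))
      .groupBy("customer_id")
      .agg(sum("amount").alias("total_amount"))

    totals.show(20)                                          // action that triggers execution
    spark.stop()
  }
}
```

Expressing the query through the DataFrame API lets Spark's Catalyst optimizer plan the filter and aggregation, rather than relying on a hand-written MapReduce job or Hive script.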
TECHNICAL SKILLS
Operating System: Windows, Linux distributions like Ubuntu, CentOS
Hadoop Distribution: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Languages: Java, Scala, Python
Data stores: MySQL, SQL Server
Big data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, HBase, Sqoop, Spark, NiFi and Kafka
Amazon Stacks: AWS EMR, S3, EC2, Lambda, Route 53, EBS, CloudFront
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL, and DB2
ETL: Talend and Informatica
Dataflow Tools: Apache NiFi
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, and JSON
Development/Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit, and Log4j
NoSQL Databases: Cassandra, MongoDB, HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Spring MVC
PROFESSIONAL EXPERIENCE
Confidential, Portland, OR
Data Engineer
Responsibilities:
- Contributed to the Data Integration track for the development of a new Cost Accounting & Profitability analytical domain on Azure Data Factory and Azure HDInsight.
- Responsible for building scalable distributed data solutions using Hadoop.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Developed Spark/Python code for a regular-expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
- Designed, developed, and implemented data masking methods and techniques, including Blake2, to safeguard sensitive data such as PHI/PII in accordance with HIPAA, using secure keys retrieved from Azure Key Vault (a masking sketch follows this role's responsibilities).
- Transformed queries written in Hive into Spark applications. Worked with Apache NiFi to decompress JSON files and move them from the local file system to HDFS.
- Developed Spark scripts for data analysis in both Python and Scala.
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau to visualize the data sets created, and tested native Drill, Impala, and Spark connectors.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Bulk-loaded large volumes of data into HBase using MapReduce by directly creating HFiles and loading them.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a proof of concept.
- Worked on Solr configuration and customization based on requirements.
- Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
- Designed and developed SSIS (ETL) packages to validate, extract, transform, and load data from the OLTP system to the data warehouse and reporting data mart.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Ingested streaming data with Apache NiFi into Kafka.
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
- Developed data pipelines by implementing Kafka producers and consumers.
- Worked on the ETL scripts and fixed issues during data loads from various data sources.
- Performed data analysis with HBase using Apache Phoenix.
- Used Apache NiFi to copy data from the local file system to HDFS.
- Managed and reviewed Hadoop log files to resolve configuration issues.
- Developed a program to extract named entities from OCR files.
- Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects, and identified the source of defects.
- Used GIT for version control.
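An illustrative sketch of the column-masking approach described above. HMAC-SHA256 from javax.crypto stands in here for the Blake2 keyed hashing actually used, key retrieval from Azure Key Vault is reduced to a placeholder, and the patient_records table and its columns are hypothetical.

```scala
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object MaskingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MaskingSketch").enableHiveSupport().getOrCreate()

    // In the real pipeline the key came from Azure Key Vault; here it is a placeholder value.
    val secretKey: Array[Byte] = sys.env.getOrElse("MASKING_KEY", "placeholder-key").getBytes("UTF-8")

    // Keyed hash of a sensitive value; HMAC-SHA256 is a stand-in for Blake2 keyed hashing.
    val maskValue = (value: String) => {
      val mac = Mac.getInstance("HmacSHA256")
      mac.init(new SecretKeySpec(secretKey, "HmacSHA256"))
      mac.doFinal(value.getBytes("UTF-8")).map("%02x".format(_)).mkString
    }
    val maskUdf = udf(maskValue)

    // Hypothetical table: keep non-identifying columns plus masked identifiers, drop the raw PII.
    val masked = spark.table("patient_records")
      .withColumn("patient_id_masked", maskUdf(col("patient_id")))
      .drop("patient_id", "ssn")

    masked.write.mode("overwrite").saveAsTable("patient_records_masked")
    spark.stop()
  }
}
```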
Environment: Hadoop, HDFS, Spark, SQL, HQL, Kafka, Hive, Apache NiFi, YARN, Ambari, Tez, ZooKeeper, Scala, Python, Sqoop, Shell Scripting, Microsoft Azure, Hortonworks Data Platform, Agile Methodology, SAFe®, Oracle, Teradata, SQL Server, Visual Studio, Jupyter, Git, Azure HDInsight, ADLS, Blob Storage, ADF, Kerberos, LDAP, Jenkins.
Confidential, WA
Hadoop/Spark Developer
Responsibilities:
- Worked with Hadoop Ecosystem components like Cassandra, Sqoop, Flume, Oozie, Hive and Pig.
- Developed Pig and Hive UDFs in Java to extend Pig and Hive, and wrote Pig scripts for sorting, joining, filtering, and grouping data.
- Developed Spark programs in Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
- Developed Hive queries to do analysis of the data and to generate the end reports to be used by business users.
- Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Developed a data pipeline using Kafka, Cassandra, and Hive to ingest, transform, and analyze customer behavioral data.
- Strong familiarity with Hive joins; used HQL to query the databases, eventually developing complex Hive UDFs.
- Migrated iterative MapReduce programs into Spark transformations using Spark and Scala.
- Used Scala to write the code for all the use cases in Spark and Spark SQL.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Developed a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
- Worked with Apache NiFi for data ingestion; triggered and scheduled shell scripts using NiFi.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Implemented partitioning, dynamic partitions, and buckets in Hive (a partitioning sketch follows this role's responsibilities).
- Worked with and learned a great deal from AWS cloud services such as EC2, S3, EMR, and RDS.
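A minimal sketch of the Hive partitioning work noted above, writing a Parquet-backed, date-partitioned table through Spark; the events dataset, paths, and table names are assumed for illustration.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Hive settings that permit fully dynamic partitions on insert through the metastore.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical raw input already landed on HDFS.
    val events = spark.read.parquet("/data/raw/events")

    // Write a Parquet-backed Hive table partitioned by event_date so that queries
    // filtering on event_date prune partitions instead of scanning everything;
    // bucketing (bucketBy) can be layered on top of saveAsTable in the same way.
    events.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("event_date")
      .saveAsTable("analytics.events_partitioned")

    spark.stop()
  }
}
```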
Environment: Hadoop YARN, Spark Core, Spark Streaming, Apache NiFi, Spark SQL, Scala, Kafka, Hive, Cassandra, Sqoop, Amazon AWS, Tableau, Oozie, Cloudera, Oracle, Linux.
Confidential, WI
Hadoop/Spark Developer
Responsibilities:
- Experienced in writing Spark Applications in Scala and Python (PySpark).
- Imported Avro files using Apache Kafka and performed analytics on them using Spark with Scala.
- Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra (a streaming sketch follows this role's responsibilities).
- Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
- Used sbt to develop Scala-based Spark projects and executed them using spark-submit.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Built Cassandra nodes on AWS and set up the Cassandra cluster using Ansible automation tools.
- Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS, and VPC.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Executed various Oozie workflows and automated parallel Hadoop MapReduce jobs.
- Developed Oozie Bundles to Schedule Pig, Sqoop and Hive jobs to create data pipelines.
- Developed Hive queries to do analysis of the data and to generate the end reports to be used by business users.
- Used Spark and Spark SQL to read Parquet data and create Hive tables using the Scala API.
- Designed solutions for various system components using Microsoft Azure.
- Configured Azure cloud services for endpoint deployment.
- Wrote an extensive, generic data quality check framework for the application using Impala.
- Experience with NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
- Migrated MapReduce programs into Spark transformations using Spark and Scala; initial versions were done in Python (PySpark).
- Involved in Cassandra data modeling and building efficient data structures.
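An illustrative sketch of the Kafka-to-Cassandra streaming flow described above, using the spark-streaming-kafka-0-10 integration and the DataStax Spark Cassandra Connector; brokers, topic, keyspace, table, and the record layout are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._   // adds saveToCassandra on RDDs

object KafkaToCassandraSketch {
  case class Event(userId: String, action: String, ts: Long)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandraSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))        // 10-second batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",               // placeholder brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model",
      "auto.offset.reset" -> "latest"
    )

    // DStream of raw comma-separated records from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Per batch: parse each record into a case class and persist the RDD to Cassandra
    stream.map(record => record.value.split(","))
      .filter(_.length == 3)
      .map(fields => Event(fields(0), fields(1), fields(2).toLong))
      .foreachRDD(rdd => rdd.saveToCassandra("analytics", "events"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```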
Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, MapReduce, Git, HDFS, Cassandra, Apache Kafka, Storm, Linux, Solr, Confluence, Jenkins.
Confidential
Java Developer
Responsibilities:
- Involved in analysis, design, and coding in a Java/JSP front-end environment.
- Developed use cases and class and sequence diagrams for the modules using UML and Rational Rose Enterprise Edition as a feature owner.
- Developed the application using Spring, Servlets, JSP, and EJB.
- Implemented MVC (Model View Controller) architecture.
- Designed the Application flow using Rational Rose.
- Used web servers like Apache Tomcat.
- Implemented Application prototype using HTML, CSS and JavaScript.
- Developed the user interfaces with the Spring tag libraries.
- Developed build and deployment scripts using Apache Ant to customize WAR, EAR, and EJB JAR files.
- Prepared field-validation and scenario test cases using JUnit and tested the module in three phases: unit testing, system testing, and regression testing.
- Coded and unit tested according to client standards.
- Used Oracle Database for data storage and coded stored procedures, functions, and triggers.
- Wrote database queries in SQL to interact with the database.
- Designed and developed XML processing components for dynamic menus in the application.
- Created components using Java, Spring, and JNDI.
- Prepared Spring deployment descriptors using XML.
- Developed the entire application using Eclipse and deployed it on WebSphere Application Server.
- Handled problem management during QA, implementation, and post-production support.
- Developed a logging component using Apache Log4j to log messages and errors, and wrote JUnit test cases to verify the code under different conditions.
Environment: Java, HTML, Spring, JSP, Servlets, DBMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, Log4j, SQL, PL/SQL, CSS.
Confidential
Java Developer
Responsibilities:
- Developed rules based on different state policies using Spring MVC, iBatis ORM, Spring Web Flow, JSP, JSTL, Oracle, MS SQL, SOA, XML, XSD, JSON, AJAX, and Log4j.
- Gathered requirements and developed, implemented, tested, and deployed enterprise integration pattern (EIP) based applications using Apache Camel and JBoss Fuse.
- Developed service classes, domain/DAOs, and controllers using JAVA/J2EE technologies.
- Designed and developed web services using the Apache CXF framework.
- Worked on the ActiveMQ messaging service for integration.
- Worked with SQL queries to store and retrieve data in MS SQL Server.
- Performed unit testing using JUnit.
- Developed the front end using JSTL, JSP, HTML, and JavaScript.
- Worked on continuous integration using Jenkins/Hudson.
- Participated in all phases of development life cycle including analysis, design, development, testing, code reviews and documentations as needed.
- Used Eclipse as the IDE, Maven for build management, JIRA for issue tracking, Confluence for documentation, Git for version control, ARC (Advanced REST Client) for endpoint testing, Crucible for code review, and SQL Developer as the DB client.
Environment: Spring Framework, Spring MVC, Spring Web Flow, JSP, JSTL, Oracle 11g, XML, JSON, AJAX, HTML, CSS, RAD with sub-eclipse, Jenkins, Maven, SOA, Log4j, Java, JUnit.