
Data Engineer Resume


Plano, Texas

SUMMARY

  • Over 9 years of professional IT experience with a strong emphasis on development and testing of software applications.
  • Over 4 years of experience in Hadoop Distributed File System (HDFS), Impala, Sqoop, Hive, HBase, Spark, Hue, the MapReduce framework, Kafka, YARN, Flume, Oozie, ZooKeeper and Pig.
  • Hands-on experience with components of the Hadoop ecosystem such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and Application Manager.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
  • Experience working with Amazon EMR, Cloudera (CDH3 & CDH4) and Hortonworks Hadoop distributions.
  • Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
  • Worked on data processing, transformations and actions in Spark using Python (PySpark); a short PySpark sketch follows this summary.
  • Developed a scalable distributed data solution using Hadoop on a 30-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed a framework for converting existing PowerCenter mappings to PySpark jobs.
  • Knowledge of implementing hierarchies, relationship types, packages and profiles for hierarchy management in Stibo, along with massively parallel processing concepts.
  • Experience implementing real-time event processing and lambda architecture with Spark Streaming and messaging systems.
  • Good understanding of Spark architecture with Databricks and Structured Streaming; set up AWS, CloudFormation and Microsoft Azure with Databricks.
  • Provided guidance to the development team using PySpark as an ETL platform.
  • Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
  • Experience analyzing data using Spark SQL, HiveQL, Pig Latin, Spark/Scala and custom MapReduce programs in Java.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra and MongoDB.
  • Developed a Spring Boot application with microservices and deployed it to AWS on EC2 instances.
  • Experience creating DStreams from sources like Flume and Kafka and performing different Spark transformations and actions on them.
  • Experience integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
  • Performed operations on real-time data using Storm and Spark Streaming with sources like Kafka and Flume.
  • Implemented Pig Latin scripts to process, analyze and manipulate data files to produce the required statistics.
  • Experienced with different file formats like Parquet, ORC, Avro, SequenceFile, CSV, XML, JSON and text files.
  • Transformed and analyzed data using PySpark and Hive based on ETL mappings.
  • Worked with big data Hadoop distributions: Cloudera, Hortonworks and Amazon AWS, along with CloudFormation.
  • Developed MapReduce jobs in Java to process large data sets by fitting the problem into the MapReduce programming paradigm.
  • Experience with Azure transformation projects and Azure architecture decision making; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions; streamed data in real time using Spark with Kafka for faster processing.
  • Experience developing a data pipeline using Kafka to store data in HDFS.
  • Good experience designing and creating data ingest pipelines using technologies such as Apache Storm and Kafka.
  • Used SBT to develop Scala-based Spark projects and executed them using spark-submit.
  • Experience with data extraction, transformation and loading in Hive, Pig and HBase.
  • Orchestrated various Sqoop queries, Pig scripts and Hive queries using Oozie workflows and sub-workflows.
  • Responsible for handling different data formats such as Avro, Parquet and ORC.
  • Experience in performance tuning and monitoring of the Hadoop cluster by gathering and analyzing the existing infrastructure using Cloudera Manager.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and ZooKeeper (HBase).
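
Illustrative PySpark sketch for the summary above (for context only; the bucket, paths and column names are hypothetical placeholders, not details from an actual engagement). It loads semi-structured JSON with the DataFrames API, applies transformations and Spark SQL, and triggers actions:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: read semi-structured JSON, transform with the
    # DataFrame API and Spark SQL, then persist the result as Parquet.
    spark = SparkSession.builder.appName("usage-analysis").getOrCreate()

    raw = spark.read.json("s3a://example-bucket/usage/*.json")   # placeholder path

    daily_usage = (
        raw.filter(F.col("event_type") == "usage")               # transformation
           .withColumn("event_date", F.to_date("event_ts"))
           .groupBy("customer_id", "event_date")
           .agg(F.sum("bytes_used").alias("total_bytes"))
    )

    daily_usage.createOrReplaceTempView("daily_usage")
    top_users = spark.sql(
        "SELECT customer_id, SUM(total_bytes) AS bytes FROM daily_usage "
        "GROUP BY customer_id ORDER BY bytes DESC LIMIT 100"
    )

    top_users.write.mode("overwrite").parquet("s3a://example-bucket/reports/top_users")  # action
    print(top_users.count())                                      # action: triggers execution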

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm & Parquet.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC and Struts

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

PROFESSIONAL EXPERIENCE

Confidential, Plano, Texas

Data Engineer

Responsibilities:

  • Hands-on experience in Spark and Spark Streaming, creating RDDs and applying operations, transformations and actions.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Transformed and analyzed data using PySpark and Hive based on ETL mappings.
  • Good understanding of Spark architecture with Databricks and Structured Streaming; set up AWS, CloudFormation and Microsoft Azure with Databricks.
  • Developed a Spring Boot application with microservices and deployed it to AWS on EC2 instances.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially implemented in Python (PySpark).
  • Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time (see the streaming sketch after this list).
  • Responsible for loading data pipelines from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API.
  • Developed Kafka producers and consumers, Cassandra clients and Spark jobs, along with components on HDFS and Hive.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
  • Developed a scalable distributed data solution using Hadoop on a 30-node AWS cluster provisioned with CloudFormation to run analysis on 25+ terabytes of customer usage data.
  • Developed a framework for converting existing PowerCenter mappings to PySpark jobs.
  • Populated HDFS and HBase with large amounts of data using Apache Kafka.
  • Used Kafka to ingest data into the Spark engine.
  • Worked on data processing, transformations and actions in Spark using Python (PySpark).
  • Experience with Azure transformation projects and Azure architecture decision making; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
  • Provided guidance to the development team using PySpark as an ETL platform.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Experienced with scripting languages such as Python and shell scripts.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection checks, permission checks and performance analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data and stored it in AWS S3.
  • Experience with AWS IAM, Data Pipeline, EMR, S3, EC2, the AWS CLI, SNS and other services.
  • Worked on various Hadoop distributions (Cloudera, Hortonworks and Amazon AWS) to implement and make use of them.
  • Involved in creating custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
  • Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Gzip and Zlib.
  • Implemented Hortonworks NiFi (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements; accessed Cassandra through Java services.
  • Experience in NoSQL column-oriented databases like Cassandra and their integration with Hadoop clusters.
  • Created S3 buckets, managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
  • Performed AWS cloud administration, managing EC2 instances and the S3, SES and SNS services.
  • Operated on Elasticsearch time-series data such as metrics and application events, an area where the broad Beats ecosystem makes it easy to collect data from common applications.
  • Hands-on experience developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP web services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server.
  • Delivered zero-defect code for three large projects that involved changes to both the front end (Core Java, presentation services) and the back end (Oracle).
  • Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Involved in loading and transforming large datasets between relational databases and HDFS using Sqoop imports and exports.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
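
Streaming sketch referenced above (a minimal, illustrative example only; the broker address, topic name, schema and output paths are assumed placeholders, and it relies on the external Spark-Kafka connector package being available):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    # Hypothetical sketch: consume learner events from Kafka with Structured
    # Streaming, parse the JSON payload and write the result to Parquet.
    spark = SparkSession.builder.appName("learner-stream").getOrCreate()

    schema = StructType([
        StructField("learner_id", StringType()),
        StructField("course_id", StringType()),
        StructField("event_ts", LongType()),
    ])

    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
             .option("subscribe", "learner-events")               # placeholder topic
             .load()
             .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*")
    )

    query = (
        events.writeStream.format("parquet")
              .option("path", "hdfs:///data/learner_events")              # placeholder path
              .option("checkpointLocation", "hdfs:///checkpoints/learner_events")
              .start()
    )
    query.awaitTermination()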

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, Unix, NiFi, MySQL, AWS, EMR, EC2, S3, Hortonworks.

Confidential, Dallas, Texas

Spark Developer / Hadoop Developer

Responsibilities:

  • Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Responsible for installing, configuring, supporting and managing Hadoop clusters.
  • Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Designed and implemented Hive queries and functions for evaluating, filtering, loading and storing data.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Created HBase tables and column families to store the user event data.
  • Written automated HBase test cases for data quality checks using HBase command line tools.
  • Developed a data pipeline using HBase, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Good understanding of Spark architecture with Databricks and Structured Streaming; set up AWS, CloudFormation and Microsoft Azure with Databricks.
  • Developed a Spring Boot application with microservices and deployed it to AWS on EC2 instances.
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark (see the JSON-handling sketch after this list).
  • Experience collecting log data from different sources (web servers and social media) using Flume and storing it on HDFS to run MapReduce jobs.
  • Handled importing of data from machine logs using Flume.
  • Created Hive Tables, loaded data from Teradata using Sqoop.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Configured, monitored, and optimized Flume agent to capture web logs from the VPN server to be put into Hadoop Data Lake.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and written Pig/Hive UDFs.
  • Wrote, tested and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Worked on various Hadoop distributions (Cloudera, Hortonworks and Amazon AWS) to implement and make use of them.
  • Exported the analyzed data to relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Developed ETL processes using Spark, Scala, Hive and HBase.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially implemented in Python (PySpark).
  • Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Wrote Java code to format XML documents and upload them to a Solr server for indexing.
  • Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
  • Maintained all the services in the Hadoop ecosystem using ZooKeeper.
  • Worked on implementing the Spark framework.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Expertise in extracting, transforming and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files and XML using Talend.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Helped design scalable big data clusters and solutions.
  • Followed agile methodology for the entire project.
  • Experience in working with Hadoop clusters using Cloudera distributions.
  • Involved in Hadoop cluster tasks like adding and removing nodes without any effect on running jobs and data.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Converted the existing relational database model to the Hadoop ecosystem.
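
JSON-handling sketch referenced above (illustrative only; the payload schema and column names are hypothetical, not taken from the actual project). It decodes a JSON string column into a typed DataFrame, modifies it, and re-encodes rows back to JSON:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Hypothetical sketch: decode a JSON string column into typed fields,
    # modify the resulting DataFrame, then re-encode rows back to JSON.
    spark = SparkSession.builder.appName("json-roundtrip").getOrCreate()

    df = spark.createDataFrame(
        [('{"user": "u1", "score": 0.92}',), ('{"user": "u2", "score": 0.47}',)],
        ["payload"],
    )

    schema = StructType([
        StructField("user", StringType()),
        StructField("score", DoubleType()),
    ])

    decoded = df.select(F.from_json("payload", schema).alias("p")).select("p.*")
    enriched = decoded.withColumn("passed", F.col("score") >= 0.5)

    encoded = enriched.select(F.to_json(F.struct("user", "score", "passed")).alias("payload"))
    encoded.show(truncate=False)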

Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Cloudera.

Confidential

Sr. Java Developer / Hadoop Developer

Responsibilities:

  • Implemented applications using Java, J2EE, JSP, Servlets, JDBC, RAD, XML, HTML, XHTML, Hibernate, Struts, Spring and JavaScript in Windows environments.
  • Experienced in developing web-based applications using Python, Django, PHP, XML, CSS, HTML, JavaScript and jQuery.
  • Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
  • Developed XML web services using SOAP, WSDL and UDDI.
  • Created the UI tool using Java, XML, XSLT, DHTML and JavaScript.
  • Experience with the full SDLC and involvement in all of its phases.
  • Developed action Servlets and JSPs for presentation in the Struts MVC framework.
  • Worked with Struts MVC objects like ActionServlet, controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles and Form Controller, and used JNDI lookups for J2EE components.
  • Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
  • Used Oracle SQL 4.0 as the database and wrote SQL queries in the DAO layer.
  • Experience in application development using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, web services, SOAP and WSDL.
  • Used RESTful services to interact with the client by providing RESTful URL mappings.
  • Used SVN and GitHub as version control tool.
  • Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g Database.
  • Experienced with JIRA; tracked test results and interacted with developers to resolve issues.
  • Used XSLT to transform XML data structures into HTML pages.
  • Deployed EJB components on Tomcat; used the JDBC API for interaction with the Oracle database.
  • Wrote build and deployment scripts using shell, Perl and Ant.
  • Extensively used Java multithreading to implement batch jobs with JDK 1.5 features.
  • Actively involved from the start of the project, from requirements gathering through quality assurance testing.
  • Coded and developed a multi-tier architecture in Java, J2EE and Servlets.
  • Conducted analysis, requirements study and design according to various design patterns and developed rendering to the use cases, taking ownership of the features.
  • Used various design patterns such as Command, Abstract Factory, Factory, and Singleton to improve the system performance.
  • Analyzed critical coding defects and developed solutions.
  • Developed configurable front ends using Struts technology; also involved in component-based development of features that were reusable across modules.
  • Designed, developed and maintained the data layer using the Hibernate ORM framework.
  • Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, storage and updates in the Oracle database using Hibernate.
  • Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
  • Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
  • Responsible for developing SQL Queries required for the JDBC.
  • Designed the database, worked on DB2, and executed DDL and DML statements.
  • Active participation in architecture framework design and coding and test plan development.
  • Strictly followed the Waterfall development methodology for implementing projects.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
  • Involved in developing training presentations for developers (offshore support), QA and production support.
  • Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.

Environment: Java, Ajax, Informatica PowerCenter 8.x/9.x, REST API, SOAP API, Apache, Oracle 10g/11g, SQL Loader, MySQL Server, Flat Files, Targets, Aggregator, Router, Sequence Generator.
