Data Engineer Resume

Bentonville, Arkansas

SUMMARY:

  • Dynamic professional with 6+ years of diversified experience in Information Technology, with an emphasis on the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/J2EE technologies and tools, using industry-accepted methodologies and procedures.
  • 3+ years of implementation experience in Hadoop technologies.
  • Worked with multiple Hadoop distributions, including Hortonworks, AWS, Cloudera and MapR.
  • Experience in end-to-end implementation of projects such as a Data Lake.
  • Experience with file formats such as ORC, Parquet, Avro and JSON.
  • Expert in data ingestion tools such as Sqoop, Flume, Kafka and Spark Streaming.
  • Experience writing data cleansing scripts in Spark, MapReduce and Pig.
  • Extensive experience with Hive, Impala and Tez.
  • Exposure to NoSQL databases such as HBase, Cassandra and MongoDB.
  • Implemented centralized search using Solr or Cloudera Search.
  • Ingested data from various sources such as Oracle, Teradata and SQL Server.
  • Experience developing Spark pipelines using Scala and Python.
  • Developed streaming pipelines using Kafka and Storm.
  • Experience working with CDC tools such as Oracle GoldenGate and StreamSets.
  • Orchestrated multiple Hadoop application jobs using Oozie.
  • Experience implementing optimization techniques in Hive and Spark.
  • Experience scheduling TWS jobs using ITG to process millions of records.
  • Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Experience in Python and shell scripting.
  • Experienced with IDEs such as Eclipse and IntelliJ.
  • Experience working with cloud platforms such as Amazon Web Services and Azure.
  • Hands-on experience with Sequence files, Combiners, Counters, Dynamic Partitions and Bucketing for best practices and performance improvement.
  • Experienced in using Apache Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Worked with Apache Spark, a fast, general-purpose engine for large-scale data processing, together with the functional programming language Scala.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Experience developing Spring Boot applications for transformations.
  • Experience connecting to different endpoints using JDBC.
  • Experience writing JUnit test cases and using build tools such as Maven.
  • Experience developing SQL and PL/SQL scripts.
  • Exposure to BI tools such as Tableau and QlikView.
  • Strong scripting skills in Bash, Python and shell.
  • Experience with software methodologies such as Agile and Waterfall.

TECHNICAL SKILLS:

  • Spark
  • Kafka
  • StreamSets
  • Sqoop
  • Hive
  • Pig
  • HDFS
  • Impala
  • Flume
  • MapReduce
  • Oozie
  • HBase
  • Elasticsearch
  • Zookeeper
  • Cassandra
  • MongoDB
  • Scala
  • Teradata
  • SQL Server
  • MySQL
  • Oracle
  • Core Java
  • J2EE
  • JDBC
  • Spring Boot
  • Python
  • Shell
  • Git
  • SVN
  • CVS

PROFESSIONAL EXPERIENCE:

Confidential, Bentonville, Arkansas

Data Engineer

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for program development.
  • Installed, configured, supported and managed Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions.
  • Worked on the Cloudera distribution of the Hadoop ecosystem; installed and configured Flume, Hive, Pig, Sqoop, Oozie and Automic on the Hadoop cluster.
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Managed and reviewed Hadoop log files to identify issues when jobs failed; used HUE for UI-based Pig script execution and Automic scheduling.
  • Helped create a data lake by extracting customer data from various sources into HDFS, including Excel files, databases and server logs.
  • Designed the number of partitions and the replication factor for Kafka topics based on business requirements, and migrated MapReduce programs into Spark transformations using Scala (initially done in Python with PySpark).
  • Used Spark transformations and actions to cleanse input data, and used the Spark application master to monitor Spark jobs and capture their logs.
  • Refactored the existing Scala-based Spark batch processes for various logs.
  • Implemented Spark jobs in Scala using DataFrames and the Spark SQL API for faster data processing, and worked on an extensible framework for building high-performance batch and interactive data processing applications on Hive.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into Cassandra.
  • Fine-tuned and productionized long-running Teradata SQL queries in the queue.
  • Created Hive external tables for incremental imports into Hive using an Ingest, Reconcile, Compact and Purge strategy.
  • Worked with Hive to process raw data.
  • Created partitions and buckets by state in Hive to handle structured data.
  • Implemented dashboards that run HiveQL queries internally, including aggregation functions, basic Hive operations and various join operations.
  • Implemented state-based business logic in Hive using Generic UDFs.
  • Used Hive queries to analyze large data sets.
  • Built reusable Hive UDF libraries for business requirements.
  • Designed and implemented incremental imports into Hive tables.
  • Worked with the Data Modeling team on inserting tables into the non-catalog or catalog zone while moving data to production.
  • Wrote workflows and scheduled them using Automic.
  • Provided Automic batch job flow support to application development and management during production releases.
  • Developed Automic workflows to schedule and orchestrate the ETL process within the Cloudera Hadoop system.

Technologies used: Hadoop stack, Spark SQL, KSQL, Spark Streaming, Scala, CI/CD, Cassandra, Cloudera, Kafka, Hive, Pig, Sqoop, Automic, Linux.

Confidential, Columbus, Ohio

Scala/Spark Developer

Responsibilities:

  • Analyzed business requirements and prepared detailed specifications, following project guidelines, for project development.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the DStream data in HDFS.
  • Fetched real-time data from Kafka and processed it using Spark Streaming with Scala.
  • Loaded data from REST endpoints into Kafka producers and published it to Kafka brokers.
  • Developed business logic using the Kafka Direct Stream in Spark Streaming and implemented business transformations.
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Migrated MapReduce programs into Spark transformations using Scala.
  • Experienced with SparkContext, Spark SQL and Spark on YARN.
  • Used Spark SQL with file formats such as JSON, Parquet and ORC.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Loaded Avro, Parquet and text files into Spark using Scala, created Spark DataFrames and RDDs to process the data, and saved the results in Parquet format in HDFS for loading into the fact table using ORC Reader.
  • Used Spark Streaming APIs in Scala to perform transformations and actions to store and stream data into HDFS.
  • Good knowledge of setting batch intervals, slide intervals and window intervals in Spark Streaming.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Developed traits, case classes and other constructs in Scala.
  • Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Tested performance using Elasticsearch and Kibana with APM.
  • Implemented CI/CD to deploy to multiple client Kubernetes/AWS environments.
  • Worked with Hive to implement web interfacing and stored the data in Hive external tables.
  • Implemented Hive partitioning and bucketing on the data collected in HDFS.
  • Performed data querying and summarization using Hive, and created UDFs, UDAFs and UDTFs.
  • Created and managed tables in Hive and Impala using the Hue web interface.
  • Extensively worked on extracting, transforming and loading data from source to target systems using Informatica and Teradata utilities such as FastExport, FastLoad, MultiLoad and TPT.
  • Worked with Teradata utilities such as BTEQ, FastLoad and MultiLoad.
  • Implemented Sqoop jobs to import and export large data sets between RDBMS and Hive.
  • Extensively used ZooKeeper as a backup server and for scheduling Spark jobs.
  • Worked on the Cloudera distribution deployed on AWS EC2 instances.
  • Loaded real-time data into NoSQL databases such as Cassandra.
  • Used the DataStax Spark-Cassandra Connector to store data in Cassandra from Spark.
  • Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Configured workflows involving Hadoop actions using the Oozie scheduler.
  • Used Oozie workflows and Java schedulers to manage and schedule jobs on a Hadoop cluster.
  • Used Sqoop to import data from relational databases such as MySQL and Oracle.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used Cloudera Manager to pull metrics on various cluster features such as JVM usage and running map and reduce tasks.
  • Imported structured and unstructured data into HDFS.
  • Developed Pig scripts to perform analytics on JSON and XML data.
  • Experienced with faceted search and full-text search querying using Solr.
  • Maintained the data lake in Hadoop by building data pipelines using Sqoop, Hive and PySpark.
  • Created Tableau visualizations for internal management (the client team) using the Simba Spark SQL Connector.
  • Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver to auto-generate Hive queries for non-technical business users.
  • Coordinated with the Scrum team to deliver agreed user stories on time in every sprint.

Technologies used: Hadoop stack, Spark SQL, KSQL, Spark Streaming, AWS S3, AWS EMR, Google Cloud, GraphX, Scala, Python, PySpark, Kafka, Hive, Pig, Sqoop, Solr, Oozie, Vertica, Impala, CI/CD, Cassandra, Cloudera, Oracle 10g, MySQL, Spring Boot, Linux.

Confidential

Java/Big Data Developer

Responsibilities:

  • Tuned Hive and Pig to improve performance and resolved performance issues in Hive and Pig scripts, drawing on an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
  • Exported analyzed data to relational databases using Sqoop for visualization and report generation.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Extensively analyzed data using HiveQL, Pig Latin and custom MapReduce programs.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Automated the FTP process in Talend and transferred files via FTP on UNIX.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins and some pre-aggregations before storing the data in HDFS.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
  • Experience with data transformation and analysis tools such as MapReduce, Pig and Hive to handle files in multiple formats (JSON, text, XML, binary, logs, etc.).

Confidential

Intern/Java Developer

Responsibilities:

  • Gathered business requirements, analyzed the project, and created UML diagrams such as use cases, class diagrams, sequence diagrams and flowcharts for the Optimization module using Microsoft Visio.
  • Configured faces-config.xml for page navigation rules and created managed and backing beans for the Optimization module.
  • Developed an enterprise application using Spring MVC, JSP and MySQL.
  • Developed client-side web service components using JAX-WS.
  • Extensively used JUnit to test the application code for server-client data transfer.
  • Developed and enhanced product designs in alignment with business objectives.
  • Used SVN as the repository for managing and deploying application code.
  • Used XML to maintain queries, JSP page mappings, bean mappings, etc.
  • Used Oracle 10g as the backend database and wrote PL/SQL scripts.
  • Implemented database transactions using Spring AOP and Java EE CDI.
  • Enhanced the organization's reputation by fulfilling requests and exploring opportunities.
  • Performed business analysis and reporting, and integrated with Sage Accpac (ERP).
  • Developed new and maintained existing functionality using Spring MVC and Hibernate.
  • Created new and maintained existing web pages built with JSP and Servlets.

Technologies used: Java, Spring MVC, Hibernate, MSSQL, JSP, Servlet, JDBC, ODBC, NetBeans, GlassFish, Spring, Oracle, MySQL, Sybase, Eclipse, Tomcat, WebLogic Server.