
Hadoop Developer/Data Engineer Resume


Salisbury, NC

PROFESSIONAL SUMMARY:

  • 7+ years of experience in IT, including hands-on experience in Big Data engineering, analytics, cloud technologies, the Hadoop ecosystem, and data warehousing technologies in the Financial and Communication sectors.
  • Expertise with tools in the Hadoop ecosystem, including Spark, Python, Scala, HBase, Flume, Hive, HDFS, MapReduce, Sqoop, Pig, Kafka, YARN, Oozie, and Zookeeper.
  • 3+ years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework; currently working extensively with Spark and Spark Streaming, using Scala as the main programming language.
  • Good experience with Spark on Databricks and the Snowflake data warehouse; developed production-ready Spark applications using the Spark SQL, Spark Core, Spark ML, and Spark Streaming APIs.
  • Expertise in importing and exporting data from/to traditional RDBMSs using Apache Sqoop.
  • Experienced in designing both time-driven and data-driven automated workflows using Oozie to run Hadoop MapReduce and Pig jobs.
  • Hands-on experience with NoSQL databases like HBase, Cassandra, and MongoDB, and relational databases like Oracle, MySQL, SQL Server, Postgres, and Teradata.
  • Worked with major distributions such as Cloudera (CDH 3 & 4) and Hortonworks, as well as AWS; also worked on UNIX and data warehouse (DWH) environments in support of these distributions.
  • Experience in NoSQL column-oriented databases like Cassandra and their integration with Hadoop clusters.
  • Experience in developing data pipelines using Kafka to store data in HDFS (see the sketch after this list).
  • Good knowledge of loading data from Oracle and MySQL databases into HDFS using Sqoop (structured data) and Flume (log files and XML).
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics such as evaluating, filtering, loading, and storing data.
  • Performed data engineering and ETL processes: data extraction, transformation, loading, sourcing, mapping, conversion, and integration in support of enterprise data infrastructures - data warehouses, operational data stores, and master data management.
  • Used Maven extensively to build MapReduce JAR files and deployed them to Amazon Web Services (AWS) on EC2 virtual servers in the cloud; experienced in writing build scripts for continuous integration systems like Jenkins.
  • Knowledge of analyzing data interactively using Apache Spark and Apache Zeppelin.
  • Strong programming experience using Java, Scala, Python and SQL.
  • Experience with SOAP & REST web services, XML, AJAX & JSON.
  • Experienced in using version control systems such as SVN and Git.
  • Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Microsoft Azure, and hands-on expertise with AWS databases such as RDS (Aurora), Redshift, and DynamoDB.
  • Experience in configuring the Zookeeper to coordinate servers in clusters and to maintain data consistency.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Scrum, Waterfall and Agile.
  • Good knowledge on various scripting languages like Linux/Unix shell scripting and Python.
  • Excellent networking and communication with all levels of stakeholders as appropriate, including executives, application developers, business users, and customers.
  • Devoted to professionalism, highly organized, able to work under strict deadlines with attention to detail, with excellent written and verbal communication skills.
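
A minimal sketch (in Scala, using Spark Structured Streaming) of the kind of Kafka-to-HDFS pipeline referenced above. The broker address, topic name, and HDFS paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath; this is an illustration, not the actual project code.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfsPipeline {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hdfs")        // hypothetical application name
          .getOrCreate()

        // Read a continuous stream of events from Kafka (broker and topic are placeholders).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events-topic")
          .load()
          .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value", "timestamp")

        // Land the raw events in HDFS as Parquet, with checkpointing for fault tolerance.
        events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/events")                 // hypothetical output path
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .outputMode("append")
          .start()
          .awaitTermination()
      }
    }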

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark (Scala & Python), Spark Streaming, Kafka, NiFi, Oozie, Zookeeper

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: Java, Scala, Python, R, SQL, UNIX Shell scripting, Pig Latin, HiveQL, PL/SQL

Databases: Microsoft SQL Server, MySQL, Oracle, Postgres, DB2, MS Access, Teradata

Hadoop distributions: Cloudera, Hortonworks, Apache, Databricks

Build and Version Tools: Jenkins, Maven, Git, Ant, SBT

Development Tools: Eclipse, IntelliJ, Visual Studio, NetBeans, JUnit

Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS

Development Methodologies: Agile/Scrum, Waterfall

Business Intelligence Tools: Tableau, Power BI

App Servers: WebLogic, WebSphere, JBoss, Tomcat

Cloud Services: AWS, Microsoft Azure

Amazon Web Services: Elastic MapReduce (EMR 4.1.0), EC2 instances, Airflow, Amazon S3, Amazon Redshift, Ganglia, Ruby EMR utility (monitoring), Kinesis (streams)

Other Tools: PuTTY, WinSCP, Data Lake, Talend, GitHub

PROFESSIONAL EXPERIENCE:

Confidential, Salisbury, NC

Hadoop Developer/Data Engineer

Responsibilities:

  • Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
  • Developed MapReduce, Hive scripts to cleanse, validate and transform data.
  • Worked with Data ingestion, querying, processing and analysis of big data.
  • Experienced in implementing the Hortonworks distribution.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.
  • Used Spark API over Hortonworks Hadoop YARN to process required analytics on data.
  • Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
  • Experienced in implementing Spark RDD transformations and actions to carry out business analysis.
  • Designed Data Quality Framework to perform schema validation and data profiling on Spark (pySpark).
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Leveraged spark (pySpark) to manipulate unstructured data and apply text mining on user's table utilization data.
  • Worked on creating and optimizing Hive scripts for data analysts based on the requirements.
  • Created Hive UDFs to encapsulate complex and reusable logic for the end users.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Orchestrated Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experienced with different compression techniques such as LZO, Snappy, Bzip2, and Gzip to save storage and optimize data transfer over the network, using Avro, Parquet, and ORC file formats.
  • Implemented data ingestion systems by creating Kafka brokers, Java producers, consumers, and custom encoders.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (see the sketch after this list).
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Developed some utility helper classes to get data from HBase tables.
  • Good experience in troubleshooting performance issues and tuning Hadoop cluster.
  • Knowledge of Spark Core, Streaming, DataFrames and SQL, MLlib, and GraphX.
  • Implemented caching of Spark transformations and actions for reuse as components.
  • Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
  • Used Maven to build and deploy the JARs for MapReduce, Pig, and Hive UDFs.
  • Developed workflows in Oozie.
  • Extensively used the Hue browser for interacting with Hadoop components.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
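
As referenced in the partitioning/bucketing bullet above, a rough Scala sketch of loading a dynamically partitioned Hive table from Spark while caching the intermediate DataFrame. The database, table, and column names are invented for illustration; bucketing would typically be declared in the Hive DDL with a CLUSTERED BY clause.

    import org.apache.spark.sql.SparkSession

    object PartitionedHiveLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partitioned-hive-load")   // hypothetical application name
          .enableHiveSupport()
          .getOrCreate()

        // Enable fully dynamic partitioning for the INSERT below.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Target table partitioned by load date (names are illustrative only).
        spark.sql("""
          CREATE TABLE IF NOT EXISTS analytics.transactions_part (
            txn_id STRING,
            customer_id STRING,
            amount DOUBLE)
          PARTITIONED BY (load_date STRING)
          STORED AS ORC
        """)

        // Cache the cleansed staging data because several downstream steps reuse it.
        val staged = spark.table("staging.transactions_clean").cache()
        staged.createOrReplaceTempView("txn_stage")

        // Dynamic-partition insert: the partition column comes last in the SELECT.
        spark.sql("""
          INSERT OVERWRITE TABLE analytics.transactions_part PARTITION (load_date)
          SELECT txn_id, customer_id, amount, load_date
          FROM txn_stage
        """)
      }
    }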

Environment: Linux (CentOS, RedHat), UNIX Shell, Pig, Hive, MapReduce, YARN, Spark 1.4.1, Eclipse, JDK 1.7, Oozie workflows, Postgres, AWS, S3, EMR, Cloudera, HBase, Sqoop, Scala, Kafka, Python, Cassandra, Maven, Hortonworks, Cloudera Manager

Confidential, Denver, CO

Hadoop/Big Data Developer

Responsibilities:

  • Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing and analyzing the data in HDFS.
  • Developed Spark API to import data into HDFS from Teradata and created Hive tables.
  • Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
  • Worked on 100 TB of unstructured and semi-structured data, which with a replication factor of 3 amounted to 300 TB of total storage.
  • Collected and aggregated a large amount of log data using Apache Flume and staged data in HDFS for further analysis.
  • Used Pig as an ETL tool for transforming, filtering, joining events, and performing aggregations.
  • Scripted UDFs and UDAFs for Hive.
  • Populated HDFS and Cassandra with large amounts of data using Apache Kafka.
  • Worked on Spark stream processing to bring data into memory and implemented RDD transformations and actions to process it in units.
  • Worked in the AWS environment for development and deployment of custom Hadoop applications.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using SQLContext.
  • Leveraged AWS S3 as storage layer for HDFS.
  • Developed data pipeline using Kafka to store data into HDFS.
  • Responsible for loading data pipelines from web servers and Teradata using Sqoop with the Kafka and Spark Streaming APIs.
  • Managed multiple AWS accounts with multiple VPC’s for both production and non-production where primary objectives are automation, buildout, integration, and cost control.
  • Developed Hive Queries for creating foundation tables from stage data.
  • Used DML statements to perform different operations on Hive tables.
  • Developed job flows to automate Pig and Hive jobs.
  • Worked on Apache Crunch library to write, test and run Hadoop Map Reduce Pipeline jobs.
  • Ran data formatting scripts in Python and created terabyte-scale CSV files to be consumed by Hadoop MapReduce jobs.
  • Performed analysis, feature selection, and feature extraction on Kafka data using Apache Spark machine learning and streaming libraries in Python.
  • Developed Python code using version control tools like GitHub and SVN on Vagrant machines.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the sketch after this list).
  • Wrote Python modules to view and connect to the Apache Cassandra instance.
  • Involved in writing MapReduce jobs.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Provided cluster coordination services through Zookeeper.
  • Created Hive tables, dynamic partitions and buckets for sampling and worked on them using the Hive QL.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Adjusted the minimum share of the maps and reducers for all the queues.
  • Used Tableau for visualizing the data reported from Hive tables.
  • Worked with SequenceFile, RCFile, Avro, and HAR file formats.
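
A rough Scala sketch of the Hive-to-Spark query conversion mentioned above; the original HiveQL and the table/column names are invented for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveQueryToDataFrame {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark")          // hypothetical application name
          .enableHiveSupport()
          .getOrCreate()

        // Illustrative HiveQL being converted:
        //   SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
        //   FROM sales.orders
        //   WHERE order_date >= '2016-01-01'
        //   GROUP BY region;

        // Equivalent DataFrame pipeline, executed through Spark SQL's optimizer.
        val summary = spark.table("sales.orders")
          .filter(col("order_date") >= lit("2016-01-01"))
          .groupBy("region")
          .agg(count("*").as("orders"), sum("amount").as("revenue"))

        summary.write.mode("overwrite").saveAsTable("sales.orders_summary_by_region")
      }
    }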

Environment: Hadoop, HDFS, Apache Crunch, MapReduce, Hive, Flume, Sqoop, Zookeeper, Kafka, Storm, Cassandra, Spark, Puppet, Linux.

Confidential, Camp Hill, PA

Hadoop Developer

Responsibilities:

  • Developed complex MapReduce jobs in Java to perform data extraction, aggregation and transformation.
  • Loaded data into HDFS from different data sources such as Oracle and DB2 using Sqoop, and loaded it into Hive tables.
  • Analyzed big data sets by running Hive queries and Pig scripts.
  • Integrated the Hive warehouse with HBase for information sharing among teams.
  • Developed the Sqoop scripts for the interaction between Pig and MySQL Database.
  • Worked on Static and Dynamic partitioning and Bucketing in Hive.
  • Scripted complex HiveQL queries on Hive tables for analytical functions.
  • Developed complex Hive UDFs to work with sequence files (see the sketch after this list).
  • Designed and developed Pig Latin scripts and Pig command line transformations for data joins and custom processing of Map Reduce outputs.
  • Created dashboards in Tableau to create meaningful metrics for decision making.
  • Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Used storage formats such as Avro to access data quickly in complex queries.
  • Implemented Counters for diagnosing problem in queries and for quality control and application-level statistics.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Implemented Log4j to trace logs and to track information.
  • Developed helper classes to abstract Cassandra cluster connections, acting as a core toolkit.
  • Installed the Oozie workflow engine and scheduled it to run data- and time-dependent Hive and Pig jobs.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
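
As referenced above, a small sketch of a Hive UDF. The resume does not give the actual UDF logic, so the normalization rule below is purely illustrative; it is written in Scala here for consistency with the other examples, using the classic org.apache.hadoop.hive.ql.exec.UDF API, and would be packaged into a JAR and registered in Hive as shown in the trailing comments.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative Hive UDF that normalizes free-text codes stored in
    // sequence-file-backed tables before they are joined to reference data.
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) {
          null
        } else {
          new Text(input.toString.trim.toUpperCase.replaceAll("[^A-Z0-9]", ""))
        }
      }
    }

    // Hive-side registration (shown as comments to keep this block in Scala;
    // the JAR path, function name, and table are hypothetical):
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    //   SELECT normalize_code(product_code) FROM staging.products;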

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, Linux.

Confidential

Hadoop Consultant

Responsibilities:

  • Installed, configured, and created jobs in a Hadoop environment spanning MapReduce, Pig, Hive, HBase, Spark (RDD, Pair RDD), Flume, Oozie, and Sqoop.
  • Involved in application migration from Hadoop to Spark for faster processing.
  • Extracted data from Oracle database into HDFS using Sqoop.
  • Developed Oozie workflows to schedule and manage Sqoop, Hive, and Pig jobs in the extract-transform-load process.
  • Used Flume and configured it with multiplexing, replicating, multi-source flows, interceptors, and selectors to import log files from web servers into HDFS/Hive.
  • Managed and scheduled Jobs on a Hadoop cluster using Shell Scripts.
  • Maintained cluster coordination services through Zookeeper.
  • Involved in filtering partitioned data by different year ranges and formats using Hive functions (see the sketch after this list).
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Worked on partitioning and used bucketing in HIVE tables and setting tuning parameters to improve the performance.
  • Involved in developing Impala scripts for ad-hoc queries.
  • Experience with Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment.
  • Actively participated in code reviews and meetings, and resolved technical issues.
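
A brief Scala sketch of the partition-range filtering and Hive-to-Spark ETL work mentioned above; the table name, the year partition column, the event_ts column, and the output path are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object PartitionRangeFilter {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partition-range-filter")   // hypothetical application name
          .enableHiveSupport()
          .getOrCreate()

        // Filtering on the Hive partition column (year) lets Spark prune partitions
        // instead of scanning the whole table; event_ts is an illustrative column
        // whose differing source formats are normalized into a proper date.
        val recent = spark.table("warehouse.events")
          .filter(col("year").between("2014", "2016"))
          .withColumn("event_date", to_date(col("event_ts")))

        recent.write
          .mode("overwrite")
          .partitionBy("year")
          .parquet("hdfs:///data/curated/events_2014_2016")
      }
    }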

Environment: Hadoop, MapReduce, HDFS, SQL, UNIX

Confidential

Java Developer

Responsibilities:

  • Designed and developed the server-side layer using XML, JSP, JDBC, JNDI, EJB, and DAO patterns using the Eclipse IDE.
  • Involved in the design and development of a front-end application using JavaScript, CSS, HTML, JSPs, and Spring MVC for registering and managing new entries, and configured it to connect to the database using Hibernate.
  • Developed Java beans and JSPs using Spring and JSTL tag libraries for supplements.
  • Developed EJBs, Servlets, and JSP files to implement business rules and security options using IBM WebSphere.
  • Developed and tested the Efficiency Management module using EJB, Servlets, and JSP & Core Java components in WebLogic Application Server.
  • Used the Spring Framework as the middle-tier application framework, with a persistence strategy using Spring's Hibernate support to integrate with the database.
  • Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
  • Configured the deployment descriptors in Hibernate to achieve object relational mapping.
  • Involved in developing Stored Procedures, Queries and Functions.
  • Wrote SQL queries to pull information from the backend.
  • Designed and implemented the architecture for the project using OOAD, UML design patterns.
  • Extensively used HTML, CSS, JavaScript, and Ajax for client-side development and validation.
  • Involved in creating tables, stored procedures in SQL for data manipulation and retrieval using SQL Server, Oracle and DB2.
  • Participated in requirement gathering and converting the requirements into technical specifications.

Environment: Java, JSF Framework, Eclipse IDE, Ajax, Apache Axis, OOAD, UML, WebLogic, JavaScript, HTML, XML, CSS, SQL Server, Oracle, Web Services, Spring, Windows.
