Hadoop Developer / Spark Developer Resume
SUMMARY
- 8+ years of experience with Big Data technologies including Hadoop (HDFS and MapReduce), Pig, Hive, Sqoop, and Spark.
- Working knowledge of multi-tiered distributed environments and OOAD concepts; good understanding of the Software Development Life Cycle (SDLC).
- Experience working in environments using Agile development and Kanban support methodologies.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Flume, Pig, Hive, HBase, Oozie, and ZooKeeper).
- Strong knowledge of Apache Spark in a Scala environment.
- Good knowledge of Kafka and its integration with Spark and Storm.
- Strong knowledge of NoSQL databases such as Cassandra and HBase, and of Microsoft Azure for application testing.
- Experience with the Cloudera distribution and the Hortonworks Data Platform (HDP).
- Expertise in SQL and PL/SQL programming, including developing complex code units.
- Solid experience creating PL/SQL packages, procedures, functions, triggers, views, and exception handling for retrieving, manipulating, checking, and migrating complex data sets in Oracle.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Familiarity with Teradata, Oracle, and MySQL databases.
- Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java; extended Hive and Pig core functionality with custom UDFs.
- Good experience with both MapReduce 1 (MRv1) and MapReduce 2 (YARN).
- Very good understanding of partitioning and bucketing concepts in Hive, and experience designing both.
- Knowledge of configuring and administering Hadoop clusters using major distributions such as Apache Hadoop and Cloudera.
- Good understanding of HDFS design, daemons, and HDFS high availability (HA).
- Used the Talend ETL tool for data integration, data transformation, and data loading.
- Worked on UNIX/Linux operating systems and developed various shell scripts.
- Versatile team player with good communication, analytical, presentation, and interpersonal skills.
- Excellent understanding of Hadoop architecture and its components, including HDFS (NameNode and DataNode), JobTracker, TaskTracker, YARN, Spark, and MapReduce.
- Used Spark extensively to perform data transformations, data validations, and data aggregations (see the sketch after this list).
- Good knowledge of the Spark framework for both batch and real-time data processing.
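A minimal Scala sketch of the kind of Spark transformation, validation, and aggregation work described above. The schema (country, amount), input/output paths, and application name are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrdersRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orders-rollup").getOrCreate()

    // Transformation: read raw CSV and normalize the country code
    val orders = spark.read.option("header", "true").option("inferSchema", "true")
      .csv(args(0))                                   // e.g. an HDFS input path
      .withColumn("country", upper(col("country")))

    // Validation: drop rows with missing or non-positive amounts
    val valid = orders.filter(col("amount").isNotNull && col("amount") > 0)

    // Aggregation: per-country totals, written back out as Parquet
    valid.groupBy("country")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("order_count"))
      .write.mode("overwrite").parquet(args(1))

    spark.stop()
  }
}
```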
TECHNICAL SKILLS
Big Data Technologies: Hadoop 0.22.0 (MapReduce, HDFS), Hive, Pig, Sqoop, Oozie, ZooKeeper, Scala, Apache Spark
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks
Languages: Java, C, SQL, Pig Latin, Scala
IDE Tools: Eclipse, NetBeans
Frameworks: JUnit
Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS
Application Servers: Tomcat, WebLogic
Databases: Oracle 8i/9i/10g, MySQL
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer / Spark Developer
Responsibilities:
- Actively involved in a POC to migrate algorithms from MapReduce and Hive to Spark using Scala.
- Responsible for breaking components down into independent modules such as matching, arbitration, and cleansing.
- Developed UDFs in Java for Hive queries.
- Loaded data from the Teradata database into the Hadoop cluster.
- Used Microsoft Azure for building, testing, and deploying the applications.
- Developed Hive scripts to meet analyst requirements for ad hoc analysis.
- Managed external tables in Hive for optimized performance in Google Cloud.
- Developed workflows and coordinator jobs in Oozie.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, and Hive.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into MFS.
- Hands-on experience scripting for automation and monitoring using Shell, PHP, and Python.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
- Built Hadoop data lakes, developed their architecture, and used them in implementations within the organization.
- Used the Talend ETL tool to read source data, enrich and transform it, and write it to the target.
- Used the Talend ETL tool to monitor and manage complex deployments.
- Extensively used advanced PL/SQL features such as records, tables, object types, and dynamic SQL.
- Created PL/SQL packages and procedures to customize data extraction from the Oracle source.
- Developed and modified PL/SQL code to implement new enhancements and resolve problems per custom requirements.
- Tested and debugged Oracle PL/SQL packages.
- Supported application users with multiple SQL and PL/SQL techniques.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
- Integrated Kafka with Spark and used system tools to migrate the data; used replication tooling for high availability of nodes.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Customized dashboards and performed identity and access management in AWS.
- Analyzed the data using Confidential EMR.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
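As referenced in the list above, a minimal sketch of what such a Hive-to-Spark conversion might look like. The table and column names (clickstream, user_id) are hypothetical; the point is the two equivalent formulations, Spark SQL against the Hive metastore and plain RDD transformations:

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-poc")
      .enableHiveSupport() // reuse the existing Hive metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT user_id, COUNT(*) AS hits FROM clickstream GROUP BY user_id
    val viaSql = spark.sql(
      "SELECT user_id, COUNT(*) AS hits FROM clickstream GROUP BY user_id")

    // The same logic expressed as RDD transformations
    val viaRdd = spark.table("clickstream").rdd
      .map(row => (row.getAs[String]("user_id"), 1L))
      .reduceByKey(_ + _)

    viaSql.show(10)
    viaRdd.take(10).foreach(println)
    spark.stop()
  }
}
```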
Environment: Java, Scala, Python, J2EE, Hadoop, Spark, HBase, Hive, Pig, Sqoop, MySQL, Microsoft Azure, Teradata, GitHub, YARN.
Confidential, TX
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms (see the sketch after this list).
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
- Used UDFs to implement business logic in Hadoop.
- Experienced with scripting languages such as Python and shell.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
- Collected requirements from business users and business analysts and converted them into SQL queries or PL/SQL.
- Used PL/SQL tables and cursors to process huge volumes of data, and used BULK COLLECT for mass updates as a performance improvement.
- Applied load balancing concepts using Confidential CloudFront and Elastic Load Balancing in AWS.
- Used EC2 to develop and deploy applications faster.
- Customized dashboards and performed identity and access management in AWS.
- Developed Spark scripts using Scala shell commands as per requirements and analyzed the data using Confidential EMR.
- Created Confidential buckets for cluster logs and output data, and launched a Confidential EMR cluster.
- Used EMR and Redshift for real-time graphs and data analytics.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Actively involved in all phases of the SDLC process/methodology and ensured project delivery.
- Created reusable components to enhance process efficiency and minimize impact to applications.
- Diligently tracked progress over the course of the project; prepared daily reports, status reports, and related documents needed in various phases of the project and communicated them to leadership.
- Recommended best practices and enhancements to existing processes, implementing technological improvements and efficiencies.
- Analyzed current programs, including performance, diagnosis, and troubleshooting of problems.
- Prepared technical specification documentation.
- Extensively used Log4j to log debug and exception statements.
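A minimal Scala sketch of the MapReduce compression pattern referenced in the list above: a word-count job that Snappy-compresses both the intermediate map output and the final files. The job itself is illustrative (the resume's originals were in Java, which would be structurally identical):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.io.compress.{CompressionCodec, SnappyCodec}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w); ctx.write(word, one)
    }
}

class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    values.forEach(v => sum += v.get)
    ctx.write(key, new IntWritable(sum))
  }
}

object CompressedWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Compress intermediate map output to cut shuffle I/O
    conf.setBoolean("mapreduce.map.output.compress", true)
    conf.setClass("mapreduce.map.output.compress.codec",
      classOf[SnappyCodec], classOf[CompressionCodec])

    val job = Job.getInstance(conf, "compressed-wordcount")
    job.setJarByClass(classOf[TokenMapper])
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    // Compress the final output files as well
    FileOutputFormat.setCompressOutput(job, true)
    FileOutputFormat.setOutputCompressorClass(job, classOf[SnappyCodec])
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```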
Environment: Hadoop, HDFS, MapReduce, Hive, Python, PIG, Java, Oozie, HBASE, Sqoop, Flume, MySQL.
Confidential, CT
Hadoop Developer
Responsibilities:
- Experience installing, configuring, and using Hadoop ecosystem components.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Experience importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Participated in the development/implementation of a Cloudera Hadoop environment.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Experience working with various data sources such as Oracle, DB2, and MySQL.
- Successfully loaded files to Hive and HDFS from traditional databases.
- Responsible for managing data coming from different sources.
- Gained good experience with NoSQL databases such as HBase.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Experience writing custom UDFs in Java for Hive and Pig to extend their functionality (see the sketch after this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
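A minimal sketch of the custom Hive UDF pattern mentioned above. The originals were written in Java; this Scala equivalent (any JVM class registers in Hive the same way) is shown to keep one language across the examples, and the class name and behavior are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hive resolves evaluate() by reflection; null-safe like built-in functions
class CleanString extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
}
```

Packaged into a JAR, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION clean_string AS 'CleanString' before use in queries.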
Environment: Cloudera HDFS, UNIX, Hadoop, MapReduce, Hive, Pig Latin, Java, SQL, Sqoop, HBASE, RDBMS, Eclipse
Confidential
Java-J2EE Developer
Responsibilities:
- Used JSP pages through a Servlet controller for the client-side view.
- Created jQuery and JavaScript plug-ins for the UI.
- Followed Java/J2EE best practices to minimize unnecessary object creation.
- Implemented RESTful web services with the Struts framework.
- Verified them with the JUnit testing framework.
- Working experience using an Oracle 10g backend database.
- Used JMS queues to develop an internal messaging system (see the sketch after this list).
- Developed UML use case, activity, sequence, and class diagrams using Rational Rose.
- Developed Java, JDBC, and JavaBeans components using the JBuilder IDE.
- Developed JSP pages and Servlets for customer maintenance.
- Deployed the application on the Apache Tomcat server.
- Involved in building the modules in a Linux environment with Ant scripts.
- Used Resource Manager to schedule jobs on the UNIX server.
- Performed Unit testing, Integration testing for all the modules of the system.
- Developed JavaBean components utilizing AWT and Swing classes.
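A minimal sketch of the JMS queue messaging mentioned above, using the standard javax.jms API. Scala is used for consistency with the other sketches (the project itself was Java), and the method, queue name, and message body are hypothetical; in the real application the ConnectionFactory would come from a WebLogic JNDI lookup:

```scala
import javax.jms.{ConnectionFactory, Session}

object QueueSender {
  // Sends a single text message to the named queue
  def send(factory: ConnectionFactory, queueName: String, body: String): Unit = {
    val connection = factory.createConnection()
    try {
      val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
      val producer = session.createProducer(session.createQueue(queueName))
      producer.send(session.createTextMessage(body))
    } finally connection.close()
  }
}
```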
Environment: Java, JDK, Servlets, JSP, HTML, JBuilder, JavaScript, CSS, Tomcat, Apache HTTP Server, XML, JUnit, EJB, RESTful, Oracle.