Hadoop Developer / Spark Developer Resume
SUMMARY
- 8+ years of experience with Big Data technologies including Hadoop (HDFS and MapReduce), Pig, Hive, Sqoop, and Spark.
- Working knowledge of multi-tiered distributed environments and OOAD concepts; good understanding of the Software Development Life Cycle (SDLC).
- Experience working in environments using Agile development and Kanban support methodologies.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Flume, Pig, Hive, HBase, Oozie, and ZooKeeper).
- Strong knowledge of Apache Spark in a Scala environment.
- Good knowledge of Kafka and its integration with Spark and Storm.
- Strong knowledge of NoSQL databases such as Cassandra and HBase, and of Microsoft Azure for application testing.
- Experience with the Cloudera distribution and the Hortonworks Data Platform (HDP).
- Expertise in SQL and PL/SQL programming, including developing complex code units.
- Solid experience creating PL/SQL packages, procedures, functions, triggers, views, and exception handling for retrieving, manipulating, checking, and migrating complex data sets in Oracle.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Familiarity with Teradata, Oracle, and MySQL databases.
- Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java; extended Hive and Pig core functionality with custom UDFs.
- Good experience with both MapReduce 1 (MRv1) and MapReduce 2 (YARN).
- Very good understanding of partitioning and bucketing concepts in Hive, and experience designing both.
- Knowledge of configuring and administering Hadoop clusters using major distributions such as Apache Hadoop and Cloudera.
- Good understanding of HDFS design, daemons, and HDFS high availability (HA).
- Used the Talend ETL tool for data integration, data transformation, and data loading.
- Worked on UNIX/Linux operating systems and developed various shell scripts.
- Versatile team player with good communication, analytical, presentation, and interpersonal skills.
- Excellent understanding of Hadoop architecture and its components, including HDFS (NameNode and DataNode), JobTracker, TaskTracker, YARN, Spark, and MapReduce.
- Used Spark extensively to perform data transformations, data validations, and data aggregations (see the sketch after this list).
- Good knowledge of the Spark framework for both batch and real-time data processing.
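A minimal Scala sketch of the kind of Spark transformation, validation, and aggregation work described above. The schema (country, amount), input/output paths, and application name are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrdersRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orders-rollup").getOrCreate()

    // Transformation: read raw CSV and normalize the country code
    val orders = spark.read.option("header", "true").option("inferSchema", "true")
      .csv(args(0))                                   // e.g. an HDFS input path
      .withColumn("country", upper(col("country")))

    // Validation: drop rows with missing or non-positive amounts
    val valid = orders.filter(col("amount").isNotNull && col("amount") > 0)

    // Aggregation: per-country totals, written back out as Parquet
    valid.groupBy("country")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("order_count"))
      .write.mode("overwrite").parquet(args(1))

    spark.stop()
  }
}
```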
TECHNICAL SKILLS
Big Data Technologies: Hadoop 0.22.0 (MapReduce, HDFS), Hive, Pig, Sqoop, Oozie, ZooKeeper, Scala, Apache Spark
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks
Languages: Java, C, SQL, Pig Latin, Scala
IDE Tools: Eclipse, NetBeans
Frameworks: JUnit
Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS
Application Servers: Tomcat, WebLogic
Databases: Oracle 8i/9i/10g, MySQL
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer / Spark Developer
Responsibilities:
- Actively involved in a POC to migrate algorithms from MapReduce and Hive to Spark using Scala.
- Responsible for breaking components down into independent modules such as matching, arbitration, and cleansing.
- Developed UDFs in Java for Hive queries.
- Loaded data from the Teradata database into the Hadoop cluster.
- Used Microsoft Azure for building, testing, and deploying the applications.
- Developed Hive scripts to meet analyst requirements for ad hoc analysis.
- Managed external tables in Hive for optimized performance in Google Cloud.
- Developed workflows and coordinator jobs in Oozie.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, and Hive.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into MFS.
- Hands-on experience scripting for automation and monitoring using Shell, PHP, and Python.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
- Built Hadoop data lakes, developed their architecture, and used them in implementations within the organization.
- Used the Talend ETL tool to read source data, enrich and transform it, and write it to the target.
- Used the Talend ETL tool to monitor and manage complex deployments.
- Extensively used advanced PL/SQL features such as records, tables, object types, and dynamic SQL.
- Created PL/SQL packages and procedures to customize data extraction from the Oracle source.
- Developed and modified PL/SQL code to implement new enhancements and resolve problems per custom requirements.
- Tested and debugged Oracle PL/SQL packages.
- Supported application users with multiple SQL and PL/SQL techniques.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
- Integrated Kafka with Spark and used system tools to migrate the data; used replication tooling for high availability of nodes.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Customized dashboards and performed identity and access management in AWS.
- Analyzed the data using Confidential EMR.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
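As referenced in the list above, a minimal sketch of what such a Hive-to-Spark conversion might look like. The table and column names (clickstream, user_id) are hypothetical; the point is the two equivalent formulations, Spark SQL against the Hive metastore and plain RDD transformations:

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-poc")
      .enableHiveSupport() // reuse the existing Hive metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT user_id, COUNT(*) AS hits FROM clickstream GROUP BY user_id
    val viaSql = spark.sql(
      "SELECT user_id, COUNT(*) AS hits FROM clickstream GROUP BY user_id")

    // The same logic expressed as RDD transformations
    val viaRdd = spark.table("clickstream").rdd
      .map(row => (row.getAs[String]("user_id"), 1L))
      .reduceByKey(_ + _)

    viaSql.show(10)
    viaRdd.take(10).foreach(println)
    spark.stop()
  }
}
```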
Environment: Java, Scala, Python, J2EE, Hadoop, Spark, HBase, Hive, Pig, Sqoop, MySQL, Microsoft Azure, Teradata, GitHub, YARN.
Confidential, TX
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms (see the sketch after this list).
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
- Used UDFs to implement business logic in Hadoop.
- Experienced with scripting languages such as Python and shell.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
- Collected requirements from business users and business analysts and converted them into SQL queries or PL/SQL.
- Used PL/SQL tables and cursors to process huge volumes of data, and used BULK COLLECT for mass updates as a performance improvement.
- Applied load balancing concepts using Confidential CloudFront and Elastic Load Balancing in AWS.
- Used EC2 to develop and deploy applications faster.
- Customized dashboards and performed identity and access management in AWS.
- Developed Spark scripts using Scala shell commands as per requirements and analyzed the data using Confidential EMR.
- Created Confidential buckets for cluster logs and output data, and launched a Confidential EMR cluster.
- Used EMR and Redshift for real-time graphs and data analytics.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Actively involved in all phases of the SDLC process/methodology and ensured project delivery.
- Created reusable components to enhance process efficiency and minimize impact to applications.
- Diligently tracked progress over the course of the project; prepared daily reports, status reports, and related documents needed in various phases of the project and communicated them to leadership.
- Recommended best practices and enhancements to existing processes, implementing technological improvements and efficiencies.
- Analyzed current programs, including performance, diagnosis, and troubleshooting of problems.
- Prepared technical specification documentation.
- Extensively used Log4j to log debug and exception statements.
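A minimal Scala sketch of the MapReduce compression pattern referenced in the list above: a word-count job that Snappy-compresses both the intermediate map output and the final files. The job itself is illustrative (the resume's originals were in Java, which would be structurally identical):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.io.compress.{CompressionCodec, SnappyCodec}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w); ctx.write(word, one)
    }
}

class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    values.forEach(v => sum += v.get)
    ctx.write(key, new IntWritable(sum))
  }
}

object CompressedWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Compress intermediate map output to cut shuffle I/O
    conf.setBoolean("mapreduce.map.output.compress", true)
    conf.setClass("mapreduce.map.output.compress.codec",
      classOf[SnappyCodec], classOf[CompressionCodec])

    val job = Job.getInstance(conf, "compressed-wordcount")
    job.setJarByClass(classOf[TokenMapper])
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    // Compress the final output files as well
    FileOutputFormat.setCompressOutput(job, true)
    FileOutputFormat.setOutputCompressorClass(job, classOf[SnappyCodec])
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```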
Environment: Hadoop, HDFS, MapReduce, Hive, Python, PIG, Java, Oozie, HBASE, Sqoop, Flume, MySQL.
Confidential, CT
Hadoop Developer
Responsibilities:
- Experience installing, configuring, and using Hadoop ecosystem components.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Experience importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Participated in the development/implementation of a Cloudera Hadoop environment.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Experience working with various data sources such as Oracle, DB2, and MySQL.
- Successfully loaded files to Hive and HDFS from traditional databases.
- Responsible for managing data coming from different sources.
- Gained good experience with NoSQL databases such as HBase.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Experience writing custom UDFs in Java for Hive and Pig to extend their functionality (see the sketch after this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
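A minimal sketch of the custom Hive UDF pattern mentioned above. The originals were written in Java; this Scala equivalent (any JVM class registers in Hive the same way) is shown to keep one language across the examples, and the class name and behavior are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hive resolves evaluate() by reflection; null-safe like built-in functions
class CleanString extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
}
```

Packaged into a JAR, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION clean_string AS 'CleanString' before use in queries.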
Environment: Cloudera HDFS, UNIX, Hadoop, MapReduce, Hive, Pig Latin, Java, SQL, Sqoop, HBASE, RDBMS, Eclipse
Confidential
Java-J2EE Developer
Responsibilities:
- Used JSP pages through a Servlet controller for the client-side view.
- Created jQuery and JavaScript plug-ins for the UI.
- Followed Java/J2EE best practices to minimize unnecessary object creation.
- Implemented RESTful web services with the Struts framework.
- Verified them with the JUnit testing framework.
- Working experience using an Oracle 10g backend database.
- Used JMS queues to develop an internal messaging system (see the sketch after this list).
- Developed UML use case, activity, sequence, and class diagrams using Rational Rose.
- Developed Java, JDBC, and JavaBeans components using the JBuilder IDE.
- Developed JSP pages and Servlets for customer maintenance.
- Deployed the application on the Apache Tomcat server.
- Involved in building the modules in a Linux environment with Ant scripts.
- Used Resource Manager to schedule jobs on the UNIX server.
- Performed Unit testing, Integration testing for all the modules of the system.
- Developed JavaBean components utilizing AWT and Swing classes.
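A minimal sketch of the JMS queue messaging mentioned above, using the standard javax.jms API. Scala is used for consistency with the other sketches (the project itself was Java), and the method, queue name, and message body are hypothetical; in the real application the ConnectionFactory would come from a WebLogic JNDI lookup:

```scala
import javax.jms.{ConnectionFactory, Session}

object QueueSender {
  // Sends a single text message to the named queue
  def send(factory: ConnectionFactory, queueName: String, body: String): Unit = {
    val connection = factory.createConnection()
    try {
      val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
      val producer = session.createProducer(session.createQueue(queueName))
      producer.send(session.createTextMessage(body))
    } finally connection.close()
  }
}
```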
Environment: Java, JDK, Servlets, JSP, HTML, JBuilder, JavaScript, CSS, Tomcat, Apache HTTP Server, XML, JUnit, EJB, RESTful, Oracle.