Hadoop / Spark Developer Resume
Atlanta, GA
SUMMARY
- Over 8 years of experience in the IT industry with proven expertise in Big Data analytics and development.
- 3 years of experience in Big Data technologies such as the Hadoop framework, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, Oozie, and Storm.
- Working experience with the Cloudera Data Platform using VMware Player in a CentOS 6 Linux environment.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML, JSON, and Avro.
- Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL/DML SQL queries.
- Good knowledge of NoSQL databases: Cassandra, MongoDB, and HBase.
- Worked on HBase to load and retrieve data for real-time processing using its REST API.
- Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using various distributions and services such as Apache Spark, Cloudera, and the AWS service console.
- Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Good working experience using Sqoop to import data from an RDBMS into HDFS or Hive and to export data from HDFS or Hive back into the RDBMS.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs).
- Worked with Big Data distributions like Cloudera (CDH 3 and 4) with Cloudera Manager.
- Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end; also have working knowledge of Pentaho and Informatica as Big Data ETL tools.
- Worked with BI tools like Tableau for report creation and further analysis from the front end.
- Extensive knowledge in using SQL queries for backend database analysis.
- Involved in unit testing of MapReduce programs using Apache MRUnit.
- Worked on Amazon Web Services (AWS), including EC2.
- Experience developing applications using Java, J2EE, JSP, MVC, Hibernate, JMS, JSF, EJB, XML, AJAX, and web-based development tools.
- Experience in creating reusable transformations (Joiner, Sorter, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Sequence Generator, Normalizer, and Rank) and mappings using Informatica Designer, and in processing tasks using Workflow Manager to move data from multiple sources into targets.
- Implemented SOAP based web services.
- Used Curl scripts to test RESTful Web Services.
- Experience in database design and in using PL/SQL to write stored procedures, functions, and triggers; strong experience writing complex Oracle queries.
- Experience working with Build tools like Maven and Ant.
- Experienced in both Waterfall and Agile (Scrum) development methodologies.
- Strong problem-solving and analytical skills with the ability to make balanced, independent decisions.
- Experience in developing service components using JDBC.
TECHNICAL SKILLS
Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, Storm, Spark, Kafka, YARN, Crunch, ZooKeeper, HBase, Impala, Cassandra, MongoDB, Neo4j
Languages: Java, Python, C, Scala, SQL, PL/SQL, Shell Script
JEE Technologies: JSP, Servlets and JDBC
Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, REST web services
Frameworks: Hibernate, Spring, Struts, JMS, EJB, JUnit, MRUnit, JAXB
IDE Tools: Eclipse, NetBeans
Database: Oracle 8i, 10g, 11g, IBM DB2, MySQL, Derby
Build Tools: Jenkins, Maven, ANT
Methodologies: Agile (Scrum, Kanban)
Web Servers: JBoss, Tomcat, WebLogic, WebSphere
Reporting Tools: Jasper Reports, iReport, Tableau, QlikView
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Hadoop / Spark Developer
Responsibilities:
- Wrote MapReduce jobs to parse web logs stored in HDFS.
- Developed services to run the MapReduce jobs on an as-needed basis.
- Imported and exported data into HDFS and Hive using Sqoop.
- Responsible for managing data coming from different sources.
- Monitored the MapReduce programs running on the cluster.
- Responsible for loading data from UNIX file systems into HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote MapReduce programs to convert text files into Avro format and loaded the results into Hive tables.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Developed design documents considering all possible approaches and identifying the best one.
- Loaded data into HBase using both bulk-load and non-bulk-load paths.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch below).
- Involved in requirements gathering, design, development, and testing.
- Followed agile methodology for the entire project.
- Prepared technical and detailed design documents.
Environment: Hive, HBase, Flume, Java, Scala, Python, Maven, Impala, AngularJS, Splunk, Pig, Spark, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat.
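A minimal sketch of the Hive-to-Spark conversion work referenced above, assuming a Spark 1.x HiveContext as shipped with Cloudera at the time; the web_logs table and its url/status columns are hypothetical names used only for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Illustrative conversion of a Hive aggregation into Spark RDD transformations.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-to-spark"))
    val hive = new HiveContext(sc)

    // Original HiveQL (hypothetical table/columns):
    //   SELECT url, COUNT(*) FROM web_logs WHERE status = 500 GROUP BY url
    val logs = hive.sql("SELECT url, status FROM web_logs")

    // The same aggregation expressed as RDD transformations.
    val errorCountsByUrl = logs.rdd
      .filter(row => row.getInt(1) == 500)   // keep only server errors
      .map(row => (row.getString(0), 1L))    // key by URL
      .reduceByKey(_ + _)                    // count per URL

    errorCountsByUrl.take(10).foreach(println)
    sc.stop()
  }
}
```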
Confidential, San Diego, CA
Hadoop/Spark Developer
Responsibilities:
- Handled moving data from various data sources and performed transformations using Pig.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Used Sqoop to transfer data between RDBMS and Hadoop Distributed File System.
- Analyzed and understood the requirements given by downstream users.
- Involved in data transformations and data cleansing using Pig.
- Assisted in troubleshooting and optimization of MapReduce jobs and Pig Latin scripts.
- Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced with performing CRUD operations in HBase.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Implemented various advanced join operations using Pig Latin.
- Worked on Sequence files, ORC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume.
- Continuously monitored and managed the Hadoop cluster through Ambari.
- Worked on a proof-of-concept to introduce the Apache Spark framework (see the sketch below).
- Developed, monitored, and coordinated project activities such as scheduling, tracking, and reporting.
Environment: Hortonworks (HDP 2.2), MapReduce V2, Hue, Ambari, Apache Hive 0.13, Apache Pig 0.14, Apache Sqoop 1.4.x, Apache Oozie 4.x, Oracle 11g, Shell scripting.
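A rough sketch of what the Spark proof-of-concept mentioned above might look like; the HDFS paths and the tab-delimited (customerId, date, amount) record layout are assumptions used only for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative proof-of-concept: aggregate records already landed in HDFS.
object SparkPocSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-poc"))

    // Hypothetical tab-delimited records previously ingested by Sqoop/Flume.
    val records = sc.textFile("hdfs:///data/poc/transactions")

    // Total the amount column per customer, mirroring an existing Hive metric.
    val totalsByCustomer = records
      .map(_.split("\t"))
      .filter(_.length >= 3)             // drop malformed rows
      .map(f => (f(0), f(2).toDouble))   // (customerId, amount)
      .reduceByKey(_ + _)

    totalsByCustomer.saveAsTextFile("hdfs:///data/poc/customer_totals")
    sc.stop()
  }
}
```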
Confidential, Nashville, TN
Hadoop Developer
Responsibilities:
- Involved in loading data from UNIX file system to HDFS.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Worked on cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, rack and disk topology, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Responsible for setting up the Hadoop cluster for the project and working on the project using HQL and SQL.
- Involved in performing linear regression using the Scala API and Spark.
- Used D3.js to visualize the JSON data generated from Hive queries related to tumors.
- Involved in setting up the Genomics Pipeline in Hadoop.
- Used Spark to perform variant-calling techniques in big data genomics.
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH5 Hadoop cluster on CentOS. Assisted with performance tuning, monitoring and troubleshooting.
- Created MapReduce programs for refined queries on big data.
- Involved in developing Pig UDFs to pre-process and analyze the data.
- Imported and exported data into HDFS and Hive using Sqoop.
- Created reports for the BI team, using Sqoop to export data from HDFS and Hive.
- Managed and scheduled jobs on the Hadoop cluster using Oozie.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Involved in developing complex MapReduce jobs for data cleaning.
- Implemented Partitioning and bucketing in Hive.
- Mentored the analyst and test teams in writing Hive queries.
- Involved in setting up of HBase to use HDFS.
- Extensively used Pig for data cleansing.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Used Spark Streaming to fetch Twitter data with ASU hashtags and perform sentiment analysis (see the sketch below).
- Used Hive partitioning and bucketing to optimize the performance of Hive tables, creating around 20,000 partitions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Created RDDs in Spark and extracted data from the data warehouse onto the Spark RDDs.
- Used Spark with Scala/Python.
- Extensively used components such as tHL7Input, tHL7Output, and the tFile components to create Talend jobs.
- Worked on optimizing and improving the performance of Talend jobs.
- Created topics on the desktop portal using Spark Streaming with Kafka and ZooKeeper.
- Involved in recovering lost data using the DAG lineage recomputation process.
- Used DataStax JARs for this project.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: MapReduce, HDFS, Hive, Java (JDK 1.7), Pig, Linux, XML, HBase, ZooKeeper, Kafka, Sqoop, Flume, Oozie, Greenplum DB.
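The Spark Streaming work on Twitter hashtags mentioned above could be sketched roughly as follows, assuming the receiver-based Kafka connector (spark-streaming-kafka for Kafka 0.8) and a hypothetical topic, ZooKeeper quorum, and consumer group; tweets are assumed to arrive as plain text in the Kafka message value, and the simple hashtag count stands in for the actual sentiment-analysis step:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Illustrative Spark Streaming job counting ASU-related hashtags from Kafka.
object TweetHashtagStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("tweet-hashtag-stream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kafka stream coordinated through ZooKeeper
    // (topic name, quorum, and group id are hypothetical).
    val tweets = KafkaUtils.createStream(
      ssc, "zk1:2181,zk2:2181", "tweet-consumers", Map("tweets" -> 1)).map(_._2)

    // Crude per-batch count of ASU hashtags as a proxy for topic volume.
    val hashtagCounts = tweets
      .flatMap(_.split("\\s+"))
      .filter(w => w.startsWith("#") && w.toLowerCase.contains("asu"))
      .map(tag => (tag.toLowerCase, 1L))
      .reduceByKey(_ + _)

    hashtagCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```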
Confidential, Dublin, OH
Hadoop Developer
Responsibilities:
- Involved with the application teams to install Hadoop updates, patches and version upgrades as required.
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Created HBase tables to store variable data formats coming from different portfolios (see the sketch below).
- Involved in Configuring core-site.xml and mapred-site.xml according to the multi node cluster environment.
- Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
- Experience in developing user interfaces using HTML, CSS, AJAX, and JavaScript.
- Experience installing and configuring Windows Active Directory.
- Strong knowledge of HDFS, MapReduce, and NoSQL databases like HBase.
- Experience in client-side technologies such as HTML, CSS, JavaScript, and jQuery.
- Responsible for writing Hive queries to analyze terabytes of customer data from HBase and write the results to output files.
- Designed Business classes and used Design Patterns like Data Access Object, MVC etc.
- Responsible for the overall layout design and color scheme of the website using HTML, Bootstrap, and CSS3.
- Created the server side of a project-management application using Node.js and MongoDB.
Environment: Java 6, MongoDB, Apache web server, HTML, JDBC, NoSQL, Meteor.js, Eclipse, UNIX, CSS3, XML, jQuery, Oracle.
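A minimal sketch of the HBase table creation and non-bulk loading referenced above, using the older HBase client API of that era (pre-1.0 style); the table name, column families, row-key layout, and sample values are hypothetical:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{HBaseAdmin, HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

// Illustrative HBase table holding mixed-format portfolio feeds.
object HBasePortfolioSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val admin = new HBaseAdmin(conf)

    val tableName = TableName.valueOf("portfolio_events")
    if (!admin.tableExists(tableName)) {
      val desc = new HTableDescriptor(tableName)
      desc.addFamily(new HColumnDescriptor("raw"))   // original payload (JSON/XML/CSV)
      desc.addFamily(new HColumnDescriptor("meta"))  // source system, format, load time
      admin.createTable(desc)
    }

    // Non-bulk load path: one Put per incoming record.
    val table = new HTable(conf, tableName)
    val put = new Put(Bytes.toBytes("portfolioA#2015-06-01#000123"))
    put.add(Bytes.toBytes("raw"), Bytes.toBytes("payload"), Bytes.toBytes("""{"price": 10.5}"""))
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("format"), Bytes.toBytes("json"))
    table.put(put)

    table.close()
    admin.close()
  }
}
```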
Confidential
Java Developer
Responsibilities:
- Gathered specifications from the requirements.
- Developed the application using Struts MVC 2 architecture.
- Developed JSP custom tags and Struts tags to support custom User Interfaces.
- Developed front-end pages using JSP, HTML and CSS
- Developed core Java classes for utility classes, business logic, and test cases
- Developed SQL queries using MySQL and established connectivity
- Used Stored Procedures for performing different database operations
- Used JDBC for interacting with Database
- Developed servlets for processing the request
- Used exception handling to manage application errors
- Designed sequence diagrams and use case diagrams for proper implementation
- Used Rational Rose for design and implementation
Environment: JSP, HTML, CSS, JavaScript, MySQL, JDBC, Servlets, Exception Handling, UML, Rational Rose.
Confidential
Java Developer
Responsibilities:
- Developed Custom tags, JSTL to support custom User Interfaces.
- Designed the user interfaces using JSP.
- Designed and implemented MVC architecture using the Struts framework; coding involved writing Action classes, custom tag libraries, and JSPs.
- Developed Action Forms and Controllers in Struts 2.0/1.2 framework.
- Utilized various Struts features such as Tiles, tag libraries, and declarative exception handling via XML in the design.
- Involved in writing the business layer using EJB, BO, DAO, and VO patterns.
- Implemented business processes such as user authentication and account transfer using session EJBs.
- Worked with Oracle Database to create tables, procedures, functions and select statements.
- Used Log4j to capture logs, including runtime exceptions, and developed a WAR framework to alert the client and production support in case of application failures.
- Developed the application using the iterative and incremental Scrum software development process.
- Developed the DAOs using SQL and DataSource objects.
- Development carried out under Eclipse Integrated Development Environment (IDE).
- Used JBoss for deploying various components of the application.
- Used Ant for build scripts.
- Used JUnit for testing and checking API performance.
- Used ClearCase version control for project configuration management.
Environment: Java 1.5, J2EE, Struts, HTML, CSS, JavaScript, Hibernate, SQL 2005, Ant, Log4j, JUnit, XML, JSP, JSTL, AJAX, JBoss, ClearCase.