Hadoop Spark Developer Resume
Irvine, CA
SUMMARY
- 7+ years of experience in analysis, architecture, design, development, testing, maintenance, and user training of software applications, including over 3 years in Big Data, Hadoop, and HDFS environments, plus experience in Java/J2EE.
- Hands-on experience with Hadoop (HDFS, MapReduce, Pig, Hive, Sqoop, etc.).
- Hands-on experience with Spark (1.5, 1.6) and Scala as a full-stack developer.
- Seasoned Hadoop/Spark/Scala/Java developer with experience in object-oriented programming.
- Experience installing, configuring, and testing Hadoop ecosystem components on clusters running the major Hadoop distributions: Cloudera (CDH3, CDH4, and CDH5) and Hortonworks.
- Experienced in building highly scalable Big Data solutions on Hadoop across multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Experience analyzing data using HiveQL and Pig Latin and writing custom MapReduce programs in Java and Python.
- Experienced in converting HiveQL queries into Spark transformations using Spark RDDs and Scala (see the sketch following this summary).
- Hands-on experience integrating Apache Sqoop, Apache Storm, and Apache Hive.
- Hands-on experience with file formats such as TextFile, JSON, Avro, and ORC for Hive querying and processing.
- Experience in data modeling using star and snowflake schemas, as well as working with metadata.
- Experience in the AWS cloud environment, including S3 storage and EC2 instances.
- Experience with Apache Kafka as a message broker and for log aggregation and stream processing.
- Expertise in migrating data from different databases (e.g., Oracle, DB2, Teradata) to HDFS.
- Experience designing and coding web applications using Core Java and web technologies (JSP, Servlets, JDBC), with a solid understanding of the J2EE stack, including frameworks such as Spring and ORM frameworks (Hibernate).
- Good interpersonal and communication skills, strong problem-solving ability, and sound analytical judgment.
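As an illustration of the HiveQL-to-Spark conversion noted above, here is a minimal Scala sketch against the Spark 1.x RDD API; the input path, table layout, and column positions are hypothetical placeholders rather than details of any specific engagement.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToSparkSketch"))

    // HiveQL equivalent: SELECT page, SUM(bytes) FROM web_logs GROUP BY page
    val totals = sc.textFile("hdfs:///warehouse/web_logs")  // hypothetical path
      .map(_.split("\t"))
      .filter(_.length >= 3)                  // drop malformed rows
      .map(f => (f(1), f(2).toLong))          // (page, bytes); assumed column order
      .reduceByKey(_ + _)                     // SUM(bytes) ... GROUP BY page

    totals.saveAsTextFile("hdfs:///output/page_totals")
    sc.stop()
  }
}
```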
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, ZooKeeper, Kafka, Impala, HCatalog, Apache Spark, Spark Streaming, Spark SQL, HBase, Cassandra, AWS, Hortonworks, Cloudera
Web Technologies: JSP, Servlets, JDBC, JavaScript, CSS
Application Servers: IBM WebSphere, Tomcat
Development and BI Tools: TOAD, Visio, Rational Rose, Endure, Informatica 9.1
Databases: Oracle 9i/10g, MySQL 4.x/5.x, HBase, NoSQL
Programming Languages: Java (JDK 5/JDK 6), C/C++, Python, Scala, HTML, SQL
Operating Systems: UNIX, Linux, Windows, Mac OS X
Development Methodologies: Agile (Scrum), Hybrid
PROFESSIONAL EXPERIENCE
Confidential, Irvine, CA
Hadoop Spark Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications, in line with project guidelines, for the programs to be developed.
- Analyzed the Hadoop cluster and various Big Data analytics and processing tools, including Pig, Hive, Spark, and Spark Streaming.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Migrated various Hive UDFs and queries to Spark SQL for faster response times.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and exported data from HDFS to MySQL using Sqoop.
- Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the stream to HDFS (see the sketch following this role).
- Hands-on experience with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Experience using Apache Kafka for log aggregation.
- Experience working with different data formats such as flat files, ORC, Avro, and JSON.
- Developed Talend jobs for reading log files.
- Involved in implementing a Cassandra cluster to address HBase limitations.
- Provided design recommendations and thought leadership to sponsors and stakeholders, improving review processes, resolving technical problems, and proposing solutions.
Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark-Streaming, Spark SQL, Apache Kafka, Sqoop, Java, Scala, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting and Cassandra.
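A minimal sketch of the Kafka-to-HDFS pipeline described in this role, assuming the Spark 1.6 direct-stream Kafka integration; the broker address, topic name, and output path are hypothetical placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // hypothetical broker
    val topics = Set("app-logs")                                     // hypothetical topic

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Persist each 30-second batch of message values to HDFS as text files
    stream.map(_._2).saveAsTextFiles("hdfs:///data/streams/app-logs/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```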
Confidential, Fort Worth, TX
Hadoop Developer
Responsibilities:
- Involved in various stages of Software Development Life Cycle (SDLC) deliverables using the Agile development methodology.
- Imported data from various sources and performed transformations using MapReduce and Hive to load data into HDFS on the Cloudera cluster.
- Loaded data from different sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and into partitioned Hive tables.
- Created various Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project's business flow.
- Hands-on experience integrating Apache Sqoop, Apache Storm, and Apache Hive as part of the project implementation.
- Experience using Apache Storm to build real-time data integration systems that analyze, clean, normalize, and resolve large volumes of non-unique data points with low latency and high throughput.
- Experience processing log files with Apache Storm.
- Implemented various Hive queries for analysis and invoked them from a Java client engine to run on different nodes (see the JDBC sketch following this role).
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Developed scripts to fetch log files from an FTP server and process them for loading into Hive tables.
- Experience developing Hive UDFs in Java.
- Experience generating log statistics and extracting useful information from them in real time using Apache Storm.
- Moved data from HDFS to HBase using MapReduce with a bulk output format class.
- Experience implementing rack topology scripts for the Hadoop cluster.
- Developed a helper class abstracting the HBase cluster connection to act as a core toolkit.
- Participated in day-to-day meetings and status meetings, and communicated effectively with team members.
Environment: MapReduce, HDFS, Hive, Pig, HBase, Apache Storm, HDP, Sqoop, Java, Eclipse, Oracle, Linux, Shell Scripting, Maven, Git.
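The Hive-queries-from-a-client-engine bullet above could look roughly like this Scala sketch over the standard HiveServer2 JDBC driver; the host, credentials, and query are hypothetical placeholders.

```scala
import java.sql.DriverManager

object HiveQueryClient {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")  // HiveServer2 JDBC driver
    val conn = DriverManager.getConnection(
      "jdbc:hive2://hive-host:10000/default", "etl_user", "")  // hypothetical host/user
    try {
      val rs = conn.createStatement()
        .executeQuery("SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page")
      while (rs.next()) {
        println(s"${rs.getString("page")}\t${rs.getLong("hits")}")
      }
    } finally {
      conn.close()
    }
  }
}
```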
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Responsible for architecting Hadoop clusters with CDH3.
- Extensively involved in installing and configuring the Cloudera Distribution of Hadoop (CDH3), including the NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop log files.
- Loaded log data into HDFS using Flume, and worked extensively on creating MapReduce jobs to power data for search and aggregation (see the sketch following this role).
- Worked extensively with Sqoop for importing metadata from Oracle.
- Designed a data warehouse using Hive and created partitioned Hive tables.
- Mentored the analyst and test teams in writing Hive queries.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in HDFS maintenance, monitoring it through the web UI and working with it through the Hadoop Java API.
- Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Extensively used Pig for data cleansing.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Oozie, Java (JDK 1.6), Oracle 11g/10g, PL/SQL, SQL*Plus, Windows NT, UNIX Shell Scripting.
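As a sketch of the search-and-aggregation MapReduce jobs mentioned in this role, here is a minimal page-hit counter written in Scala against the Hadoop Java API; the tab-separated log layout and the column holding the page URL are assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Mapper: emit (page, 1) for each log line
class PageMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val page = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 2) {        // assume the page URL is the third column
      page.set(fields(2))
      ctx.write(page, one)
    }
  }
}

// Reducer: sum the hit counts per page
class PageReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    ctx.write(key, new IntWritable(values.asScala.map(_.get).sum))
  }
}

object PageHits {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "page-hits")
    job.setJarByClass(classOf[PageMapper])
    job.setMapperClass(classOf[PageMapper])
    job.setReducerClass(classOf[PageReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))    // input logs directory
    FileOutputFormat.setOutputPath(job, new Path(args(1)))  // output directory
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```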
Confidential
Java Developer
Responsibilities:
- Involved in various stages of application enhancements, performing the required analysis, development, and testing.
- Prepared the high-level and low-level design documents and implemented digital signature generation.
- Developed the logic and code for registering and validating enrolling customers.
- Extensively used Java multithreading to download files from a URL (see the sketch following this role).
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Developed web-based user interfaces using J2EE Technologies.
- Handled client-side validation using JavaScript.
- Used the Validation Framework for server-side validation.
- Created test cases for unit and integration testing.
- Integrated the front end with the Oracle database via the JDBC API through the JDBC-ODBC bridge driver on the server side.
- Developed required stored procedures and database functions using PL/SQL.
- Developed, tested, and debugged various components on WebLogic Application Server.
- Used XML and XSL for data presentation, report generation, and customer feedback documents.
- Implemented the logging framework using Log4j.
- Involved in code review and documentation review of technical artifacts.
Environment: Java Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, Eclipse, JDBC, Oracle 11g and other basic office tools.
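The multithreaded file-download work in this role was done in Java on the project; the following minimal sketch uses Scala for consistency with the other sketches in this resume, and the URLs and destination file names are placeholders.

```scala
import java.io.FileOutputStream
import java.net.URL

object ParallelDownloader {
  // Stream one URL to a local file
  def download(url: String, dest: String): Unit = {
    val in = new URL(url).openStream()
    val out = new FileOutputStream(dest)
    try {
      val buf = new Array[Byte](8192)
      Iterator.continually(in.read(buf))
        .takeWhile(_ != -1)
        .foreach(n => out.write(buf, 0, n))
    } finally {
      in.close(); out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val files = Seq(
      "https://example.com/report1.pdf" -> "report1.pdf",  // placeholder URLs
      "https://example.com/report2.pdf" -> "report2.pdf"
    )
    // One thread per download, started together and joined at the end
    val threads = files.map { case (url, dest) =>
      new Thread(new Runnable { def run(): Unit = download(url, dest) })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}
```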