Hadoop/Spark Developer Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- 6+ years of full Software Development Life Cycle experience covering system analysis, design, development, testing, deployment, maintenance, enhancements, re-engineering, migration, troubleshooting and support of multi-tiered web applications in high-performing environments.
- 4+ years of comprehensive experience in the Big Data technology stack.
- 3+ years of working experience with Spark Core, Spark SQL, Spark Streaming and Kafka.
- Expertise in deploying Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and in MapReduce/YARN concepts.
- Expertise in installing, configuring, supporting and managing Big Data workloads and the underlying infrastructure of Hadoop clusters, including CDH3 and CDH4 clusters.
- Extensive hands-on programming experience in technologies such as Java, J2EE, JSP, JSF, JMS, Servlets, Portlets (JSR 168, JSR 286, IBM API), SQL, JDBC, EJB, HTML, XML, Struts, Spring, Web Services, SOAP, REST, RAD, Scala, Eclipse, SOLR and Visual Studio on Windows, UNIX and AIX.
- Good experience setting up and configuring Hadoop clusters on cloud infrastructure such as Amazon Web Services (EC2, EMR and S3).
- Experience in automating Hadoop installation, configuration and cluster maintenance using tools like Puppet and Chef.
- Experience with commercial Hadoop distributions, including Hortonworks Data Platform (HDP) and Cloudera CDH.
- Very good understanding of NoSQL databases like HBase, Cassandra and MongoDB.
- Designed and implemented a Cassandra-based NoSQL database and an associated RESTful web service that persist high-volume user profile data for vertical teams.
- Experience in building large-scale, highly available web applications; worked on web services and other integration patterns.
- Good knowledge of data warehouse concepts and ETL development using Informatica PowerCenter.
- Experience in managing and reviewing Hadoop log files; deploying Pig, Hive, Sqoop and Cloudera Manager; and importing and exporting data between HDFS and relational database systems using Sqoop.
- Developed Java APIs for retrieval and analysis of data in NoSQL databases such as HBase and Cassandra.
- Analyzed data on a Cassandra cluster by running queries for searching, sorting and grouping.
- Experience extending Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HiveQL, Pig Latin, MapReduce and YARN.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Familiarity with processing real-time streaming data in Spark for fast, large-scale in-memory computation.
- Experience with Eclipse/RSA; good coding skills in Java and Scala.
- Expertise in RDBMS like MS SQL Server, MySQL and DB2.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper, of NoSQL databases such as HBase, and of administrative tasks such as installing Hadoop, commissioning and decommissioning nodes, and managing ecosystem components such as Flume, Oozie, Hive and Pig.
- Hands on experience in RDBMS, and Linux shell scripting.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, providing estimates, designing custom solutions, development and production support.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent and results-oriented, with problem-solving and leadership skills.
TECHNICAL SKILLS:
Programming Languages: C, C++, Java, Shell Scripting, JavaScript, SQL, PL/SQL, Python, Scala
J2EE Technologies: J2EE, Struts 2.1/2.2, Spring 3.x, Servlets, JSP, JDBC, Hibernate 3.x/4.x, JUnit, REST/SOAP web services
Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Cassandra, Oozie, Zookeeper, Flume, Spark Core, Spark-SQL, Spark Streaming and Kafka.
Databases: Oracle 10g/9i/8i, MySQL, SQL Server, Teradata, DB2, Informix
Mobile Technologies: BREW, J2ME, Android
Web Technologies: HTML, JavaScript, XML, jQuery, Ajax, CSS
Web Services: REST, Jersey, Axis 1.x, SOAP, WSDL, UDDI
IDEs: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans, WSAD
WORK EXPERIENCE:
Confidential, Dallas, TX
Hadoop/Spark Developer
Responsibilities:
- Migrated 160 tables from Oracle to Cassandra using Apache Spark.
- Handled importing of data from various data sources, performed transformations using Spark and loaded data into Cassandra.
- Worked extensively with the Spark Core, Spark SQL and Spark Streaming modules.
- Used Scala to write code for all Spark use cases.
- Assigned names to DataFrame columns using the case class approach in Scala.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Built a real-time pipeline for streaming data using Kafka/Microsoft Azure Queue and Spark Streaming.
- Spark Streaming collects this data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in the Cassandra cluster (a sketch of this flow follows this list).
- Performed performance tuning for Spark Streaming, e.g. setting the right batch interval, the correct level of parallelism, and appropriate serialization and memory settings.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Worked with Cassandra UDT (User-Defined Type) extensively.
- Involved in Spark-Cassandra data modeling.
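The following is a minimal sketch of the flow described above, not the actual project code: it assumes a comma-delimited Kafka topic, and the case class, keyspace, table, topic, broker and HDFS path names are all illustrative. It shows the Spark 1.6-era APIs named in this entry, with a case class naming the DataFrame columns, a Spark SQL query over them, and a Kafka direct stream persisted to Cassandra through the DataStax Spark connector.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder
import com.datastax.spark.connector.streaming._   // adds saveToCassandra on DStreams

// Hypothetical learner record; the case class supplies the column names
case class LearnerEvent(userId: String, courseId: String, eventType: String, eventTime: Long)

object LearnerPipeline {
  // Parse one comma-delimited line into the learner model (format is assumed)
  def parse(line: String): LearnerEvent = {
    val f = line.split(",")
    LearnerEvent(f(0), f(1), f(2), f(3).toLong)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("learner-pipeline")
      .set("spark.cassandra.connection.host", "cassandra-host") // assumed host
    val sc = new SparkContext(conf)

    // Batch side: case class -> DataFrame with named columns, queried via Spark SQL
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    val history = sc.textFile("hdfs:///data/learner/history")   // path is illustrative
      .map(parse).toDF()
    history.registerTempTable("learner_events")
    sqlContext.sql(
      "SELECT courseId, COUNT(DISTINCT userId) AS learners FROM learner_events GROUP BY courseId"
    ).show()

    // Streaming side: Kafka direct stream, transformed and persisted to Cassandra
    val ssc = new StreamingContext(sc, Seconds(10))              // batch interval is a tuning knob
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("learner-events"))
      .map { case (_, value) => parse(value) }
      .saveToCassandra("learner_ks", "learner_events")           // assumed keyspace/table

    ssc.start()
    ssc.awaitTermination()
  }
}
```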
Environment: Java 1.8, Scala 2.11.8, Hive, HDFS, YARN, Apache Spark 1.6.1, Cassandra 2.1.12, CDH 5.x, Sqoop, Kafka, Python, Oracle 12c.
Confidential, Southfield, MI
Hadoop Developer
Responsibilities:
- Involved in all phases of development activities from requirements collection to production support.
- Migrated data from different RDBMS systems and focused on migrating from the Cloudera distribution to AWS to reduce project cost.
- Worked on deploying and tuning live Hortonworks production HDP (Hortonworks Data Platform) clusters.
- Worked with different data feeds such as JSON, CSV and XML, and implemented a data lake concept.
- Defined UDFs in Pig and Hive to capture customer behavior (a Hive UDF sketch follows this list).
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Pig.
- Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top of them.
- Maintained data import scripts using Hive and MapReduce jobs.
- Created MapReduce jobs on Amazon Elastic MapReduce (Amazon EMR).
- Worked on Hive data warehouse modeling to interface with BI tools such as Tableau.
- Administered Hive permissions and user access with Kerberos authentication.
- Extracted data from Teradata into HDFS using Sqoop.
- Built customized in-memory indexes for high-performance information retrieval of products using Apache Lucene and Apache SOLR, providing more precise and useful search results.
- Developed and maintained several batch jobs that run automatically based on business requirements.
- Imported and exported data between environments such as MySQL and HDFS, and deployed into production.
- Developed dashboards, reports, ad-hoc views and domains in Jaspersoft server and Tableau for business stakeholders.
- Supported MapReduce programs running on the cluster.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Used Pig as an ETL tool to do transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data onto HDFS.
- Involved in writing Flume and Hive scripts to extract, transform and load the data into Database.
- Very good experience with UNIX shell scripting, Python and WLST.
- Good experience developing ETL scripts for data cleansing and transformation.
- Expertise in designing Python scripts to interact with middleware/back-end services.
- Worked on Python scripts to analyze customer data.
- Used Jira for bug tracking.
- Responsible for coding batch pipelines, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
- Loaded data into MongoDB.
- Used Git to check-in and checkout code changes.
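As an illustration of the Hive UDF work mentioned above, a simple UDF of this kind could be written in Scala (or equally in Java) against Hive's classic UDF API; the class name, logic and the column it would be applied to are hypothetical, not taken from the project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: buckets a raw user-agent string into a coarse device category,
// so customer behaviour can be grouped by device in Hive queries.
class DeviceCategory extends UDF {
  def evaluate(userAgent: Text): Text = {
    if (userAgent == null) return null
    val ua = userAgent.toString.toLowerCase
    val category =
      if (ua.contains("mobile") || ua.contains("android") || ua.contains("iphone")) "mobile"
      else if (ua.contains("tablet") || ua.contains("ipad")) "tablet"
      else "desktop"
    new Text(category)
  }
}
```

Packaged into a jar, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like a built-in from HiveQL.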
Environment: Hadoop-Java API, Hortonworks, Linux, Python, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Zookeeper, MapReduce, MongoDB, AWS EC2, EMR, Jenkins, SOLR, Restful Service, Teradata, Tableau, Amazon Data Pipeline.
Confidential, Camp Hill, PA
Big Data/Hadoop Developer
Responsibilities:
- Involved in the complete SDLC of the project, including requirements gathering, design documents, development, testing and production environments.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test a Big Data based analytical solution over data from disparate sources.
- Developed Java MapReduce programs to transform log data into a structured format (sketched after this list).
- Also developed MapReduce programs using Python.
- Developed optimal strategies for distributing the web log data over the cluster; importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most visited pages on the website.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
- Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari and manually through the command line.
- Monitored workload, job performance and capacity planning using Hortonworks Ambari.
- Involved in building applications using Maven and integrating with CI servers like Jenkins to run build jobs.
- Involved in Agile methodologies, daily scrum meetings and sprint planning.
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, manage and review data backups and log files.
- Collected log data from web servers and integrated it into HDFS using Flume; processed it with Hive queries and Pig scripts.
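The entry above describes Java MapReduce jobs over web log data; the sketch below shows the same shape of job, written here in Scala against the standard Hadoop MapReduce API (mapper, reducer, driver), counting page views per visitor IP. The class names, log format and input/output paths are assumptions for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emit (visitorIp, 1) for every access-log line (IP assumed to be the first field)
class VisitorMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val ip = new Text()
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split(" ")
    if (fields.nonEmpty) { ip.set(fields(0)); context.write(ip, one) }
  }
}

// Reducer: sum the counts per visitor IP
class VisitorReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    context.write(key, new IntWritable(sum))
  }
}

object PageViewsPerVisitor {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "page views per visitor")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[VisitorMapper])
    job.setCombinerClass(classOf[VisitorReducer])   // counts are associative, so reuse the reducer
    job.setReducerClass(classOf[VisitorReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))    // e.g. hdfs:///data/weblogs
    FileOutputFormat.setOutputPath(job, new Path(args(1)))  // e.g. hdfs:///out/pageviews
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```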
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Zookeeper, Java, ETL, Linux, Oozie, Maven, Shell Scripting, Python, RHEL, Rational tools, Hortonworks HDP.
Confidential
Java/J2EE Developer
Responsibilities:
- Analysis, design and development of a J2EE-based application using Struts, Spring and Hibernate.
- Involved in developing the user interface using Struts and worked with JSP, Servlets, JSF, JSTL/EL.
- Worked with JDBC and Hibernate.
- Configured and Maintained Subversion version control.
- Implemented Data Access Object, MVC design patterns.
- Experience working in an Agile methodology.
- Worked with both SOAP and RESTful web services.
- Used PL/SQL for queries and stored procedures in ORACLE as the backend RDBMS.
- Worked with Complex SQL queries, Functions and Stored Procedures.
- Developed Test Scripts using JUnit and JMockit.
- Used core Java features, including Generics and Annotations.
- Involved in refactoring the existing code.
- Implemented Struts and J2EE design patterns such as MVC, Spring REST API, DAO, Singleton and DTO.
- Developed Web Services using XML messages that use SOAP.
- Developed Spring Configuration file to define data source, beans and Hibernate properties.
- Experience using WebSphere Application Server to deploy applications.
- Used SVN for version control.
- Designed middleware components such as POJOs (Plain Old Java Objects, e.g. Java beans).
- Developed controller and bean classes using Spring and configured them in the Spring configuration file.
- Worked with the Struts Validation Framework to implement client-side and server-side validations.
- Worked with the log4j utility to implement runtime log events.
- Worked with ANT and Maven to develop build scripts.
- Worked with Hibernate, JDBC to handle data needs.
- Configured Development Environment using Tomcat and Apache Web Server.
Environment: Struts 1.x/2.x, Spring 2.0, J2SE 1.6, JEE 6, JSP 2.1, J2EE Design Patterns, HTML 5, JavaScript, JSF, jQuery 1.6/1.7, jQuery UI, XML, Servlets 2.5, WSDL, JUnit, JMockit, CSS, AJAX, Apache 2.0, Java Beans, Tomcat 5.5, Oracle 9i/10g, Oracle Application Server.