- 8+ years of IT experience, including over 4 years in the analysis, design, development and implementation of large-scale applications in Big Data Hadoop environments using technologies such as Spark, MapReduce, Cassandra, Hive, Pig, Sqoop, Oozie, HBase, ZooKeeper and HDFS.
- Well-versed in Hadoop and Spark architecture, with the various components of Hadoop 1.x and 2.x such as HDFS, JobTracker, TaskTracker, DataNode and NameNode, and YARN concepts such as ResourceManager and NodeManager.
- Strong experience in writing Spark and MapReduce programs, HiveQL and Pig Latin scripts, with a solid understanding of MapReduce design patterns and data analysis using Hive and Pig.
- Strong knowledge of the analytical functions of Apache Spark, Hive and Pig; extended Spark, Hive and Pig functionality by writing custom UDFs.
- Broad knowledge of working with Apache Spark Streaming API on Big Data Distributions in an active cluster environment.
- Created tables and indexes and wrote queries to read and manipulate data in Teradata.
- Adept at using AWS services such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Well versed in workflow scheduling and monitoring tools such as Oozie, Hue and ZooKeeper.
- Experienced in Scala programming for writing applications in Apache Spark.
- Proficient in importing and exporting data from Relational Database Systems to HDFS and vice versa, using Sqoop.
- Proficient in importing and exporting data between Teradata and HDFS using Teradata utilities such as MultiLoad, FastLoad and Basic Teradata Query (BTEQ).
- Experience in using Apache Flume for collecting, aggregating and moving vast amounts of data from application servers.
- Expertise in implementing Core Java and J2EE technologies: JSP, Servlets, JSF, JSTL, EJB transaction implementation (CMP, BMP, Message-Driven Beans), JMS, Struts, Spring, Hibernate, JavaBeans, JDBC, XML, Web Services, JNDI, multi-threading, data structures, etc.
- Good knowledge of implementing web service layers and prototyping user interfaces for intranets, web applications and websites using HTML5, XML, CSS, CSS3, AJAX, JavaScript, jQuery, Bootstrap, AngularJS, Ext JS, JSP and JSF.
- Good understanding of column-family NoSQL databases like HBase and Cassandra in enterprise use cases.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting system application architecture on Hadoop, Spark and SQL databases such as Teradata, MySQL and DB2.
- Working experience with Impala, Mahout, Spark SQL, Storm, Avro, Kafka, Hue and AWS.
- Experience with installing, backup, recovery, configuration and development on multiple Hadoop distribution platforms like Hortonworks Distribution Platform (HDP), Cloudera Distribution for Hadoop (CDH).
- Hands-on experience in application development using Java, RDBMS, Linux shell scripting and Perl.
- Hands-on experience with IDE and build tools such as Eclipse, NetBeans, Visual Studio and Maven.
- Ability to adapt to evolving technologies, a keen sense of responsibility and accomplishment.
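The custom-UDF work mentioned above can be illustrated with a minimal sketch: a pure Scala function of the kind a Spark SQL UDF would wrap. The function name and masking logic are hypothetical, chosen only for the example; the registration call is shown commented out because it assumes a live SparkSession.

```scala
// Hypothetical column-level UDF logic: mask the local part of an email address.
def maskEmail(email: String): String = {
  val at = email.indexOf('@')
  if (at <= 1) email                              // too short or malformed: leave as-is
  else email.substring(0, 1) + "***" + email.substring(at)
}

// With a running SparkSession this pure function would be registered for use
// from Spark SQL / HiveQL (commented, since it needs a live cluster):
// spark.udf.register("mask_email", maskEmail _)
// spark.sql("SELECT mask_email(email) FROM users")
```

Keeping the logic a plain function makes it testable without a cluster and reusable across Spark and Hive.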
Big Data Technologies: Hadoop (Hortonworks, Cloudera), HDFS, YARN, MapReduce, Apache Spark, Apache Pig, Apache Hive, Apache HBase, Impala, Sqoop, Cassandra, MongoDB, Spark Streaming, Spark SQL, Spark ML, Flume, Oozie, Hue, ZooKeeper, ActiveMQ, Apache Kafka, Amazon EMR.
ETL Tools: Informatica Power Center 9.6/8.x/7.x (Source Analyzer, Mapping Designer, Mapplet, Transformations, Workflow Monitor, Workflow Manager), Flat file system (Fixed width, Delimiter), OLAP
Databases: MySQL, Oracle 11g, DB2, MS-SQL Server, HBase, Cassandra, MongoDB.
Data Modeling Tool / Methodology: Dimensional Data Modeling, Star Schema, Snowflake Schema, Extended Star Schema, Physical and Logical Modeling, Erwin 5.1/4.1
Operating Systems: UNIX, Windows, Linux
IDE / Build & Test Tools: IntelliJ, Eclipse, NetBeans, Visual Studio, Xcode; Maven, JUnit, MRUnit.
Reporting Tools: QlikView
J2EE Technology: Java Beans, Servlets, JSP, JDBC, EJB, JNDI, JMS, RMI.
Confidential, Milwaukee, WI
Senior Hadoop/Spark Developer
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Prepared Spark builds from source code and ran the Pig scripts using Spark rather than MR jobs for better performance.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed Kafka producer and consumers for message handling.
- Used Amazon CLI for data transfers to and from Amazon S3 buckets.
- Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
- Explored Spark for improving performance and optimizing existing algorithms in Hadoop MapReduce using Spark Context, Spark SQL, Data Frames, Pair RDDs and Spark on YARN.
- Deployed MapReduce and Spark jobs on Amazon Elastic MapReduce using datasets stored on S3.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Knowledgeable in the design and deployment of Hadoop clusters and different Big Data analytic tools including Hive, HBase, Oozie, Sqoop, Flume, Spark, Impala and Cassandra.
- Streamed data in real time using Spark with Kafka.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Experienced in implementing Spark RDD transformations, actions to implement business analysis.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Integrated user data from Cassandra with data in HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
- Involved in importing the real-time data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
- Created Hive tables and involved in data loading and writing Hive UDFs.
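The Hive-to-Spark conversions above can be sketched in miniature. Plain Scala collections stand in for an RDD so the example is self-contained and runnable without a cluster; on Spark, the same map / groupBy / sum chain would be map / reduceByKey on a pair RDD. The weblog schema and query are assumptions made for illustration.

```scala
// HiveQL being re-expressed: SELECT status, COUNT(*) FROM weblogs GROUP BY status
case class LogLine(status: Int, url: String)   // assumed schema, for illustration

val weblogs = List(LogLine(200, "/a"), LogLine(404, "/b"), LogLine(200, "/c"))

val hitsByStatus: Map[Int, Int] =
  weblogs
    .map(line => (line.status, 1))             // build (key, 1) pairs
    .groupBy(_._1)                             // shuffle-by-key stand-in
    .map { case (status, pairs) => (status, pairs.map(_._2).sum) } // reduceByKey(_ + _)
```

Expressing the aggregate as a transformation chain is what lets Spark pipeline it in memory instead of launching a MapReduce job per query.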
Environment: CDH4, CDH5, Scala, Spark, Spark Streaming, Spark SQL, HDFS, AWS, Hive, Pig, Linux, Eclipse, Oozie, Hue, Flume, MapReduce, Apache Kafka, Sqoop, Oracle, Shell Scripting, Cassandra, Hortonworks.
Confidential, TX
Senior Hadoop Developer
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Supported MapReduce programs running on the cluster.
- Provisioned, installed, configured, monitored and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie and Hive.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Created Hive tables with dynamic partitioning and buckets for sampling, and worked on them using HiveQL.
- Used Pig to parse the data and store it in Avro format.
- Stored data in tabular formats using Hive tables and Hive SerDes.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Used Amazon CLI for data transfers to and from Amazon S3 buckets.
- Executed Hadoop jobs on AWS EMR using programs and data stored in S3 buckets.
- Involved in creating UNIX shell scripts for database connectivity and executing queries in parallel job execution.
- Developed Apache Pig scripts and Hive scripts to process HDFS data.
- Designed and implemented incremental imports into Hive tables.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Exported the analyzed data to the relational databases using Sqoop for visualization.
- Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop and ZooKeeper.
- Developed custom aggregate UDFs in Hive to parse log files.
- Identified the required data to be pulled into HDFS and created Sqoop scripts that were scheduled periodically to migrate data to the Hadoop environment.
- Involved with File Processing using Pig Latin.
- Created MapReduce jobs involving combiners and partitioners to deliver better results and worked on application performance optimization for an HDFS cluster.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
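The combiner/partitioner work above can be sketched as follows. Plain Scala models the phases: each "mapper" pre-aggregates its own input split (the combiner step) before a single "reduce" merges the partial counts, which is what cuts shuffle volume in the real jobs. The word-count payload is illustrative, not the production logic.

```scala
// Map + combine: count words within one input split locally.
def mapAndCombine(split: Seq[String]): Map[String, Int] =
  split.flatMap(_.split("\\s+"))                  // map: line -> words
       .groupBy(identity)                         // combine within the split
       .map { case (w, ws) => (w, ws.size) }

// Reduce: merge the per-split partial counts into final totals.
def reduce(partials: Seq[Map[String, Int]]): Map[String, Int] =
  partials.flatten                                // (word, partialCount) pairs
          .groupBy(_._1)
          .map { case (w, ps) => (w, ps.map(_._2).sum) }

val splits = Seq(Seq("to be or", "not to be"), Seq("to do"))
val counts = reduce(splits.map(mapAndCombine))
```

The combiner is safe here because word counting is associative and commutative; the same precondition applies to any Hadoop combiner.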
Environment: Cloudera, MapReduce, HDFS, Pig Scripts, Hive Scripts, HBase, Sqoop, ZooKeeper, Oozie, Oracle, Shell Scripting.
Confidential, TX
- Wrote MapReduce jobs to parse web logs stored in HDFS.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Developed services to run MapReduce jobs as per daily requirements.
- Involved in creating Hive tables, loading them with data and writing hive queries.
- Involved in optimizing Hive Queries, joins to get better results for Hive ad-hoc queries.
- Used Pig to perform data transformations, event joins, filtering and some pre-aggregations before storing the data in HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Gained hands-on experience with NoSQL databases like HBase through a POC (proof of concept) storing URLs, images, products and supplement information in real time.
- Developed an integrated dashboard to perform CRUD operations on HBase data using the Thrift API.
- Implemented error notification module to support team using HBase co-processors (Observers).
- Configured and integrated Flume sources, channels and sinks to analyze log data in HDFS.
- Implemented custom Flume interceptors to perform cleansing operations before moving data into HDFS.
- Involved in troubleshooting errors in Shell, Hive and MapReduce.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Developed Oozie workflows which are scheduled monthly.
- Designed and developed read lock capability in HDFS.
- Developed unit test cases using MRUnit and involved in unit testing and integration testing.
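The web-log parsing done by the MapReduce jobs above reduces to splitting each raw line into typed columns and dropping malformed rows. A minimal Scala sketch of that step, assuming a simplified access-log format (the real log format and field names would differ):

```scala
// Assumed simplified access-log line: ip [timestamp] "METHOD path" status
val LogPattern = raw"""(\S+) \[([^\]]+)\] "(\w+) (\S+)" (\d{3})""".r

case class Access(ip: String, ts: String, method: String, path: String, status: Int)

// Malformed rows yield None rather than failing the whole job.
def parse(line: String): Option[Access] = line match {
  case LogPattern(ip, ts, m, path, st) => Some(Access(ip, ts, m, path, st.toInt))
  case _                               => None
}
```

In a mapper this would emit the parsed record (or nothing), which is also the shape MRUnit test cases exercise: one input line in, zero or one structured record out.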
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, NoSQL (HBase), Sqoop, Linux, Java, Thrift, JUnit, MRUnit, JDBC, Oracle 11g, SQL.
Confidential
- Involved in all project phase discussions and executed the project from PDP (predefined phase) to rollout, with post-implementation activities.
- Developed the user interface using JSP, JSP tag libraries, Spring tag libraries and JavaScript to simplify the complexities of the application.
- Involved in enhancing certain modules of web portal using Spring Framework.
- Developed web and service layer components using Spring MVC.
- Implemented various design patterns like MVC, Factory, Singleton.
- Designed the user interface for users to interact with the system using jQuery, JavaScript, HTML5 and CSS3.
- Wrote custom filters, directives and controllers for the HTML using Angular code.
- Followed Agile Methodology in software development.
- Used Hibernate DAO Support to integrate Hibernate with Spring for database access.
- Implemented RESTful Web Services and associated business module integration for getting status of claim report.
- Worked with AngularJS to create custom HTML elements for building a complex web site, an open-source project.
- Developed backend business logic with Spring Framework and achieved asynchronous messaging with Java Messaging Services (JMS).
- Used MongoDB for storing minimal data documents and for file sharing.
- Performed client-side validation using JavaScript.
- Used JSF to design the web application, including the DB connection, the pom.xml file for dependency management, Java and XHTML files, and the MVC model.
- Designed user interfaces using JSP Standard Tag Libraries, HTML, DHTML, CSS, JSF and JSP.
- Validated the user inputs using Spring Validator.
- Dependency injection was used across all layers of the application.
- Developed the database schema and populated data using SQL statements, PL/SQL functions, stored procedures, triggers and bulk uploads; monitored error logs using Log4j and fixed the problems.
- Worked on JUnit Framework for Test Driven Development (TDD).
- Worked on source code management tools such as SVN.
Environment: Java, J2EE, JSP, JSF, Spring, Hibernate, JavaScript, AngularJS, jQuery, HTML, CSS/CSS3, Servlet, MongoDB, Oracle 11g, Apache Tomcat, Eclipse IDE, XML, MVC, Factory, Singleton, RESTful Web Services, SVN.
Confidential
- Interacted with business analysts and architecture groups, gathering requirements and use cases.
- Involved in Object Oriented Analysis and Design (OOAD) using UML for designing the application.
- Developed Class diagrams, Sequence diagrams, and State diagrams.
- Developed the application using Struts.
- Implemented Struts Validation Framework for Server-side validation.
- Developed JSP pages for the presentation layer using custom tag libraries and the JSP Standard Tag Library (JSTL).
- Developed the Session Beans for handling the complex business logic.
- Used Hibernate to handle database access.
- Extensively wrote stored procedures, triggers, cursors and views for data retrieval, storage and updates in the PL/SQL database.
- Wrote Apache Ant build scripts to build the application and JUnit test cases to perform unit testing.
- Designed test plans and test cases and performed system testing.
- Coordinated the build and deployment of EARs on Webs in Test and Development environments.
- Extensively used CVS as source control and was involved in the Configuration Management software configuration/change control board.