- Overall 8+ years of experience in data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, data integration, object-oriented programming, data warehousing, and advanced analytics.
- 4 years of experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Hive, Oozie, Kafka, Impala, Spark, Avro, JSON).
- Good knowledge of Hive optimization with ORC, partitions, and bucketing.
- Created data ingestion schedules using Sqoop and the Oozie scheduler.
- Hands-on experience writing MapReduce jobs in Java.
- Hands-on experience writing Pig Latin scripts, Pig commands, and Hive queries.
- Good knowledge of and experience with Spark and Kafka.
- Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, Pig, Scala, Hive, Impala, and Spark.
- Experience in database development using SQL and PL/SQL, working on databases such as Oracle 9i/10g, Informix, and SQL Server.
- Hands-on experience with AWS (EC2, Redshift, CloudWatch, Route 53, EMR, CloudFront, S3, IAM).
- Experience working on NoSQL databases including HBase & MongoDB.
- Experience using Sqoop to import data from RDBMSs into HDFS and vice versa.
- Experience in database design and development using relational databases (Oracle, MS SQL Server 2005/2008, MySQL) and NoSQL databases (MongoDB, Cassandra, HBase).
- Effective team player with excellent communication skills and the insight to determine priorities, schedule work, and meet critical deadlines.
- Good working experience with file formats such as Avro, JSON, and Parquet in Hadoop tools, using SerDe concepts.
- Experience analyzing data in Spark using Scala and PySpark.
- Optimized Hive tables with techniques such as partitioning and bucketing to improve HiveQL query performance.
- Experience importing streaming data into HDFS using Flume sources and sinks, and transforming the data with Flume interceptors.
- Experience using Java tools in business, web, and client-server environments, including Java, JDBC, Servlets, JSP, the Struts framework, JasperReports, and SQL.
- Proficient in IDEs such as Eclipse and NetBeans.
- Experienced with scripting languages such as Python and shell.
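As a sketch of the MapReduce pattern behind the Java jobs mentioned above, here is a minimal pure-Python word count; this is illustrative only (the actual jobs would be written against the Hadoop Java API), with all function names chosen for this example:

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle phase: group values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: sum the counts for a single word.
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())
```

For example, `word_count(["the quick fox", "the fox"])` yields `{"the": 2, "quick": 1, "fox": 2}`.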
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Spark, Scala, Impala, Hive, Pig, Oozie, Sqoop, Flume, Kafka, CDH4, JSON, Avro
Java Technologies: Java 5, Java 6, JAXP, AJAX, I18N, JFC Swing, Log4j, Java Help API
Methodologies: Agile, UML, Design Patterns
Database: Oracle 10g, DB2, MySQL, NoSQL (MongoDB), HBase, Cassandra
Cloud: AWS (EC2, Redshift, CloudWatch, Route 53, EMR, CloudFront, S3, IAM)
Application Server: Apache Tomcat 5.x/6.0, JBoss 4.0
Web Tools: HTML, JavaScript, XML, DTD, Schemas, XSL, XSLT, XPath, DOM, XQuery
Tools: SQL Developer, DbVisualizer, Hortonworks
IDE / Testing Tools: NetBeans, Eclipse, WSAD, RAD, MATLAB
Operating System: Windows, Linux
Scripts: Bash, Python, ANT
Testing API: JUNIT
Confidential, Dallas, TX
Big Data / Hadoop Lead
- Involved in the project from the POC phase, working from data staging through data mart loading and reporting.
- Worked in an onsite-offshore environment.
- Fully responsible for creating the data model for storing and processing data and for generating and reporting alerts; this model is being implemented as the standard across all regions as a global solution.
- Involved in discussions with, and guided, other regional teams on the Citi Big Data platform and the AML cards data model and strategy.
- Responsible for technical design and review of data dictionary (Business requirement).
- Responsible for providing technical solutions and workarounds.
- Migrated the required data from the data warehouse and product processors into HDFS using Talend and Sqoop, and imported flat files of various formats into HDFS.
- Analyzed and developed a Spark-Cassandra connector to load data from flat files into Cassandra.
- Used Spark Streaming to bring all credit card transactions into the Hadoop environment.
- Involved in design of overall Citi Group Big data architecture.
- Involved in discussion with source systems for issues related to DQ in data.
- Integrated the Hive warehouse with Spark and Impala; replaced Impala with Spark due to an Impala security issue.
- Comfortable with Scala functional programming idioms and very familiar with iteratee/enumerator streaming patterns; almost all DQ and end-to-end reconciliation is done in Scala and Spark.
- Implemented partitioning, dynamic partitions, indexing, and bucketing in Hive.
- Created custom UDFs in Java to overcome Hive limitations on Cloudera CDH5.
- Used Hive for data processing and batch data filtering; used Spark/Impala for other value-centric data filtering.
- Supported and monitored MapReduce programs running on the cluster.
- Monitored logs and responded accordingly to any warning or failure conditions.
- Responsible for preserving code and design integrity using SVN and SharePoint.
- Gave a demo to business users on using Datameer for analytics.
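The partitioning and bucketing described above can be sketched conceptually in plain Python; the table name, columns, and hash function are illustrative (Hive computes buckets with its own Java hash), not taken from the project:

```python
def partition_path(table, partition_col, partition_value):
    # Hive lays out one directory per partition value, e.g.
    # /warehouse/txns/txn_date=2015-01-01/, so queries filtering on
    # the partition column only scan the matching directories.
    return f"/warehouse/{table}/{partition_col}={partition_value}/"

def bucket_for(key, num_buckets):
    # Hive assigns a row to a bucket file by hashing the CLUSTERED BY
    # key modulo the bucket count (Python's hash() stands in here).
    return hash(key) % num_buckets

def route(table, partition_col, key_col, num_buckets, row):
    # Combine both: directory from the partition column,
    # bucket file index from the clustering column.
    path = partition_path(table, partition_col, row[partition_col])
    return path, bucket_for(row[key_col], num_buckets)
```

For example, `route("txns", "txn_date", "card_id", 8, {"txn_date": "2015-01-01", "card_id": "c42"})` returns the partition directory for 2015-01-01 plus a bucket index in 0..7.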
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Talend, Spark, Cassandra, Impala, Scala, Sqoop, Cloudera CDH5, Platform, SVN, SharePoint, Datameer, and Maven.
Confidential, New York city, NY
Big Data / Hadoop Developer
- Used struts validation framework for form level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Worked on Spring to develop different modules to assist the product in handling different requirements.
- Implemented CDH3 Hadoop cluster on CentOS.
- Implemented POCs to configure DataStax Cassandra with Hadoop.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Launched and set up a Hadoop cluster, including configuring its different components.
- Hands-on experience loading data from the UNIX file system to HDFS.
- Experienced in performing Cassandra query operations using the Thrift API for real-time analytics.
- Provided cluster coordination services through ZooKeeper.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Involved in creating Hive tables, loading data, and running Hive queries on that data.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
- Wrote optimized Pig scripts and was involved in developing and testing Pig Latin scripts.
- Working knowledge of writing Pig Load and Store functions.
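The FILTER/GROUP style of the Pig Latin scripts mentioned above can be approximated in plain Python; the field names and sample rows are hypothetical, purely to show the shape of the relational operations:

```python
from collections import defaultdict

def pig_filter(rows, predicate):
    # Roughly: filtered = FILTER rows BY predicate;
    return [row for row in rows if predicate(row)]

def pig_group(rows, key):
    # Roughly: grouped = GROUP rows BY key;
    # yields {group_key: bag_of_rows}.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

# Hypothetical sample relation.
rows = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 0},
    {"user": "a", "clicks": 5},
]
active = pig_filter(rows, lambda r: r["clicks"] > 0)
by_user = pig_group(active, "user")
```

Here `active` keeps the two non-zero rows and `by_user` groups both of them under `"a"`, mirroring how a Pig GROUP produces a bag per key.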
Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS, Zookeeper, Sqoop, Cassandra, Hive, Pig, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets.
Confidential, Somerset, NJ
Java/ J2EE Developer
- Coded the business methods according to the IBM Rational Rose UML model.
- Extensively used Core Java, Servlets, JSP and XML.
- Used Struts 1.2 in presentation tier.
- Generated the Hibernate XML and Java Mappings for the schemas
- Used DB2 Database to store the system data
- Used Rational Application Developer (RAD) as Integrated Development Environment (IDE).
- Used unit testing for all the components using JUnit.
- Used the Apache Log4j logging framework for trace logging and auditing.
- Used IBM WebSphere as the application server.
- Used IBM Rational ClearCase as the version control system.
Environment: Java 1.6, Servlets, JSP, Struts 1.2, IBM Rational Application Developer (RAD) 6, WebSphere 6.0, iText, AJAX, Rational ClearCase, Rational Rose, Oracle 9i, Log4j.
- Involved in the complete software development life cycle (SDLC) of the application, from requirement analysis to testing.
- Developed the modules based on the Struts MVC architecture.
- Created business logic using Servlets and Session Beans and deployed them on WebLogic Server.
- Used the Struts MVC framework for application design.
- Created complex SQL Queries, PL/SQL Stored procedures, Functions for back end.
- Prepared the Functional, Design and Test case specifications.
- Involved in writing Stored Procedures in Oracle to do some database side validations.
- Performed unit testing, system testing and integration testing
- Developed Unit Test Cases. Used JUnit for unit testing of the application.
- Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions; resolved higher-priority defects on schedule.
- Used Eclipse IDE for all coding in Java, Servlets and JSPs.
- Coordinated with the QA lead on the development of the test plan, test cases, test code, and actual testing; responsible for defect allocation and ensuring that defects were resolved.
- Used Flex styles and CSS to manage the look and feel of the application.
- Deployed the application on WebSphere Application Server.