- Over 10 years of IT experience in software design, development, analysis, maintenance, testing, support, and troubleshooting in the finance, media, and telecom industries.
- Over 3 years of experience with Hadoop, Hive, Pig, Sqoop, Flume, Oozie, YARN, Impala, Kafka, ZooKeeper, MongoDB, and Cassandra, including designing and implementing MapReduce jobs to process large data sets on Hadoop clusters.
- Expertise in implementing, supporting, and troubleshooting COBOL, Java, C, C++, SAS, and PL/I programs.
- Worked on integrated BI tools and Mainframe applications.
- Expertise in creating test plans, test cases and test scripts.
- Expertise in unit testing, regression testing, integration testing, user acceptance testing, production implementation, and maintenance.
- Experience in writing MapReduce jobs using Apache Crunch.
- Experience in working with Hadoop clusters using Cloudera (CDH5) and Hortonworks distributions.
- Hands-on experience installing and configuring Cloudera's Apache Hadoop ecosystem components such as Flume NG, HBase, ZooKeeper, Oozie, Hive, Sqoop, Cascading, Hue, and Pig on CDH3/4/5 clusters.
- Experience in big data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Good understanding of HDFS design, daemons, federation, and high availability (HA).
- Strong experience with Test-Driven Development (TDD) using JUnit and Mockito, along with code coverage tools like Emma.
- Well versed in designing and implementing MapReduce jobs in Java on Eclipse to solve real-world scaling problems.
- Hands-on experience with the Java build tools Apache Maven and Ant.
- Solid understanding of high-volume, high-performance systems.
- Strong Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Fair amount of experience with scripting in Perl and Python.
- Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, and XML files.
- Basic Knowledge of UNIX and shell scripting.
- Quick to adapt to new software applications and products; a self-starter with excellent communication skills and a good understanding of business workflow.
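As an illustration of the multi-source integration noted above (RDBMS, text files, JSON, and XML), here is a minimal Python sketch, with a hypothetical record schema and field names, that normalizes JSON and XML records into one common dict shape for downstream loading:

```python
import json
import xml.etree.ElementTree as ET

def from_json(text):
    """Parse a JSON record into a plain dict (keys are illustrative)."""
    rec = json.loads(text)
    return {"id": int(rec["id"]), "name": rec["name"]}

def from_xml(text):
    """Parse an equivalent XML record into the same dict shape."""
    root = ET.fromstring(text)
    return {"id": int(root.findtext("id")), "name": root.findtext("name")}

# Both sources normalize to a single schema:
j = from_json('{"id": 1, "name": "Alice"}')
x = from_xml("<rec><id>1</id><name>Alice</name></rec>")
```

Once both feeds share one shape, the same validation and load path can serve either source.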
Big Data Technologies:: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Zookeeper, Impala, Crunch, Oozie
Languages:: C, Java/J2EE
Technologies:: Java, Java Beans, J2EE (JSP, Servlets, EJB), JDBC & NoSQL
Databases:: HBase, Redis, MongoDB, and Cassandra (NoSQL)
Build tools:: Maven and Ant
Statistical Programming:: R
Operating Systems:: Linux (CentOS and Ubuntu), Windows XP/7, MS-DOS
Tools:: MS Office - Excel, Word, PowerPoint
Confidential, NYC, NY
Sr. Big Data Developer/Admin
- Installed, configured, and maintained Apache Hadoop clusters for application development and major components of Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie and Zookeeper
- Implemented a six-node CDH4 Hadoop cluster on CentOS
- Imported and exported data between different RDBMSs and HDFS/Hive using Sqoop
- Defined job flows to run multiple MapReduce and Pig jobs using Oozie
- Imported log files into HDFS using Flume and loaded them into Hive tables for querying
- Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries
- Wrote APIs to read HBase tables, cleanse data, and write to another HBase table
- Created multiple Hive tables, implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access
- Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager
- Wrote multiple MapReduce programs in Java to extract, transform, and aggregate data from multiple file formats, including XML, JSON, CSV, and various compressed formats.
- Utilize frameworks and extensions to Hadoop such as Cascading and Hive
- Worked with Apache Solr for indexing and querying
- Knowledge of ZooKeeper internals
- Automated QA testing of client/server interactions for IBM BigInsights
- Experienced in running batch processes using Pig Scripts and developed Pig UDFs for data manipulation according to Business Requirements
- Experienced in writing programs using HBase Client API
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop
- Experienced in design, development, tuning and maintenance of NoSQL database
- Wrote MapReduce programs in Python using the Hadoop streaming API
- Developed unit test cases for Hadoop MapReduce jobs with MRUnit
- Excellent experience in ETL analysis, design, development, testing, and implementation, including performance tuning and database query optimization
- Experience in using Pentaho Data Integration tool for data integration, OLAP analysis and ETL process
- Integrated R with Hadoop using RHadoop for statistical analysis and predictive modelling
Environment: Cloudera, RHEL 6.3/CentOS 6.3, Oracle 10g, SQL Server, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, ZooKeeper, Python, flat files, AWS (Amazon Web Services), MongoDB, Teradata, UNIX/Linux shell script.
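The Python streaming bullet above can be sketched as a word-count job. This is an illustrative sketch rather than the production code; in a real run, Hadoop streaming pipes input through the mapper, sorts by key, and pipes the result through the reducer, which `sorted()` stands in for when testing locally:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit one tab-separated (word, 1) pair per token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Reduce phase: sum counts per word (input arrives sorted by key)."""
    pairs = (line.rsplit("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hadoop streaming would invoke the same script in each phase, e.g.:
    #   hadoop jar hadoop-streaming.jar -mapper 'wc.py map' -reducer 'wc.py red' ...
    phase = mapper if sys.argv[1] == "map" else reducer
    for out in phase(line.rstrip("\n") for line in sys.stdin):
        print(out)
```

Writing both phases as plain generator functions keeps the script unit-testable (as with MRUnit for the Java jobs) without a cluster.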
Confidential, Kansas City, MO
Sr. Big Data Lead Developer
- Developed Big Data analytic models using Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run as MapReduce jobs in the backend.
- Extracted data from DB2 and mainframe tape files and copied it over to HDFS.
- Created statistical reports as needed for quantitative analysis by the business management team.
- Imported and exported data into HDFS, HBase, and Hive using Sqoop.
- Analyzed the data using Hive queries and running Pig scripts to study customer behavior.
- Developed UDFs to implement business logic in Hadoop.
- Load log data into HDFS using Flume.
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
- Worked on Apache Pig for reporting and analytics, and Cascading for machine learning.
- Developed extraction modules using Hive for policy data.
- Designed and implemented Map Reduce jobs to support distributed data processing.
- Processed large data sets utilizing our Hadoop cluster.
- Designed NoSQL schemas in HBase.
- Responsible for building scalable distributed data solutions using MongoDB and Cassandra.
- Hands on experience working on Amazon SQS.
- Developed MapReduce ETL in Java and Pig.
- Responsible for performing extensive data validation using Hive.
- Imported and exported the data using Sqoop from HDFS to Relational Database systems
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Responsible for managing data coming from different sources.
- Experience in loading and transforming large sets of structured, semi-structured, and unstructured data.
Environment: Linux, Hadoop, Hive, HBase, Pig, Flume, MapReduce, MongoDB, Cassandra, Sqoop, Python, Scala, SQL, DB2, AWS (Amazon Web Services), Cascading, Impala, Teradata and Oracle.
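The cleaning and pre-processing jobs above were implemented in Java MapReduce; the cleansing logic itself can be sketched in Python, where the expected column count and the cleansing rules are illustrative assumptions rather than the actual business rules:

```python
import csv
import io

EXPECTED_FIELDS = 3  # hypothetical record width; real schemas vary

def clean_rows(raw_csv):
    """Pre-process raw CSV text: drop rows with the wrong column count,
    trim surrounding whitespace, and map empty strings to None so that
    downstream loads (e.g. into Hive) see explicit nulls."""
    for row in csv.reader(io.StringIO(raw_csv)):
        if len(row) != EXPECTED_FIELDS:
            continue  # malformed row: skip rather than load bad data
        yield [field.strip() or None for field in row]

raw = "1, Alice ,NY\n2,Bob\n3,,LA\n"
cleaned = list(clean_rows(raw))  # second row is dropped (only 2 fields)
```

Keeping the rules in one pure function makes the same logic easy to reuse inside a mapper or a standalone validation script.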
Confidential, New York, NY
Java Hadoop Developer
- Interacted with customers, business partners, and all stakeholders to understand business objectives and drive solutions that effectively meet client needs.
- Sketched the big data solution architecture, then monitored and governed its implementation.
- Designed strategies and programs to collect, store, analyze, and visualize data from various sources for specific projects.
- Worked with data scientists and engineers to build data processes for real-time analytics and reporting applications, as well as the necessary hardware and software for the chosen big data solution.
- Involved in writing and reviewing unit tests using JUnit and Mockito
- Implemented J2EE Design Patterns like MVC, Service Locator and Session Façade.
- Developed and implemented the MVC Architectural Pattern using Spring Framework including JSP, Servlets, EJB, Form Bean and Action classes
- Devised and led the implementation of the next generation architecture for more efficient data ingestion and processing, formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Participated in development and execution of system and disaster recovery processes and actively collaborated in all Security Hardening processes on the Cluster.
- Upgraded the Hadoop cluster from CDH 4.1 to CDH 5 (Hadoop with YARN), the Cloudera distribution.
- Supported BI data analysts and developers with Hive/Pig development.
Environment: Apache Hadoop, HDFS, Cassandra, MapReduce, HBase, Impala, Java (JDK 1.6), Kafka, MySQL, Amazon, DB Visualizer, Linux, Sqoop, Apache Hive, Apache Pig, InfoSphere, Python, Scala, NoSQL, Flume, Oozie
Sr. Java Developer
- Participated in JAD sessions with the developers
- Developed new enhancements for the system based on the user requirements
- Designed and developed user interface screens using JSP
- Developed stored procedures, triggers, functions for the application
- Worked in a team of five developers to complete different modules
- Implemented server-side business components using session beans
- Used Spring framework for bean wiring and maintained configuration file
- Created various DAO components using Hibernate
- Implemented SOAP based web services
- Enhanced the resume search criteria by including additional parameters
- Developed the automatic email alert feature for employers to see the overview of daily applications
- Optimized existing reports for performance and generated new reports
- Worked on resolving service requests submitted by the management on a daily basis
- Used Oracle JDeveloper and SQL Navigator as tools for Java and PL/SQL development.
- Responsible for planning, designing with ER Studio and coding.
- Designed and developed the Site Hierarchy interface and other GUI screen applications with Java Swing.
- Created Form Bean, Action classes, and configuration files using the Struts framework.
- Implemented Validation framework for field validations.
- Used the Struts internationalization provision to support multiple locales.
- Used thread scheduling for the calendar tool.
- Developed extensible XSLT procedures for handling navigational trees of any depth.
- Involved in coding and review of the system.
- Migrated a C++ Image Viewer component for add-on features.
- Involved in preparing user entry screens and web forms.
- Worked on writing SQL Queries and PL/SQL Stored Procedures using Oracle 9i. Used JDBC for connectivity.
- Deployed the application to Linux using FTP.
- Involved in unit testing and writing test cases.