- Hadoop-certified professional with over 8 years of IT experience, including 4+ years in Big Data and Hadoop ecosystem technologies, with domain experience in Financial, Banking, Health Care, Retail, and Non-profit organizations in software development and application support.
- Excellent understanding of Big Data platforms based on the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Spark, Storm, Kafka, YARN, HBase, Oozie, ZooKeeper, Flume, and Sqoop.
- Expertise in design and implementation of Big Data solutions in Banking, Retail and E-commerce domains.
- Experienced with NoSQL databases like HBase, Cassandra, MongoDB, and Couchbase.
- Comprehensive experience in building web-based applications using J2EE frameworks like Spring, Hibernate, EJB, Struts, and JMS.
- Excellent ability to use analytical tools to mine data and evaluate the underlying patterns.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing the Big Data.
- Expertise in optimizing network traffic using Combiners, joining datasets across multiple schemas using Joins, and organizing data using Partitioners and Buckets.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML, and Avro.
- Expertise in composing MapReduce pipelines with many user-defined functions using Apache Pig.
- Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations.
- Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
- Expertise in Hive Query Language (HiveQL), Hive Security and debugging Hive issues.
- Responsible for performing extensive data validation using HIVE Dynamic Partitioning and Bucketing.
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL.
- Worked on different set of tables like External Tables and Managed Tables.
- Experience working with different Hive SerDes that handle file formats such as Avro and XML.
- Analyzed data by performing Hive queries and used Hive UDFs for complex querying.
- Expert database engineer with NoSQL and relational data modeling experience.
- Responsible for building scalable distributed data solutions using MongoDB, Cassandra.
- In-depth knowledge of creating and administering Cassandra NoSQL database systems.
- Involved in Cassandra DB administration in multi-node cluster.
- Experience in HBase Cluster Setup and Implementation.
- Experience in administering and installing Hadoop clusters using Cloudera Manager and Apache platforms.
- Experience in Big Data platforms like Hortonworks, Cloudera, Amazon EC2 and Apache.
- Experience in cluster administration of Hadoop 2.2.0.
- Experience in using visualization tools like Qlikview and Tableau.
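The Combiner-based shuffle optimization mentioned above can be sketched in a few lines of Python (a toy word-count illustration of the idea, not production Hadoop code):

```python
from collections import Counter

# Toy word-count example: a Combiner pre-aggregates map output locally,
# so fewer (key, value) pairs cross the network during the shuffle.

def map_phase(records):
    """Emit (word, 1) for every word, as a plain MapReduce mapper would."""
    return [(word, 1) for record in records for word in record.split()]

def combine(pairs):
    """Map-side pre-aggregation: sum counts per key before the shuffle."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

records = ["big data big", "data pipeline"]
mapped = map_phase(records)        # 5 pairs would be shuffled without a Combiner
combined = combine(mapped)         # only 3 pairs remain after combining

print(len(mapped), len(combined))  # prints "5 3"
```

In real Hadoop the same effect comes from setting a Combiner class on the job; the payoff grows with the number of duplicate keys per mapper.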
Languages: Java, COBOL, CICS, C, C++, SQL, PL/SQL
Databases: Oracle, DB2, MySQL, SQLite, MS SQL Server 2008/2012, MS Access; NoSQL: HBase, Cassandra, MongoDB, Couchbase
Operating Systems: Windows 98/NT/XP/Vista/7, Windows CE, Linux, UNIX, iOS, macOS
Methodologies: Agile, Rapid Application Development, Waterfall Model, Iterative Model
Design Patterns: Singleton, Adapter, Builder, Iterator, Template
Big data Platforms: Hortonworks, Cloudera, Amazon AWS, Apache
Frameworks: Hibernate, EJB, Struts, Spring
Confidential, Brenham, TX
Sr. Hadoop Developer
- Imported data from relational data stores into Hadoop using Sqoop.
- Worked on the Hortonworks Big Data platform; used Kafka as a messaging system to ingest data from different sources.
- Created various Hive and Pig Latin scripts to perform ETL transformations on transactional and application-specific data sources.
- Wrote and executed Pig scripts using the Grunt shell.
- Performed Big Data analysis using Pig and user-defined functions (UDFs).
- Performed joins, group-by, and other operations in Hive and Pig.
- Processed and formatted the output from Pig and Hive before writing to the Hadoop output files.
- Used Hive table definitions to map the output files to tables.
- Wrote MapReduce and HBase jobs.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Worked with the HBase NoSQL database.
- Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Worked with Apache Spark for quick analytics on object relationships.
- Created UDFs to encrypt sensitive customer data, stored the results in HDFS, and performed analysis using Pig.
- Worked effectively with the team to perform Big Data tasks and deliver projects on time.
- Involved in cluster setup meetings with the administration team.
Environment: Apache Hadoop 2.2.0, Hortonworks, MapReduce, Hive, HBase, HDFS, Cassandra, Pig, Sqoop, Spark, Oozie, MongoDB, Kafka, Java 1.7, UNIX, Shell Scripting, XML.
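The "encrypt sensitive fields before landing in HDFS" UDF described above can be sketched as follows (a minimal Python illustration; a salted SHA-256 hash stands in for whatever encryption scheme the real UDF used, and `SALT` is a made-up placeholder):

```python
import hashlib

# Hypothetical placeholder salt; the real project would manage keys securely.
SALT = "example-salt"

def mask_pii(value: str) -> str:
    """Return a deterministic, irreversible token for a sensitive value."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

# Example row: only the sensitive column is masked before storage.
row = {"customer_id": "C1001", "ssn": "123-45-6789", "balance": "250.00"}
safe_row = {k: (mask_pii(v) if k == "ssn" else v) for k, v in row.items()}
print(len(safe_row["ssn"]))  # 64-character hex digest replaces the raw SSN
```

Because the token is deterministic, downstream Pig analysis can still join and group on the masked column without ever seeing the raw value.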
Confidential, Atlanta, GA
Sr. Hadoop Developer
- Worked on the Hortonworks platform; developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and financial histories from traditional databases into HDFS for analysis.
- Involved in writing MapReduce jobs.
- Ingested data using Sqoop and the HDFS put/copyFromLocal commands.
- Used Pig to perform transformations, event joins, filtering of bot traffic, and some pre-aggregations before storing the data in HDFS.
- Involved in developing Pig UDFs for the needed functionality that is not available from Apache Pig.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Involved in developing Hive UDFs for the needed functionality that is not available from Apache Hive.
- Computed metrics that define user experience, revenue, etc., using Java MapReduce.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
- Designed and implemented a Cassandra NoSQL based database and associated RESTful web service that persists high-volume user profile data for vertical teams.
- Migrated high-volume OLTP transactions from Oracle to Cassandra and played a lead role in Cassandra tuning during the migration from Oracle-based data stores.
- Involved in using Sqoop for importing and exporting data into HDFS.
- Involved in processing ingested raw data using Map Reduce, Apache Pig and Hive.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get/copyToLocal commands.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and Map Reduce) and move the data files within and outside of HDFS.
Environment: Hadoop 2.2.0, MapReduce, Cassandra, Kafka, MongoDB, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Hortonworks, HDFS, Eclipse.
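The change-data-capture / delta-record logic described above can be sketched in plain Python (a toy illustration keyed on a primary key; the real work was done in Pig over HDFS files):

```python
# Compare newly arrived records against those already stored, keyed by a
# primary key, and classify each incoming row as insert, update, or unchanged.

def compute_delta(existing, incoming):
    """existing/incoming: dicts mapping primary key -> row payload."""
    inserts = {k: v for k, v in incoming.items() if k not in existing}
    updates = {k: v for k, v in incoming.items()
               if k in existing and existing[k] != v}
    unchanged = [k for k in incoming
                 if k in existing and existing[k] == incoming[k]]
    return inserts, updates, unchanged

existing = {1: "alice,NY", 2: "bob,GA"}   # rows already in HDFS
incoming = {2: "bob,TX", 3: "carol,FL"}   # newly arrived batch
inserts, updates, unchanged = compute_delta(existing, incoming)
print(sorted(inserts), sorted(updates), unchanged)  # prints "[3] [2] []"
```

In the Pig version, the same classification typically falls out of a full outer join between the existing and incoming relations on the key column.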
Confidential, Jacksonville, FL
- Part of a team developing and writing Pig scripts.
- Loaded data from an RDBMS server into Hive using Sqoop.
- Created Hive tables to store the processed results in a tabular format.
- Developed Sqoop scripts to move data between Hive and the MySQL database.
- Developed Java Mapper and Reducer programs for complex business requirements.
- Developed Java custom record reader, partitioner and serialization techniques.
- Used different data formats (Text format and Avro format) while loading the data into HDFS.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Performed complex HiveQL queries on Hive tables.
- Optimized Hive tables using techniques such as partitioning and bucketing to improve HiveQL query performance.
- Created partitioned tables and loaded data using both static partition and dynamic partition method.
- Created custom user defined functions in Hive.
- Performed Sqoop imports from Oracle to load data into HDFS and directly into Hive tables.
- Developed Pig Scripts to store unstructured data in HDFS.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Analyzed Hadoop logs using Pig scripts to track down errors raised by the team's jobs.
Environment: HDFS, Map Reduce, Hive, Sqoop, Pig, Flume, HBase, Oozie Scheduler, Java, Shell Scripts.
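The custom partitioner mentioned above can be sketched in a few lines of Python (a toy illustration of the hash-partitioning idea; in Hadoop this would be a Java class extending `Partitioner`):

```python
import hashlib

# Route each key to one of NUM_REDUCERS buckets so that all records
# sharing a key land on the same reducer.
NUM_REDUCERS = 4

def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Deterministic bucket in [0, num_reducers) for a given key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_reducers

keys = ["order-1", "order-2", "order-1"]
buckets = [partition(k) for k in keys]
print(buckets[0] == buckets[2])  # same key -> same reducer, prints "True"
```

Using a stable hash (rather than Python's process-randomized built-in `hash`) keeps the key-to-reducer mapping deterministic across runs, which is what a custom Hadoop partitioner needs.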
Confidential, Phoenix, AZ
Java and Hadoop Developer
- Extensively implemented various QA methodologies, testing strategies, and test plans across all stages of the SDLC, following the Agile Scrum methodology.
- Developed Pig Scripts for validating and cleansing the data.
- Developed MapReduce programs to parse the raw data and stored the refined data in the Cognition DB.
- Created HIVE queries for moving data from Cornerstone (Data Lake) to HDFS locations.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Exported data from HDFS to RDBMS for visualization and user report generation.
- Involved in the process of load, transform and analyze Transactions data from various providers into Hadoop on an on-going basis.
- Filtered, transformed and combined data which came from Cornerstone (Data Lake) based on business requirements using custom Pig Scripts and stored in Cognition (downstream DB).
- Extensively worked on Pig scripts.
- Responsible for the design and creation of test cases (in Rally) and tested the Tableau dashboards using functional, system, integration, regression, HiveQL, and UAT testing.
- Wrote SQL queries and performed back-end testing, Tableau report testing, and deployment into UAT and production.
- Participated in and conducted weekly Issue Log status meetings, report status meetings, and project status meetings to discuss issues and workarounds.
- Communicated with developers throughout all phases of testing to eliminate roadblocks.
- Generated daily progress report and represented in daily Agile Scrum meetings.
Environment: Apache Hadoop, HDFS, Hive, Map Reduce, Core Java, Rally, UNIX, Tableau.
- Managed a team of 6 employees and delivered a project involving the development of Business Intelligence and Enterprise Asset Management.
- Analyzed extensive data available on the client side, coalesced it into a data warehouse using the Talend Big Data environment, and suggested ideas to develop the business.
- Was part of the team that brought up and implemented the Business Intelligence and Enterprise Asset Management system through which all of the company's end-to-end daily operations are maintained.
- Worked in automating release management tasks, thus reducing the defect count of the projects and ensuring a smooth implementation of projects across various teams.
- Worked on various technologies and environments, including Windows (XP, 7, 8), Linux, RESTful web services, J2EE, web technologies, MySQL, MS SQL Server, Selenium testing, and the Talend Big Data Studio tool, which helped me propose optimized solutions to the business.
- Developed a business intelligence tool to graphically represent the relationship between sales and customers; compiled and ran the applications.
- Writing the test plans and test cases for the developed screens.
- Executing test cases and fixing the bugs through unit testing.
- Integrating the module with other modules and deploying them on the WebLogic 12 server.
Environment: Java/J2EE, Servlets, JSP, Apache Tomcat, WebLogic, EJB, Struts, Oracle, XML, HTML5, MySQL, MS SQL Server