- Over 8 years of IT experience in analysis, design, development, implementation, and testing of software applications, including 4+ years of experience in Big Data and in the development and design of Java-based enterprise applications.
- Strong skills in developing applications with Big Data technologies such as Hadoop, YARN, Flume, Hive, Pig, Sqoop, HBase, Cloudera, MapR, Avro, Spark, and Scala.
- Extensively worked on major components of the Hadoop ecosystem such as HDFS, HBase, Hive, Sqoop, and Pig.
- Developed scripts and batch jobs to schedule Hadoop programs.
- Experience in analysing data using HiveQL, and custom programs in Java.
- Hands-on experience in importing and exporting data between databases such as Oracle and MySQL and HDFS and Hive using Sqoop.
- Implemented Flume for collecting, aggregating and moving a large number of server logs and streaming data to HDFS.
- Hands-on experience with Spark, Scala, and MarkLogic.
- Extensively used design patterns to solve complex problems.
- Developed Hive queries for data analysis to meet the business requirements.
- Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs.
- Experienced in implementing security mechanisms for Hive data.
- Extensively used ETL methodology for performing Data Profiling, Data Migration, Extraction, Transformation and loading using Talend/SSIS and designed data conversions from a wide variety of source systems.
- Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tMysqlSCD, tFilter, tGlobalmap, etc.
- Experience with Hive Queries Performance Tuning.
- Experienced in improving data cleansing process using Pig Latin operations, transformations and join operations.
- Extensive knowledge of NoSQL databases such as HBase.
- Experienced in performing CRUD operations using the HBase Java client API and REST API.
- Experience in designing both time-driven and data-driven automated workflows using Bedrock and Talend.
- Good knowledge of creating PL/SQL stored procedures, indexes, packages, functions, triggers, and cursors with Oracle (9i, 10g, 11g) and MySQL server.
- Expert in designing and writing on-demand UNIX shell scripts.
- Extensively worked with Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and Core Java design patterns.
- Excellent Java development skills using J2EE frameworks such as Struts, EJBs, and Web Services.
- Proficient in development methodologies such as Scrum, Agile, and Waterfall.
- Passion to excel in any assignment, with good debugging and problem-solving skills.
- Ability to work under high pressure and tight deadlines.
- Excellent adaptability, ability to learn, and good analytical and programming skills.
Hadoop/Big Data Technologies: HDFS, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro, and BigQuery.
Languages: Core Java, XML, HTML and HiveQL.
J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, Dojo, JSON and BlazeDS.
Frameworks: Spring 2, Struts 2 and Hibernate 3.
Application & Web Servers: WebSphere 6.0, JBoss 4.X and Tomcat 5.
Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7 and Scala.
Database (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase and MongoDB 2.2.
IDE: Eclipse and EditPlus.
PM Tools: MS MPP, Risk Management, ESA.
Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile Methodology.
EAI Tools: TIBCO 5.6.
Bug Tracking/Ticketing: Mercury Quality Center and ServiceNow.
Operating System: Windows 98/2000, Linux/Unix and Mac.
Confidential, Fort Collins, CO.
Hadoop/Big Data Developer
- Translated functional and technical requirements into detailed architecture and design.
- Developed automated scripts for all jobs, starting with pulling the data from mainframes to HDFS.
- Designed and developed ETL applications and automated them using Oozie workflows and shell scripts with error handling and a mailing system.
- Implemented a nine-node CDH4 Hadoop cluster on Ubuntu Linux.
- Implemented programs that join data sets from different sources.
- Optimized MapReduce programs by tuning configuration parameters and implementing optimized joins.
- Implemented MapReduce solutions such as Top-K, summarizations, and data partitioning using design patterns.
- Implemented MapReduce programs to handle different file formats such as XML, Avro, and sequence files, and implemented compression techniques.
- Developed Hive queries according to business requirements.
- Designed and created Hive internal tables and partitions to store structured data.
- Developed custom Hive UDFs to incorporate business logic into Hive queries.
- Used Hive-optimized file formats such as ORC and Parquet.
- Implemented Hive serializers/deserializers (SerDes) to handle Avro files and used XPath expressions to handle XML files.
- Imported and exported data from RDBMS through Sqoop.
- Designed Cassandra data models for near-real-time analysis.
- Configured Cassandra clusters, vnodes, and replication strategies, and configured the data model using DataStax Community.
- Ensured NFS was configured for the NameNode.
- Designed and implemented time-series data analysis using the Cassandra file system.
- Implemented CRUD operations on top of Cassandra data using CQL and REST API.
- Implemented import and export of structured data using Sqoop import/export options.
- Implemented Sqoop saved jobs and incremental imports to import data.
- Used compression (Snappy) with file formats to make efficient use of storage in HDFS.
- Used Cloudera Manager to monitor the cluster, debug jobs, and handle job submission on the cluster.
- Successfully migrated a legacy application to a Big Data application using Hive and Pig in production.
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Cloudera, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, and ZooKeeper.
Confidential, Boston, MA.
Hadoop/Big Data Developer
- Involved in all phases of development activities from requirements collection to production support.
- Developed a detailed understanding of the current system and identified the different sources of data for EMR.
- Involved in cluster setup.
- Performed Batch processing of logs from various data sources using
- Automated Cloudera job submission via Jenkins scripts and Chef.
- Predictive analytics (which can monitor inventory levels and ensure product availability)
- Analysis of customers' purchasing behaviours in JMS.
- Response to value-added services based on clients' profiles and purchasing habits
- Worked on gathering and refining requirements, interviewing business users to understand and document data requirements including elements, entities, and relationships, in addition to visualization and report specifications.
- Defined UDFs using Pig and Hive in order to capture customer behaviour.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, Spark SQL, Apache Pig, and Oozie.
- Integrated Apache Kafka for data ingestion.
- Using Scala, created Hive external tables on the output before applying partitioning and bucketing on top of them.
- Provided graphs, generated via shell scripting, to show trends.
- Maintained data import scripts using HBase, Hive, and jobs.
- Developed and maintained several batch jobs that run automatically depending on business requirements.
- Imported and exported data between environments such as MySQL and HDFS; performed unit testing and deployment for internal usage; monitored performance of the solution.
Environment: Apache Hadoop, Cloudera Distribution, RHEL, Hive, HBase, Pig, HDFS, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, and Teradata.
Confidential, Provo, Utah
- Developed a detailed understanding of the existing build system and of the related tools for information on products, releases, and test results.
- Designed and implemented jobs to support distributed processing using Java, Hive, and Apache Pig.
- Developed UDFs to provide custom Hive and Pig capabilities using SOAP/RESTful services.
- Built a mechanism with Talend for automatically moving the existing proprietary binary-format data files to HDFS using a service called Ingestion service.
- Implemented a prototype to integrate PDF documents into a web application using GitHub.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, Scrum, and data manipulation.
- Performed data transformations in Scala and Hive, and used partitions and buckets for performance improvements.
- Wrote custom InputFormat and RecordReader classes for reading and processing the binary format in.
- Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples.
- Implemented a custom file loader for Pig to query large data files, such as build logs, directly.
- Used Python for pattern matching in build logs to format errors and warnings.
- Developed Pig Latin scripts and shell scripts for validating the different query modes in Historian.
- Created Hive external tables on the output before partitioning and bucketing were applied on top of them.
- Improved performance by tuning Hive and MapReduce, using Scala, Talend, ActiveMQ, and JBoss.
- Developed a daily test engine using Python for continuous tests.
- Used shell scripting for Jenkins job automation with Talend.
- Built a custom calculation engine that can be programmed according to user needs.
- Ingested data into Hadoop using shell scripting for Scrum, Elastic, and Sqoop, and applied data transformations using Pig and Hive.
- Handled performance improvement changes to the Pre-Ingestion service, which is responsible for generating the Big Data format binary files from an older version of Historian.
- Worked with support teams and resolved operational and performance issues.
- Researched, evaluated, and utilized Scrum and new technologies, tools, and frameworks around the Hadoop ecosystem.
- Prepared graphs from test results posted to MIA.
Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, and CDH4.X.
Confidential, North Bergen, New Jersey.
- Analysed the Hadoop cluster using Big Data analytic tools including Hive and Pig.
- Installed and configured the Hadoop cluster using Cloudera's CDH distribution and monitored cluster performance using Cloudera Manager.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Implemented schedulers on the JobTracker to share cluster resources among submitted jobs.
- Involved in designing and developing the Hive data model, loading it with data, and writing Java UDFs for Hive.
- Handled importing and exporting data into HDFS by developing solutions, analysed the data using Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop to import and export data between RDBMS and the Hadoop Distributed File System (HDFS).
- Created Hive tables and loaded data from HDFS into Hive tables as per the requirements.
- Established custom MapReduce programs to analyse data and used HQL queries for data cleansing.
- Created components such as Hive UDFs for functionality missing in Hive, to analyse and process large volumes of data extracted from the NoSQL database Cassandra.
- Collected and aggregated substantial amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Worked on algorithm optimization using combiners and partitions to deliver the best results, and on application performance optimization for an HDFS cluster.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, Scrum, and data manipulation.
Environment: Cloudera Distribution (CDH), HDFS, Pig, Hive, MapReduce, Sqoop, HBase, Impala, Java, SQL, Cassandra.
- Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Analysis, design, and development of Application based on J2EE using Struts and Tiles, Spring 2.0 and Hibernate 3.0.
- Involved in interacting with the Business Analyst and Architect during the Sprint Planning Sessions.
- Used XML Web Services for transferring data between different applications.
- Used the Apache CXF web service stack for developing web services, and SoapUI and XMLSpy for testing them.
- Used JAXB for binding XML to Java.
- Used SAX and DOM parsers to parse XML data.
- Hibernate was used for Object-Relational mapping with Oracle database.
- Worked with Spring IOC for injecting the beans and reduced the coupling between the classes.
- Implemented Spring IOC (Inversion of Control)/DI (Dependency Injection) for wiring the object dependencies across the application.
- Implemented Spring transaction management for the application's transactions.
- Implemented the Service Locator design pattern.
- Performed unit testing using JUnit 3 and the EasyMock testing framework.
- Worked on PL/SQL stored procedures using PL/SQL Developer.
- Involved in fixing production defects for the application.
- Used ANT as build-tool for building J2EE applications.
Environment: HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, PL/SQL, TOAD, Java.
- Involved in design and implementation of server-side programming.
- Gathered and analysed requirements and prepared high-level documents.
- Participated in all client meetings to understand the requirements.
- Actively involved in design and data modelling using the Rational Rose tool (UML).
- Involved in the design of the SPACE database.
- Implemented the user interface using the Spring Tiles framework.
- Fetched case details from the Tuxedo server using web services technology (i.e. binding, finding a service, and use of the XML message format).
- Involved in integrating the system with BT's systems such as GTC and CSS, or through the eLink hub and IBM MQSeries.
- Developed, deployed, and tested JSPs and Servlets in WebLogic.
- Used Eclipse as the IDE, integrated WebLogic with Eclipse to develop and deploy the applications, and used JDBC to connect to the database.
Environment: Struts Framework, Java 1.3, XML, Data Modelling, JDBC, SQL, PL/SQL, JMS, Web Services, SOAP, Solaris 9, Ant, Toad, Eclipse.