- 8 years of professional IT work experience in Analysis, Design, Development, Deployment and Maintenance of critical software and big data applications.
- 4+ years of hands-on experience with the Hadoop ecosystem, including extensive experience with Big Data technologies.
- Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, ZooKeeper and Cassandra.
- Hands-on experience with multiple distributions, including Cloudera, Hortonworks and MapR.
- Experience in installing, configuring, supporting and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Used Pig and Python scripting for data preprocessing.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting the systems' application architecture.
- Experience in automating Sqoop, Hive and Pig scripts using the Oozie workflow scheduler.
- Hands-on experience in application development using Java, Scala and Linux shell scripting.
- Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
- Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners.
- Very good experience in Python and shell scripting.
- Extensive experience in developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
- Experience in optimizing Hive queries and joins, and in handling different data files with custom SerDes.
- Experience in using different file formats such as CSV, SequenceFile, Avro, RCFile, ORC, JSON and Parquet, and different compression codecs such as LZO, Gzip, Bzip2 and Snappy.
- Experience with big data ingestion tools such as Sqoop, Flume and Apache Kafka.
- Experience in using Flume and Kafka to load log data from multiple sources into HDFS.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, with jobs written in Scala.
- Hands-on experience with NoSQL databases such as HBase, MongoDB and Cassandra.
- Experience in retrieving data from databases such as MySQL, Teradata, Informix, DB2 and Oracle into HDFS using Sqoop and ingesting it into HBase and Cassandra.
- Hands-on experience in setting up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Involved in the design and development of various web and enterprise applications using technologies such as JSP, Servlets, Struts, Hibernate, Spring, JDBC, JSF, XML, JavaScript, HTML, AJAX, SOAP and Amazon Web Services.
- Worked on back-end database programming using SQL, PL/SQL, stored procedures, functions, macros, indexes, joins, views, packages and database triggers.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall.
- Effective leader with strong skills in strategy, business development, client management and project management.
Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, MongoDB, Cassandra, Kafka.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks and MapR.
NoSQL Databases: Cassandra, MongoDB, HBase, DynamoDB, CouchDB.
Java: J2EE, JSP, CSS, jQuery, Servlets, HTML, JavaScript
Mainframe: JCL, COBOL, CICS, DB2.
Databases: MySQL, Oracle, DB2 for Mainframes, Teradata, Informix.
Operating Systems: Windows, Unix, Linux
Other Tools: PuTTY, WinSCP, FileZilla, EDI (Gentran), Streamweaver, Compuset.
Frameworks: Struts, Spring, Hibernate.
App/Web servers: WebSphere, WebLogic, JBoss, Tomcat
Confidential - NORTH CAROLINA.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed ETL jobs to load data from various sources, such as mainframes and flat files, into a data warehouse.
- Implemented Storm topologies to perform cleansing operations before moving data into Cassandra.
- Configured Hive, Pig, Impala, Sqoop, Flume and Oozie in CDH5.
- Used Sqoop to import data into Cassandra tables from different relational databases, and imported data from various sources into the Cassandra cluster using the Java APIs.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
- Involved in creating data models for customer data using Cassandra Query Language (CQL).
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce code to convert semi-structured data into structured data and to insert data from HDFS into HBase.
- Developed JUnit tests for the MapReduce jobs and also performed testing using small sample data sets.
- Developed wrapper and utility automation scripts in Python.
- Developed the user-facing site in Perl, back-end admin sites in Python and big data components in core Java.
- Developed Perl and PowerShell scripts for automation.
- Interacted with business users to gather business requirements.
- Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.
- Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
- Wrote scripts to automate application deployments and configuration, and to monitor YARN.
- Developed custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
- Wrote MapReduce programs in Python with the Hadoop Streaming API.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in creating Hive tables and loading them with data and writing Hive queries.
- Involved in importing data from Oracle tables into HDFS and HBase tables using Sqoop.
- Streamed data in real time using Spark with Kafka for faster processing.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Converted Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
- Involved in POCs comparing the performance of Spark SQL with Hive.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
- Extensively used components such as tWaitForFile, tIterateToFlow, tFlowToIterate, tHashOutput, tHashInput, tMap, tRunJob, tJava, tNormalize and tFile components to create Talend jobs.
- Developed Talend jobs to move inbound files to HDFS locations based on monthly, weekly, daily and hourly partitioning.
Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Oozie, Java, Eclipse, ZooKeeper, Cassandra, HBase, Talend, GitHub.
Confidential - PLANO, TEXAS.
- Analyzed the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database and Sqoop.
- Knowledge of implementing the Hortonworks distribution (HDP 2.1, HDP 2.2 and HDP 2.3).
- Developed MapReduce programs for refined queries on big data.
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Developed a framework to handle loading and transforming large sets of unstructured data from UNIX systems into Hive tables.
- Worked with business team in creating Hive queries for ad hoc access.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- In depth understanding of Classic MapReduce and YARN architectures.
- Implemented Hive GenericUDFs to encode business logic.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig for ETL jobs.
- Developed Pig UDFs to preprocess the data for analysis.
- Designed the high-level ETL architecture for overall data transfer from OLTP to OLAP.
- Analyzed the data by running Hive queries, Pig scripts, Spark SQL and Spark Streaming.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Worked on developing internal testing tools written in Python.
- Developed a script that loads data into a Spark RDD and performs in-memory computation to generate the output.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
- Created a generic Sqoop import script for loading data into Hive tables from RDBMS.
- Involved in continuous monitoring of operations using Storm.
- Involved in designing and developing Kafka- and Storm-based data pipelines with the infrastructure team.
- Involved in managing and scheduling jobs on a Hadoop cluster using Oozie.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and preprocessing it with Pig.
- Supported setting up the QA environment and updating configurations for implementing scripts.
- Designed source-to-target mappings from SQL Server and Excel/flat files to Oracle using Informatica PowerCenter.
Environment: Hortonworks, Hadoop, MapReduce, HDFS, Ambari, Hive, Pig, Sqoop, Apache Kafka, Apache Storm, Oozie, SQL, Flume, Spark, HBase, Cassandra, Informatica, Java, GitHub.
Confidential - MILWAUKEE, WISCONSIN
- Analyzed data using the Hadoop components Hive and Pig.
- Experienced in development using the Cloudera distribution.
- Worked hands-on with the ETL process.
- Developed Hadoop Streaming jobs to ingest large amounts of data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Worked with application teams to install operating systems, Hadoop updates, patches and version upgrades as required.
- Involved in loading data from the UNIX file system into HDFS. Experience working with Hadoop clusters using the Cloudera (CDH5) distribution.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Created subqueries for filtering and faster execution. Created multiple join tables and fetched the required data.
- Used Python libraries such as Beautiful Soup, NumPy and SQLAlchemy.
- Worked with the Python OpenStack APIs and used NumPy for numerical analysis.
- Installed and set up HBase and Impala.
- Used Apache Impala to read, write and query Hadoop data in HDFS, HBase and Cassandra.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Supported MapReduce programs running on the cluster.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Bulk loaded data into Oracle using JDBC templates.
Environment: Cloudera, HDFS, Pig, Hive, MapReduce, Python, Sqoop, Storm, Kafka, Linux, HBase, Impala, Java, SQL, Cassandra, MongoDB, SVN.
Confidential - Boston, Massachusetts
- Developed JSPs, JSF pages and servlets to dynamically generate HTML and display data on the client side.
- Used the Hibernate framework for persistence to the Oracle database.
- Wrote and debugged the Ant scripts for building the entire web application.
- Developed web services in Java; experienced with SOAP and WSDL, and used WSDL to publish the services to other applications.
- Developed Perl packages and scripts to access MS SQL Server 2008 databases and to create PDF files and Excel reports for clients.
- Troubleshot existing Perl programs.
- Experience in developing XML, XSD, XSL, XSLT, JSON and JAXB components, and FreeMarker templates for XML processing.
- Implemented Java Message Service (JMS) messaging using the JMS API.
- Involved in managing and reviewing Hadoop log files.
- Installed and configured Hadoop, YARN, MapReduce, Flume and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
- Coded Hadoop MapReduce jobs for energy generation and PS.
- Coded servlets, a SOAP client and Apache CXF REST APIs for delivering data from our application to external and internal consumers.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, Solr and Splunk.
- Created a SOAP web service using JAX-WS to enable clients to consume it.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems (RDBMS) and vice versa.
- Experienced in designing and developing multi-tier scalable applications using Java and J2EE design patterns.
- Designed data models in Cassandra and worked with Cassandra Query Language (CQL).
Environment: MapR, Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.
- Involved in projects using Java and Java EE web applications to create fully integrated client management systems.
- Developed the UI using HTML, JavaScript and JSP, and developed business logic and interfacing components using Business Objects, JDBC and XML.
- Participated in user requirement sessions to analyze and gather business requirements.
- Developed the user-facing site in Perl, back-end admin sites in Python and big data components in core Java.
- Involved in developing the application using Spring Web MVC and other components of the Spring framework. Also implemented dependency injection using the Spring framework.
- Elaborated use cases based on business requirements and was responsible for creating class diagrams and sequence diagrams.
- Implemented object-relational mapping in the persistence layer using the Hibernate (ORM) framework.
- Implemented REST web services with the Jersey API to handle customer requests.
- Experienced in developing RESTful web services, both consuming and producing them.
- Used Hibernate for the database connection and Hibernate Query Language (HQL) to add and retrieve information from the database.
- Implemented Spring JDBC for connecting to the Oracle database.
- Designed the application using the MVC framework for easy maintainability.
- Provided bug fixing and testing for existing web applications.
- Involved in the full system life cycle and responsible for development, testing and implementation.
- Involved in unit testing, integration testing and system testing.
Environment: Spring, JSP, Servlets, REST, Oracle, AJAX, JavaScript, jQuery, Hibernate, WebLogic, Log4j, HTML, XML, CVS, Eclipse, SOAP Web Services, XSLT, XSD, UNIX, Maven, Mockito, JUnit, Jenkins, shell scripting, MVS, ISPF.