- Around 9 years of professional IT experience in the analysis, design, development, deployment, and maintenance of critical software and big data applications.
- 5 years of hands-on experience with Hadoop, including extensive work with Big Data technologies.
- Hands-on experience developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, Hive, Pig, HBase, Flume, YARN, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, Cassandra, and ZooKeeper.
- Hands-on experience with multiple distributions, including Cloudera, Hortonworks, and MapR.
- Experience installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting the systems application architecture.
- Experience automating Sqoop, Hive, and Pig scripts with the Oozie workflow scheduler.
- Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
- Good experience optimizing MapReduce jobs with mappers, reducers, combiners, and partitioners (a minimal sketch follows this summary).
- Good experience in Python and shell scripting.
- Extensive experience developing Pig Latin scripts and using the Hive Query Language (HiveQL) for data analytics.
- Experience with file formats such as CSV, SequenceFile, Avro, RC, ORC, JSON, and Parquet, and with compression techniques such as LZO, Gzip, Bzip2, and Snappy.
- Experience with big data ingestion tools such as Sqoop, Flume, and Apache Kafka.
- Experience using Flume and Kafka to load log data from multiple sources into HDFS.
- Hands-on experience with NoSQL databases such as HBase, MongoDB, and Cassandra.
- Experience retrieving data from databases such as MySQL, Teradata, Informix, DB2, and Oracle into HDFS using Sqoop, and ingesting it into HBase and Cassandra.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall.
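For illustration, a minimal MapReduce sketch in the spirit of the batch-processing and combiner optimization bullets above. This is a hedged example, not code from any specific project: the word-count logic, class names, and HDFS paths are illustrative assumptions. The reducer doubles as the combiner, which is the standard way to shrink map output before the shuffle.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: tokenizes each line and emits (word, 1).
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          ctx.write(word, ONE);
        }
      }
    }
  }

  // Reducer: sums counts per word; also registered as the combiner below
  // so partial sums are computed map-side, cutting shuffle volume.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word-count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class); // the combiner optimization
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```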
Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Impala, Oozie, ZooKeeper, Apache Spark, Kafka, Scala, MongoDB, Cassandra.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, and MapR.
NoSQL Databases: Cassandra, MongoDB, HBase, CouchDB.
Java: J2EE, JSP, CSS, jQuery, Servlets, HTML, JavaScript.
Mainframe: JCL, COBOL, CICS, DB2.
Databases: MySQL, Oracle, DB2 for mainframes, Teradata, Informix.
Operating Systems: Windows, Unix, Linux.
Other Tools: PuTTY, WinSCP, FileZilla, Streamweaver, Compuset.
Frameworks: Struts, Spring, Hibernate.
App/Web Servers: WebSphere, WebLogic, JBoss, Tomcat.
SR HADOOP DEVELOPER
Confidential, Kansas City, MO
- Responsible for building scalable distributed data solutions using Hadoop.
- Optimized Hive queries and ran Hive on top of the Spark engine.
- Developed Kafka producers and consumers, Cassandra clients, and Spark components on HDFS and Hive.
- Populated HDFS and HBase with large volumes of data using Apache Kafka.
- Used Kafka to ingest data into the Spark engine (see the streaming sketch after this section).
- Hands-on experience with Spark and Spark Streaming: creating RDDs and applying transformations and actions.
- Developed Spark applications in Scala to ease the transition from existing Hadoop jobs.
- Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
- Built data pipelines using Kafka and Akka to handle terabytes of data.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Worked on indexes, scalability, and query-language support in Cassandra.
- Created Sqoop scripts to import data from different data sources into Hive and Cassandra.
- Used Hue to run Hive queries, and created day-wise Hive partitions to improve performance.
- Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Developed a NiFi data flow to pull data from a REST API, with context configuration enabled.
- Wrote Python and shell scripts for various deployment and automation processes.
- Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
Environment: Apache Spark, Kafka, MapReduce, Cassandra, YARN, Sqoop, Oozie, HDFS, Hive, Pig, Java, Cloudera Hadoop distribution 5.4/5.5, Linux, XML, Eclipse, MySQL.
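Below is a minimal sketch of the Kafka-to-Spark ingestion described in this section, using the spark-streaming-kafka-0-10 direct stream API. It is a hedged illustration, not the project's actual code: the broker address, topic name, group id, and HDFS output path are placeholder assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaSparkIngest {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("kafka-ingest");
    // 30-second micro-batches
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder broker
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "ingest-group");          // placeholder group id
    kafkaParams.put("auto.offset.reset", "latest");

    // Direct stream: one Kafka partition maps to one Spark partition.
    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(
                Collections.singletonList("events"), kafkaParams)); // placeholder topic

    // Write each non-empty micro-batch to HDFS as text for downstream Hive tables.
    stream.map(ConsumerRecord::value)
        .foreachRDD(rdd -> {
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile("hdfs:///data/events/batch-" + System.currentTimeMillis());
          }
        });

    jssc.start();
    jssc.awaitTermination();
  }
}
```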
SR HADOOP DEVELOPER
Confidential, Chicago, Illinois
- Implemented Storm topologies to perform cleansing operations before moving data into Cassandra.
- Developed ETL jobs to load data from various sources, such as mainframes and flat files, into a data warehouse.
- Configured Hive, Pig, Impala, Sqoop, Flume, and Oozie on Cloudera (CDH5).
- Experience using Sqoop to import data into Cassandra tables from different relational databases, and importing data from various sources into the Cassandra cluster using Java APIs.
- Created Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from Linux systems, NoSQL stores, and a variety of portfolios.
- Involved in creating data models for customer data using the Cassandra Query Language (CQL).
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed wrapper and utility automation scripts in Python.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Wrote scripts to automate application deployments and configuration, monitoring YARN.
- Wrote MapReduce programs in Python using the Hadoop Streaming API.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Migrated ETL processes from Microsoft SQL Server to Hadoop, using Pig as the data pipeline for easier data manipulation.
- Developed Spark jobs in Scala in the test environment for faster data processing, and used Spark SQL for querying.
- Involved in importing data from Oracle tables into HDFS and HBase tables using Sqoop.
- Developed scripts that load data into Spark RDDs and perform in-memory computation to generate the output.
- Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes batches of data to Spark for real-time processing.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala (see the Spark SQL sketch after this section).
- Experience with Elasticsearch technologies and creating custom Solr query components.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
- Worked with data sources such as Oracle, Netezza, MySQL, and flat files.
- Extensively used Sqoop to pull data from RDBMS sources such as Teradata and Netezza.
- Developed Talend jobs to move inbound files to HDFS locations based on monthly, weekly, daily, and hourly partitioning.
Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Hue, Oozie, Java, Eclipse, ZooKeeper, Cassandra, HBase, Talend, GitHub.
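A minimal sketch of running an existing HiveQL query on the Spark engine, in the spirit of the query-conversion bullet above. It is illustrative only: the table name, columns, and output path are assumptions, not details from the project.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToSpark {
  public static void main(String[] args) {
    // enableHiveSupport() lets existing HiveQL and the Hive metastore
    // be used unchanged from Spark.
    SparkSession spark = SparkSession.builder()
        .appName("hive-to-spark")
        .enableHiveSupport()
        .getOrCreate();

    // The same aggregate that previously ran as a Hive MapReduce job;
    // table and column names are illustrative.
    Dataset<Row> daily = spark.sql(
        "SELECT txn_date, COUNT(*) AS txn_count "
            + "FROM transactions GROUP BY txn_date");

    // Persist the result for downstream consumers; path is a placeholder.
    daily.write().mode("overwrite").parquet("hdfs:///warehouse/txn_daily");
    spark.stop();
  }
}
```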
Confidential, Milwaukee, Wisconsin
- Analyzed the Hadoop stack and various big data analytics tools, including Pig, Hive, the HBase database, and Sqoop.
- Experienced in implementing the Hortonworks distribution (HDP 2.1, 2.2, and 2.3).
- Developed MapReduce programs for refined queries on big data.
- Experienced in working with Amazon Elastic MapReduce (EMR).
- Created Hive tables and worked with them for data analysis to meet the requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Worked with the business team to create Hive queries for ad hoc access.
- In-depth understanding of the classic MapReduce and YARN architectures.
- Implemented Hive generic UDFs to encode business logic (a UDF sketch follows this section).
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig for ETL jobs.
- Developed Pig UDFs to preprocess the data for analysis.
- Analyzed the data using Hive queries, Pig scripts, Spark SQL, and Spark Streaming.
- Developed Spark code in Scala with Spark SQL/Streaming for faster testing and processing of data.
- Used Apache NiFi to copy data from the local file system to HDFS.
- Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extracted data from Cassandra through Sqoop and placed it in HDFS for further processing.
- Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS sources.
- Involved in continuous monitoring of operations using Storm.
- Developed an Oozie workflow to automate loading data into HDFS and preprocessing it with Pig.
- Implemented indexing of logs from Oozie into Elasticsearch.
- Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Apache Storm, Oozie, SQL, Flume, Spark, HBase, Cassandra, Informatica, Java, GitHub.
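A minimal sketch of a Hive UDF carrying business logic, as referenced above. For brevity it extends the simple UDF base class rather than GenericUDF, and the masking rule, class name, and function name are hypothetical, not from the project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Masks all but the last four characters of an account number
// (an illustrative business rule, not the project's actual logic).
public final class MaskAccount extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null; // propagate SQL NULL
    }
    String s = input.toString();
    if (s.length() <= 4) {
      return input; // too short to mask
    }
    return new Text("****" + s.substring(s.length() - 4));
  }
}
```

After packaging into a JAR, such a function would be registered in a Hive session with `ADD JAR` and `CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccount';`, then used like any built-in function in queries.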
Confidential, Chicago, IL
- Analyzed data using the Hadoop components Hive and Pig.
- Experienced in development on the Cloudera distribution.
- Worked hands-on with the ETL process.
- Developed Hadoop Streaming jobs to ingest large amounts of data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Involved in POCs comparing the performance of Spark SQL with Hive.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Created subqueries for filtering and faster execution, and created multiple join tables to fetch the required data.
- Worked in an AWS environment to develop and deploy custom Hadoop applications.
- Installed and set up HBase and Impala.
- Used Apache Impala to read, write, and query Hadoop data in HDFS, HBase, and Cassandra.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Supported MapReduce programs running on the cluster.
- Developed ETL test scripts based on technical specifications, data design documents, and source-to-target mappings.
- Configured Talend ETL in single- and multi-server environments.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Bulk-loaded data into Oracle using Spring's JdbcTemplate (a batching sketch follows this section).
- Created Groovy scripts to load CSV files into Oracle tables.
Environment: Cloudera, HDFS, Pig, Hive, MapReduce, Python, Sqoop, Storm, Kafka, Linux, HBase, Impala, Java, SQL, Cassandra, MongoDB, SVN.
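A minimal sketch of the JdbcTemplate bulk load into Oracle mentioned above. The table name, columns, and DataSource wiring are illustrative assumptions; the batching call itself is the standard Spring JDBC API.

```java
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class OracleBulkLoader {
  private final JdbcTemplate jdbc;

  // DataSource wiring (connection pool, Oracle driver) is assumed
  // to be configured elsewhere, e.g. in the Spring context.
  public OracleBulkLoader(DataSource dataSource) {
    this.jdbc = new JdbcTemplate(dataSource);
  }

  // Sends all rows (e.g. parsed from CSV) as a single JDBC batch,
  // which is far faster than row-by-row inserts over the network.
  public void load(List<Object[]> rows) {
    jdbc.batchUpdate(
        "INSERT INTO staging_events (event_id, payload) VALUES (?, ?)", // illustrative table
        rows);
  }
}
```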
Confidential, Boston, Massachusetts
- Developed JSPs, JSF pages, and servlets to dynamically generate HTML and display data on the client side.
- Used the Hibernate framework for persistence to an Oracle database.
- Wrote and debugged the Ant scripts for building the entire web application.
- Developed web services in Java; experienced with SOAP and WSDL, and used WSDL to publish the services to another application.
- Implemented Java Message Service (JMS) messaging using the JMS API.
- Involved in managing and reviewing Hadoop log files.
- Installed and configured Hadoop, YARN, MapReduce, Flume, and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
- Coded Hadoop MapReduce jobs for energy generation and PS.
- Coded servlets, SOAP clients, and Apache CXF REST APIs to deliver data from our application to internal and external consumers.
- Worked on the Cloudera distribution for running Hadoop jobs.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, Pig, Solr, and Splunk.
- Created a SOAP web service using JAX-WS to enable clients to consume it (a JAX-WS sketch follows this section).
- Experience importing and exporting data with Sqoop between HDFS and relational database management systems (RDBMS).
- Experienced in designing and developing scalable multi-tier applications using Java and J2EE design patterns.
Environment: MapR, Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.
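A minimal JAX-WS sketch in the spirit of the SOAP service above. The service name, operation, and publish URL are hypothetical; JAX-WS generates the WSDL from the annotated class.

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

// Illustrative SOAP endpoint; @WebService exposes the class,
// @WebMethod exposes the operation in the generated WSDL.
@WebService
public class GreetingService {

  @WebMethod
  public String greet(String name) {
    return "Hello, " + name;
  }

  public static void main(String[] args) {
    // Publishes the service; the WSDL is served at
    // http://localhost:8080/greet?wsdl (placeholder address).
    Endpoint.publish("http://localhost:8080/greet", new GreetingService());
  }
}
```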
- Involved in projects using Java and Java EE web applications to create fully integrated client management systems.
- Participated in user requirement sessions to analyze and gather business requirements.
- Developed the user-facing site in Perl, back-end admin sites in Python, and big data components in core Java.
- Involved in developing the application using Spring Web MVC and other components of the Spring Framework.
- Elaborated use cases based on business requirements and was responsible for creating class diagrams and sequence diagrams.
- Implemented object-relational mapping in the persistence layer using the Hibernate ORM framework.
- Implemented REST web services with the Jersey API to handle customer requests (a Jersey sketch follows this section).
- Experienced in developing RESTful web services, both consumed and produced.
- Used Hibernate for database connectivity and the Hibernate Query Language (HQL) to add and retrieve information from the database.
- Implemented Spring JDBC for connecting to the Oracle database.
- Designed the application using the MVC framework for easy maintainability.
- Provided bug fixes and testing for existing web applications.
- Involved in the full system life cycle, responsible for developing, testing, and implementing.
- Involved in unit testing, integration testing, and system testing.
- Implemented form beans and their validations.
- Wrote Hibernate components.
- Developed client-side validations with JavaScript.
Environment: Spring, JSP, Servlets, REST, Oracle, AJAX, JavaScript, jQuery, Hibernate, WebLogic, Log4j, HTML, XML, CVS, Eclipse, SOAP Web Services, XSLT, XSD, UNIX, Maven, Mockito, JUnit, Jenkins, shell scripting, MVS, ISPF.
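A minimal Jersey (JAX-RS) sketch in the spirit of the customer-request services above. The resource path, payload, and class name are illustrative placeholders, not the project's actual API.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Illustrative JAX-RS resource served by Jersey; mapped to GET /customers/{id}.
@Path("/customers/{id}")
public class CustomerResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public String get(@PathParam("id") long id) {
    // A real service would look the customer up and serialize it;
    // here the JSON body is a placeholder.
    return "{\"id\": " + id + "}";
  }
}
```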