- Over 8 years of experience in software development, with experience in phases of Hadoop and HDFS development.
- Professional experience in the IT industry with 4 years in development, implementation and configuration of large-scale Hadoop ecosystem components like Hive, HBase, Pig, Sqoop, Flume, Oozie, Kafka and Apache Spark in a Linux environment.
- Hands on experience with the Hadoop Stack (MapReduce, Hive, HDFS, Sqoop, Pig, HBase, Flume, Oozie and Zookeeper, Apache Solr, Apache Storm, Kafka, YARN), cluster architecture and monitoring the cluster.
- Experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.
- Experience in Development and maintenance of various applications using Java, J2EE.
- Experience in using Hadoop in standalone, pseudo-distributed and fully distributed modes.
- Performed data transfer between HDFS and relational database systems (MySQL, SQL Server, Oracle and DB2) using Sqoop.
- Worked on Hadoop 2.0 architecture.
- Used the Oozie job scheduler to schedule MapReduce, Hive and Pig jobs; experienced in automating job execution.
- Experience with NoSQL databases like HBase and fair knowledge in MongoDB and Cassandra.
- Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions.
- Experience in working with different relational databases like MySQL, SQL Server, Oracle and DB2.
- Strong experience in database design, writing complex SQL Queries.
- Used derived queries and OLAP functions for breaking up complex queries into simpler queries.
- Expertise in development of multi-tiered, web-based enterprise applications using J2EE technologies like Servlets, JSP, JDBC and Hibernate.
- Extensive coding experience in Java and Mainframes - COBOL, CICS and JCL.
- Experience in development methodologies such as Agile, Scrum, BDD, Continuous Integration and Waterfall.
- Proficient in software documentation and technical report writing.
- Experience in working with Onsite-Offshore model.
- Developed various UDFs in MapReduce and Python for Pig and Hive.
- Decent experience and knowledge in other SQL and NoSQL databases like MySQL, MS SQL, MongoDB, HBase, Accumulo, Neo4j and Cassandra.
- Good Data Warehouse experience in MSSQL.
- Good knowledge and firm understanding of J2EE frontend/backend, SQL and database concepts.
- Good experience in Linux, UNIX, Windows and MacOS environment.
- Used various development tools like Eclipse, GIT, Android Studio and Subversion.
- Knowledge of Cloudera, Hortonworks and MapR Hadoop distribution components and their custom packages.
- Experience in working with Flume/Kafka to load the log data from different sources into HDFS.
- Experience in configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Experience in designing both time-driven and data-driven automated workflows using Oozie and Python.
- Experienced in using Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Impala, Oozie, Zookeeper, Apache Spark, Kafka, Scala, MongoDB, Cassandra.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks and MapR.
NoSQL Databases: Cassandra, MongoDB, HBase, CouchDB.
J2EE Technologies: J2EE, JSP, CSS, jQuery, Servlets, HTML, EJB, JMS, JNDI, LDAP, JPA, JDBC, Annotations.
Mainframe: JCL, COBOL, CICS, DB2.
Databases: MySQL, Oracle, DB2 for Mainframes, Teradata, Informix, MongoDB, Cassandra.
Operating Systems: Windows, Linux, UNIX and CentOS.
Other Tools: PuTTY, WinSCP, FileZilla, Stream weaver, Compuset.
Frameworks: Struts, Spring, Hibernate.
App/Web servers: WebSphere, WebLogic, JBoss, Tomcat.
- Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster; created Hive tables to store the processed results in a tabular format.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
- Developed Sqoop scripts to transfer data between Hive and the Vertica database.
- Processed data into HDFS by developing solutions and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
- Built servers using AWS: importing volumes, launching EC2 instances, creating security groups, auto-scaling, load balancers, Route 53, SES and SNS in the defined virtual private cloud (VPC).
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analysed them by running Hive queries and Pig scripts.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Developed Spark code by using Scala and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to extract data from data files and load it into HDFS.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Worked on S3 buckets on AWS to store Cloud Formation Templates and worked on AWS to create EC2 instances.
- Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, package and MySQL.
- Optimized Hive tables using techniques like partitioning and bucketing to provide better performance.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Implemented Hadoop on AWS EC2 using a few instances to gather and analyse data log files.
- Worked with Spark and Spark Streaming, creating RDDs and applying operations (transformations and actions).
- Created partitioned tables and loaded data using both static partition and dynamic partition method.
- Developed custom Apache Spark programs in Scala to analyse and transform unstructured data.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle into HDFS using Sqoop.
- Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability and durability.
- Followed a Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, cluster monitoring and troubleshooting; managed and reviewed data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
- Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
- Improved performance by tuning Hive and MapReduce.
- Researched, evaluated and utilized modern technologies, tools and frameworks around the Hadoop ecosystem.
Confidential, Alpharetta, GA
- Responsible for building scalable distributed data solutions using Hadoop components
- Used Apache Maven to build and configure the application for the MapReduce jobs
- Developed a custom file system plug-in for Hadoop so that it can access files on Data Platform, allowing Hadoop MapReduce programs, HBase, Pig and Hive to access those files directly.
- Set up and benchmarked Hadoop/HBase clusters for internal use
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Responsible for building scalable distributed data solutions using Hadoop and developing Pig Latin scripts to extract data from web server output files and load it into HDFS
- Experience in NoSQL data stores HBase, Cassandra and MongoDB
- Extracted and loaded data, using Sqoop, into a Data Lake environment (Amazon S3) that was accessed by business users and data scientists
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop
- Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data
- Handled high volumes of data where a group of transactions is collected over a period of time using batch data processing, and applied batch processing in payroll and billing systems
- Performed transformations like event joins, bot traffic filtering and some pre-aggregations using Pig
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce
- Loaded data into HDFS and extracted the data from Oracle into HDFS using Sqoop.
- Analysed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Installed and configured Cloudera Manager for easy management of existing Hadoop cluster
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig
- Responsible for managing and reviewing Hadoop log files. Designed and developed data management using MySQL
- Wrote Python scripts to parse XML documents and load the data into the database
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
Environment: HDFS, Hive, Pig, UNIX, SQL, Java MapReduce, HBase, Sqoop, Oozie, Linux, Data Pipeline, Cloudera Hadoop Distribution, Python, MySQL, Git, MapR-DB
Confidential, Oak Brook, IL
- Involved in the high-level design (HLD) of the cluster, cluster setup and designing the application flow.
- Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing the data in HDFS and analysing the data.
- Experience in handling VSAM files in mainframe to move them to Hadoop using SFTP.
- Used Flume to handle streaming data and load it into the Hadoop cluster.
- Created shell scripts to ingest files from the edge node into HDFS.
- Worked on creating Map Reduce scripts for processing the data.
- Worked extensively with Hive, Sqoop, MapReduce, shell, Pig and Python.
- Used Sqoop to move structured data from MySQL to HDFS, Hive, Pig and HBase.
- Experience in writing Hive join queries.
- Used Pig predefined functions to convert fixed-width files to delimited files.
- Worked with different Big Data file formats such as text, SequenceFile, Avro and Parquet, and with Snappy compression.
- Used Java to read Avro files.
- Developed Hive SQL scripts to perform incremental loads.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Imported and exported Big Data in CDH across the data analytics ecosystem.
- Involved in data migration from one cluster to another cluster.
- Analysed the HBase database and compared it with other open-source NoSQL databases to find which one best suits the current requirement.
- Created Hive tables and partitioned tables using Hive indexes and bucketing to ease data analytics.
- Experience in HBase database manipulation with structured, unstructured and semi-structured types of data.
- Used Oozie to schedule workflows that perform shell and Hive actions.
- Experience in writing business logic for defining DAT and CSV files for MapReduce.
- Experience in managing Hadoop Jobs and logs of all the scripts.
- Experience in writing aggregation logic in different combinations to perform complex data analytics for business needs.
Environment: Hadoop, HDFS, Hive, Pig, Sqoop, MapReduce, Cloudera, NoSQL, HBase, Shell Scripting, Linux.
Confidential, Memphis, TN
- Implemented a CDH3 Hadoop cluster on CentOS.
- Designed and developed a daily process to do incremental import of raw data from Oracle into Hive tables using Sqoop.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Launched and set up a Hadoop cluster, including configuring the different Hadoop components.
- Hands on experience in loading data from UNIX file system to HDFS.
- Provided cluster coordination services through ZooKeeper.
- Worked on designing and developing a module in OAJ to collect and store the user actions performed by the plant engineers in an I/A Series DCS system using C++ and MKS Tools on Windows.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Involved in creating Hive tables, loading data and running Hive queries on that data.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Working knowledge of writing Pig's Load and Store functions.
Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS, Zookeeper, C++, Sqoop, Hive, Pig, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets, Oracle.
- Involved in Design, Developing, Testing, and Support of Application.
- Implemented the user interface in a Model-View-Controller architecture, integrating Spring MVC and JSP.
- Implemented the Dependency Injection (IoC) feature of the Spring Framework to inject beans into the user interface, and AOP for logging.
- Developed reusable and interoperable web service modules based on SOA using SOAP, JAX-RPC and Apache Axis 2.0.
- Used various design patterns like Business Delegate and Session Facade.
- Developed a unit testing framework using JUnit test cases.
- Developed the database access layer using Spring JDBC with a PostgreSQL database.
- Actively participated in code reviews and design discussions.
- Used SoapUI for testing the web service responses.
- Created and published SOAP-based web services using the Apache Axis 2.0 framework. These web services are used to upload documents from the client to the claim handler.
- Implemented the logging mechanism using the log4j framework.
- Used the SVN version control system for source code and project management.
- Involved in requirement analysis and the design phase, with good exposure to UML and OOAD.
- Involved in Requirements gathering, Requirements analysis, Design, Development, Integration and Deployment.
- Extensively used the Spring MVC framework to develop the web layer for the application. Configured DispatcherServlet in web.xml.
- Designed and developed the DAO layer using Spring and Hibernate, including the Criteria API.
- Created/generated Hibernate classes and configured XML apart from managing CRUD operations (insert, update, and delete)
- Involved in writing HQL and SQL Queries for Oracle 10g database.
- Used log4j for logging messages.
- Developed the classes for Unit Testing by using JUnit.
- Developed Business components using Spring Framework and database connections using JDBC.