- 8+ years of professional IT experience in Big Data using the Hadoop framework, covering analysis, design, development, testing, documentation, deployment, integration, and maintenance of web-based and client/server applications using SQL and Big Data technologies.
- Good knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
- Experience in application development using Hadoop and related Big Data technologies such as HBase, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
- Extensive experience in GUI design using JSP, JSF, HMVC Pattern, MVC Architecture, leading to substantial reduction in time and effort.
- Extensive knowledge of Hadoop technology, with experience in data storage, query writing, processing, and analysis.
- Hands-on experience with Microsoft Azure cloud services, Storage Accounts, and Virtual Networks.
- Extensive experience in developing Pig Latin Scripts for transformations and using Hive Query Language for data analytics.
- Skilled at managing hosting plans for Azure infrastructure and at implementing and deploying workloads on Azure virtual machines (VMs).
- Good knowledge of writing MapReduce jobs through Pig, Hive, and Sqoop.
- Extended Pig and Hive core functionality by writing custom User Defined Functions (UDFs) for data analysis and file processing, run from Pig Latin scripts.
- Experience in creating Hive internal/external tables using a shared metastore.
- Wrote Sqoop queries to import data into Hadoop from Teradata and SQL Server.
- Experience working with Apache Sqoop for relational data dumps.
- Knowledge of streaming data to HDFS using Flume.
- Worked on importing data into HBase using HBase Shell.
- Used Apache Oozie for scheduling and managing the Hadoop Jobs.
- Excellent programming skills with experience in Java, C, SQL and Python Programming.
- In depth and extensive knowledge of analyzing data using Hive QL, Pig Latin, HBase and custom Map Reduce programs in Java.
- Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
- Extensive knowledge of RDBMSs such as Oracle, Microsoft SQL Server, and MySQL.
- Extensive experience working on various databases and database script development using SQL and PL/SQL.
- Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience with operating systems: Linux, Red Hat, and UNIX.
- Supported Map Reduce Programs running on the cluster and wrote custom Map Reduce Scripts for Data Processing in Java.
- Experience in developing Spark jobs in Scala in a test environment for faster data processing, using Spark SQL for querying.
- Extensive knowledge in J2EE technologies such as Object Oriented Programming techniques (OOPS), JSP, and JDBC.
- Extensive experience with IDEs such as Eclipse and NetBeans.
Programming Languages: Java, J2EE, C, SQL, PL/SQL, Pig Latin, Scala, HTML, XML.
Hadoop/Big Data: MapReduce, Spark, Spark SQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, Yarn, Oozie, ZooKeeper, Elasticsearch.
RDBMS Languages: MySQL, PL/SQL.
Cloud: Azure, AWS.
NoSQL: MongoDB, HBase, Apache Cassandra.
Tools/IDEs: NetBeans, Eclipse, Git, PuTTY.
Operating System: Linux, Windows, Ubuntu, Red Hat Linux, UNIX
Methodologies: Agile, Waterfall model.
Testing Hadoop: MRUnit Testing, Quality Center, Hive Testing.
- Coordinated with business customers to gather business requirements, worked with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Worked on analyzing the Hadoop 2.7.2 cluster and different Big Data analytic tools, including Pig 0.16.0, Hive 2.0, the HBase 1.1.2 database, and Sqoop 1.4.6.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Experience in developing Spark SQL applications using both SQL and the DSL.
- Implemented Spark 2.0 jobs using Python 3.6.0 and Spark SQL for faster data processing.
- Implemented algorithms for real time analysis in Spark.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
- Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
- Involved in validating the aggregate table based on the rollup process documented in the data mapping. Developed HiveQL and Spark RDD/SQL code and automated the flow using shell scripting.
- Identified opportunities to improve infrastructure so that it makes effective and efficient use of Microsoft Azure, Windows Server 2008/2012/R2, Microsoft SQL Server, Microsoft Visual Studio, Windows PowerShell, and cloud infrastructure.
- Deployed Azure IaaS virtual machines (VMs) and Cloud services (PaaS role instances) into secure VNets and subnets.
- Developed MapReduce 3 programs to parse the raw data and store the refined data in tables.
- Designed and Modified Database tables and used HBASE Queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume 1.7.0.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie 1.2.1 Operational Services for batch processing and dynamic workflow scheduling, and created UDFs to store specialized data structures in HBase and Cassandra.
- Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra and configured Kafka to read and write messages from external programs.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Experience with Apache NiFi, which runs in a cluster and provides real-time control for managing the movement of data between any source and any destination.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Involved in fetching brand data from social media applications such as Facebook and Twitter.
- Developed and updated social media analytics dashboards on regular basis.
- Made extensive use of aliases for Oozie and HDFS commands.
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Managed and reviewed Hadoop log files.
- Generated insights from brand conversations, which in turn helped drive brand awareness, engagement, and traffic to social media pages.
- Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
Environment: Hadoop, Map Reduce, Yarn, Spark, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera, HDFS, Eclipse, Azure.
Confidential, Dallas, Texas
- Involved in the high-level design of the Hadoop 2.6.3 architecture for the existing data structure and problem statement, set up the 64-node cluster, and configured the entire Hadoop platform.
- Implemented a data interface to retrieve customer information using a REST API, pre-processed the data using MapReduce 2.0, and stored it in HDFS (Hortonworks).
- Extracted files from MySQL, Oracle, and Teradata 2 using Sqoop 1.4.6, placed them in HDFS (Cloudera distribution), and processed them.
- Configured Hive 1.1.1 metastore, which stores the metadata for Hive tables and partitions in a relational database.
- Worked with various HDFS file formats such as Avro 1.7.6, SequenceFile, and JSON, and various compression formats such as Snappy and bzip2.
- Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
- Developed Pig 0.15.0 UDFs to pre-process the data for analysis and migrated ETL operations into the Hadoop system using Pig Latin scripts and Python 3.5.1 scripts.
- Used Pig as ETL tool to do transformations, event joins, filtering and some pre-aggregations before storing the data into HDFS.
- Worked on creating Custom Azure Templates for quick deployments and advanced PowerShell scripting.
- Contributed to the support forums (specific to Azure Networking, Azure Virtual Machines, Azure Active Directory, and Azure Storage) for the Microsoft Developer Network.
- Troubleshot, debugged, and fixed Talend issues while maintaining the health and performance of the ETL environment.
- Developed Hive queries for data sampling and analysis for the analysts.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Developed custom UNIX shell scripts to perform pre- and post-validation of master and slave nodes, before and after configuring the NameNode and DataNodes respectively.
- Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
- Developed small distributed applications in our projects using ZooKeeper 3.4.7 and scheduled the workflows using Oozie 4.2.0.
- Developed complex Talend job mappings to load data from various sources using different components.
- Developed an SCP simulator that emulates the behavior of intelligent networking and interacts with the SSF.
- Created HBase tables from Hive and wrote HiveQL statements to access HBase 0.98.12.1 table data.
- Proficient in row key and schema design for the NoSQL database HBase, with knowledge of another NoSQL database, Cassandra.
- Used Hive to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
- Created a MapReduce program that examines current and prior versions of data in HBase to identify transactional updates. These updates are loaded into Hive external tables, which are in turn referenced by Hive scripts in transactional feed generation.
Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Scala, Python, Pig, Sqoop, WebSphere, Hibernate, Spring, Oozie, REST Web Services, AWS, Solaris, DB2, UNIX Shell Scripting, JDBC, Azure.
- Executed Hive queries that helped in analysis of market trends by comparing the new data with EDW reference tables and historical data.
- Managed and reviewed Hadoop log files for the JobTracker, NameNode, Secondary NameNode, DataNode, and TaskTracker.
- Tested raw market data and executed performance scripts on data to reduce the runtime.
- Involved in loading the created Files into HBase for faster access of large sets of customer data without affecting the performance.
- Imported and exported data between HDFS and RDBMS using Sqoop and Kafka.
- Executed the Hive jobs to parse the logs and structure them in relational format to provide effective queries on the log data.
- Created Hive tables (internal/external) for loading data and wrote queries to process the data that run internally as MapReduce jobs.
- Developed Pig scripts for capturing data changes and processing records between new data and data already existing in HDFS.
- Created scalable, performant machine learning applications using Mahout.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Involved in importing data from different data sources, and performed various queries using Hive, MapReduce, and Pig Latin.
- Involved in loading data from local file system to HDFS using HDFS Shell commands.
- Experience with UNIX shell scripts for processing and loading data from various interfaces to HDFS.
- Developed different components of the Hadoop ecosystem process involving MapReduce and Hive.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Big Data, Java, Flume, Kafka, Yarn, HBase, Oozie, SQL scripting, Linux shell scripting, Mahout, Eclipse, and Cloudera.
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) on Cloudera's Distribution including Apache Hadoop (CDH4).
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
- Involved in installing Hadoop Ecosystem components.
- Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
- Responsible for managing data coming from different sources, including Flume streams and relational database management systems via Sqoop.
- Involved in gathering the requirements, designing, development and testing.
- Worked on loading and transformation of large sets of structured, semi structured data into Hadoop system.
- Developed simple and complex Map Reduce programs in Java for Data Analysis.
- Load data from various data sources into HDFS using Flume.
- Developed Pig UDFs to pre-process the data for analysis.
- Worked on Hue interface for querying the data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive Scripts for implementing dynamic Partitions.
- Developed Pig scripts for data analysis and extended Pig's functionality by developing custom UDFs.
- Extensive knowledge of Pig scripts using bags and tuples.
- Experience in managing and reviewing Hadoop log files.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Experience with Apache NiFi, which can be used in mission-critical data flows with rigorous security and compliance requirements.
- Exported analyzed data to relational databases using SQOOP for visualization to generate reports for the BI team.
Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, NiFi, Sqoop, and MySQL.
- Installation, Configuration & Upgrade of Solaris and Linux operating system.
- Experience in Extraction, Transformation, and Loading (ETL) of data from multiple sources like Flat files, XML files, and Databases.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in creating Hive tables and loading and analyzing data using hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Developed multiple MapReduce jobs in Java for data processing.
- Installed and configured Hive and wrote Hive User Defined Functions.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using MapReduce programming.
- Used Sqoop import and export functionality to handle large data set transfers between a DB2 database and HDFS.
- Experience in writing Hive JOIN queries.
- Used Flume to stream data and load it into the Hadoop cluster.
- Created MapReduce programs to process the data.
- Used Sqoop to move structured data from MySQL to HDFS, Hive, Pig, and HBase.
- Used Pig predefined functions to convert the fixed width file to delimited file.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Flume, LINUX, Core Java, Scala, MYSQL, Teradata.
- Competent in using SOAP-based XML web services to transfer data to supply chain and domain expertise monitoring systems.
- Used Maven as the build tool for building JAR files.
- Used the Hibernate framework (ORM) to interact with the database.
- Knowledge of the Struts Tiles framework for layout management.
- Worked on the design, analysis, development, and testing phases of the application.
- Developed named HQL queries and Criteria queries for use in the application.
- Developed user interface using JSP and HTML.
- Used JDBC for the Database connectivity.
- Involved in projects using Java and Java EE web applications to create fully integrated client management systems.
- Consistently met deadlines as well as requirements for all production work orders.
- Executed SQL statements to search for contractors based on criteria.
- Developed and integrated the application using the Eclipse IDE.
- Developed JUnit tests for server-side code.
- Involved in building, testing and debugging of JSP pages in the system.
- Involved in multi-tiered J2EE design using the Spring (IoC) architecture and Hibernate.
- Configured Spring-managed beans.
- Used the Spring Security API to configure security.
- Investigated, debugged, and fixed potential bugs in the implementation code.
Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse.