- Over 7 years of extensive IT experience in all phases of the software development life cycle (SDLC), including 4 years of hands-on experience with Hadoop, HDFS, the MapReduce framework, and Hadoop ecosystem components such as Hive, Hue, Pig, Sqoop, HBase, ZooKeeper, Oozie, Kafka, and Apache Spark.
- Excellent programming skills at a high level of abstraction using Scala and Spark.
- Thorough knowledge of data extraction, transformation, and loading in Hive, Pig, and HBase.
- Managing and scheduling batch Jobs on a Hadoop Cluster using Oozie.
- Experienced in using Sqoop to import data from RDBMS into HDFS and vice versa.
- Good knowledge of single-node and multi-node cluster configurations.
- Created Hive tables to store data into HDFS and processed data using HiveQL.
- Excellent understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands on experience in installing, configuring, and using Hadoop components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper and Flume.
- Extensive experience writing MapReduce programs and Hive and Pig scripts, and working with HDFS.
- Good understanding and knowledge of NOSQL databases like MongoDB, HBase, and Cassandra.
- Worked on Importing and exporting data into HDFS and Hive using Sqoop.
- Wrote Hive queries for data analysis to meet business requirements.
- Strong working knowledge and ability to debug complex problems.
- Basic knowledge of Linux, Unix and well versed in Core JAVA.
- Worked with Apache Spark, a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs).
- Experienced in requirements gathering, analysis, and design document preparation.
- Excellent oral and written communication skills.
- Collaborated well across technology groups.
- Major strengths include the ability to learn new technologies quickly and adapt to new environments.
- Solid experience working in agile development environments, including the Scrum methodology.
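The Spark-with-Scala work summarized above can be illustrated with a minimal word-count sketch; the application name and HDFS paths are placeholders, not taken from an actual project:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Minimal Spark-with-Scala sketch; app name and HDFS paths are illustrative.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("word-count-sketch"))

    // Read raw text from HDFS, split it into words, and count occurrences.
    val counts = sc.textFile("hdfs:///data/input")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/output/word_counts")
    sc.stop()
  }
}
```

This would be packaged with SBT and submitted to a running cluster via spark-submit; it is a sketch, not the exact code from the engagements below.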
Programming Languages: SQL, Java, J2EE, Scala and Unix shell scripting
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume, ZooKeeper, Spark, Cloudera, and Hortonworks
Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, DB2; NoSQL: HBase
Scripting & Query Languages: UNIX shell scripting, SQL, and PL/SQL
Hadoop Paradigms: MapReduce, YARN, in-memory computing, high availability, real-time streaming
Other Tools: Eclipse, IntelliJ, SVN, GitHub, Jira, BitBucket.
Methodology: Agile, Waterfall
Confidential, Latham, NY
- Experience in developing MapReduce programs using Apache Hadoop to work with Big Data.
- Experience in using Pig, Hive, Sqoop, HBase, Flume, and Impala.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Familiar with Java virtual machine (JVM) and multi-threaded processing.
- Worked on NoSQL databases, primarily HBase.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, MySQL, Spark, Cassandra, Netezza, Sqoop, Oozie, VersionOne, Shell.
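A Sqoop import of the kind described above typically looks like the following. This is a sketch that assumes a running Hadoop cluster and a reachable source database; the connection string, username, table, and target directory are placeholders:

```shell
# Illustrative only: connection string, credentials, table, and target
# directory are placeholders, not values from the original projects.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4
```

The `--num-mappers` flag controls how many parallel map tasks split the table; `sqoop export` reverses the direction, writing HDFS files back to the RDBMS.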
Confidential, Hartford, CT
- Coordinated with the BA team for finalization of requirements.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams.
- Provided solutions by writing Pig Latin scripts to process HDFS data.
- Effectively used Sqoop to migrate data from RDBMS to HDFS.
- Effectively used SerDe to load data in Hive table.
- Developed Spark code by using Scala and Spark-SQL for faster processing and testing.
- Experience in retrieving data from databases such as MySQL and DB2 into HDFS using Sqoop and ingesting it into HBase.
- Worked with Apache Spark, a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
- Experienced in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Monitored Hadoop scripts which take the input from HDFS and load the data into Hive.
- Effectively used Oozie to develop automated workflows of Sqoop, MapReduce, and Hive jobs.
- Provided post-implementation support and resolved various production fixes and discrepancies.
- Experienced in handling administration activities using Cloudera Manager.
- Conducted various code walkthroughs and reviews for the developed modules.
- Developed Shell scripts to automate routine DBA tasks.
- Extensively involved in the installation and configuration of the Cloudera distribution of Hadoop, including the NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Successfully delivered various projects in this application within the stipulated timelines.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, MySQL, Spark, Cassandra, Netezza, Sqoop, Oozie, VersionOne, Shell, SVN.
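The routine DBA automation mentioned above can be sketched as a small shell script. The log directory, the 7-day retention window, and the demo fixture are illustrative assumptions, not details from the original project:

```shell
#!/bin/sh
# Sketch: compress application logs older than 7 days.
# LOG_DIR and the demo fixture below are illustrative placeholders.
LOG_DIR="${LOG_DIR:-/tmp/app_logs_demo}"
mkdir -p "$LOG_DIR"

# Demo fixture: create one log file with an old modification time.
touch -t 202001010000 "$LOG_DIR/old.log"

# Compress any .log file not modified in the last 7 days.
find "$LOG_DIR" -name '*.log' -mtime +7 -exec gzip -f {} \;

echo "rotated: $(find "$LOG_DIR" -name '*.log.gz' | wc -l)"
```

In practice such a script would be run from cron and extended with retention cleanup and alerting.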
Confidential, Beaverton, OR
- Built a Hadoop cluster ensuring high availability for the NameNode, mixed-workload management, performance optimization, health monitoring, and backup and recovery across one or more nodes.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop.
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for building scalable distributed data solutions using Hadoop.
- Interacted with different system groups for analysis of systems.
- Created tables, views in Teradata, according to the requirements.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers/sensors.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop log files.
- Hands-on experience with various AWS services, such as Redshift clusters and Route 53 domain configuration.
- Responsible for smooth, error-free solutions and integration with Hadoop.
- Designed a data warehouse using Hive.
- Used the Control-M and Oozie scheduling tools to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs (such as Java MapReduce, Hive, and Sqoop) as well as system-specific jobs.
- Involved in Scrum calls, grooming, and demo meetings; very good experience with agile methodology.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, MySQL, Control-M, Ubuntu, Oracle, Spark, Java, Cassandra, Netezza, Sqoop, Oozie, AWS, VersionOne, Shell, SVN.
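An Oozie workflow of the kind used above, chaining a Sqoop import into a Hive job, might be defined roughly as follows. All names, paths, and parameter values are placeholders:

```xml
<!-- Sketch of an Oozie workflow chaining a Sqoop import into a Hive load.
     Names, paths, and the ${...} parameters are illustrative placeholders. -->
<workflow-app name="ingest-and-load" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table orders --target-dir ${stagingDir}</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
```

The parameters would be supplied through a job.properties file, and a coordinator could trigger the workflow on a schedule, which is the role Control-M played for interdependent jobs.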
Confidential, Charlotte, NC
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Involved in loading data from UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Extensively worked with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Implemented different analytical algorithms as MapReduce programs applied on top of HDFS data.
- Used Pig to perform data transformations, event joins, filter and some pre-aggregations before storing the data onto HDFS.
- Implemented partitions and buckets in Hive for optimization.
- Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
Environment: Apache Hadoop, Cloudera, Hive, Pig, Sqoop, Zookeeper, HBase, Java, Oozie, Oracle, Teradata, and UNIX Shell Scripting.
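The Hive partitioning and bucketing work described above can be sketched in HiveQL; the table and column names are illustrative, not from the original schema:

```sql
-- Sketch of Hive partitioning and bucketing for query optimization
-- (table and column names are illustrative placeholders).
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Partition pruning: only the files under the named partition are scanned.
SELECT url, COUNT(*) AS hits
FROM page_views
WHERE view_date = '2016-01-01'
GROUP BY url;
```

Partitioning keeps each day's data in its own directory so date-filtered queries skip the rest, while bucketing on user_id supports efficient sampling and map-side joins.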
- Worked on Sqoop jobs to import data from Oracle into HDFS.
- Performance tuning of Spark and Sqoop jobs.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Hands-on design and development of an application using Hive UDFs.
- Responsible for writing Hive queries for analyzing data in the Hive warehouse using HiveQL.
- Provided support to data analysts in running Pig and Hive queries.
- Created partitioned tables in Hive.
- Worked on Data Modelling for Dimension and Fact tables in Hive Warehouse.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop 2.7, Spark 1.4.1, Scala 2.10, SBT 0.13, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, Java, Oracle, Windows.
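A Hive UDF of the kind developed above can be sketched as follows. This assumes the hive-exec dependency on the classpath and uses the legacy UDF base class supported by Hive 0.13; the class name and behavior are illustrative, not from the original application:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: normalize a free-form ZIP value to its first 5 digits.
// Requires hive-exec on the classpath; not compilable standalone.
public class NormalizeZip extends UDF {
    public Text evaluate(Text input) {
        if (input == null) return null;
        String digits = input.toString().replaceAll("[^0-9]", "");
        return new Text(digits.length() >= 5 ? digits.substring(0, 5) : digits);
    }
}
```

After packaging into a JAR, such a function would be registered in a Hive session with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_zip AS 'NormalizeZip'.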
Java/ J2EE Developer
- Developed Java code to meet specifications and designs, using best practices.
- Developed low-level component-based design documentation (UML).
- Developed the DAO layer for the application using Spring's HibernateTemplate support.
- Implemented transactions using the Spring framework.
- Used Spring MVC and Web Flow to bind web parameters to business logic.
- Used the Ant and Maven build tools to build JAR and WAR files and deployed the WAR files to target servers.
- Maintained relationships between objects using Spring IoC.
- Wrote extensive Core Java and multithreading code in the application.
- Implemented the JMS Topic to receive the input in the form of XML and parsed them through a common XSD.
- Wrote JDBC statements, prepared statements, and callable statements in Java, JSPs, and Servlets.
- Followed Scrum approach for the development process.
- Modified and added database functions, procedures and triggers pertaining to business logic of the application.
- Used TOAD to check and verify all the database turnaround times and also tested the connections for response times and query round trip behavior.
- Used ANT Builder to build the code for production line.
- Used the Eclipse IDE for all coding in Java, Servlets, and JSPs.
- Used JSP Tag Libraries (JSTL) to implement the logic inside the JSPs.
- Used AJAX to fetch data from the server asynchronously using JSON objects.
- Used JIRA as a bug-reporting tool for updating the bug report.
- Focused on converting existing application features toward globalization, i.e., internationalization of the web representation.
- Worked on an Oracle 10g database for storing and retrieving application data.
- Involved in configuring JMS in Application Developer.
- Developed MQ JMS Queues for asynchronous messaging and Web Services using SOAP/WSDL.
- Involved in WebLogic administration, such as setting up data sources and deploying applications.
- Configured and deployed the Web Application Archive (WAR) in the WebLogic Application Server.
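The Core Java multithreading work mentioned above can be illustrated with a minimal sketch that splits a computation across a fixed thread pool; the class and method names are illustrative, not from the original application:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of Core Java multithreading: split a sum across worker threads.
// Class and method names are illustrative placeholders.
public class ParallelSum {
    static long sum(int from, int to, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Long>> parts = new ArrayList<>();
        int chunk = (to - from + workers) / workers; // ceiling of range/workers
        for (int start = from; start <= to; start += chunk) {
            final int s = start;
            final int e = Math.min(start + chunk - 1, to);
            // Each worker sums its own sub-range independently.
            parts.add(pool.submit(() -> {
                long acc = 0;
                for (int i = s; i <= e; i++) acc += i;
                return acc;
            }));
        }
        long total = 0;
        try {
            for (Future<Long> f : parts) total += f.get();
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 100, 4)); // prints 5050
    }
}
```

Partitioning the range before submitting tasks keeps the workers independent, so no shared mutable state or locking is needed.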