- 7+ years of IT experience spanning analysis, design, and development of Big Data solutions using Hadoop, web applications using Java/J2EE, and database and data warehousing development using MySQL, Oracle, and Informatica.
- 4+ years of Big Data Analytics experience with hands-on work installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, and Spark.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts, and the HDFS framework.
- Experience using Cloudera Manager to install and manage single-node and multi-node Hadoop clusters (CDH4 and CDH5).
- Experience in data load management, importing and exporting data using Sqoop and Flume.
- Experience in analyzing data using Hive, Pig, and custom MapReduce programs in Java.
- Experience in scheduling and monitoring jobs using Oozie and ZooKeeper.
- Experienced in writing MapReduce programs and UDFs in Java for both Pig and Hive.
- Experience in extracting data from log files and copying it into HDFS using Flume.
- Developed Hadoop test classes using MRUnit to validate job input and output.
- Experience in integrating Hive and HBase for effective operations.
- Developed Pig UDFs to pre-process data for analysis.
- Experience in Impala, Solr, MongoDB, HBase and Spark.
- Hands-on knowledge of writing code in Scala.
- Good understanding of MPP databases such as HP Vertica and Impala.
- Good knowledge of writing shell, Python, and Perl scripts on Linux.
- Experience in database design, entity-relationship modeling, database analysis, and programming SQL, PL/SQL stored procedures, packages, and triggers in Oracle and SQL Server on Windows and Linux.
- Worked with different file formats (ORC, Avro, text) and different compression codecs (Gzip, Snappy, LZO).
- Strong understanding of data warehouse concepts and ETL; data modeling experience covering normalization, business process analysis, reengineering, dimensional data modeling, and physical and logical data modeling.
- Experience in Java, J2EE, web services (REST and SOAP), HTML, and XML technologies, with strong analytical and problem-solving skills and the ability to follow projects through from inception to completion.
- Expertise in AWS Identity and Access Management (IAM): creating users and groups, organizing IAM users into groups, and assigning roles to groups.
- Set up lifecycle policies to back up data from AWS S3 to AWS Glacier; worked with various AWS CLI tools for EC2 and S3.
- Experienced in designing, building, and deploying a variety of applications on the AWS stack (including EC2, S3, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
- Experience with Amazon Web Services, the AWS Command Line Interface, and AWS Data Pipeline.
- Knowledge of the Software Development Life Cycle (requirements analysis, design, development, testing, deployment, and support).
- Good interpersonal and communication skills, strong problem-solving skills, quick to explore and adopt new technologies, and a good team member.
Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, Impala, Flume, Kafka, Storm and Spark
Cloud Services: Amazon Web Services (EC2, S3, EMR, Redshift)
Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari
NoSQL: HBase, Cassandra
Programming Languages: Java, Python and Scala
Frameworks: Hibernate, Struts, and Spring
Web Services: REST, SOAP, Tomcat and WebSphere
Client Technologies: jQuery, JavaScript, AJAX, HTML5
Operating Systems: UNIX, Windows, Linux (Ubuntu)
Web Technologies: JSP, Servlets, JavaScript, JavaBeans
Databases: Oracle 10g/11g, MySQL 4.x/5.x
Development Tools: TOAD, SQL Developer, ANT, Maven, Jenkins
Office Tools: MS Excel, Word, PowerPoint
Confidential, Atlanta, GA
Hadoop Scala Developer
- Worked with the business analyst team to gather requirements and client needs.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
- As part of data acquisition, used Sqoop and Flume to ingest data from servers into Hadoop using incremental imports.
- In the pre-processing phase, used Spark to remove missing data and apply data transformations to create new features.
- In the data exploration stage, used Hive and Impala to gain insights into the customer data.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build the data pipeline.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Involved in configuring Hadoop ecosystem components like HBase, Hive, Pig and Sqoop.
- Involved in writing Sqoop jobs to load data from RDBMS into HDFS and vice versa.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Hands on experience in large scale data processing using Spark.
- Hands-on experience creating RDDs and applying transformations and actions to them; good at applying Spark filter conditions on data and worked on joins in Spark.
- Used the Spark DataFrame API to perform analytics on Hive data and implemented checkpoints that persist RDDs to disk to handle job failures and aid debugging.
- Developed Spark jobs written in Python to perform operations such as aggregation, data processing, and data analysis.
- Hands on experience in handling Hive tables using Spark SQL.
- Developed a data pipeline using Kafka and Spark Streaming to store data into HDFS and performed the real time analytics on the incoming data.
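The filter/map/aggregate transformation chain described above can be sketched in plain Python. This is an illustrative stand-in only: the field names and sample records are invented, and ordinary lists take the place of RDDs.

```python
# Sample call records standing in for an RDD of telecom events
# (fields and values are invented for illustration).
records = [
    {"user": "a", "minutes": 12, "dropped": False},
    {"user": "b", "minutes": 8,  "dropped": True},
    {"user": "a", "minutes": 30, "dropped": False},
]

# Filter transformation: drop incomplete calls (missing-data removal).
completed = [r for r in records if not r["dropped"]]

# Map transformation: project each record to a (user, minutes) pair.
pairs = [(r["user"], r["minutes"]) for r in completed]

# reduceByKey-style aggregation: total minutes per user.
totals = {}
for user, minutes in pairs:
    totals[user] = totals.get(user, 0) + minutes

print(totals)  # {'a': 42}
```

On the cluster the same steps would be RDD `filter`, `map`, and `reduceByKey` calls, evaluated lazily and executed in parallel across partitions.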
Environment: Cloudera, Informatica PowerCenter 9.5, Oracle 11g, AWS S3, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, YARN, Storm, Kafka, Linux, Java, Oozie, Spark, Scala, SQL.
- Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Configured MySQL Database to store Hive metadata.
- Used Sqoop to import data from various relational data sources, such as MySQL, into HDFS.
- Responsible for managing customer data coming from different sources.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Analyzed the customer data by performing Hive queries to know user behavior.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Used different SerDes to convert JSON data into pipe-separated data.
- Hands-on experience working with different file formats such as text and Avro.
- Worked on partitioning and bucketing Hive tables and running scripts in parallel to reduce their run time.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Hands on experience in configuring cluster on EC2 instances using Cloudera Manager.
- Experience in creating tables on top of data on AWS S3 obtained from different data sources.
- Experience working with off-shore teams, communicating daily status on issues and roadblocks.
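The SerDe-driven conversion of JSON into pipe-separated rows mentioned above can be illustrated with a small plain-Python sketch; the field names are invented, and on the cluster this was handled by Hive SerDes rather than hand-written code.

```python
import json

# A JSON record as it might arrive from an upstream source
# (field names are invented for illustration).
raw = '{"id": 101, "name": "alice", "city": "Atlanta"}'

# A fixed column order, as a Hive table schema would impose.
columns = ["id", "name", "city"]

# Deserialize the JSON, then serialize to a pipe-separated row.
record = json.loads(raw)
pipe_row = "|".join(str(record[c]) for c in columns)

print(pipe_row)  # 101|alice|Atlanta
```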
Environment: Cloudera Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.7), AWS EC2, Pig, Linux, XML, HBase, ZooKeeper, Sqoop.
Confidential, Jersey City, NJ
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hadoop Map Reduce, HDFS and Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Wrote MapReduce code to turn unstructured data into structured data and to insert data into HBase from HDFS.
- Experience in creating integration between Hive and HBase.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Hands-on experience collecting log files and copying them into HDFS using Flume.
- Implemented business logic by writing Pig and Hive UDFs for aggregation operations.
- Experience optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Hands on experience in exporting the results into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experienced with NoSQL databases and their query interfaces.
- Monitored the health of MapReduce programs running on the cluster.
- Used Ambari to monitor and manage the Hadoop cluster.
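The combiner optimization mentioned above can be shown with a standalone word-count sketch in plain Python. The real jobs were Java MapReduce on the cluster; this only mirrors the map → combine → reduce flow, and the sample text is invented.

```python
from collections import Counter
from itertools import chain

# Two input splits, each handled by one mapper.
splits = ["big data big cluster", "data cluster data"]

def map_split(text):
    # Mapper: emit a (word, 1) pair for every word in the split.
    return [(word, 1) for word in text.split()]

def combine(pairs):
    # Combiner: pre-aggregate counts on the mapper side,
    # shrinking the data shuffled to the reducers.
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

# Each mapper's output passes through the combiner before the shuffle...
combined = [combine(map_split(s)) for s in splits]

# ...and the reducer merges the partial counts.
totals = Counter()
for word, count in chain.from_iterable(combined):
    totals[word] += count

print(dict(totals))  # {'big': 2, 'data': 3, 'cluster': 2}
```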
Environment: Hadoop, MapReduce, Hortonworks, HDFS, Hive, Pig, HBase, Sqoop, Flume, Oozie, SQL, Java (jdk 1.6), Eclipse.
- Active in the analysis, design, implementation, and deployment phases of the full Software Development Life Cycle (SDLC) of the project.
- Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
- Extensively used Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
- Defined search criteria to retrieve customer records from the database, apply the required changes, and save the updated records back to the database.
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Developed stored procedures and triggers using PL/SQL to calculate values and update tables, implementing business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post-production support and maintenance of the application.
- Involved in the analysis, design, implementation, and testing of the project.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6.
- Participated in various phases of the Software Development Life Cycle (SDLC).
- Developed user interfaces using the JSP framework with AJAX, JavaScript, HTML, XHTML, and CSS.
- Performed the design and development of various modules using the CBD Navigator Framework.
- Deployed J2EE applications on WebSphere Application Server by building and deploying EAR files using ANT scripts.
- Created tables, stored procedures in SQL for data manipulation and retrieval.
- Used CVS for version control of code and project documents.
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Developed custom JSP tags for the application.
- Wrote queries for fetching and manipulating data using the iBatis ORM framework.
- Used Quartz schedulers to run jobs sequentially at given times.
- Implemented design patterns like Filter, Cache Manager and Singleton to improve the performance of the application.
- Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.
- Deployed the application at the client's location on Tomcat Server.
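Of the patterns listed above, the Singleton is the simplest to sketch. The original implementation was in Java; this Python version only illustrates the idea, and the class name is invented.

```python
class CacheManager:
    """Singleton: the whole application shares one lazily created instance."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.cache = {}  # shared state, created exactly once
        return cls._instance

# Every construction returns the same shared object.
a = CacheManager()
b = CacheManager()
a.cache["key"] = "value"
print(a is b, b.cache["key"])  # True value
```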