- 9 years of professional IT work experience in Analysis, Design, Administration, Development, Deployment and Maintenance of critical software and big data applications.
- Over 3 years of experience on the Big Data platform as both a developer and an administrator.
- Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, and Cassandra.
- Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
- Exposure to administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig.
- Installed and configured multiple Hadoop clusters of different sizes and with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
- Worked on the major Hadoop distributions: Cloudera and Hortonworks.
- Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
- Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Developed Spark applications in Scala and Java, and implemented an Apache Spark data-processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop, using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Experienced with Apache Spark for implementing advanced procedures such as text analytics and processing, written in Scala using Spark's in-memory computing capabilities.
- Experience in the installation, configuration, management, support, and monitoring of Hadoop clusters using distributions such as Apache and Cloudera.
- Experience with middleware architectures using Java technologies such as J2EE and Servlets, and application servers such as WebSphere and WebLogic.
- Used different Spark modules such as Spark Core, RDDs, DataFrames, and Spark SQL.
- Converted various Hive queries into the required Spark transformations and actions.
- Experience working with the open-source Apache Hadoop distribution and technologies including HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, ZooKeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, MongoDB, and Mesos.
- In-depth knowledge of Scala and experience building Spark applications using Scala.
- Good experience working with Tableau and Spotfire, enabling JDBC/ODBC data connectivity from those tools to Hive tables.
- Designed neat and insightful dashboards in Tableau.
- Designed and worked on an array of reports, including crosstab, chart, drill-down, drill-through, customer-segment, and geodemographic-segmentation reports.
- Deep understanding of Tableau features such as site and server administration, calculated fields, table calculations, parameters, filters (normal and quick), highlighting, level of detail, granularity, aggregation, reference lines, and many more.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
- Designed and developed multiple J2EE Model 2 (MVC) web applications.
- Worked on various tools and IDEs such as Eclipse, IBM Rational, the Apache Ant build tool, MS Office, PL/SQL Developer, and SQL*Plus.
- Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
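The MapReduce batch-processing model referenced throughout this profile can be illustrated with a minimal word-count sketch in plain Python. This is only a conceptual outline of the map, shuffle, and reduce phases; the production jobs themselves ran as Hadoop MapReduce, and the input here is illustrative:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group the intermediate values by key, as the framework does.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big cluster"])))
# counts -> {"big": 2, "data": 1, "cluster": 1}
```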
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, HBase, Spark, Hadoop distributions (Cloudera, Hortonworks)
Programming Languages: Java (5, 6, 7), Python, Scala
Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g
ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS Office, MS Project, Risk Analysis tools, Visio
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SOAP UI, ANT, Maven, Automation and MR-Unit
Cloud Platforms: Amazon EC2
Visualization Tools: Tableau.
Confidential, Manhattan, New York
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Hive queries (HQL) to search for particular strings in Hive tables stored in HDFS.
- Possess good Linux and Hadoop system administration skills, networking, shell scripting, and familiarity with open-source configuration management and deployment tools such as Chef.
- Worked with Puppet for application deployment.
- Configured Kafka to read and write messages from external programs and to handle real-time data.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Developed MapReduce and Spark jobs to discover trends in data usage by users.
- Implemented Spark using Python and Spark SQL for faster processing of data.
- Developed functional programs in Scala for connecting the streaming-data application, gathering web data using JSON and XML and passing it to Flume.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Exported the analyzed patterns back into Teradata using Sqoop.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Performed real-time streaming of data using Spark with Kafka.
- Good knowledge of building Apache Spark applications using Scala.
- Developed several business services as Java RESTful web services using the Spring MVC framework.
- Managed and scheduled jobs to remove duplicate log-data files in HDFS using Oozie.
- Used Apache Oozie for scheduling and managing Hadoop jobs; knowledge of HCatalog for Hadoop-based storage management.
- Expert in creating and designing data-ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
- Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Implemented test scripts to support test driven development and continuous integration.
- Moved data from HDFS to MySQL databases and vice versa using Sqoop.
- Responsible for managing data coming from different sources.
- Experienced in analyzing Cassandra and comparing it with other open-source NoSQL databases to find which best suited the current requirements.
- Used the file system check (fsck) to check the health of files in HDFS.
- Developed UNIX shell scripts for creating reports from Hive data.
- Used Java/J2EE application development skills with object-oriented analysis; extensively involved throughout the Software Development Life Cycle (SDLC).
- Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).
- Extensively used Sqoop to pull data from RDBMS sources such as Teradata and Netezza.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Spark Streaming collects this data from Kafka in near real time and performs the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).
- Configured Kerberos for the clusters.
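The near-real-time pipeline described above (Kafka to Spark Streaming to HBase) can be sketched in plain Python. This simulates only the per-batch aggregation step with hypothetical event fields; it is not the actual Spark or HBase code, and the `store` dict merely stands in for the HBase table:

```python
import json

store = {}  # stands in for the HBase table, keyed by user id

def process_batch(messages):
    # Per micro-batch: parse the Kafka payloads, aggregate per user,
    # and upsert running totals into the NoSQL store.
    for raw in messages:
        event = json.loads(raw)
        row = store.setdefault(event["user_id"], {"events": 0, "bytes": 0})
        row["events"] += 1
        row["bytes"] += event["bytes"]

process_batch([
    '{"user_id": "u1", "bytes": 120}',
    '{"user_id": "u1", "bytes": 80}',
    '{"user_id": "u2", "bytes": 40}',
])
# store -> {"u1": {"events": 2, "bytes": 200}, "u2": {"events": 1, "bytes": 40}}
```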
Environment: Hadoop, MapReduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.
Confidential, Manhattan, New York
Hadoop Data Analyst
- Worked on cloud platform which was built with a scalable distributed data solution using Hadoop on a 40-node cluster using AWS cloud to run analysis on 25+ Terabytes of customer usage data.
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
- Worked on performance analysis and improvements for Hive and Pig scripts at MapReduce job tuning level.
- Installed and configured the Hadoop cluster, working with the Cloudera support team to fine-tune it. Developed a custom file system plugin for Hadoop so it can access files on the Hitachi Data Platform.
- Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic. Performed data ingestion from multiple internal clients using Apache Kafka and developed Kafka Streams applications in Java for real-time data processing.
- Involved in Optimization of Hive Queries.
- Developed a framework to handle loading and transforming large sets of unstructured data from UNIX systems into Hive tables.
- Involved in Data Ingestion to HDFS from various data sources.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
- Automated Sqoop, Hive, and Pig jobs using Oozie scheduling.
- Extensive knowledge of NoSQL databases such as HBase.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
- Good knowledge of writing and using user-defined functions in Hive, Pig, and MapReduce.
- Helped the business team by installing and configuring Hadoop ecosystem components alongside the Hadoop administrator.
- Developed multiple Kafka Producers and Consumers from scratch as per the business requirements.
- Worked on loading log data into HDFS through Flume.
- Created and maintained technical documentation for executing Hive queries and Pig Scripts.
- Worked on debugging and performance tuning of Hive & Pig jobs.
- Used Oozie to schedule various jobs on Hadoop cluster.
- Used Hive to analyze the partitioned and bucketed data.
- Worked on establishing connectivity between Tableau and Hive.
Environment: Hortonworks 2.4, Hadoop, HDFS, MapReduce, MongoDB, Cloudera, Java, VMware, Hive, Eclipse, Pig, HBase, AWS, Tableau, Sqoop, Flume, Linux, UNIX
Confidential, Philadelphia, PA
- Worked with business analysts and product owners to analyze and understand the requirements and provide estimates.
- Implemented J2EE design patterns such as Singleton, DAO, DTO, and MVC.
- Developed this web application to store all system information in a central location using Spring MVC, JSP, Servlets, and HTML.
- Used the Spring AOP module to handle transaction management services for objects in any Spring-based application.
- Implemented Spring DI and Spring Transactions in the business layer.
- Developed data access components using JDBC, DAOs, and Beans for data manipulation.
- Designed and developed database objects like Tables, Views, Stored Procedures, User Functions using PL/SQL, SQL Developer and used them in WEB components.
- Used iBATIS to dynamically build SQL queries based on parameters.
- Developed JUnit test cases for unit testing and used Maven as the build and configuration tool.
- Used shell scripting to create jobs that run on a daily basis.
- Debugged the application using Firebug and traversed through the nodes of the tree using DOM functions.
- Monitored the error logs using log4j and fixed the problems.
- Used the Eclipse IDE and deployed the application on the WebLogic server.
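The DAO pattern applied in this project separates persistence details from business logic behind a small CRUD interface. A language-neutral sketch in Python (the project itself used JDBC-backed DAOs in Java; the entity and its fields here are illustrative):

```python
class UserDao:
    """Data Access Object: hides the storage details behind CRUD methods."""

    def __init__(self):
        self._rows = {}  # stands in for the JDBC-backed database table

    def save(self, user_id, name):
        # Insert or update one row.
        self._rows[user_id] = {"id": user_id, "name": name}

    def find(self, user_id):
        # Return the row, or None if it does not exist.
        return self._rows.get(user_id)

dao = UserDao()
dao.save(1, "alice")
found = dao.find(1)
```

Callers in the service layer depend only on `save`/`find`, so the storage backend can change without touching business code.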
Confidential, Plano, TX
- Designed and developed Java classes using object-oriented methodology.
- Worked on the system using Java, JSP, and Servlets.
- Developed Java classes and methods for handling data from the database.
- Experience in sequence data pre-processing, extraction, model fitting and validation using ML pipelines.
- Used Talend Open Studio to load files into Hadoop Hive tables and performed ETL aggregations in Hive.
- Used Sqoop to import data from SQL server to Hadoop ecosystem.
- Integrated Cassandra with Talend and automated jobs.
- Scheduled jobs and monitored their console outputs through Jenkins.
- Worked in an Agile environment that uses Jira to maintain story points.
- Worked on the implementation of a toolkit that abstracted Solr and Elasticsearch.
- Performed maintenance and troubleshooting in the Cassandra cluster.
- Installed and configured Hive and wrote Hive UDFs in Java and Python.
- Attended and Conducted User meetings for requirement analysis and project reporting.
- Performed testing and bug fixing and provided production support.
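Python-side Hive UDFs of the kind noted above are typically run through Hive's TRANSFORM clause, which pipes tab-separated rows through a script's stdin/stdout. A minimal hedged sketch; the column layout and the normalization rule are illustrative:

```python
def normalize(line):
    # One tab-separated Hive row in, one normalized row out:
    # lower-case the first column and leave the rest unchanged.
    cols = line.rstrip("\n").split("\t")
    cols[0] = cols[0].lower()
    return "\t".join(cols)
```

Wired into a script that loops over `sys.stdin` and prints each result, this would be invoked from Hive roughly as `SELECT TRANSFORM(col1, col2) USING 'python normalize.py' AS (col1, col2) FROM t`.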
Environment: Hadoop, HDFS, MapReduce, Java, Hive, Eclipse, Talend, HBase, Sqoop, Flume, Cassandra, Solr.
Confidential, Austin, TX
- Involved in configuring XML for different services by setting up action mappings, packages, and interceptor stacks in the Bridges web framework.
- Involved in writing action classes to handle the requests as per the business requirements.
- Involved in writing clients for JAX-RS web services to consume some of the services provided by other systems.
- Involved in developing JAX-RS web services to exchange information and services with other systems.
- Involved in configuring Hibernate configuration files for different databases.
- Involved in writing Hibernate mapping files to follow a declarative approach.
- Involved in configuring DAOs with the use of a session factory and transactions.
- Used JDBC to connect the existing Swing application to Oracle and retrieve data from the database.
- Designed and coded the various components of the UI to gather relevant information from the parsed XML file to create the initial layout of the screen with the various swing components.
- Involved in developing JSPs with the help of JSP tags and in-house tags to meet the business requirements.
- Involved in writing JavaScript that handles events related to page closing.
- Used a variety of design patterns such as Business Delegate, Session Façade, Singleton, and Front Controller.
- Involved in consuming web services by writing clients for the services provided by other components.
- Maintained user accounts (IAM) and the RDS, Route 53, VPC, RDB, DynamoDB, SES, SQS, and SNS services in the AWS cloud.
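Consuming a JAX-RS (REST) service, as the bullets above describe, ultimately means issuing an HTTP request and decoding the JSON body. A minimal Python sketch of just the decoding step, with a hypothetical payload and field names (the project's actual clients were written in Java):

```python
import json

def parse_account(payload):
    # Decode a JSON response body from a hypothetical accounts service
    # and pull out only the fields the client layer needs.
    doc = json.loads(payload)
    return {"id": doc["accountId"], "status": doc["status"]}

account = parse_account('{"accountId": "A-17", "status": "active"}')
# account -> {"id": "A-17", "status": "active"}
```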