- 9 years of professional IT work experience in Analysis, Design, Administration, Development, Deployment and Maintenance of critical software and big data applications.
- Over 3 years of experience on Big Data platforms as both Developer and Administrator.
- Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie and Cassandra.
- Hands-on experience in using the MapReduce programming model for batch processing of data stored in HDFS.
- Exposure to administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig.
- Installed and configured multiple Hadoop clusters of different sizes and with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
- Worked on all major distributions of Hadoop: Cloudera and Hortonworks.
- Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
- Experienced in using Apache Spark's in-memory computing capabilities, with code written in Scala, to implement advanced procedures like text analytics and processing.
- Experience in installing, configuring, managing, supporting and monitoring Hadoop clusters using various distributions such as Apache and Cloudera.
- Experience with middleware architecture using Sun Java technologies like J2EE and Servlets, and application servers like WebSphere and WebLogic.
- Used different Spark modules like Spark Core, Spark RDDs, Spark DataFrames and Spark SQL.
- Converted various Hive queries into the required Spark transformations and actions.
- Experience working on the Apache Hadoop open-source distribution with technologies like HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, ZooKeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, MongoDB and Mesos.
- In-depth knowledge of Scala and experience building Spark applications with it.
- Good experience working with Tableau and Spotfire, including enabling JDBC/ODBC data connectivity from both to Hive tables.
- Designed neat and insightful dashboards in Tableau.
- Designed and worked on an array of reports including Crosstab, Chart, Drill-Down, Drill-Through, Customer-Segment and Geodemographic segmentation.
- Deep understanding of Tableau features such as site and server administration, calculated fields, table calculations, parameters, filters (normal and quick), highlighting, level of detail, granularity, aggregation, reference lines and many more.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
- Designed and developed multiple J2EE Model 2 MVC-based web applications.
- Worked with various tools and IDEs like Eclipse, IBM Rational, the Apache Ant build tool, MS Office, PL/SQL Developer and SQL*Plus.
- Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark
Programming Languages: Java (5, 6, 7), Python, Scala
Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g
ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS Office, MS Project and Risk Analysis tools, Visio
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SoapUI, Ant, Maven, Automation and MRUnit
Cloud Platforms: Amazon EC2
Visualization Tools: Tableau.
Confidential, Fairfax, VA
Sr. Hadoop Developer
- Worked on Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in the pre-production stage and up to 24 nodes in production.
- Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig and Hive programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Hive queries (HQL) to search for particular strings in Hive tables stored in HDFS.
- Possess good Linux and Hadoop system administration, networking and shell scripting skills, and familiarity with open-source configuration management and deployment tools such as Chef.
- Worked with Puppet for application deployment.
- Configured Kafka to read and write messages from external programs.
- Configured Kafka to handle real time data.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Developed MapReduce and Spark jobs to discover trends in data usage by users.
- Implemented Spark using Python and Spark SQL for faster processing of data.
- Developed functional programs in Scala for connecting the streaming data application, gathering web data in JSON and XML, and passing it to Flume.
- Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
- Exported the analyzed patterns back into Teradata using Sqoop.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Streamed data in real time using Spark with Kafka.
- Good knowledge of building Apache Spark applications using Scala.
- Developed several business services as Java RESTful web services using the Spring MVC framework.
- Managed and scheduled jobs using Oozie to remove duplicate log data files in HDFS.
- Used Apache Oozie for scheduling and managing Hadoop jobs. Knowledge of HCatalog for Hadoop-based storage management.
- Expert in designing and creating data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
- Used Flume extensively in gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Implemented test scripts to support test driven development and continuous integration.
- Transferred data from HDFS to a MySQL database and vice versa using Sqoop.
- Responsible for managing data coming from different sources.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to find which best suits the current requirements.
- Used the file system check utility (fsck) to check the health of files in HDFS.
- Developed UNIX shell scripts for creating reports from Hive data.
- Applied Java and J2EE application development skills with Object-Oriented Analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
- Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
- Configured Kerberos for the clusters.
Environment: Hadoop, Map Reduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.
Hadoop Data Analyst
- Worked on a cloud platform built as a scalable distributed data solution using Hadoop on a 40-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
- Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, the HBase database and Sqoop.
- Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
- Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job tuning level.
- Installed and configured the Hadoop cluster. Worked with the Cloudera support team to fine-tune the cluster. Developed a custom file system plugin for Hadoop so it can access files on Hitachi Data Platform.
- Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic. Performed data ingestion from multiple internal clients using Apache Kafka. Developed Kafka Streams (KStream) applications in Java for real-time data processing.
- Involved in the optimization of Hive queries.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Involved in Data Ingestion to HDFS from various data sources.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
- Automated Sqoop, Hive and Pig jobs using Oozie scheduling.
- Extensive knowledge of NoSQL databases like HBase.
- Worked extensively on importing metadata into Hive and migrating existing tables and applications to work on Hive and the AWS cloud.
- Responsible for continuous monitoring and management of the Elastic MapReduce (EMR) cluster through the AWS console.
- Good knowledge of writing and using user-defined functions in Hive, Pig and MapReduce.
- Helped the business team by installing and configuring Hadoop ecosystem components along with the Hadoop admin.
- Developed multiple Kafka Producers and Consumers from scratch as per the business requirements.
- Worked on loading log data into HDFS through Flume.
- Created and maintained technical documentation for executing Hive queries and Pig Scripts.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Oozie to schedule various jobs on the Hadoop cluster.
- Used Hive to analyze the partitioned and bucketed data.
- Worked on establishing connectivity between Tableau and Hive.
Environment: Hortonworks 2.4, Hadoop, HDFS, MapReduce, MongoDB, Cloudera, Java, VMware, Hive, Eclipse, Pig, HBase, AWS, Tableau, Sqoop, Flume, Linux, UNIX
Confidential - Beaverton, OR
- Worked with business analysts and product owners to analyze and understand the requirements and provide estimates.
- Implemented J2EE design patterns such as Singleton, DAO, DTO and MVC.
- Developed a web application to store all system information in a central location using Spring MVC, JSP, Servlets and HTML.
- Used the Spring AOP module to handle transaction management services for objects in any Spring-based application.
- Implemented Spring DI and Spring Transactions in the business layer.
- Developed data access components using JDBC, DAOs, and Beans for data manipulation.
- Designed and developed database objects like tables, views, stored procedures and user functions using PL/SQL and SQL Developer, and used them in web components.
- Used iBATIS for dynamically building SQL queries based on parameters.
- Developed JUnit test cases for unit testing and used Maven as the build and configuration tool.
- Used shell scripting to create jobs that run on a daily basis.
- Debugged the application using Firebug and traversed through the nodes of the tree using DOM functions.
- Monitored the error logs using log4j and fixed the problems.
- Used the Eclipse IDE and deployed the application on a WebLogic server.