- With years of industry experience in Big Data, I want to explore technologies and processes for everything related to data.
- Over 6 years of IT experience as a Developer, Designer and Quality Assurance Tester with cross-platform experience using the Hadoop Ecosystem
- Hands-on experience in installing, configuring and using the Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark and Sqoop.
- Strong understanding of various Hadoop services, MapReduce and YARN architecture.
- Experienced in importing data into and exporting data from HDFS using Sqoop
- Experienced in loading data to Hive partitions and creating buckets in Hive
- Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra
- Maintained, troubleshot, monitored and backed up Hadoop clusters
- Administered and maintained Cloudera clusters, provisioned physical Linux systems
- Experience in HDFS data storage and support for running map-reduce jobs
- Experience in Chef, Puppet or related tools for configuration management
- Worked on analyzing the Hadoop cluster and different analytic tools including Pig, HBase and Sqoop
- Involved in Infrastructure set up and installation of HDP stack on Amazon Cloud
- Experienced with ingesting data from RDBMS sources like Oracle, SQL and Teradata into HDFS using Sqoop
- Installed new components and removed existing ones through Cloudera Manager
- Experience in designing and implementing HDFS access controls, directory and file permissions, and user authorization that facilitate stable, secure access for multiple users in a large multi-tenant cluster
- Extensive experience in importing and exporting streaming data to and from HDFS using stream-processing platforms such as Flume and the Kafka messaging system.
- Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 bucket or HTTP server running on EC2 instances.
- Experience in real-time analytics with Apache Spark (RDD, DataFrames and Streaming API).
- Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
- Experience in integrating Hadoop with Kafka. Expertise in uploading clickstream data from Kafka to HDFS.
- Expert in using Kafka as a publish-subscribe messaging system
Confidential, Jersey City, NJ
- Worked on scaling the Hadoop cluster from 4 nodes in the development environment to 8 nodes in the pre-production stage and up to 24 nodes in production
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce jobs, Pig scripts and Hive queries
- Extensively used Hive queries (HQL) to search for particular strings in Hive tables stored in HDFS
- Possess good Linux and Hadoop System Administration skills, networking, shell scripting and familiarity with open source configuration management and deployment tools such as Chef
- Worked with Puppet for application deployment and configured Kafka to handle real-time data
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Implemented Spark using Python and Spark SQL for faster processing of data
- Developed functional Scala programs for streaming data; gathered JSON and XML data and passed it to Flume
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases
- Used the Spark-Cassandra Connector to move data to and from Cassandra.
- Streamed data in real time using Spark with Kafka.
- Good knowledge of building Apache Spark applications using Scala.
- Developed several business services as Java RESTful web services using the Spring MVC framework
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
- Used Apache Oozie for scheduling and managing the Hadoop Jobs, knowledgeable on HCatalog
- Designed and created data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka
- Used Flume in gathering and moving log data files from Application Servers to a central location in HDFS
- Implemented test scripts to support test driven development and continuous integration.
- Exported data from HDFS to a MySQL database and vice versa using Sqoop
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements
- Developed UNIX shell scripts to create reports from Hive data
- Involved in the pilot of Hadoop cluster hosted on AWS and configured Kerberos for the clusters
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari
Environment: Hadoop, MapReduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.
Confidential, New York, NY
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure, including KDC server setup and management. Managed and supported Hadoop services including HDFS, Hive, Impala and Spark.
- Installed, upgraded and managed Hadoop clusters on Cloudera.
- Troubleshot many cloud-related issues such as DataNode failures, network failures, login issues and missing data blocks.
- Worked as Hadoop Admin responsible for all aspects of clusters totaling 100 nodes, ranging from POC (Proof-of-Concept) to PROD clusters, on the Cloudera (CDH 5.5.2) distribution.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Day-to-day responsibilities included resolving developer issues, deploying code from one environment to another, providing access to new users, providing quick solutions to reduce impact, and documenting them to prevent future issues.
- Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades.
- Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka and Flume.
- Migrated Flume with Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.
- Configured Kafka for efficiently collecting, aggregating and moving large amounts of clickstream data from many different sources to HDFS. Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Analyzed system failures, identified root causes and recommended courses of action.
- Interacted with Cloudera support, logged issues in the Cloudera portal and fixed them per the recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Loaded data from the local file system to HDFS using Flume with a spooling directory source.
- Retrieved data from HDFS into relational databases with Sqoop.
- Parsed, cleansed and mined meaningful data in HDFS using MapReduce for further analysis. Fine-tuned Hive jobs for optimized performance.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Involved in Chef infrastructure maintenance, including backups and security fixes on the Chef Server.
- Deployed application updates using Jenkins. Installed, configured, and managed Jenkins
- Triggered the client's SIT environment builds remotely through Jenkins.
- Deployed and configured Git repositories with branching, forks, tagging, and notifications.
- Experienced and proficient in deploying and administering GitHub
- Deployed builds to production and worked with the teams to identify and troubleshoot any issues.
- Worked on MongoDB database concepts such as locking, transactions, indexes, replication, schema design
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
- Reviewed selected issues through the SonarQube web interface.
- Developed a fully functional login page for the company's user facing website with complete UI and validations.
- Installed, configured and used AppDynamics (an application performance management tool) across the entire JBoss environment (Prod and Non-Prod).
- Reviewed the OpenShift PaaS product architecture and suggested feature improvements after researching competitors' products.
- Migrated data source passwords to encrypted passwords using the Vault tool on all the JBoss application servers
- Participated in migrations from JBoss 4 to WebLogic and from JBoss 4 to JBoss 6, and in their respective POCs.
- Responsible for upgrading SonarQube using the upgrade center.
- Resolved tickets submitted by users, including P1 issues; troubleshot, documented and resolved the errors.
- Installed and configured Hive in the Hadoop cluster and helped business users and application teams fine-tune their HiveQL to optimize performance and make efficient use of cluster resources.
- Conducted performance tuning of the Hadoop cluster, MapReduce jobs and real-time applications, applying best practices to fix design flaws.
- Implemented Oozie workflows for the ETL process for critical data feeds across the platform.
- Configured Ethernet bonding for all nodes to double the network bandwidth
- Implemented the Kerberos authentication protocol for the existing cluster.
- Built high availability for the major production cluster and designed automatic failover control using the ZooKeeper Failover Controller (ZKFC) and Quorum Journal Nodes.
Environment: Linux (CentOS, RedHat), UNIX Shell, Pig, Hive, MapReduce, YARN, Spark 1.4.1, Eclipse, Core Java, JDK 1.7, Oozie Workflows, AWS, S3, EMR, Cloudera, HBase, Sqoop, Scala, Kafka, Python, Cassandra, Maven, Hortonworks, Cloudera Manager.
Confidential, New York, NY
- Launched Amazon EC2 instances (Linux/Ubuntu/RHEL) using Amazon Web Services and configured them for specific applications.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Created S3 buckets and policies, along with IAM and role-based policies
- Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch
- Managed servers on AWS instances using Puppet and Chef configuration management.
- Developed Pig scripts to transform raw data into meaningful data as specified by business users.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce and Spark, with shell scripts for scheduling a few jobs.
- Expertise in designing and deploying Hadoop clusters and Big Data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
- Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts
- Upgraded, configured and maintained Hadoop infrastructure components such as Pig, Hive and HBase.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Tuned Hive and Pig to improve performance and resolved performance-related issues in Hive and Pig scripts, drawing on a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.
Confidential, New York, NY
- Set up, implemented and administered Hadoop infrastructure
- Analyzed technical and functional requirements documents and designed and developed QA test plans, test cases and test scenarios while maintaining the E2E process flow.
- Developed test scripts for an internal brokerage application used by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
- Designed and developed smoke and regression automation scripts and automated the functional testing framework for all modules using Selenium and WebDriver.
- Created data-driven scripts for adding multiple customers, checking online accounts, user interface validations and report validations.
- Performed cross verification of trade entry between mainframe systems, their web apps and downstream systems
- Extensively used Selenium WebDriver API to test the web application.
- Configured Selenium WebDriver, TestNG, Maven, Cucumber and a BDD framework, and created Selenium automation scripts in Java using TestNG
- Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files.
- Extensively performed DB2 database testing to validate the trade entry from mainframe to backend system.
- Developed an internal application using AngularJS and Node.js connecting to Oracle on the backend.
- Expertise in debugging issues in the front end of web-based applications developed using HTML5, CSS3, AngularJS, Node.js and Java.
- Developed a smoke automation test suite for regression testing.
- Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
- Interacted with the development team to understand the design flow, review code and discuss the unit test plan.
- Executed system, integration and regression tests in the testing environment.
- Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
- Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center and receiving the policy sign-off for the project.
Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DB Visualizer.