- 7+ years of professional IT experience in analysis, design, and development using Hadoop, Java/J2EE, and SQL.
- 5+ years of experience developing large-scale applications using Hadoop and other big data tools.
- Created a complete processing engine based on Cloudera distribution.
- Hands-on experience with production Hadoop applications, including administration, configuration management, debugging, and performance tuning.
- Experience in developing solutions to analyze large data sets efficiently and securely with Kerberos.
- Experience with the Hadoop 2.0 YARN (MRv2) architecture and developing YARN applications on it.
- Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Extensive hands-on experience importing and exporting data between relational database systems such as MySQL and Oracle and HDFS/Hive using Sqoop.
- Knowledge of Kafka installation and integration with Spark Streaming.
- Hands-on experience writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie.
- Excellent understanding of Hadoop distributed system architecture and design principles.
- Installed, configured, and optimized Hadoop infrastructure on the Cloudera CDH5 distribution using Puppet.
- Worked with Cloudera support to log issues in the Cloudera portal and fix them per their recommendations.
- Experience in converting MapReduce applications to Spark.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Good knowledge of job scheduling and workflow design tools such as Oozie.
- Experience in performance tuning Hadoop clusters by gathering and analyzing data on the existing infrastructure.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager, Hortonworks, Presto and Ambari.
- Good experience creating real-time data streaming solutions using Spark/Storm, Kafka, and Flume.
- Very good understanding of NoSQL databases such as MongoDB and HBase.
- Extensive experience in creating class diagrams, activity diagrams, and sequence diagrams using Unified Modeling Language (UML).
- Extended Hive and Pig core functionality by writing custom UDFs.
- Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning; and upgraded Cloudera from version 5.5 to 6.0.
- Good understanding of Data Mining and Machine Learning techniques.
- Experience in handling messaging services using Apache Kafka.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Created custom Python/shell scripts to import data via Sqoop from various SQL databases such as Teradata, SQL Server, and Oracle.
- Experience with NoSQL databases such as HBase and Cassandra.
- Experience in writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server, and MySQL.
- Working experience in development, production, and QA environments.
- Experienced in SDLC, Agile (Scrum) methodology, and iterative Waterfall.
- Well experienced in building DHCP, PXE (with Kickstart), DNS, and NFS servers, using them to build infrastructure in Linux environments, and working with Puppet for application deployment.
- Experienced in Linux administration tasks such as IP management (IP addressing, subnetting, Ethernet bonding, and static IP).
- Good communication and interpersonal skills, a committed team player and a quick learner.
Big Data Technologies: Apache Hadoop, MapReduce, Cloudera 4.3.2, HDFS, Cloudera Impala, Hortonworks, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Hue, Presto, Ranger, Zeppelin, Oozie, Kerberos.
Languages: Core Java, J2EE, SQL, PL/SQL, Unix Shell Scripting, Perl, Python.
Web/Application servers: Tomcat, JBoss 5.1.0
Databases: Oracle 11G/10G, SQL Server, DB2, Sybase, Teradata
Frameworks: Hadoop, MapReduce, MVC, Struts 2.x/1.x
IDE: IntelliJ IDEA 7.2, EditPlus 3, Eclipse 3.5, NetBeans 6.5, TOAD, PL/SQL, Teradata
Version Control: Visual SourceSafe (VSS), Subversion, CVS
Testing Technologies: JUnit 4/3.8
Office Packages: MS-Office 2010, 2007, 2003 and Visio
Operating Systems: MS-DOS, Windows XP, Windows 7, UNIX and Linux
Confidential, Chicago, IL
- Supported 200+ servers and 50+ users on the Hadoop platform, resolved the tickets and issues they ran into, and trained users on Hadoop usage and best practices.
- Installed, configured, and optimized Hadoop infrastructure on the Cloudera CDH5 distribution using Puppet.
- Good understanding of the architecture, configuration, and security of Cassandra with Falcon, including the Cassandra read and write paths.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Developed simple and complex MapReduce programs in Java for Data Analysis.
- Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts and visualized the streaming data in Tableau dashboards.
- Managed and reviewed Hadoop log files, handled file system management, and monitored the Hadoop cluster for capacity planning.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Hands-on experience working with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
- Wrote Flume configuration files to store streaming data in HDFS.
- Upgraded Kafka from 0.8.2.2 to 0.9.0.0.
- As an admin, was involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Performed analytics on HDFS data using MapReduce, Hive, and Pig, sent the results back to MongoDB databases, and updated information in collections.
- Used a RESTful web services API to connect to the MapR table and was involved in developing the database connection through that API.
- Currently working as Hadoop Admin, responsible for everything related to the clusters, which total 100 nodes and range from POC to PROD.
- Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, monitoring, etc.
- Installed and configured OpenStack Juno on Red Hat 6 with multiple compute nodes.
- Involved in loading data from the UNIX file system to HDFS and created custom Solr query components to enable optimum search matching.
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
- Installed and configured the CDH Hadoop environment and was responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Used Jira for project tracking, bug tracking, and project management. Loaded data from various data sources into HDFS using Kafka.
- Used Python scripts to update content in the database and manipulate files.
- Generated Python Django Forms to record data of online users.
- Installed Kafka on the Hadoop cluster and wrote the producer and consumer code in Java to establish a connection from a Twitter source to HDFS for popular hashtags.
- Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
- Involved in requirement analysis, design, development, data design and mapping, extraction, validation, and creating complex business requirements.
Environment: Cloudera 4.3.2, CDH4.7, Hadoop-2.0.0, HDFS, MapReduce, Hue, MongoDB-2.6, Hive-0.10, Sqoop-1.4.3, Oozie-3.3.4, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, YARN, Falcon, Kerberos, Impala, Pig, Python Scripting, MySQL, Perl, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager.
Confidential, Minneapolis, MN
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Installed and configured Sqoop to import and export data between relational databases and MapR-FS, HBase, and Hive.
- Administered large MapR Hadoop environments, including cluster build and setup, support, performance tuning, and monitoring in an enterprise environment.
- Installed and configured MapR-zookeeper, MapR-cldb, MapR-job tracker, MapR-task tracker, MapR-resource manager, MapR-node manager, MapR-fileserver, and MapR-webserver.
- Worked independently with Cloudera support on any issues or concerns with the Hadoop cluster.
- Point of contact for vendor escalation. Upgraded Cloudera Manager from version 5.3 to 5.5.
- Installed and configured Knox gateway to secure Hive through ODBC, WebHCat, and Oozie services.
- Loaded data from relational databases into the MapR-FS file system and HBase using Sqoop.
- Set up MapR metrics with a NoSQL database to log metrics data.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Optimized Hadoop cluster components to achieve high performance.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
- Integrated HDP clusters with Active Directory and enabled Kerberos for Authentication.
- Worked on commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Worked on creating the Data Model for HBase from the current Oracle Data model.
- Implemented high availability and automatic failover infrastructure to overcome the NameNode single point of failure using ZooKeeper services.
- Leveraged Chef to manage and maintain builds in various environments.
- Planned for hardware and software installation on production cluster and communicated with multiple teams to get it done.
- Troubleshot applications, developed and deployed many Python bug fixes, and fine-tuned existing processes following advanced patterns and methodologies.
- Monitored Hadoop cluster operation through the MapR Control System (MCS).
- Worked on NoSQL databases including HBase.
- Created Hive tables, loaded data into them, and wrote Hive UDFs.
- Worked with Linux server admin team in administering the server hardware and operating system.
- Worked closely with data analysts to construct creative solutions for their analysis tasks.
- Managed and reviewed Hadoop and HBase log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Supported setup of the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Perl, Pig 0.11.1, Kerberos, Ranger, Presto, Hue, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Zeppelin, Java, Python, SQL, PL/SQL, Oracle 10g, Eclipse
Hadoop Administrator/Tester
- Set up Hortonworks clusters and installed all ecosystem components through Ambari and manually from the command line.
- Performed cluster maintenance, monitoring, commissioning and decommissioning of DataNodes, troubleshooting, and log file management and review.
- Actively involved in installation, performance tuning, patching, regular backups, user account administration, upgrades, and documentation.
- Installed and removed components through Ambari.
- Configured ZooKeeper to coordinate the servers in clusters and maintain data consistency.
- Used Cloudera Navigator for data governance: audit and lineage.
- Periodically reviewed Hadoop related logs and fixed errors.
- Commissioned new cluster nodes for increased capacity and decommissioned servers with hardware problems.
- Responsible for adding new ecosystem components such as Storm, Flume, and Knox, with custom configurations based on the requirements and the Hadoop daemons.
- Developed Python, shell, and PowerShell scripts for automation.
- Implemented Kerberos authentication for the existing cluster.
- Worked with Ranger and Knox configuration to provide centralized security for Hadoop services.
- Created standalone Python libraries for functionality shared across multiple projects.
- Hands-on experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Maintained MySQL databases, including database creation, user setup, and backups.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Performed Linux systems administration on production and development servers (Red Hat Linux, CentOS and other UNIX utilities).
Environment: Hadoop HDFS, MapReduce, Hortonworks, Falcon, Cloudera, Ambari, Ranger, Knox, Puppet, Hive, Pig, Kafka, Oozie, Sqoop, Shell, Python, MongoDB, Apache HBase.
- Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
- Installation and Configuration of Hadoop Cluster.
- Worked with a plugin that allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly; the plugin also provides data locality for Hadoop across host nodes and virtual machines.
- Developed MapReduce jobs to analyze data and provide heuristic reports.
- Adding, Decommissioning and rebalancing nodes.
- Created a POC to store server log data in Cassandra to identify system alert metrics.
- Rack Aware Configuration.
- Configuring Client Machines.
- Configuring, Monitoring and Management Tools.
- HDFS Support and Maintenance.
- Cluster HA Setup.
- Applying Patches and Perform Version Upgrades.
- Incident Management, Problem Management and Change Management.
- Performance Management and Reporting.
- Contributed to building hands-on tutorials for the community on how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow.
- Developed and designed automation framework using Python and Shell scripting.
- Recovered from NameNode failures.
- Scheduled MapReduce jobs with FIFO and fair-share scheduling.
- Installation and configuration of other open source software such as Pig, Hive, HBase, Flume, and Sqoop.
- Integration with RDBMS using Sqoop and JDBC connectors.
- Worked with the dev team to tune jobs; knowledge of writing Hive jobs.
Environment: HDP 2.2, Python, HBase, Kafka, HDFS, YARN, Hortonworks, MongoDB, Hive, Oozie, Pig, Sqoop, Shell Scripting, MySQL, RHEL, CentOS, Ambari.
- Configuring and tuning system and network parameters for optimum performance.
- Gained troubleshooting and problem-solving skills, including application- and network-level troubleshooting.
- Gained experience writing shell scripts to automate tasks.
- Identifying and triaging outages; monitoring and remediating system and network performance.
- Installing, Upgrading and Managing Hadoop Cluster on Hortonworks.
- Developing tools to automate the deployment, administration, and monitoring of a large-scale Linux environment.
- Performing server tuning, operating system upgrades.
- Participating in the planning phase for system requirements on various projects for deployment of business functions.
- Participating in 24x7 on-call rotation and maintenance windows.
- Communication & coordination with internal / external groups and operations.
Environment: Windows 2008/2007 server, Unix Shell Scripting, SQL Manager Studio, Red Hat Linux, Microsoft SQL Server 2000/2005/2008, MS Access, Hortonworks, NoSQL, Linux/Unix, Putty Connection Manager, Putty, SSH.