Hadoop Developer Resume
New York, NY
PROFESSIONAL SUMMARY:
- 7+ years of professional experience in the IT field, spanning project development, implementation, deployment, and maintenance using Hadoop ecosystem technologies, with domain knowledge in Finance, Banking, Communications, Insurance, Retail, and Healthcare.
- 4+ years of hands-on experience with Hadoop ecosystem technologies such as HDFS, MapReduce, YARN, Spark, Hive, Pig, Oozie, Sqoop, Flume, ZooKeeper, and HBase.
- Two years of hands-on experience using the Spark framework with Scala.
- 3+ years of Java programming experience developing web-based applications and client-server technologies.
- In-depth understanding of Hadoop architecture and its components, including JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce concepts.
- Proficient with Apache Spark and Apache Storm for processing real-time data.
- Extensive knowledge of programming with Resilient Distributed Datasets (RDDs).
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
- Experience installing and configuring Spark in standalone mode for testing and development environments.
- Developed simple to complex MapReduce jobs in Java.
- Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
- Extensive experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed UDF, UDAF, and UDTF functions for Hive and Pig (a minimal UDF sketch follows this summary).
- Good knowledge of partitioning and bucketing concepts; designed and managed partitions and buckets and created external tables in Hive to optimize performance.
- Good experience with Avro files, RC files, combiners, and counters for best practices and performance improvements.
- Good knowledge of join, grouping, and aggregation concepts, and resolved performance issues in Hive and Pig scripts by applying them.
- Experience with Big Data ML toolkits such as Mahout and Spark ML.
- Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Experience importing data from relational database management systems (RDBMS) such as MySQL and Oracle into HDFS and Hive, and exporting the processed data back to the RDBMS using Sqoop.
- Experience importing data from RDBMS into HBase and exporting data back to RDBMS using Sqoop.
- Implemented Flume for collecting, aggregating, and moving large amounts of server logs and streaming data to HDFS.
- Experience in HBase cluster setup and implementation.
- Administered, installed, upgraded, and managed Cassandra distributions.
- Good knowledge of troubleshooting and performance tuning Cassandra clusters, and of application-driven Cassandra data modeling.
- Experience setting up Hadoop in a pseudo-distributed environment.
- Experience setting up Hive, Pig, HBase, and Sqoop on the Ubuntu operating system.
- Experience as a Java Developer in web and client-server technologies using Java, J2EE, Servlets, JSP, EJB, Hibernate, and the Spring Framework.
- Good understanding of the Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies, including Waterfall and Agile.
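For illustration, a minimal sketch of the kind of Hive UDF mentioned above; the class name, column handling, and registration call are placeholders rather than code from a specific project:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Illustrative Hive UDF: trims and lower-cases a string column.
    // Registered from HiveQL after ADD JAR with, for example:
    //   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
    public final class NormalizeText extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null; // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }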
PROFESSIONAL EXPERIENCE:
HADOOP DEVELOPER
Confidential, New York, NY
Responsibilities:
- Developed architecture documents, process documentation, server diagrams, and requisition documents.
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, Hive, and Sqoop.
- Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
- Gathered the security requirements for Hadoop and integrated with the Kerberos authentication infrastructure, including KDC server setup and management. Managed and supported Hadoop services including HDFS, Hive, Impala, and Spark.
- Installed, upgraded, and managed Hadoop clusters on Cloudera.
- Troubleshot many cloud-related issues such as DataNodes going down, network failures, login issues, and missing data blocks.
- Worked as a Hadoop Admin responsible for everything related to clusters totaling 100 nodes, ranging from POC (proof-of-concept) to production clusters on the Cloudera (CDH 5.5.2) distribution.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Day-to-day responsibilities included resolving developer issues, deploying code from one environment to another, granting access to new users, providing immediate fixes to reduce impact, and documenting issues to prevent recurrence.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume.
- Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive (see the sketch at the end of this section).
- Configured Kafka to efficiently collect, aggregate, and move large amounts of clickstream data from many different sources to HDFS. Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Interacted with Cloudera Support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Used Flume with a spooling directory source to load data from the local file system into HDFS.
- Exported data from HDFS into relational databases with Sqoop.
- Parsed, cleansed, and mined useful, meaningful data in HDFS using MapReduce for further analysis, and fine-tuned Hive jobs for optimized performance.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Involved in Chef infrastructure maintenance, including backups and security fixes on the Chef server.
- Deployed application updates using Jenkins; installed, configured, and managed Jenkins.
- Triggered the client's SIT environment builds remotely through Jenkins.
- Deployed and configured Git repositories with branching, forks, tagging, and notifications.
- Experienced and proficient in deploying and administering GitHub.
- Deployed builds to production and worked with the teams to identify and troubleshoot any issues.
- Worked on MongoDB database concepts such as locking, transactions, indexes, replication, and schema design.
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
- Reviewed selected issues through the SonarQube web interface.
- Developed a fully functional login page for the company's user-facing website with complete UI and validations.
- Installed, configured, and used AppDynamics (an application performance management tool) across the whole JBoss environment (prod and non-prod).
- Reviewed the OpenShift PaaS product architecture and suggested improvements after researching competitors' products.
- Migrated data source passwords to encrypted passwords using the Vault tool on all JBoss application servers.
- Participated in migrations from JBoss 4 to WebLogic and from JBoss 4 to JBoss 6, and in the respective POCs.
- Responsible for upgrading SonarQube using the update center.
- Resolved tickets submitted by users and P1 issues, troubleshooting, documenting, and resolving the errors.
- Installed and configured Hive in the Hadoop cluster and helped business users and application teams fine-tune their HiveQL for optimized performance and efficient use of cluster resources.
- Conducted performance tuning of the Hadoop cluster and MapReduce jobs, as well as of real-time applications, applying best practices to fix design flaws.
- Implemented Oozie workflows for the ETL process for critical data feeds across the platform.
- Configured Ethernet bonding for all nodes to double the network bandwidth.
- Implemented the Kerberos authentication protocol for the existing cluster.
- Built high availability for the major production cluster and designed automatic failover control using the ZooKeeper Failover Controller (ZKFC) and Quorum Journal Nodes.
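For illustration, a minimal sketch of a Spark Streaming job of the kind described above, consuming clickstream records from Kafka and landing each micro-batch under an HDFS path that an external Hive table could point at; the broker, topic, group id, batch interval, and output path are placeholder assumptions:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public final class ClickstreamStreamingJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("ClickstreamStreamingJob");
            // 30-second micro-batches (interval is an assumption).
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "kafka-broker:9092"); // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "clickstream-consumer");       // placeholder group
            kafkaParams.put("auto.offset.reset", "latest");

            // Direct (receiver-less) stream from the placeholder "clickstream" topic.
            JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(Arrays.asList("clickstream"), kafkaParams));

            // Write each non-empty batch under a path backed by an external Hive table,
            // so the data becomes queryable from Hive as it lands.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("/data/clickstream/batch-" + time.milliseconds());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }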
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, JavaScript, JSP, PySpark 2.x, Kafka, Spark, Scala, ETL, Python.
HADOOP DEVELOPER
Confidential, Dallas, TX
Responsibilities:
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, Hive, and Sqoop.
- Conducted a POC on Hortonworks and suggested best practices for the HDP and HDFS platforms.
- Set up the Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Installed the Ambari server in the cloud.
- Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
- Assigned access to users through multi-user logins.
- Installed and configured the CDH cluster, using Cloudera Manager to manage the existing Hadoop cluster.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
- Knowledgeable in documenting processes, creating server diagrams, and preparing server requisition documents.
- Set up machines with network controls, static IPs, disabled firewalls, and swap memory.
- Managed the cluster configuration to meet the needs of the analysis, whether I/O-bound or CPU-bound.
- Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
- Performance-tuned and managed growth of the OS, disk usage, and network traffic.
- Responsible for building scalable distributed data solutions using Hadoop.
- Performed architecture design, data modeling, and implementation of the big data platform and analytics applications for the consumer products.
- Analyzed the latest big data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Managed jobs using the Fair Scheduler.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Tuned Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Managed datasets using Pandas data frames and MySQL; queried the MySQL database from Python using the Python MySQL connector (MySQLdb package) to retrieve information.
- Created Oozie workflows to run multiple MapReduce, Hive, and Pig jobs.
- Set up the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Involved in the development of Spark Streaming jobs for various data sources using Scala.
- Imported data from different sources such as HDFS and MySQL into Spark RDDs (see the sketch at the end of this section).
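For illustration, a minimal sketch of the kind of load described above, shown with the Spark Java API rather than the Scala used on the project; the HDFS path, JDBC URL, table, and credentials are placeholders, and a MySQL JDBC driver is assumed on the classpath:

    import java.util.Properties;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class SourceLoadJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("SourceLoadJob")
                    .getOrCreate();

            // Text data already landed on HDFS (placeholder path).
            JavaRDD<String> hdfsLines = spark.read()
                    .textFile("hdfs:///data/input/events")
                    .javaRDD();

            // Reference data pulled from MySQL over JDBC (placeholder URL, table, credentials).
            Properties connProps = new Properties();
            connProps.setProperty("user", "etl_user");
            connProps.setProperty("password", "changeme");
            Dataset<Row> customers = spark.read()
                    .jdbc("jdbc:mysql://mysql-host:3306/sales", "customers", connProps);

            // Downstream code can work with the RDD view of either source.
            JavaRDD<Row> customerRows = customers.javaRDD();
            System.out.println("HDFS lines: " + hdfsLines.count()
                    + ", MySQL rows: " + customerRows.count());

            spark.stop();
        }
    }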
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)
JAVA DEVELOPER
Confidential, New York, NY
Responsibilities:
- Launched Amazon EC2 instances on AWS (Linux/Ubuntu/RHEL) and configured the instances for specific applications.
- Wrote design documents based on requirements from the MMSEA user guide.
- Performed requirements gathering, design, coding, testing, implementation, and deployment.
- Worked on modeling dialog processes and business processes, and on coding business objects, QueryMapper, and JUnit files.
- Involved in the design and creation of class diagrams, sequence diagrams, and activity diagrams using UML models.
- Created the business object methods in Java, integrating the activity diagrams.
- Involved in developing JSP pages using Struts custom tags, jQuery, and the Tiles framework.
- Used JavaScript for client-side validation and the Struts Validator framework for server-side validation.
- Worked on web services using SOAP and WSDL.
- Wrote QueryMappers and JUnit test cases; experience with MQ.
- Developed the UI using XSL and JavaScript.
- Managed software configuration using ClearCase and SVN.
- Designed, developed, and tested features and enhancements.
- Performed error rate analysis of production issues and technical errors.
- Developed a test environment for testing all the web services exposed as part of the core module and their integration with partner services in integration testing.
- Analyzed the user requirements document and developed a test plan, including test objectives, test strategies, test environment, and test priorities.
- Responsible for performing end-to-end system testing of the application, writing JUnit test cases (see the sketch at the end of this section).
- Performed functional testing, performance testing, integration testing, regression testing, smoke testing, and User Acceptance Testing (UAT).
- Converted complex SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to the Hadoop cluster.
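For illustration, a minimal sketch of the style of JUnit test case written for the application; the validator class and its rule are hypothetical stand-ins, not actual project classes:

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    // Hypothetical validator standing in for a real business object.
    class ClaimIdValidator {
        boolean isValid(String claimId) {
            // Accept IDs such as "CLM-000123": a fixed prefix followed by six digits.
            return claimId != null && claimId.matches("CLM-\\d{6}");
        }
    }

    public class ClaimIdValidatorTest {
        private final ClaimIdValidator validator = new ClaimIdValidator();

        @Test
        public void acceptsWellFormedId() {
            assertTrue(validator.isValid("CLM-000123"));
        }

        @Test
        public void rejectsNullAndMalformedIds() {
            assertFalse(validator.isValid(null));
            assertFalse(validator.isValid("000123"));
        }
    }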
Environment: Shell Scripting, Java 6, JEE, Spring, Hibernate, Eclipse, Oracle 10g, JavaScript, Servlets, Node.js, JMS, Ant, Log4j, JUnit, Hadoop (Pig & Hive).