Hadoop Developer / Admin Resume
Manhattan, KS
SUMMARY:
- Big Data Consultant with 10 years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
- 6+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Spark, Kafka and HBase).
- Also experienced in Hadoop administration: software installation, configuration, upgrades, backup and recovery, cluster setup, daily performance monitoring, and keeping the cluster up, running, and healthy.
- Worked on installing, configuring, and administering Hadoop clusters for distributions such as Cloudera Distribution (CDH) 4 and 5 and Hortonworks (HDP) 2.1 and 2.2.
- Experience in building the Cloudera distribution of Hadoop with Knox Gateway and Apache Ranger.
- Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Perform maintenance, monitoring, deployments, and upgrades across infrastructure.
- Debugging Pig and Hive scripts and optimizing and debugging MapReduce jobs.
- Administrator for Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Hands-on experience in managing and reviewing Hadoop logs.
- Good knowledge of YARN configuration.
- Used Amazon Web Services (AWS), including Elastic MapReduce, Elasticsearch, and Cloudera Impala.
- Experience with Docker Cloud, Docker UCP, Docker container snapshots, attaching to a running container, removing images, managing directory structures, and managing containers.
- Expertise in scripting for automation and monitoring using shell, Bash, and Python scripts.
- Expertise in writing Hadoop jobs for analyzing data using HiveQL, Pig Latin (a data flow language), and custom MapReduce programs in Java.
- Experience in ServiceNow as both a developer and an administrator.
- Extending Hive and Pig core functionality by writing custom UDFs (a sketch follows this summary).
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
- Also used HBase alongside Pig/Hive when required for real-time, low-latency queries.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
- Experience in developing solutions to analyze large data sets efficiently.
- Good knowledge of Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
- Skilled in designing and creating data ingestion pipelines using technologies such as Apache Storm and Kafka.
- Maintained list of source systems and data copies, tools used in data ingestion, and landing location in Hadoop.
- Developed various shell and Python scripts to address production issues.
- Integrated clusters with Active Directory for Kerberos and User Authentication/Authorization.
- Good knowledge of data compression formats such as Parquet and Avro.
- Dealt with large transaction volumes while interfacing front-end applications written in Java, JSP, Struts, WebWork, Spring, JSF, Hibernate, web services, and EJB with WebSphere Application Server and JBoss.
- Experience in job scheduling using AutoSys.
- Delivered zero-defect code for three large projects involving changes to both the front end (Core Java, presentation services) and the back end (DB2).
- Experience in Test-Driven Development (TDD), mocking frameworks, and continuous integration (Hudson and Jenkins).
- Strong experience in designing message flows, writing complex ESQL scripts, and invoking web services through message flows.
- Designed and developed a batch framework similar to the Spring Batch framework.
- Working knowledge of Node.js and Express JavaScript Framework.
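A minimal sketch of the kind of custom Hive UDF mentioned above, written in Java against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and the string-normalization use case are illustrative assumptions, not taken from any project listed here.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that trims and lower-cases a string column so values
// coming from different source systems compare consistently in HiveQL.
public final class NormalizeString extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;                        // preserve NULLs, per Hive convention
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once packaged into a JAR, a UDF like this would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.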
TECHNICAL SKILLS:
Operating Systems: Linux, Windows, Android, UNIX
Programming Languages: Java (JDK 1.4/1.5/1.6), C/C++, MATLAB, R, HTML, SQL, PL/SQL
CI/CD: Docker, Ansible
Scripting Languages: Shell, Bash, Python, SQL, XML, Git Bash
Frameworks: Spring 2.x/3.x, Struts 1.x/2.x, Hibernate 2.x/3.x, and JPA
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Version Control: Visual SourceSafe, SVN
Web Technologies: Direct Web Remoting, HTML, XML, JMS, Core Java, J2EE, SOAP and REST web services, JSP, Servlets, EJB, JavaScript, Struts, Spring, WebWork, JSF, Ajax
Database Technologies: Oracle 8i/9i/10g, Microsoft SQL Server, DB2, and MySQL 4.x/5.x
Middleware Technologies: XML Gateway, WebSphere MQ, JMS
Others: JUnit, ANT, Maven, Android Platform, Microsoft Office, SQL Developer, DB2 Control Center, Microsoft Visio, Hudson, Subversion, GIT, Nexus, Artifactory, Trac, ServiceNow
Development Strategies: Waterfall, Agile, Pair Programming, and Test-Driven Development
PROFESSIONAL EXPERIENCE:
Confidential, Manhattan, KS
Hadoop Developer / Admin
Responsibilities:
- Worked on the Hortonworks Hadoop distribution, managing services such as HDFS, MapReduce2, Hive, Pig, HBase, Sqoop, Spark, Ambari Metrics, ZooKeeper, Falcon, and Oozie across four clusters ranging from LAB and DEV to QA and PROD.
- Monitored Hadoop cluster connectivity and security with the Ambari monitoring system.
- Led the installation, configuration, and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Developed custom Apache Spark programs for data validation to filter out unwanted data and cleanse the data (a sketch follows this section).
- Designed and Developed Sqoop scripts to extract data from a relational database into Hadoop.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources.
- Developed workflows in Oozie to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, reviewing data backups, and reviewing log files.
- Day-to-day responsibilities included resolving developer issues, deploying code from one environment to another, granting access to new users, providing immediate solutions to reduce impact, documenting those solutions, and preventing recurring issues.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Interacted with HDP support, logged issues in the support portal, and fixed them per the recommendations.
- Moved data from HDFS into relational databases with Sqoop. Parsed, cleansed, and mined useful and meaningful data in HDFS using MapReduce for further analysis.
- Collected, aggregated, and indexed various information technology logs, including syslog, NetFlow, and application logs, into Splunk indexes.
- Developed Splunk applications including technology add-ons, dashboards, tstats queries, accelerated data models, and custom visualizations.
- Responsible for developing a data pipeline using HDInsight, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Worked with Sqoop to export analyzed data from HDFS environment into RDBMS for report generation and visualization purpose.
- Performed a minor upgrade from CDH 4u3 to CDH 4u5.
- Experience with cloud-based tools like Jira, Jenkins, Git, Artifactory, creation of CI/CD pipelines.
- Developed a continuous integration and continuous deployment system that not only deploys code to environments such as Dev, Staging, and Production but also helps scale the system, along with continuous monitoring.
- Stood up an ELK stack log aggregation server using Logstash and Elasticsearch.
- Hands-on experience working with the business team to gather requirements for the AS-IS process and carry out operations successfully for the TO-BE process.
- Developed and managed application code, user interface, and third-party integration components.
- Strong experience in process improvement and re-engineering, business requirements gathering and process flowcharts.
- Experience in maintaining the necessary development documentation as needed (ex: technical design, developer notes etc.).
- Experienced with RESTful APIs such as Elasticsearch.
- Experience in performing the core configuration tasks including system policies, business rules and client scripts.
- Worked with MySQL and PostgreSQL databases.
- Expertise working on detailed discovery steps such as screen layout, variables, auto-completion, and tables in ServiceWatch.
- Innovated new user interfaces to describe big data using AngularJS, D3, and Splunk Dashboards
- Well-versed in the day-to-day administration of the ServiceNow tool and in maintaining business services and configuration item relationships within it.
- Worked on Service Catalog, Incident Management, GRC, Configuration & Change Management, and Release Management, with extensive knowledge of the Content Management System.
- Created Workflow activities and approvals. Implemented new workflows that use a variety of activities to understand how records are generated from workflows.
- Developed client scripts, UI policies, script includes, and business rules across the application as per the requirements.
- Implemented custom interceptors for Sqoop to filter data and defined channel selectors to multiplex the data into different sinks.
- Extracted data from SQL Server 2008 into data marts, views, and/or flat files for Tableau workbook consumption using T-SQL. Partitioned and queried the data in Hive for further analysis by the BI team.
- Managed Tableau extracts on Tableau Server and administered Tableau Server.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: Hortonworks Hadoop, HDFS, MapReduce, Hive, Elasticsearch, Pig, Tableau, HBase, Spark, Sqoop, Oozie, Cassandra, ZooKeeper, and Windows.
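A minimal sketch of the Spark-based data validation described above, using the Spark SQL Java API; the input path, column names, and validation rules are illustrative assumptions rather than the actual project logic.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

// Hypothetical validation job: drop records with missing keys or
// malformed timestamps before they land in the curated HDFS zone.
public class ValidateEvents {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("event-validation")
                .getOrCreate();

        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/raw/events");            // illustrative input path

        Dataset<Row> clean = raw
                .filter(col("id").isNotNull())               // reject missing keys
                .filter(col("event_ts").rlike("^\\d{4}-\\d{2}-\\d{2}")); // basic date check

        clean.write().mode("overwrite")
                .parquet("hdfs:///data/curated/events");     // cleansed output

        spark.stop();
    }
}
```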
Confidential, Jackson, MI
Hadoop Developer / Admin
Responsibilities:
- Developed several advanced MapReduce programs to process data files received from different sensors.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Used Sqoop to export data from HDFS to RDBMS.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Wrote queries in MongoDB to generate reports displayed in the dashboard.
- Installed, configured and deployed data node hosts for Hadoop Cluster deployment.
- Installed various Hadoop ecosystems and Hadoop Daemons.
- Developed interactive visualizations and rigorous analytics on cloud-scale data for DARPA using Java and Scala, D3.js, Spring Data, HBase, Apache Spark, Apache Solr, and Storm
- Maintained the cluster securely using Kerberos, keeping the cluster up and running at all times and troubleshooting whenever problems arose.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Administrator for Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Managed commissioning & decommissioning of data nodes.
- Implemented optimization, performance testing, and tuning of Hive and Pig.
- Defined workflows using the Oozie framework for automation.
- Migrated HiveQL queries to Impala to minimize query response time.
- Implemented internal single sign-on (SSO).
- Worked with HiveQL on big data of logs to perform a trend analysis of user behavior on various online modules.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Gained experience in managing and reviewing Hadoop log files.
- Wrote Hadoop Job Client utilities and integrated them into monitoring system.
- Developed a data pipeline using Kafka to store data in HDFS (a sketch follows this section).
- Responsible for creating, modifying topics (Kafka Queues) as and when required with varying configurations involving replication factors and partitions.
- Wrote shell scripts and Python scripts for job automation.
Environment: Hortonworks, HDFS, Hive, HiveQL scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, shell scripts, Python scripts, Oozie Coordinator, MySQL, and SFTP.
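A minimal sketch of the Kafka-to-HDFS pipeline pattern mentioned above; the broker address, topic name, landing path, and bounded polling loop are illustrative assumptions, and production pipelines often use tools such as Flume or Kafka Connect instead of hand-written consumers.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical consumer that drains a Kafka topic and appends each
// message as a line to a file in HDFS.
public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");        // illustrative broker
        props.put("group.id", "hdfs-sink");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             FileSystem fs = FileSystem.get(new Configuration());
             FSDataOutputStream out = fs.create(new Path("/data/landing/events.txt"))) {

            consumer.subscribe(Collections.singletonList("events"));
            for (int i = 0; i < 10; i++) {                      // bounded loop for the sketch
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    out.writeBytes(record.value() + "\n");      // one line per message
                }
            }
            out.hflush();                                       // make data visible to readers
        }
    }
}
```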
Confidential, IL
Hadoop Admin/Developer.
Responsibilities:
- Developed the application using a J2EE framework that leverages the classic Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
- Participated in requirement gathering and converting the requirements into technical specifications.
- Extensively worked on User Interface for few modules using JSPs, JavaScript.
- Used the Apache Maven tool to build, configure, package, and deploy the application project.
- Designed dynamic and multi-browser-compatible pages using HTML, CSS, jQuery, AngularJS, and JavaScript.
- Solid experience with AngularJS directives such as ng-app, ng-init, and ng-model for initializing AngularJS application data.
- Worked with Node.js, which uses a pool of background threads for file I/O and an event loop for network events.
- Involved in development of the user interface using JSP with JavaBeans, JSTL, custom tag libraries, JavaScript, CSS, jQuery, and Node.js.
- Used Node.js for writing server-side code and creating scalable network applications.
- Used Subversion to maintain different versions of the application code.
- Created the search logic using the TF-IDF algorithm and implemented it in MapReduce.
- Improved MapReduce performance using combiners and secondary sort implementations (a sketch of the combiner pattern follows this section).
- Integrated Sqoop export to Oracle tables, exporting the top 100 MapReduce results to Oracle.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Maintain Hadoop, Hadoop ecosystems, third party software, and database(s) with updates/upgrades, performance tuning and monitoring.
- Responsible for modifying API packages.
- Managed and scheduled jobs on a Hadoop cluster.
- Responsible for managing data coming from different sources.
- Experienced in managing and reviewing Hadoop log files.
- Used Oracle DB for writing SQL scripts, PL/SQL code for procedures and functions.
- Participated in development/implementation of Cloudera Hadoop environment.
Environment: Core Java, JSP, JavaScript, Jenkins, AngularJS, Node.js, JavaBeans, CSS, HTML, jQuery, Maven, Linux, Oracle, PL/SQL, Cloudera Distribution, Hadoop MapReduce, Sqoop, HBase, Cassandra, Hive, Pig.
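A minimal sketch of the combiner optimization referenced above, shown as the term-count stage a TF-IDF pipeline typically starts with; the class names and tokenization rule are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical term-count job; the reducer doubles as a combiner so counts
// are pre-aggregated on the map side and less data crosses the network.
public class TermCount {

    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text term = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    term.set(token);
                    context.write(term, ONE);        // emit (term, 1) per occurrence
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "term count");
        job.setJarByClass(TermCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);      // the combiner optimization
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```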
Confidential, Minnesota
Java developer
Responsibilities:
- Involved in development of the applications using Spring Web MVC and other components of the Spring Framework, with Spring's DispatcherServlet acting as the front controller.
- Implemented an abstract controller and mapped it to a URL in the *-servlet.xml file. Implemented the JSP corresponding to the controller, into which data was propagated via the model-and-view object returned by the controller. Designed and implemented the MVC architecture using the Spring Framework, which involved writing action classes, forms, custom tag libraries, and JSP pages (a sketch of the controller pattern follows this section).
- Developed unit level test cases using Junit, Maven as build tool and Jenkins to create and run deployment jobs.
- Used GitHub as a code repository.
- Used Spring MVC, JavaScript, and AngularJS for web page development.
- Redesigned the app using HTML5, CSS3, JavaScript, AngularJS, and Node.js.
- Automated job submission via Jenkins scripts.
- Designed, developed and maintained the data layer using Hibernate and performed configuration of Spring Application Framework.
- Used Oracle DB for writing SQL scripts, PL/SQL code for procedures and functions.
- Used Hibernate to store persistent data in the IBM DB2 UDB database and wrote HQL to access data from the database.
- Used JMS (Java Messaging Service) for asynchronous communication between different modules.
- Used XML, WSDL, UDDI, and SOAP Web Services for communicating data between different applications.
- Worked with the QA team to design the test plan and test cases for User Acceptance Testing (UAT).
Environment: Core Java, J2EE, Spring MVC, Hibernate, HTML, Junit, GitHub, Jenkins, JavaScript, JSP, Angular JS, Node JS, CSS, JDBC, DB2, PL/SQL, JMS, SVN.
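A minimal sketch of the classic Spring Web MVC controller pattern described above (an AbstractController subclass dispatched by the DispatcherServlet); the class name, view name, and request parameter are illustrative assumptions.

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.web.servlet.ModelAndView;
import org.springframework.web.servlet.mvc.AbstractController;

// Hypothetical controller: the DispatcherServlet routes a mapped URL here,
// and the returned ModelAndView selects the JSP and carries the model data.
public class AccountSummaryController extends AbstractController {
    @Override
    protected ModelAndView handleRequestInternal(HttpServletRequest request,
                                                 HttpServletResponse response) {
        ModelAndView mav = new ModelAndView("accountSummary");   // logical JSP view name
        mav.addObject("accountId", request.getParameter("accountId"));
        return mav;
    }
}
```

In the *-servlet.xml, such a controller would typically be declared as a bean and mapped to a URL through a handler mapping such as SimpleUrlHandlerMapping.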
Confidential, VA
Systems Analyst
Responsibilities:
- Performed Database development and implementation activities.
- Created SQL tables and developed queries in SQL.
- Worked on inserting, deleting, updating the data and performing the required joins when necessary.
- Responsible for daily system administration of Linux servers.
- Created Logical Volumes for Linux servers.
- Implemented NFS, NAS, and HTTP servers on Linux servers.
- Created a local YUM repository for installing and updating packages.
- Experienced with network monitoring software such as Nagios.
- Configured and troubleshot Samba on Red Hat Linux servers.
- Maintained availability, increased capacity & performance of production machines by upgrading their hardware & firmware.
- Associated with tasks involved in SAN migration project.
- Involved in discussions with design and platform engineering team.
- Worked with DBAs on installation of RDBMS database, restoration and log generation.
- Worked in a fast-paced 24x7 production environment comprising Red Hat Linux and IBM servers.
Environment: Oracle Linux 5/6, Oracle VM Manager 3.1/3.2, Oracle Enterprise Manager (OEM), Windows Server 2008 R2/2010.
Confidential
Java Developer
Responsibilities:
- Involved in Designing, Coding, Debugging and Deployment of Business Objects.
- Provided Hibernate mapping files for mapping java objects with database tables.
- Used AJAX framework for asynchronous data transfer between the browser and the server.
- Provided JMS support for the application using the WebLogic MQ API.
- Extensively used Java multi-threading for downloading files from URLs (a sketch follows this section).
- Provided code for Enterprise JavaBeans (EJB) and their configuration files for the application.
- Used Rational ClearCase version control tool to manage source repository.
- Involved in configuring and deploying the application on WebLogic Application Server 8.1.
- Provided utility classes for the application using Core Java and made extensive use of the Collections package.
- Implemented log4j by enabling logging at runtime without modifying the application binary.
- Performed various DAL and DML operations on the SQL Server database.
Environment: Unix, Java 1.5, J2EE, Spring 2.0, Hibernate, WebLogic MQ, JMS, TOAD, AJAX, JSON, JDK, SAX, JSTL, EJB, JSP 2.0, SQL Server 2005, Servlets 2.4, HTML, CSS, XML, XSLT, JavaScript, SQL, WebLogic.
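A minimal sketch of the multi-threaded URL download pattern referenced above, written in modern Java with an ExecutorService; the URLs and target directory are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical downloader: each URL is fetched on a worker thread from a
// fixed-size pool so that one slow download does not block the others.
public class ParallelDownloader {
    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList(                      // illustrative URLs
                "https://example.com/reports/a.csv",
                "https://example.com/reports/b.csv");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String spec : urls) {
            pool.submit(() -> {
                String fileName = spec.substring(spec.lastIndexOf('/') + 1);
                Path target = Paths.get("/tmp", fileName);      // illustrative target dir
                try (InputStream in = new URL(spec).openStream()) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    System.err.println("Failed to download " + spec + ": " + e.getMessage());
                }
            });
        }
        pool.shutdown();                                        // stop accepting new tasks
        pool.awaitTermination(5, TimeUnit.MINUTES);             // wait for downloads to finish
    }
}
```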