Sr. Hadoop Developer/admin Resume
Seattle, WA
SUMMARY
- Around 8 years of experience in software administration and development, including 4+ years developing large-scale applications using Hadoop and other Big Data tools.
- Experienced in the Hadoop ecosystem components like Hadoop Map Reduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Cassandra.
- Experience in developing solutions to analyze large data sets efficiently.
- Experience with distributed systems, large - scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
- Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR) to process and manage the Hadoop framework on dynamically scalable Amazon EC2 instances.
- In-depth understanding of Hadoop architecture and its various components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience with Amazon Web Services, the AWS command line interface, and AWS Data Pipeline.
- Experience in writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server 2014/2012, MySQL, and IBM DB2.
- Hands-on experience with database tuning and query tuning.
- Excellent understanding/knowledge of design and implementation of Teradata data warehousing solutions, Teradata Aster big data analytics and Analytic Applications.
- Good working experience using Spark SQL to manipulate DataFrames in Python (a short sketch follows this summary).
- Good knowledge in NoSQL databases including Cassandra and MongoDB.
- Good knowledge of socket programming to communicate between clients and servers.
- Excellent understanding of how Socket Programming enables two or more hosts to communicate with each other.
- Experience with MongoDB native drivers, including the Java and Python drivers.
- Experience in building rich applications using complex queries and secondary indexes that unlock the value in structured, semi-structured, and unstructured data in MongoDB.
- Used Cassandra to manage structured data designed to scale to a very large size across many commodity servers with no single point of failure.
- Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts, and Hive data models.
- Excellent understanding/knowledge of Hadoop Distributed system architecture and design principles.
- Experience in converting MapReduce applications to Spark.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Good knowledge in using job scheduling and workflow designing tools like Oozie.
- Experience working with BI teams to transform big data requirements into Hadoop-centric solutions.
- Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager and Apache Ambari.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Good understanding of Data Mining and Machine Learning techniques.
- Experience in handling messaging services using Apache Kafka.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various Map Reduce applications to perform ETL workloads on terabytes of data.
- Experienced in developing and implementing web applications using Java, J2EE, JSP, Servlets, JSF, HTML, DHTML, EJB, JavaScript, AJAX, JSON, JQuery, CSS, XML, JDBC and JNDI.
- Working experience in Development, Production and QA Environments.
- Involved in all phases of Software Development Life Cycle (SDLC) in large scale enterprise software using Object Oriented Analysis and Design.
- Possess strong skills in application programming and system programming using C++ and Python on Windows and LINUX platforms using principles of Object Oriented Programming (OOPS) and Design Patterns.
- Involved in the development of a simulator used by controllers to simulate real-time scenarios using C/C++ programming.
- Working experience with version control tools like SVN, CVS, ClearCase, and PVCS.
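As a brief illustration of the Spark SQL and DataFrame work noted above, the following is a minimal PySpark sketch of loading a delimited file into a DataFrame and querying it through a temporary view. The file path, column names, and the "orders" view name are illustrative assumptions, and the sketch uses the Spark 2.x SparkSession entry point (older 1.x deployments would use SQLContext/HiveContext instead).

```python
# Minimal PySpark sketch: load a CSV into a DataFrame and query it with Spark SQL.
# File path, column names, and the "orders" view name are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-summary").getOrCreate()

# Read a delimited file into a DataFrame, inferring the schema from the data.
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/orders.csv"))

# Register the DataFrame as a temporary view so it can be queried with SQL.
orders.createOrReplaceTempView("orders")

# Aggregate with Spark SQL and keep working with the result as a DataFrame.
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")

daily_totals.orderBy("order_date").show(20)
spark.stop()
```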
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, Hana, AWS, Map Reduce, Pig, Sqoop, Oozie, Zookeeper, YARN, Avro, Spark
Scripting Languages: Shell, Python, Perl
Tools: Quality Center v11.0/ALM, TOAD, JIRA, HP QTP, HP UFT, Selenium, TestNG, JUnit
Programming Languages: Java, C++, C, SQL, PL/SQL
QA methodologies: Waterfall, Agile, V-model.
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP
Java Frameworks: MVC, jQuery, Apache Struts 2.0, Spring, and Hibernate
Defect Management: Jira, Quality Center.
Domain Knowledge: GSM, WAP, GPRS, CDMA and UMTS (3G)
Web Services: SOAP (JAX-WS), WSDL, SOA, Restful (JAX-RS), JMS
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2
NoSQL Databases: HBase, MongoDB, Cassandra (DataStax Enterprise 4.6.1)
RDBMS: Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, and PL/SQL
Operating Systems: Linux, UNIX, Mac, Windows NT/98/2000/XP/Vista, Windows 7
PROFESSIONAL EXPERIENCE
Confidential, Seattle, WA
Sr. Hadoop developer/Admin
Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration, and support for Hadoop.
- Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing (a simplified sketch follows this section).
- Worked with the team to grow the cluster from 28 nodes to 42 nodes; the additional data nodes were configured through the Hadoop commissioning process.
- Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Involved in implementing HDInsight version 3.3 clusters, which are based on Spark version 1.5.1.
- Good knowledge of the components used in the cluster, including the Spark Core, Spark SQL, and Spark Streaming APIs.
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, manage and review data backups and log files.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters and Experience in converting MapReduce applications to Spark.
- Managed and scheduled Jobs on a Hadoop cluster.
- Involved in defining job flows, managing and reviewing log files.
- Installed Oozie workflow engine to run multiple Map Reduce, Hive HQL and Pig jobs.
- Collected the log data from web servers and integrated into HDFS using Flume.
- Cassandra development: set up, configured, and optimized the Cassandra cluster; developed a real-time Java-based application to work with the Cassandra database.
- Involved in HDFS maintenance and administered it through the Hadoop Java API.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts; experienced in managing and reviewing Hadoop log files.
- Constructed system components and developed the server-side part using Java, EJB, and the Spring Framework; involved in designing the data model for the system.
- Used J2EE design patterns like DAO, MODEL, Service Locator, MVC and Business Delegate.
- Defined Interface Mapping between JDBC Layer and Oracle Stored Procedures.
- Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; worked on tuning the performance of Pig queries.
- Implemented a script to transmit sysprin information from Oracle to Hbase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs.
- Performed component unit testing using the Azure Emulator.
- Analyzed escalated incidents within the Azure SQL database.
- Implemented test scripts to support test driven development and continuous integration.
Environment: Hadoop, MapReduce, Spark, Shark, Kafka, AWS, HDFS, ZooKeeper, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Sqoop, Flume, Oracle 11g, Cassandra, SQL, SharePoint, Azure 2015, UNIX Shell Scripting.
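The MapReduce cleansing jobs in this role were written in Java; as a compact, hedged illustration of the same idea, the sketch below uses a Hadoop Streaming mapper in Python instead. The tab delimiter, expected field count, and normalization rules are assumptions, not the project's actual logic.

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming mapper sketch for record cleansing.
# The actual jobs on this project were Java MapReduce; this Python version
# only illustrates the cleansing idea. Field count and delimiter are assumed.
import sys

EXPECTED_FIELDS = 5  # hypothetical record width

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Drop records that do not have the expected number of fields.
    if len(fields) != EXPECTED_FIELDS:
        continue
    # Normalize whitespace and case before writing the record back out.
    cleaned = [f.strip().lower() for f in fields]
    print("\t".join(cleaned))
```

A map-only job like this would typically be launched with the hadoop-streaming jar, with the number of reducers set to zero so the cleansed records are written straight back to HDFS.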
ConfidentiaL, Port Washington, NY
Hadoop developer/Admin
Responsibilities:
- Working as administrator on the Hortonworks (HDP 2242) distribution for 10 clusters ranging from POC to PROD
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files
- Day-to-day responsibilities include solving developer issues, handling deployments that move code from one environment to another, providing access to new users, providing quick solutions to reduce impact, documenting them, and preventing future issues
- Hands-on experience writing MR jobs to cleanse the data and copy it from our cluster to the AWS cluster
- Experienced in adding/installing new components and removing them through Ambari
- Monitoring systems and services through Ambari dashboard to make the clusters available for the business
- Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures
- Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans
- Changed configurations based on user requirements to improve job performance
- Experienced in Ambari-alerts configuration for various components and managing the alerts
- Provided security and authentication with Ranger, where Ranger Admin provides administration and Ranger Usersync adds new users to the cluster
- Good troubleshooting skills with Hue, which provides a GUI for developers and business users for day-to-day activities
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis
- Implemented complex MapReduce programs to perform joins on the map side using the distributed cache (see the sketch after this section)
- Set up Flume for different sources to bring log messages from outside into Hadoop HDFS
- Implemented Name Node HA in all environments to provide high availability of clusters
- Working experience maintaining MySQL databases: creating databases, setting up users, and maintaining backups of cluster metadata databases with cron jobs
- Setting up MySQL master and slave replications and helping business applications to maintain their data in MySQL Servers
- Helping the users in production deployments throughout the process
- Experienced in production support, which involves solving user incidents ranging from sev1 to sev5
Environment: Hadoop, MapReduce, AWS, HDFS, Pig, Hive, YARN, HBase, Sqoop, Flume, Zookeeper, Hortonworks, Eclipse, MySQL, UNIX Shell Scripting
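The map-side joins mentioned above were implemented in Java with the distributed cache; the sketch below shows the equivalent pattern as a hedged Hadoop Streaming mapper in Python, where a small lookup file shipped with -files is read into memory before the large input is streamed. The file name states.txt and the field layout are hypothetical.

```python
#!/usr/bin/env python
# join_mapper.py -- sketch of a map-side join for Hadoop Streaming.
# The project used Java MapReduce with the distributed cache; in streaming,
# a small lookup file shipped with -files appears in the task's working
# directory under its own name. File name and field layout are assumptions.
import sys

# Load the small dimension table (e.g. state_code -> state_name) into memory.
lookup = {}
with open("states.txt") as f:
    for line in f:
        code, name = line.rstrip("\n").split("\t")
        lookup[code] = name

# Stream the large fact records and enrich each one from the in-memory map.
for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    state_code = fields[0]
    fields.append(lookup.get(state_code, "UNKNOWN"))
    print("\t".join(fields))
```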
Confidential, San Mateo, CA
Hadoop Admin
Responsibilities:
- Involved inHadoopcluster administration: adding and removing nodes in a cluster, cluster capacity planning, performance tuning, cluster monitoring, troubleshooting
- Responsible for managing data coming from different sources and involved in HDFS maintenance and the loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioural data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
- Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
- Created a customized BI tool for the manager team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Estimated the hardware requirements for NameNode and DataNodes & planning the cluster.
- Created Hive generic UDFs, UDAFs, and UDTFs in Python to process business logic that varies based on policy (see the sketch after this section).
- Hands-on experience in writing MR jobs to cleanse the data and copy it from our cluster to the AWS cluster
- Used open source web scraping framework for python to crawl and extract data from web pages.
- Possess strong skills in application programming and system programming using C++ and Python on Windows and LINUX platforms using principles of Object Oriented Programming (OOPS) and Design Patterns
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Worked with various admin interfaces like NzAdmin, NZWebAdmin, NZportal, and NZSQL for troubleshooting and controlling the Netezza appliance
- Optimizing the Hive queries using Partitioning and Bucketing techniques, for controlling the data distribution.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system. Worked with NoSQL database Hbase to create tables and store data.
- Worked on custom Pig Loaders and storage classes to work with variety of data formats such as JSON and XML file formats.
- Involved in Cassandra Data Modelling and Analysis and CQL (Cassandra Query Language).
- Experience in Upgrading Apache Ambari, CDH and HDP Cluster.
- Configured and Maintained different topologies in Storm cluster and deployed them on regular basis.
- Experienced with different kind of compression techniques like LZO, GZip, and Snappy.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Created Data Pipeline of Map Reduce programs using Chained Mappers.
- Implemented Optimized join base by joining different data sets to get top claims based on state using Map Reduce.
- Implemented MapReduce programs to perform joins on the map side using the distributed cache in Java; developed unit test cases using JUnit, EasyMock, and MRUnit testing frameworks.
- Experience in upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Involved in Agile SDLC during the development of project.
- Created a complete processing engine based on Cloudera's distribution, enhanced for performance.
- Experienced in Monitoring Cluster using Cloudera manager.
Environment: Hadoop, HDFS, HBase, MapReduce, Java, AWS, JDK 1.5, J2EE 1.4, Struts 1.3, Hive, Pig, Sqoop, Flume, Kafka, Oozie, C++, Hue, Storm, Zookeeper, AVRO Files, SQL, ETL, Python, Cassandra, Cloudera Manager, MySQL, MongoDB.
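The Python-based Hive UDF work referenced above is commonly wired in through Hive's TRANSFORM clause with a streaming script; the sketch below assumes that mechanism. The column names, policy types, and adjustment factors are hypothetical, not the project's actual business rules.

```python
#!/usr/bin/env python
# policy_transform.py -- sketch of custom row-level logic run through Hive's
# TRANSFORM clause. Hive streams rows to this script as tab-separated text on
# stdin and reads transformed rows back from stdout.
# Column layout, policy types, and adjustment factors are hypothetical.
import sys

for line in sys.stdin:
    policy_id, policy_type, premium = line.rstrip("\n").split("\t")
    amount = float(premium)
    # Business rule varies by policy type (illustrative values only).
    if policy_type == "AUTO":
        amount *= 1.05
    elif policy_type == "HOME":
        amount *= 1.02
    print("\t".join([policy_id, policy_type, "%.2f" % amount]))
```

In HiveQL such a script would be registered with `ADD FILE policy_transform.py;` and invoked roughly as `SELECT TRANSFORM(policy_id, policy_type, premium) USING 'policy_transform.py' AS (policy_id, policy_type, adjusted_premium) FROM policies;`.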
Confidential, Moberly, MO
Hadoop Developer
Responsibilities:
- Currently working as Hadoop Admin, responsible for everything related to clusters totaling 60 nodes, ranging from POC to PROD
- Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari and manually from the command line
- Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations (see the sketch after this section)
- Experience in Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade
- Responsible for Installation of various Hadoop Ecosystems and Hadoop Daemons
- Working experience maintaining MySQL databases: creating databases, setting up users, and maintaining database backups
- Implemented Kerberos Security Authentication protocol for existing cluster
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately
- Involved in transforming data from mainframe tables to HDFS and HBase tables using Sqoop and Pentaho Kettle; also worked on Impala to analyze stored data
- Have a deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment; supported and managed Hadoop clusters using Apache, Hortonworks, Cloudera, and MapReduce
- Involved in loading data from the UNIX file system to HDFS; created custom Solr query components to enable optimum search matching
- Involved in writing MapReduce programs and testing them using MRUnit
- Installed and configured a local Hadoop cluster with 3 nodes and set up a 4-node cluster on the EC2 cloud
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in java for data cleaning and pre-processing
- Performing Linux systems administration on production and development servers (Red Hat Linux, CentOS and other UNIX utilities)
- Installation and Configuration of VMware vSphere client, Virtual Server creation and resource allocation
- Monitored the Hadoop cluster through Cloudera Manager, implemented alerts based on error messages, and provided reports to management on cluster usage metrics
Environment: HDFS, MapReduce, HBase, Kafka, YARN, MongoDB, Hive, Impala, Oozie, Pig, Sqoop, Shell Scripting, MySQLdb, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager
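To illustrate the Kafka side of the log-loading work above, here is a minimal producer sketch using the kafka-python client; the broker address, topic name, and log file path are assumptions. On the project itself the ingestion was handled with Flume and Kafka pipelines rather than a standalone script like this.

```python
#!/usr/bin/env python
# log_producer.py -- sketch of publishing log lines to a Kafka topic with the
# kafka-python client. Broker address, topic name, and log path are assumptions.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="broker1:9092")

with open("/var/log/app/server.log") as log_file:
    for line in log_file:
        # Each log line becomes one message on the assumed "app-logs" topic.
        producer.send("app-logs", line.rstrip("\n").encode("utf-8"))

# Block until buffered messages are delivered before exiting.
producer.flush()
producer.close()
```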
Confidential
Java Developer
Responsibilities:
- Designed a system and developed a framework using J2EE technologies based on MVC architecture.
- Involved in the iterative/incremental development of project application. Participated in the requirement analysis and design meetings.
- Designed and Developed UI’s using JSP by following MVC architecture
- Designed and developed Presentation Tier using Struts framework, JSP, Servlets, TagLibs, HTML and JavaScript.
- Designed the control which includes Class Diagrams and Sequence Diagrams using VISIO.
- Used the Struts framework in the application; programmed the views using JSP pages with the Struts tag library, while the model is a combination of EJBs and Java classes and the web controllers are Servlets.
- Generated XML pages with templates using XSL. Used JSP and Servlets, EJBs on server side.
- Developed a complete External build process and maintained using ANT.
- Implemented Home Interface, Remote Interface, and Bean Implementation class.
- Implemented business logic at server side using Session Bean.
- Extensive usage of XML - Application configuration, Navigation, Task based configuration.
- Designed and developed Unit and integration test cases using Junit.
- Used EJB features effectively: local interfaces to improve performance, abstract persistence schema, and CMRs.
- Used Struts web application framework implementation to build the presentation tier.
- Wrote PL/SQL queries to access data from the Oracle database.
- Set up WebSphere Application Server and used the ANT tool to build the application and deploy it in WebSphere.
- Prepared test plans and wrote test cases
- Implemented JMS for making asynchronous requests
Environment: Java, J2EE, Struts, Hibernate, JSP, Servlets, HTML, CSS, UML, JQuery, Log4J, XML Schema, JUNIT, Tomcat, JavaScript, Oracle 9i, UNIX, Eclipse IDE.
Confidential
Java Developer
Responsibilities:
- Understanding and analyzing the requirements.
- Implemented server-side programs using Servlets and JSP.
- Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
- Implemented MVC using Struts Framework.
- Handled the database access by implementing Controller Servlet.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements to call from Servlets for database access.
- Designed and documented the stored procedures
- Widely used HTML for web based design.
- Involved in Unit testing for various components.
- Worked on database interaction layer for insertions, updating and retrieval operations of data from oracle database by writing stored procedures.
- Involved in the development of a simulator used by controllers to simulate real-time scenarios using C/C++ programming.
- Used Spring Framework for Dependency Injection and integrated with Hibernate.
- Involved in writing JUnit Test Cases.
- Used Log4J for any errors in the application
Environment: Java, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Struts, C/C++, Eclipse, WebLogic, PL/SQL and Oracle.