Sr. Hadoop Developer Resume
Quincy, MA
SUMMARY
- Over 9 years of experience in Information Technology involving analysis, design, testing, implementation, and training. Excellent skills in state-of-the-art client/server computing, desktop applications, and website development.
- Over 5 years of work experience in Big Data analytics, with hands-on experience writing MapReduce jobs on the Hadoop ecosystem, including Hive and Pig.
- Good working experience with Hadoop architecture and its components in the Cloudera Hadoop ecosystem, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Expertise in Hadoop/Big Data technologies: Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, and Sqoop.
- Good working experience with Hadoop cluster architecture and monitoring the cluster.
- In-depth understanding of Data Structure and Algorithms.
- Experience in managing and reviewing Hadoop log files.
- Experience in implementing standards and processes for Hadoop-based application design and implementation.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Experience in managing Hadoop clusters using Cloudera Manager Tool.
- In-depth knowledge of databases such as MySQL and extensive experience in writing SQL queries, stored procedures, triggers, cursors, functions, and packages.
- Excellent knowledge of HTML, CSS, JavaScript, PHP.
- Good working experience installing and maintaining Linux servers.
- Experience in Data Sharing and backup through NFS.
- Experience in monitoring system metrics and logs for problems; adding, removing, or updating user account information; resetting passwords, etc.
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Zookeeper, Cloudera, Amazon EC2, EMR, S3, Redshift
Reporting Tools: Jaspersoft, Qlik Sense, Tableau
Scripting Languages: Perl, Shell, R
Programming Languages: C, C++, Java
Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlets, JSP, DOM, XML, XSLT.
Application Server: WebLogic Server, Apache Tomcat.
DB Languages: SQL, PL/SQL, Postgres, ParAccel.
NoSQL Databases: HBase, Cassandra
Databases/ETL: Oracle 9i/10g/11g, MySQL 5.2, DB2, Informatica 8.x, Talend
Operating Systems: Linux, UNIX, Windows Server 2003
IDEs: Eclipse, NetBeans, JDeveloper, IntelliJ IDEA.
Version Control: CVS, SVN, Git
PROFESSIONAL EXPERIENCE
Confidential, Quincy MA
Sr. Hadoop Developer
Responsibilities:
- Analyzing the requirement to setup a cluster
- Worked on Hadoop, MapReduce, and YARN/MRv2; developed multiple MapReduce jobs in Java for structured, semi-structured, and unstructured data
- Involved in configuring the Hadoop cluster and load balancing across the nodes
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables (see the mapper sketch after this job's environment line)
- Created Hive queries to compare the raw data with EDW reference tables and perform aggregations
- Experienced in developing custom input formats and data types to parse and process unstructured and semi-structured input data, mapping them into key-value pairs to implement business logic in MapReduce.
- Experience in implementing custom serializers, interceptors, sources, and sinks in Flume as required to ingest data from multiple sources.
- Experience in setting up consolidation (fan-in) flows in Flume, a V-shaped topology that funnels data from many sources into a single sink.
- Importing and exporting data into HDFS and Hive using Sqoop
- Experienced in analyzing data with Hive and Pig
- Knowledge of designing RESTful services using Java-based APIs such as Jersey.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS
- Integrated bulk data into the Cassandra file system using MapReduce programs
- Gained good experience with the NoSQL databases HBase and Cassandra
- Involved in HBase setup and storing data into HBase, which is used for further analysis
- Expertise in designing and data modeling for the Cassandra NoSQL database
- Experienced in managing and reviewing Hadoop log files
- Experienced in defining job flows using Oozie workflow
- Involved in working with Spark on top of Yarn/MRv2 for interactive and Batch Analysis
- Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues
- Expertise in writing Scala code using higher-order functions for iterative algorithms in Spark, with performance in mind
- Experienced in analyzing and optimizing RDDs by controlling partitions for the given data
- Good understanding of the DAG for the entire Spark application flow in the Spark application web UI
- Experienced in real-time processing using Spark Streaming with Kafka (see the streaming sketch after this job's environment line)
- Developed custom mappers in Python scripts, plus Hive UDFs and UDAFs, based on the given requirements
- Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting
- Experienced in querying data using SparkSQL on top of Spark engine
- Experience in managing and monitoring Hadoop cluster using Cloudera Manager
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
- Unit tested against a sample of raw data, improved performance, and turned the jobs over to production
Environment: CDH, Java (JDK 1.7), Hadoop, MapReduce, HDFS, Hive, Sqoop, Flume, HBase, Cassandra, Pig, Oozie, Kerberos, Scala, Spark, SparkSQL, Spark Streaming, Kafka, Linux, AWS, Shell Scripting, MySQL, Oracle 11g, PL/SQL, SQL*PLUS
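The MapReduce parsing work above can be illustrated with a minimal sketch of the kind of mapper such a job would use; the class name, pipe delimiter, and field positions are assumptions for illustration, not the original production code. It splits each raw record and emits a key/value pair that a reducer can aggregate before the results are loaded into staging tables.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: parses one pipe-delimited raw record per call and
// emits (recordKey, 1) pairs for downstream aggregation into staging tables.
public class RawRecordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text outKey = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] fields = line.toString().split("\\|");
    if (fields.length < 2) {
      return;                      // skip malformed records
    }
    outKey.set(fields[0].trim());  // assumed: first field carries the record key
    context.write(outKey, ONE);
  }
}
```

The Spark Streaming with Kafka work can likewise be sketched in Java against the spark-streaming-kafka-0-10 direct-stream API; the broker address, topic name, group id, and batch interval are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class EventStreamJob {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("EventStreamJob");
    // 10-second micro-batches (illustrative interval)
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "event-stream-consumer");   // placeholder group id

    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

    // Count records per batch and print the result to the driver log.
    stream.map(ConsumerRecord::value)
          .count()
          .print();

    jssc.start();
    jssc.awaitTermination();
  }
}
```

The direct stream reads offsets straight from Kafka per micro-batch, which is the usual pairing for simple per-batch metrics like the count printed here.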
Confidential, MI
Hadoop Developer
Responsibilities:
- Installed NameNode, Secondary NameNode, YARN (ResourceManager, NodeManager, ApplicationMaster), and DataNodes using Cloudera.
- Installed and configured Hortonworks Ambari for easier management of the existing Hadoop cluster; installed and configured HDP.
- Installed and configured a multi-node, fully distributed Hadoop cluster with a large number of nodes.
- Provided Hadoop, OS, Hardware optimizations.
- Set up the machines with network controls, static IPs, disabled firewalls, and swap memory.
- Identified performance bottlenecks by analyzing the existing Hadoop cluster and provided performance tuning accordingly.
- Regularly commissioned and decommissioned nodes depending on the amount of data.
- Installed and configured Hadoop components HDFS, Hive, and HBase.
- Communicating with the development teams and attending daily meetings.
- Addressing and Troubleshooting issues on a daily basis.
- Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive.
- Cluster maintenance as well as creation and removal of nodes.
- Monitor Hadoop cluster connectivity and security.
- Manage and review Hadoop log files.
- Configured the cluster to achieve the optimal results by fine tuning the cluster.
- Dumped the data from one cluster to another using DistCp, and automated the dumping procedure with shell scripts.
- Designed the shell script for backing up of important metadata and rotating the logs on a monthly basis.
- Implemented the open-source monitoring tool Ganglia for monitoring the various services across the cluster.
- Testing, evaluation and troubleshooting of different NoSQL database systems and cluster configurations to ensure high-availability in various crash scenarios.
- Performance tuning and stress-testing of NoSQL database environments in order to ensure acceptable database performance in production mode.
- Designed the cluster so that only one secondary name node daemon could be run at any given time.
- Implemented commissioning and decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
- Dumped the data from HDFS to a MySQL database and vice versa using Sqoop.
- Provided the necessary support to the ETL team when required.
- Integrated Nagios in the Hadoop cluster for alerts.
- Performed both major and minor upgrades to the existing cluster, and rolled back to the previous version when needed.
Environment: Linux, HDFS, MapReduce, KDC, Nagios, Ganglia, Oozie, Sqoop, Cloudera Manager.
Confidential
Hadoop Developer
Responsibilities:
- Worked on writing transformer/mapping MapReduce pipelines using Java.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in loading data into HBase using the HBase shell, the HBase Client API, Pig, and Sqoop (see the sketch after this job's environment line).
- Designed and implemented Incremental Imports into Hive tables.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data
- Deployed an Apache Solr search engine server to help speed up searches of government cultural assets.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data
- Experienced in managing and reviewing the Hadoop log files.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data on HDFS.
- Implemented the workflows using Apache Oozie framework to automate tasks
- Worked with the Avro data serialization system to handle JSON data formats.
- Worked on different file formats such as sequence files, XML files, and map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Developed scripts and automated data management from end to end and sync up between all the clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
Environment: CDH, HDFS, Core Java, MapReduce, Hive, Pig, Flume, Storm, Elasticsearch, Scala, Spark, Kibana, Shell scripting, UNIX.
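The HBase loading work above mentions the HBase Client API; a minimal sketch of a single put is shown below. The table name, column family, qualifiers, and row key are illustrative assumptions, and the snippet assumes an hbase-site.xml on the classpath that points at the cluster.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoadExample {
  public static void main(String[] args) throws IOException {
    // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("asset_catalog"))) {
      // One row with two cells in the (assumed) "meta" column family.
      Put put = new Put(Bytes.toBytes("asset-0001"));
      put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("title"), Bytes.toBytes("Sample asset"));
      put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("source"), Bytes.toBytes("flume-ingest"));
      table.put(put);
    }
  }
}
```

Loads driven from Pig or Sqoop target the same table and column-family layout; the client API shown here is the programmatic route a hand-written loader would take.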
Confidential
Java Developer
Responsibilities:
- Developed Spring AOP code to configure logging for the application (see the aspect sketch after this job's environment line)
- Expertise in developing enterprise applications using the Struts framework
- Developed the front end using JSF and Portlet.
- Developed Scalable applications using Stateless session EJBs.
- Developed the UI panels using JSF, XHTML, CSS, Dojo, and jQuery
- Used MySQL to access data in the database at different levels, including making connections to the backend MySQL database.
- Designed and developed web services using Apache Axis; wrote numerous session and message-driven beans for operation on JBoss and WebLogic
- Used VSS (Visual Source Safe) as configuration management tool.
- Created automated test cases using Selenium
- Worked with SDLC processes such as the waterfall model and Agile methodology
- Developed JSP interfaces using custom tags
- Developed servlets and worked extensively with SQL.
- Used ANT for building the application and deployed on BEA WebLogic Application Server.
- Responsible for developing XML parsing logic using SAX/DOM parsers
- Maintained a good network with EMC Documentum support teams, who helped resolve product issues and bugs
- Worked on tickets from ServiceNow and Jira on a daily basis.
- Designed the front end using Swing.
- Used IBM MQ Series in the project
- Apache Tomcat Server was used to deploy the application.
- Involved in building the modules in a Linux environment with Ant scripts.
- Used Resource Manager to schedule jobs on the UNIX server.
- Used web services (REST) to bridge the gap between our MS and Drupal/WordPress technology.
- Designed online stores using ASP and JavaScript; developed custom storefront applications and custom user interfaces for client sites.
- Used J2EE to communicate with legacy COBOL-based mainframe implementations.
- Worked on PL/SQL and SQL queries
- Developed JavaScript, ActionScript, and VBScript macros for client-side validations.
Environment: Spring, Struts, JSF, EJBs, jQuery, MySQL, DB2, NetBeans, JBoss, CVS, VSS, waterfall model, UML, JSP, Servlets, ANT, XML, EMC, Jira, IBM MQ, Tomcat Server, Linux, UNIX server
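The Spring AOP logging work above corresponds to an aspect along the following lines; the pointcut package, class names, and log levels are illustrative assumptions, and the sketch assumes AspectJ auto-proxying is enabled (for example via @EnableAspectJAutoProxy or <aop:aspectj-autoproxy/>).

```java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MethodLoggingAspect {

  private static final Logger LOG = LoggerFactory.getLogger(MethodLoggingAspect.class);

  // Log entry, exit, and elapsed time for every public method in the (assumed) service package.
  @Around("execution(public * com.example.app.service..*(..))")
  public Object logAround(ProceedingJoinPoint joinPoint) throws Throwable {
    long start = System.currentTimeMillis();
    LOG.debug("Entering {}", joinPoint.getSignature().toShortString());
    try {
      return joinPoint.proceed();
    } finally {
      LOG.debug("Exiting {} after {} ms",
          joinPoint.getSignature().toShortString(),
          System.currentTimeMillis() - start);
    }
  }
}
```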