Sr Hadoop Developer/admin Resume
Texas
SUMMARY
- 7+ years of experience in software development, including 3+ years developing large-scale applications using Hadoop and other Big Data tools.
- Experienced in Hadoop ecosystem components such as Hadoop MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Cassandra.
- Experience in developing solutions to analyze large data sets efficiently
- In-depth understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
- Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts, and Hive data modeling
- 2+ years of comprehensive experience in Tableau (Dev & Admin) and Big Data/Hadoop, which includes MapReduce, Pig, Hive, Impala, Oozie, Flume, Zookeeper, NoSQL databases such as HBase, and exposure to MarkLogic.
- Excellent understanding/knowledge of Hadoop distributed system architecture and design principles.
- Experience in converting MapReduce applications to Spark (a brief illustrative sketch follows this summary).
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa
- Good knowledge in using job scheduling and workflow designing tools like Oozie.
- Experience in working with BI teams to transform big data requirements into Hadoop-centric technologies.
- Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager and Apache Ambari.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
- Extended Hive and Pig core functionality by writing custom UDFs
- Good understanding of Data Mining and Machine Learning techniques
- Experience in handling messaging services using Apache Kafka.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Experienced in developing and implementing web applications using Java, J2EE, JSP, Servlets, JSF, HTML, DHTML, EJB, JavaScript, AJAX, JSON, JQuery, CSS, XML, JDBC and JNDI.
- Experience in writing SQL, PL/SQL queries, and stored procedures for accessing and managing databases such as SQL Server 2014/2012, MySQL, and IBM DB2.
- Working experience in Development, Production and QA Environments.
- Involved in all phases of Software Development Life Cycle (SDLC) in large scale enterprise software using Object Oriented Analysis and Design.
- Working experience with version control tools like SVN, CVS, ClearCase, and PVCS.
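To illustrate the MapReduce-to-Spark conversion work noted above, here is a minimal sketch assuming the Spark 2.x Java API; the class name and HDFS paths are illustrative, not details from the actual projects:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// A classic MapReduce word count re-expressed with the Spark Java API:
// the mapper becomes flatMap/mapToPair and the reducer becomes reduceByKey.
public class WordCountOnSpark {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCountOnSpark");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///data/input");      // illustrative input path
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);
        counts.saveAsTextFile("hdfs:///data/output");                    // illustrative output path

        sc.stop();
    }
}
```

The Mapper's tokenization maps onto flatMap/mapToPair and the Reducer onto reduceByKey, which is the general shape such conversions take.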
TECHNICAL SKILLS
Languages: Java (Core Java, Networking, Threads, Swing), XML, XSD, XSL, JavaScript
J2EE Technologies: J2EE, Java Mail API
Web servers: Apache Tomcat Server, IBM WebSphere Application Server 5.0/6.0, WebLogic Application Server, JBoss 4.x
Server Side: JSP, Servlets, EJB, JDBC.
Frameworks/ Components: Spring, Spring Batch, Struts, Hibernate.
Big Data: Hadoop HDFS, MapReduce.
Databases: SQL, MySQL, SQL Server, Oracle, DB2, Microsoft Access
Unit Testing: JUnit, Rational
Methodologies: OOAD, RUP, UML, Design Patterns.
OS: Windows 2000/XP/Vista, Windows NT 4.0, Windows Server 2003, Linux, UNIX.
Markup Languages: HTML, XML, DHTML
Open Source API: Apache Commons-io/file upload/net.
PROFESSIONAL EXPERIENCE
Confidential
Sr Hadoop Developer/Admin
Responsibilities:
- Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing (see the sketch following this section).
- Worked with the team to grow the cluster from 28 nodes to 42 nodes; the additional data nodes were configured through the Hadoop commissioning process.
- Designed MarkLogic solutions providing highly available, fully scalable, high-speed ingestion of publishing-oriented data. The solution allowed both transactional processing and warehouse reporting against the same system.
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, manage and review data backups and log files.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Created Spark SQL queries for faster query responses.
- Responsible for implementing Hadoop 2.0 (YARN) and testing Pig, Hive, and MongoDB job processes.
- Imported/exported data into HDFS/Hive from relational databases and Teradata using Sqoop.
- Provided development, documentation, and implementation support on the MIDAS project (HHS client), using Hadoop (Java API) and MarkLogic Server to transact and store large (big) data; the system also connects with other systems to download data (daily/weekly/monthly) and save it as sequence files, with MapReduce (Java API) used for faster processing of large-scale data.
- Used Impala for aggregation jobs, running an average of 100K aggregation queries per day.
- Used Impala instead of Hive to optimize query performance.
- Administered and supported the Hortonworks distribution.
- Managed and scheduled Jobs on a Hadoop cluster.
- Involved in defining job flows, managing and reviewing log files.
- Provided the best recommendations and industry best practices for ETL solutions in SSIS.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive HQL, and Pig jobs.
- Collected the log data from web servers and integrated it into HDFS using Flume.
- Worked with the Hortonworks Data Platform. Subscribed to data using the Invenio application, which is published via Data Router to NFS and HDFS.
- Exported the analyzed patterns back to Teradata using Sqoop.
- Cassandra developer: set up, configured, and optimized the Cassandra cluster. Developed a real-time Java-based application to work alongside the Cassandra database.
- Developed SSIS transformations and event handlers for error handling and debugging of the packages
- Developed a novel implementation for Accumulo Continuous Ingest performance on IBM Power 775; the abstract was selected for presentation at the Accumulo Summit 2015.
- Responsible to manage data coming from different sources.
- Performed data processing using Spark.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Expertise in NoSQL databases like MarkLogic, Cassandra, MongoDB, HBase.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked on the Hadoop (Hortonworks) ecosystem architecture with systems-level and network-level knowledge
- Participated in requirement gathering from the experts and business partners and converted the requirements into technical specifications
- Constructed system components and developed the server-side part using Java, EJB, and the Spring Framework. Involved in designing the data model for the system.
- Leveraged the following database back-ends: MySQL, MS SQL Server, Oracle, SQLite, and Accumulo
- Used J2EE design patterns like DAO, MODEL, Service Locator, MVC and Business Delegate.
- Defined Interface Mapping between JDBC Layer and Oracle Stored Procedures.
- Experience in managing and reviewing Hadoop log files.
- Worked on NoSQL databases including Cassandra, MongoDB, MarkLogic, and HBase.
- Managed and reviewed Hadoop log files; worked with HCatalog to open up access to Hive's metastore.
- Development experience in DBMSs like Oracle, MS SQL Server, Teradata, and MySQL.
- Installed and configured Hadoop components MapR-FS, Hive, Impala, Pig, and Hue.
- Created final tables in Parquet format in Impala.
- Used Impala to read, write, and query the Hadoop data in HDFS or HBase
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
- Imported data using Sqoop from Teradata using Teradata connector.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported in setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop. Worked on tuning the performance of Pig queries.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test driven development and continuous integration.
Environment: Hadoop, MapReduce, Impala, Spark, Shark, Kafka, HDFS, ZooKeeper, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Sqoop, Teradata, MarkLogic, Flume, Accumulo, Oracle 10g, Cassandra, UNIX Shell Scripting.
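As referenced in the data-cleansing bullet above, a minimal sketch of a map-only cleansing job; the pipe delimiter, expected field count, and class name are illustrative assumptions rather than details from the actual project:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleansing step: drop malformed rows and normalize whitespace.
// The delimiter and expected column count are illustrative assumptions.
public class CleansingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private static final int EXPECTED_FIELDS = 8;
    private final Text cleaned = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleansing", "malformed").increment(1);
            return;                                   // skip bad records
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append('|');
            sb.append(fields[i].trim());              // trim each field
        }
        cleaned.set(sb.toString());
        context.write(cleaned, NullWritable.get());   // pass the cleansed row downstream
    }
}
```

The driver for such a job would typically set job.setNumReduceTasks(0) so cleansed records are written straight back to HDFS for the downstream Hive and Pig steps.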
Confidential, Texas
Hadoop Developer
Responsibilities:
- Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed software to process, cleanse, and report on vehicle data utilizing various analytics and REST APIs in languages like Java and Scala.
- Supported in setting up the QA environment and updating configuration for implementing scripts with Pig and Sqoop.
- Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Extracted the data from Teradata into HDFS using Sqoop.
- Experience in working with various kinds of data sources such as HP Vertica and Oracle.
- Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Scheduled data pipelines for automation of data ingestion in AWS.
- Experience in designing and implementation of secure Hadoop cluster using Kerberos.
- Extensively worked on Microsoft tools for documentation and presentations: Visio, Word, PowerPoint, and Excel macros.
- Built a search application using Solr.
- Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
- Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Created Hive generic UDFs, UDAFs, and UDTFs in Python to process business logic that varies based on policy.
- Worked on Importing and exporting data from different databases like Oracle, Teradata into HDFS and Hive using Sqoop, TPT and Connect Direct.
- Worked on implementing Spark operations on RDDs.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Extracted files from CouchDB using Sqoop, placed them into HDFS, and pre-processed the data for analysis.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Extensively used SSIS transformations such as Lookup, Derived column, Data conversion, Aggregate, Conditional split, SQL task, Script task and Send Mail task etc.
- Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
- Optimized the Hive queries using partitioning and bucketing techniques for controlling the data distribution.
- Implemented technical solutions for POCs, writing code using technologies such as Hadoop, YARN, Python, and Microsoft SQL Server.
- Developed a UI application using AngularJS, integrated with Apache Solr to consume REST.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Demonstrated expertise utilizing ETL tools, including SQL Server Integration Services (SSIS), and Informatica and ETL package design, and RDBMS systems like SQL Server, Oracle.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Experience in developing Shell scripts and Python Scripts for system management.
- Worked with data delivery teams to set up new Hadoop users; this included setting up Linux users and Kerberos principals.
- Created Pig macros to improve code reusability and modularize the code.
- Involved in Cassandra data modeling and analysis and CQL (Cassandra Query Language).
- Experience in Upgrading Apache Ambari, CDH and HDP Cluster.
- Worked with GPFS, Hive, Exacta, MS SQL Server, and Teradata.
- Configured and Maintained different topologies in Storm cluster and deployed them on regular basis.
- Good knowledge of Cassandra, MongoDB, Netezza, and Vertica.
- Experienced with different kinds of compression techniques like LZO, Gzip, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented optimized joins of different data sets to get top claims by state using MapReduce.
- Used Splunk for HadoopOps for managing, monitoring, and reviewing the whole infrastructure's live operations and activity. Also managed MapReduce jobs to rapidly sort, filter, and report on performance metrics, time, status, user, or resource usage.
- Prepared the complete data mapping for all the migrated jobs using SSIS.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios. Involved in loading data from the Linux file system to HDFS.
- Built customized in-memory indexes for high-performance information retrieval of products using Apache Lucene and Apache Solr, which provide more precise and useful search data.
- Utilized AWS framework for content storage and Elastic Search for document search.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Imported and exported data between HDFS and relational systems like MySQL, Oracle, and Teradata into Hive using Sqoop.
- Implemented MapReduce programs to perform joins on the map side using the Distributed Cache in Java (a brief sketch follows this section). Developed unit test cases using JUnit, EasyMock, and MRUnit testing frameworks.
- Used macros for automated custom coloring of cells based on the values in the cell.
- Experience in upgrading the Hadoop cluster HBase/ZooKeeper from CDH3 to CDH4.
- Created a complete processing engine based on Cloudera's distribution, enhanced for performance.
- Experienced in monitoring the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, Impala, MapReduce, Java, JDK 1.5, J2EE 1.4,Splunk, Informatica, Struts 1.3, Hive, Kerberos, Solr, Pig, Vertica, Macros, Sqoop, Flume, Kafka, Oozie, Hue, Hortonworks, Scala, Lucene, Storm, Zookeeper, AVRO Files, AWS, SQL, ETL, Cloudera Manager, MySQL, MongoDB.
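As referenced in the map-side join bullet above, a minimal sketch of a join performed in the mapper using the distributed cache; the lookup file, record layouts, and the driver call shown in the comments are illustrative assumptions:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-side join: a small state lookup file is shipped to every task via the
// distributed cache (the driver calls job.addCacheFile(new URI("hdfs:///lookup/states.txt#states.txt")))
// and loaded into memory in setup(), so each claim record is joined in map()
// without a shuffle. File name and record layouts are illustrative.
public class ClaimStateJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> stateLookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // "states.txt" is the symlink created from the cache-file URI fragment
        try (BufferedReader reader = new BufferedReader(new FileReader("states.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                stateLookup.put(parts[0], parts[1]);      // stateCode -> stateName
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] claim = value.toString().split(",");     // claimId,stateCode,amount (assumed layout)
        String stateName = stateLookup.getOrDefault(claim[1], "UNKNOWN");
        context.write(new Text(stateName), new Text(claim[0] + "," + claim[2]));
    }
}
```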
Confidential, Irvine, CA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Analyzed data using Hadoop components Hive and Pig.
- Worked hands-on with the ETL process.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response (a brief sketch follows this section).
- Responsible for running Hadoop streaming jobs to process terabytes of XML data.
- Load and transform large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Hands-on experience with Apache and Hortonworks Hadoop ecosystem components such as Sqoop, HBase, and MapReduce.
- Developed HTML reports with Perl CGI.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Experience in Spark and Scala.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for our BI team.
- Experience in creating custom Lucene/Solr Query components.
- Implementation of Hadoop access security using Kerberos.
- Moved the databases/data between different Netezza servers.
- Responsible to manage data coming from different sources.
- Developed Spark scripts using Scala shell commands as per the requirement.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Created ETL mappings with Perl, Python, and Unix shell scripting.
- Expert in implementing advanced procedures like text analytics and processing using in-memory computing capabilities like Apache Spark written in Scala/Python.
- Involved in loading data from UNIX file system to HDFS.
- Implemented FAST, Mercado, Endeca search platform & analytical services like Omniture and Coremetrics.
- Monitoring and troubleshooting of multi-node Hadoop clusters and associated Accumulo databases.
- Responsible for creating Hive tables, loading data and writing hive queries.
- Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Exported the analyzed patterns back to Teradata using Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
Environment: Hadoop Cluster, HDFS, Hortonworks, Hive, Pig, Sqoop, Solr, Netezza, Spark, Hadoop MapReduce, Perl, HIPAA, HBase, Kerberos, Lucene, Scala, Shell Scripting, Mercado, Linux, Accumulo, UNIX Shell Scripting and Big Data.
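As referenced in the Spark RDD bullet above, a minimal sketch of the kind of in-memory log aggregation described (unique visitors per day); the Spark Java API usage, tab-delimited log layout, and HDFS paths are illustrative assumptions:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Loads web-server logs into an RDD, caches them in memory, and computes the
// number of unique visitors per day. The layout (date in field 0, visitor id
// in field 2) and the HDFS paths are illustrative assumptions.
public class UniqueVisitorsPerDay {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("UniqueVisitorsPerDay");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String[]> records = sc.textFile("hdfs:///logs/weblogs")
                .map(line -> line.split("\t"))
                .cache();                                   // keep parsed records in memory for reuse

        JavaPairRDD<String, Long> uniquePerDay = records
                .mapToPair(r -> new Tuple2<>(r[0], r[2]))   // (date, visitorId)
                .distinct()                                 // one entry per visitor per day
                .mapToPair(t -> new Tuple2<>(t._1(), 1L))
                .reduceByKey(Long::sum);

        uniquePerDay.saveAsTextFile("hdfs:///logs/unique_visitors_per_day");
        sc.stop();
    }
}
```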
Confidential, Atlanta GA
Sr Java / J2EE Developer
Responsibilities:
- As a programmer, involved in the design and implementation of the MVC pattern.
- Extensively used XML, wherein process details are stored in the database and the stored XML is used whenever needed.
- Part of core team to develop process engine.
- Developed Action classes and validation using the Struts framework
- Created project-related documentation such as role-based user guides.
- Implemented modules like Client Management, Vendor Management.
- Attended various Client meetings.
- Implemented an access control mechanism to provide various access levels to the user.
- Designed and developed teh application using J2EE, JSP, XML, Struts, Hibernate, Spring technologies
- Coded DAO and Hibernate implementation classes for data access (a brief sketch follows this section).
- Coded Spring service classes and transfer objects to pass the data between layers.
- Designed the database for Jeevica in MS SQL Server 2008
- Implemented Web Services using Axis
- Used different features of Struts like MVC, Validation framework and tag library.
- Created detail design document, Use cases, and Class Diagrams using UML
- Written ANT scripts to build JAR, WAR and EAR files.
- Developed a standalone Java component that interacts with Crystal Reports on Crystal Enterprise Server to view and schedule reports, as well as storing data as XML and sending data to consumers using SOAP.
- Deployed the application and tested it on WebSphere Application Servers.
- Developed JavaScript for client-side validations in JSP.
- Developed JSPs with Struts taglibs for the presentation layer.
- Coordinated with the onsite, offshore, and QA teams to facilitate quality delivery from offshore on schedule.
Environment: Java 1.5, Spring, Spring WebService, JSP, JavaScript, Hibernate, SOAP, CSS, Struts, Websphere, MQ Series, JUnit, Apache, Windows XP and Linux
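As referenced in the DAO bullet above, a minimal sketch of the DAO/Hibernate pattern; the Client entity and its mapping are hypothetical stand-ins for the project's actual domain objects:

```java
import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Hypothetical persistent class standing in for the real domain model;
// it would be mapped through Client.hbm.xml or annotations.
class Client {
    private Long id;
    private String name;
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Minimal DAO built on Hibernate, following the layering described above.
public class ClientDao {
    private final SessionFactory sessionFactory;

    public ClientDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void save(Client client) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(client);          // insert the entity
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();                 // roll back on failure
            throw e;
        } finally {
            session.close();
        }
    }

    @SuppressWarnings("unchecked")
    public List<Client> findAll() {
        Session session = sessionFactory.openSession();
        try {
            return session.createQuery("from Client").list();
        } finally {
            session.close();
        }
    }
}
```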
Confidential
Java Developer
Responsibilities:
- Designed a system and developed a framework using J2EE technologies based on MVC architecture.
- Involved in the iterative/incremental development of the project application. Participated in the requirement analysis and design meetings.
- Used Apache flume to ingest log data from multiple sources directly into Accumulo, file roll and HDFS
- Designed and Developed UI’s using JSP by following MVC architecture
- Designed and developed Presentation Tier using Struts framework, JSP, Servlets, TagLibs, HTML and JavaScript.
- Designed the control, which includes class diagrams and sequence diagrams, using Visio.
- Used the Struts framework in the application. Programmed the views using JSP pages with the Struts tag library; the model is a combination of EJBs and Java classes, and the web implementation controllers are Servlets.
- Generated XML pages with templates using XSL. Used JSP, Servlets, and EJBs on the server side.
- Developed a complete External build process and maintained using ANT.
- Implemented Home Interface, Remote Interface, and Bean Implementation class.
- Implemented business logic on the server side using Session Beans.
- Extensive usage of XML - Application configuration, Navigation, Task based configuration.
- Designed and developed Unit and integration test cases using Junit.
- Used EJB features effectively: local interfaces to improve the performance, abstract persistence schema, and CMRs.
- Used the Struts web application framework implementation to build the presentation tier.
- Wrote PL/SQL queries to access data from the Oracle database.
- Set up the WebSphere Application Server and used the Ant tool to build the application and deploy it to WebSphere.
- Prepared test plans and wrote test cases
- Implemented JMS for making asynchronous requests (a brief sketch follows this section)
Environment: Java, J2EE, Struts, Hibernate, Accumulo, JSP, HDFS, Servlets, HTML, CSS, UML, JQuery, Log4J, XML Schema, JUNIT, Tomcat, JavaScript, Oracle 9i, Unix, Eclipse IDE.
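As referenced in the JMS bullet above, a minimal sketch of an asynchronous request sent with the JMS 1.1 API; the JNDI names are illustrative assumptions:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

// Fire-and-forget JMS send: the caller does not wait for the consumer, which
// is what makes the request asynchronous. JNDI names are illustrative.
public class OrderRequestSender {
    public void sendRequest(String payload) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name
        Queue queue = (Queue) ctx.lookup("jms/OrderRequestQueue");                           // assumed JNDI name

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(payload);
            producer.send(message);               // returns immediately; the consumer processes later
        } finally {
            connection.close();                   // closing the connection closes its sessions
        }
    }
}
```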
Confidential
Java Developer
Responsibilities:
- Understanding and analyzing the requirements.
- Implemented server-side programs using Servlets and JSP.
- Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
- Implemented MVC using the Struts framework.
- Handled the database access by implementing a controller Servlet.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements called from Servlets for database access (a brief sketch follows this section).
- Designed and documented the stored procedures
- Widely used HTML for web based design.
- Involved in Unit testing for various components.
- Worked on the database interaction layer for insert, update, and retrieval operations of data from the Oracle database by writing stored procedures.
- Used the Spring Framework for dependency injection and integrated it with Hibernate.
- Involved in writing JUnit test cases.
- Used Log4J to log any errors in the application
Environment: Java, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Struts, Eclipse, WebLogic, PL/SQL and Oracle.
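As referenced in the JDBC bullet above, a minimal sketch of database access through a prepared statement as a Servlet helper would call it; the JNDI data-source name, table, and columns are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// JDBC access with a prepared statement, as invoked from a Servlet.
// The JNDI data-source name and table/column names are illustrative.
public class CustomerLookup {

    public String findCustomerName(int customerId) throws SQLException, NamingException {
        DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDS"); // assumed JNDI name
        String sql = "SELECT name FROM customers WHERE id = ?";
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, customerId);              // bind the parameter, avoiding SQL injection
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```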