
Sr. Big Data/Hadoop Architect Resume


Burlington, NJ


  • Over 9 years of experience in Analysis, Design, Development, Testing, Implementation, Maintenance and Enhancements on various IT projects, with Big Data experience implementing end-to-end Hadoop solutions.
  • Experience in working in environments using Agile (SCRUM), RUP and Test Driven development methodologies.
  • Good knowledge of Amazon AWS concepts like the EMR and EC2 web services, which provide fast and efficient processing.
  • Extensive experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala & Spark
  • Expertise in using J2EE application servers such as IBM WebSphere, JBoss and web servers like Apache Tomcat.
  • Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and the Hortonworks Data Platform (HDP).
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java; extended Hive and Pig core functionality with custom UDFs.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Diverse experience utilizing Java tools in business, Web, and client-server environments including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java database Connectivity (JDBC) technologies.
  • Good understanding in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Good working experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Implemented Service Oriented Architecture (SOA) using Web Services and JMS (Java Messaging Service).
  • Implemented J2EE Design Patterns such as MVC, Session Façade, DAO, DTO, Singleton Pattern, Front Controller and Business Delegate.
  • Experienced in developing web services with XML based protocols such as SOAP, Axis, UDDI and WSDL.
  • Experienced in MVC (Model View Controller) architecture and various J2EE design patterns like singleton and factory design patterns.
  • Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, PIG, HIVE, Flume, Sqoop, SPARK, Impala), NoSQL databases like MongoDB, HBase, Cassandra.
  • Solid understanding of Hadoop MRV1 and Hadoop MRV2 (or) YARN Architecture.
  • Good knowledge in SQL and PL/SQL to write Stored Procedures and Functions and writing unit test cases using JUnit.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
  • Strong knowledge in Object oriented design/analysis, UML modeling, Classic design patterns, and J2EE patterns.
  • Hands-on experience working with databases like Oracle 12c, MS SQL Server and MySQL.
  • Hands-on experience with the latest UI stack, including HTML, CSS, and mobile-friendly, responsive, user-centric design.
  • Experience in developing web-based enterprise applications using Java, J2EE, Servlets, JSP, EJB, JDBC, Spring IOC, Spring AOP, Spring MVC, Spring Web Flow, Spring Boot, Spring Security, Spring Batch, Spring Integration, Web Services (SOAP and REST) and ORM frameworks like Hibernate.
  • Expertise in using XML related technologies such as XML, DTD, XSD, XPATH, XSLT, DOM, SAX, JAXP, JSON and JAXB.
  • Experience in using Ant and Maven for building and deploying projects to servers, as well as JUnit and Log4j for testing and debugging.
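The Sqoop imports into HDFS mentioned above typically follow a fixed flag pattern for incremental loads. As a hedged sketch (the JDBC URL, table name, and check column below are hypothetical examples, not from any actual project), the command line can be assembled like this:

```python
# Sketch of a Sqoop incremental-append import command.
# The JDBC URL, table, target dir, and check column are hypothetical.
def build_sqoop_import(jdbc_url, table, target_dir, check_column, last_value):
    """Assemble the argv list for an incremental 'append' Sqoop import."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",       # only rows newer than --last-value
        "--check-column", check_column,  # monotonically increasing column
        "--last-value", str(last_value),
    ]

cmd = build_sqoop_import(
    "jdbc:mysql://dbhost/sales", "orders", "/data/orders",
    "order_id", 104000,
)
```

Re-running the import with the new maximum of the check column as `--last-value` pulls only rows added since the previous run.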


Hadoop Ecosystem: Hadoop 2.7/2.5, HDFS 1.2.4, Spark …, Hive, Pig, Sqoop, MapReduce/YARN, Impala, Oozie.

Big Data Platforms: Hortonworks, Cloudera, Amazon AWS

Programming Languages: C, C++, Core Java, J2EE

Web servers: JBoss 6, IBM WebSphere 7, Apache Tomcat 7, Oracle Weblogic 10g, Oracle Application Server

Databases: MySQL 4/5, MS SQL Server …, MongoDB and Oracle 12c/11g

Operating Systems: Linux, Windows, Mac, Unix

Java/J2EE Technologies: Java, J2EE, Servlets, JSP, JMS, JavaBeans, JSTL, JSF, Struts, EJB, Spring, Hibernate, JNDI, JPA, Web Services SOAP (JAX-RPC, JAX-WS), Restful (JAX-RS), WSDL and UDDI.

Web Technologies: HTML5, CSS3, Bootstrap, Ajax, JavaScript, jQuery, Nodejs, Angular JS

Methodologies: Agile, Waterfall and Test-Driven Development

Version Control: IBM ClearCase, Visual SourceSafe, SVN, CVS, GitHub


Confidential, Burlington NJ

Sr. Big Data/Hadoop Architect


  • Contributed as a member of a high-performing, agile team focused on next-generation data & analytics.
  • Built a Big Data analytics and visualization platform for handling high-volume batch-oriented and real-time data streams.
  • Utilized Agile Scrum Methodology to help manage and organize a team with regular code review sessions.
  • Built platforms and deployed cloud-based tools and solutions with AWS EMR.
  • Analyzed big data using Hive after importing data from RDBMS into HDFS.
  • Loaded data from different servers to AWS S3 buckets and set appropriate bucket permissions.
  • Reduced the overall cost of the EMR production cluster (Amazon Web Services) by finding the best configuration for the workload.
  • Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a High Availability cluster to integrate Hive with existing applications.
  • Implemented complex big data with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data to turn information into business insights using multiple platforms in Hadoop ecosystem.
  • Developed batch data flows using Spark code in Python, Scala and Java.
  • Imported data from structured data source into HDFS using Sqoop incremental imports.
  • Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Built Hive tables using list partitioning and hash partitioning, and created Hive generic UDFs to implement business logic with HiveQL.
  • Integrated HBase with MapReduce to move bulk amount of data into HBase.
  • Developed SQL scripts using Spark for handling different data sets and verifying the performance over Map Reduce jobs.
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
  • Involved in converting MapReduce programs into Spark RDD transformations using Scala and Python.
  • Developed and executed data pipeline testing processes, validating business rules and policies.
  • Built code for real time data ingestion using Java, MapR-Streams (Kafka) and STORM.
  • Designed unit test Data models and applications for data analytics solutions on streaming data
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle, MySQL) for predictive analytics
  • Developed scripts and batch jobs to schedule various Hadoop programs; worked with raw data, cleansing and polishing it into a format that Data Scientists could consume to create critical insights.
  • Developed storytelling dashboards in Tableau Desktop, published them to Tableau Server, and used GitHub for version control to maintain project versions.
  • Optimized mappings using various optimization techniques and debugged existing mappings with the Debugger to test and fix them.
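The MapReduce-to-Spark conversions described above reduce to a map phase followed by a key-wise reduce (Spark's `map`/`reduceByKey`). The sketch below shows that logic locally in plain Python rather than on a cluster; the input lines are made-up examples, not project data:

```python
from collections import defaultdict

# Local sketch of the map -> reduceByKey pipeline that, on the cluster,
# ran as Spark RDD transformations (or as a classic MapReduce job).
def word_count(lines):
    # Map phase: emit a (word, 1) pair for every word in every line.
    pairs = ((word, 1) for line in lines for word in line.split())
    # Reduce-by-key phase: sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = word_count(["big data", "big cluster"])
```

On a real RDD this would be `lines.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)`; the local version keeps the same two-phase shape.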

Environment: Hadoop, Java, MapReduce, HDFS, AWS, Amazon S3, Hive, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, Spark, Scala, HBase, MongoDB, Python, GitHub, Sqoop, Oozie, DB2, SQL Server, Oracle 12c, MySQL

Confidential, New Hyde Park NY

Sr. Big Data/Hadoop Architect


  • Worked in an Agile development environment using the Kanban methodology.
  • Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch, and actively participated in daily scrums and other design meetings.
  • Maintained Hadoop, Hadoop ecosystems, and databases with updates/upgrades, performance tuning and monitoring.
  • Prepared data analytics processing and data egress to make analytics results available to visualization systems, applications, and external data stores.
  • Responsible for defining the data flow within the Hadoop ecosystem, directing the team in implementing it, building large-scale data processing systems for data warehousing solutions, and working with unstructured data mining on NoSQL.
  • Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Provisioned Cloudera Director AWS instances and added the Cloudera Manager repository to scale up the Hadoop cluster in AWS.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Specified the cluster size, allocated resource pools, and configured the Hadoop distribution by writing specifications in JSON format.
  • Created Hive tables, then loaded and analyzed data in the Hive warehouse with queries written in Hive Query Language (HQL).
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
  • Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
  • Developed and maintained batch data flows using HiveQL and UNIX scripting, and used Hadoop YARN to perform analytics on data in Hive.
  • Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
  • Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Maintained different cluster security settings and was involved in the creation and termination of multiple cluster environments.
  • Continuously coordinated with the QA, production support and deployment teams.
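Partitioned Hive tables like those described above are declared with a `PARTITIONED BY` clause so queries can prune irrelevant partitions. As an illustrative sketch (the table and column names below are hypothetical, not a real schema), the DDL can be generated like this:

```python
# Sketch of generating HiveQL DDL for a partitioned table.
# Table name, columns, and partition column are illustrative only.
def partitioned_table_ddl(table, columns, partition_cols):
    """Build a CREATE TABLE statement with a PARTITIONED BY clause."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partition_cols)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        f"STORED AS ORC"
    )

ddl = partitioned_table_ddl(
    "web_events",
    [("user_id", "BIGINT"), ("url", "STRING")],
    [("event_date", "STRING")],
)
```

Note that in Hive the partition column lives only in the `PARTITIONED BY` clause, not in the main column list; queries filtering on `event_date` then scan only the matching partition directories.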

Environment: Hadoop, HDFS, MapReduce, Unix, REST, Python, Pig, Hive, HBase, Storm, NoSQL, Flume, Zookeeper, Cloudera, SAS, AWS, Kafka, Cassandra, Informatica, Teradata, Scala, Spark, Sqoop, XML, SQL.

Confidential, Dallas TX

Sr. Java/Hadoop Developer


  • Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Involved in Installing, Configuring Hadoop Eco System, and Cloudera Manager using CDH4 Distribution.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Map Reduce, Hive and Spark.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Involved in the end-to-end process of Hadoop cluster installation, configuration and monitoring.
  • Responsible for building scalable distributed data solutions using Hadoop and Involved in submitting and tracking Map Reduce jobs using Job Tracker.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Worked with HBase in creating tables to load large sets of semi structured data coming from various sources.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions.
  • Implemented Spark RDD transformations to map business analysis and apply actions on top of transformations.
  • Automated all jobs, from pulling data from storage to loading data into MySQL, using shell scripts.
  • Designed and tested the data ingestion to handle data from multiple sources into the Enterprise Data Lake.
  • Used Pig to perform data validation on data ingested via Sqoop and Flume, and pushed the cleansed data set into HBase.
  • Created partitioned tables in Hive and mentored analysts and the test team in writing Hive queries.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Configured Nagios for receiving alerts on critical failures in the cluster by integrating with custom Shell Scripts.
  • Rendered and delivered reports in desired formats by using reporting tools such as Tableau.
  • Migrated the code into QA (testing) and supported the QA and UAT (user acceptance testing) teams.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Implemented Struts tag libraries for HTML, beans, and tiles for developing user interfaces.
  • Developed the entire front end using Ajax, JSP, JSP tag libraries, CSS, HTML and JavaScript.
  • Used a test-driven development (TDD) approach, documented all modules, and deployed them on the server on time.
  • Extensively used SoapUI for unit testing and was involved in performance tuning of the application.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Developed scripts, automated data management end to end, and kept all the clusters in sync.
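The Nagios alerting described above hinges on the standard plugin contract: a check's exit status is interpreted as 0 = OK, 1 = WARNING, 2 = CRITICAL. A minimal sketch of such a check for cluster capacity (the thresholds and the source of the used-percentage metric are hypothetical) might look like:

```python
# Sketch of a Nagios-style check for HDFS capacity. The thresholds and
# the used-percentage value are hypothetical; a real check would query
# the NameNode. Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2

def check_capacity(used_pct, warn=80, crit=90):
    """Return the (exit_code, status_line) pair Nagios expects."""
    if used_pct >= crit:
        return CRITICAL, f"CRITICAL - HDFS {used_pct}% full"
    if used_pct >= warn:
        return WARNING, f"WARNING - HDFS {used_pct}% full"
    return OK, f"OK - HDFS {used_pct}% full"

status, message = check_capacity(92)
```

A deployed version would print the status line and call `sys.exit(status)` so Nagios can raise the alert.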

Environment: Hadoop, HBase, HDFS, Map Reduce, Kafka, Pig Latin, Sqoop, Hive, pig, MySQL, Oozie, Zookeeper, Python.

Confidential, Lexington, KY

Sr. Java/J2EE Developer


  • Involved in Software Development Life Cycle (SDLC) of the application: Requirement gathering, Design Analysis and Code development.
  • Implemented Struts framework based on the Model View Controller design paradigm.
  • Designed the application by implementing Struts based on MVC Architecture, simple Java Beans as a Model, JSP UI Components as View and Action Servlet as a Controller.
  • Used JNDI to perform lookup services for the various components of the system.
  • Involved in designing and developing dynamic web pages using HTML and JSP with Struts tag libraries.
  • Used HQL (Hibernate Query Language) to query the Database System and used JDBC Thin Driver to connect to the database.
  • Developed Hibernate entities, mappings and customized criterion queries for interacting with database.
  • Responsible for designing rich user interface applications using JavaScript, CSS, HTML and AJAX, and developed web services, using SoapUI for testing.
  • Used JPA to persistently store large amounts of data in the database.
  • Implemented modules using Java APIs, Java collection, Threads, XML, and integrating the modules.
  • Applied J2EE Design Patterns such as Factory, Singleton, and Business delegate, DAO, Front Controller Pattern and MVC.
  • Used JPA for the management of relational data in the application, and designed and developed business components using Session and Entity Beans in EJB.
  • Developed EJBs (stateless session beans) to handle transactions such as online funds transfers and bill payments to service providers.
  • Developed XML configuration files and properties files used in the Struts Validator framework for validating form inputs on the server side.
  • Extensively used AJAX technology to add interactivity to the web pages.
  • Developed JMS senders and receivers for loose coupling between modules, and implemented asynchronous request processing using Message-Driven Beans.
  • Used JDBC for data access from Oracle tables, and JUnit to implement test cases for beans.
  • Successfully installed and configured the IBM WebSphere Application server and deployed the business tier components using EAR file.
  • Involved in deployment of application on WebLogic Application Server in Development & QA environment.
  • Used Log4j for External Configuration Files and debugging.
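Two of the J2EE design patterns applied above, Singleton and DAO, are language-neutral. The sketch below illustrates them in Python with hypothetical class names (the actual components in this role were Java/EJB; this is only a shape-of-the-pattern sketch):

```python
# Singleton: one shared instance, created lazily on first use.
class ConnectionPool:
    _instance = None

    def __new__(cls):
        if cls._instance is None:          # create the sole instance once
            cls._instance = super().__new__(cls)
        return cls._instance

# DAO (Data Access Object): hides persistence details behind simple
# save/find methods. The dict stands in for a database table.
class AccountDao:
    def __init__(self, pool):
        self._pool = pool                  # would hand out DB connections
        self._rows = {}

    def save(self, account_id, balance):
        self._rows[account_id] = balance

    def find(self, account_id):
        return self._rows.get(account_id)

pool_a, pool_b = ConnectionPool(), ConnectionPool()  # same object twice
dao = AccountDao(pool_a)
dao.save("A-1", 250)
```

The point of the pairing is the same as in the J2EE version: business code talks to the DAO's interface, and expensive shared resources (the pool) exist exactly once.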

Environment: JSP 1.2, Servlets, Struts1.2.x, JMS, EJB 2.1, Java, OOPS, Spring, Hibernate, JavaScript, Ajax, Html, CSS, JDBC, JMS, Eclipse, WebSphere, DB2, JPA, ANT.

Confidential, NYC NY

Java Developer


  • Involved in client requirement gathering, analysis & application design.
  • Involved in implementing the design through the vital phases of the Software Development Life Cycle (SDLC), including development, testing, implementation and maintenance support, following the Waterfall methodology.
  • Developed the UI layer with JSP, HTML, CSS, Ajax and JavaScript.
  • Used Asynchronous JavaScript and XML (AJAX) for better and faster interactive Front-End.
  • Used JavaScript to perform client side validations.
  • Involved in Database Connectivity through JDBC.
  • Ajax was used to make Asynchronous calls to server side and get JSON or XML data.
  • Developed server side presentation layer using Struts MVC Framework.
  • Developed Action classes, Action Forms and Struts Configuration file to handle required UI actions and JSPs for Views.
  • Developed batch jobs using EJB scheduling and leveraged container-managed transactions for highly transactional operations.
  • Used various core Java concepts such as multi-threading, exception handling, the Collections API, and garbage collection for dynamic memory management to implement various features and enhancements.
  • Developed Hibernate entities, mappings and customized criterion queries for interacting with database.
  • Implemented and developed REST and SOAP based web services to provide JSON and XML data, and was involved in the implementation of web services (top-down and bottom-up).
  • Used JPA and JDBC in the persistence layer to persist the data to the DB2 database.
  • Created and wrote SQL queries, tables, triggers, views and PL/SQL procedures to persist and retrieve data from the database.
  • Developed a Web service to communicate with the database using SOAP.
  • Performed performance tuning and optimization with a Java performance analysis tool, and implemented JUnit test cases for Struts/Spring components.
  • Used JUnit to run unit test cases, used Eclipse as the IDE, and worked on installing and configuring JBoss.
  • Used CVS for checkout and check-in operations, and deployed the components to WebSphere Application Server.
  • Worked with production support team in debugging and fixing various production issues.
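The REST services above serve JSON payloads. As a hedged sketch (the endpoint's field names and response shape below are hypothetical, not the actual service contract), the serialize-and-parse round trip such a service performs looks like:

```python
import json

# Sketch of building and consuming a JSON response like the ones the
# REST web services returned. Field names are hypothetical examples.
def funds_transfer_response(transfer_id, status):
    """Serialize a funds-transfer result to a JSON string."""
    return json.dumps({"transferId": transfer_id, "status": status})

payload = funds_transfer_response("T-1001", "COMPLETED")  # wire format
decoded = json.loads(payload)                             # client side
```

The same data could equally be marshalled to XML for the SOAP endpoints; JSON is shown here because both formats are listed in the role.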

Environment: Java, JSP, HTML, CSS, AJAX, JavaScript, JSON, XML, Struts, Struts MVC, JDBC, JPA, Web Services, SOAP, SQL, JBOSS, DB2, ANT, Eclipse IDE, WebSphere.
