Big Data Architect/Hadoop Administrator Resume
SUMMARY:
- Big Data Solution Architect and Hadoop Administrator with over 11 years of extensive experience in the full project life cycle, Hadoop platform architecture, Hadoop capacity planning, complex data ingestion pipelines, cloud architecture, Google BigQuery, Google Dataflow, application design and architecture, systems analysis, conversion and integration, database design and development, and implementation of distributed multi-tiered applications in domains including finance, telecom and pharmaceuticals.
- Successfully led complex projects with small and large teams, covering Big Data technologies including Cloudera Hadoop, Apache Solr, MongoDB and ZooKeeper, and open source projects such as Apache Nutch and Any23, to set up complex data ingestion pipelines.
- Recent exposure to Google Cloud Platform (GCP) technologies including Google BigQuery and Google Dataflow, which enable data ingestion and ad hoc querying at an unprecedented scale.
- Multi-year experience in managing and administering Cloudera Hadoop clusters, including cluster planning, securing a Hadoop cluster using Kerberos, and configuring Sentry and Cloudera Navigator.
- Extensive experience in advanced cluster configuration, including setting up HDFS rack awareness and NameNode high availability (HA).
- Extensive experience in ingesting data into HDFS using tools such as Sqoop and Flume.
- Prior experience includes setting up data ingestion by crawling the web and indexing the data in Solr with precise data extraction using concepts like Semantic Web at scale on a Hadoop cluster.
- Experienced in leading Big Data solutions from concept to operation.
- Adept at Linux shell scripting.
- Able to articulate everything from high-level strategy to project details and to communicate effectively, including through formal presentations, with all levels of the organization.
- Extensive experience in strategic program road mapping and in turning strategic objectives and policies into actionable tasks.
- Broad technical and strategic background; experienced at working with business leaders to transform business needs into technology results.
- Motivated individuals and teams toward project objectives - empowering them to succeed without losing overall direction.
- Experienced in agile delivery of software using practices from Scrum, Extreme Programming and RUP.
- Excellent leadership, problem-solving, technical and communication skills, and a track record of getting along well with team members in cross-cultural global teams.
- Strong background in enterprise infrastructure and architecture.
- Extensive experience in strategizing, leading and implementing Big Data projects using the Apache Hadoop (MRv1 & YARN), Apache Solr and Apache Nutch frameworks.
- Extensive experience with NoSQL databases, specifically MongoDB, and with Apache Solr using different query parsers including DisMax and eDisMax.
- Extensive experience implementing scalable and highly available architectures using sharding and replication.
- Experienced in implementing applications that involve large-scale web crawling and data extraction from unstructured data, especially in the e-commerce domain.
- Experienced in implementing applications that leverage semantic web technologies including RDFa, Microdata, Schema.org and Apache Jena for RDF.
- Thorough understanding of, and experience in leveraging, semantic markup vocabularies for e-commerce including Schema.org and GoodRelations.
- Extensive experience in leading the implementation of SOA transformation programs including highly scalable RESTful and SOAP based architectures.
- Advanced understanding of Microsoft Project, Visio, Excel including formulas, functions, pivot tables, conditional formatting, workbook structure etc.
- Experienced in migrating legacy operations to JEE, WebSphere, WebLogic, WebSphere Portal, Oracle, DB2, Solaris, and AIX.
- Extensive experience in Spring, Hibernate, Caching, Security, Messaging, Web Services.
- Extensive experience in the application of GoF design patterns, JEE design patterns and SOA design patterns.
- Familiarity with Natural Language Processing (NLP) techniques and excellent algorithmic skills.
- PMI-certified Program Management Professional (PgMP).
- Pursuing Cloudera Certified Administrator for Apache Hadoop (CCAH) certification.
TECHNICAL SKILLS:
Programming Languages: Java, C, C++, Python, and PL/SQL.
Java Technologies: JEE, JSP, EJB, JMS, JavaBeans, JDBC and Java Servlets.
Architecture: OOAD, GoF, JEE, SOA Design Patterns, BPM, Enterprise Messaging, EC2.
Methodologies: Agile Methodologies, Scrum, UML, RUP, Extreme Programming.
RDBMS: Oracle 7, 8, 8i, 10g, 11i, 11g, SQL Server 7.0, DB2 8.x, MySQL.
NoSQL: MongoDB
Application Servers: WebLogic 6.1/7.0/8.1, WebSphere 5.x, 6.x, WebSphere Process Server 6.x, IIS, Apache Tomcat.
Middleware: RMI, EJB, CORBA.
Frameworks: Apache Hadoop (MRv1 & YARN), Apache Solr, Apache Nutch, Apache Any23, Apache Jena, Dropwizard, Struts, Spring, Hibernate, iBATIS, CCF, J2C.
Scripting: JavaScript, Ant, Maven, JACL, and Selenium.
Project Management Tools: Microsoft Project, Microsoft Excel, JIRA, Visio.
Operating Systems: UNIX (Solaris), Windows NT/2000, IBM-AIX, Linux.
Version Control Tools: SVN, CVS, ClearCase, MKS Source Integrity, StarTeam, TFS.
IDEs: RAD, Eclipse, JBuilder, Visual Studio, VisualAge, WSAD, Workshop.
Other: XML, XSLT, SAX, DOM, BPEL, XMLSpy.
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Architect/Hadoop Administrator
Responsibilities:
- Responsible for leading the solution from concept to operation.
- Implemented a Cloudera Cluster with CDH 5.x distribution and Cloudera Manager 5.x to enable crawling and indexing of data from the web.
- The cluster was optimized for YARN (MRv2) applications and jobs running for the duration of the crawl window.
- The cluster nodes were primarily used to run Hadoop YARN MapReduce jobs as well as Apache SolrCloud and a MongoDB NoSQL cluster. The searchable data set ran to 100+ million documents, which were processed in the cloud on HDFS and fed into SolrCloud for search.
- The cluster ingested 100+ million Solr documents and 100+ million images sourced from the web over a two-week crawl cycle.
- The data was staged in HDFS, processed on the Hadoop cluster in the AWS cloud and then fed into SolrCloud index nodes in the AWS cloud.
- Implemented a cloud data warehouse using Google BigQuery and Google Dataflow to ingest the data from SolrCloud and support ad hoc querying of the e-commerce product data.
- Responsible for day-to-day activities including HDFS support and maintenance, cluster maintenance, adding and removing nodes, cluster monitoring and troubleshooting, managing and reviewing Hadoop log files, backup and restore, and capacity planning.
Environment: Cloudera CDH 4.x/5.x, Hive, Pig, Impala, Oozie, Sqoop, Flume, Redis, Linux Scripting, QlikView, Tableau.
Confidential, Florham Park, NJ
Solution Architect
Roles and Responsibilities:
- Responsible and accountable for the coordinated management of multiple related projects directed towards strategic business and organizational objectives.
- Created and executed development plans and revised as appropriate to meet changing needs and requirements.
- Managed several teams including offshore and was responsible for budget, scoping and program management.
- Planned and led a major integration of data silos, consolidating data from different business units and markets for centralized reporting and analytics.
- Established processes, provided training and developed procedures while coaching, mentoring and consulting with the technology teams.
- Cost-focused and detail-oriented; provided both tactical and strategic thinking.
- Managed complex technical teams through thorough understanding of technical platforms including JEE, .NET, SOA, BPM, SAP BI, Data Warehouse, Business Objects and Mainframes spanning multiple tiers including the DBMS (Oracle and DB2 extensively).
- Managed technical resources within budget and project schedule.
- Led teams of project managers, architects, technical leads and developers, both onshore and offshore.
- Recognized design and implementation deficiencies and implemented effective solutions.
- Coached, mentored and led personnel within a technical team environment.
Confidential, Jersey City, NJ
Technical Project Manager
Roles and Responsibilities:
- Senior technology manager who provided leadership for the platform technology team from technology and business perspectives.
- Responsible for setting the project baselines, cost estimates and determining the staffing models.
- Ensured adherence to all aspects of the SDLC under an agile methodology, with a strong focus on business analysis, requirements gathering, system architecture and testing as part of a successful rollout.
- Managed multiple vendors and acted as a single point of contact for the project stakeholders and management peers for the different SDLC phases of the project.
- Created the impact analysis and work proposal documents and effectively presented them at multiple management levels.
- Maintained continuous alignment of the project scope with strategic business objectives, and made recommendations to modify the project to enhance effectiveness toward the business result or strategic intent.
- Responsible for the architecture, design and development of the multi-tiered eBAM application by leveraging the existing services.
- Led the effort to analyze the limitations of the existing web services and to architect, design and develop new web services based on SOA patterns such as aggregator, event-driven consumer and dynamic routing.
- Finalized the technology stack and provided input to the initial estimates based on the list of features.
- The eBAM application leveraged JDK 1.5, WebSphere Application Server (WAS 6.x), WebSphere Process Server (WPS 6.x), Microsoft Office SharePoint Server, Spring, Hibernate, Oracle 10g, 11i, Axis 1.4, 2.0, CXF and Adobe LiveCycle ES, among other technologies.
- Served as the liaison between the management and the different vendor groups and program managers spanning different phases of the project life cycle.
- Worked with the business to understand the requirements, performed the technical feasibility analysis, and created architecture and design documents for offshore and onsite teams.
- Extensively involved in the performance monitoring and optimization of the application to meet or exceed the SLAs using tools such as AVIcode, JProfiler and HP Diagnostics.
Confidential, Parsippany, NJ
Solution Architect
Roles & Responsibilities:
- Provided leadership and ownership for the platform technology team.
- Hands-on manager responsible for the SDLC and services architecture.
- Created and maintained project schedule and database documentation, and acted as liaison between all departments of ADT and Tyco International.
- Led the design of the application architecture and enterprise object framework for a real-time order processing system.
- Managed the development of the J2EE and WebSphere Portal based software platform and helped in business process re-engineering from the legacy system to a distributed J2EE system.
- Managed the design of an automated build, deploy and test framework for continuous integration using Ant and Selenium.
- Managed the performance tuning of the application by gathering the performance metrics of the WebSphere Application Servers and the Application.
- Managed the implementation of WebSphere cluster services on a WebSphere Network Deployment topology to achieve load balancing and increased availability of the application.
- Managed and incorporated Business Process Automation using the WebSphere Process Server (WPS) platform.
- Implemented Scrum as the Agile software development methodology.
Environment: JDK 1.4.2, Servlets 2.4, EJB 2.1, RAD 7.0, WebSphere Application Server 6.x, WebSphere Process Server (WPS), DB2 8.x, Rational Rose Enterprise Suite, iBATIS, Ant, Selenium, JACL, Hudson Continuous Integration Server.
Confidential, New York, NY
Application Architect
Roles & Responsibilities:
- Evolved standards and maximized the use of existing tools, technologies and frameworks (i.e., J2EE, Portal, Web Services, Oracle, BI/Actuate, Messaging, ETL, Managed File Transfer, Job Management, etc.).
- Drove emerging technology initiatives and architecture trends (i.e., SOA, ESB, EDA, .NET, GRID, BPM, etc.).
- Extracted common business requirements to promote component reuse and provided an AD process in which shared components/libraries are housed.
- Led efforts to produce SDLC “gate” processes, an architecture/design/code review protocol, design patterns and coding best practices, and to formalize RUP adoption.
- Provided technology and process leadership as a member of the Asset Management IT Architecture Team.
- Led a team that successfully implemented a metadata-driven, web-based ad hoc query and reporting system using the open source Business Intelligence and Reporting Tools (BIRT) and leveraging the existing WebLogic Portal architecture for the portfolio management services at Confidential Asset Management.
- Evaluated and integrated open source technologies into Asset Management infrastructure.
- Involved in strategizing and developing the road map for the Web 2.0 Initiative.
Environment: JDK 1.4.2, Servlets 2.3, JSP, J2EE, Oracle 9i, WebLogic 8.1, WebLogic Portal, Workshop, JProbe, TOAD, SQL, PL/SQL, Business Intelligence and Reporting Tools (BIRT), AJAX, JFreeChart, RSS, Web 2.0, Apache Axis2.
Confidential, Warren, NJ
Lead Developer
Responsibilities:
- Designed and developed modules supporting Servlet, Logging and Exception framework for EJB, HTTP, and XML clients.
- Extensively involved in the migration of WebSphere Studio Application Developer (WSAD) and WebSphere Application Server from 4.0.3 to 5.0.2 and then to 5.1.
- Collected user requirements and took part in discussions with business analysts, customer support managers and teams requesting functionality.
- Prepared Use Cases, Class diagrams and Sequence diagrams to support the Interface documents.
- Used the Command, Abstract, Protocol, State and Singleton design patterns to design the framework.
- Implemented stateless session beans to service client requests.
- Implemented a common XML based request-response service model for different front end clients, thereby increasing the reusability.
- Used JAXB and Castor as the XML binding tools.
- Designed schemas using Altova XMLSpy.
- Extensively involved in migrating from Common Connector Framework (CCF) to J2EE Connector Architecture (J2C).
- Managed successful production releases for different projects in a release.
Environment: JDK 1.3.1, Servlets 2.3, EJB 1.2, XML 1.0, WSAD 4.x, 5.x, WebSphere Application Server 4.x, 5.x, 6.0, DB2 UDB 7.2, CICS, MF Cobol, VS-Cobol, JCL, MKS Source Integrity, Visio, Rational Rose Enterprise Suite, Castor XML data binding, JAXB, JUnit, LoadRunner, TestDirector.