Big Data/ Hadoop Lead / Architect Resume
Newark, CA
SUMMARY:
- 13+ years of software development experience including 3+ years in Hadoop Ecosystem.
- Stepped into multiple lead roles such as project lead, architect, developer and offshore/onsite coordinator with Fortune 500 customers.
- Experienced as a Hadoop Architect/Developer implementing Hadoop clusters to complement data warehouses (e.g., Teradata, Netezza, Greenplum and other EDWs).
- Modeling data streams and batches using tools like Sqoop, Flume and Kafka.
- Experienced in Hive metastore design, partitioning logic, bucketing, Hive QL optimization, distributed cache usage and compression based on use case needs and SLA requirements.
- Experienced in modeling complex data transformation, cleansing and formatting of unstructured data using Pig Latin scripts.
- Strong knowledge of Pig and Hive functions; extended Hive and Pig core functionality by writing custom UDFs.
- Has in depth knowledge of the entire Hadoop stack including core Hadoop (HDFS and MapReduce), Flume, Sqoop, Zookeeper, Oozie, Hive and Pig.
- Experience in Active Directory integration with Kerberos, inter-realm KDC trust and HA implementation for multiple clusters; knowledge of tools such as Ranger and Knox.
- Experience in cluster sizing, infrastructure planning and estimating data growth to build a roadmap.
- Experienced in evaluating and building different monitoring systems (e.g., Sensu, Nagios, Ganglia and Graphite).
- Strong knowledge of data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning and advanced data processing.
- Experienced in deploying, monitoring and managing Hadoop clusters with the Hortonworks and Cloudera distributions of Hadoop in production environments.
- Experience in designing, configuring and installing DataStax Cassandra.
- Experience in configuring CentOS boxes with appropriate security using iptables, and building PKI infrastructure.
- Knowledge of file systems like ext3, ext4, XFS and distributed file system architecture.
- Good understanding of various MapReduce types, input formats and output formats used to customize Hadoop for application-specific purposes.
- Extensive e-commerce and web development experience with JEE components.
- Experienced in developing mobile applications using the Android SDK.
- Experienced in techniques such as sprint planning, backlog management, estimation, daily scrum and review/retrospective meetings.
- Excellent communication skills, both verbal and written. Strong desire to work in a fast-paced, flexible environment.
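The Hive design work summarized above (metastore layout, partitioning, bucketing and compression tuned to SLA needs) typically follows a pattern like the sketch below; the table and column names are hypothetical, shown only to illustrate the approach.

```sql
-- Hypothetical clickstream table: partitioned by ingest date so queries
-- prune to matching partitions; bucketed by user_id for sampling and joins.
CREATE EXTERNAL TABLE clickstream (
  user_id BIGINT,
  url     STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY');

-- Filtering on the partition column reads only that day's data.
SELECT url, COUNT(*) AS hits
FROM clickstream
WHERE dt = '2015-06-01'
GROUP BY url;
```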
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Flume, Sqoop, Kafka, Oozie, ZooKeeper, PigUnit, Spark, Spark SQL
Monitoring and Automation Tools: Nagios, Ganglia, Cloudera Manager, Ambari, Maven
Programming Languages: Java, Python, PL/SQL, Android
Web Technologies: Spring, JSF, Struts, Tiles, GWT, Hibernate, JUnit, JMeter, UML, Servlets, JSP, HTML, XML Schema, JSON, JavaScript, CSS, JDBC, JMS, Web services (REST, SOAP), JVM monitoring and tuning
Scripting Languages: JavaScript, Shell Script, Perl, Python
Relational Databases: Oracle, MS SQL Server, Essbase, SQLite, Teradata, Netezza
NoSQL Databases: Cassandra (DataStax), HBase
Security: Kerberos, PKI, SSO
Operating Systems & Platforms: Linux (Ubuntu, Fedora, CentOS), AWS
IDEs and Other Tools: Eclipse, Informatica, Mercator, MS Project & Office, HP PPM, Stata, Splunk, MS Azure Machine Learning Studio, JMP-SAS, CVS, ClearCase, Git, JIRA, Trello, Confluence
PROFESSIONAL EXPERIENCE
Confidential, Newark, CA
Big Data/ Hadoop Lead / Architect
Responsibilities:
- Participated in project planning sessions with team members and points of contact from other teams to analyze requirements; outlined, proposed and implemented solutions.
- Adopted Agile methodology for development and used the Scrum method of project management.
- Installed and administered the Cloudera distribution of Hadoop and streamlined the process of running multiple algorithms. Also used HBase's bulk output and completebulkload tools to generate HFiles in a compatible format and load them.
- Extensively used the input reader, map, partition, comparison and reduce functions and output writer of the MapReduce programming model to process large data sets in parallel on the cluster; coded the MapReduce jobs in Java.
- Worked with an Apache Hadoop distributed processing environment for web analytics, using Apache Flume to capture web log details such as URLs accessed, cookies, access dates and times, and IP addresses on a roughly 100-node cluster.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop; involved in loading data from the UNIX file system into HDFS.
- Involved in collecting requirements from business users.
- Designed migration of a Microsoft SQL EDW to the AWS platform; researched security risks and mitigation plans.
- Proposed multiple solutions with cost, risk and effort analysis.
- Involved in collecting business requirements; managed and prioritized executive goals based on risk, cost and schedule.
- Designed a reconciliation process between the daily import and the periodic import.
- Built a roadmap for future phases of the migration, including ETL migration to Databricks / EMR Spark.
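A reconciliation check between a daily incremental import and a periodic full import, as described above, can be expressed in Spark SQL along these lines; the table and key names are hypothetical.

```sql
-- Rows present in the daily incremental load but absent from the latest
-- periodic full load indicate drift that the reconciliation flags.
SELECT d.order_id
FROM daily_import d
LEFT JOIN periodic_import p
  ON d.order_id = p.order_id
WHERE p.order_id IS NULL;
```

The symmetric query (full load minus incremental) catches deletions missed by the daily feed.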
Tools/Environment: Spark SQL, Microsoft SQL Server DW, Redshift, AWS Direct Connect, AWS Import/Export, hardware/OpenVPN, Glacier, S3, Kinesis, S3 API upload.
Confidential, Minneapolis, MN
Responsibilities:
- Involved in the architecture of the proof of concept; involved in initial meetings with the Cloudera architect and BI teams for requirement gathering.
- Created Hive external tables for append-only data and managed tables for reload tables.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Responsible for sharing best practices with SLA driven cluster design, performance tuning, backup/recovery, monitoring, disaster recovery strategy, security and production support for Hadoop and related systems.
- Involved in collecting requirements from business users to tune use cases, design and implement POC solutions, and expand cluster capacity from 20 to several hundred nodes.
- Involved in efficiently ingesting and aggregating large amounts of streaming and batch data into Hadoop Cluster using various tools (Flume, Sqoop, Kafka).
- Designed staging methods for ETL vs ELT based on components involved in overall EDW architecture.
- Studied user behavior and usage patterns by performing analysis on the data stored in HDFS using Hive.
- Implemented Kerberos security integration with AD, covering scenarios ranging from cross-realm communication to high availability for the KDC.
- Used HiveQL to write Hive queries based on existing SQL queries.
- Analyzed and mined huge volumes of data and exported the results to MySQL using Sqoop.
- Developed custom MapReduce programs and custom User Defined Functions (UDFs) in Hive to transform the large volumes of data with respect to business requirement.
- Worked with Big Data Analysts, Designers and Scientists in troubleshooting map reduce job failures and issues with Hive, Pig, Flume etc.
- Developed data collection and attribute enhancements using Storm topologies written in Java.
- Developed Spark analytics written in Java to run machine learning algorithms on the collected data.
- Built creative, high-performing and scalable code using Pig and Hive to process XML, log and web data.
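The external/managed table split and the custom Hive UDFs mentioned above usually look something like this sketch; all names (tables, jar path, UDF class) are hypothetical illustrations, not the actual project artifacts.

```sql
-- Append-only source data: EXTERNAL, so dropping the table
-- leaves the underlying HDFS files intact.
CREATE EXTERNAL TABLE raw_events (
  event_id BIGINT,
  payload  STRING
)
LOCATION '/data/raw/events';

-- Reload tables: managed, so Hive owns the data and can
-- safely overwrite it on each reload.
CREATE TABLE dim_customer (
  customer_id BIGINT,
  name        STRING
);

-- Registering a custom Java UDF for use in HiveQL.
ADD JAR /opt/libs/custom-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_url AS 'com.example.udf.NormalizeUrl';
```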
Tools/Environment: Hadoop, MapReduce, Hive, Cloudera Manager, Shell Script, Oozie, Kerberos, Sqoop, Nagios.
Confidential
Hadoop Consultant / Engineer
Responsibilities:
- Responsible for building scalable distributed data pipeline using Hadoop components.
- Built a Sqoop pipeline to collect data from Netezza using a connector for a social media platform.
- Set up Cloudera Hadoop software components such as YARN, Oozie, HDFS, HBase, Oracle NoSQL, Hive, Pig, Sqoop and Impala for customer workshops and demonstrated manageability with Oracle Big Data Appliance Manager.
- Extensively used Cloudera (Hive) to generate the Reports.
- Set up connectors and hourly dump creation; used the cron scheduler and later migrated to Oozie workflows to run jobs.
- Used Sqoop to process dumps periodically and load data into HDFS.
- Optimized Sqoop jobs using compression techniques and fine-tuned data nodes to bring imports within SLA.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive for analytical processing by business users.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
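Dynamic partitioning and the cube-style aggregations described above can be sketched in HiveQL as follows; the table and column names are hypothetical.

```sql
-- Enable dynamic partitions so Hive routes each row to its
-- partition based on the trailing SELECT column (dt).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE sales_by_day PARTITION (dt)
SELECT region, product_id, amount, order_date AS dt
FROM staged_sales;

-- Pre-aggregated "data cube" for visualization: totals at every
-- combination of (region, product_id) grouping levels.
SELECT region, product_id, SUM(amount) AS total
FROM sales_by_day
GROUP BY region, product_id WITH CUBE;
```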
Tools/Environment: Hive, HiveQL, MapReduce, Ganglia, Sqoop, Informatica, Oozie, Netezza, Fidelity Cloud.
Confidential
Consultant / Engineer
Responsibilities:
- Implemented ingestion of data from Oracle dumps using Sqoop.
- Scheduled Oozie workflows for loading data into HDFS.
- Implemented Pig Latin scripts to transform clickstream data and mash it up with retail data; exported the final data sets into analytical databases for processing.
- Moved previously scheduled Pig jobs onto cron for hourly runs.
- Added data import from Netezza via Sqoop.
Tools/Environment: MapReduce, Pig, Cron, Ganglia, Sqoop, Netezza, Informatica, Fidelity Cloud.
Confidential
Consultant / Engineer
Roles and Responsibilities:
- Effort estimation and sprint planning, participated in daily scrum, managed agile metrics.
- Technical and functional Requirement analysis, technical design specification creation and review.
- Resolved production issues to support large customers of the Wealth Central and Streetscape applications.
- Added functionality to expose underlying data, designing and implementing REST APIs using JSON.
- Added functionality to fix pagination and added data points in reports module in Streetscape.
- Evaluated upcoming technologies via POCs, built recommendations and risk assessments, and contributed to the technology stack roadmap.
- Conducted weblog data analysis for breach detection, created reports from Splunk.
- Ran Fortify on the entire code base for first-level code cleanup and security compliance.
- Setup project and environment for other team members.
- Conducted discussions on design, performance and scalability issues to refine Streetscape performance.
Tools/Environment: Spring IoC, Security, MVC, Struts, WS (SOAP, REST), SQL, jQuery, Flex, JSON, HTML5, Java, JSR 303, Eclipse, Fortify, Splunk, Toad, Tomcat, Maven.
Confidential, Boston, MA
Consultant / Engineer
Responsibilities:
- Requirement review and technical design specification creation.
- Setup environment to independently test and develop reports for entire platform.
- Added a new report to display cohort details using FO with style sheets.
- Extensively used RichFaces components to build complex UI tables.
- Conducted reviews of design, performance and scalability issues to refine the product.
- Set up the project and environment for other team members.
- Conducted discussions on design, performance and scalability issues to refine product.
Environment and Tools: JSF, RichFaces, FO, CAS (SSO), XML, JBoss, Oracle 10g, Vertica, DxCG (rules engine), MS SharePoint, JIRA, MS Excel/Word, SoapUI, WS, Java, JEE, UML, Drools, R.
Confidential, Boston, MA
Consultant / Engineer
Responsibilities:
- Conducted competitor analysis: reviewed similar product offerings and engaged the board on strategic direction.
- Analyzed market trends, customers and various segments, competitors and offerings to refine the MRD.
- Managed relationships with new partners, assessing and developing partnerships.
- Created mock-ups, the first MVP and future prototypes. Pioneered feature analysis based on usage metrics.
- Server side programming for application interfacing with android device.
- Resolved integration issues with HIS over HL7.
- Configured Ejabberd/Tomcat/MySQL servers.
- Designed, developed and fixed bugs in the Android application.
- Implemented user interface event handling and triggers.
- Implemented C2DM (Cloud to Device Messaging).
- Worked with the XMPP/SMACK API and XEP-0184.
Environment and Tools: MS Excel, JIRA, PoP, SoapUI, REST WS, SQL, J2EE, XML, MySQL, Ejabberd, Apache Tomcat, AWS.
Confidential, Minneapolis, MN
Consultant / Engineer
Responsibilities:
- Fronted sizing and defining project scope; planned and built a roadmap with project risk analysis for a $3M project.
- Performed pro/con analysis of alternate solutions; liaised between technology and the business funding team. Worked with the business to review and help approve the business requirements document.
- Managed a team of software engineers to develop and manage the end-to-end product lifecycle of “Early Resolution” at Wells Fargo; coordinated with QA, release and deployment teams.
- Functional activities include product roadmap, pricing, budgeting and coordinating cross-functional teams.
- Defined, documented and communicated overall system architecture and helped break down and identify project milestones.
- Communicated and coordinated efforts with project manager, developers and external technical teams.
- Created system architecture, data flow, UML and interaction-level schematics to elicit system gaps and build a thorough understanding of the complete solution. Formulated a test strategy and identified tools for testing independent components.
Environment and Tools: MS Project, SharePoint, MS Excel, SoapUI, SOAP WS, Microsoft .NET, PL/SQL, J2EE, JMS, XML, Oracle, TIBCO BPM, Apache, Maven.
Confidential, Phoenix, AZ
Consultant / Engineer
Responsibilities:
- Created high level architecture and design of new solution.
- Worked to build a risk map for the undertaking.
- Created project build plan and held scrum meetings between technical team leads for cross system development, testing, deployment of interfaces and tracked project progress.
- Worked on changes to Spring based solution to accommodate changes for PCI compliance.
- Designed, developed, tested and deployed Water Payments (a JSF-based application) to change workflows.
- Involved in analysis and design of the system using UML concepts, which include class diagrams, sequence diagrams, state chart diagrams.
- Used Eclipse for developing, debugging and maintaining project code and accessory files.
- Made changes to build strategy using Maven.
- Worked on epay-db for impact analysis on the database and integrated it with the Hibernate-based solution.
- Assisted in configuring the GlassFish application server to host and deploy the application.
- Reengineered applications to a service-oriented architecture using web services over SOAP.
- Extensively used XML and JAXB to model and exchange data between applications.
Environment and Tools: WS-Soap, Spring, Hibernate, JSF, JAXB, Glassfish, Oracle, JUnit, Jmeter, Maven, SVN, Agile process.
Confidential, San Francisco, CA
Consultant / Engineer
Responsibilities:
- As analyst, conducted a feasibility study for the project.
- Worked with business clients to estimate and justify costs, planned and built road map with project risks.
- Wrote overall work breakdown structure with time chart, milestones and outlined IT deliverable strategy.
- As project lead, created the project build plan and held scrum meetings between technical team leads for cross-system development, testing and deployment of interfaces, and tracked project progress.
- Created and managed the project release plan for QSR and dependent projects to complete testing of QSR and its interfaces to SORs.
- Designed, prototyped, developed, tested and deployed the QSR application and automated batch jobs.
- Involved in analysis and design of the system using UML concepts, which include class diagrams, sequence diagrams, state chart diagrams.
- Used Eclipse for developing, debugging and maintaining project code and accessory files.
- Wrote shell and Perl scripts for interfacing file transfers over NDM (Network Data Mover); scheduled jobs via AutoSys and configured on-demand process runs.
- Created and modified stored procedures and triggers. Developed SQL queries for selection and data manipulation using Oracle.
- Assisted in configuring weblogic application server to host and deploy the application.
- Developed new Ant build scripts and modified existing ones to build the application as per the modified code.
- Extensively used JSPs and JavaScript to develop the graphical user interface.
- Implemented various J2EE design patterns like Business Delegate, DTO and Service Locator.
Environment: Java, JSP, Servlets, J2EE, Struts, Tiles, XML, XSLT, CSS, Weblogic, Oracle, Log4J, CVS, Intera reporting engine, JUnit
Confidential
Consultant / Engineer
Responsibilities:
- Led a team of engineers to maintain and administer a Perl framework on top of a Solaris environment.
- CVS administrator and release manager for applications and database scripts for over 25 applications.
- Managed batch programs and schedules of processes.
- Managed installation of new libraries, installed CPAN libraries to extend functionality and ensured QA compatibility.
- Guided integration efforts with Active Directory for seamless browsing.
- Configured and troubleshot Lighttpd/Apache servers to host and deploy applications.
- Coordinated efforts with cross-functional teams to work in unison to strategize and implement solutions. (Care, Peak Day Pricing, Demand Response)
- Refactored application and used struts and tiles frameworks to follow a strict MVC architecture.
- Added modules to introduce server side data validations for user input and enhanced error handling.
- Integrated the application with a SiteMinder-based authentication system.
- Coded stored procedures for application submission and queries for other parts of the application.
- Designed and developed application to integrate with LDAP.
- Added JSPs and related user interface components (JavaScript, CSS, images) for the enrollment module.
- Designed and coded test cases using JUnit.
- Assisted deployment: configured the WebLogic Application Server to host and deploy the application.
- Maintenance: debugged the system to research bugs and break-fixes.
Environment: SharePoint, JEE, Struts, Tiles, Servlets, JSPs, EJB, JUnit, Struts, Weblogic, Eclipse, Oracle, ANT, LDAP, Clearcase, Siteminder, Solaris.
Confidential, San Francisco, CA
Consultant / Engineer
Responsibilities:
- Elicited business requirements from client and stakeholders to document and map business requirement to technology specifications.
- Involved in analysis and design of the system using UML concepts, including class diagrams, sequence diagrams and state chart diagrams, implemented using Rational Rose Enterprise Edition.
- Used Eclipse as the integration environment for developing, debugging and maintaining project code and accessory files.
- Used Data Access Object Design Pattern (DAO) for data access functionality.
- Created and modified stored procedures and triggers. Developed SQL queries for selection and data manipulation using Oracle.
- Configured and tuned WebLogic Application Server to host and deploy the application.
- Developed new Ant build scripts and modified existing ones to build the application as per the modified code.
- Extensively used JSPs and JavaScript to develop the graphical user interface.
- Implemented various J2EE design patterns like Business Delegate, DTO and Service Locator.
Environment: Java, JSP, Servlets, J2EE, Struts, Tiles, XML, XSLT, CSS, Weblogic, Oracle, Log4J, CVS, Intera reporting engine, JUnit.
Confidential
Sr. Software Engineer / Onsite
Responsibilities:
- Responsible for all production issues and break fixes as per SLA compliance.
- As System/Requirement analyst involved in discussions with the client to understand and map business requirement to use cases.
- Attended meetings on a weekly basis, updating the client about statuses and new enhancements to be introduced; worked on new requirements to bridge gaps between IT and the business.
- Involved in analysis and design of the system using UML, including class diagrams, sequence diagrams and state chart diagrams, implemented using Rational Rose Enterprise Edition.
- Designed parts of the application based on MVC architecture using Jakarta Struts framework.
- Used WSAD as the tool for developing, debugging code.
- Used Data Access Object Design Pattern (DAO) for data access functionality
- Developed SQL queries for selection and data manipulation using Oracle
- Assisted in configuring the WebLogic Application Server to host and deploy the application.
- Developed new Ant build scripts and modified existing ones to build the application as per the modified code.
- Extensively used JSPs and JavaScript to develop the graphical user interface.
- Implemented various J2EE design patterns like Business Delegate, DTO and Service Locator.
- Documented the design and testing phases of project.
Environment: Java, JSP, Servlets, EJB, light framework, Tiles, WebLogic, WebSphere, Oracle, Essbase, VSS, CVS, Linux Red Hat 2.1 AS.