Sr. Big Data Architect/Hadoop Developer Resume
Palo Alto, CA
SUMMARY:
- 9+ years of experience in the IT industry, including 5+ years with Big Data and the Hadoop ecosystem (Pig, Hive, Sqoop, Oozie, HBase, MapReduce, ZooKeeper, and Flume) and the Informatica ETL tool.
- Excellent understanding of Big Data and Hadoop architecture.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality. Good understanding of HDFS design, daemons, federation, and high availability (HA).
- Experience in managing Hadoop cluster using Cloudera Manager.
- Experience in managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, using different Hadoop distributions: Cloudera CDH, Hortonworks HDP, and Apache Hadoop.
- Well versed in designing and implementing MapReduce jobs using Java in Eclipse to solve real-world scaling problems.
- Expertise in unit testing, integration testing, system testing, and data validation for developed Informatica mappings.
- Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
- Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases. Hands-on experience with the Java build tool Apache Maven.
- Implemented connectivity to database using JDBC API from Servlets and JSP through Java Beans.
- Solid understanding of high-volume, high-performance systems.
- Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, and XML files.
- Knowledge of UNIX and shell scripting. Good knowledge of implementing NoSQL databases using Cassandra and MongoDB.
- Quick to adapt to new software applications and products; self-starter with excellent communication skills and a good understanding of business workflow.
- Working knowledge of SQL, PL/SQL, stored procedures, functions, packages, database triggers, and indexes.
- My primary focus is on supporting the creative process of software development through Agile, Scrum, and Lean principles: eliminating waste in the process; incremental, rapid delivery of business value on 1-4 week cycles; and delivery of a fully tested and documented product every cycle.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
PROFESSIONAL EXPERIENCE:
Confidential, Palo Alto, CA
Sr. Big Data Architect/Hadoop Developer
Responsibilities:
- Involved in various phases of development, analyzing and developing the system following the Agile Scrum methodology.
- Developed simple to complex MapReduce jobs using Hive.
- Analyzing the data by running Hive queries and Pig scripts to understand user behavior.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop Data Lake by creating pipelines using Pig and Hive.
- Resolving performance issues in Pig and Hive by understanding MapReduce physical plan execution and using debugging commands to run code in an optimized way.
- Using Pig for transformations, event joins, and pre-aggregations before loading JSON-format files onto HDFS.
- Responsible for data ingestion from RDBMS to Hadoop using Sqoop, automating the workflow using Zena, and performing data cleansing and transformations using Pig Piggybank and the Elephant Bird API for further data analytics.
- Working with real time streaming applications using tools like Spark Streaming, Storm and Kafka.
- Used Spark jobs written in Scala to transform data and load it into Hive tables.
- Parsed XML files and loaded the data into Hive ORC tables using Spark.
- Implemented extract queries with Spark transformations and improved performance.
- Worked on Spark performance tuning.
- Developed a Talend framework to execute extracts on Data Lake tables and implemented data quality validation logic.
- Used different Hadoop components in Talend to design the framework.
- Developing Sqoop scripts to handle change data capture for processing incremental records between newly arrived and existing data in RDBMS tables.
- Developing data pipelines using Sqoop, Pig and Hive to ingest customer member data, clinical, biometrics, lab and claims data into HDFS to perform data analytics.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Developing job flows to automate the workflow using Zena automation tool.
- Participating in weekly conference sessions with business analysts and high level architects to report project updates.
- Participating in daily Scrum meetings and reporting on project development activity to ensure effective solutions under the Agile Scrum method.
Environment: HDFS, MapReduce, Apache Pig, Sqoop, Hive, HBase, Oracle 11g, Scala IDE 4.4.1, Linux, PuTTY, JSON, Zena Automation, Talend 5.6.2, TAC, Spark 1.6
Confidential, San Francisco, CA
Sr. Hadoop Developer/Big Data Engineer
Responsibilities:
- Involved in the design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology.
- Worked mainly on Big Data analytics and on Hadoop and MapReduce infrastructure.
- Used Cassandra for data extraction from NoSQL databases.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
- Conceived and designed custom POCs using Kafka.
- Used Pig to do transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data onto HDFS.
- Optimized MapReduce code and Pig scripts; performed user interface analysis, performance tuning, and analysis.
- Used Microsoft Azure for building, deploying, and managing applications in Microsoft-managed data centers.
- Installed the cluster and handled its monitoring and administration, including cluster recovery, capacity planning, and slot configuration.
- Tested the application by creating test cases using MRUnit for both unit testing and integration testing of the entire application.
- Wrote Hadoop MapReduce programs to collect logs and feed them into Cassandra for analytics.
- Built, packaged, and deployed the code to the Hadoop servers.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard. Loaded the aggregated data onto DB2 for reporting on the dashboard.
Environment: HDFS, Mahout, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, DB2, Cassandra, HBase.
Confidential, Santa Clara Valley, CA
Sr. ETL Developer
Responsibilities:
- Participated in requirement analysis with the help of business model and functional model.
- Wrote documentation to describe program development, logic, coding, testing, changes and corrections.
- Wrote PL/SQL stored procedures, triggers, and cursors for implementing business rules and transformations. Created complex T-SQL queries and functions.
- Provided support to develop the entire warehouse architecture and planned the ETL process.
- Extracted data from flat files, XML files and Oracle database, applied business logic to load them in the central Oracle database.
- Performance-tuned various mappings, sources, targets, and transformations by optimizing caches for the lookup, joiner, rank, aggregator, and sorter transformations. Tuned the performance of Informatica sessions for data files by increasing buffer block size, data cache size, and sequence buffer length, and used an optimized target-based commit interval and pipeline partitioning to speed up mapping execution time.
- Monitored and tuned the ETL repository and system for performance improvements.
- Worked with data modeling in preparing logical and physical data models and adding/deleting necessary fields using ERwin.
- Defined the content, structure, and quality of highly complex data structures using Informatica Data Explorer (IDE).
- Implemented slowly changing dimensions to maintain current and historical information in dimension tables.
- Worked on different data sources such as Oracle, SQL Server, flat files, and so on.
- Created jobs and job streams in the Autosys scheduling tool to schedule Informatica, SQL script, and shell script jobs.
- Designed complex mappings in PowerCenter Designer using Aggregator, Expression, Filter, Sequence Generator, Update Strategy, Union, Lookup, Joiner, XML Source Qualifier, and Stored Procedure transformations.
- Proposed PL/SQL and UNIX shell scripts for scheduling the sessions in Informatica.
- Worked with command line program pmcmd to interact with the server to start and stop sessions and batches, to stop the Informatica server and recover the sessions.
Environment: Informatica PowerCenter 9.6.1/9.6, Oracle 11g, SQL Server 2014/2012, Teradata, PL/SQL, TOAD, Informatica Scheduler, UNIX, Shell Scripting.
Confidential, Cypress, CA
Java Developer
Responsibilities:
- Involved in the designing of the project using UML. Followed J2EE Specifications in the project.
- Designed the user interface pages in JSP.
- Used XML and XSL for mapping the fields in database. Used JavaScript for client side validations.
- Created the stored procedures and triggers required for the project. Created functions and views in Oracle.
- Enhanced the performance of the whole application using the stored procedures and prepared statements.
- Responsible for updating database tables and designing SQL queries using PL/SQL.
- Created bean classes for communicating with database.
- Involved in documentation of the module and project. Prepared test cases and test scenarios as per business requirements.
- Involved in bug fixing. Prepared coded applications for unit testing using JUnit.
Environment: Java, JSP, Servlets, J2EE, Java Beans, Oracle, HTML, DHTML, XML, XSL, BEA WebLogic.
Confidential, San Jose, CA
Java Developer
Responsibilities:
- Involved in Full Life Cycle Development in Distributed Environment Using Java and J2EE framework.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing & developing web-services using SOAP and WSDL. Involved in database design.
- Created tables and stored procedures in SQL for data manipulation and retrieval; modified the database using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Used technologies like JSP, JSTL, JavaScript, and Tiles for the presentation tier.
- Performed data operations using Spring ORM wiring with Hibernate; implemented HibernateTemplate and the Criteria API for querying the database.
- Developed an exception handling framework and used Log4j for logging.
- Developed Web Services using XML messages that use SOAP. Developed Web Services for Payment Transaction and Payment Release.
