Sr. Big Data Engineer Resume
Mountain View, CA
SUMMARY
- Over 8 years of experience in the IT industry.
- Over 5 years of experience developing Big Data projects using Hadoop, Hive, Pig, HBase, and other open-source tools and technologies.
- Over 1 year of experience in web application development using Java and J2EE technologies.
- Experienced in moving data from different sources into Hadoop and defining detailed technical processes for data acquisition.
- Over 2 years of extensive ETL experience using DataStage (versions 8.5/8.1/8.0/7.5/7.0), designing and developing jobs using DataStage Designer, DataStage Manager, DataStage Director, and the DS Debugger.
- Solid experience writing MapReduce jobs in Java and Pig (a representative Java sketch follows this summary).
- Experience in installing, configuring, and administering Hadoop clusters of the major Hadoop distributions.
- Excellent experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Pig, ZooKeeper, HCatalog, Oozie, Flume, and Hortonworks.
- Hands-on experience productionizing Hadoop applications (e.g., configuration management, administration, monitoring, debugging, and performance tuning).
- Expertise in several J2EE technologies, including JSP, Servlets, JSF, Hibernate, Spring, Web Services, JDBC, and XML.
- Expertise in designing and developing J2EE-compliant systems using IDE tools such as Eclipse and WebSphere Studio Application Developer (WSAD).
- Expert in using the J2EE-compliant application servers Apache Tomcat and IBM WebSphere.
- Implemented unit testing using JUnit across projects.
- Extensive experience designing, developing, and maintaining data warehouse applications for Healthcare, Telecommunications, Banking, and Insurance.
- Extensive experience working with parallel jobs, troubleshooting, and performance tuning.
- Profound knowledge of data warehousing principles, including fact tables, dimension tables, star schema modeling, and snowflake schema modeling.
- Strong skills in DataStage Administrator in UNIX and Linux environments, report creation using OLAP data sources, and knowledge of OLAP universes.
- Solid experience coding with SQL, SQL*Plus, and PL/SQL stored procedures/functions, triggers, and packages.
- Expert in data warehousing techniques for data cleansing, Slowly Changing Dimensions (SCD), surrogate key assignment, and Change Data Capture (CDC).
- Excellent knowledge of databases including Oracle 10g/11g, MS SQL Server, DB2, and Teradata.
- Developed interfaces using UNIX shell scripts to automate loading, pushing, and pulling data to and from different servers.
- Worked extensively on projects involving multiple operating systems (Windows NT/2000/9x, UNIX, AIX).
- Excellent experience extracting source data from various databases, sequential files, and flat files, then transforming and loading it into the data warehouse.
- Technical expertise in all phases of the Software Development Life Cycle.
- Excellent experience writing complex queries to supply data to other teams.
- Strong technical and analytical skills, along with problem-solving, communication, and documentation skills.
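For reference against the MapReduce bullet above, the following is a minimal sketch of a Java MapReduce job using the Hadoop 2.x org.apache.hadoop.mapreduce API. The class names, the comma-delimited input layout, and the counted field are illustrative assumptions only, not code from any of the projects below.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {

    // Mapper: emit (eventType, 1) for every input record
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 1) {
                eventType.set(fields[1]);   // assumed layout: id,eventType,timestamp
                context.write(eventType, ONE);
            }
        }
    }

    // Reducer: sum the counts for each event type
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged with Maven, a job like this would be submitted with hadoop jar, with the input and output paths passed as arguments.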
TECHNICAL SKILLS
Hadoop: HDFS, Hive, Pig, Sqoop, Flume, Mahout, ZooKeeper, HBase, HCatalog, Hue, Impala, Tez and HAWQ.
Amazon Web Services (AWS): EC2, S3, EMR
Tools: IBM InfoSphere DataStage Enterprise Edition 8.1/8.0/7.5/7.x, Toad 9.6/8.6, Erwin 7.0, SQL*Loader, Autosys r11.0, Tivoli 5.1, BusinessObjects Enterprise XI R3.1, IBM Change Data Capture 6.5, Balance Optimizer.
Languages: Core Java, JSP, Servlets, JDBC, C/C++, Perl, Korn shell, SQL, PL/SQL, UNIX shell scripting and Pig Latin.
Web Technologies: HTML, Ajax and JavaScript
XML Technologies: XML and XSL
Unit Testing: JUnit
Application Servers: Tomcat, WebSphere and WebLogic
IDE: Eclipse and Oracle JDeveloper 10g
Operating Systems: Windows 2000/2003/NT/XP, UNIX, IBM AIX
Databases: Oracle 8i/9i/10g/11g, MS SQL Server 7.0/2000/2003/2008, DB2 UDB, Teradata 13.10/12.0.
Other Tools: MS Visio, ERwin, MS Office
PROFESSIONAL EXPERIENCE
Confidential, Mountain View, CA
Sr. Big Data Engineer
Responsibilities:
- Provided architecture, design, development, and testing services.
- Loaded data from Oracle, MS SQL Server, MySQL, and flat-file sources into HDFS, Hive, Netezza, and Vertica (see the load sketch after this list).
- Involved in writing MapReduce jobs.
- Used automation scripts to import Informatica mappings and workflows.
- Also used Informatica Developer for incremental loads and a Python framework for data loading.
- Worked closely with Operations/IT to assist with requirements and design of Hadoop clusters that handle very large scale.
- Responsible for managing and scheduling jobs on the Hadoop cluster using Tidal.
- Troubleshot MapReduce job failures and issues with Hive and Netezza.
- Provided 24x7 production support for the weekly schedule with the Ops team.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Analyzed Business Requirement Document written in JIRA and participated in peer code reviews.
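As a hedged illustration of the relational-to-HDFS loads described above: the sketch below pulls rows from Oracle over JDBC and writes them to HDFS through the Hadoop FileSystem API. The connection URL, credentials, query, columns, and target path are placeholders, and the actual loads on this project were driven by Informatica and a Python framework rather than a hand-rolled utility like this.

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OracleToHdfsLoader {
    public static void main(String[] args) throws Exception {
        // Oracle JDBC driver; the ojdbc jar must be on the classpath
        Class.forName("oracle.jdbc.OracleDriver");

        // Placeholder connection details, query, and target path
        String jdbcUrl = "jdbc:oracle:thin:@dbhost:1521:ORCL";
        String query = "SELECT customer_id, status, updated_ts FROM customers";
        Path target = new Path("/data/staging/customers/customers.csv");

        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();

        try (Connection conn = DriverManager.getConnection(jdbcUrl, "etl_user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query);
             FileSystem fs = FileSystem.get(conf);
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     fs.create(target, true), StandardCharsets.UTF_8))) {
            // Stream each row out as a delimited line on HDFS
            while (rs.next()) {
                out.write(rs.getString("customer_id") + ","
                        + rs.getString("status") + ","
                        + rs.getString("updated_ts"));
                out.newLine();
            }
        }
    }
}
```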
Confidential, Houston, TX
Big Data Developer
Responsibilities:
- Provided architecture, design, development, and testing services.
- Loaded data from an Oracle database into HDFS, Hive, and HBase.
- Involved in writing MapReduce jobs.
- Loaded the generated HFiles into HBase for faster access to a large customer base without taking a performance hit (see the bulk-load sketch after this list).
- Used Apache Maven for project builds.
- Involved in loading data from the UNIX file system into HDFS.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Performed unit testing of MapReduce jobs with MRUnit (see the test sketch after this list).
- Used the Oozie scheduler to automate pipeline workflows.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
- Analyzed Business Requirement Document written in JIRA and participated in peer code reviews in Crucible.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Managed cluster coordination services through ZooKeeper.
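To make the HFile bullet concrete, here is a hedged sketch of HBase bulk loading with the HBase 1.x-era API (HFileOutputFormat2 to write region-aligned HFiles, then LoadIncrementalHFiles to move them into the table). The table name, column family, and CSV layout are illustrative assumptions, not the project's actual schema.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomerBulkLoad {

    // Turns each CSV line into a Put keyed by the customer id (hypothetical layout)
    public static class CsvToPutMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        private static final byte[] CF = Bytes.toBytes("d");   // assumed column family

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            byte[] row = Bytes.toBytes(fields[0]);
            Put put = new Put(row);
            put.addColumn(CF, Bytes.toBytes("name"), Bytes.toBytes(fields[1]));
            context.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Path input = new Path(args[0]);      // source data on HDFS
        Path hfileDir = new Path(args[1]);   // staging directory for the generated HFiles
        TableName tableName = TableName.valueOf("customer");   // assumed table name

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName);
             Admin admin = conn.getAdmin()) {

            Job job = Job.getInstance(conf, "customer HFile generation");
            job.setJarByClass(CustomerBulkLoad.class);
            job.setMapperClass(CsvToPutMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, hfileDir);

            // Configures partitioning and sorting so HFiles line up with the table's regions
            HFileOutputFormat2.configureIncrementalLoad(job, table, locator);

            if (job.waitForCompletion(true)) {
                // Move the finished HFiles into the table, bypassing the normal write path
                new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
            }
        }
    }
}
```

Because the HFiles are generated offline and only moved into place at the end, the load bypasses the normal HBase write path, which is what avoids the performance hit mentioned above.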
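And for the MRUnit bullet, a hedged sketch of a unit test: it drives the illustrative EventCount mapper and reducer from the sketch after the summary section through MRUnit's MapDriver and ReduceDriver. The input record and expected outputs are invented for the example.

```java
import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class EventCountTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    private ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new EventCount.EventMapper());
        reduceDriver = ReduceDriver.newReduceDriver(new EventCount.SumReducer());
    }

    @Test
    public void mapperEmitsEventTypeWithCountOfOne() throws Exception {
        // One CSV record in, one (eventType, 1) pair out
        mapDriver.withInput(new LongWritable(0), new Text("42,login,2015-01-01"))
                 .withOutput(new Text("login"), new IntWritable(1))
                 .runTest();
    }

    @Test
    public void reducerSumsCountsPerKey() throws Exception {
        // Two counts for the same key are summed
        reduceDriver.withInput(new Text("login"),
                               Arrays.asList(new IntWritable(1), new IntWritable(1)))
                    .withOutput(new Text("login"), new IntWritable(2))
                    .runTest();
    }
}
```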
Confidential, Bloomington, IL
Big Data Developer
Responsibilities:
- Provided architecture, design, development, and testing services to Confidential for sub-system components within the data aggregation infrastructure associated with project Knight Hawk.
- Worked on the Integrated Customer Platform (ICP) project at Confidential, which is aligned with the Drive Safe and Save program.
- The project, code-named "Knight Hawk," focused on developing a smartphone-based telematics solution for driver behavior monitoring.
- Architected and implemented a Hadoop batch solution for risk rating under the "pay as you drive / pay how you drive" model.
- Developed Java MapReduce jobs for trip calibration, trip summarization, and data filtering.
- Developed Hive UDFs for rating aggregation (see the UDF sketch after this list).
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Developed an HBase Java client API for CRUD operations (see the client sketch after this list).
- Developed the Java client API for node provisioning, load balancing, and artifact deployment.
- Responsible for managing data coming from different sources.
- Used Oozie for job scheduling.
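As a hedged illustration of the Hive UDF bullet above: the sketch below is a simple reflection-based Hive UDF that maps a numeric driver risk score to a rating band. The function name, thresholds, and bands are invented for the example; a true multi-row aggregation would extend Hive's GenericUDAF machinery rather than UDF.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "risk_band",
             value = "_FUNC_(score) - maps a numeric driver risk score to a rating band")
public class RiskBandUDF extends UDF {
    public Text evaluate(Double score) {
        if (score == null) {
            return null;                 // propagate NULLs the way built-in functions do
        }
        if (score < 0.3) {
            return new Text("LOW");
        } else if (score < 0.7) {
            return new Text("MEDIUM");
        }
        return new Text("HIGH");
    }
}
```

A UDF like this would be registered from the Hive CLI with ADD JAR followed by CREATE TEMPORARY FUNCTION risk_band AS 'RiskBandUDF'.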
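And for the HBase client API bullet, a hedged sketch of a thin CRUD wrapper over the HBase 1.x client (Connection, Table, Put/Get/Delete). The table name, column family, and column qualifier are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TripStoreClient implements AutoCloseable {
    private static final byte[] CF = Bytes.toBytes("d");   // assumed column family

    private final Connection connection;
    private final Table table;

    public TripStoreClient(Configuration conf, String tableName) throws IOException {
        this.connection = ConnectionFactory.createConnection(conf);
        this.table = connection.getTable(TableName.valueOf(tableName));
    }

    // Create / Update
    public void putRating(String tripId, double rating) throws IOException {
        Put put = new Put(Bytes.toBytes(tripId));
        put.addColumn(CF, Bytes.toBytes("rating"), Bytes.toBytes(rating));
        table.put(put);
    }

    // Read
    public double getRating(String tripId) throws IOException {
        Result result = table.get(new Get(Bytes.toBytes(tripId)));
        return Bytes.toDouble(result.getValue(CF, Bytes.toBytes("rating")));
    }

    // Delete
    public void deleteTrip(String tripId) throws IOException {
        table.delete(new Delete(Bytes.toBytes(tripId)));
    }

    @Override
    public void close() throws IOException {
        table.close();
        connection.close();
    }
}
```

Usage would look like: try (TripStoreClient client = new TripStoreClient(HBaseConfiguration.create(), "trips")) { client.putRating("trip-123", 0.42); }, with "trips" and the row key again being placeholders.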
Confidential, La Palma, CA
ETL Developer / Data Engineer
Responsibilities:
- Interacted with the end-user community to understand the business requirements.
- Prepared the required application design documents based on functionality required.
- Designed the ETL processes using DataStage to load data from Oracle and DB2 UDB to flat files (fixed width), from flat files to the staging Teradata database, and from staging to the target Teradata data warehouse database.
- Implemented the dimensional (logical and physical) data model in the existing architecture using Erwin.
- Used DataStage Parallel Extender stages, namely Sequential, Lookup, Change Capture, Funnel, Transformer, Column Export, and Row Generator stages, in accomplishing the ETL coding.
- Developed Teradata SQL queries to load data from Teradata staging to the enterprise data warehouse.
- Extensively worked on Error Handling and Delete Handling.
- Designed and developed the jobs for extracting, transforming, integrating, and loading data using DataStage Designer.
- Developed job sequencers with proper job dependencies, job control stages, and triggers.
- Developed DataStage job sequences using User Activity Variables, Job Activity, Execute Command, Loop Activity, and Terminate activities.
- Used the DataStage Director and its run-time engine to monitor the running jobs.
- Involved in performance tuning and optimization of DataStage mappings, using features such as partitions and data/index cache to manage very large volumes of data.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the Hive sketch after this list).
- Involved in unit and system testing to verify that the data extracted from different source systems loaded into the target according to user requirements.
- Extracted data from the data warehouse using Business Objects for reporting purposes.
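To make the Hive bullet concrete, the following is a hedged sketch of creating a table, loading an HDFS file into it, and running a query from Java over the HiveServer2 JDBC driver. The endpoint, credentials, table name, and columns are placeholders; in practice much of this work would be done directly in the Hive CLI or through scheduled jobs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveClaimsLoader {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the hive-jdbc jar must be on the classpath
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://hiveserver:10000/default";   // placeholder endpoint

        try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Create the staging table if it does not already exist
            stmt.execute("CREATE TABLE IF NOT EXISTS claims_stg ("
                    + "claim_id STRING, member_id STRING, amount DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

            // Load a file already staged on HDFS into the table
            stmt.execute("LOAD DATA INPATH '/data/staging/claims/claims.csv' "
                    + "INTO TABLE claims_stg");

            // A query like this is compiled into MapReduce jobs on the cluster
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT member_id, SUM(amount) FROM claims_stg GROUP BY member_id")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```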