Hadoop Developer Resume
Kansas City, MO
PROFESSIONAL SUMMARY:
- Over 8 years of IT experience, including over 2.5 years developing Big Data/Hadoop applications and over 5.5 years in data warehousing.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, DataNode, NameNode and MapReduce. Hands-on experience with Flume, HDFS, Sqoop and Hive, and fair knowledge of HBase.
- Hands-on experience in Spark programming and good exposure to Scala.
- Involved in setting up standards and processes for Hadoop-based application design and implementation.
- Experience in installation, configuration, performance tuning and deployment of Big Data solutions.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, HBase, Impala, YARN, Flume, Oozie and ZooKeeper.
- Experience working with NoSQL databases such as HBase and Cassandra, including CRUD operations, sharding, indexing and replication.
- Experience in developing Pig scripts and Hive Query Language (HiveQL) queries.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Experience in managing and reviewing Hadoop log files.
- Used Zookeeper to provide coordination services to the cluster.
- Experienced in using Sqoop to import data from RDBMS into HDFS and to export it back.
- Sound knowledge of Linux, the platform on which Hadoop runs.
- Good knowledge of Core Java and strong debugging skills.
- Familiarity with Data as a Service (DaaS) and working experience with Apache Solr.
- Experience in requirement analysis, system design, development and testing of various software applications.
- Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
- Detailed understanding of the Software Development Life Cycle (SDLC).
- Strong data warehousing experience in application development using Informatica PowerCenter (Designer, Workflow Manager, Workflow Monitor, Repository Manager).
- Experience in creating complex mappings using various transformations and well versed in debugging Informatica mappings.
- Strong SQL experience in coding and query tuning.
- Tuned the complete ETL system by collecting performance data, optimizing default session parameters and applying various performance tuning methods to sources, targets, mappings and sessions.
TECHNICAL SKILLS:
Operating Systems: Windows 8/7/XP, Unix, Linux CentOS 6.5, Ubuntu 13.x, Mac OS X.
Hadoop Ecosystem: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, HBase, Hive, Pig, ZooKeeper, Sqoop, Flume, Oozie, Impala, Cloudera Desktop.
Development Tools: Eclipse, MySQL, SQL Developer, TOAD, Microsoft Suite (Word, Excel, PowerPoint, Access), OpenOffice Suite (Editor, Calc, etc.), VMware, Informatica PowerCenter.
NoSQL Databases: HBase, MongoDB.
Databases: MySQL, Oracle 12c/11g/10g, SQL Server.
PROFESSIONAL EXPERIENCE:
Confidential, Kansas City, MO
Hadoop Developer
Responsibilities:
- Analyzed the Hadoop cluster using big data analytics tools, including Flume, Pig, Hive and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from the Linux file system to HDFS.
- Imported data into, and exported data from, HDFS and Hive using Sqoop.
- Processed unstructured data using Pig and Hive.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling Oozie workflows to run multiple Hive and Pig jobs.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Extensively used Pig for data cleansing.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Strong experience in Apache server configuration.
- Used HBase as a NoSQL database.
- Exported result sets from Hive to MySQL using shell scripts.
- Actively involved in code reviews and bug fixing to improve performance.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Linux, Cloudera, Big Data, Java APIs, SQL, NoSQL, HBase, MySQL, Apache Solr.
Confidential, Bentonville, Arkansas
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest data.
- Worked on the data feed to the pricing algorithm; staged data in the specified format for the pricing engine to consume.
- Involved in writing MapReduce jobs.
- Used Pig to perform transformations, event joins, traffic filtering and some pre-aggregations before storing the data in HDFS.
- Involved in developing UDFs for both Pig and Hive in Java.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Used Java MapReduce to compute metrics that define user experience, revenue, etc.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Proficient in working with the HBase NoSQL database.
- Used Eclipse to build the application.
- Involved in using Sqoop to import data into and export data from HDFS.
- Involved in processing ingested raw data using MapReduce, Apache Pig and Hive.
- Involved in developing Pig scripts for change data capture (CDC) and delta record processing between newly arrived data and existing data in HDFS.
- Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive and MapReduce) and to move data files within and outside of HDFS.
- Worked with HBase, using CRUD (create, read, update and delete) operations and its indexing, replication and sharding features.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, HDFS, Eclipse.
Confidential, Atlanta, GA
Informatica Developer
Responsibilities:
- Design of Informatica Servers and repositories.
- Design of repository architecture.
- Creation of UNIX Level folders for installing latest versions.
- Installing, testing and analyzing the latest versions.
- Assigning properties to Integration Service and Repository Services as per the environment.
- Creating/joining domains, nodes and grids.
- Understanding new features by working with the Informatica vendor.
- Working with the Informatica vendor to resolve configuration and environment issues.
- Sharing knowledge of the latest Informatica versions with developers and administrators, and explaining the tasks involved in working with the new versions.
- Design reviews of new code developed in various projects.
- Ensuring that all new development follows TSG standards.
- Helping developers resolve issues they face while developing new mappings or enhancements.
- Supporting Informatica administrators in resolving complex tickets and breakdowns.
- Resolving tickets raised with Informatica design teams on design and configuration issues.
- Participated in implementing Informatica PowerCenter Data Validation Option (DVO) for complete data validation and testing in the production environment.
- Documentation and presentation of the project and new architectures.
- Determining the best property settings for various components to maximize performance.
Environment: Oracle 11g, Unix, Informatica 9.1/8.x, EditPlus, PuTTY, SQL*Plus, Toad, AutoSys.
Confidential, Atlanta, GA
ETL Developer
Responsibilities:
- Extensively used the features of Informatica 8.6, including Designer, Workflow Manager, Workflow Monitor and Repository Manager.
- Responsible for development, support and maintenance of the ETL (Extract, Transform and Load) processes using Informatica PowerCenter.
- Implemented various workflows using transformations such as Normalizer, Lookup, Aggregator and Stored Procedure, and scheduled jobs for different sessions.
- Working closely with Source System teams to analyze source data and resolve source data issues.
- Interacting with other project teams for seamless integration.
- Understanding existing business model and customer requirements with the Functional team.
- Attending the technical review meetings for Data Modeling.
- Applying data warehouse concepts to implement helpful techniques in the project.
- Analyzing the data model to document the requirements and the pseudocode.
- Responsible for creating Technical specification and Mapping design documents.
- Designing technical specification documents for all the developed mappings.
- Ensuring business rules are translated and documented from business to technical requirements for developers.
- Developed approximately 50 Informatica mappings implementing business rules to load data.
- Used Informatica PowerCenter to extract, transform and load data from multiple input sources, such as relational sources, legacy sources and flat files, into the target systems.
- Created mappings using Filter, Expression, Lookup, Source Qualifier and Sequence Generator transformations.
- Coordinated multiple architects responsible for development, integration, administration and evolution of the data warehouse.
- Developed UNIX shell scripts that invoke pmcmd to start and stop sessions.
- Automated job processing and set up automatic email notifications to the concerned persons.
- Involved in unit testing and system integration testing, and in preparing Unit Test Plan (UTP) and System Test Plan (STP) documents.
- Worked on HPSD Change Requests (CRs) while integration testing and UAT were in progress.
- Worked with the Informatica design and administrator teams on code migration and participated in project go-live.
- Worked extensively with the reports development team to help them understand the data warehouse so they could generate the required reports using BO and Cognos tools.
- Transferred knowledge to the maintenance team after production go-live.
Environment: Informatica 8.1, Windows XP, Oracle 9i/11g, UNIX, Erwin 4.x, SQL Developer, BO 6.5, Cognos.
Confidential
Jr. ETL Developer
Responsibilities:
- Analyzing data issues and recommending solutions.
- Resolving load failure issues in production.
- Analyzing ETL mappings and making changes to correct errors and include required new logic.
- Analyzing and modifying Perl scripts.
- Running test loads in development and test environments.
- Addition of new source systems.
Environment: Informatica 8.x, UNIX, shell scripting, Oracle 9i.
