Hadoop Admin/Developer Resume
LA, California
SUMMARY
- Accomplished Senior IT Developer with over 9 years of experience in the analysis, development and implementation of business applications using ETL/ELT and BI tools, Hadoop, Informatica PowerCenter, and client/server and web applications on UNIX and Windows platforms.
- Strong experience in Metadata Management using Hadoop User Experience (HUE)
- Expertise with tools in the Hadoop ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Experience using Pig as an ETL tool to perform transformations, event joins and pre-aggregations before storing the data onto HDFS.
- Good experience with MapReduce (MR), Hive, Pig, HBase, Sqoop, Spark, Scala for data extraction, processing, storage and analysis
- Experience writing Hive QL queries and Pig Latin scripts for ETL
- Expertise in processing and analyzing archived and real-time data using Spark Core, Spark SQL and Spark Streaming.
- Data ingestion into Hadoop (HDFS): ingested data from various sources such as Oracle and MySQL using Sqoop; created Sqoop jobs with incremental loads to populate Hive external tables; imported real-time data into Hadoop using Kafka and also worked with Flume; exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Exposure to data lake implementation using Apache Spark; developed data pipelines, applied business logic using Spark, and used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark (a minimal sketch of this pattern appears after this summary).
- Experienced in scheduling recurring Hadoop jobs with Apache Oozie.
- Good knowledge of Design Patterns like Singleton, DAO, Factory, MVC etc.
- Extensive experience with Enterprise JavaBeans (EJB) - Session, Entity and Message Driven Beans.
- Designed and developed complex mappings and mapplets using various transformations, including reusable transformations, unconnected/connected Lookup, Router, Filter, Expression, Aggregator, Joiner, Update Strategy, Stored Procedure, Normalizer, XML and External Procedure.
- Created custom data models to accommodate business metadata including KPIs, metrics and goals.
- Experienced in designing and implementing Data Mart / Data Warehouse applications using Informatica 9.5.1/9.x, Informatica Cloud Express and PowerCenter 8.x/7.x/6.x (Designer, Workflow Manager, Workflow Monitor and Repository Manager).
- Experience in Performance tuning of targets, sources, mappings and sessions.
- Experience using Informatica 9.0.1 Advanced Edition with add-ons for data profiling (Informatica Data Explorer) and metadata exchange; collected metadata in the Informatica Metadata Manager repository and set up and led usage of new features in the Informatica Analyst and Developer client tools.
- Prepared high-level design specifications for ETL coding and mapping standards.
- Extensive experience in the OBIEE repository (Physical, Business Model and Mapping, and Presentation layers) for both stand-alone and integrated Siebel Analytics implementations.
- Experience with Business Objects data integration and with various data sources such as Oracle, SQL Server, MS Access, flat files, Teradata, XML, EBCDIC and EPIC files; experience in DataFlux.
- Strong expertise in relational database systems such as Oracle 8i/9i/10g, SQL Server 2000/2005 and MS Access; database design and development using SQL, PL/SQL, SQL*Plus, TOAD and SQL*Loader.
- Two years' experience installing, configuring and testing Hadoop ecosystem components.
- Experience with UNIX commands, VI editor and writing Shell scripts.
- Expertise in Informatica B2B Data Transformation (DT and DX), with pre-built transformations for most versions of industry Confidential messages, including EDI standards and derivatives.
- Experience in the Ralph Kimball methodology, logical modeling, physical modeling, dimensional data modeling, star and snowflake schemas, fact and dimension tables, and transformation of unstructured data into well-formed XML.
- Expertise in OLTP/OLAP system study and in developing database schemas such as star and snowflake schemas for relational and dimensional modeling, including slowly changing dimensions (SCDs).
- Worked in an Agile team setting to develop features for e-commerce sites on a new technology platform.
- Experience working with scheduling tools such as AutoSys and Control-M; experience with Informatica web services/message queues.
- Experience with high-volume datasets from various sources such as Oracle, text files and Netezza relational tables, and with XML targets.
- Knowledge of reporting tools such as OBIEE, Cognos, MicroStrategy and Business Objects.
- Experience with the ILM data archiving tool for retrieving legacy application data.
- Good experience with web services and with mapplets in DT.
- HIPAA certified through HP.
- Expertise in writing Oracle PL/SQL stored procedures, functions, packages, cursors and triggers.
- SQL tuning and index creation for faster database access and better query performance in Informatica, using explain plans, SQL hints and indexing of the required columns.
- Developed and executed ETL and ELT test plans, documented ETL processing and generated required metadata.
- Developed transformations by marking the relevant data directly on a sample of the data source and mapping that data to a chosen XML schema.
- Experience leading teams that designed and developed ETL processes to load and maintain the EDW.
- Ability to achieve project goals within project constraints such as scope, timing and budget; allocating work and managing resource planning; day-to-day monitoring of deliverables and meeting the expectations of business clients.
- Expertise in writing DDL, DML, DCL, and TCL commands.
- Outstanding communication and interpersonal skills, ability to learn quickly, good analytical reasoning and quick adaptation to new technologies and tools.
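Illustrative sketch for the Hive-to-Spark conversion noted above: the snippet below shows how a simple HiveQL aggregation can be expressed as Scala RDD transformations. It is a minimal, hypothetical example; the sales table, its columns and the aggregation are assumptions for illustration only, not artifacts from any client engagement.

```scala
import org.apache.spark.sql.SparkSession

object HiveToRddExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical example: table and column names are illustrative only.
    val spark = SparkSession.builder()
      .appName("HiveToRddExample")
      .enableHiveSupport()               // read Hive tables via the shared metastore
      .getOrCreate()

    // HiveQL:  SELECT region, SUM(amount) FROM sales GROUP BY region
    // rewritten as RDD transformations.
    val salesRdd = spark.table("sales")
      .select("region", "amount")
      .rdd
      .map(row => (row.getString(0), row.getDouble(1)))   // assumes amount is stored as DOUBLE

    val totalsByRegion = salesRdd.reduceByKey(_ + _)       // GROUP BY region + SUM(amount)

    totalsByRegion.collect().foreach { case (region, total) =>
      println(s"$region\t$total")
    }

    spark.stop()
  }
}
```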
TECHNICAL SKILLS
- HADOOP
- Spark
- Spark Streaming
- Kafka
- HBase
- Cassandra
- HDFS
- MapReduce
- Hive
- Pig
- Flume
- Sqoop
- Scala
- C
- C++
- Java
- Python
- Informatica 10.1/9.5.1/9.1, Informatica Cloud Services, PowerCenter/PowerMart/PowerExchange 9.x/8.x/7.x/6.x
- Informatica Identity Resolution (IIR)
- ILM
- CARS
- Pervasive
- Netezza
- Oracle 8i/9i/10g/11g
- DB2
- Erwin 4.5/4.0/3.x
- SQL
- SQL Server 2005 and 2008
- Hive
- PIG
- SQL*Plus and TOAD
- OBIEE 10.1.3.x
- Siebel Analytics 7.x
- Oracle BI apps
- UNIX/LINUX/DOS Scripting
- PL/SQL
PROFESSIONAL EXPERIENCE
Confidential, LA, California
Hadoop Admin/Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Hadoop installation and configuration of multiple nodes using the Cloudera platform.
- Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line; performed cluster maintenance as well as creation and removal of nodes using tools such as Ambari and Cloudera Manager Enterprise.
- Handling the installation and configuration of a Hadoop cluster.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Involved in developer activities of installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in cluster-level security: perimeter security (authentication with Cloudera Manager, Active Directory and Kerberos), access (authorization and permissions with Sentry), visibility (audit and lineage with Navigator) and data protection (encryption at rest).
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitored data streaming between web sources and HDFS using monitoring tools.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provided inputs to development regarding efficient utilization of resources such as memory and CPU, based on the runtime statistics of map and reduce tasks.
- Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
- Changed cluster configuration properties based on the volume of data being processed by the cluster.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (see the sketch at the end of this role); excellent working knowledge of SQL and databases.
- Commissioned and decommissioned data nodes in the cluster when problems arose.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and managed NameNode high availability to avoid single points of failure in large clusters.
- Held discussions with other technical teams on a regular basis regarding upgrades, process changes, any special processing and feedback.
- Analyzed system failures, identified root causes and recommended courses of action; documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Administered and maintained Cloudera Hadoop clusters; provisioned, patched and maintained physical Linux systems.
Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, YARN, Cloudera 5.13, Spark, Tableau.
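Sketch for the log-monitoring bullet above: a minimal, hypothetical Scala job that scans log files for predefined error patterns and raises alerts. The directory, patterns and alert hook are assumptions for illustration, not the actual production setup.

```scala
import java.io.File
import scala.io.Source

object LogErrorScanner {
  // Hypothetical error patterns; a real deployment would load these from configuration.
  val errorPatterns = Seq("OutOfMemoryError", "Corrupt block", "Lease expired")

  // Placeholder alert hook; in production this could call a mail gateway or paging API.
  def sendAlert(logFile: File, line: String): Unit =
    println(s"ALERT [${logFile.getName}]: $line")

  def scan(logDir: String): Unit = {
    val logFiles = Option(new File(logDir).listFiles())
      .getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(".log"))

    for (file <- logFiles) {
      val source = Source.fromFile(file)
      try {
        source.getLines()
          .filter(line => errorPatterns.exists(p => line.contains(p)))
          .foreach(line => sendAlert(file, line))
      } finally {
        source.close()
      }
    }
  }

  def main(args: Array[String]): Unit =
    scan(args.headOption.getOrElse("/var/log/hadoop"))   // assumed log directory
}
```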
Confidential, Seymour, IN
Hadoop Developer
Responsibilities:
- Installed and configured MapReduce, Scala, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS; assisted with performance tuning and monitoring.
- Wrote Sqoop scripts to import and export data between RDBMS and HDFS/Hive, and handled incremental loading of customer and transaction data dynamically.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Worked with data serialization formats such as Avro, Parquet and CSV for converting complex objects into byte sequences.
- Worked on setting up Kafka for streaming data and monitoring for the Kafka Cluster.
- Involved in the creation and design of Kafka producers for data ingestion pipelines; consumed the data with Spark Streaming into Spark DataFrames and ingested it into Hive tables (a minimal sketch appears at the end of this role).
- Set up Kafka producers and consumers, and Spark and Hadoop MapReduce jobs.
- Implemented Kerberos Security Authentication protocol for existing cluster
- Developed Scala code using Spark API and Spark-SQL/Streaming for faster testing and processing of data.
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
- Worked on improving performance and optimization of existing Spark applications in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Experienced with real time processing of data sources using Apache Spark.
- Imported the data from different sources like HDFS/HBase into Spark RDD.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Created Tableau dashboards by connecting with the Hive tables.
- Integrated maven with GIT to manage and deploy project related tags.
- Integrated GIT into Jenkins to automate the code check-out process.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Ensured smooth functioning of our development and QA environments and worked with team members in the defect resolution process.
- Collaborated with team members to resolve issues and assure the successful delivery of high-quality ETL/ELT and BI code.
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Scala, GIT, Jenkins, Java, SQL Scripting and Linux Shell Scripting, Cloudera.
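Sketch for the Kafka/Spark Streaming bullets above: a minimal, hypothetical pipeline that reads a Kafka topic with Spark Streaming, converts each micro-batch to a DataFrame and appends it to a Hive table. The broker address, topic, consumer group and table name are assumptions for illustration, and the target Hive table is assumed to already exist.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHiveStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToHiveStream")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(30))

    // Assumed broker list, topic and consumer group -- replace with real values.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("transactions"), kafkaParams))

    // For each micro-batch, turn the Kafka record values into a DataFrame
    // and append them to an existing Hive table (assumed name: staging.transactions_raw).
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val df = rdd.map(_.value()).toDF("raw_event")
        df.write.mode("append").insertInto("staging.transactions_raw")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```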
Confidential, IL
Sr Informatica Developer
Responsibilities:
- Drove the project from capturing business requirements through project development, acceptance testing and final implementation.
- Effectively led teams of 8-12 resources to achieve project goals within project constraints such as scope, timing and budget; allocated work and managed resource planning; performed day-to-day monitoring of deliverables and met the expectations of business clients.
- Implemented methods to validate that data supplied by external sources was loaded correctly into the awards database.
- To validate that the data was loaded correctly, external data was examined for signals indicating that the source might be incorrect.
- Rewrote the database that maintains information about agency awards programs to support calculations using the new book-of-business concept and to simplify the process of adding new metrics and targets.
- Interacting with business users effectively to understand precise and accurate requirements.
- Finalized resource requirements and built offshore ETL/ELT and reporting teams based on signed-off business requirements.
- Worked on the Domains and Communities for Business Glossary and Information Governance Catalog.
- Prepared workflows for scheduling the load of data into Hive using IBIS Connections.
- Enhanced the tool DataModelValidator for the DMCOE team using Core Java and JDBC.
- Worked on a robust automated framework in the data lake for metadata management that integrates various metadata sources and consolidates and updates Podium with the latest, high-quality metadata, using big data technologies such as Hive and Impala.
- Ingested data from a variety of sources such as Teradata, DB2, Oracle, SQL Server and PostgreSQL into the data lake using Podium, and resolved various data transformation and interpretation issues during the process using Sqoop, Git and uDeploy.
- Responsible for data governance processes and policy solutions using data preparation tools and technologies such as Podium and Hadoop.
- Effectively coordinating with offshore teams, Client BI team and business users to deliver the project on time.
- Worked closely with the client BI team to resolve technical issues, get data models, ETL architectures and high-level designs reviewed, and provide any other technical support required.
- Effectively coordinating with business users to complete acceptance testing within timelines and obtain sign offs.
- Working on data warehousing concepts/design with good understanding of the ETL and reporting processes.
- Participated in cross-application integration testing and system testing, and worked with team members in the defect resolution process.
- Ensured that all timelines for loading and validating data were met by comparing against host (mainframe) files.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (a minimal sketch appears at the end of this role).
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- This Project involved the migration of legacy Oracle applications to the SAP R/3 implementation.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Ensured smooth functioning of our development and QA environments and worked with team members in the defect resolution process.
- Collaborate with team members to resolve issues to assure the successful delivery of high quality ETL and BI Code.
Environment: Informatica 10/9.x, Informatica Cloud Services, PowerCenter 9.5.1/8.6, PowerExchange, UNIX, UltraEdit, WinSQL, WinSCP, MS Access, Windows NT, Oracle 9i/10g, DB2, Mainframes, Erwin 4.0, ILM Data Archiving 6.1, SQL, PL/SQL, T-SQL, TOAD, Hadoop, Oozie, Pig, Hive, Talend, CARS, XML, TestLink, HP Service Manager, Lotus Notes.
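Sketch for the Hive trend-comparison bullet above: a minimal, hypothetical Spark SQL job in Scala that aggregates a freshly loaded partition and compares it against a pre-aggregated EDW reference table to flag products trending above their historical average. All table names, column names and the threshold are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object TrendComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TrendComparison")
      .enableHiveSupport()   // query Hive tables through the shared metastore
      .getOrCreate()

    // Illustrative table/column names: today's load is aggregated and compared
    // against a pre-computed historical daily average from the EDW.
    val trending = spark.sql(
      """
        |SELECT t.product_id,
        |       t.today_sales,
        |       h.avg_daily_sales,
        |       t.today_sales / h.avg_daily_sales AS lift
        |FROM (
        |  SELECT product_id, SUM(sales_amt) AS today_sales
        |  FROM   staging.daily_sales
        |  WHERE  load_date = current_date()
        |  GROUP  BY product_id
        |) t
        |JOIN edw.product_daily_avg h
        |  ON t.product_id = h.product_id
        |WHERE t.today_sales > 1.5 * h.avg_daily_sales   -- arbitrary "emerging trend" threshold
      """.stripMargin)

    trending.show(50, truncate = false)
    spark.stop()
  }
}
```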
Confidential, MI
Sr. Informatica Developer/Project Coordinator
Responsibilities:
- Involved in gathering and reviewing business requirements. Involved in designing the specifications, Design Documents, Data modeling and design of data warehouse.
- Migration of mappings and transformations from one environment to another.
- Work with dev team lead and business partners to clarify ETL requirements and business rules.
- Write INFA mappings using PC, PE 8.x/PC 9.1.
- Design, develop, implement, and assist in validating ETL processes.
- Create and execute unit test plans based on system and validation requirements.
- Troubleshoot, optimize, and tune ETL processes.
- Document all ETL related work per Confidential DF methodology.
- Maintain existing code and fix bugs whenever needed.
- Ensure that all timelines of loading/validating data are met.
- Ensure smooth functioning of our development, QA, production, and staging environments.
Environment: Informatica Power Center 9.1/8.6.1, DTD files, Oracle 10g/11g, WinSQL, PostgreSQL, PL/SQL.
Confidential, CT
Lead/Sr. Informatica Developer
Responsibilities:
- Involved in gathering and reviewing business requirements. Involved in designing the specifications, Design Documents, Data modeling and design of data warehouse.
- Responsible for the definition, development and testing of processes/programs necessary to extract data from operational databases and syndicated file systems: IMS, SWIFT, HL7 and EDI X12 flat files.
- Transformed and cleansed data and loaded it into the data warehouse using Informatica PowerCenter.
- Created users, user groups and their access profiles in Repository Manager.
- Created Logical & Physical models and used ERwin for data modeling and Dimensional Data Modeling.
- Created complex mappings in Power Center Designer using Expression, Filter, Sequence Generator, Update Strategy, Joiner and Stored procedure transformations.
- Involved in the development of Informatica mappings and also performed tuning for better performance.
- Set up sessions to schedule the loads at the required frequency using PowerCenter Workflow Manager, pmcmd and external scheduling tools.
- Automated the entire processes using UNIX shell scripts.
- Implemented deployment procedures, started the INFA services through UNIX/PuTTY, performed the migration from the old version to the new version, and scheduled the jobs through Informatica.
Environment: Informatica 9.x, PowerCenter 9.0/8.6, PowerExchange, Informatica MFT, HL7 3.x/2.4, HIPAA, Epic Systems, UNIX, Windows NT, Oracle 9i/10g, DB2.
Confidential, NYC
Sr. Informatica Developer
Responsibilities:
- Developed complex Informatica mappings to load the data from various sources using different transformations like source qualifier, connected and unconnected look up, expression, aggregator, joiner, filter, normalizer, rank and router transformations.
- Worked with Informatica PowerCenter tools such as Source Analyzer, Mapping Designer, mapplets and transformations.
- Responsible for Performance Tuning at the Mapping Level and Session level.
- Worked with SQL Override in the Source Qualifier and Lookup transformation.
- Extensively worked with both Connected and Unconnected Lookup Transformations.
Environment: Informatica Power Centre/9.X 8.6.1/8.1.3 , Informatica Identity Resolution (IIR), Teradata V2R12/V2R6, Oracle 10g, SQL Assistant and Administrator, IMS Data, XML, LINUX, UNIX Shell Scripting, Rational Clear Quest, Agile methodology, Windows, Autosys