Hadoop Consultant Resume
Richardson, Texas
SUMMARY
- 10+ years of experience in Data Analysis, Database Design, Data Modeling, Data Warehousing, and Database Development.
- 3+ years of experience with Hadoop, HDFS, MapReduce, Kafka, Pig, Hive, Impala, and HBase.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience working with the Cloudera and Hortonworks Hadoop distributions.
- Good experience using Spark SQL and Scala (a brief Spark SQL sketch in Scala follows this summary).
- Developed ETL processes to load data from multiple sources into HDFS using Sqoop, performed structural modifications using Pig and Hive, and analyzed the loaded data.
- Hands-on experience with Oracle, JDK, J2EE, XML, and JDBC.
- Hands-on experience with data warehousing and ETL processes.
- Used Oracle PL/SQL, Netezza, Teradata, and ETL tools including Informatica 10.0/9.6 and Informatica Big Data Edition.
- Experienced in translating data access, transformation, and movement requirements into functional requirements and mapping designs.
- Proficient in back-end database programming using PL/SQL and SQL: packages, triggers, stored procedures, materialized views, partitioning, and performance tuning.
- Experienced in performance tuning and optimization of applications, identifying bottlenecks in both Informatica and the database.
- Strong in Unix/Linux shell scripting.
- Skilled programmer, expert designer, and accomplished team leader.
- Experienced in agile approaches including Extreme Programming, Test-Driven Development, and Scrum.
- Excellent communication skills, applied to working in large teams, mentoring peers, and gathering user requirements.
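To make the Spark SQL and Sqoop-to-HDFS experience above concrete, here is a minimal sketch in Scala rather than code from any specific engagement: it reads a Sqoop-landed delimited extract from HDFS, registers it as a temporary view, and profiles it with a Spark SQL aggregation. The path, schema, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object LandingZoneProfile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("landing-zone-profile").getOrCreate()

    // Delimited files landed in HDFS by a Sqoop import (path and schema are hypothetical)
    val accounts = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/accounts")

    // Register the extract as a view and profile it with Spark SQL before the Hive load
    accounts.createOrReplaceTempView("accounts_landing")
    spark.sql(
      """SELECT account_type, COUNT(*) AS row_count
        |FROM accounts_landing
        |GROUP BY account_type
        |ORDER BY row_count DESC""".stripMargin
    ).show(truncate = false)

    spark.stop()
  }
}
```

A job like this would typically be packaged and submitted to the cluster with spark-submit.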
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, Hive, Impala, Kafka, Sqoop, Pig, HBase, Oozie, YARN, ZooKeeper, Scala, Spark SQL, Phoenix, NiFi
Programming/Scripting: Java, JavaScript, C, SQL, PL/SQL, Pro*C, Shell Scripting, Scala, Python
BI/ETL: Informatica Big Data Edition, Informatica PowerCenter 9.6, PowerCenter Real-Time Edition, B2B Data Transformation Studio
Databases: Oracle 11g, DB2, SQL Server 2008/2005, Netezza, Teradata
Tools: IntelliJ, Eclipse, SQL Developer, Aginity Workbench, SQL Server Management Studio, TOAD, DT Studio, DbVisualizer
Methodologies: Agile, Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Richardson, Texas
Hadoop Consultant
Responsibilities:
- Worked closely with the business and analytics teams to gather system requirements.
- Provided design recommendations and leadership to sponsors/stakeholders, improving review processes and resolving technical problems.
- Used Kafka for real-time streaming, with Apache NiFi processors configured for the ingestion flow.
- Used HBase to persist the streaming data received from Kafka.
- Used Apache Phoenix to execute queries against HBase (see the sketch following this project).
- Created Hive tables for batch data loading and analyzed the data using Hive queries.
- Created complex Hive queries to help business users analyze and spot emerging trends by comparing fresh data with historical metrics.
- Managed and reviewed Hadoop log files.
- Performed analytics on claims and rejected-claims processing in the data lake.
- Tested raw data and executed performance scripts.
- Actively involved in code review and bug fixing to improve performance.
Environment: Unix Shell Scripts, HDFS, Hive, Apache Kafka, Apache NiFi, Apache Phoenix, Teradata
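A simplified sketch of the streaming persistence path above, assuming a plain Kafka consumer and the Phoenix JDBC driver rather than the exact pipeline used on the project; the broker address, topic, ZooKeeper quorum, and CLAIM_EVENTS table are hypothetical stand-ins.

```scala
import java.sql.DriverManager
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

object ClaimEventSink {
  def main(args: Array[String]): Unit = {
    // Kafka consumer configuration; broker, topic, and group id are hypothetical
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("group.id", "claim-event-sink")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("claim-events"))

    // Phoenix JDBC connection over the HBase cluster's ZooKeeper quorum (hypothetical hosts)
    val conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181")
    val upsert = conn.prepareStatement(
      "UPSERT INTO CLAIM_EVENTS (CLAIM_ID, PAYLOAD) VALUES (?, ?)")

    // Poll Kafka and persist each record into HBase through Phoenix
    while (true) {
      val it = consumer.poll(Duration.ofSeconds(1)).iterator()
      while (it.hasNext) {
        val rec = it.next()
        upsert.setString(1, rec.key())
        upsert.setString(2, rec.value())
        upsert.executeUpdate()
      }
      conn.commit() // Phoenix buffers UPSERTs client-side until commit
    }
  }
}
```

Once the rows are in HBase, the same Phoenix connection can serve ad hoc SQL queries over the table.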
Confidential, Irving,Texas
ETL / Hadoop Consultant
Responsibilities:
- Worked closely with the business and analytics teams to gather system requirements.
- Provided design recommendations and leadership to sponsors/stakeholders, improving review processes and resolving technical problems.
- Used Sqoop to bulk-load data from DB2 into HDFS for the initial load.
- Used Spark RDDs, DataFrames, and Datasets to perform validations and aggregations on the provider and claims files (a Scala sketch follows this project's environment line).
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Created Hive tables for data loading and analyzed the data using Hive queries.
- Created complex Hive queries to help business users analyze and spot emerging trends by comparing fresh data with historical metrics.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce for data aggregation and queries, writing results back into the OLTP system through Sqoop.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Actively involved in code review and bug fixing to improve performance.
Environment: DB2, Spark, Scala, Unix Shell Scripts, HDFS, Hive, Sqoop, Eclipse
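A hedged sketch, in Scala, of the DataFrame validation and aggregation flow described above; the staging.claims table, its columns, and the DB2 connection details are hypothetical placeholders rather than project artifacts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ProviderClaimSummary {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so the Sqoop-landed staging tables are visible
    val spark = SparkSession.builder()
      .appName("provider-claim-summary")
      .enableHiveSupport()
      .getOrCreate()

    // Claims landed in Hive by the initial Sqoop load (table and column names are hypothetical)
    val claims = spark.table("staging.claims")

    // Validation: keep only rows with a provider id and a positive claim amount
    val valid = claims.filter(col("provider_id").isNotNull && col("claim_amount") > 0)

    // Aggregation: total paid and claim counts per provider
    val byProvider = valid.groupBy("provider_id")
      .agg(sum("claim_amount").as("total_paid"), count(lit(1)).as("claim_count"))

    // Write the summary back to the relational side over JDBC (connection details are placeholders)
    byProvider.write
      .format("jdbc")
      .option("url", "jdbc:db2://db2host:50000/CLAIMSDB")
      .option("dbtable", "DW.PROVIDER_SUMMARY")
      .option("user", sys.env.getOrElse("DB_USER", "etl"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode("overwrite")
      .save()

    spark.stop()
  }
}
```

In practice the write-back could equally be a Sqoop export of the summary files; the JDBC writer is shown here only to keep the sketch self-contained.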
Confidential, Round Rock, Texas
ETL / Hadoop Consultant
Responsibilities:
- Worked with cross-functional business and IT teams of business analysts, data analysts, data modelers, solution architects, DBAs, developers, and project managers.
- Created data mappings from the Enterprise Data Warehouse system and Customer MDM to the dashboard-related data.
- Used Kafka to stream orders-related data into HDFS.
- Used Sqoop scripts to move the required data from Teradata into HDFS.
- Used Pig scripts to transform the data into the 360 JSON format in which it was ingested into AllSight (a Spark analogue of this step is sketched after this project).
- Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
- The composite view was indexed in Elasticsearch.
- Developed complex Hive queries for the analysts.
- Provided cluster coordination services through ZooKeeper.
- Performed end-to-end testing of the entire system.
- Conducted demos for the end users.
Environment: Oracle, Teradata, Unix Shell Scripts, Java, HDFS, Pig, Hive, Sqoop, HBase, Kafka, AllSight
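The Pig scripts themselves are not reproduced here; the sketch below is a Spark/Scala analogue of the same transform-to-JSON step, joining raw order data with EDW customer reference data and emitting one JSON document per record for downstream ingestion. All paths, table names, and columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object OrdersToJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-to-json")
      .enableHiveSupport()
      .getOrCreate()

    // Raw orders streamed into HDFS and customer reference data from the EDW
    // (paths, tables, and columns are hypothetical)
    val orders = spark.read.parquet("hdfs:///data/raw/orders")
    val customers = spark.table("edw.customer_reference")

    // Enrich each order with its customer attributes
    val enriched = orders.join(customers, Seq("customer_id"), "left")
      .select("customer_id", "order_id", "order_date", "order_total", "customer_segment")

    // Emit one JSON document per record for downstream ingestion
    enriched.toJSON.write.mode("overwrite").text("hdfs:///data/curated/orders_json")

    spark.stop()
  }
}
```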
Confidential
ETL/Hadoop Consultant
Responsibilities:
- Worked with cross-functional business and IT teams of business analysts, data analysts, data modelers, solution architects, DBAs, developers, and project managers.
- Involved in the Big Data implementation for Mercury Insurance.
- Migrated from Informatica 9.6 to Informatica Big Data Edition.
- Migrated the data from Oracle, Netezza into HDFS using Sqoop and Pig scripts.
- Used UDFs for specific transformation logic.
- Loaded and transformed large sets of structured and semi-structured data from Oracle through Sqoop, placing it in HDFS for further processing.
- Monitored Hadoop scripts that take their input from HDFS and load the data into Hive.
- Designed and developed Hive queries using partitions and buckets for data analysis (see the sketch after this project's environment line).
- Used Informatica BDE for Data Profiling.
- Implemented test scripts to support test-driven development and continuous integration.
Environment: Netezza 6, UNIX Shell Programming, HDFS, Pig, Python, Hive, Impala, Sqoop, HBase, Informatica PowerCenter 9.6/ Big Data Edition.
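A sketch of the partition/bucket-oriented Hive analysis mentioned above, driven from Scala through Spark SQL; the table layout appears in the comment, and every table, column, and partition name is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PolicyClaimsAnalysis {
  // The underlying Hive table (created from the Hive CLI; names are hypothetical) was declared as:
  //   CREATE TABLE analytics.policy_claims (policy_id STRING, claim_id STRING, claim_amount DOUBLE)
  //   PARTITIONED BY (load_date STRING)
  //   CLUSTERED BY (policy_id) INTO 32 BUCKETS
  //   STORED AS ORC;
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("policy-claims-analysis")
      .enableHiveSupport()
      .getOrCreate()

    // A date-bounded query only scans the matching load_date partitions
    spark.sql(
      """SELECT policy_id, SUM(claim_amount) AS total_paid
        |FROM analytics.policy_claims
        |WHERE load_date >= '2016-01-01'
        |GROUP BY policy_id
        |ORDER BY total_paid DESC""".stripMargin
    ).show(50, truncate = false)

    spark.stop()
  }
}
```

Partitioning on load_date lets date-bounded queries skip unrelated partitions, while bucketing on policy_id keeps rows for a given policy co-located for joins.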
Confidential, Tampa, Florida
ETL Consultant
Responsibilities:
- Worked with cross-functional business and IT teams of business analysts, data analysts, data modelers, solution architects, DBAs, developers, and project managers.
- Converted functional requirements into technical specifications, mapping documents, and Interface Control Documents.
- Assisted the team in the development of design and code standards for effective ETL procedure development and implementation.
- Created complex mappings using Connected/Unconnected Lookups, Normalizer, and Union transformations.
- Enhanced Informatica session performance for sources such as large data files by using partitioning, increased block and data-cache sizes, target-based commit intervals, and pushdown optimization.
- Designed workflows with decision, assignment task, event wait, and event raise tasks.
- Tested the ETL components.
- Mentored the offshore team.
- Implemented the data mart in production and provided support.
Environment: Oracle 11g, PL/SQL, UNIX Shell Programming, Java, Informatica PowerCenter 9.1, MS-Visio, Control-M
Confidential, Malta, New York
ETL Consultant
Responsibilities:
- Analyzed legacy application data and determined conversion rules for migration to the data warehouse.
- Performed data analysis and data mapping, including validation of data quality and consistency, to arrive at a gap analysis.
- Developed detailed programming specifications for ETL, data migration, and data scrubbing processes.
- Ensured the data movement needed to achieve the business objectives for application development and analytical reporting was available.
- Used Informatica Designer to create mappings and transformations, with flat files and databases as sources, to move data into the target data warehouse.
- Performance Tuning.
- Provided post-implementation knowledge transfer to the client and peers.
Environment: Teradata, Informatica PowerCenter 8.6.1, Java, UNIX Shell Programming, ERStudio, Control-M
Confidential, New York
ETL Onsite Lead
Responsibilities:
- Gathered and documented business requirements into technical specifications.
- Analyzed the source system architecture to gain a deeper understanding of business rules and data integration checks.
- Designed the fact and dimension tables for the star schema using ER Studio.
- Developed complex database structures for data validations and coded complex PL/SQL procedures.
- Designed and constructed complex materialized views, partitions, stored procedures, and packages to implement the ETL process.
- Developed mappings for ETL process.
- Used shell scripting to automate batch jobs and reduce manual intervention.
- Performed tuning and optimization, considerably reducing the total run time of ETL processes and reports.
- Tested the entire ETL process.
- Supported and maintained the data warehouse.
- Coordinated with application teams, business analysts, and system administrators for day-to-day maintenance and implementations.
Environment: Oracle 11g, PL/SQL, Java, UNIX Shell Programming, Informatica 8.6, ERStudio
Confidential, San Francisco, CA
ETL Developer
Responsibilities:
- Gathered and documented business requirements into technical specifications.
- Analyzed the central data warehouse and determined conversion rules for migration.
- Developed detailed programming specifications for ETL, data migration, and data scrubbing processes.
- Coded and tested the ETL using packages and shell scripts.
- Performed performance optimization of Oracle PL/SQL.
- Coordinated with application teams, business analysts, and system administrators for day-to-day maintenance and implementations.
Environment: Oracle 10g, PL/SQL, Linux, Java, ERStudio, MS SQL Server 2000, UNIX Shell Programming
Confidential
Software Engineer
Responsibilities:
- Prepared the detailed design document from the functional specification provided by the client.
- Designed, documented, and developed complex solutions from requirements.
- Automated the process of loading of data warehouse feeds.
- Worked as Team Lead for a team of 7 members.
- Worked with a large database containing millions of rows for high-net-worth customers based in the US.
- Automated processes such as the daily loading of data from other systems using SQL*Loader.
- Effectively reduced overall processing time by implementing packages, triggers, IOTs, partitioned tables, indexes, and materialized views.
- Extensively used TKPROF and SQL Trace to monitor the system and fine-tune it for better performance, removing bottlenecks during processing.
Environment: Oracle 9i, PL/SQL, Forms 9i, Java, JSP, Reports, SQL* Loader, TOAD, ERStudio
Confidential
Software Engineer
Responsibilities:
- Led a team of 10 members in the capacity of Team Lead.
- Prepared the detailed design document from the functional specification provided by the client.
- Designed the logical and physical database in keeping with the business rules.
- Designed and constructed complex views, triggers, stored procedures, and packages.
- Coordinated the review meetings.
- Briefed management on the project status every week.
- Performed performance tuning and optimization of large databases.
Environment: Oracle 9i, Developer 2000, Forms 5.0, PL/SQL, Java, JSP, UNIX shell scripts, TOAD, Pro*C