
Hadoop Developer Resume



  • Over 8 years of professional experience in the IT industry, developing, implementing, configuring, and testing Hadoop ecosystem components.
  • Hadoop Developer with 5+ years of experience designing and implementing complete end-to-end Hadoop infrastructure using MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Oozie, and Zookeeper.
  • Hands-on expertise in installing, configuring, and testing Hadoop ecosystem components.
  • Good exposure to the Apache Spark ecosystem, including Shark and Spark Streaming, using Scala and Python.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Experience writing MapReduce programs on Hadoop to work with Big Data.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java and Python.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experience collecting and aggregating large amounts of log data using Apache Flume and storing it in HDFS for further analysis.
  • Experience with job/workflow scheduling and monitoring tools such as Oozie.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experience on Hadoop clusters using major Hadoop Distributions - Cloudera (CDH4, CDH5), and Hortonworks (HDP).
  • Experience in different layers of Hadoop Framework - Storage (HDFS), Analysis (Pig and Hive), Engineering (Jobs and Workflows).
  • Performed in-memory data sharing using Spark RDDs (Resilient Distributed Datasets).
  • Good understanding of NoSQL databases and hands on experience with HBase.
  • Experienced in loading datasets into Hive for ETL (Extract, Transform, Load) operations.
  • Working knowledge of SQL, PL/SQL, stored procedures, functions, packages, DB triggers, indexes, and SQL*Loader.
  • Experience in Amazon AWS cloud services (EC2, EBS, S3).
  • Strong Data Warehousing, Data Marts, Data Analysis, Data Organization, Metadata and Data Modeling experience on RDBMS databases.
  • Extensive knowledge in Designing, Developing and implementation of the Data marts, Data Structures using Stored Procedures, Functions, Data warehouse tables, views, Materialized Views, Indexes at Database level using PL/SQL, Oracle.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
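The custom Python MapReduce work mentioned above is typically wired in through Hadoop Streaming, where the mapper and reducer are plain scripts exchanging tab-separated key/value lines over stdin/stdout. A minimal word-count sketch, run locally here with a sort standing in for Hadoop's shuffle (function names are illustrative):

```python
from itertools import groupby

def mapper(lines):
    """Emit tab-separated (word, 1) pairs, as Hadoop Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    """Sum counts per word; input must be sorted by key,
    which Hadoop's shuffle phase guarantees."""
    keyed = (p.split("\t") for p in pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Local simulation of map -> sort (shuffle) -> reduce.
    mapped = sorted(mapper(["big data big hadoop"]))
    print(list(reducer(mapped)))
```

On a real cluster the same two functions would sit in separate scripts passed to `hadoop jar hadoop-streaming.jar -mapper ... -reducer ...`, with Hadoop handling the split, shuffle, and sort.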


Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Zookeeper, Spark, Shark

Hadoop platforms: Cloudera, Hortonworks

Languages: Java, Python, C#, Scala

Web Technologies: HTML, JavaScript, jQuery, AJAX, XML, JSON

Scripting Language: UNIX Shell Script

Data Modeling: Erwin, SAS

OLAP Tools: MicroStrategy OLAP Suite, Cognos, Business Objects


NoSQL Technologies: HBase, MongoDB, Cassandra

Tools & Utilities: SVN, GitHub, Maven

Operating Systems: Windows 7/8, Vista, Windows XP, Linux (Ubuntu, Red Hat)


Confidential, RI

Hadoop Developer


  • Responsible for managing data from multiple sources.
  • Loaded data from different sources (SQL Server, DB2, and Oracle) into HDFS using Sqoop and loaded it into Hive tables.
  • Developed various Big Data workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created Oozie workflows and coordinator jobs to recurrently trigger Hadoop jobs (Java MapReduce, Pig, Hive, Sqoop) as well as system-specific jobs (Java programs, shell scripts) based on time (frequency) and data availability.
  • Installed, upgraded, and managed the Hadoop cluster on Hortonworks.
  • Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform it, and insert it into HBase.
  • Used Spark as a fast, general-purpose processing engine compatible with Hadoop data.
  • Used Spark to design and perform both batch processing (similar to MapReduce) and newer workloads such as streaming, interactive queries, and machine learning.
  • Set up a Hadoop cluster on AWS, including configuring the different Hadoop components.
  • Analyzed large data sets by running Hive queries, and Pig scripts.
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in their Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Applied MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
  • Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Used Flume to ingest application server logs into HDFS.
  • Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports.
  • Worked closely with the Data Science and Platform Consulting teams to validate the architectural approach and check design constraints in the setup of enterprise-level data ingest stores.

Environment: Hadoop, MapReduce, Hortonworks, HDFS, Linux, Sqoop, Spark, Pig, Hive, Oozie, Flume, Pig Latin, Java, AWS, Python, HBase, Eclipse and Windows.
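Reusable Hive logic in Python, as in the UDF bullets above, is commonly plugged in through Hive's TRANSFORM clause, which streams rows to a script as tab-separated lines. A hypothetical sketch that normalizes a date column; the MM/DD/YYYY rule and column layout are illustrative, not taken from the original project:

```python
def clean_row(line):
    """Normalize one tab-separated Hive row: trim each field and
    rewrite MM/DD/YYYY dates as YYYY-MM-DD (illustrative rule)."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    out = []
    for field in fields:
        parts = field.split("/")
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            mm, dd, yyyy = parts
            field = f"{yyyy}-{int(mm):02d}-{int(dd):02d}"
        out.append(field)
    return "\t".join(out)

if __name__ == "__main__":
    # In production Hive pipes rows via stdin, e.g.:
    #   SELECT TRANSFORM (acct, txn_date, amount)
    #   USING 'python clean_rows.py' AS (acct, txn_date, amount) FROM txns;
    print(clean_row("ACCT1 \t3/7/2015\t 100\n"))
```

Because the script only sees lines on stdin and emits lines on stdout, the same file works unchanged under Hive TRANSFORM, Hadoop Streaming, or a local shell pipe.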

Confidential, Bellevue, WA

Hadoop Developer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Developed simple to complex Map/Reduce jobs using Hive and Pig.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from Oracle into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used Pig UDFs to implement business logic in Hadoop.
  • Implemented business logic by writing custom UDFs in Java and using various built-in UDFs.
  • Responsible for migrating from Hadoop MapReduce to the Spark framework, using in-memory distributed computing for real-time fraud detection.
  • Used Spark to cache data in memory.
  • Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports.
  • Implemented batch processing of data sources using Apache Spark.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Provided cluster coordination services through ZooKeeper.
  • As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Apache Sqoop, Spark, Oozie, HBase, AWS, PL/SQL, MySQL and Windows.
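The Map/Reduce output compression mentioned above trades CPU time for I/O and storage. Hadoop codecs such as Snappy or LZO are not in Python's standard library, but gzip and bzip2 are, and they illustrate the size effect locally on repetitive, log-like data:

```python
import bz2
import gzip

# Repetitive text compresses well -- a stand-in for columnar log output.
raw = b"2015-01-01\tclick\tuser42\n" * 1000

gz = gzip.compress(raw)
bz = bz2.compress(raw)

print(len(raw), len(gz), len(bz))
# Both codecs shrink this input dramatically. In Hadoop the further
# consideration is splittability: bzip2 output can be split across map
# tasks, plain gzip cannot, which matters for large single files.
```

The decompressed bytes must round-trip exactly, which is what makes these codecs safe for intermediate shuffle data as well as final output.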

Confidential, Boca Raton, FL

Hadoop Developer/Administrator


  • Developed big data analytic models in Hive for detecting fraudulent customer transaction patterns from customer transaction data, including transaction sequence analysis with and without gaps and network analysis between common customers for the top fraud patterns.
  • Developed customer transaction event path tree extraction model using Hive from customer transaction data.
  • Enhanced and optimized the customer path tree GUI viewer to incrementally load the tree data from the HBase NoSQL database; used the Prefuse open-source Java framework for the GUI.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Provided cluster coordination services through ZooKeeper.
  • Designed and implemented Map/Reduce jobs to support distributed data processing.
  • Processed large data sets utilizing the Hadoop cluster.
  • Handled Hive queries using Spark SQL, which integrates with the Spark environment.
  • Performed Spark queries for data processing.
  • Wrote Spark shell programs in Scala.
  • Designed NoSQL schemas in HBase.
  • Developed MapReduce ETL in Java and Pig.
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
  • Performed extensive data validation using Hive.
  • Imported and exported data between HDFS and relational database systems using Sqoop.

Environment: Hadoop MapReduce, Pig Latin, Zookeeper, Oozie, Sqoop, Spark, Java, Hive, HBase, UNIX Shell Scripting.
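The transaction sequence analysis "with gaps and no gaps" described above reduces to two subsequence tests over a customer's ordered transaction history: a contiguous match versus an in-order match that allows intervening events. A stdlib sketch with illustrative event names (the real models ran in Hive over full transaction data):

```python
def matches_no_gaps(events, pattern):
    """True if `pattern` occurs as a contiguous run in `events`."""
    n, m = len(events), len(pattern)
    return any(events[i:i + m] == pattern for i in range(n - m + 1))

def matches_with_gaps(events, pattern):
    """True if `pattern` occurs in order in `events`, gaps allowed."""
    it = iter(events)
    # `p in it` advances the iterator, so order is enforced.
    return all(p in it for p in pattern)

history = ["login", "profile_edit", "add_payee", "small_txn", "large_txn"]
print(matches_no_gaps(history, ["add_payee", "large_txn"]))    # contiguous? no
print(matches_with_gaps(history, ["add_payee", "large_txn"]))  # in order with a gap? yes
```

In HiveQL the same checks were typically expressed with window functions or custom UDFs over transactions sorted by timestamp per customer.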

Confidential, Pittsburgh, PA.

Data Analyst


  • Performed daily validation of business data reports by querying databases and reran missing business events before the close of the business day.
  • Worked on claims data and extracted data from various sources such as flat files, Oracle and Mainframes.
  • Gathered Business requirements by interacting with the business users, defined subject areas for analytical data requirements.
  • Optimized complex queries for data retrieval from large databases.
  • Performed root cause analysis of data discrepancies between different business systems by examining business rules and the data model, and provided the analysis to the development/bug-fix team.
  • Led the data correction and validation process, using data utilities to fix mismatches between shared business operating systems.
  • Conducted downstream analysis of the tables involved in data discrepancies and devised solutions to resolve them.
  • Performed extensive data mining on the attributes involved in business tables, providing consolidated analysis reports and resolutions on an ongoing basis.
  • Created complex SQL scripts to build and store logical data in a Snowflake database for data analysis and quality checks.
  • Reviewed stored procedures for reports and wrote test queries against the source system (SQL Server) to match the results with the actual report against the data mart.
  • Executed a number of queries using the models on Teradata and created logical and physical models using the Erwin tool to provide data analysis and verification.
  • Converted SAS scripts to be Snowflake-compatible and migrated data from Teradata to Snowflake for multiple business lines.
  • Wrote packages to fetch complex data from different tables in remote databases using joins and subqueries.
  • Validated data to check for proper conversion; performed data cleansing to identify and clean bad data, and data profiling for accuracy, completeness, and consistency.
  • Reviewed all the systems design by assuring adherence to defined requirements.
  • Met with user groups to analyze requirements and proposed changes in design and specifications.
  • Performed flat-file conversions in the data warehouse scenario.
  • Created Static and Dynamic Parameters at the report level.
  • Involved in Data Reconciliation Process while testing loaded data with user reports.
  • Documented all custom and system modifications.
  • Worked with offshore and other environment teams to support their activities.
  • Responsible for deployment on test environments and supporting business users during User Acceptance testing (UAT).

Environment: DataStage 8.1, Oracle 10g, DB2, Sybase, TOAD, Cognos 8.0, SQL Server 2008, TSYS Mainframe, SAS PROC SQL, SQL, PL/SQL, ALM/Quality Center 11, QTP 10, UNIX, Shell Scripting, XML, XSLT.
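The reconciliation and validation steps above come down to comparing source and target extracts key by key: rows missing on either side, plus rows whose fields disagree. A simplified stdlib sketch; the `claim_id` key and columns are hypothetical stand-ins for the actual claims data:

```python
def reconcile(source, target, key="claim_id"):
    """Compare two lists of row dicts keyed by `key`; report rows
    missing on either side and rows whose fields differ."""
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(
            k for k in src.keys() & tgt.keys() if src[k] != tgt[k]
        ),
    }

src_rows = [{"claim_id": 1, "amount": 100}, {"claim_id": 2, "amount": 250}]
tgt_rows = [{"claim_id": 1, "amount": 100}, {"claim_id": 3, "amount": 75}]
print(reconcile(src_rows, tgt_rows))
```

In practice the two extracts would come from SQL queries against the source system and the data mart; the report drives which business events get rerun before close of day.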

Confidential, San Antonio, TX

Data Analyst


  • Analyzed problems and resolved issues with current and planned systems as they relate to the integration and management of order data.
  • Checked data flow against the source-to-target mapping of the data.
  • Created a data matrix for mapping the data to the business requirements.
  • Performed data profiling to cleanse the data in the database and raised the data issues found.
  • Created and reviewed mapping documents based on data requirements.
  • Engaged in logical and physical design, transforming logical models into physical models through forward engineering with the Erwin tool.
  • Performed small enhancements (data cleansing/data quality).
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Involved in data mapping and data clean up.
  • Ensured a smooth transition from the legacy system to the newer system through the change management process.
  • Created datasets from flat files using SAS PROC SQL.
  • Extracted, transformed and loaded the data into databases using Base SAS.
  • Involved in Test case/data preparation, execution and verification of the test results.
  • Reviewed PL/SQL migration scripts.
  • Coded PL/SQL packages to perform Application Security and batch job scheduling.
  • Created user guidance documentations.
  • Created a reconciliation report for validating migrated data.

Environment: UNIX, Shell Scripting, XML Files, XSD, XML, SAS PROC SQL, Oracle, Teradata, Sybase, Toad and Windows.
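A source-to-target mapping document like those described above can be applied mechanically during a legacy migration: each entry names a target column, the legacy column it comes from, and a cleansing transform. A hypothetical sketch; every column name and rule here is illustrative, not from an actual mapping document:

```python
# Target column -> (legacy source column, cleansing transform).
MAPPING = {
    "customer_id": ("CUST_NO", str.strip),
    "state": ("ST_CD", str.upper),
    "balance": ("BAL_AMT", float),
}

def apply_mapping(source_row):
    """Build one target row from a legacy source row per the mapping."""
    return {tgt: fn(source_row[src]) for tgt, (src, fn) in MAPPING.items()}

legacy = {"CUST_NO": " 00123 ", "ST_CD": "tx", "BAL_AMT": "1500.75"}
print(apply_mapping(legacy))
```

Keeping the mapping as data rather than code makes it easy to diff against the reviewed mapping document and to generate the reconciliation report from the same table.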
