Hadoop Developer Resume
Plainsboro, NJ
SUMMARY
- 3.5 years of experience in Big Data as a Hadoop Developer
- Good experience with the parallel processing framework Spark, using Python/Scala
- Good experience working with tools such as Hive, Pig, Python, Oozie, Sqoop and Hue
- Worked with Spark SQL for SQL query execution in the Hadoop environment
- Worked with DataFrames in Spark to process complex queries and analyze data (a minimal sketch follows this list)
- Extensively worked on the Hive data warehouse and implemented ETL
- Knowledge of the R programming language for machine learning
- Experience using Python to implement data warehouse solutions and data ingestion
- Developed an ETL solution on Hive for Teradata offload
- Worked on Excel-Hive business intelligence integration
- Developed UDFs in core Java to support the Pig and Hive data warehouse
- Excellent understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and NodeManager, and of the Spark programming paradigm
- Hands-on experience with major components of the Hadoop ecosystem such as Hive, Pig, MapReduce, Sqoop, HBase and HBase-Hive integration, and good knowledge of the Mapper/Reducer/HDFS framework and YARN
- Exposure to the Cloudera development environment and its management using Hue
- Experience in data management and implementation of Big Data applications using Hadoop frameworks
- Knowledge of manipulating and analyzing large datasets and stored data to find patterns and insights based on requirements
- Experience in importing and exporting data in different formats between RDBMS databases and HDFS/HBase
- Experience in extending Hive and Pig core functionality by writing custom UDFs using Java
- Experience in importing and exporting data between RDBMS and HDFS using Sqoop and Python
- Hands-on experience setting up workflows with the Apache Oozie workflow engine for managing and scheduling Hadoop jobs
- Excellent technical, logical, code-debugging and problem-solving capabilities, with careful attention to the evolving environment and the probable activities of competitors and customers
- Good team player with strong analytical and communication skills
- 2+ years of experience with the business intelligence reporting tool Tableau
- Hands-on experience creating various views and dashboards in Tableau
- Good knowledge of working with joins and custom SQL
- Experience in creating aggregates, hierarchies, formatting, sorting and grouping
- Experience in working with filters, quick filters, context filters and parameters
- Good expertise in working with multiple measures, blended axes and dual axes
- Good knowledge of working with string, date and table calculations and calculated measures
- Good knowledge of creating actions such as filter, highlight and URL actions
- Expertise in working with maps and good knowledge of custom geocoding
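Purely as an illustration of the Spark SQL and DataFrame work noted above, a minimal PySpark sketch; the session setup, table and column names are hypothetical and not taken from any specific project listed here:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical Hive-backed Spark session and sales table
spark = SparkSession.builder.appName("sales-analysis").enableHiveSupport().getOrCreate()
sales = spark.table("warehouse.sales")

# DataFrame API: revenue and units sold per product per day
daily = (sales
         .groupBy("product_id", "sale_date")
         .agg(F.sum("amount").alias("revenue"),
              F.count("*").alias("units_sold")))

# Equivalent Spark SQL query through a temporary view
sales.createOrReplaceTempView("sales_v")
top_products = spark.sql("""
    SELECT product_id, SUM(amount) AS revenue
    FROM sales_v
    GROUP BY product_id
    ORDER BY revenue DESC
    LIMIT 10
""")
top_products.show()
```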
TECHNICAL SKILLS
Big Data/BI Tools: Apache Spark, Tableau 8.x, Hadoop, Apache Pig, Hive, HBase, Sqoop, Python, Oozie
Databases: Oracle 10g, MS SQL Server 2005, Teradata
Languages: SQL, Core Java, Python, UNIX Shell
Versioning: Tortoise SVN, CVS
PROFESSIONAL EXPERIENCE
Confidential, Plainsboro, NJ
Hadoop Developer
Responsibilities
- Wrote Python scripts to automate the data ingestion process
- Implemented auto-ingestion for different input data formats and sources
- Wrote Python scripts to clean data from input data files
- Wrote Python scripts to load ingested data into the respective HDFS locations (see the sketch after this list)
- Implemented Oozie jobs for the Python auto-ingestion process
- Normalized data and applied business logic using Hive ETL
- Developed Pig and Hive scripts and UDFs to study user sessions and member behavior
- Wrote Pig scripts to perform transformation procedures on the data in HDFS
- Processed HDFS data and created external tables using Hive to analyze products sold per day, locations and vendors
- Extended the Pig and Hive frameworks with custom UDFs to meet requirements
- Developed multiple Pig scripts for clustering and grouping user sessions
- Developed Oozie workflows to process the Hadoop jobs
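A simplified sketch of the Python auto-ingestion step described above, assuming input files are cleaned locally and pushed to per-source HDFS locations via the `hdfs dfs` CLI; the source names, paths and cleaning rule are hypothetical:

```python
import csv
import subprocess
from pathlib import Path

# Hypothetical mapping of input sources to HDFS landing directories
HDFS_TARGETS = {
    "orders": "/data/raw/orders",
    "members": "/data/raw/members",
}

def clean_file(src: Path, dst: Path) -> None:
    """Drop empty or malformed rows before ingestion (illustrative rule only)."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader, writer = csv.reader(fin), csv.writer(fout)
        for row in reader:
            if row and all(field.strip() for field in row):
                writer.writerow(row)

def ingest(source: str, local_dir: str) -> None:
    """Clean each input file and put it into the source's HDFS location."""
    target = HDFS_TARGETS[source]
    for src in Path(local_dir).glob("*.csv"):
        cleaned = src.with_suffix(".cleaned.csv")
        clean_file(src, cleaned)
        subprocess.run(["hdfs", "dfs", "-put", "-f", str(cleaned), target], check=True)

if __name__ == "__main__":
    ingest("orders", "/landing/orders")
```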
Environment: Hadoop, Python 2.7, HDFS, Hive 0.12.1, Java, Cloudera Hadoop distribution, Pig 0.11.1, Linux, Sqoop 1.4.4, Oozie 3.3.0, Tableau, Notepad++
Confidential, San Jose, CA
Hadoop Developer
Responsibilities
- Implemented Hive ETL solution for Teradata Offload
- Wrote Spark scripts to process and analyze large data sets
- Ingested data from various tables by performing Sqoop imports
- Applied Confidential supply chain business logic to the source data
- Normalized data and applied business logic using Hive ETL
- Tuned Hive performance to process 2-billion-record data sets
- Used the Teradata TPump utility to export data into Teradata tables
- Involved in creating Hive tables and in loading and analyzing data using Hive queries
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs
- Implemented Spark jobs in Python, using the DataFrame and Spark SQL APIs for faster data processing (see the sketch after this list)
- Created RDDs, DataFrames and Datasets
- Used ORC and Parquet file formats for storing the data
- Used Java code to execute SQL queries and to retrieve SQL queries from text files
- Used the Log4j framework for logging debug, info and error data
- Created Hive external and managed tables
- Designed and maintained Tez workflows to manage the flow of jobs in the cluster
- Loaded Spark RDDs and performed in-memory computation to generate the output response
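A minimal PySpark sketch of the DataFrame/Spark SQL processing and Parquet storage pattern described above; the table names, columns and business rule are hypothetical placeholders, not the actual Confidential logic:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("supply-chain-offload")
         .enableHiveSupport()
         .getOrCreate())

# Read the raw source table ingested via Sqoop (hypothetical name)
orders = spark.table("staging.supply_chain_orders")

# Apply an illustrative business rule and normalize columns
normalized = (orders
              .withColumn("order_date", F.to_date("order_ts"))
              .withColumn("net_amount", F.col("gross_amount") - F.col("discount"))
              .filter(F.col("status") == "SHIPPED"))

# Persist as Parquet and expose as a Hive table for downstream queries
(normalized.write
 .mode("overwrite")
 .format("parquet")
 .saveAsTable("warehouse.supply_chain_shipped"))

# The same aggregation expressed through Spark SQL
normalized.createOrReplaceTempView("shipped_v")
spark.sql("SELECT order_date, SUM(net_amount) AS net FROM shipped_v GROUP BY order_date").show()
```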
Environment: Hadoop, HDFS, PySpark, Teradata, TPump, Hive 0.11.1, Java, MapR, Pig 0.11.1, Linux, Sqoop 1.4.4, Oozie 3.3.0, Notepad++
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities
- Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology
- Imported data from SQL Server and landed it on HDFS using Sqoop import (see the sketch after this list)
- Developed a data pipeline using Sqoop, Pig and Java MapReduce to ingest customer behavioral data and purchase histories into HDFS prior to analysis
- Used Hive for data analysis and computation
- Used Pig for various data joins and data enrichment
- Optimized MapReduce code and Pig scripts, and performed user interface analysis, performance tuning and analysis
- Loaded the aggregated data back into SQL Server using Sqoop export for reporting on the dashboard
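A hedged sketch of how the Sqoop import/export steps above could be driven from Python via `subprocess`; the connection string, credentials, table and directory names are hypothetical, and the real jobs may equally have been run from the shell or Oozie:

```python
import subprocess

# Hypothetical SQL Server connection details
JDBC_URL = "jdbc:sqlserver://dbhost:1433;databaseName=retail"

def sqoop_import(table: str, target_dir: str) -> None:
    """Pull a SQL Server table into HDFS."""
    subprocess.run([
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqlserver.pwd",
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
    ], check=True)

def sqoop_export(table: str, export_dir: str) -> None:
    """Push aggregated HDFS data back into a SQL Server reporting table."""
    subprocess.run([
        "sqoop", "export",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqlserver.pwd",
        "--table", table,
        "--export-dir", export_dir,
    ], check=True)

if __name__ == "__main__":
    sqoop_import("customer_purchases", "/data/raw/customer_purchases")
    sqoop_export("daily_sales_summary", "/data/agg/daily_sales_summary")
```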
Environment: Hadoop, HDFS, Hive 0.12.1, Java, Cloudera Hadoop distribution, Pig 0.11.1, Linux, Sqoop, Microsoft Excel Reporting, Notepad++
Confidential
Business Analyst
Responsibilities
- Understanding and analyzing the business requirements
- Designed and developed Tableau reports, documents, dashboards and scorecards per specified requirements and timelines
- Extracted data from various sources and performed data blending
- Created various interactive dashboards
- Developed various reports as per customer requirements
- Experience with KPIs (key performance indicators)
- Created customized and interactive dashboards using data sources and custom objects
- Created quick filters, table calculations, calculated fields and parameters
Environment: Tableau, SQL Server 2008, Microsoft Excel
Confidential
Automation and Database Testing
Responsibilities
- Understanding and analyzing the business requirements
- Designed a hybrid (modular and data-driven) framework as per discussions with the client and on-site team (see the sketch after this list)
- Created library files for reusable operations, runtime settings and logging results in Excel
- Handled client status calls and sent status mails on a daily basis
- Presented a demonstration of framework execution to the client
- Gave knowledge transfer (KT) sessions to newly added resources on the ongoing automation framework and approach
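The hybrid framework itself was built in HPE UFT/VBScript; purely as an illustration of the data-driven pattern described above (reading test data from a sheet and logging results back to Excel), a minimal Python sketch using `openpyxl` with hypothetical workbook, sheet and column names:

```python
from openpyxl import load_workbook

# Hypothetical workbook: one row per test case with input and expected columns
WORKBOOK = "test_data.xlsx"

def run_case(username: str, expected_status: str) -> str:
    """Placeholder for a reusable operation the framework would drive (e.g. a login step)."""
    # Illustrative check only; the real framework executed UI/database actions here
    return "PASS" if username and expected_status == "ACTIVE" else "FAIL"

def run_suite() -> None:
    wb = load_workbook(WORKBOOK)
    sheet = wb["TestCases"]
    # Assumed headers in row 1: username, expected_status, result
    for row in sheet.iter_rows(min_row=2):
        username, expected = row[0].value, row[1].value
        row[2].value = run_case(username, expected)  # log the result next to the inputs
    wb.save(WORKBOOK)

if __name__ == "__main__":
    run_suite()
```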
Environment: SQL Server 2008, HPE UFT, VBScript, Microsoft Excel
Confidential
Database and Automation Tester
Responsibilities
- Wrote test cases for integration and end-to-end testing
- Involved in documenting the automation test plan and automation strategy
- Involved in documenting the approach and discussing framework design with the client and on-site team
- Identified the reusable operations in the scenarios to be automated
- Worked with SOAP UI and XML files to automate web services (see the sketch after this list)
- Designed a hybrid (modular and data-driven) framework as per discussions with the client and on-site team
- Created library files for reusable operations, runtime settings and logging results in Excel
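The web service automation above was done in SOAP UI with XML request files; as an illustration of the same idea in Python (posting a SOAP envelope and asserting on the response), a small sketch with a hypothetical endpoint, namespace and operation:

```python
import requests

# Hypothetical endpoint and SOAP action
ENDPOINT = "https://example.com/services/AccountService"
SOAP_ACTION = "getAccountStatus"

ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:acc="http://example.com/account">
  <soapenv:Body>
    <acc:getAccountStatus>
      <acc:accountId>12345</acc:accountId>
    </acc:getAccountStatus>
  </soapenv:Body>
</soapenv:Envelope>"""

def call_service() -> None:
    response = requests.post(
        ENDPOINT,
        data=ENVELOPE,
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": SOAP_ACTION},
        timeout=30,
    )
    # Basic assertions comparable to SOAP UI test steps
    assert response.status_code == 200, response.text
    assert "<acc:status>ACTIVE</acc:status>" in response.text

if __name__ == "__main__":
    call_service()
```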
Environment: SQL Server 2008, TestComplete, JScript, Microsoft Excel, SOAP UI
