Hadoop Developer Resume
Milwaukee, WI
SUMMARY
- 10+ years of experience in the Information Technology industry as a data warehouse professional, with extensive experience in Data Lake, Data Warehousing, Data Integration, Data Acquisition, Data Ingestion, Data Migration, Data Modeling, Data Profiling, Data Analysis, Data Cleansing, Data Quality, Data Processing, Data Mart and Data Governance projects, including implementation, maintenance, testing and production support of applications.
- About 3 years of experience architecting, developing and implementing Big Data technologies in core and enterprise software initiatives and applications that perform large-scale distributed data processing for Big Data analytics, using ecosystem tools such as Hadoop, Hive, Pig, Sqoop, HBase, Spark, Spark SQL, Spark Streaming, Python, Kafka, Oozie, ZooKeeper, YARN and Tez.
- Hands-on experience with various Hadoop distributions (Cloudera, Hortonworks, MapR).
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Experience using Sqoop to import/export data between RDBMS and HDFS, and loading real-time data into HDFS using Kafka.
- Expertise in cost-based optimization techniques and in identifying solutions to improve the performance of Hive SQL.
- Experience implementing Spark SQL and converting business processes into Spark RDD transformations.
- Strong skills in Informatica 7.1, 8.6.1, 9.5.1, IBM-DataStage 9.1/8.5/7.5, SQL Programming, Teradata, IBM DB2, PL/SQL, SQL Server, Performance tuning and Shell Scripting.
- Expertise in ELT processes; well versed in Teradata and DB2 concepts.
- Created high-level design documents and source-to-Confidential mappings for ETL/ELT processes.
- Performance-tuned different subject areas by implementing aggregates, aggregate join indexes, compression, statistics and SQL rewrites, along with foundation table modifications including index changes.
- Working knowledge of Teradata load utilities, including FastLoad, MultiLoad and BTEQ, in a network-attached client environment.
- Worked extensively in UNIX client/server environments with good exposure to shell scripting.
- Used Git, Jenkins and SonarQube for continuous integration and development of the code.
- Worked in both Waterfall and Agile methodologies.
- Excellent analytical, problem-solving, communication and interpersonal skills.
- Self-motivated, energetic team player with demonstrated proficiency for learning new tools and business environment.
TECHNICAL SKILLS
Big Data: Hadoop Ecosystem, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, YARN, Spark, Kafka, Python
ETL: Informatica 7.1, 8.1.1, 8.6.1, Informatica Metadata Manager 8.6.1, DataStage 9.1/8.5/7.5, SSIS.
NoSQL Database: HBase, Druid
IDE/Build Tools: Eclipse, Maven, IntelliJ, TFS
Continuous Integration: Jenkins, Git, Sonarqube
Version Control: Git, Team Foundation Server
OLAP: Cognos ReportNet 1.1, Tableau 10.X
Database: Oracle 9i/10g, Teradata 13.10/14.00, SQL Server 2012
Languages: SQL, PL/SQL, Shell Scripting, Python
Scheduling Tool: Autosys, Control-M
Agile: JIRA, Version One
PROFESSIONAL EXPERIENCE
Confidential, Milwaukee, WI
Hadoop Developer
Responsibilities:
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date (see the sketch following this section).
- Worked with multiple sources to bring data into the data lake, building daily snapshots and loading the data into HDFS.
- Worked extensively with Sqoop for importing data from Oracle, Dynamics CRM web services and SQL Server to HDFS.
- Developed Spark code using Python and Spark SQL/Streaming for faster testing and processing of data.
- Developed Spark scripts using the Python (PySpark) shell as per requirements.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
- Developed Hive (HQL) scripts, HDFS external/managed tables, and Oozie workflow and coordinator applications to load data into the HDFS landing and foundation layers.
- Experienced in implementing Spark RDD transformations and actions to perform business analysis.
- Migrated HiveQL queries on structured data into Spark SQL to improve performance.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Used Reporting tools like Tableau to connect to Drill and generate daily reports of data.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
Environment: MapReduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, YARN, Shell scripting, Spark, Python, Tableau.
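A minimal sketch of the date-driven incremental Sqoop import referenced above; the JDBC URL, credentials handling, table and column names, HDFS paths and mapper count are illustrative assumptions rather than actual project values.

```python
# Minimal sketch of a date-driven incremental Sqoop import.
# All connection details, names and paths below are illustrative assumptions.
import subprocess

LAST_VALUE = "2016-12-31"  # hypothetical watermark from the previous load

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",  # password kept in HDFS, not on the command line
    "--table", "CUSTOMER_TXN",
    "--target-dir", "/data/landing/customer_txn",
    "--incremental", "append",                  # import only rows newer than the watermark
    "--check-column", "TXN_DATE",
    "--last-value", LAST_VALUE,
    "--num-mappers", "4",
]

subprocess.run(sqoop_cmd, check=True)  # non-zero exit raises CalledProcessError
```

In practice a Sqoop saved job (sqoop job --create ... with --incremental append) can track the last-value watermark automatically between runs.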
Confidential, Minneapolis, MN
Lead BI Engineer
Responsibilities:
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
- Imported data from different sources such as HDFS/HBase into Spark RDDs.
- Developed Spark code using Python and Spark-SQL/Streaming for faster testing and processing of data.
- Experienced with batch processing of data sources using Apache Spark.
- Experienced in implementing Spark RDD transformations and actions to perform business analysis.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Migrated HiveQL queries on structured data into Spark SQL to improve performance.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Created Pig scripts that read from and write to HBase tables.
- Created Oozie workflows that run Hive, Pig and shell scripts and perform quality checks.
- Created external Hive tables on top of parsed data and saved the data in ORC file format.
- Worked with different file formats such as TEXTFILE, JSON, XML, Parquet and ORC for Hive querying and processing.
- Performed advanced procedures such as server log analytics using the in-memory computing capabilities of Spark with Python.
- Implemented real-time data streaming using Spark with Kafka and Spark SQL for faster processing (see the sketch following this section).
- Provided design patterns for joins, updates and other features missing from Hive.
- Working knowledge of IDEs such as Eclipse.
- Working knowledge of Git and Maven for project dependency management, build and deployment.
- Built consumers for real-time data streams using Kafka and Spark Streaming.
- Developed a POC data integration pipeline using Kafka and Spark to store data in HDFS.
- Automated all jobs using Oozie workflows to run multiple MapReduce and Pig jobs, and supported running jobs on the cluster.
Environment: MapReduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, HBase, Shell scripting, Apache Kafka, Spark, Python.
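A minimal sketch of the Kafka-to-HDFS consumer referenced above, assuming the Kafka 0.8-style direct stream API that shipped with Spark 1.6/2.x (spark-streaming-kafka); the broker address, topic name, batch interval and HDFS path are hypothetical.

```python
# Minimal sketch: consume a Kafka topic with Spark Streaming and land each
# micro-batch in HDFS. Broker, topic, interval and paths are assumptions.
# Submit with the spark-streaming-kafka (0.8) package on the classpath.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hdfs-sketch")
ssc = StreamingContext(sc, batchDuration=30)  # 30-second micro-batches

# Direct (receiver-less) stream over the transactions topic.
stream = KafkaUtils.createDirectStream(
    ssc, ["transactions"], {"metadata.broker.list": "broker1:9092"})

# Keep only the message value and write each batch as text files in HDFS.
stream.map(lambda kv: kv[1]) \
      .saveAsTextFiles("hdfs:///data/landing/transactions/batch")

ssc.start()
ssc.awaitTermination()
```

Once landed, the batches can be exposed to Spark SQL or Hive for the downstream processing described above.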
Confidential
Senior BI Engineer
Responsibilities:
- Analyzed the existing system's processes and functionality and designed the new Big Data system with the appropriate techniques.
- Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of guest and transaction data by date.
- Created the process to offload/migrate existing ELT/ETL processes to Hadoop.
- Worked extensively with Sqoop for importing data from Teradata to HDFS.
- Developed Hive (HQL) scripts, HDFS external/managed tables, and Oozie workflow and coordinator applications to load data into the HDFS landing and foundation layers.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
- Developed Hive queries to create partitions and buckets to optimize job processing (see the sketch following this section).
- Provided technical/functional assistance to offshore team members.
- Reviewed developers' Hive (HQL) scripts and Oozie workflows and provided review comments.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications for eventual adoption under the Big Data Hadoop initiative.
- Handled all issues and post-production defects raised during the implementation phase.
- Supported the developed jobs in the PROD environment until sign-off.
Environment: DataStage 8.5, MapReduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, ZooKeeper.
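A minimal sketch of the kind of partitioned, ORC-backed foundation-layer Hive table and dynamic-partition load referenced above; the database, table and column names and the HDFS location are assumed placeholders, and the HQL is shown driven from Python through the Hive CLI.

```python
# Minimal sketch: create a partitioned external ORC table in the foundation
# layer and load it from a landing/staging table with dynamic partitioning.
# Database, table and column names and the HDFS location are placeholders.
import subprocess

hql = """
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE EXTERNAL TABLE IF NOT EXISTS foundation.guest_txn (
    guest_id BIGINT,
    txn_id   BIGINT,
    txn_amt  DECIMAL(12,2)
)
PARTITIONED BY (txn_dt STRING)
STORED AS ORC
LOCATION '/data/foundation/guest_txn';

INSERT OVERWRITE TABLE foundation.guest_txn PARTITION (txn_dt)
SELECT guest_id, txn_id, txn_amt, txn_dt
FROM landing.guest_txn_stg;
"""

# Run the statements through the Hive CLI; in an Oozie workflow the same HQL
# would typically be referenced from a Hive action instead.
subprocess.run(["hive", "-e", hql], check=True)
```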
Confidential
System Analyst
Responsibilities:
- Involved in system requirement design specifications for development and the traceability matrix.
- Set up and executed the iTest environment; handled data integrity and performance issues during migration.
- Good knowledge of preparing job chains in the Cronicle scheduling tool.
- Implemented the project using the Waterfall methodology; involved in development and unit/integration/regression testing of the application using QC against other dependent applications in ODS & EDW, following the GEHC IMPRD toll-gate process.
- Created mappings using different transformations like Source Qualifier, filter, Aggregator, Expression, Lookup, Sequence Generator, Router and Update Strategy.
- Responsible for loading staging tables using MultiLoad and FastLoad scripts.
- Wrote BTEQ scripts for moving data from staging to final tables (see the sketch following this section).
- Defined reusable business logic in the form of mapplets and reusable transformations according to the mapping requirements.
- Analyzed existing mappings that were producing errors and modified them to produce correct results.
Environment: Informatica 8.6.1, Teradata 14.00, UNIX, Control-M.
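A minimal sketch of the kind of staging-to-final BTEQ load referenced above, driven here from Python purely for illustration; the TDPID, credentials, schema and table names are placeholders, and in practice the script would be scheduled (e.g., through Control-M) rather than run ad hoc.

```python
# Minimal sketch of a staging-to-final BTEQ load. The TDPID, credentials,
# schema and table names are placeholders, not actual project values.
import subprocess

bteq_script = """
.LOGON tdprod/etl_user,etl_password;

INSERT INTO edw.customer
SELECT src.*
FROM   stg.customer src
LEFT JOIN edw.customer tgt
  ON   src.customer_id = tgt.customer_id
WHERE  tgt.customer_id IS NULL;

.IF ERRORCODE <> 0 THEN .QUIT 8;

.LOGOFF;
.QUIT 0;
"""

# BTEQ reads its commands from standard input.
subprocess.run(["bteq"], input=bteq_script, text=True, check=True)
```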
Confidential
Analyst
Responsibilities:
- Analyzed the business requirements and functional specifications.
- Extracted data from an Oracle database and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator and Stored Procedure.
- Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Created procedures to truncate data in the Confidential tables before the session run.
- Used PL/SQL procedures in Informatica mappings to truncate data in Confidential tables at run time (see the sketch following this section).
- Extensively used the Informatica Debugger to identify problems in mappings; also involved in troubleshooting existing ETL bugs.
- Created a list of inconsistencies in the client-side data load so the client could review and correct the issues.
- Wrote documentation describing program development, logic, coding, testing, changes and corrections.
- Created test cases for the developed mappings and prepared the integration testing document.
Environment: Informatica 8.6.1, Oracle 10g, OBIEE, SAP BW, UNIX, Control-M.
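A minimal sketch of the kind of truncate-before-load PL/SQL procedure referenced above; connection details, the procedure name and the table name are placeholders, and while the project invoked such procedures from Informatica sessions, the call is shown here from Python via cx_Oracle purely for illustration.

```python
# Minimal sketch of a truncate-before-load procedure. Connection details,
# procedure and table names are placeholders; in the project the procedure
# was invoked from the Informatica session rather than from Python.
import cx_Oracle

plsql = """
CREATE OR REPLACE PROCEDURE prc_truncate_target (p_table IN VARCHAR2) AS
BEGIN
    -- TRUNCATE is DDL, so it has to go through dynamic SQL inside PL/SQL;
    -- DBMS_ASSERT guards against an arbitrary string being concatenated in.
    EXECUTE IMMEDIATE 'TRUNCATE TABLE ' || DBMS_ASSERT.SQL_OBJECT_NAME(p_table);
END;
"""

conn = cx_Oracle.connect("etl_user", "etl_password", "dbhost:1521/ORCL")
cur = conn.cursor()
cur.execute(plsql)                                   # create (or replace) the procedure
cur.callproc("prc_truncate_target", ["STG_SALES"])   # clear the target before the load
conn.close()
```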
Confidential
Programmer Analyst
Responsibilities:
- Developed mappings to extract, transform and load the data from Flat Files using Informatica.
- Created mappings and sessions to implement technical enhancements for data warehouse by extracting data from sources like Oracle and Delimited Flat files.
- Applied Slowly Changing Dimensions (Type 1 and Type 2) effectively to handle delta loads.
- Prepared various mappings to load data into different stages such as Landing, Staging and Confidential tables.
- Used various transformations such as Source Qualifier, Expression, Aggregator, Joiner, Filter, Lookup and Update Strategy while designing and optimizing mappings.
- Developed workflows using the Task Developer, Worklet Designer and Workflow Designer in Workflow Manager and monitored results using Workflow Monitor.
- Created various tasks like Session, Command, Timer and Event wait.
- Modified several of the existing mappings based on the user requirements and maintained existing mappings, sessions and workflows.
- Tuned mapping performance by following Informatica best practices and applied several methods to reduce workflow run times.
- Prepared SQL queries to validate the data in both source and Confidential databases (see the sketch following this section).
- Worked on TOAD and Oracle SQL Developer to develop queries and create procedures and packages in Oracle.
- Created test cases for the developed mappings and prepared the integration testing document.
- Prepared the error handling document to maintain the error handling process.
- Closely worked with the reporting team to ensure that correct data is presented in the reports.
- Production scheduling of Informatica mappings and UNIX scripts using the Autosys scheduler.
Environment: Informatica 7.1, Oracle 9i, UNIX, Cognos ReportNet 1.1, Autosys Scheduler.
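A minimal sketch of the kind of source-versus-Confidential validation query referenced above, comparing row counts and an amount checksum for one load date; the connection details and table/column names are placeholders, and the query is shown driven from Python for illustration only.

```python
# Minimal sketch: compare row counts and an amount checksum between a source
# table and the corresponding Confidential (target) table for one load date.
# Connection details and table/column names are placeholders.
import cx_Oracle

VALIDATION_SQL = """
SELECT 'SOURCE' AS side, COUNT(*) AS row_cnt, SUM(txn_amt) AS amt_sum
FROM   src_sales
WHERE  load_dt = :load_dt
UNION ALL
SELECT 'TARGET', COUNT(*), SUM(txn_amt)
FROM   tgt_sales
WHERE  load_dt = :load_dt
"""

conn = cx_Oracle.connect("etl_user", "etl_password", "dbhost:1521/ORCL")
cur = conn.cursor()
cur.execute(VALIDATION_SQL, load_dt="2010-06-30")
results = {side: (cnt, amt) for side, cnt, amt in cur}
conn.close()

# Flag the load date for review if the two sides do not reconcile.
if results["SOURCE"] != results["TARGET"]:
    raise ValueError("Source and target do not reconcile: %s" % results)
```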