Hadoop Developer Resume
- Over 8 years of strong experience in the IT industry, including 3+ years as a Hadoop Developer in domains such as financial services and insurance.
- Maintained positive communications and working relationships at all levels.
- An enthusiastic and goal-oriented team player with excellent communication and interpersonal skills and a good work ethic.
- Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark and Hive for scalability, distributed computing and high-performance computing.
- Experience in using Hive Query Language for data analytics.
- Experienced in installing, maintaining and configuring Hadoop clusters.
- Strong knowledge of creating and monitoring Hadoop clusters on EC2, VMs, Hortonworks Data Platform 2.1 & 2.2, CDH3, CDH4 and Cloudera Manager on Linux, Ubuntu, etc.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Good knowledge of single-node and multi-node cluster configurations.
- Strong knowledge of NoSQL column-oriented databases like HBase and their integration with Hadoop clusters.
- Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Analyze data, interpret results and convey findings in a concise and professional manner
- Partner with Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics
- Promote full cycle approach including request analysis, creating/pulling dataset, report creation and implementation and providing final analysis to the requestor
- Very good understanding of SQL, ETL and data warehousing technologies.
- Knowledge of MS SQL Server 2012/2008/2005 and Oracle 11g/10g/9i and E-Business Suite.
- Expert in T-SQL, creating and using stored procedures, views and user-defined functions, and implementing Business Intelligence solutions using SQL Server 2000/2005/2008.
- Developed Web-Services module for integration using SOAP and REST.
- NoSQL database experience with HBase and Cassandra.
- Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, Cosmos.
- Knowledge of Java virtual machines (JVM) and multithreaded processing.
- Experience in writing build scripts using Maven and working with continuous integration systems like Jenkins.
- Java Developer with extensive experience in various Java libraries, APIs and frameworks.
- Hands-on development experience with RDBMS, including writing complex SQL queries, stored procedures and triggers.
- Sound knowledge of designing data warehousing applications using tools like Teradata, Oracle and SQL Server.
- Experience in working with job schedulers like Autosys and Maestro.
- Strong in databases like Sybase, DB2, Oracle, MS SQL.
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
- Strong communication, collaboration & team building skills with proficiency at grasping new Technical concepts quickly and utilizing them in a productive manner.
- Adept in analyzing information system needs, evaluating end-user requirements, custom designing solutions and troubleshooting information systems.
Big Data: HDFS, Hive, Pig, HBase, Sqoop, Mahout, Hadoop components (JT, TT, ZK)
Languages: Java, C/C++, Python, XML, Shell scripting, COBOL
Frameworks: Spring, Hibernate, Struts
Servers: IBM WebSphere, WebLogic, Tomcat, and Red Hat Satellite Server
Version Control: CVS, TortoiseSVN
Database: Oracle, DB2, MS-SQL Server, MySQL, MS-Access
Operating Systems: Windows 95/98/2000/XP/Vista/7/9, Macintosh, Unix.
Confidential, Richfield, MN
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Wrote Spark programs to load, parse, refine and store sensor data in Hadoop, and to process, analyze and aggregate data for visualizations.
- Created various views for HBase tables and took advantage of the performance of Hive on top of HBase.
- Developed the Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Designed and developed the Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
- Developed a MapReduce program for parsing information and loading it into HDFS.
- Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Responsible for loading data from UNIX file system to HDFS.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library.
- Loaded extracted data into the data lake (big data) platform from multiple sources such as Oracle, Teradata and EDW using Sqoop.
- Designed and developed a distributed processing system to process binary files in parallel and load the analysis metrics into a data warehousing platform for reporting.
- Developed a workflow in Control-M to automate the tasks of loading data into HDFS and preprocessing with Pig.
- Developed Sqoop import scripts for loading incremental data from source to the data lake.
- Handled cluster coordination services through ZooKeeper.
- Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
- Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
Environment: Hive QL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Sqoop, Spark, UNIX, Cosmos.
Confidential, San Jose, CA
- Solid understanding of Hadoop HDFS, MapReduce and other ecosystem projects
- Installation and Configuration of Hadoop Cluster
- Working with Cloudera Support Team to Fine tune Cluster
- Working Closely with SA Team to make sure all hardware and software is properly setup for Optimum usage of resources
- Developed a custom File System plugin for Hadoop so it can access files on Hitachi Data Platform
- Plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- The plugin also provided data locality for Hadoop across host nodes and virtual machines
- Developed Map Reduce programs in Java for parsing the raw data and populating staging tables
- Developed Sqoop import scripts for loading incremental data from source to the data lake
- Developed MapReduce jobs to analyze data and provide heuristics reports
- Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuning them for the data sets
- Performed extensive data validation using Hive and wrote Hive UDFs
- Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs
- Wrote many scripts (Python and shell) to provision and spin up virtualized Hadoop clusters
- Adding, Decommissioning and rebalancing nodes
- Created POC to store Server Log data into Cassandra to identify System Alert Metrics
- Rack Aware Configuration
- Configuring Client Machines
- Configuring, Monitoring and Management Tools
- HDFS Support and Maintenance, Cluster HA Setup
- Applying Patches and Perform Version Upgrades
- Incident Management, Problem Management, Performance Management and Reporting
- Recover from NameNode failures
- Schedule MapReduce jobs - FIFO and Fair Share
- Installation and configuration of other open source software like Pig, Hive, HBase, Flume and Sqoop
- Integration with RDBMS using Sqoop and JDBC connectors
- Working with Dev Team to tune jobs
- Knowledge of writing Hive jobs
Environment: Windows 7, UNIX, Linux, Java, Apache HDFS, MapReduce, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
- Developed Hive UDFs to bring all customer email IDs into a structured format.
- Developed bash scripts to fetch the Tlog files from the FTP server and process them for loading into Hive tables.
- Used Sqoop to load data from DB2 into the HBase environment.
- Overwrote the Hive data with HBase data daily (insert overwrite) to get fresh data every day.
- Scheduled all the bash scripts using the Resource Manager scheduler.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Worked on loading data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for analysis across different banners.
Environment: Windows 7, Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Teradata, DB2, Oozie, MySQL, Eclipse
Confidential, Santa Clara, CA
Senior Software Engineer
- Participated in project planning sessions with business analysts and team members to analyze business IT requirements and translate them into a working model.
- Involved in planning, defining and designing the database based on business requirements, and provided documentation.
- Involved in initial design, creating use case diagrams, sequence diagrams and class diagrams using MS Visio.
- Developed Java applications using the Spring framework.
- Developed RESTful web services using Java and Spring Boot.
- Wrote complex SQL queries using joins, sub queries and correlated sub queries to retrieve data from the database.
- Created/updated database objects like tables, views, stored procedures, functions and packages.
- Developed the DAO design pattern to hide access to data source objects.
- Used the Hibernate framework for backend persistence.
- Used Eclipse as the IDE to develop the application and JIRA for bug and issue tracking.
- Used CVS for software configuration management and version control.
- Optimized and modified triggers, complex stored functions, procedures, user-defined data types, etc.
- Added methods for performing CRUD operations in applications using JDBC and wrote several SQL queries.
- Responsible for problem resolution, bug fixing and troubleshooting.
- Developed Oracle PL/SQL stored procedures, functions, packages and SQL scripts to support the functionality of various modules.
Environment: Java, J2EE, EJB 1.1, JSF, XML, JDBC, Oracle 9i, Log4J 1.2, PL/SQL Developer, REST framework, C#, .NET Framework 3.0, Spring framework.