Hadoop Developer Resume CA - Hire IT People

SUMMARY

Over 7 years of IT experience, including 3 years in Big Data technologies and more than 4 years of extensive experience in Java/J2EE
Currently working on a Spark GraphX POC to identify strong connections between identities in ~3 TB of RDF data, using indexing from Jena/SPARQL
Excellent knowledge of Big Data infrastructure: distributed file systems (HDFS), parallel processing (MapReduce framework) and the complete Hadoop ecosystem (Hive, Hue, Pig, HBase, ZooKeeper, Sqoop, Kafka-Storm, Spark, Flume and Oozie).
In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
In-depth knowledge of real-time ETL and Spark analytics using Spark SQL with visualization
Hands-on experience on YARN (MapReduce 2.0) architecture and components such as Resource Manager, Node Manager, Container and Application Master and execution of a MapReduce job.
Experience in collecting log data from different sources (web servers and social media such as tweets) using Flume and storing it in HDFS for MapReduce jobs and Hive queries.
Knowledge of installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, ZooKeeper, Sqoop, Kafka-Storm, Spark, Pig, Impala and Flume.
Experience in installing, configuring, supporting and managing Cloudera’s Hadoop platform with CDH4 and CDH5 clusters, HDP 2.2 with Kafka-Storm, the EC2 platform and IBM’s BigInsights Hadoop ecosystem.
Strong knowledge of Pig and Hive's analytical functions, extending Hive and Pig core functionality by writing custom UDFs.
Experience in developing REST APIs for use in single-page or native applications and implementing Rails Migrations, Active Record, Action Pack and Action Mailer.
Experience in importing and exporting data with Sqoop between HDFS and relational and non-relational database systems.
Expertise in developing Pig Latin scripts and Hive Query Language queries for data analytics.
Well versed in partitioning, dynamic partitioning and bucketing in Hive, and have implemented them to compute data metrics.
Created classes that model real-world objects and wrote loops to perform actions on data.
Expertise in development of multi-tiered web based enterprise applications using J2EE technologies like Servlets, JSP, JDBC, Java Beans, Spring Framework, MVC and Hibernate.
Experience in application development with tools such as Eclipse, STS (Spring Tool Suite), Android Studio, etc., with deployment to servers and clouds such as IBM’s Bluemix using built-in services.
Detailed knowledge of and experience in designing, developing and testing software solutions using Java and J2EE technologies, and in developing and maintaining web applications on the Tomcat web server
Experience with front-end technologies such as HTML5, CSS3, JavaScript and jQuery for UI, to deliver a complete end-to-end system.
Strong analytical skills with the ability to quickly understand clients’ business needs
Proficient in software documentation and technical report writing. Versatile team player with good communication and interpersonal skills and experience giving analytical presentations at various big data talks

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce (M-R), Hue, Hive, Pig, HBase, Impala, Sqoop, Flume, ZooKeeper, Oozie, Kafka-Storm, Spark with Scala

Operating Systems/Environment: Windows, Ubuntu, Linux, iOS, Cloudera CDH, EC2, S3, IBM BigInsights

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans

Frameworks: MVC, Struts, Hibernate, Spring

Databases/Database Languages: MySQL, MS-SQL Server, SQL, Oracle 11g, NoSQL (HBase, MongoDB, Cassandra )

Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, REST, Bootstrap

Programming Languages: Java, Scala, C/C++, Python

IDEs/Tools: Eclipse, NetBeans, Android Studio, Dev-C++, JUnit testing tool, Log4j for logging

Web Servers: WebLogic, WebSphere, Apache Tomcat 7, IBM’s Bluemix, Pivotal 3.0

ETL Tools: Talend for Big data, Informatica

Software Engineering Methods: UML, Object Oriented Methodologies, Scrum and Agile methodologies

Predictive Modelling Tools/Statistical Programming/BI Tools: Tableau, R, IBM Cognos, IBM SPSS Modeler, IBM’s BigSheets, D3, QlikView

Virtual Machines: Oracle VirtualBox, VMware Player, VMware Workstation 11

Team/Source Control: Distributed (Git), Centralized (Subversion/SVN)

PROFESSIONAL EXPERIENCE

Confidential, CA

Hadoop Developer

Responsibilities:

Created DB catalog, Roles/Privs, Erwin Logical and Erwin UDP feed files to import the database catalog, roles and privileges, logical models and UDP relations of a database
Imported the DB catalog, roles, Erwin Logical and UDP of various databases into the Metadata Repository
Performed EME dataset, EME graph and EME table imports and lineage/dependency analysis of various databases in the Metadata Repository
Performed the ETL (Extract Transform Load) process and wrote Ruby scripts and loaded the data in the target database.
Used Rails, AJAX, JSON, CSS and jQuery to design the front end of the application. Back end of the application is developed mainly using Active Records.
Validated the change sets of different imports in the Metadata Repository and approved them so that the business could access the data
Created Ab Initio graphs that transfer data from various sources like Oracle, flat files and CSV files to the Teradata database and flat files
Performed Data Ingestion from multiple internal clients using Apache Kafka.
Performed real-time event processing of data from multiple servers in the organization using Apache Storm integrated with Apache Kafka.
Derived and modeled facts, dimensions and aggregated facts in Ab Initio from the data warehouse star schema to create billing and contract reports
Worked on multifile systems with extensive parallel processing. Automated load processes using Autosys
Used Lookup Transformation in validating the warehouse customer data
Prepared logical and physical diagrams of the DW and presented them to business leaders. Used ERwin for model design
Performed bulk data loads from multiple data sources (Oracle 8i, legacy systems) to Teradata RDBMS
Used BTEQ and SQL Assistant (Queryman) front-end tools to issue SQL commands matching the business requirements to Teradata RDBMS
Coded and tested Ab Initio graphs to extract the data from Oracle tables and MVS files
Profiled operational data using Ab Initio Data Profiler/SQL Tool to better understand the data available to business analysts for analytical purposes
Extensively used UNIX Shell Scripting for writing SQL execution scripts in Data Loading Process
Produced mapping document and ETL design document
Worked closely with the end users in writing the functional specifications based on the business needs
Participated in project review meetings
Extensively worked with PL/SQL Packages, Stored procedures & functions and created triggers to implement business rules and validations
Responsible for Performance-tuning of Ab Initio graphs
Collected and analyzed the user requirements and the existing application and designed logical and physical data models
Ran scripts through UNIX shell scripts under batch scheduling
Responsible for preparing interface specifications and complete documentation of graphs and their components
Responsible for testing the graph (Unit testing) for Data validations and preparing the test report
Implemented Security Features of Business Objects like row level, object level and report level to make the data secure
Good understanding of performance tuning across NoSQL, Kafka, Storm and SQL technologies
Designed and developed a framework to leverage platform capabilities using MapReduce, Hive UDFs and Kafka

Environment: Ab Initio (Co>Operating System 2.15/2.14, GDE 1.16/1.15/1.14), Ab Initio Metadata Explorer, ERwin 4.0, UNIX, MVS, SQL, PL/SQL, Oracle 10g, Teradata V2R6, DB2, COBOL, Perl, Autosys.

Confidential, San Francisco, CA

Hadoop Consultant

Responsibilities:

Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software
Installed and configured MapReduce, Hive and HDFS; implemented a CDH4 (Hortonworks) Hadoop cluster on CentOS/Linux. Assisted with performance tuning and monitoring
Assessed existing EDW (enterprise data warehouse) technologies and methods to ensure that the EDW/BI architecture meets the needs of the business and enterprise and allows for business growth
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW
Captured data from existing databases that provide MySQL interfaces using Sqoop
Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes
Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, XML, JMS, JBoss and Web Services
Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics
Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data
Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems
Managed and reviewed Hadoop log files
Tested raw data and executed performance scripts
Shared responsibility for administration of Hadoop, Hive and Pig
Exposure to Machine Learning using R and Mahout.
Developed Hive queries for the analysts, used the ETL tool Talend for processing and created visualizations of transactional data
Helped business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS
Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios
Supported code/design analysis, strategy development and project planning
Developed multiple MapReduce jobs in Java, along with any additional Java code required for data cleaning, filtering and preprocessing, and tested them.
Assisted with data capacity planning and node forecasting
Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Administered Pig, Hive and Cassandra, installing updates, patches and upgrades.
Handled structured and unstructured data and applied ETL processes.

Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Java (jdk1.7), Hadoop distribution of Hortonworks, Cloudera, MapR, IBM DataStage 8.1(Designer, Director, Administrator), MySQL, Windows, Linux

Confidential, Des Moines, IA

Hadoop Consultant

Responsibilities:

Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper and Sqoop.
Extensively involved in Installation and configuration of Cloudera distribution Hadoop, NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes
Gained hands-on experience with Kafka-Storm on the HDP 2.2 platform for real-time analysis.
Created PoC to store Server Log data in MongoDB to identify System Alert Metrics
Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team
Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
Performed analysis on the unused user navigation data by loading into HDFS and writing MapReduce jobs. The analysis provided inputs to the new APM front end developers and lucent team.
Wrote MapReduce jobs using Java API and Pig Latin.
Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
Used Flume to collect, aggregate and store the web log data onto HDFS.
Wrote Pig scripts to run ETL jobs on the data in HDFS and carried out further testing.
Used Hive to do analysis on the data and identify different correlations.
Involved in HDFS maintenance and administering it through Hadoop-Java API.
Used Sqoop to import data from MySQL into HDFS and Hive on a regular basis.
Wrote Hive queries for data analysis to meet the business requirements.
Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
Created Hive tables, worked on them using HiveQL and performed data analysis using Hive and Pig
Supported MapReduce programs running on the cluster.
Maintained and monitored clusters.
Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Used QlikView and D3 to visualize query results required by the BI team
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files
Continuously monitored and managed the Hadoop cluster through Cloudera Manager

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Flume, ZooKeeper, Cloudera Manager, Oozie, Java (jdk1.6), MySQL, SQL, Windows NT, Linux

Confidential, CA

Java/Bigdata Developer

Responsibilities:

Involved in installation and configuration of JDK, Hadoop, Pig, Sqoop, Hive and HBase in a Linux environment. Assisted with performance tuning and monitoring.
Created MapReduce programs to parse data for claim report generation and ran the JARs in Hadoop. Coordinated with the Java team in creating MapReduce programs.
Created Pig scripts for most modules to provide a comparative effort estimate for code development.
Created reports for the BI team using Sqoop to import data into HDFS and Hive
Collaborated with BI teams to ensure data quality and availability with live visualization
Created Hive queries to process large sets of structured, semi-structured and unstructured data and store it in managed and external tables.
Created HBase tables to load large sets of structured data.
Managed and reviewed Hadoop log files.
Performed test run of the module components to understand the productivity.
Shared responsibility for, and assisted with, administration of Hadoop, Hive, Sqoop, HBase and Pig within the team.
Shared the knowledge of Hadoop concepts with team members.
Shared in-depth knowledge of ZooKeeper, MongoDB and Cassandra with teammates
Involved in providing inputs for estimate preparation for the new proposal.

Environment: Linux, Hadoop, Pig, Sqoop, Hive, HBase, MongoDB, Cassandra, Java, Eclipse Juno.

Confidential, CA

Java Developer

Responsibilities:

Involved in the phases of the SDLC (Software Development Life Cycle), including requirement collection, design and analysis of customer specifications, and development and customization of the application
Worked on Enhancement requests in front-end and back-end changes using Servlets, Tomcat server, JDBC, Hibernate
Used SQL queries for database integration with the code
Created test plans and test data for modified programs and logged the test documents in QC
Performed end-to-end system development and testing of each module (unit and system integration)
Coordinated with onshore and offshore teams of 10+ members
Responsible for effort estimation and timely production deliveries
Created and executed half-yearly and yearly load jobs that update new rates, discounts, etc. for the claim calculations in the database and files
Managed production support tickets through the Remedy tool: corrected the data, reran the jobs, resolved the tickets and maintained the log
Conducted weekly team meetings to update the project manager on project status
Prepared and updated the Project Detail Document and related manuals
Received client appreciation for proposing and implementing paging logic that prints the Glossary in Explanations of Benefits (EOB) on the previous page, which saved the client significant cost and added profit
Participated in Hadoop Training for Development and Admin as a Cross-platform training program

Environment: Java, J2EE, SQL, Servlets, XML, Hibernate, Eclipse, Git, JUnit, JDBC, Tomcat server

Confidential, Bridgeport, CT

Java/J2EE Developer

Responsibilities:

Involved in Business Systems Analysis, gathering Business Requirements, deriving functional requirements and system requirements from the business requirements
Configured business applications in XML bean definition files using Spring(STS)
Worked on Hibernate ORM. Created Hibernate XML files and Java class files to define the object-relational mappings
Provided data persistence by object/relational mapping solution via Hibernate for application save, update, delete operations.
Created services for various modules like Account (CD/Checking/Savings), Creation and Maintenance using Servlets.
Utilized Core J2EE design patterns such as Singleton and Data Access Object (DAO) in the implementation of the services.
Responsible for writing SQL Queries and Procedures.
Improved database performance by recognizing and rewriting slow queries.
Built and deployed on WebSphere Application Server.
Solid deployment experience on the Linux platform.

Environment: Java, SQL, XML, Servlets, Tomcat server, Eclipse, Hibernate, Linux

Confidential

Java Software Developer

Responsibilities:

Gathered requirements from client, analyzed and prepared the requirement specification document.
Performed object-oriented design using UML, with IBM’s Rational Rose as the UML modeling tool.
Configured application connectivity using JDBC
Designed all user interfaces using JSP and deployed the application in Apache Tomcat server
Involved in API development using Core Java concepts
Used HTML, CSS, JSP, and JavaScript for Front End User Interface design.
Worked with the collection libraries.
Involved in database design and development on SQL Server.
Used Eclipse as the development environment, with Git for team collaboration
Integrated the Java application for end users.
Involved in production support.

Environment: Java/J2EE (JSP, Servlet), Eclipse, Struts, Hibernate, JPA, XML, WebLogic, Unit Case, JUnit, UML

Confidential, Stamford, CT

Java Developer

Responsibilities:

Involved in the Design, Coding, Testing and Implementation of the web application.
Developed JSPs (JavaServer Pages) from HTML and detailed technical design specification documents. Pages included HTML, CSS, JavaScript, Hibernate and JSTL.
Developed SOAP based requests for communicating with Web Services.
Used agile methods to provide quick and feasible solutions to the organization.
Implemented HTTP modules for different applications in the Struts framework using Servlets, JSP, ActionForm, Action classes and ActionMapping.
Developed web applications using MVC frameworks, Spring, Struts and Hibernate.
Analyzed and fixed defects in the Login application.
Involved in configuration and deployment of the application on the JBoss Application Server.
Involved in dynamic creation of error elements on demand when there is an error.
Ensured design consistency with client’s development standards and guidelines.

Environment: Java, J2EE, Struts, SOAP web services, Spring, Hibernate, JavaScript, jQuery, JBoss Application Server, Oracle, AJAX, JSP, Servlets, Eclipse, Git Source control, Linux.
