Big Data Hadoop Developer Resume
Eden Prairie, MN
SUMMARY:
- 9+ years of experience in the IT industry, including 4+ years in Big Data/Hadoop technologies
- 5+ years of experience designing and developing in Java and J2EE technologies
- Good knowledge of data structures and design and analysis of algorithms
- Good knowledge of message passing techniques and inter-process communication; good knowledge of data mart creation and data warehouse concepts
- Experience in writing MapReduce jobs, Pig scripts, and Hive queries on YARN, and in using Apache Kafka and Storm for data analysis
- Clear understanding of MapReduce (MRv1), YARN, and distributed cache concepts
- Hands-on experience with NoSQL databases such as HBase and Cassandra; basic knowledge of MongoDB
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive
- Imported data from MySQL into HDFS using Sqoop; imported unstructured data into HDFS using Flume
- Used Oozie to orchestrate MapReduce jobs that extract data on a scheduled basis
- Wrote MapReduce Java programs to analyze log data for large-scale data sets
- Used the HBase Java API from Java applications; automated jobs that extract data from sources such as MySQL and push the result sets to the Hadoop Distributed File System
- Developed Pig Latin scripts to extract data from output files and load it into HDFS; developed custom UDFs and implemented Pig scripts
- Implemented MapReduce jobs using the Java API, Pig Latin, and HiveQL; participated in the setup and deployment of Hadoop clusters
- Hands-on design and development of applications using Hive UDFs; responsible for writing Hive queries to analyze data in the Hive warehouse using HiveQL (a short Spark SQL sketch of this kind of query appears after this summary)
- Supported data analysts in running Pig and Hive queries
- Involved in Cassandra data modeling and in building efficient data structures
- Worked with the Cassandra database to analyze how data is stored; experienced in moving data from Hive tables into Cassandra for real-time analytics
- Designed and implemented a Cassandra-based database and related web services for storing unstructured data
- Performed benchmarking of the NoSQL databases Cassandra and HBase; integrated bulk data into Cassandra using MapReduce programs
- Created data models for customer data using Cassandra Query Language (CQL)
- Worked with HiveQL and Pig Latin scripts; wrote mappings to bring data from sources such as JDE, Oracle Financials, Oracle databases, and flat files into the staging area
- Tuned mappings, transformations, and SQL queries for optimum performance
- As a developer, designed and implemented data marts and operational databases; led technical design meetings and responded to clients' business needs
- Hands-on experience working with Tableau reporting tools; imported and exported data between MySQL/Oracle and Hive using Sqoop
- Responsible for defining data flows within the Hadoop ecosystem and directing the team in implementing them
- Exported result sets from Hive to MySQL using shell scripts
- Good understanding of and hands-on experience with messaging systems such as Kafka; used Kafka as a traditional message broker for large-scale message processing, including writing messages to queues
- Developed an Apache Storm (topologies), Kafka, and HDFS integration project for real-time data analysis
- In-depth understanding of and hands-on experience with inter-process communication and message passing using the distributed cache
- Developed simple to complex MapReduce jobs using Hive and Pig to handle files in multiple formats (JSON, text, XML, Avro, SequenceFile, etc.); worked extensively with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs
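Illustrative sketch: the Hive warehouse analysis mentioned in this summary can be expressed as a short Spark SQL job in Scala. The database, table, and column names below are hypothetical placeholders rather than names from any listed project.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a Hive-warehouse analysis query issued through Spark SQL.
// Table and column names (claims_db.claims, claim_year, paid_amount) are hypothetical.
object HiveAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-analysis-sketch")
      .enableHiveSupport() // resolve tables registered in the Hive metastore
      .getOrCreate()

    // Aggregate a partitioned Hive table, touching only the partitions of interest
    val summary = spark.sql(
      """SELECT claim_year, state, COUNT(*) AS claim_count, SUM(paid_amount) AS total_paid
        |FROM claims_db.claims
        |WHERE claim_year >= 2015
        |GROUP BY claim_year, state""".stripMargin)

    summary.show(20)
    spark.stop()
  }
}
```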
TECHNICAL SKILLS:
Programming Skills: Java, C, C++ (with data structures), Scala, MapReduce, Pig scripts, Servlets, JSP, MVC, Spring Security, AJAX, XML, HTML, AngularJS, jQuery, JavaScript, web services, and Spark/Scala
NoSQL Databases: HBase, Cassandra
RDBMS: Oracle, MySQL, DB2 and Netezza
J2EE Servers: WebLogic Application Server 8.1, WebSphere Application Server 8.5
Web Server: Apache Tomcat 7.0
IDEs: Eclipse 3.4, RSA 8.0
Build Tools: Maven
Continuous integration Tools: Hudson, Jenkins
Version Control/Configuration Management Tools: SVN, CVS, Git
Operating Systems: Windows XP/Windows 9x, UNIX, Linux
Open source Distributed File system: Hadoop Distributed File System
Data Analysis Tools: Apache PIG, HIVE
Data Importing Tools: Sqoop, Flume
Serialization Framework: Apache Avro
Distributed File system: Hadoop
Distributed Message Passing Broker: Apache Kafka
Cluster coordinating services: Apache Zookeeper
Application Workflow Management: Apache Oozie
PROFESSIONAL EXPERIENCE:
Big Data Hadoop Developer
Confidential, Eden Prairie, MN
Responsibilities:
- Responsible for building and configuring a distributed data solution using the MapR distribution of Hadoop
- Involved in the complete Big Data flow of the application: ingesting data from upstream sources into HDFS, processing the data in HDFS, and analyzing it
- Imported data in various formats, such as MapR-DB JSON and XML, into the HDFS environment; transferred data from post-log tables into HDFS and Hive using Sqoop
- Developed Spark code using Scala and Spark SQL for faster processing and testing; responsible for ingesting data onto the data lake
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark
- Imported and exported data into HDFS and Hive using Sqoop and Kafka; used Kafka to receive real-time streaming data, processed it with Spark Streaming in Scala, and stored the streams in the HDFS cluster (see the sketch after this list)
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Developed Spark scripts using Python shell commands as per the requirements; used Hive join queries to join multiple source-system tables and load the results into Elasticsearch tables
- Supported MapReduce programs running on the cluster; developed Hive queries, Pig Latin scripts, and Spark SQL queries to analyze large datasets
- Worked on debugging and performance tuning of Hive and Pig jobs; used Informatica PowerCenter for ETL (extraction, transformation, and loading) of data from heterogeneous source systems into HDFS
- Implemented test scripts to support test driven development and continuous integration of Big Data.
- Experienced in managing and reviewing large Hadoop log files; involved in cluster maintenance, monitoring, and troubleshooting
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs; used Pig extensively for data cleansing
- Created technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
- Implemented Cassandra and managed the other tools in the stack, with processing observed running on YARN
- Built an automated build and deployment framework using Jenkins, Maven, etc.
- Used Hive data warehouse modeling to interface Hadoop with BI tools such as Tableau, and enhanced the existing applications
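Illustrative sketch of the Kafka-to-HDFS streaming flow above, written against Spark Structured Streaming in Scala. The broker address, topic name, HDFS paths, and the choice of Structured Streaming (the project does not name a specific API) are assumptions, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: subscribe to a Kafka topic and continuously land the records on HDFS.
// Broker, topic, and paths are hypothetical placeholders.
object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hdfs-sketch").getOrCreate()

    // Read the Kafka topic as a streaming DataFrame of (key, value) pairs
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events-topic")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Append the parsed records to HDFS as Parquet files
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```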
Environment: MapR, HDFS, NFS, MapR-DB JSON, Hive, MapReduce, Pig, Oozie, Sqoop, Spark, Kafka, Shell Scripting, Linux, Cassandra, Scala, Tableau, MySQL.
Big Data Hadoop Developer
Confidential, Cary, North Carolina
Responsibilities:
- Gathered business requirements from the business partners; responsible for managing data coming from the warehouse
- Created all external tables as a raw data zone in Hive before data arrives in the big data layer from the warehouse (a brief sketch follows this list)
- Received data in the form of dimension and fact tables, with all dimension tables updated monthly; applied the star schema concept to build this project
- Responsible for creating Hive scripts and shell scripts for both base data loads and delta loads
- Because the data comes from the warehouse, designed a pre-processing step that slims down the dimensions for each dimension and fact table covering loss and premium data for all the dashboards
- Designed and developed the star schema; good knowledge of identifying the dimensions associated with facts
- Hands-on experience developing the staging layer from the slimmed-down dimensions; the staging layer has no keys and consists of redundant (denormalized) data
- Created data mart dimension and fact tables from the staging layer; moved the entire data mart from QA to PROD
- Validated all the data marts from a developer standpoint; interacted with the QlikView team while generating reports
- Hands-on experience synchronizing tables between Hive and Big SQL; good working knowledge of Big SQL
- Experience delivering data marts to QlikView, with users accessing the data through Big SQL; responsible for maintaining a full data refresh every quarter
- Involved in HDFS maintenance and loading of structured data. Involved in managing and reviewing Hadoop log files
- Imported data from MySQL into HDFS on a regular basis using Sqoop; wrote Hive queries for data analysis to meet the business requirements
- Created Hive tables and worked with them using HiveQL; utilized Agile Scrum methodology to help manage and organize a team of four developers with regular code review sessions
- The code repository is maintained in Borland StarTeam, from which all code is moved to the production environment once UAT and QA succeed
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Working knowledge of writing Pig Load and Store functions; focusing on optimization, used Spark and Scala to achieve high throughput
- Processing one dashboard with Hive took about 10 hours; the same work later completed within 30 minutes using Spark/Scala
- Trained the offshore resources to adopt the client's standards
- Documented all the challenges and issues involved in dealing with the security system and implemented best practices
- Created project structures and configurations according to the project architecture and made them available to the junior developers to continue their work
- Handled the onsite coordinator role to deliver work to the offshore team
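Illustrative sketch of the raw-data-zone external table and delta-load pattern described above, expressed as HiveQL issued from Scala through Spark. The database, table, column, and HDFS path names are hypothetical; the actual project ran on IBM Big Insights with Big SQL, which is not shown here.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: an external "raw zone" table over warehouse extract files, plus a simple delta load.
// All names and paths are hypothetical placeholders.
object RawZoneTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("raw-zone-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks the schema while the extract files stay in place on HDFS
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_zone.policy_dim (
        |  policy_id      STRING,
        |  effective_date STRING,
        |  premium_amount DOUBLE
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
        |STORED AS TEXTFILE
        |LOCATION 'hdfs:///data/raw_zone/policy_dim'""".stripMargin)

    // Delta load: copy only the newest extract rows forward into the staging layer
    spark.sql(
      """INSERT INTO TABLE staging.policy_dim
        |SELECT * FROM raw_zone.policy_dim
        |WHERE effective_date >= '2017-01-01'""".stripMargin)

    spark.stop()
  }
}
```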
Environment: IBM Big Insights 4.2, Spark 1.6, Scala, Big SQL, Hive, Java 1.8, Shell script, Pig, and Borland StarTeam 14.0
Hadoop Developer
Confidential, Georgetown, KY
Responsibilities:
- Provided an application demo to the client by designing and developing search engine, report/analysis trend, and application administration prototype screens using AngularJS and BootstrapJS
- Took ownership of the complete application design for the Java portion and the Hadoop integration
- Apart from the normal requirements gathering, participated in business meetings with the client to gather security requirements
- Assisted the architect in analyzing the existing and future systems; prepared design blueprints and application flow documentation
- Responsible for working with message broker systems such as Kafka; extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics
- Wrote an event-driven link tracking system to capture user events and feed them to Kafka, which pushes them into HBase
- Created MapReduce jobs to extract contents from HBase, configured in an Oozie workflow to generate analytical reports
- Developed JAX-RS web service code using the Apache CXF framework to fetch data from Solr when users performed document searches
- Participated in Solr schema design and ingested data into Solr for indexing
- Wrote MapReduce programs to organize the data and ingest it, in the client-specified format, in a form suitable for analytics
- Hands-on experience writing Python scripts to optimize performance; implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them; implemented Bloom filters in Cassandra during keyspace creation
- Wrote Cassandra CQL statements; good hands-on experience developing concurrent processing using Spark and Cassandra together
- Wrote Spark applications using Scala; hands-on experience creating RDDs, transformations, and actions while implementing Spark applications
- Good knowledge of creating DataFrames using Spark SQL; involved in loading data into the Cassandra NoSQL database (see the sketch after this list)
- Implemented record-level atomicity on writes using Cassandra; wrote Pig scripts to query and process the data sets and identify trend patterns by applying client-specific criteria, and configured Oozie workflows to run these jobs along with the MapReduce jobs
- Stored the derived analysis results in HBase and made them available for ingestion into Solr for indexing
- Integrated the Java search UI, Solr, and HDFS; involved in code deployments using Jenkins as the continuous integration tool
- Documented all the challenges and issues involved in dealing with the security system and implemented best practices
- Created project structures and configurations according to the project architecture and made them available to the junior developers to continue their work
- Handled the onsite coordinator role to deliver work to the offshore team; involved in code reviews and application lead support activities
- Trained the offshore resources to adopt the client's standards
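Illustrative sketch of loading a Spark SQL DataFrame into Cassandra, assuming the DataStax spark-cassandra-connector is on the classpath. The contact point, keyspace, table, and column names are hypothetical placeholders rather than values from the project.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: build a small DataFrame of cleansed events and append it to a Cassandra table
// through the spark-cassandra-connector. All names are hypothetical.
object SparkCassandraLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-cassandra-load-sketch")
      .config("spark.cassandra.connection.host", "cassandra-host")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the cleansed output of the Kafka/Storm pipeline
    val events = Seq(
      ("u1", "2016-05-01T10:00:00Z", "click"),
      ("u2", "2016-05-01T10:00:05Z", "view")
    ).toDF("user_id", "event_time", "event_type")

    // Append the rows into an existing Cassandra table
    events.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "user_events"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```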
Environment: Java, J2EE, Python, Cassandra, Spring 3.2, MVC, HTML5, CSS, AngularJS, BootstrapJS, RESTful services using the CXF web services framework, WAS 8.5, Spring Data, SOLR 5.2.1, Pig, Hive, Apache Avro, MapReduce, Sqoop, Zookeeper, SVN, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, YARN, Ambari
Hadoop Developer
Confidential, Detroit, MI
Responsibilities:
- Wrote multiple Java MapReduce jobs for data cleaning and preprocessing; experienced in defining job flows using Oozie
- Experienced in managing and reviewing Hadoop log files; loaded and transformed large sets of structured, semi-structured, and unstructured data
- Responsible for managing data coming from different sources and applications; supported MapReduce programs running on the cluster
- Loaded data from the UNIX file system into HDFS; involved in designing schemas, writing CQL statements, and loading data using Cassandra (see the sketch after this list)
- Good experience with CQL data manipulation commands and clauses; worked with CQL collections
- Good knowledge of and hands-on experience with connecting Hive to Tableau; created data marts
- Installed and configured Hive and wrote Hive UDFs; involved and experienced with DataStax
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs
- Developed MapReduce jobs to automate the transfer of data to and from HBase; assisted with the addition of Hadoop processing to the IT infrastructure
- Worked on (near) real-time search, creating indexes and making search results return much faster
- Used RESTful Java APIs and native APIs while working with Elasticsearch
- Used Flume to collect the web logs from the online ad servers and push them into HDFS
- Implemented and executed MapReduce jobs to process the log data from the ad servers
- Wrote efficient MapReduce code to aggregate the log data from the ad server
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Working knowledge of writing Pig Load and Store functions
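Illustrative sketch of the Cassandra schema and CQL data-manipulation work described above, issued from Scala through the DataStax Java driver (the 3.x-style Cluster/Session API is an assumption). The contact point, keyspace, table, and column names are hypothetical placeholders.

```scala
import com.datastax.driver.core.Cluster

// Sketch: create a keyspace and a table with a CQL collection column, then insert a row.
// All names and sample values are hypothetical.
object CqlSketch {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    session.execute(
      """CREATE KEYSPACE IF NOT EXISTS ad_logs
        |WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""".stripMargin)

    // set<text> is a CQL collection column, as mentioned in the bullets above
    session.execute(
      """CREATE TABLE IF NOT EXISTS ad_logs.impressions (
        |  ad_id text,
        |  event_time timestamp,
        |  page_tags set<text>,
        |  PRIMARY KEY (ad_id, event_time)
        |)""".stripMargin)

    session.execute(
      """INSERT INTO ad_logs.impressions (ad_id, event_time, page_tags)
        |VALUES ('ad-42', '2016-05-01 10:00:00+0000', {'sports', 'homepage'})""".stripMargin)

    cluster.close()
  }
}
```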
Environment: Hortonworks, MapReduce, HDFS, Hive, Pig, Flume, Oozie, Tableau, Cassandra, and Java 1.5
Hadoop Developer
Confidential, Detroit, MI
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts
- Responsible for managing data coming from different sources; involved in HDFS maintenance and the loading of structured and unstructured data
- Wrote MapReduce jobs using the Java API; managed and reviewed Hadoop log files
- Worked with Talend Big Data Studio for cleaning, transformation, and other ETL actions
- Responsible for loading large volumes of data into Talend on ad hoc demand; modified reports and Talend ETL jobs based on feedback from QA testers and users in the development and staging environments
- Pushed data into HDFS as delimited files; imported data from MySQL into HDFS on a regular basis using Sqoop
- Developed scripts and batch jobs to schedule various Hadoop programs; wrote Hive queries for data analysis to meet the business requirements
- Created Hive tables and worked with them using HiveQL; utilized Agile Scrum methodology to help manage and organize a team of four developers with regular code review sessions
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers
- Used JUnit for unit testing and Continuum for integration testing
Environment: Hadoop, Talend, MapReduce, HDFS, Hive, PIG, MySQL, Java (jdk1.6), and Junit
Sr. Java Developer / Hadoop Developer
Confidential, Chattanooga, TN
Responsibilities:
- Involved in coding of JSP pages for the presentation of data on the View layer in MVC architecture
- Involved in requirements gathering, analysis and development of the Insurance Portal application
- Used J2EE design patterns like Factory Methods, MVC, and Singleton Pattern that made modules and code more organized, flexible and readable for future upgrades
- Worked with JavaScript to perform client-side form validations; used Struts tag libraries as well as the Struts Tiles framework
- Used JDBC to access the database with the Oracle thin (Type 4) driver for application optimization and efficiency
- Used the Data Access Object pattern to make the application more flexible toward future and legacy databases; actively involved in tuning SQL queries for better performance
- Wrote generic functions to call Oracle stored procedures, triggers, and functions; used JUnit for testing the application on the test servers
- Provided support for system integration testing and user acceptance testing; used Oracle SQL Developer for writing SQL queries and procedures
- Resolved issues routed through trouble tickets from the production floor; participated in technical/functional reviews
- Involved in performance tuning of the application; used Log4j for extensible logging, debugging, and error tracing
- Discussed new developments and errors with the client and the project manager
- Involved in production support and maintenance; transferred data from MySQL to HDFS using Sqoop (see the sketch after this list)
- Wrote MapReduce jobs according to the analytical requirements; developed Java programs to clean and preprocess the huge datasets
- Responsible for creating Pig scripts and analyzing the large datasets; worked with different kinds of files such as text and XML data
- Developed UDFs for Pig scripts; reported the fetched results to the BI department
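The Sqoop transfer mentioned above is a command-line job; as a Scala illustration of the same MySQL-to-HDFS movement, a Spark JDBC read can be sketched as follows. Spark was not part of this project's stack, and the URL, credentials, table, and output path are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pull a MySQL table over JDBC and land it on HDFS for downstream Pig/MapReduce analysis.
// Requires the MySQL JDBC driver on the classpath; all connection details are hypothetical.
object MySqlToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mysql-to-hdfs-sketch").getOrCreate()

    // Read the source table over JDBC
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", "etl_password")
      .load()

    // Write the rows to HDFS as Parquet
    orders.write.mode("overwrite").parquet("hdfs:///data/landing/orders")

    spark.stop()
  }
}
```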
Environment: JDK, J2EE, UML, Servlets, JSP, JDBC, Struts, XHTML, JavaScript, MVC, XML, XML Schema, Tomcat, Eclipse, CDH, Hadoop, HDFS, Pig, MySQL, and MapReduce
Java Developer
Confidential, Pittsburgh, PA
Responsibilities:
- Involved in requirements gathering and analysis for the project; designed the functional specifications and architecture of the web-based module using Java technologies
- Created Design specification using UML Class Diagrams, Sequence & Activity Diagrams
- Developed the Web Application using MVC Architecture, Java, JSP, and Servlets & Oracle Database
- Developed various Java classes, SQL queries and procedures to retrieve and manipulate the data from backend Oracle database using JDBC.
- Worked extensively with JavaScript for front-end validations; analyzed business requirements and developed the system architecture document for the enhancement project
- Designed and developed applications on Service Oriented Architecture (SOA)
- Created UML (Use cases, Class diagrams, Activity diagrams, Component diagrams, etc.) using Visio
- Provided impact analysis and test cases; delivered the code within the timeline and logged bugs/fixes in the TechOnline tracking system
- Developed unit and functional test cases for testing the web application
- Used Spring (MVC) architecture to implement the application using the concrete principles laid down by several design patterns such as Composite View, Session Facade, Business Delegate, Bean Factory, Singleton, Data Access Object and Service Locator
- Involved in the integration of Spring for implementing dependency injection; developed code for obtaining bean references in the Spring IoC framework
- Focused primarily on the MVC components such as Dispatcher Servlets, Controllers, Model and View Objects, View Resolver
- Created the Hibernate POJOs and utilized Hibernate annotations; used Hibernate as the object/relational mapping (ORM) solution for mapping data
Environment: Windows NT 2000/2003, XP, and Windows 7/ 8, Java, UNIX, SQL, SOA, JDBC, JavaScript, Maven, JUnit, Agile/Scrum Methodology and SVN Version Control
JAVA Developer
Confidential
Responsibilities:
- Developed web components using JSP, Servlets, and JDBC; designed tables and indexes
- Designed, Implemented, Tested and Deployed Enterprise Java Beans both Session and Entity using WebLogic as Application Server
- Developed stored procedures, packages and database triggers to enforce data integrity. Performed data analysis and created crystal reports for user requirements
- Implemented the presentation layer with HTML, XHTML, and JavaScript; used EJBs to develop business logic and coded reusable components as JavaBeans
- Developed database interaction code against the JDBC API, making extensive use of SQL query statements and prepared statements
- Used connection pooling via the JDBC interface for optimization
- Used EJB entity and session beans to implement business logic, session handling, and transactions; developed the user interface using JSP, Servlets, and JavaScript
- Wrote complex SQL queries and stored procedures; actively involved in system testing
- Prepared the Installation, Customer guide and Configuration document which were delivered to the customer along with the product
- Responsible for creating work model using HTML and JavaScript to understand the flow of the web application and created class diagrams.
- Participated in the daily stand up SCRUM agile meetings as part of AGILE process for reporting the day to day developments of the work done
- Design and develop user interfaces using HTML, JSP.
- Used J2EE to develop the application based on the MVC architecture; created an interactive front-end GUI using JavaScript, jQuery, DHTML, and Ajax; used SAX and DOM XML parsers for data retrieval
Environment: Windows NT 2000/2003, XP, Windows 7/8, C, Java, JSP, Servlets, JDBC, EJB, DOM, XML, SAX
Java Developer
Confidential
Responsibilities:
- Successfully completed the architecture, detailed design, and development of modules; interacted with end users to gather, analyze, and implement the project requirements
- Developed applications that enable the public to review the Inventory Management
- Established schedule and resource requirements by planning, analyzing and documenting development effort to include time lines, risks, test requirements and performance targets
- Analyzed system requirements and prepared the system design document; developed a dynamic user interface with DHTML and JavaScript using JSP and Servlet technology
- Designed and developed a sub system where Java Messaging Service (JMS) applications are developed to communicate with MQ in data exchange between different systems
- Used the Java Message Oriented Middleware (MOM) API for sending messages between clients; used JMS elements for sending and receiving messages
- Used Hibernate for mapping Java classes to database tables; wrote PL/SQL and SQL in the Oracle database for creating tables, indexes, triggers, and query statements
- Designed and developed enterprise web applications for the internal production support group using Java (J2EE), design patterns, and the Struts framework
- Performed tuning and index creation for improved performance; designed and developed database schemas for new applications
- Created a connection pooling method to avoid waiting for database connections; designed ER diagrams for all the databases using DB Designer, an open source tool
- Designed the class diagrams and use case diagrams using the open source tool; created and executed test plans using Quality Center (Test Director)
- Mapped requirements to the test cases in Quality Center; supported system testing and user acceptance testing
- Supported internal users and support personnel by drafting and reviewing company product documentation, such as technical documents and impact assessment documents
Environment: Rational Application Developer 6.0, Rational Rose, Java, J2EE, JDBC, EJB, JSP, EL, JSTL, JUnit, XML, SOAP, WSDL, SOA.