Hadoop Developer Resume
Columbus, Ohio
PROFESSIONAL SUMMARY:
- 8 years of IT experience with multinational clients across a variety of industries, including expertise in developing Big Data/Hadoop/Java applications.
- Followed the Software Development Lifecycle (SDLC) for proper delivery and implementation of software.
- Completed multiple cycles of software implementation.
- Good knowledge of data modeling, use case design, and object-oriented concepts.
- Developed interactive GUI applications using Spring, AngularJS, Node.js, and JavaScript.
- Experience with web application development, deployment, and low-level design using Java, J2EE (Struts, JSP, Servlets, Spring), RESTful web services, and MVC architecture.
- Experienced in integrating data from various sources such as Java applications, RDBMS, shell scripts, spreadsheets, text files, XML, and CSV.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
- Proficient in MapReduce and Big Data analysis tasks using Java programming (see the illustrative sketch at the end of this summary).
- Well versed in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
- Excellent understanding and extensive knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Deep knowledge of Hadoop architecture and components (MapReduce, HDFS, Hive, Sqoop, Pig, YARN, Spark, Scala, Kafka, Flume, Oozie, and ZooKeeper) on Cloudera distributions, as well as NoSQL platforms (HBase, Cassandra, and MongoDB).
- Experience in data analysis, data validation, data verification, data cleansing, data completeness, and identifying data mismatches.
- Involved in maintenance and migration of data using Sqoop.
- Hands-on experience creating user-defined aggregate functions (UDAFs) and user-defined table functions (UDTFs) for researchers.
- Good exposure to MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Provided cluster coordination services through ZooKeeper.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
- Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Cloudera distributions.
- Expertise in developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
- Well experienced in writing data transformations and data cleansing using Pig operations, and implementing join operations in Pig Latin.
- Excellent understanding and knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experienced in developing MapReduce programs in Python for working with Big Data.
- Generated analytical reports using Impala and Hue.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
- Hands-on experience implementing various machine learning algorithms using Spark.
- Worked as a part of Data Science Team in one of the projects.
- Experience debugging Pig and Hive scripts, and optimizing and debugging MapReduce jobs using the JUnit framework.
- Worked on building a massively scalable real-time streaming platform using Spark, Scala, Spark Streaming, and Kafka.
- Expertise in working with databases such as Oracle, MS SQL Server, PostgreSQL, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a relational database.
- Experience working with web/application servers such as JBoss, Apache Tomcat, IBM WebSphere, and WebLogic.
- Adequate knowledge of Agile and Waterfall methodologies.
- Extensive experience in data analysis using shell scripting and UNIX tools.
- Experienced in developing complex SQL queries and PL/SQL packages, stored procedures, functions, and triggers.
- Experience with Amazon Web Services (AWS) and the SOAP and REST protocols.
- Familiarity and experience with data warehousing and ETL tools.
- Good understanding of Scrum methodologies, Test-Driven Development (TDD), and continuous integration.
- Excellent knowledge of SQL technologies.
- Worked on an enterprise-level application development and testing team with a primary focus on functional and regression testing.
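Illustrative MapReduce sketch (Java). A minimal, hypothetical cleaning-and-counting job of the kind referenced in this summary; the class names, the comma-delimited input assumption, and the argument paths are placeholders, not taken from any of the projects below.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCount {

  // Mapper: skips blank/malformed lines and emits (first field, 1)
  public static class CleanMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split(",");
      if (fields.length == 0 || fields[0].trim().isEmpty()) {
        return; // drop malformed records during cleaning
      }
      outKey.set(fields[0].trim());
      context.write(outKey, ONE);
    }
  }

  // Reducer: sums the counts per key
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "record-count");
    job.setJarByClass(RecordCount.class);
    job.setMapperClass(CleanMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}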
TECHNICAL SKILLS:
Languages: Python, Core Java, Visual Basic, C++, ASP.NET, HTML, JavaScript, CSS, CGI, CLISP.
Hadoop/Big Data: Hadoop, Hive, Sqoop, Pig, Puppet, Spark, Kafka, Storm, HBase, MongoDB, Cassandra, PowerPivot, Datameer, Pentaho, Flume
Databases: MySQL, MS SQL Server, DB2.
Applications: MS Office, MS Project, Visual Studio, NetBeans IDE, dotCMS, MS APS.
UML Diagramming: Visual Paradigm, SmartDraw, Enterprise Architect, MS Visio, UMLet
Analysis Software: Weka, JFS, neural network software.
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, Ohio
Hadoop Developer
Responsibilities:
- Worked with the Data Science team to gather requirements for various data mining projects.
- Developed multiple MapReduce jobs in Python for data cleaning and pre-processing.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Developed simple to complex MapReduce jobs in Python, along with Hive and Pig.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Involved in running Hadoop jobs to process millions of records of text data.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Worked on creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume (see the streaming sketch at the end of this section).
- Involved in loading data from the Linux file system to HDFS.
- Responsible for managing data from multiple sources.
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
- Extracted the list of areas that use the service most extensively, using Spark/Scala.
- Experienced in running Hadoop streaming jobs to process terabytes of XML, JSON, CSV, and TSV data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Cassandra, Kafka, Linux.
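Streaming sketch (Java). A minimal illustration of the Spark Streaming + Kafka pipeline mentioned above; the original work reportedly used Scala, and the broker address, topic name, and group id below are placeholders.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class StreamingSketch {
  public static void main(String[] args) throws InterruptedException {
    // Placeholder Kafka consumer settings
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker1:9092");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "streaming-sketch");

    SparkConf conf = new SparkConf().setAppName("StreamingSketch");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    // Subscribe to a placeholder topic and pull records as a DStream
    JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

    // Keep only non-empty payloads and print a sample of each micro-batch
    stream.map(ConsumerRecord::value)
          .filter(v -> v != null && !v.isEmpty())
          .print();

    jssc.start();
    jssc.awaitTermination();
  }
}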
Confidential, Washington, DC
Hadoop Developer
Responsibilities:
- Manage the full project lifecycle, from initiation through implementation, including requirements gathering & prioritization, defining scope & schedule, obtaining approvals, development, testing tasks and troubleshooting.
- Retrieved data and analyzed it using Spark/Scala.
- Coordinate the design, implementation, and testing of the data distribution framework.
- Communicate and coordinate across cross-functional projects.
- Build a data lake using HDFS, Flume, Sqoop, Hive, Pig, and HBase from JSON data using Python (see the HDFS ingestion sketch at the end of this section).
- Manage the entire Hadoop Distributed File System (HDFS).
- Implement new Hadoop hardware infrastructure, OS integration, and application installation.
- Responsible for setting up the Hadoop cluster, configuration, performance tuning, and Hadoop cluster maintenance.
- Monitor the Hadoop cluster, troubleshoot problems, and perform file system management and monitoring.
- Troubleshoot defects and delays, and build tests to prevent recurrence of failures.
- Replaced MapReduce with GridGain.
- Import data into Hadoop from Oracle and SQL Server using Sqoop.
- Experience in processing data using Pig Latin and HiveQL.
- Execute system and disaster recovery processes as required.
- Implement and maintain security as designed by the Hadoop Architects.
- Experience in building infrastructure for Hadoop analytics environments, including capacity management and forecasting.
- Work with the Hadoop production support team to implement new business initiatives as they relate to Hadoop.
- Evolve the Data Platform architecture, data, compute, and access paradigms to meet and exceed business user expectations.
- Support deliverables for SLAs and uptime.
- Support and advise business users on use of the Data Platform to solve business/analytical problems.
- Solid understanding of all phases of development using multiple methodologies, i.e., Waterfall and Agile.
Environment: Hadoop, Hive, Pig, HBase, Sqoop, Flume, ZooKeeper, HDFS.
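HDFS ingestion sketch (Java). The data-lake loading above was reportedly done with Python, Flume, and Sqoop; the following is a minimal, hypothetical illustration of landing a raw JSON file in HDFS with the Java FileSystem API, with placeholder paths.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngestSketch {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from the cluster's core-site.xml on the classpath
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path localJson = new Path("/tmp/events.json");       // placeholder local file
    Path rawZone = new Path("/data/lake/raw/events/");   // placeholder raw zone directory

    if (!fs.exists(rawZone)) {
      fs.mkdirs(rawZone);
    }
    // Copy the raw JSON into the landing area; Hive/Pig jobs would read from here
    fs.copyFromLocalFile(localJson, rawZone);
    fs.close();
  }
}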
Confidential, Richland, WA
Big Data Analyst
Responsibilities:
- Understanding and collecting database requirements.
- Tracking progress through stand-up meetings and other Scrum meetings.
- Migrating MS SQL Server data to Hadoop using Sqoop.
- Deploying the MS Analytics Platform System (APS).
- Experience working with the MS Analytics Platform System (APS).
- Maintaining sensitive healthcare records in a Parallel Data Warehouse (PDW) processing environment.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Extracted TSV and JSON data from MS APS for researchers.
- Querying across relational data stored in Parallel Data Warehouse (PDW) and non-relational Hadoop data that is stored in the Hadoop Distributed File System (HDFS), for researchers.
- Integration and joining of PDW and Hadoop data (PolyBase technology).
- Used PowerPivot for ease in generating data reports.
- Cleaning Hadoop data at the backend.
- Customizing functionality using UDAFs and UDTFs for specific researchers (see the UDTF sketch at the end of this section).
- Used Flume to collect and store data from different departments.
Environment: Hadoop, HDFS, MS Analytics Platform System (APS), Sqoop, Parallel Data Warehouse (PDW), PowerPivot, Flume.
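UDTF sketch (Java). The researcher-specific functions themselves are not described above, so this is a generic, hypothetical Hive UDTF that explodes a comma-delimited column into one row per value; after building the jar it would be registered with CREATE TEMPORARY FUNCTION.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Explodes a comma-delimited string column into one output row per value.
public class ExplodeDelimitedUDTF extends GenericUDTF {

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    List<String> fieldNames = new ArrayList<>();
    List<ObjectInspector> fieldOIs = new ArrayList<>();
    fieldNames.add("value");
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
  }

  @Override
  public void process(Object[] args) throws HiveException {
    if (args[0] == null) {
      return;
    }
    for (String token : args[0].toString().split(",")) {
      forward(new Object[] { token.trim() });   // one output row per token
    }
  }

  @Override
  public void close() throws HiveException {
    // nothing to clean up
  }
}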
Confidential, Bryan, TX
Jr. Big Data Developer
Responsibilities:
- Worked on a customized software build process.
- Involved in developing the application back end using Node.js and Servlets.
- Handled the application's back-end database using the Hadoop framework.
- Worked on analyzing customer choices based on previously selected values.
- Installed, configured, and maintained Cloudera Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Handled job scheduling using YARN.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Provided cluster coordination services through ZooKeeper.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in defining job flows, managing and reviewing log files.
- Created Resilient Distributed Datasets (RDDs) and gathered data on the driver (see the Spark sketch at the end of this section).
- Generated meaningful analytical reports using Hue and Impala.
- Experience with NoSQL databases (Cassandra and MongoDB).
- Followed Agile methodology for improved performance, with both inter-team and intra-team cooperation.
- Applied sampling in performance profiling.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, NoSQL (Cassandra and MongoDB), SQL, J2EE, JUnit, Cloudera VMWare.
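Spark RDD sketch (Java). A minimal illustration of building an RDD, cleaning it, and gathering a small sample on the driver; the HDFS path and sample size are placeholders.

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("RddSketch");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Build an RDD from a placeholder HDFS path
    JavaRDD<String> lines = sc.textFile("hdfs:///data/customer/choices.csv");

    // Basic cleaning: drop empty lines before analysis
    JavaRDD<String> cleaned = lines.filter(line -> !line.trim().isEmpty());

    // Gather a small sample on the driver rather than collecting everything
    List<String> sample = cleaned.take(10);
    for (String record : sample) {
      System.out.println(record);
    }

    sc.stop();
  }
}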
Confidential, St. Louis, MO
Java Developer
Responsibilities:
- Developed the application under J2EE architecture; designed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
- Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
- Developed the application server persistence layer using JDBC, SQL, and Hibernate.
- Used JDBC to connect the web applications to databases.
- Implemented test-first unit testing using the JUnit framework.
- Developed and utilized J2EE services and JMS components for messaging communication on the web.
- Uploaded and downloaded programs from GitHub.
- Configured the development environment using the WebLogic application server for developers' integration testing.
- Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
- Defined the search criteria and pulled the customer's record from the database, made the required changes, and saved the updated record back to the database.
- Validated the fields of user registration screen and login screen by writing JavaScript validations.
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
- Used DAO and JDBC for database access (see the DAO sketch at the end of this section).
- Developed stored procedures and triggers using PL/SQL to calculate and update tables in order to implement business logic.
- Generated reports using Report Definition Language (RDL) for SSRS.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post production support and maintenance of the application.
- Set up the distributed environment and deployed the application on a distributed system.
- Prediction through logistic regression, Naive Bayes, and decision trees.
Environment: Java 1.4, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit.
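DAO sketch (Java). A minimal, hypothetical JDBC DAO of the kind referenced above, with a placeholder table and columns; it uses try-with-resources for brevity even though this project listed Java 1.4.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal DAO: looks up a customer name by id and writes an updated name back.
public class CustomerDao {
  private final String url;
  private final String user;
  private final String password;

  public CustomerDao(String url, String user, String password) {
    this.url = url;
    this.user = user;
    this.password = password;
  }

  public String findNameById(long id) throws SQLException {
    String sql = "SELECT name FROM customer WHERE id = ?";
    try (Connection con = DriverManager.getConnection(url, user, password);
         PreparedStatement ps = con.prepareStatement(sql)) {
      ps.setLong(1, id);
      try (ResultSet rs = ps.executeQuery()) {
        return rs.next() ? rs.getString("name") : null;
      }
    }
  }

  public int updateName(long id, String newName) throws SQLException {
    String sql = "UPDATE customer SET name = ? WHERE id = ?";
    try (Connection con = DriverManager.getConnection(url, user, password);
         PreparedStatement ps = con.prepareStatement(sql)) {
      ps.setString(1, newName);
      ps.setLong(2, id);
      return ps.executeUpdate();   // rows affected
    }
  }
}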
Confidential
JAVA Developer
Responsibilities:
- Contributed actively to stand-up meetings, reviews, release planning, demos, and other Scrum meetings.
- Involved in data modeling and use case design.
- Involved in development of sales management software application using Eclipse IDE.
- Applied jQuery/JavaScript for a responsive GUI.
- Made use of the Spring framework.
- Made extensive use of HTML features and transferred data between pages using JavaScript.
- Experience using PHP and CGI.
- Worked on Amazon Web Services (AWS).
- Implemented plug-ins for user convenience.
- Extensively used the Struts framework as the controller to handle client requests and invoke the model based upon user requests (see the action sketch at the end of this section).
- Maintained and updated data on MS SQL Server using SQL Server Management Studio.
- Defined the search criteria and pulled the customer's record from the database, made the required changes, and saved the updated record back to the database.
- Used DAO and JDBC for database access.
Environment: Java, HTML, jQuery, PHP, CGI, JavaScript, MS SQL Server, JDBC, Eclipse IDE.
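Struts action sketch (Java). A minimal, hypothetical Struts 1-style controller action for the search flow described above; the request parameter name and the forward names are placeholders that would be defined in struts-config.xml.

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Controller action: reads the search criterion from the request, stashes it
// for the view, and forwards based on the outcome.
public class CustomerSearchAction extends Action {

  @Override
  public ActionForward execute(ActionMapping mapping, ActionForm form,
                               HttpServletRequest request, HttpServletResponse response)
      throws Exception {
    String customerId = request.getParameter("customerId");   // placeholder parameter name

    if (customerId == null || customerId.trim().length() == 0) {
      return mapping.findForward("failure");   // forward name from struts-config.xml
    }

    // A real action would call a DAO here (see the JDBC sketch in the previous
    // section) and place the resulting record in request scope for the JSP.
    request.setAttribute("customerId", customerId.trim());
    return mapping.findForward("success");
  }
}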