Hadoop Developer Resume
San Jose, CA
SUMMARY
- 8+ years of professional IT experience in analysis, development, integration, and maintenance of web-based and client/server applications using Java and Big Data technologies.
- 5+ years of experience in Hadoop development and analysis, working with technologies such as Hive, Pig, Java MapReduce, UNIX, and HDFS.
- Strong experience working with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Yarn, Oozie and HBase.
- 2+ years of experience in development, Linux administration, and implementation and maintenance of web servers and distributed enterprise applications.
- Experience in all phases of software development life cycle (SDLC), which includes User Interaction, Business Analysis/Modelling, Design/Architecture, Development, Implementation, Integration, Documentation, Testing, and Deployment.
- Experience in analyzing business requirements and creating Hive or Pig scripts to process and aggregate data.
- Good understanding of real-time data processing using Spark.
- Involved in preparation of Test Plans, Test Cases & Test Scripts based on business requirements, rules, data mapping requirements and system specifications.
- Ingested data using Sqoop from various RDBMSs such as Oracle, MySQL, and Microsoft SQL Server into Hadoop HDFS.
- Experience in implementing open-source frameworks such as Spring, Hibernate, and Web Services.
- Troubleshot configuration issues of Hadoop environments in development and operations.
- Experience in Continuous Integration and Continuous Deployment using tools such as Jenkins.
- Experience in processing streaming data into clusters through Kafka and Spark Streaming (a brief sketch follows this summary).
- Experience with databases such as PostgreSQL and MySQL Server, including cluster setup and writing SQL queries, triggers, and stored procedures.
- Experience in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- Experienced with Spark in improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Proficient in working with NoSQL databases such as MongoDB, Cassandra, and HBase (column-family store).
- Good knowledge of Hadoop MRv1 and MRv2 (YARN) architectures.
- Communicated with diverse client communities, offshore and onshore, with a dedication to client satisfaction and quality outcomes; extensive experience coordinating offshore development activities.
- Highly organized and dedicated, with good time management and organizational skills and the ability to handle multiple tasks with a positive attitude.
- A team player with good interpersonal, communication and leadership skills.
- Easily adaptable to working conditions, consistently delivering quality work, and capable of adopting new technologies and facing new challenges.
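As a brief illustration of the Kafka and Spark Streaming experience noted above, the following is a minimal PySpark Structured Streaming sketch; the broker address, topic name, schema, and HDFS paths are hypothetical placeholders rather than values from any specific engagement.

```python
# Minimal PySpark Structured Streaming sketch: consume JSON events from a
# Kafka topic and write the parsed records to HDFS as Parquet.
# Requires the spark-sql-kafka connector on the classpath; broker, topic,
# schema, and paths below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", LongType()),
    StructField("payload", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
       .option("subscribe", "events")                       # hypothetical topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")             # hypothetical path
         .option("checkpointLocation", "hdfs:///chk/events")
         .start())
query.awaitTermination()
```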
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, Teradata, Map Reduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Oozie, Storm, Scala, Kafka and Flume.
Programming Languages: Java (J2SE, J2EE), C, C#, PL/SQL, Swift, SQL+, ASP.NET, JDBC, Python.
Web Development: JavaScript, jQuery, HTML 5.0, CSS 3.0, AJAX, JSON
Development Tools: Net Beans 8.0.2, Visual Studio 2013, Eclipse Neon, Android Studio, SQL developer
Testing Tools: JUnit, HP Unified Functional Testing, HP Performance Center, Selenium, WinRunner, LoadRunner, QTP
UNIX Tools: Apache, Yum, RPM
Operating Systems: Windows, Linux, Ubuntu, Mac OS, Red Hat Linux
Protocols: TCP/IP, HTTP and HTTPS
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari
Methodologies: Agile, V-model, Waterfall model
Databases: HBase, MongoDB, Cassandra, Oracle 10g, MySQL, Couch, MS SQL server
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- The GVS-CS project comprises multiple teams; worked on the Data Engineering team.
- The team's focus is ingesting data from different vendors and processing that data according to business rules.
- After processing, the data is delivered to the Eloqua tool.
- Involved in the Hadoop security architecture work that added different users to the same YARN queue in the development and production clusters.
- After adding the users, validated sample jobs to confirm that the new users were allocated to the same YARN queue in the respective clusters.
- Also involved in the security architecture for the Google Cloud Platform, which is in the process of being rolled out to the Google Cloud projects.
- The security architecture for the Google platform is essentially two-step verification for anyone accessing the cloud projects.
- Used Spark SQL and Hive to validate large data sets against the business rules (a brief sketch follows this list).
- Involved in discussions on automating the Hadoop data pipelines on our Hadoop platform.
- For data pipeline automation, implementing Jenkins to trigger builds automatically on Git pushes.
- A number of offers go live weekly and monthly based on client requirements.
- Involved in cleaning up the database as required, including Hive tables and Python scripts.
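A minimal sketch of the Spark SQL/Hive validation pattern referenced above; the database, table, column names, and rule thresholds are hypothetical examples rather than the actual GVS-CS business rules.

```python
# Sketch: validate a Hive data set against simple business rules with Spark SQL.
# Database, table, and thresholds are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("offer-validation-sketch")
         .enableHiveSupport()
         .getOrCreate())

offers = spark.table("vendor_stage.offers")   # hypothetical Hive table

# Business-rule checks: required keys present, discounts within range,
# no duplicate offer IDs.
null_keys   = offers.filter("offer_id IS NULL OR vendor_id IS NULL").count()
bad_amounts = offers.filter("discount_pct < 0 OR discount_pct > 100").count()
dupes       = offers.groupBy("offer_id").count().filter(col("count") > 1).count()

if null_keys or bad_amounts or dupes:
    raise SystemExit(
        f"validation failed: nulls={null_keys}, bad_amounts={bad_amounts}, dupes={dupes}")

# Only validated records move on to the downstream (Eloqua) feed.
offers.write.mode("overwrite").saveAsTable("curated.offers_validated")
```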
Environment: Hadoop, HDFS, Hive, Python, Spark, SQL, Jenkins, UNIX Shell Scripting, Big Data, Map Reduce, Git, Eloqua.
Confidential, Plano, TX
Hadoop Developer
Responsibilities:
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie for building data pipelines.
- Provided cluster coordination services through ZooKeeper.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables, using Oozie workflows.
- Experienced in managing and reviewing Hadoop log files.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Developed Oozie workflows for scheduling and orchestrating the ETL process. Designed and implemented Java MapReduce programs to support distributed data processing.
- Worked with highly unstructured and semi-structured data of 30TB in size (90TB with replication factor of 3).
- Contributed towards developing a Data Pipeline to load data from different sources like Web, RDBMS, and NoSQL to Apache Kafka or Spark cluster.
- Migrated data from Spark RDDs into HDFS and NoSQL stores such as Cassandra and HBase.
- Implemented Pig Latin scripts to handle data preprocessing and normalization.
- Worked on reading multiple data formats on HDFS using PySpark.
- Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Developed MapReduce programs using Java.
- Worked extensively on the Spark Core and Spark SQL modules.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
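A minimal sketch of the Hive partitioning and bucketing pattern referenced above, issued as HiveQL through a PySpark session; database, table, column, and location names are hypothetical.

```python
# Sketch: external table over raw HDFS files plus a managed, partitioned,
# bucketed table for optimized queries. Names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over raw files landed in HDFS (schema-on-read).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw.events (
        event_id STRING,
        user_id  STRING,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/raw/events'
""")

# Managed table, partitioned by date and bucketed by user for join performance.
spark.sql("""
    CREATE TABLE IF NOT EXISTS curated.events (
        event_id STRING,
        user_id  STRING,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```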
Environment: Hadoop, HDFS, Hive, Python, Scala, Spark, SQL, Teradata, UNIX Shell Scripting, Big Data, Map Reduce, Sqoop, Oozie, Pig, Flume, LINUX, Java, Eclipse.
Confidential, South Portland, ME
Hadoop Developer
Responsibilities:
- Worked in Multi Clustered Hadoop Eco-System environment.
- Created MapReduce programs using the Java API that filter unnecessary records and identify unique records based on different criteria.
- Used Python's unittest library for testing many Python programs and blocks of code.
- Parsed JSON and XML data using Python (see the sketch after this list).
- Rewrote an existing Java application as a Python module to deliver data in specific formats.
- Loaded and transformed large sets of unstructured data from UNIX systems into HDFS.
- Used Apache Sqoop to load user data into HDFS on a weekly basis.
- Created production jobs using Oozie workflows that integrated different actions such as MapReduce, Sqoop, and Hive.
- Involved in importing real-time data into Hadoop using Kafka and implemented a daily Oozie job.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Experienced in transferring data from different data sources into HDFS using Kafka producers.
- Prepared an ETL pipeline with the help of Sqoop for downstream consumption.
- Wrote Pig scripts to analyze Hadoop logs.
- Created tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Troubleshot and debugged Hadoop ecosystem run-time issues.
- Participated in all phases of the SDLC, including requirement gathering, analysis, estimation, design, coding, testing, and documentation.
- Developed a SOAP web service as a publisher/producer.
- Developed various GUI screens as JSPs using HTML, JavaScript, and CSS.
- Designed the user interface of the application using AngularJS, Bootstrap, HTML5, CSS3, and JavaScript.
- Designed and developed the front-end graphical user interface with JSP, HTML5, CSS3, JavaScript, and jQuery.
- Developed entire front-end and back-end modules using Python on the Django web framework.
- Developed tools using Python, shell scripting, XML, and Big Data technologies to automate some of the routine tasks.
- Performed Single Point of Technical Contact for different application teams and DEV, QA, Line Managers.
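A minimal sketch of the Python JSON/XML parsing referenced above, using only the standard library; file names and field names are hypothetical.

```python
# Sketch: parse line-delimited JSON and an XML document with the Python
# standard library. File and field names are hypothetical examples.
import json
import xml.etree.ElementTree as ET

def parse_json_records(path):
    """Read a file of one JSON object per line and yield (id, status) pairs."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            yield record.get("id"), record.get("status")

def parse_xml_users(path):
    """Extract (id, email) pairs from <user> elements in an XML document."""
    tree = ET.parse(path)
    for user in tree.getroot().iter("user"):
        yield user.get("id"), user.findtext("email")

if __name__ == "__main__":
    for rec in parse_json_records("sample_records.json"):
        print(rec)
    for user in parse_xml_users("sample_users.xml"):
        print(user)
```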
Environment: Hadoop MapReduce, Hive, HDFS, Java, CSV files, Python, Django, AWS, XML, Shell Scripting, MySQL, HTML, XHTML, Jenkins, Linux.
Confidential, Boston, MA
Hadoop Data Analyst
Responsibilities:
- Used Hive queries and Pig scripts to analyze data.
- Used Hive partitioning and bucketing of data from different kinds of sources to improve performance.
- Followed Agile (Scrum) methodology during development of the project and tracked the software development through daily stand-ups.
- Used Oozie to automate the flow of jobs and ZooKeeper for coordination.
- Used Flume to ingest unstructured and semi-structured data.
- Used Sqoop to ingest structured data.
- Wrote shell scripts run as cron jobs to automate the data migration process from external servers and FTP sites.
- Prepared an ETL pipeline with the help of Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
- Used Tableau for visualization and generate reports for financial data consolidation, reconciliation and segmentation.
- Involved in loading data from UNIX file system to HDFS.
- Created partitioned tables in Hive.
- Developed MapReduce programs using Java.
- Developed various Hive UDFs to add functionality to Hive scripts.
- Implemented Kafka messaging services to stream large volumes of data and insert them into the database (a brief sketch follows this list).
- Analyzed large data sets by writing Pig scripts.
- Developed MapReduce programs over the files generated by Hive query processing to produce key-value pairs and load the data into the NoSQL database HBase.
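A minimal sketch of the Kafka producer pattern referenced above, assuming the kafka-python client; the broker address, topic, and record layout are hypothetical.

```python
# Sketch: publish JSON records to a Kafka topic for downstream consumers to
# load into the database. Uses the kafka-python client (pip install kafka-python);
# broker, topic, and record fields are hypothetical examples.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",                      # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",
)

def publish_transactions(records, topic="transactions"):   # hypothetical topic
    """Send each record to Kafka; downstream consumers insert them into the DB."""
    for record in records:
        producer.send(topic, value=record)
    producer.flush()

if __name__ == "__main__":
    publish_transactions([
        {"txn_id": "t-001", "amount": 120.50, "currency": "USD"},
        {"txn_id": "t-002", "amount": 75.00, "currency": "USD"},
    ])
```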
Environment: HDFS, Hive, MapReduce, Java, NoSQL, Unix, Linux, Jenkins, shell scripting, MySQL, Spreadsheet.
Confidential, Fort Worth, TX
Data Analyst
Responsibilities:
- Communicated effectively in both a verbal and written manner to client and offshore team.
- Completed documentation on all assigned systems and databases, including business rules, logic, and processes.
- Created Test data and Test Cases documentation for regression and performance.
- Designed, built and implemented relational databases.
- Determined changes in physical database by studying project requirements.
- Developed intermediate business knowledge of the functional area and its processes to understand how data supports the business function.
- Facilitated gathering of moderately complex business requirements by defining the business problem.
- Facilitated teh monthly Opportunities for Improvement (OFI) meeting.
- Identified Opportunities for Improvement (OFI) and recommended and implemented, as applicable, process improvement plans in collaboration with the relevant departments.
- Identified and addressed outliers in an efficient and professional manner following a predetermined protocol.
- Identified data requirements and isolated data elements.
- Leveraged a basic understanding of multiple data structures and sources.
- Maintained and assisted in the development of moderately complex business solutions, including data, reporting, and business intelligence/analytics.
- Maintained data dictionary by revising and entering definitions.
- Maintained direct, timely and appropriate communication with clients.
- Supported data governance, integrity, quality and audit functions.
- Supported teh implementation of technical data solutions and standards.
- Utilized and prepared analysis reports summarizing Opportunities for Improvements (OFIs).
- Worked closely with other members of the database group.
Environment: Linux, Unix, Java, spreadsheet, QlikView, SQL, Excel, shell scripting, MySQL.
Confidential
Java Developer
Responsibilities:
- Used Eclipse as an IDE for development of teh application.
- Developed Application in Jakarta Struts Framework using MVC architecture.
- Implemented J2EE design patterns Session Facade pattern, Singleton Pattern.
- Created Action Forms and Action classes for the modules.
- Customized all the JSP pages with a consistent look and feel using Tiles and CSS.
- Developed JSPs to validate information automatically using Ajax.
- Created struts-config.xml and tiles-def.xml files.
- Involved in coding the presentation layer using Apache Struts, XML, and JavaScript.
- Used XSLT for UI to display XML Data.
- Utilized JavaScript for client-side validation. Participated in designing the user interface for the application using HTML and connected it to the database using JDBC.
- Created web pages based on the requirements and styled them using CSS.
- Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used Servlets for handling the business logic.
- Developed teh Form Beans and Data Access Layer classes.
- Involved in writing complex sub-queries and used Oracle for generating on-screen reports.
- Worked on database interaction layer for insertions, updating and retrieval operations on data.
- Involved in deploying the application in the test environment using Apache Tomcat.
Environment: JSP, Core Java, Servlets, Struts, UML, AJAX, SQL, JUNIT, JavaScript, Eclipse, JIRA, HTML, CSS.