Hadoop Developer Resume
Bentonville, AR
SUMMARY:
- 8 years of total IT experience which includes Java Development, Web application Development, Database Management and Big Data ecosystem technologies
- 3+ years of experience developing applications with Big Data technologies such as Hadoop, Spark, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Cassandra, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Hands-on experience with Hadoop applications (administration, configuration management, monitoring, debugging and performance tuning)
- Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed data processing and process large data sets utilizing the Hadoop cluster.
- Worked on a live 60-node Hadoop cluster running Cloudera CDH4
- Worked with 90 TB of highly unstructured and semi-structured data (270 TB with a replication factor of 3)
- Commissioned and decommissioned nodes in an existing cluster.
- Extracted data from relational databases (SQL, Oracle, DB2) into HDFS using Sqoop.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Developed UDFs in Java as needed for use in Pig and Hive queries
- Experience in using SequenceFile, RCFile, Avro and HAR file formats.
- Developed Oozie workflow for scheduling and orchestrating the ETL process
- Implemented authentication using Kerberos and authorization using Apache Sentry.
- Worked with the admin team in planning and performing the upgrade from CDH3 to CDH4
- Good knowledge of Amazon Web Services components such as EC2, EMR and S3
- Experience with NoSQL databases such as HBase, Cassandra and MongoDB.
- Handled administration, installation, upgrades and management of Cassandra distributions.
- Advanced knowledge in performance troubleshooting and tuning Cassandra clusters.
- Scaled Cassandra clusters based on load patterns.
- Good understanding of Cassandra Data Modelling based on applications.
- Experience with Cassandra Performance tuning.
- Highly involved in development/implementation of Cassandra environment.
- Good working knowledge of BI tools such as Tableau, Spotfire and QlikView.
- Extensive experience with SQL, PL/SQL and database concepts
- Participated in requirement analysis, reviews and working sessions to understand the requirements and system design
- Object Oriented Design (OOD) experience with Rational Rose and Enterprise Architect (EA)
- Created use case diagrams and class diagrams using UML and Rational Rose
- Strong experience as a senior Java Developer in Web/intranet, Client/Server technologies using Java, J2EE, Servlets, JSP, EJB, JDBC, Python
- Proactive and well organized with effective time management skills and problem solving skills
- Good interpersonal skills and ability to work as part of a team. Exceptional ability to learn and master new technologies and to deliver on short deadlines
TECHNICAL SKILLS:
Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS
Hadoop Ecosystem: Hadoop, MapReduce, YARN, HDFS, HBase, Impala, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Spark, Scala
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: C, Scala, Core Java, J2EE (Servlets, JSP, JDBC, JavaBeans, EJB), C#, ASP.NET
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, CSS, XML, JavaScript, Maven
Scripting Languages: JavaScript, UNIX shell, Python, R
Databases: Oracle 11g, MS Access, MySQL, SQL Server 2000/2005/2008/2012, Teradata
SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS).
IDE: Eclipse, Visual Studio, IDLE, IntelliJ
Web Services: RESTful, SOAP
Tools: Bugzilla, QuickTestPro (QTP) 9.2, Selenium, Quality Center, Test Link, TWS, SPSS, SAS, Documentum, Tableau, Mahout
Methodologies: Agile, UML, Design Patterns
PROFESSIONAL EXPERIENCE:
Confidential, Bentonville, AR
Hadoop Developer
Responsibilities:
- Coordinated with business customers to gather business requirements, worked with technical peers to derive technical requirements, and delivered the BRD and TDD documents
- Extensively involved in Design phase and delivered Design documents
- Worked on analyzing the Hadoop cluster with different Big Data analytic tools including Pig, Hive, Impala, HBase and Sqoop
- Involved in validating the aggregate table based on the rollup process documented in the data mapping.
- Developed HiveQL, Spark RDD and Spark SQL code and automated the flow using shell scripting
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and for scheduling workflows dynamically.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Involved in fetching brand data from social media applications such as Facebook and Twitter.
- Developed and updated social media analytics dashboards on regular basis.
- Performed data mining investigations to find new insights related to customers.
- Contributed to forecasts based on current results and insights derived from data analysis.
- Created a complete processing engine based on Cloudera's distribution, tuned for performance.
- Managed and reviewed Hadoop log files.
- Developed and generated insights from brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages.
- Involved in identification of topics and trends and building context around that brand.
- Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.
Environment: Hadoop, MapReduce, YARN, Hive, Impala, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera, HDFS, Eclipse.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different Big Data analytic tools including Kafka, Pig, Hive, Impala and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Streamed data in real time using Spark with Kafka.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala
- Worked within the Apache Hadoop framework to ingest Opinion Lab statistics from a streaming application programming interface (API), automated processes by creating Oozie workflows, and used Hive to draw conclusions about consumer sentiment from data patterns for external client use.
- Wrote the Storm topology with HDFS Bolt and Hive Bolts as destinations.
- Experienced in Storm topology development, maintenance and bug fixing.
- Developed Hadoop Streaming MapReduce jobs using Java.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from the Linux file system to HDFS.
- Imported data into and exported data from HDFS using Sqoop.
- Good knowledge of building Apache Spark applications using Scala.
- Experience working on processing unstructured data using Pig.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
- Good knowledge of NoSQL databases such as HBase and Cassandra
- Handled administration, installation, upgrades and management of Cassandra distributions.
- Advanced knowledge in performance troubleshooting and tuning Cassandra clusters.
- Scaled Cassandra clusters based on load patterns.
- Good understanding of Cassandra Data Modelling based on applications.
- Experience with Cassandra Performance tuning.
- Highly involved in development/implementation of Cassandra environment.
- Planned, deployed, monitored and maintained AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
- Expertise in AWS data migration between database platforms, such as SQL Server to Amazon Aurora, using RDS.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- Scheduled the Oozie workflow engine to run multiple Pig jobs.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Performed data scrubbing and processing with Oozie.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Involved in developing Hive DDLs to create, alter and drop tables.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms, using SparkContext, Spark SQL, DataFrames, pair RDDs, Storm and Spark on YARN.
- Used Spark features such as parallelize, partitioning, caching (both in-memory and on-disk serialization) and Kryo serialization.
Environment: Hadoop, Cloudera, Big Data, HDFS, MapReduce, Sqoop, Spark, Hive, HBase, Linux, Java, Eclipse, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, MongoDB, Cassandra, Tableau, UNIX Shell Scripting, PuTTY.
Confidential, Jacksonville, FL
JAVA/Hadoop Developer
Responsibilities:
- Migrated the required data from MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Mainly worked on Hive queries to categorize data of different claims
- Integrated the Hive warehouse with HBase for information sharing among teams.
- Wrote custom Hive UDFs in Java where the required functionality was too complex for plain HiveQL.
- Designed and created Hive external tables using a shared metastore, with partitioning and dynamic partitioning for faster data retrieval.
- Developed Sqoop scripts to move data between Pig and the MySQL database.
- Used Spring IoC to inject values for dynamic parameters.
- Used EJB session beans to interact with the database through JPA.
- Wrote HiveQL scripts to create, load and query tables for extracting summarized information.
- Implemented modules using Core Java APIs, Java collections and threads, and integrated the modules.
- Used Avro as the file storage format to save disk space.
- Maintained system integrity of subcomponents, primarily HDFS, MapReduce, HBase and Hive.
- Monitored system health and logs and responded to any warning or failure conditions.
Environment: Apache Hadoop, Shell scripting, HDFS, Hive, MapReduce, HBase, Java, Pig, Sqoop, Cloudera CDH4, MySQL, Tableau, Avro, Scala, Spring, EJB, XML, Java Collections, REST.
Confidential, Charlotte, NC
Sr. Java Developer
Responsibilities:
- Analyzed and reviewed client requirements and design
- Followed agile methodology for development process
- Developed the presentation layer using HTML5, CSS3 and Ajax
- Developed the application using Struts Framework that uses Model View Controller (MVC) architecture with JSP as the view
- Extensively used Spring IOC for Dependency Injection and worked on Custom MVC Frameworks loosely based on Struts
- Used RESTful Web services for transferring data between applications
- Configured Spring with the Hibernate ORM framework for handling DAO classes and binding objects to the relational model
- Adopted J2EE design patterns like Singleton, Service Locator and Business Facade
- Developed POJO classes and used annotations to map with database tables
- Used Java Message Service (JMS) for reliable, asynchronous exchange of important information such as credit card transaction reports
- Used Multi-Threading to handle more users
- Developed Hibernate JDBC code for establishing communication with database
- Worked with DB2 database for persistence with the help of PL/SQL querying
- Used SQL queries to retrieve information from database
- Developed various triggers, functions, procedures, views for payments
- Used XSL/XSLT to transform and display reports
- Used Git to keep track of all work and all changes in source code
- Used JProfiler for performance tuning
- Wrote Python scripts to parse XML documents and load the data into the database
- Wrote test cases that adhere to a Test Driven Development (TDD) pattern
- Used JUnit, a test framework that uses annotations to identify methods that specify a test
- Used Log4j to log messages according to message type and level
- Built the application using Maven and deployed it on WebSphere Application Server
Environment: Java 8, Spring framework, Spring Model View Controller (MVC), Struts 2.0, XML, Hibernate 3.0, UML, JavaServer Pages (JSP) 2.0, Python, Servlets 3.0, JDBC 4.0, JUnit, Log4j, Maven, Win 7, HTML, RESTClient, Eclipse, Agile Methodology, Design Patterns, WebSphere 6.1.
Confidential, Springfield, IL
Java Developer
Responsibilities:
- Responsible for Business Analyst activities for critical functionality for the business interfacing projects.
- Responsible for successful deployment of all Major and Minor Releases.
- Developed key modules of the application using frameworks and tools such as JMS, Spring 2.5, Oracle 9i and Hibernate 3.0.
- Actively involved, as an application developer, in designing various business-layer and data-management components of this web-based system on a J2EE architecture using JSTL, JSP, Servlets and JavaScript.
- Implemented MVC architecture using the Spring Framework; coding involved writing Action classes, custom tag libraries and JSPs
- Involved in designing a key component of the system (the Matching Engine) using PL/SQL procedures on an Oracle database
- Built packages and procedures to implement business rules for the applications on the database side.
- Provided reports on request and supported Client Services for all customer requests as an SME of the application.
- Experience developing for Unix/Linux-based systems.
- Developed tools and value-adds to assist performance testing.
- Used Agile methodology.
- Created front-end pages using HTML 4.01, CSS and JavaScript.
- Used Log4j to log exceptions and modified its XML configuration to roll over, zip and archive logs.
Environment: Java 1.5, JSP, JSTL, JMS, Spring 2.5, Servlets, JavaScript, Oracle 9i, HTML 4.01, CSS, Log4j, Hibernate 3.0, PL/SQL, Agile
Confidential
Java Developer
Responsibilities:
- Developed the system by following the agile methodology.
- Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
- Applied OOAD principles for the analysis and design of the system.
- Created real time web applications using Node.js
- Used WebSphere Application Server to deploy the build.
- Developed front-end screens using JSP, HTML, jQuery, JavaScript and CSS.
- Used Spring Framework for developing business objects.
- Performed data validation in Struts Form beans and Action Classes.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Used a DOM parser to parse the XML files.
- Used the Log4j framework to log debug, info and error data.
- Used Oracle 10g Database for data persistence.
- Used SQL Developer as the database client.
- Used WinSCP to transfer files from the local system to other systems.
- Performed Test Driven Development (TDD) using JUnit.
- Used Ant script for build automation.
- Used Rational ClearQuest for defect logging and issue tracking.
Environment: Windows XP, Unix, Java, Design Patterns, WebSphere, Apache Ant, J2EE (Servlets, JSP), HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL Developer, WinSCP, Log4j and JUnit.
