Senior Hadoop Developer Resume
Phoenix, AZ
SUMMARY
- Hadoop Developer with 7+ years of experience in IT and extensive experience in Hadoop and Big Data related technologies.
- Experienced in developing web applications in various domains like Retail, Banking, Insurance and Healthcare.
- Strong development skills in Hadoop, HDFS, MapReduce, Hive, Sqoop and HBase, with a solid understanding of Hadoop internals.
- Well versed in installing, configuring, and using Apache Hadoop ecosystem components.
- Very familiar with HDFS, Hive, Spark, Kafka, Sqoop, Pig Latin, Oozie, Flume and the various components of the Hadoop ecosystem.
- Expertise in ingesting real-time/near-real-time data using Flume, Kafka and Storm.
- Knowledge of NoSQL databases like MongoDB, Cassandra and HBase.
- Good knowledge of writing Spark applications using Python, Scala and Java.
- Efficient in using Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (a minimal sketch follows this list).
- Good understanding of Spark architecture and its components.
- Comprehensive knowledge of and experience in process improvement, normalization/de-normalization, data extraction, data manipulation and data cleansing on Hive.
- Good hands-on experience in Apache Spark with Scala.
- Developed/supported applications on the LAMP stack (Linux, Apache, MySQL and PHP).
- Good experience with Amazon EC2, Simple Storage Service (S3) and Amazon SQS.
- Experience working with Oracle, DB2, SQL Server and MySQL databases.
- Skilled in data management, data extraction, manipulation, validation, and analyzing huge volumes of data.
- Experience in implementing Java/J2EE technologies for application development in various layers of projects.
- Deep knowledge of AngularJS practices and commonly used modules, based on extensive work experience.
- Extensively used JavaScript for client-side validation and implemented jQuery to reduce data transfer between user and server.
- Experience with NLP, Elasticsearch and text mining.
- Own the end-to-end development life cycle with high-quality solutions and evangelize test-driven development (test code coverage, etc.).
- Developed core modules in large cross-platform applications using Java, J2EE, Spring, Struts, Hibernate, JAX-WS Web Services and JMS.
- Experience in Agile and Waterfall models.
- Experience in UNIX shell scripting.
- Proficient in using OOP concepts (polymorphism, inheritance, encapsulation, etc.).
- Analytical, organized and enthusiastic to work in a fast-paced, team-oriented environment.
- Expertise in interacting with business users, understanding their requirements and providing solutions to match them.
- Excellent communication and interpersonal skills; flexible and adaptive to new environments, self-motivated, a team player and a positive thinker who enjoys working in a multicultural environment.
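As a minimal illustration of the Spark Streaming batching model noted above, the Java sketch below cuts a socket stream into 10-second micro-batches and counts records per batch; the local master, host and port are placeholder assumptions, and a Kafka or Flume receiver would plug in the same way.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingBatchSketch {
    public static void main(String[] args) throws Exception {
        // Local master only for the sketch; a real job would run on YARN
        SparkConf conf = new SparkConf().setAppName("StreamingBatchSketch").setMaster("local[2]");

        // The batch interval is what slices the continuous stream into micro-batches
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Placeholder socket source; Kafka/Flume receivers attach the same way
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Each 10-second batch is handed to the Spark engine as a regular RDD job
        JavaDStream<Long> counts = lines.count();
        counts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```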
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, HBase, Hadoop MapReduce, Hive, Pig, Flume, Sqoop, Spark, Kafka, Oozie
Languages: C, C++, Core Java, JDBC, PL/SQL, Scala
Methodologies: Agile, V-model (Verification & Validation Model)
Databases: Oracle 11g/10g, MySQL, and the NoSQL stores Cassandra, MongoDB and HBase
IDE/Testing Tools: Eclipse
Operating Systems: Linux, UNIX, Windows XP/2000/NT/98/95
Scripting: JavaScript, Shell scripting, Python
Others: MS Office, QC, JIRA, SharePoint, Visio, AngularJS
PROFESSIONAL EXPERIENCE
Confidential, Phoenix, AZ
Senior Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Expertise in cluster tasks like adding and removing nodes without any effect on running jobs and data.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Responsible for developing multiple Kafka producers and consumers from scratch as per the software requirement specifications (see the producer sketch after this list).
- Involved in integrating Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Developed Spark applications using Scala.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Developed Spark scripts using Java and Python shell commands as per the requirements.
- Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Created, altered and deleted topics (Kafka queues) as required.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Analyzed the SQL scripts and designed the solution to implement them using Scala.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
- Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Responsible for loading data from the UNIX file system to HDFS.
- Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
- Developed workflows in Control-M to automate the tasks of loading data into HDFS and preprocessing with Pig.
- Tuned the cluster for optimal performance to process the large data sets.
- Designed and developed a distributed processing system that processes binary files in parallel and crunches the analysis metrics into a data warehousing platform for reporting.
- Implemented dashboards that internally use Hive queries to perform analytics on structured, Avro and JSON data to meet business requirements.
- Wrote Hive and Pig scripts as per requirements.
- Provided cluster coordination services through ZooKeeper.
- Configured Oozie workflows to automate data flow, preprocessing and cleaning tasks using Hadoop actions.
- Experience in AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing.
- Used Maven extensively for building JAR files of MapReduce programs and deployed them to the cluster.
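A minimal sketch of the kind of Kafka producer described above, using the standard Java client; the broker address, topic name and payload are placeholder assumptions, not the project's actual configuration.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources flushes and closes the producer on exit
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical clickstream event keyed by user id; downstream
            // Storm/Spark consumers read the same topic
            producer.send(new ProducerRecord<>("clickstream", "user-42", "{\"page\":\"/home\"}"));
        }
    }
}
```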
Environment: Hadoop, HDFS, YARN, Apache Spark 1.6.1, Scala 2.11, Hive, HiveQL, Sqoop, Kafka, HBase, Flume, Pig, MySQL, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Eclipse (Kepler), AWS EC2, UNIX, Cosmos.
Confidential, Fort Worth, TX
Sr. Hadoop Developer
Responsibilities:
- Experience with professional software engineering practices and best practices for the full software development life cycle, including coding standards, code reviews, source control management and build processes.
- Worked closely with various levels of individuals to coordinate and prioritize multiple projects; estimated scope, and scheduled and tracked projects throughout the SDLC.
- Worked in the BI team in the area of Big Data Hadoop cluster implementation and data integration, developing large-scale system software.
- Handled structured and unstructured data and applied ETL processes.
- Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture met the needs of the business unit and enterprise and allowed for business growth.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism and tuning memory.
- Designed a data warehouse using Hive.
- Created Hive tables and partitioned tables, using Hive indexes and buckets to ease data analytics.
- Involved in creating MapReduce jobs to power data for search and aggregation (a minimal aggregation job is sketched after this list).
- Worked extensively with Sqoop for importing and exporting data between relational database systems/mainframe and HDFS, including loading data into relational database systems/mainframe.
- Developed an interface for validating incoming data in HDFS before kicking off the Hadoop process.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
- Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, JBoss, XML, JMS and Web Services.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Prepared developer (unit) test cases and executed developer testing.
- Supported and assisted QA engineers in understanding, testing and troubleshooting.
- Coded complex Oracle stored procedures, functions, packages and cursors for the client-specific applications.
- Involved in database migrations to transfer data from one database to another and in the complete virtualization of many client applications.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Provided production rollout support, which included monitoring the solution post go-live and resolving any issues discovered by the client and client services teams.
- Designed and documented operational problems by following standards and procedures, using the software reporting tool JIRA.
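As an illustration of the MapReduce aggregation jobs mentioned above, here is a minimal, self-contained Java job that counts records per event type; the tab-delimited record layout and field position are assumptions made for the sketch.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountJob {

    // Emits (eventType, 1) per record; assumes a tab-delimited line with the
    // event type in the first field (placeholder layout)
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            eventType.set(fields[0]);
            context.write(eventType, ONE);
        }
    }

    // Sums the counts per event type; also reused as the combiner
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```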
Environment: Hadoop, MapReduce, HDFS, Hive, Kafka, Spark, HBase, Sqoop, Java (JDK 1.6), Pig, Oozie, Oracle 11/10g, DB2, MySQL, Eclipse, ETL Tool (Informatica), PL/SQL, JSP, JDBC, XML, HTML, JSON, SOAP, Maven, Ant, SVN, JIRA, Linux, Shell Scripting, SQL Developer, Toad, WinSCP, PuTTY.
Confidential, San Francisco, CA
Hadoop developer
Responsibilities:
- Responsible for managing data coming from different sources, loading structured and unstructured data, and involved in HDFS maintenance.
- Responsible for building scalable distributed data solutions using Hadoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Visualized the HDFS data for the customer in a BI tool with the help of the Hive ODBC driver.
- Processed input from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
- Developed and executed Hive queries for denormalizing the data.
- Used Sqoop to load data from MySQL to HDFS on a regular basis.
- Implemented optimized joins across different data sets to get top claims by state using MapReduce.
- Worked on big data processing of clinical and non-clinical data using MapReduce.
- Performed data validation on the ingested data using MapReduce, building a custom model to filter out all the invalid data and cleanse the data.
- Familiarity with NoSQL databases such as MongoDB and Cassandra.
- Used Flume for importing log files from various sources into HDFS.
- Created a customized BI tool for the manager team that performs query analytics using HiveQL.
- Implemented partitioning and bucketing concepts in Hive and designed both managed and external tables in Hive for optimized performance.
- Wrote Hive UDFs to sort struct fields and return complex data types (a minimal UDF is sketched after this list).
- Worked on Pig Latin scripts and UDFs for the ingestion, querying, processing and analysis of data.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
- Developed Hive queries to process the data and generate the data cubes for visualization.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Worked extensively with different compression techniques like LZO, GZip and Snappy.
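The struct-sorting UDF returning a complex type would be written as a GenericUDF; as a shorter sketch of the classic Hive UDF pattern it builds on, here is a hypothetical scalar UDF (the class name, function name and jar name are illustrative only).

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Registered in Hive with (jar and function names are placeholders):
//   ADD JAR my-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_state AS 'NormalizeStateUDF';
public class NormalizeStateUDF extends UDF {
    // Hive calls evaluate() once per row; null input must map to null output
    public Text evaluate(Text state) {
        if (state == null) {
            return null;
        }
        return new Text(state.toString().trim().toUpperCase());
    }
}
```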
Environment: Hadoop, HDFS, HBase, MongoDB, MapReduce, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, Oracle, MySQL.
Confidential, Kalamazoo, MI
Big Data Analyst/Java Developer
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, and Oozie on teh Hadoop cluster.
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Extensively involved in loading data from the UNIX file system to HDFS.
- Involved in evaluating the business requirements and prepared detailed specifications that follow the project guidelines required to develop the programs.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Provided quick responses to ad hoc internal and external client requests for data, and experienced in creating ad hoc reports.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented MapReduce jobs in Hive by querying the available data.
- Migrated ETL processes from Oracle to Hive to test the ease of data manipulation.
- Used Amazon Redshift to store and retrieve data from data warehouses.
- Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig.
- Performed unit testing for the development team within the sandbox environment.
- Created Hive tables and was involved in writing Hive UDFs and data loading.
- Imported data into HDFS and Hive from other data systems using Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Generated aggregations, groupings and visualizations using Tableau.
- Developed Hive queries to process the data (see the query sketch after this list).
- Developed and maintained several batch jobs to run automatically depending on business requirements.
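A minimal sketch of running a Hive query from Java over JDBC, of the kind used here for processing and ad hoc reporting; the HiveServer2 URL, credentials, table and columns are placeholder assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // Standard HiveServer2 JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder HiveServer2 URL and credentials
        String url = "jdbc:hive2://hive-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hadoop", "");
             Statement stmt = conn.createStatement();
             // Hypothetical daily aggregation over a web_logs table
             ResultSet rs = stmt.executeQuery(
                     "SELECT event_date, COUNT(*) AS hits FROM web_logs GROUP BY event_date")) {
            while (rs.next()) {
                System.out.println(rs.getString("event_date") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```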
Environment: Apache Hadoop, Cloudera Manager, CDH2, CDH3, CentOS, Apache Hama, Eclipse Indigo, Java, MapReduce, Hive, Sqoop, Pig, Oozie, SQL, Struts, JUnit.
Confidential
Java Developer
Responsibilities:
- Involved in the implementation of the design across the vital phases of the software development life cycle (SDLC), including development, testing, implementation and maintenance support.
- Developed the system following the agile methodology.
- Applied OOAD principles for the design and analysis of the system.
- Created real-time web applications using Node.js.
- Developed front-end screens using JSP, HTML, CSS, JavaScript and jQuery.
- Used Spring Framework for developing business objects.
- Performed data validation in Struts Form beans and Action Classes.
- Used Eclipse for the development, testing and debugging of the application.
- Used WebSphere Application Server to deploy the builds.
- Used a DOM parser to parse the XML files (see the sketch after this list).
- Used the Log4j framework for logging debug, info and error data.
- Used Oracle 10g Database for data persistence.
- SQL Developer was used as a database client.
- Performed Test Driven Development (TDD) using JUnit.
- Used Ant scripts for build automation.
- Used WinSCP to transfer files from the local system to other systems.
- Used Rational ClearQuest for defect logging and issue tracking.
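A minimal sketch of the DOM parsing approach referenced above, using the standard javax.xml.parsers API; the file name and element name are illustrative assumptions.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomParserSketch {
    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Placeholder file; DOM loads the whole document into memory as a tree
        Document doc = builder.parse("orders.xml");

        // Hypothetical <order> elements; print the text content of each
        NodeList orders = doc.getElementsByTagName("order");
        for (int i = 0; i < orders.getLength(); i++) {
            System.out.println(orders.item(i).getTextContent());
        }
    }
}
```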
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, JavaScript, WebSphere 6.1, HTML, JDBC 3.0, XML, JMS, Log4j, JUnit, Servlets, MVC.
Confidential
Responsibilities:
- Analyzed the requirements and documented the technical specifications.
- Actively involved in the development of JSP pages and Servlet classes, and in unit testing.
- Utilized Java debugging and error-handling classes and techniques to troubleshoot and debug issues.
- Worked extensively with the Eclipse IDE, building on WebLogic Server.
- Involved in the design document, coding and debugging.
- Used Ajax controls and CSS to enrich the GUI.
- Involved in the preparation of unit test cases and module-level test cases.
- Implemented connectivity to the Oracle database using JDBC (see the sketch after this list).
- Created SQL views, queries, functions and triggers used to fetch data for the system.
- Involved in writing stored procedures and triggers using PL/SQL.
- Performed code walkthroughs and code reviews.
- Coordinated with the project and Software Quality Assurance (SQA) teams.
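A minimal sketch of the JDBC connectivity mentioned above; the driver class is the standard Oracle thin driver, while the URL, credentials, table and column are placeholder assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OracleJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Classic Oracle thin driver registration
        Class.forName("oracle.jdbc.driver.OracleDriver");

        // Placeholder host, SID and credentials
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT customer_name FROM customers WHERE customer_id = ?")) {
            ps.setInt(1, 101); // hypothetical lookup key
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("customer_name"));
                }
            }
        }
    }
}
```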
Environment: JSP, Servlets, JDBC, RMI, Swing, WebSphere 6.0, WSAD 5, Oracle 9i.