Hadoop Developer Resume
Bowie, MD
SUMMARY
- 8+ years of IT experience in analysis, design and development using Hadoop, Java and J2EE.
- 3+ years of experience with Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Pig and Hive.
- Excellent understanding / knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm.
- Knowledge of Data Analytics and Business Analytics processes.
- Real-time streaming of data using Spark with Kafka.
- Hands-on experience with Spark Streaming to receive real-time data from Kafka (a brief sketch follows this summary).
- Experience in ingesting streaming data into Hadoop using Spark, Storm and Scala.
- Hands-on experience in installing, configuring and deploying Hadoop ecosystem components such as Hadoop MapReduce, YARN, HDFS, HBase, Oozie, Hive, Pig, Impala, Spark, Storm, Kafka, Tableau, Sqoop, HCatalog, ZooKeeper, Amazon Web Services and Flume.
- Creating Spark SQL queries for faster requests.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java
- Continuous Delivery/Continuous Integration (CD/CI) using Jenkins/CloudBees and hosting of the QC application.
- Experience in building and maintaining multiple Hadoop clusters of different sizes and configurations, including setting up rack topology for large clusters, as a Hadoop Administrator/Architect/Developer on multiple distributions such as Hortonworks and Cloudera.
- Experienced with test frameworks for Hadoop using MRUnit.
- Performed data analytics using Pig, Hive and R for data scientists within the team.
- Worked extensively on the data visualization tool Tableau and graph databases such as Neo4j.
- Worked on a 32+ node Apache/Cloudera 4.3.2 Hadoop cluster for the PROD environment, used tools such as Sqoop and Flume for data ingestion from different sources into the Hadoop system, and used Hive/Spark SQL to generate reports for analysis.
- Configured Splunk to perform the web analytics.
- Good technical skills in Oracle 11i, SQL Server, and ETL development using Informatica, QlikView, Cognos and SAS.
- Experience in managing and reviewing Hadoop log files.
- Responsible for smooth, error-free configuration of the DWH-ETL solution and its integration with Hadoop.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance.
- Worked extensively with Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
- Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, and Web Services.
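
A minimal, illustrative sketch of the Spark Streaming with Kafka pattern referenced above, using the Java API; the broker address, consumer group and topic name are placeholder assumptions, not values from any specific engagement.

```java
// Sketch only: consume a Kafka topic with Spark Streaming and count records per micro-batch.
// "broker1:9092", "resume-demo-group" and "events" are illustrative placeholders.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaSparkStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaStreamingSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "resume-demo-group");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("events"), kafkaParams));

        // Log the number of records received in each micro-batch.
        stream.foreachRDD(rdd -> System.out.println("Records in batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```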
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services, EMR, MRUnit, Spark, Storm, Greenplum, Datameer, R, Ignite
Java & J2EE Technologies: Core Java, JDBC, Servlets, JSP, JNDI, Struts, Spring, Hibernate and Web Services (SOAP and RESTful)
IDEs: Eclipse, NetBeans, MyEclipse, IntelliJ
Frameworks: MVC, Struts, Hibernate, Spring
Programming Languages: C, C++, Java, Python, Ant scripts, Linux shell scripts, R
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, MongoDB, CouchDB, Graph DB
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL, RESTful Web Services
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica, IBM InfoSphere, QlikView and Cognos
PROFESSIONAL EXPERIENCE
Confidential, Bowie, MD
Hadoop Developer
Responsibilities:
- Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
- Continuous Delivery/Continuous Integration (CD/CI) using Jenkins/CloudBees and hosting of the QC application.
- Hands-on experience with Spark Streaming to receive real-time data from Kafka.
- Ingested streaming data into Hadoop using Spark, Storm and Scala.
- Created Spark SQL queries for faster requests.
- Conducted information-sharing and training sessions to raise awareness of industry trends and upcoming initiatives, ensuring alignment between business strategies and goals and solution architecture designs.
- Implemented Cloudera Manager on existing cluster
- Configured TLS security for Cloudera Manager and configured Hadoop security for CDH 5 using Kerberos through Cloudera Manager
- Performance-tuned the application at various layers: MapReduce, Hive, CDH and Oracle.
- Used Spark Streaming for real-time processing of data from HDFS.
- Used QlikView to create a visual interface for the real-time data processing.
- Implemented partitioning, dynamic partitioning and bucketing in Hive.
- Imported and exported data between HDFS and various databases, including Netezza, Oracle and MySQL.
- Implemented a Pub/Sub model using Apache Kafka for loading real-time transactions into HDFS (see the producer sketch at the end of this list).
- Automated the process of pulling data from source systems into Hadoop and exporting it as JSON files to a specified location.
- Migrated the Hive queries to Impala
- Worked with various file formats (Avro, Parquet and text) and SerDes, using Snappy compression.
- Created analysis batch job prototypes using Hadoop, Pig, Oozie, Hue and Hive
- Designed and documented operational problems by following standards and procedures, using the software reporting tool JIRA.
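
A minimal sketch of the producer side of the Kafka Pub/Sub flow described above, using the Java client API; the broker address, topic name and payload are illustrative placeholders.

```java
// Sketch only: publish transaction events to a Kafka topic for downstream loading into HDFS.
// "broker1:9092" and "transactions" are illustrative placeholders.
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionPublisherSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all"); // wait for full acknowledgment before considering a send complete

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String txnJson = "{\"txnId\":\"1001\",\"amount\":42.50}"; // placeholder payload
            producer.send(new ProducerRecord<>("transactions", "1001", txnJson));
            producer.flush();
        }
    }
}
```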
Environment: Hadoop, HDFS, MapReduce, Spark, SOLR, Hive, Impala, Pig, Sqoop, Java, Unix shell scripting, Oracle, Netezza, MySQL, QlikView
Confidential, Mountain View, CA
BigData/Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Involved in loading data from the Linux file system into HDFS.
- Worked on installing the cluster, commissioning and decommissioning data nodes, name node recovery, capacity planning, and slot configuration.
- Developed performance utilization charts, optimized and tuned SQL and designed physical databases.
- Assisted developers with Teradata load utilities and SQL.
- Researched Sources and identified necessary Business Components for Analysis.
- Gathered the required information from the users.
- Interacted with different system groups for analysis of systems.
- Created tables and views in Teradata according to the requirements.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best-income logic using Pig scripts and UDFs (a UDF sketch follows this list).
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Responsible for managing data coming from different sources.
- Involved in loading data from UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Managed jobs using the Fair Scheduler.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Re-engineered customer account software systems used by brokerage teams; developed web user interfaces for trading inquiries and supported parallel systems.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
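
A minimal sketch of a custom Pig eval UDF in Java of the kind referenced above; the class name, input fields and comparison logic are illustrative assumptions, not the actual business rule.

```java
// Sketch only: a Pig EvalFunc that returns the larger of two income fields.
// The (base, bonus) tuple layout is an illustrative assumption.
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class BestIncomeSketch extends EvalFunc<Double> {
    @Override
    public Double exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null; // Pig treats a null return as no result for this record
        }
        Double base = (Double) input.get(0);
        Double bonus = (Double) input.get(1);
        if (base == null) return bonus;
        if (bonus == null) return base;
        return Math.max(base, bonus);
    }
}
```

In a Pig script, the jar holding such a UDF is registered with REGISTER and bound with DEFINE before being called in a FOREACH ... GENERATE statement.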
Environment: MapReduce, Java (jdk1.6), Flat files, Oracle 11g/10g, Netezza, UNIX, Sqoop, Hive, Oozie.
Confidential, Charlotte, NC
Hadoop Consultant
Responsibilities:
- Worked extensively on importing data using Sqoop and Flume.
- Responsible for creating complex tables using Hive.
- Created partitioned tables in Hive for best performance and faster querying.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test a Big Data-based analytical solution drawing on disparate sources.
- Involved in source system analysis, data analysis and data modeling through to ETL (Extract, Transform and Load).
- Wrote multiple MapReduce jobs to power data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats (a MapReduce sketch follows this list).
- Handled structured and unstructured data and applied ETL processes.
- Developed Hive queries for the analysts.
- Prepared developer (unit) test cases and executed developer testing.
- Created and modified shell scripts for scheduling various data cleansing scripts and the ETL loading process.
- Supported and assisted QA engineers in understanding, testing and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Production Rollout Support that includes monitoring the solution post go-live and resolving any issues that are discovered by the client and client services teams.
- Designed and documented operational problems by following standards and procedures, using the software reporting tool JIRA.
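
A minimal sketch of a Java MapReduce job of the kind mentioned above, aggregating record counts per key from CSV input; the assumption that column 0 is the grouping key and the command-line input/output paths are illustrative.

```java
// Sketch only: count records per key (column 0) in CSV input using the mapreduce API.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvAggregationSketch {

    public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length > 0) {
                outKey.set(fields[0]); // assume column 0 is the grouping key
                context.write(outKey, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) {
                total += c.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv-aggregation-sketch");
        job.setJarByClass(CsvAggregationSketch.class);
        job.setMapperClass(KeyMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```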
Environment: MapReduce, Java (jdk1.6), Flat files, Oracle 11g/10g, Netezza, UNIX, Sqoop, Hive, Oozie.
Confidential, Westborough, MA
Java Developer
Responsibilities:
- Writing application code using Core Java, J2EE, Servlets, JSP, Hibernate and Spring with RESTful Web Services, and the Maven and Ant build tools.
- Used the Spring framework for DI/IoC and Spring MVC design pattern implementation, configuring application context files and performing database object mapping using Hibernate annotations (a brief mapping sketch follows this list).
- Involved in implementing the DAO pattern for database connectivity and Hibernate for object persistence.
- Designed front ends using JSPs, JSTL tag libraries, Display Tags, JavaScript, HTML, CSS, jQuery and the Dojo Toolkit.
- Using Ant and Maven for project builds.
- Designing and developing search components using Oracle Endeca Information Discovery platform.
- Developed and configured the baseline, partial and delta pipelines.
- Writing Java classes for each component in the pipeline.
- Converting raw data into XML format with DTD validation, extracting URLs from these XML files, and reading and writing XML files using DOM and SAX parsers.
- Configuring the Search patterns, properties and dimensions.
- Configuring Endeca platform using Web Studio and Developer Studio for implementing search pattern as required by client.
- Performing logging and unit testing of the code using JUnit.
- Writing use cases and sequence diagrams using UML.
- Writing queries, Stored Procedures, Functions, and backend programming using Oracle, SQL/PL-SQL.
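
A minimal sketch of the annotation-based Hibernate mapping and Spring-injected DAO pattern described in this section; the Customer entity, table and column names are illustrative assumptions.

```java
// Sketch only: a Hibernate-annotated entity and a Spring-managed DAO using constructor injection.
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

@Entity
@Table(name = "CUSTOMER")
class Customer {
    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "NAME", nullable = false)
    private String name;

    // getters and setters omitted for brevity
}

@Repository
class CustomerDao {
    private final SessionFactory sessionFactory;

    @Autowired
    CustomerDao(SessionFactory sessionFactory) { // dependency injected by the Spring container
        this.sessionFactory = sessionFactory;
    }

    @Transactional
    public Long save(Customer customer) {
        return (Long) sessionFactory.getCurrentSession().save(customer);
    }

    @Transactional(readOnly = true)
    public Customer findById(Long id) {
        return (Customer) sessionFactory.getCurrentSession().get(Customer.class, id);
    }
}
```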
Environment: Java/J2EE, JSP, Hibernate, Spring, Maven, RESTful Web Services, Endeca Information Discovery (Endeca Server, Developer Studio, Web Studio, Web Crawler), Eclipse, log4j, SLF4J, Ant, Oracle, SQL/PL-SQL, HTML, XML, JSON, JavaScript, CSS, Dojo, jQuery, and WebLogic.
Confidential
Java Developer
Responsibilities:
- Involved in the High level and detailed design, Coding, Testing, and Implementation of the applications.
- Created the riders module, where client information is collected, stored and managed.
- Developed code for the Vendor and Product modules, which hold information about the vendors from which insurance is bought and their products, respectively.
- Involved in creating JSP pages and JavaScript validation, and developed an error-handling framework for the application.
- Preparing unit test cases and documents for QA.
- Involved in the Data-Migration activities for creating Data Mapping documents from client source data to our new target system.
- Used AJAX for asynchronous communication with server to provide better user experience.
- Writing standalone Java classes, reusable components and Java Beans for retrieving data from Oracle tables.
- Wrote Hibernate mapping files and POJO classes.
- Involved in writing HQL queries and criteria queries.
- Supporting System Integrating Testing and User Acceptance Testing.
- Performing the Unit testing and basic functional testing with different set of data.
- Used JUnit for unit testing and EclEmma for code coverage, as sketched below.
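
A minimal JUnit 4 sketch of the kind of unit test used for developer testing here; PremiumCalculator is a hypothetical class under test, included only so the sketch compiles, not an actual component of the application.

```java
// Sketch only: a JUnit 4 unit test against a hypothetical PremiumCalculator class.
import static org.junit.Assert.assertEquals;

import org.junit.Before;
import org.junit.Test;

public class PremiumCalculatorTest {

    private PremiumCalculator calculator; // hypothetical class under test

    @Before
    public void setUp() {
        calculator = new PremiumCalculator();
    }

    @Test
    public void appliesRiderSurchargeToBasePremium() {
        // A 1000.0 base premium with a 5% rider surcharge is expected to yield 1050.0.
        double premium = calculator.calculate(1000.0, 0.05);
        assertEquals(1050.0, premium, 0.001);
    }
}

// Hypothetical class under test, included only to make the sketch self-contained.
class PremiumCalculator {
    double calculate(double basePremium, double surchargeRate) {
        return basePremium * (1.0 + surchargeRate);
    }
}
```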
Environment: Java, Spring, Hibernate, ZK/ZUL, REST web services, PL/SQL, Oracle database, JUnit, Maven
Confidential
J2EE Programmer
Responsibilities:
- Coordinated with mainframe developers to understand, preserve and migrate legacy application functionality from the mainframe to Java/J2EE.
- Developed the application using Core Java, J2EE, Hibernate and Oracle.
- Worked on JavaScript, JSP, and Servlets as a web application replacement for mainframe front end.
- Wrote customization codes for FIT to adapt to the application requirements.
- Identified bugs in the migrated application by running test cases and using Eclipse IDE.
- Wrote Ant scripts and handled deployments on WebLogic servers at the test and stage levels.
- Analyzed, created and proposed remediation measures to fix the bugs in the application.
- Used log4j for logging and monitoring errors and exceptions across the application (see the logging sketch at the end of this section).
- Analyzed database table relationships to understand legacy application.
- Coordinated with Database Administrators to analyze and optimize the application load on database.
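
A minimal sketch of the log4j usage pattern referenced above; the class name, method and messages are illustrative assumptions.

```java
// Sketch only: log4j 1.x logging of informational messages and exceptions with stack traces.
import org.apache.log4j.Logger;

public class AccountMigrationService {
    private static final Logger LOG = Logger.getLogger(AccountMigrationService.class);

    public void migrateAccount(String accountId) {
        LOG.info("Starting migration for account " + accountId);
        try {
            // ... migration logic ported from the mainframe application ...
        } catch (RuntimeException e) {
            // Pass the exception so the full stack trace is written to the configured appenders.
            LOG.error("Migration failed for account " + accountId, e);
            throw e;
        }
    }
}
```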