Sr Big Data Developer Resume
El Segundo, CA
SUMMARY
- Over 7 years of overall IT experience as a Developer, Designer & Database Administrator, with cross-platform integration experience using Big Data (Hadoop) and Java/J2EE.
- Around 5 years of experience exclusively in the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
- Participated in analysis, architecture and design of Data Lake solutions for clients.
- Worked extensively in various Data Migration, Data Ingestion and Data Analysis projects.
- Experience in using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters (CDH3, CDH4 & CDH5).
- Working knowledge of the MapR distribution (Converged Data Platform).
- In-depth knowledge and understanding of the MapR File System (MapR-FS) and MapR Streams.
- Good understanding of the Hadoop Distributed File System and ecosystem (MapReduce, Pig, Hive, Sqoop, and HBase).
- Successfully implemented a proof of concept for Spark using Python.
- Hands-on experience setting up single-node and multi-node clusters.
- Expertise in writing MapReduce jobs in native Java, and in using Pig and Hive for data processing.
- Experience working with the feed management tool Apache Falcon.
- Imported data into and exported data from HDFS and Hive using Sqoop.
- Imported and exported data between MySQL and HDFS using Sqoop.
- Database design, modeling, and performance tuning skills in AWS Redshift.
- Hands-on experience loading large data sets into AWS Redshift.
- Worked extensively with the Talend ETL data integration tool.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, and SPLIT to extract data from files and load it into HDFS.
- Wrote Hive queries for data analysis to meet the requirements.
- Created Hive tables to store data in HDFS and processed data using HiveQL.
- Experience developing e-commerce and highly distributed applications using Java, J2EE, Servlets, JSP, Struts, Spring, JDBC, Apache Tomcat, JavaScript, HTML, XML, XSL, SQL, and Oracle 11g/9i/8i/7.x under Unix and Windows NT/XP. Experience in distributed object-oriented component analysis and design on industry-leading J2EE platforms.
- Product knowledge of the TIBCO DataSynapse platform.
- Highly experienced in server-side development using Java/J2EE technologies, the Hibernate 3.0 ORM tool, and the Spring 2.5 and Struts 1.x/Tiles frameworks.
- Extensive working experience with multiple web/application servers such as WebSphere 6.0/5.1, WebLogic 10g/8.1/6.1, Oracle Application Server 10, and Tomcat 4.x.
- Experience with client-side web technologies including XML 1.0, XSL, HTML, DHTML, CSS, JavaScript, and jQuery. Experienced in parsing XML (DOM and SAX) and binding it with the JAXB 2.0 API.
- Highly proficient in SQL and PL/SQL development. Experienced with RDBMS implementation and development using Oracle 8/8i/9i, MySQL, SQL Server 2000, and DB2.
- Expertise in PL/SQL programming on Oracle and IBM DB2 databases.
- Experience working with NoSQL databases (HBase, Cassandra).
- Design and deployment of web applications with JSP, HTML, XML, JavaScript, AJAX, and Active Widgets.
- Experience in GUI design and application development using RAD 7, Eclipse 3.x, and MyEclipse 6.5.
- An excellent team player and self-starter with good communication skills and proven abilities to finish tasks before target deadlines.
TECHNICAL SKILLS
Big Data: Hadoop 2.7.2, HDFS, MapReduce, Pig 0.15.0, Hive, HBase 1.0.0 & 0.99.0, Sqoop 1.4.6, ZooKeeper 3.4.6, Flume, Oozie, Spark, Impala, Scala & Cassandra.
Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce) - CDH3, CDH4, CDH5, and Hortonworks Data Platform (HDP)
Languages: C, C++, Java, Perl, Unix Shell Scripts, Python
Client Technologies: JavaScript, CSS, HTML5, XHTML, jQuery
Web services: XML, SOAP, WSDL, SOA, JAX-WS, DOM, SAX, XPath, XSLT, UDDI, JAX-RPC, REST, and JAXB 2.0
Databases: MySQL, SQL, PL/SQL, MongoDB, Teradata, Cassandra
Web/Application Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1, AWS
IDE Development Tools: Eclipse 3.5, NetBeans, MyEclipse, Oracle JDeveloper 10.1.3, SoapUI, Ant, Maven, RAD
Operating System: Linux (RHEL, Ubuntu, CentOS), Windows (XP/7/8)
ETL: Informatica, Talend
PROFESSIONAL EXPERIENCE
Confidential - El Segundo, CA
Sr Big Data Developer
Responsibilities:
- Involved in all stages of Data Migration & Data Ingestion.
- Implemented a Big Data interface that retrieves customer information through a REST API, pre-processes the data with MapReduce, and stores it in HDFS.
- Successfully implemented the proof of concept for Spark.
- Used Python scripting for the Spark POC.
- Moved data between RDBMS and HDFS using Sqoop import and export.
- Configured Oozie workflows to automate data flow, preprocessing, and cleaning tasks using Hadoop actions; used Oozie shell actions, Java actions, and EL expressions.
- Wrote MapReduce jobs using the Java API.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Performed extract, transform, and load (ETL) operations on large sets of structured, semi-structured, and unstructured data from RDBMS using Talend.
- Worked with the Talend ETL tool for data integration.
- Implemented GenericWritable to feed multiple data sources into the reducer for recommendation-based reports built with MapReduce programs.
- Implemented optimized joins to analyze different data sets using MapReduce programs.
- Implemented HBase features such as compression and used them in designing and building MapReduce jobs.
- Optimized the shuffle and sort phase of MapReduce jobs.
- Implemented device-based business logic using Hive UDFs to perform ad hoc queries on structured data.
- Worked extensively with Hive DDL and Hive Query Language (HQL).
- Implemented dashboards that internally use Hive queries to perform analytics on structured Big Data, Avro, and JSON data to meet business requirements.
- Handled Avro and JSON data in Hive using Hive SerDes.
- Wrote MapReduce programs and Pig scripts that specify the conditions for separating fraudulent claims.
- Created Hive tables and worked with them using HiveQL. Imported data from Oracle Database into HDFS and exported it back using Sqoop.
- Implemented test scripts to support test-driven development and continuous integration.
- Wrote shell scripts to automate document indexing to SolrCloud in production.
- Managed and reviewed Hadoop log files.
- Used Pig as an ETL tool for transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and compressed CSV.
- Used AWS cloud services such as S3 and EC2.
Environment: Hadoop, Hive, MapReduce, HDFS, Pig, Sqoop, Maven, Jenkins, Java 6 (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Linux.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Wrote MapReduce jobs using the Java API.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Performed extract, transform, and load (ETL) operations on large sets of structured, semi-structured, and unstructured data.
- Imported data into and exported data from HDFS and Hive using Sqoop.
- Loaded data from the UNIX file system into HDFS.
- Worked as part of a team involved in designing the data lake and data pipelines.
- Created Hive tables, loaded them with data, and wrote Hive queries, which run internally as MapReduce jobs.
- Developed Hive queries to pre-process the data for analysis by imposing a read-only structure on the streamed data.
- Migrated ETL operations into Hadoop system using Pig Latin scripts for joins, filtering, and transformations.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
- Supported MapReduce programs that were running on the cluster.
- Integrated Spring schedulers with the Oozie client as beans to handle jobs.
- Took part in upgrading the Hadoop cluster from HDP 1.3 to HDP 2.0.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Implemented a data pipeline by chaining multiple mappers with ChainMapper.
- Created Hive dynamic partitions to load time-series data.
- Handled different types of Hive joins such as map joins, bucket map joins, and sorted bucket map joins.
- Created tables, partitions, and buckets, and performed analytics using Hive ad hoc queries.
- Used the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop 1.3, 2.0, MapReduce, Hive, Sqoop, Flume, Oozie, Java (JDK 1.6), UNIX Shell Scripting, Oracle 11g, Windows NT, Perl, IBM DataStage 8.1.
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Imported data into and exported data from HDFS and Hive using Sqoop.
- Developed Pig programs for loading and filtering streaming data ingested into HDFS using Flume.
- Handled data from different data sets, joining and preprocessing them using Pig join operations.
- Moved bulk data into HBase using MapReduce integration.
- Developed MapReduce programs to clean and aggregate the data.
- Developed an HBase data model on top of HDFS data to perform real-time analytics using the Java API.
- Developed various custom filters and applied predefined filters on HBase data using the API.
- Strong understanding of the Hadoop ecosystem, including HDFS, MapReduce, HBase, ZooKeeper, Pig, Hadoop Streaming, Sqoop, Oozie, and Hive.
- Implemented counters on HBase data to count total records in different tables.
- Handled Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
- Imported and exported data between HDFS/Hive and relational databases, including Teradata, using Sqoop.
- Handled continuous streaming data from different sources using Flume, with HDFS set as the destination.
Environment: Hadoop Framework, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, flat files, MySQL, CSV.
Confidential, Raleigh, NC
Hadoop Administrator/Developer
Responsibilities:
- Hands-on experience in installing, configuring, maintaining, monitoring, performance tuning, and troubleshooting Hadoop clusters in different environments such as development, test, and production clusters.
- Used the JobTracker to assign MapReduce tasks to TaskTrackers across the cluster nodes.
- Good experience addressing cluster audit findings and tuning configuration parameters.
- Implemented Kerberos security in all environments.
- Defined file system layout and data set permissions.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Demonstrated an understanding of the concepts, best practices, and functions needed to implement a Big Data solution in a corporate environment.
- Pulled data from Oracle databases into the Hadoop cluster.
- Helped design scalable Big Data clusters and solutions.
- Managed and reviewed data backups and log files; deployed Java applications on the cluster.
- Commissioned and decommissioned nodes as needed.
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues.
- Worked with network and Linux system engineers to define optimum network configurations, server hardware, and operating system.
- Evaluated and proposed new tools and technologies to meet the needs of the organization.
- Production support responsibilities included cluster maintenance.
Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Java (J2EE), XML, Microsoft (Word & Excel), Linux.
Confidential, Stamford, CT
Hadoop Developer (Full-Time)
Responsibilities:
- Participated in all phases of the Software Development Lifecycle (SDLC), including analysis, using Agile development methodology
- Involved in business requirement gathering and technical specifications
- Implemented J2EE standards, MVC architecture using Spring Framework
- Developed UI using AJAX and JSF and used GWT to implement AJAX in Application
- Used Servlets, JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface
- Used JUnit for testing purposes
- Used JMS and EJB for J2EE platform
- Built the presentation tier using the Spring framework
- Debugged production issues and developed pages using Java, JSP, and HTML as per requirements
- Used real-time services and batch processing during the project
- Marshalled XML files using JAXB
- Used Apache ANT and Maven to integrate the build process
- Consumed web services for data transfer from client to server and vice versa using Apache CXF, SOAP, and WSDL
- Worked with JSON for communication between the front end and middleware
- Used SoapUI for testing web services
- Used JNDI to perform lookup services for the various components of the system
- Used Spring Inversion of Control (IOC) to wire DAO using Hibernate
- Involved in fixing defects and unit testing with test cases using JUnit
- Wrote Perl and shell scripts
Environment: Java, J2EE, RAD 7.x, Struts 1.3.5, SQL Server 2008, JSP, CSS, JavaScript, WebSphere 6.0, Log4j, UNIX, XML, HTML, wireframes, TortoiseCVS.
Confidential
Jr. Java Developer
Responsibilities:
- Implemented J2EE standards, MVC2 architecture using Struts Framework
- Implemented Servlets, JSP, and Ajax to design the user interface
- Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface
- Used JBoss for EJB and JTA, and for caching and clustering purposes
- Used EJBs (session beans) to implement the business logic, JMS to send updates to various other applications, and MDBs to route priority requests
- Wrote the business logic of all modules in core Java
- Wrote Web Services using SOAP for sending and getting data from the external interface
- Used XSL/XSLT for transforming and displaying reports; developed schemas for XML
- Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework
- Used design patterns such as Business Delegate, Service Locator, Model-View-Controller, Session, and DAO
- Implemented the presentation layer with HTML, XHTML, JavaScript, and CSS
- Developed web components using JSP, Servlets, and JDBC
- Involved in fixing defects and unit testing with test cases using JUnit
- Developed user and technical documentation
- Made extensive use of the Java Naming and Directory Interface (JNDI) for looking up enterprise beans
- Developed presentation layer using HTML, CSS, and JavaScript
- Developed stored procedures and triggers in PL/SQL
Environment: Java (multithreading, collections), J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebSphere, JBoss, HTML5, DHTML, CSS, XML, Ant, Struts 1.3.8, JUnit, JSP, Servlets, Rational Rose, Hibernate, JDBC, MySQL, Apache Tomcat.
