Sr. Hadoop/Spark Developer Resume
Pleasanton, CA
PROFESSIONAL SUMMARY:
- Overall 7 years of IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
- Excellent knowledge of the Hadoop ecosystem, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, JSON, and Avro.
- Good experience with Big Data technologies including the Hadoop framework, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, Oozie, and Storm.
- Hands-on experience using Spark Streaming and batch processing to process streaming data.
- Working experience with the Cloudera platform using VMware Player in a CentOS 6 Linux environment; strong experience with Hadoop distributions such as Cloudera and Hortonworks.
- Good knowledge of NoSQL databases such as MongoDB and HBase.
- Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL/DML SQL queries.
- Used Spark SQL and HiveQL queries to analyze data in HDFS.
- Worked on HBase to load and retrieve data for real-time processing using its REST API.
- Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Good working experience using Sqoop to import data from RDBMS into HDFS or Hive and to export data from HDFS or Hive back to RDBMS.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs); a minimal Hive UDF sketch follows this summary.
- Worked with Big Data distributions such as Cloudera (CDH 3 and 4) and Hortonworks HDP 2.4.
- Worked on ETL tools like Talend to simplify MapReduce jobs from the front end; also knowledgeable in Pentaho and Informatica as alternative ETL tools for Big Data.
- Worked with BI tools like Tableau for report creation and further analysis from the front end.
- Extensive knowledge in using SQL queries for backend database analysis.
- Involved in unit testing of MapReduce programs using Apache MRUnit.
- Worked on Amazon Web Services and EC2.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, Spring, Hibernate, JDBC.
- Experience in creating Reusable Transformations (Joiner, Sorter, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Sequence Generator, Normalizer, and Rank) and Mappings using Informatica Designer, and processing tasks using Workflow Manager to move data from multiple sources into targets.
- Implemented SOAP based web services.
- Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
- Experience working with Build tools like Maven and Ant.
- Experienced in both Waterfall and Agile Development (SCRUM) methodologies
- Strong Problem Solving and Analytical skills and abilities to make Balanced & Independent Decisions
- Experience in developing service components using JDBC.
- Coordinated with the Offshore and Onshore teams for Production Releases.
- Good analytical, problem-solving, and communication skills, with the ability to work either independently or as part of a team.
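A minimal sketch of the kind of custom Hive UDF mentioned above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name, function name, and normalization logic are hypothetical placeholders, not a specific production UDF.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF that normalizes free-form text before analysis.
// Registered in Hive with, e.g.:
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'com.example.hive.NormalizeTextUDF';
public class NormalizeTextUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through unchanged
        }
        String cleaned = input.toString()
                .trim()
                .toLowerCase()
                .replaceAll("\\s+", " ");      // collapse repeated whitespace
        return new Text(cleaned);
    }
}
```

Once registered, the function can be called from HiveQL like any built-in, e.g. SELECT normalize_text(description) FROM products (table and column names hypothetical).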
TECHNICAL SKILLS:
Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution, HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, ZooKeeper, and Oozie
Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts.
NOSQL Databases: HBase, MongoDB
Programming Languages: Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting
Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML
Application Servers: WebLogic, WebSphere, JBoss
Cloud Computing tools: Amazon AWS.
Build Tools: Jenkins, Maven, ANT
Databases: MySQL, Oracle, DB2
Business Intelligence Tools: Splunk, Talend
Development Methodologies: Agile/Scrum, Waterfall.
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans.
Operating Systems: WINDOWS, MAC OS, UNIX, LINUX.
PROFESSIONAL EXPERIENCE:
Confidential, Pleasanton, CA
Sr. Hadoop/Spark developer
Responsibilities:
- The main aim of the project was to tune the performance of existing Hive queries and to prepare Spark jobs scheduled daily on Tez.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Responsible for Spark Streaming configuration based on the type of input.
- Performed real-time streaming of data using Spark, SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN; a minimal Kafka-to-Spark Streaming sketch follows this list.
- Wrote MapReduce jobs to parse the web logs stored in HDFS.
- Developed the services to run the MapReduce jobs on an as-required basis.
- Imported and exported data into HDFS, Hive, and Pig using Sqoop.
- Responsible for managing data coming from different sources.
- Monitoring the running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
- Wrote MapReduce (Hadoop) programs to convert text files into Avro format and load them into Hive tables.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
- Developed design documents considering all possible approaches and identifying the best of them.
- Loaded data into HBase using both bulk and non-bulk loads.
- Developed scripts and automated end-to-end data management and synchronization between all the clusters.
- Imported data from different sources like HDFS and HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second, and used the Kafka 0.8.3 producer API to produce messages.
- Involved in gathering the requirements, designing, development and testing.
- Followed agile methodology for the entire project.
- Prepared technical design documents and detailed design documents.
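A minimal Java sketch of the Kafka-to-Spark Streaming flow described in the bullets above, assuming the Spark 1.x spark-streaming-kafka (0.8 direct stream) integration; the broker addresses, topic name, batch interval, and the count-per-batch logic are hypothetical placeholders for the real parsing and aggregation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class WebLogStreamingJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("WebLogStreamingJob");
        // Micro-batches every 10 seconds (interval is an assumption).
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092"); // hypothetical brokers
        Set<String> topics = new HashSet<>(Arrays.asList("weblogs"));          // hypothetical topic

        // Direct (receiver-less) stream of (key, value) pairs from Kafka.
        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Count log lines per batch as a stand-in for the real parsing/aggregation logic.
        stream.map(record -> record._2())
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```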
Environment: Hadoop, Spark Core, Spark SQL, Spark Streaming, HDFS, MapReduce, Hive, HBase, Flume, Impala, Pig, Sqoop, Oozie, Oracle, YARN, Maven, GitHub, JUnit, Unix, Cloudera, Java, Scala, Python.
Confidential, Houston, TX
Spark/Hadoop developer
Responsibilities:
- Performed data ingestion into the Indie Data Lake using an open-source Hadoop distribution, processing structured, semi-structured, and unstructured datasets into the Hive environment with open-source Apache tools such as Flume and Sqoop (on the IBM BigInsights 4.1 platform).
- Developed MapReduce jobs with the Java API to parse the raw data and store the refined data; a minimal Java sketch follows this list.
- Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Consumed the data from Kafka queue using Spark.
- Developed Sqoop Scripts to extract data from DB2 EDW source databases into HDFS.
- Configured different topologies for the Spark cluster and deployed them on a regular basis.
- Involved in loading data from LINUX file system to HDFS.
- Created ETL packages with different data sources (SQL Server, Flat Files, Excel source files, XML files etc.) and then loaded the data into destination tables by performing complex transformations using SSIS/DTS packages.
- Designed & developed various SSIS packages (ETL) to extract & transform data & involved in Scheduling SSIS Packages.
- Imported and exported data into HDFS and Hive using Sqoop.
- Managed and reviewed Hadoop log files; installed and deployed IBM WebSphere.
- Imported millions of structured records from relational databases using Sqoop to process with Spark, and stored the data in HDFS in Parquet format.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Automated and scheduled the Sqoop jobs in a timely manner using Unix shell scripts.
- Involved in performing linear regression using the Scala API and Spark.
- Installed and configured MapReduce, Hive, and HDFS; created MapReduce programs for refined queries on big data.
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Involved in the development of Pig UDFs to pre-process and analyze the data.
- Used the DataFrame API in Java to work with distributed collections of data organized into named columns, and developed solutions to pre-process large sets of structured data in different file formats (text, Avro, SequenceFile, XML, JSON, ORC, and Parquet).
- Created RDDs in Spark and extracted data from the data warehouse onto Spark RDDs.
- Troubleshooting experience in debugging and fixing incorrect or missing data in both Oracle Database and MongoDB.
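A minimal Java sketch of the kind of raw-data parsing MapReduce job referenced above; the pipe-delimited input format, the field positions, and the class names are hypothetical, and the per-status-code count stands in for the real refinement logic.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Parses delimited raw records and counts them per status code (hypothetical schema).
public class RawRecordParserJob {

    public static class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text statusCode = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");   // assume pipe-delimited raw data
            if (fields.length > 3) {                            // skip malformed lines
                statusCode.set(fields[3]);                      // assume 4th field is a status code
                context.write(statusCode, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw-record-parser");
        job.setJarByClass(RawRecordParserJob.class);
        job.setMapperClass(ParseMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner reuses the reducer to cut shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```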
Environment: HDFS, MapReduce, Java API, JSP, JavaBean, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, Yarn, Eclipse, Spring, PL/SQL, Unix Shell Scripting, Cloudera
Confidential, NY
Hadoop Developer
Responsibilities:
- Worked on tuning Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Developed different components of system like Hadoop process that involves MapReduce, and Hive.
- Wrote Hive queries using optimization techniques such as user-defined functions and customized Hadoop shuffle and sort parameters.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline; this pipeline also uses Amazon Web Services EMR, S3, and RDS.
- Worked on creating, dropping, and altering tables at run time without blocking updates and queries, using HBase and Hive.
- Used HCatalog to access Hive table metadata from MapReduce or Pig code.
- Developed complex queries using Hive and Impala; a minimal Hive JDBC sketch follows this list.
- Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest claim data and financial histories into HDFS for analysis.
- Worked on importing and exporting data between HDFS and a MySQL database using Sqoop.
- Developed Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Developed MapReduce programs for different types of files using combiners along with UDFs and UDAFs.
- Designed the ETL process and created the high level design document including the logical data flows, source data extraction process, the database staging and the extract creation, source archival, job scheduling and Error Handling.
- Worked on the Talend ETL tool and used features such as context variables and database components like tOracleInput, tOracleOutput, tFileCompare, tFileCopy, and tOracleClose.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
- Involved in creating Hive tables and Pig relations, loading data, and writing Hive queries and Pig scripts.
- Created, altered, and deleted Kafka topics (queues) as required, and performed performance tuning using partitioning and bucketing of Impala tables.
- Experience with NoSQL databases such as HBase and MongoDB.
- Load and transform large sets of structured, semi structured and unstructured data.
- Involved in loading data from UNIX file system to HDFS.
- Created an e-mail notification service that notifies the requesting team upon completion of a job.
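A minimal Java sketch of running one of the Hive queries described above over JDBC, assuming a HiveServer2 endpoint; the host, database, table, and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveClaimQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the URL, schema, and credentials below are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver2-host:10000/claims_db", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Aggregate claim amounts per month from a hypothetical table.
            ResultSet rs = stmt.executeQuery(
                    "SELECT claim_month, SUM(claim_amount) AS total_amount " +
                    "FROM claim_history " +
                    "GROUP BY claim_month " +
                    "ORDER BY claim_month");

            while (rs.next()) {
                System.out.println(rs.getString("claim_month") + "\t" + rs.getDouble("total_amount"));
            }
        }
    }
}
```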
Environment: Hadoop, Java, HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Flume, HBase, Oozie, SQL scripting, ETL tools, Linux shell scripting, Eclipse, and Cloudera.
Confidential, Wilmington, DE
Hadoop Developer
Responsibilities:
- Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications. Installed the Oozie workflow engine to run multiple Hive and Pig jobs. Used the Scala collections framework to store and process complex consumer information, and Scala functional programming concepts to develop business logic.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Managed and reviewed Hadoop log files; installed and deployed IBM WebSphere.
- Implemented the NoSQL database HBase and managed the other tools and processes running on YARN.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Tested raw data and executed performance scripts; shared responsibility for administration of Hadoop, Hive, and Pig; analyzed, validated, and documented the changed records for the IBM web application.
- Set up and benchmarked Hadoop/HBase clusters for internal use; assisted the development team with installing single-node Hadoop 224 on local machines.
- Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Written Hive queries for data analysis to meet the business requirements.
- The new data items are used for further analytics and reporting; the solution has Cognos reports as the BI component, with analysis and data visualization in Tableau. Wrote Pig scripts for data processing.
- Experience deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere, and Oracle Application Server.
- Created Pig Latin scripts to sort, group, join and filter the enterprise wise data.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Load and transform large sets of structured, semi structured and unstructured data.
- Validated the data using MD5 checksums; a minimal Java sketch follows this list.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Involved in Configuring core-site.xml and mapred-site.xml according to the multi node cluster environment.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard. Loaded the aggregated data onto DB2 for reporting on the dashboard.
- Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts.
- Used AVRO, Parquet file formats for serialization of data.
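A minimal Java sketch of the MD5-based validation mentioned above, computing a checksum for a local file so it can be compared against the value recorded by the source system; the file path and the comparison workflow are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class Md5FileValidator {

    // Computes the hex-encoded MD5 digest of a file, streaming it in 8 KB chunks.
    public static String md5Of(String path) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String actual = md5Of(args[0]);          // e.g. a landed data file
        String expected = args[1];               // checksum recorded by the source system
        System.out.println(actual.equalsIgnoreCase(expected) ? "VALID" : "CORRUPT");
    }
}
```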
Environment: Big Data/Hadoop, Python, Java, Agile, Spark Streaming, HDFS, Map-Reduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, DB2, NoSQL, HBase, IBM WebSphere, Tomcat and Tableau.
Confidential
Java/ J2EE Developer
Responsibilities:
- Involved in SDLC requirements gathering, analysis, design, development, and testing of the application, developed using Agile methodology.
- Implemented the persistence layer using Hibernate to interact with the Oracle database; used the Hibernate framework for object-relational mapping and persistence (a minimal DAO sketch follows this list).
- Actively participated in Object Oriented Analysis Design sessions of the Project, which is based on MVC Architecture using Spring Framework.
- Developed user interfaces using JSP, JSF frame work with AJAX, Java Script, HTML, DHTML, and CSS.
- Developed software in Java/J2EE, XML, Oracle, EJB, Struts, and enterprise architecture.
- Developed Servlets and JSPs based on MVC pattern using Spring Framework.
- Developed web services using SOAP, SOA, WSDL, and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing, and design) to communicate with the Active Directory application using a RESTful API.
- Used the Struts Tiles library for web page layout and performed validations using the Struts validation framework.
- Involved in Daily Scrum meetings, Sprint planning and estimation of the tasks for the user stories, participated in retrospective and presenting Demo at end of the sprint.
- Experience in writing PL/SQL stored procedures, functions, triggers, Oracle reports, and complex SQL queries.
- Performed COTS evaluation and implementation for a reporting tool, which resulted in choosing Business Objects.
- Experience in developing Unit testing & Integration testing with unit testing frameworks like JUnit, Mockito, TestNG, Jersey Test and Power Mocks.
- Designed and developed entire application implementing MVC Architecture.
- Developed the application front end using Bootstrap, JavaScript, and the AngularJS (Model-View-Controller) framework.
- Used Spring framework for implementing IOC/JDBC/ORM, AOP and Spring Security.
- Worked with Java, J2EE, Spring 4.0, RESTful web services, and WebSphere 5.0/6.0 in a fast-paced development environment.
- Proficient in developing applications with exposure to Java, JSP, UML, Servlets, Struts, Swing, DB2, Oracle (SQL, PL/SQL), HTML, JUnit, JSF, JavaScript, and CSS.
- Proactively found the issues and resolved them.
- Established efficient communication between teams to resolve issues.
- Proposed an innovative logging approach for all interdependent applications.
- Successfully delivered all product deliverables with zero defects.
- Implemented an FTP utility program, using socket programming, to recursively copy the contents of an entire directory up to two levels deep from a remote location.
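A minimal Java sketch of the Hibernate-backed persistence layer described above; the Account entity, the DAO methods, and the hibernate.cfg.xml configuration they rely on are hypothetical, not the project's actual domain model.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

// Hypothetical entity; assumed to be mapped via Account.hbm.xml or annotations.
class Account {
    private Long id;
    private String owner;
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getOwner() { return owner; }
    public void setOwner(String owner) { this.owner = owner; }
}

// Hypothetical DAO persisting the Account entity through Hibernate ORM.
public class AccountDao {

    // SessionFactory built once from hibernate.cfg.xml (Oracle dialect assumed there).
    private static final SessionFactory SESSION_FACTORY =
            new Configuration().configure().buildSessionFactory();

    public Long save(Account account) {
        Session session = SESSION_FACTORY.openSession();
        Transaction tx = null;
        try {
            tx = session.beginTransaction();
            Long id = (Long) session.save(account);   // INSERT via the ORM mapping
            tx.commit();
            return id;
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();
            }
            throw e;
        } finally {
            session.close();
        }
    }

    public Account findById(Long id) {
        Session session = SESSION_FACTORY.openSession();
        try {
            return (Account) session.get(Account.class, id);   // SELECT by primary key
        } finally {
            session.close();
        }
    }
}
```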
Environment: Java, J2EE, Spring, Hibernate, Struts, JSF, EJB, MYSQL, Oracle, SQL Server, DB2, PL/SQL, JavaScript, JQuery, Servlets, JSP, HTML, CSS, Agile Methodology, Eclipse, WebLogic Application Server, UNIX, XML, Junit, SOAP, Restful Web services, JDBC.
Confidential
Java /J2EE Developer
Responsibilities:
- Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
- Extensively worked on core Java (collections with generics, and interfaces for passing data from the GUI layer to the business layer).
- Developed the web interface for user modules using JSP, HTML, XML, CSS, JavaScript, and AJAX.
- Developed using J2EE design patterns such as Command, Session Facade, Business Delegate, Service Locator, Data Access Object, and Value Object.
- Analyzed, designed and implemented Online Enrollment Web Application using Struts, JSTL, Hibernate, UML, Design Patterns and Log4J.
- Developed Custom tags, JSTL to support custom User Interfaces.
- Designed the user interfaces using JSP.
- Designed and Implemented MVC architecture using Struts Framework, Coding involves writing Action Classes/Custom Tag Libraries, JSP.
- Experienced in MS SQL Server 2005, writing Stored Procedures, SSIS Packages, Functions, and Triggers & Views.
- Developed Action Forms and Controllers in the Struts 1.2 framework, utilizing Struts features such as Tiles, tag libraries, and declarative exception handling via XML for the design.
- Followed Scrum and iterative Agile methodologies for web application development.
- Implemented business processes such as user authentication and account transfer using session EJBs.
- Worked with Oracle Database to create tables, procedures, functions and select statements.
- Used Log4J to capture the log that includes runtime exceptions and developed WAR framework to alert the client and production support in case of application failures.
- Developed the DAOs using SQL and DataSource objects; a minimal JDBC sketch follows this list.
- Developed Stored Procedures, Triggers, Views, and Cursors using SQL Server 2005.
- Development carried out under Eclipse Integrated Development Environment (IDE).
- Used JBoss for deploying various components of application.
- Used Ant for build scripts.
- Used JUnit for testing and to check API performance.
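A minimal Java sketch of a DAO built on SQL and a DataSource object, as mentioned above; the table, columns, and the injected DataSource (typically obtained via JNDI from the application server) are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical DAO that reads customer names through a container-managed DataSource.
public class CustomerDao {

    private final DataSource dataSource;

    public CustomerDao(DataSource dataSource) {
        this.dataSource = dataSource;   // looked up via JNDI or injected by the container
    }

    public String findNameById(long customerId) throws SQLException {
        String sql = "SELECT name FROM customers WHERE id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;   // null when no match
            }
        }
    }
}
```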
Environment: Java EE 5, JSP 2.0, JavaBeans, EJB 3.0, JDBC, Application Server, Eclipse, Java API, J2SDK 1.4.2, JDK 1.5, JMS, message queues, web services, UML, XML, HTML, XHTML, JavaScript, Log4j, CVS, JUnit, Windows, and Sun OS 2.7/2.8.