Hadoop Spark Developer Resume
California
SUMMARY:
- Around 8 years of experience in the IT industry in application development and data analytics using various languages, including 3 years of experience in the Big Data Hadoop ecosystem and Java development; strong in design, requirement gathering, analysis, development, implementation, and support of applications in the roles of Java Developer and Big Data Hadoop Developer.
- Proficient in Big Data and ETL practices and technologies such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Flume, Kafka, Impala, and Spark.
- Excellent hands-on experience in developing Hadoop architecture on Windows and Linux platforms.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting the systems application architecture.
- Experience in writing MapReduce programs with custom logic based on requirements.
- Experience in writing custom UDFs in Hive based on user requirements.
- Strong knowledge of using Pig and Hive for processing and analyzing large volumes of data.
- Expert in extending the core functionality of Hive and Pig by writing custom UDFs in Java and Python based on user requirements (a brief Java UDF sketch follows this list).
- Experienced in writing Hive queries for data analysis and in processing data for visualization using Tableau and Splunk.
- Extensive experience with data Extraction, Transformation, and Loading (ETL) from disparate data sources such as relational databases (Oracle, SQL Server, and DB2), Informatica, VSAM, and flat files.
- Expertise in importing and exporting data using Sqoop between HDFS and various database systems (RDBMS, data warehouses, data lakes).
- Experience in ETL tools such as Talend and in-depth knowledge of how data warehouses work.
- Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume.
- Knowledge and experience in job workflow scheduling and monitoring tools such as Autosys, Control-M, Oozie, and ZooKeeper.
- Proficient in writing shell and Perl scripts and in Unix/Linux commands.
- Experience in storing and processing unstructured data using NoSQL databases such as HBase, MongoDB, and Cassandra.
- Good knowledge of AWS concepts such as EC2, S3, and EMR.
- Experience in performing unit, regression, functional, and integration testing, and in using MRUnit for Hadoop testing.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using Java, Scala, and Python on Linux/UNIX platforms.
- Loaded and transformed large sets of structured data from Oracle, SQL Server, and Informatica into HDFS using Talend Big Data Studio.
- Migrated Informatica code to Hive/Impala.
- Optimized and performance-tuned HiveQL and formatted table columns using Hive functions.
- Involved in designing and creating Hive tables to load data into Hadoop and in processing such as merging, sorting, and joining tables.
- Extensive experience in middle-tier development using J2EE technologies and frameworks such as Struts, RESTful web services, SOA, Spring, and Hibernate, and application servers such as GlassFish and Tomcat.
- Proficient in working with version control systems such as Git and SVN and with the bug tracking tool JIRA.
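The following is a minimal sketch of the kind of custom Hive UDF described above, written against the standard org.apache.hadoop.hive.ql.exec.UDF API; the class name and the phone-number normalization requirement are hypothetical illustrations, not a specific project deliverable.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes free-form phone numbers to digits only.
public final class NormalizePhoneUDF extends UDF {

    // Hive invokes evaluate() once per row; returning null preserves NULL semantics.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String digitsOnly = input.toString().replaceAll("[^0-9]", "");
        return new Text(digitsOnly);
    }
}
```

Packaged into a JAR, a function like this is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.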
TECHNICAL SKILLS:
Big Data Ecosystem Components: HDFS, MapReduce, YARN, Hive, Sqoop, Pig, HBase, ZooKeeper, Oozie, Impala, Flume, Kafka, Spark, Talend
Programming Languages: Java, C, C++, Python, PL/SQL, Scala, shell scripting
J2EE Components: JDBC, JSP, Servlets, EJBs, and Design Patterns.
Frameworks: Spring, Hibernate, Struts, RESTful and SOAP web services
Databases: Oracle, MySQL, SQL Server, Cassandra.
Application Servers: Tomcat, GlassFish, WebSphere and WebLogic
Build Tools and Version Control: Maven, Ant, Git and SVN.
IDEs: Eclipse, NetBeans, Sublime
Methodology: Agile and Waterfall
Environment: Windows and Linux
PROFESSIONAL EXPERIENCE:
Confidential, California
Hadoop Spark Developer
Responsibilities:
- Performed data migration from a traditional data warehouse (Teradata) to the Hadoop data lake.
- Implemented data ingestion from multiple sources such as Teradata and Oracle into Hadoop using Sqoop.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HBase/HDFS for further analysis.
- Collected log data from web servers and integrated it into HBase using Flume.
- Developed MapReduce programs that filter unnecessary records and identify unique records based on different criteria.
- Responsible for performing extensive data validation using Hive.
- Implemented POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Used various Talend Hadoop components such as Hive, Pig, and Spark.
- Loaded data into Parquet files by applying transformations using Impala.
- Implemented and tested analytical solutions in Hadoop.
- Coordinated with ETL developers on the preparation of Hive and Pig scripts.
- Utilized SQL scripts for supporting existing applications.
- Worked on SequenceFiles, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro.
- Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Migrated complex MapReduce programs into Spark RDD transformations and actions.
- Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN with Scala.
- Implemented Spark Core in Scala to process data in memory.
- Used Spark APIs in Scala for real-time analysis and fast querying.
- Imported and exported data into HDFS, Hive, and HBase using Sqoop.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance (see the sketch after this list).
- Used Kafka and Flume to build a robust, fault-tolerant data ingestion pipeline for transporting streaming web log data into HDFS.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used Maven to build and deploy JARs for MapReduce, Hive, and Pig UDFs.
- Developed HiveQL to process data and generate reports for visualization.
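As a rough illustration of the Hive-to-Spark migrations above, the sketch below expresses a simple Hive aggregation with the Spark DataFrame API in Java; the warehouse.orders table, its columns, and the output path are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToSparkMigration {
    public static void main(String[] args) {
        // enableHiveSupport() lets Spark read existing Hive tables via the metastore.
        SparkSession spark = SparkSession.builder()
                .appName("HiveToSparkMigration")
                .enableHiveSupport()
                .getOrCreate();

        // Equivalent of a HiveQL GROUP BY, expressed with DataFrame transformations.
        Dataset<Row> orders = spark.table("warehouse.orders");   // hypothetical table
        Dataset<Row> dailyTotals = orders
                .filter("order_status = 'COMPLETE'")             // hypothetical column
                .groupBy("order_date")
                .count();

        // Persist the result as Parquet for downstream reporting.
        dailyTotals.write().mode("overwrite").parquet("hdfs:///data/reports/daily_totals");
        spark.stop();
    }
}
```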
Environment: Hadoop, MapR, MapReduce, HDFS, Hive, Pig, ZooKeeper, PySpark, Spark SQL, Spark Streaming, Scala, Python, Impala, EDW, Maven, Jenkins, Sqoop, Oozie, Kafka, Teradata.
Confidential, NJ
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal map-only filter is sketched after this list).
- Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
- Used Sqoop extensively to import data from RDBMS sources (DB2) into HDFS and to move data between MongoDB and HDFS.
- Involved in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing SerDes such as RegEx.
- Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL) and Hive UDFs in Python.
- Created scalable, high-performance REST web services for data tracking.
- Applied Hive queries against HBase through a storage handler to analyze large volumes of data and meet business requirements.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Worked on NoSQL databases such as HBase and integrated them with a Storm topology that accepts inputs from a Kafka producer.
- Planned, implemented, and managed Splunk for log management and analytics.
- Implemented Scala jobs to integrate and parse real-time data coming from various messaging queues.
- Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations.
- Loaded and transformed large sets of structured and semi-structured data.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in writing Unix/Linux shell scripts for scheduling jobs and in writing Pig scripts and HiveQL.
- Provided cluster coordination services through ZooKeeper.
- Involved in upgrading clusters to newer Cloudera distribution versions.
- Documented operational problems, following standards and procedures, using the issue tracking tool JIRA.
- Automated builds upon check-in with Jenkins CI (Continuous Integration).
- Implemented test scripts to support test-driven development and continuous integration.
- Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.
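A minimal sketch of the kind of map-only cleaning job described above, using the standard Hadoop MapReduce Java API; the pipe-delimited record format and expected field count are hypothetical assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCleanerJob {

    public static class CleaningMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private static final int EXPECTED_FIELDS = 7; // assumed record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            // Emit only non-empty, well-formed records; everything else is dropped.
            if (!line.isEmpty() && line.split("\\|", -1).length == EXPECTED_FIELDS) {
                context.write(value, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-cleaner");
        job.setJarByClass(RecordCleanerJob.class);
        job.setMapperClass(CleaningMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```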
Environment: Hadoop, MapReduce, HDFS, Sqoop, Flume, Kafka, Linux, Oozie, Python, Splunk, Pig, Scala, ETL, MySQL, JIRA, Hive, Jenkins, HBase, DB2, MongoDB, Cloudera Hadoop cluster.
Confidential, MI
Hadoop Admin/Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the cluster.
- Administered cluster maintenance, commissioning and decommissioning of data nodes, and cluster monitoring and troubleshooting.
- Added and removed nodes in an existing Hadoop cluster.
- Implemented backup configurations and recovery from NameNode failures.
- Monitored systems and services; worked on architecture design and implementation of the Hadoop deployment, configuration management, and backup and disaster recovery systems and procedures.
- Implemented multiple MapReduce programs in Java for Data Analysis.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the Enterprise Data Warehouse (EDW).
- Created Hive queries for market analysts to analyze emerging data and compare it against EDW reference tables.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a POC on Impala.
- Involved in migrating HiveQL to Impala to minimize query response time.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Worked on SequenceFiles, ORC and RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Performed extensive data mining using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases (MySQL) for comparison with historical data.
- Worked on Storm real-time processing bolts that save data to Solr and HBase (the HBase write is sketched after this list).
- Built a POC for enabling member and suspect search using Solr.
- Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations.
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
- Maintained and managed code using Git and used JIRA for bug tracking.
- Performed complex Linux administrative activities and created, maintained, and updated Linux shell scripts.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
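A minimal sketch of the per-record HBase write such a Storm bolt performs, using the standard HBase Java client; the events table and its column family and qualifier are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EventHBaseWriter {

    // Writes one event row; table name "events" and column family "d" are assumptions.
    public static void writeEvent(String rowKey, String payload) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("events"))) {
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(payload));
            table.put(put);
        }
    }
}
```

In an actual bolt, the Connection would be created once in prepare() and reused across tuples rather than opened per write, since connection setup is expensive.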
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Storm, Solr, Flume, JIRA, Control-M, Java, Linux, Maven, ZooKeeper, EDW, Git, ETL, Tableau, Cloudera, MySQL.
Confidential, Dallas, TX
Java Hadoop Developer
Responsibilities:
- Involved in gathering requirements and converting the requirement into technical specifications.
- Implemented various OOP concepts and design patterns such as IoC (DI), Singleton, DAO, and Prototype.
- Designed and implemented the Spring UI layer for the application using Spring MVC and JavaScript.
- Involved in developing RESTful web services, deployment configurations, and testing using Jersey (a minimal resource class is sketched after this list).
- Involved in setting up and monitoring the Hadoop cluster along with the administrator.
- Responsible for managing data coming from different sources.
- Involved in HDFS maintenance and loading of structured and unstructured data into HDFS.
- Developed and supported MapReduce programs to filter unstructured data and jobs running on the Hadoop cluster.
- Implemented regular import and export of data between MySQL and HDFS using Sqoop.
- Used Flume to load data from different sources, such as file systems and servers, into HDFS.
- Created partitioned Hive tables and wrote Hive queries for Data Analysis to meet the Business requirements.
- Designed and developed UDFs to extend functionality in both Pig and Hive.
- Developed scripts and batch jobs to schedule various Hadoop Programs.
- Developed Hibernate with Spring integration as the data abstraction layer to interact with the database.
- Configured Apache HTTP server and Apache Tomcat Server.
- Designed and maintained Control-M workflows to manage the flow of jobs in the cluster.
- Involved in unit testing, developed JUnit test cases, and used mocking frameworks such as Mockito and REST client UIs.
- Actively provided upper management with daily updates on project progress, including the classification levels achieved on the data.
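A minimal sketch of a Jersey (JAX-RS) resource of the kind described above; the /tracking path, payload, and fields are hypothetical.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical data-tracking endpoint; a real resource would delegate to a service/DAO layer.
@Path("/tracking")
public class TrackingResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getRecord(@PathParam("id") String id) {
        // Lookup logic omitted; a hard-coded body stands in for the real response.
        String body = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
        return Response.ok(body).build();
    }
}
```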
Environment: Java, Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Control-M, Linux, MySQL, J2EE, Spring, Spring MVC, Hibernate, SQL, RESTful Web Services, Apache Tomcat, JUnit, Maven, HTML, JSP.
Confidential
Java developer
Responsibilities:
- Extensively worked on Struts Framework.
- Assisted in designing the application using the MVC design pattern.
- Developed front-end user interface modules using HTML, XML, Java AWT, and Swing.
- Carried out front-end validation of user requests using JavaScript.
- Designed and developed the interacting JSPs and Servlets for modules such as User Authentication and Summary Display.
- Used JConsole for memory management.
- Developed Action and ActionForm classes, Front Controllers, Singleton classes, Transfer Objects (TO), Business Delegates (BD), Session Facades, Data Access Objects (DAO), and business validators.
- Analyzed, designed, implemented, and integrated the product into the existing application.
- Wrote JBoss Quartz jobs for scheduling.
- Communicated with other components using JMS within the system.
- Designed and developed web services implementing an SOA architecture using SOAP and XML for the module, and published (exposed) the web services.
- Used JDBC to connect the J2EE server to the relational database (a minimal DAO sketch follows this list).
- Performed user input validation using JavaScript and developed use cases using UML.
- Applied Extreme Programming methodologies for replacing existing code and testing in the J2EE environment.
- Developed web pages using HTML5, DOM, CSS3, JSON, JavaScript, JQuery and AJAX.
- Implemented applications using Bootstrap framework.
- Worked on developing internal customer service representative (CSR) tools.
- Redesigned the service plan page to dynamically display service products based on user selection.
- Created a functional requirements document for the rental car industry with telematics system capability and use cases for integrating Verizon managed certificate services into the Verizon M2M Management Center.
- Developed Java classes for the business layer.
- Developed database objects like tables, views, stored procedures, indexes.
- Involved in testing and fixing the bugs.
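A minimal sketch of the plain-JDBC access used behind the DAO layer described above; the Oracle connection URL, credentials, table, and columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserAuthenticationDao {

    // Hypothetical connection details; real code would read these from configuration.
    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/APPDB";
    private static final String USER = "app_user";
    private static final String PASSWORD = "changeit";

    // Returns true if a matching user row exists; parameters are bound to avoid SQL injection.
    public boolean isValidUser(String username, String passwordHash) throws SQLException {
        String sql = "SELECT 1 FROM app_users WHERE username = ? AND password_hash = ?";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, username);
            stmt.setString(2, passwordHash);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```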
Environment: Java, J2EE, JSP, Servlets, Struts, HTML, Maven, JavaScript, JDBC, Oracle (PL/SQL), DAO, Tomcat, JUnit, Eclipse.