Hadoop Spark Developer Resume
California
SUMMARY:
- Around 8 years of experience in the IT industry in application development and data analytics using various languages, including 3 years of experience in the Big Data Hadoop ecosystem and Java development; strong in design, requirement gathering, analysis, development, implementation, and support of applications in the roles of Java Developer and Big Data Hadoop Developer.
- Proficient in Big Data and ETL practices and technologies such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Flume, Kafka, Impala, and Spark.
- Excellent hands-on experience in developing Hadoop architecture on Windows and Linux platforms.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting the systems application architecture.
- Experience in writing MapReduce programs with custom logic based on requirements.
- Experience in writing custom UDFs in Hive based on user requirements.
- Strong knowledge of using Pig and Hive for processing and analyzing large volumes of data.
- Expert in extending the core functionality of Hive and Pig by writing custom UDFs in Java and Python based on user requirements (a brief Java UDF sketch follows this list).
- Experienced in writing Hive queries for data analysis and in processing data for visualization using Tableau and Splunk.
- Extensive experience with data Extraction, Transformation, and Loading (ETL) from disparate data sources such as relational databases (Oracle, SQL Server, and DB2), Informatica, VSAM, and flat files.
- Expertise in importing and exporting data using Sqoop between HDFS and various database systems (RDBMS, data warehouses, data lakes).
- Experience in ETL tools such as Talend and in-depth knowledge of how data warehouses work.
- Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume.
- Knowledge and experience in job workflow scheduling and monitoring tools such as Autosys, Control-M, Oozie, and ZooKeeper.
- Proficient in writing shell and Perl scripts and in Unix/Linux commands.
- Experience in storing and processing unstructured data using NoSQL databases such as HBase, MongoDB, and Cassandra.
- Good knowledge of AWS concepts such as EC2, S3, and EMR.
- Experience in performing unit, regression, functional, and integration testing, and in using MRUnit for Hadoop testing.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using Java, Scala, and Python on Linux/UNIX platforms.
- Loaded and transformed large sets of structured data from Oracle, SQL Server, and Informatica into HDFS using Talend Big Data Studio.
- Migrated Informatica code to Hive/Impala.
- Optimized and performance-tuned HiveQL and formatted table columns using Hive functions.
- Involved in designing and creating Hive tables to load data into Hadoop and in processing such as merging, sorting, and joining tables.
- Extensive experience in middle-tier development using J2EE technologies and frameworks such as Struts, RESTful web services, SOA, Spring, and Hibernate, and application servers such as GlassFish and Tomcat.
- Proficient in working with version control systems such as Git and SVN and with the bug tracking tool JIRA.
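The following is a minimal sketch of the kind of custom Hive UDF described above, written against the standard org.apache.hadoop.hive.ql.exec.UDF API; the class name and the phone-number normalization requirement are hypothetical illustrations, not a specific project deliverable.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes free-form phone numbers to digits only.
public final class NormalizePhoneUDF extends UDF {

    // Hive invokes evaluate() once per row; returning null preserves NULL semantics.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String digitsOnly = input.toString().replaceAll("[^0-9]", "");
        return new Text(digitsOnly);
    }
}
```

Packaged into a JAR, a function like this is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.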
TECHNICAL SKILLS:
Big Data Ecosystem Components: HDFS, MapReduce, YARN, Hive, Sqoop, Pig, HBase, ZooKeeper, Oozie, Impala, Flume, Kafka, Spark, Talend
Programming Languages: Java, C, C++, Python, PL/SQL, Scala, shell scripting
J2EE Components: JDBC, JSP, Servlets, EJBs, and Design Patterns.
Frameworks: Spring, Hibernate, Struts, RESTful and SOAP web services
Databases: Oracle, MySQL, SQL Server, Cassandra.
Application Servers: Tomcat, GlassFish, WebSphere and WebLogic
Build Tools and Version Control: Maven, Ant, Git and SVN.
IDEs: Eclipse, NetBeans, Sublime
Methodology: Agile and Waterfall
Environment: Windows and Linux
PROFESSIONAL EXPERIENCE:
Confidential, California
Hadoop Spark Developer
Responsibilities:
- Performed data migration from a traditional data warehouse (Teradata) to the Hadoop data lake.
- Implemented data ingestion from multiple sources such as Teradata and Oracle into Hadoop using Sqoop.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HBase/HDFS for further analysis.
- Collected log data from web servers and integrated it into HBase using Flume.
- Developed MapReduce programs that filter unnecessary records and identify unique records based on different criteria.
- Responsible for performing extensive data validation using Hive.
- Implemented POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Used various Talend Hadoop components such as Hive, Pig, and Spark.
- Loaded data into Parquet files by applying transformations using Impala.
- Implemented and tested analytical solutions in Hadoop.
- Coordinated with ETL developers on the preparation of Hive and Pig scripts.
- Utilized SQL scripts for supporting existing applications.
- Worked on SequenceFiles, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro.
- Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Migrated complex MapReduce programs into Spark RDD transformations and actions.
- Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN with Scala.
- Implemented Spark Core in Scala to process data in memory.
- Used Spark APIs in Scala for real-time analysis and fast querying.
- Imported and exported data into HDFS, Hive, and HBase using Sqoop.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance (see the sketch after this list).
- Used Kafka and Flume to build a robust, fault-tolerant data ingestion pipeline for transporting streaming web log data into HDFS.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used Maven to build and deploy JARs for MapReduce, Hive, and Pig UDFs.
- Developed HiveQL to process data and generate reports for visualization.
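As a rough illustration of the Hive-to-Spark migrations above, the sketch below expresses a simple Hive aggregation with the Spark DataFrame API in Java; the warehouse.orders table, its columns, and the output path are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToSparkMigration {
    public static void main(String[] args) {
        // enableHiveSupport() lets Spark read existing Hive tables via the metastore.
        SparkSession spark = SparkSession.builder()
                .appName("HiveToSparkMigration")
                .enableHiveSupport()
                .getOrCreate();

        // Equivalent of a HiveQL GROUP BY, expressed with DataFrame transformations.
        Dataset<Row> orders = spark.table("warehouse.orders");   // hypothetical table
        Dataset<Row> dailyTotals = orders
                .filter("order_status = 'COMPLETE'")             // hypothetical column
                .groupBy("order_date")
                .count();

        // Persist the result as Parquet for downstream reporting.
        dailyTotals.write().mode("overwrite").parquet("hdfs:///data/reports/daily_totals");
        spark.stop();
    }
}
```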
Environment: Hadoop, MapR, MapReduce, HDFS, Hive, Pig, ZooKeeper, PySpark, Spark SQL, Spark Streaming, Scala, Python, Impala, EDW, Maven, Jenkins, Sqoop, Oozie, Kafka, Teradata.
Confidential, NJ
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal map-only filter is sketched after this list).
- Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
- Used Sqoop extensively to import data from RDBMS sources (DB2) into HDFS and to move data between MongoDB and HDFS.
- Involved in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing SerDes such as RegEx.
- Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL) and Hive UDFs in Python.
- Created scalable, high-performance REST web services for data tracking.
- Applied Hive queries against HBase through a storage handler to analyze large volumes of data and meet business requirements.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Worked on NoSQL databases such as HBase and integrated them with a Storm topology that accepts inputs from a Kafka producer.
- Planned, implemented, and managed Splunk for log management and analytics.
- Implemented Scala jobs to integrate and parse real-time data coming from various messaging queues.
- Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations.
- Loaded and transformed large sets of structured and semi-structured data.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in writing Unix/Linux shell scripts for scheduling jobs and in writing Pig scripts and HiveQL.
- Provided cluster coordination services through ZooKeeper.
- Involved in upgrading clusters to newer Cloudera distribution versions.
- Documented operational problems, following standards and procedures, using the issue tracking tool JIRA.
- Automated builds upon check-in with Jenkins CI (Continuous Integration).
- Implemented test scripts to support test-driven development and continuous integration.
- Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.
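A minimal sketch of the kind of map-only cleaning job described above, using the standard Hadoop MapReduce Java API; the pipe-delimited record format and expected field count are hypothetical assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCleanerJob {

    public static class CleaningMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private static final int EXPECTED_FIELDS = 7; // assumed record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            // Emit only non-empty, well-formed records; everything else is dropped.
            if (!line.isEmpty() && line.split("\\|", -1).length == EXPECTED_FIELDS) {
                context.write(value, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-cleaner");
        job.setJarByClass(RecordCleanerJob.class);
        job.setMapperClass(CleaningMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```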
Environment: Hadoop, MapReduce, HDFS, Sqoop, Flume, Kafka, Linux, Oozie, Python, Splunk, Pig, Scala, ETL, MySQL, JIRA, Hive, Jenkins, HBase, DB2, MongoDB, Cloudera Hadoop cluster.
Confidential, MI
Hadoop Admin/Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the cluster.
- Administered cluster maintenance, commissioning and decommissioning of data nodes, and cluster monitoring and troubleshooting.
- Added and removed nodes in an existing Hadoop cluster.
- Implemented backup configurations and recovery from NameNode failures.
- Monitored systems and services; worked on architecture design and implementation of the Hadoop deployment, configuration management, and backup and disaster recovery systems and procedures.
- Implemented multiple MapReduce programs in Java for Data Analysis.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the Enterprise Data Warehouse (EDW).
- Created Hive queries for market analysts to analyze emerging data and compare it against EDW reference tables.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a POC on Impala.
- Involved in migrating HiveQL to Impala to minimize query response time.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Worked on SequenceFiles, ORC and RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Performed extensive data mining using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases (MySQL) for comparison with historical data.
- Worked on Storm real-time processing bolts that save data to Solr and HBase (the HBase write is sketched after this list).
- Built a POC for enabling member and suspect search using Solr.
- Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations.
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
- Maintained and managed code using Git and used JIRA for bug tracking.
- Performed complex Linux administrative activities and created, maintained, and updated Linux shell scripts.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
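A minimal sketch of the per-record HBase write such a Storm bolt performs, using the standard HBase Java client; the events table and its column family and qualifier are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EventHBaseWriter {

    // Writes one event row; table name "events" and column family "d" are assumptions.
    public static void writeEvent(String rowKey, String payload) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("events"))) {
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(payload));
            table.put(put);
        }
    }
}
```

In an actual bolt, the Connection would be created once in prepare() and reused across tuples rather than opened per write, since connection setup is expensive.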
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Storm, Solr, Flume, JIRA, Control-M, Java, Linux, Maven, ZooKeeper, EDW, Git, ETL, Tableau, Cloudera, MySQL.
Confidential, Dallas, TX
Java Hadoop Developer
Responsibilities:
- Involved in gathering requirements and converting the requirement into technical specifications.
- Implemented various OOP concepts and design patterns such as IoC (DI), Singleton, DAO, and Prototype.
- Designed and implemented the Spring UI layer for the application using Spring MVC and JavaScript.
- Involved in developing RESTful web services, deployment configurations, and testing using Jersey (a minimal resource class is sketched after this list).
- Involved in setting up and monitoring the Hadoop cluster along with the administrator.
- Responsible for managing data coming from different sources.
- Involved in HDFS maintenance and loading of structured and unstructured data into HDFS.
- Developed and supported MapReduce programs to filter unstructured data and jobs running on the Hadoop cluster.
- Implemented regular import and export of data between MySQL and HDFS using Sqoop.
- Used Flume to load data from different sources, such as file systems and servers, into HDFS.
- Created partitioned Hive tables and wrote Hive queries for Data Analysis to meet the Business requirements.
- Designed and developed UDFs to extend functionality in both Pig and Hive.
- Developed scripts and batch jobs to schedule various Hadoop Programs.
- Developed Hibernate with Spring integration as the data abstraction layer to interact with the database.
- Configured Apache HTTP server and Apache Tomcat Server.
- Designed and maintained Control-M workflows to manage the flow of jobs in the cluster.
- Involved in unit testing, developed JUnit test cases, and used mocking frameworks such as Mockito and REST client UIs.
- Actively provided upper management with daily updates on project progress, including the classification levels achieved on the data.
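A minimal sketch of a Jersey (JAX-RS) resource of the kind described above; the /tracking path, payload, and fields are hypothetical.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical data-tracking endpoint; a real resource would delegate to a service/DAO layer.
@Path("/tracking")
public class TrackingResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getRecord(@PathParam("id") String id) {
        // Lookup logic omitted; a hard-coded body stands in for the real response.
        String body = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
        return Response.ok(body).build();
    }
}
```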
Environment: Java, Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Control-M, Linux, MySQL, J2EE, Spring, Spring MVC, Hibernate, SQL, RESTful Web Services, Apache Tomcat, JUnit, Maven, HTML, JSP.
Confidential
Java developer
Responsibilities:
- Extensively worked on Struts Framework.
- Assisted in designing the application using the MVC design pattern.
- Developed front-end user interface modules using HTML, XML, Java AWT, and Swing.
- Carried out front-end validation of user requests using JavaScript.
- Designed and developed the interacting JSPs and Servlets for modules such as User Authentication and Summary Display.
- Used JConsole for memory management.
- Developed Action and ActionForm classes, Front Controllers, Singleton classes, Transfer Objects (TO), Business Delegates (BD), Session Facades, Data Access Objects (DAO), and business validators.
- Analyzed, designed, implemented, and integrated the product into the existing application.
- Wrote JBoss Quartz jobs for scheduling.
- Communicated with other components using JMS within the system.
- Designed and developed web services implementing an SOA architecture using SOAP and XML for the module, and published (exposed) the web services.
- Used JDBC to connect the J2EE server to the relational database (a minimal DAO sketch follows this list).
- Performed user input validation using JavaScript and developed use cases using UML.
- Applied Extreme Programming methodologies for replacing existing code and testing in the J2EE environment.
- Developed web pages using HTML5, DOM, CSS3, JSON, JavaScript, JQuery and AJAX.
- Implemented applications using Bootstrap framework.
- Worked on developing internal customer service representative (CSR) tools.
- Redesigned the service plan page to dynamically display service products based on user selection.
- Created a functional requirements document for the rental car industry with telematics system capability and use cases for integrating Verizon managed certificate services into the Verizon M2M Management Center.
- Developed Java classes for the business layer.
- Developed database objects like tables, views, stored procedures, indexes.
- Involved in testing and fixing the bugs.
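A minimal sketch of the plain-JDBC access used behind the DAO layer described above; the Oracle connection URL, credentials, table, and columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserAuthenticationDao {

    // Hypothetical connection details; real code would read these from configuration.
    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/APPDB";
    private static final String USER = "app_user";
    private static final String PASSWORD = "changeit";

    // Returns true if a matching user row exists; parameters are bound to avoid SQL injection.
    public boolean isValidUser(String username, String passwordHash) throws SQLException {
        String sql = "SELECT 1 FROM app_users WHERE username = ? AND password_hash = ?";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, username);
            stmt.setString(2, passwordHash);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```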
Environment: Java, J2EE, JSP, Servlets, Struts, HTML, Maven, JavaScript, JDBC, Oracle (PL/SQL), DAO, Tomcat, JUnit, Eclipse.