Hadoop/Spark Developer Resume
Eagan, MN
PROFESSIONAL SUMMARY:
- Around 7 years of professional experience involving project development, implementation, deployment and maintenance using Java/J2EE and Big Data-related technologies.
- Hadoop Developer with 3 years of working experience in designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Flume, Oozie, Impala, HBase.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
- Experience in working with different Hadoop distributions like CDH and Hortonworks.
- Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extended the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
- Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
- In-depth understanding of Hadoop architecture and its various components such as Resource Manager, Application Master, Name Node, Data Node, HBase design principles, etc.
- Experience migrating data between RDBMS/unstructured sources and HDFS using Sqoop.
- Experience in job workflow scheduling and monitoring tools like Oozie and good knowledge on Zookeeper.
- Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch after this list).
- Worked on NoSQL databases including HBase and MongoDB.
- Experienced in performing CRUD operations using the HBase Java Client API and the Solr API.
- Good experience in working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
- Experience in Implementing Continuous Delivery pipeline with Maven, Ant, Jenkins and AWS.
- Experienced in Java Application Development, Client/Server Applications, Internet/Intranet based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Strong Experience in working with Databases like Oracle 10g, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent communication, interpersonal and analytical skills and a highly motivated team player with the ability to work independently.
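The Hive partitioning and bucketing pattern referenced above can be illustrated with a minimal sketch; it is written in Scala against the Spark 1.6-era HiveContext listed in the project environments below, and the table and column names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HivePartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-ddl-sketch"))
    val hc = new HiveContext(sc)

    // External table, partitioned by load date and bucketed on customer_id
    // (names are illustrative only).
    hc.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS orders_stage (
        |  order_id BIGINT, customer_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (load_dt STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/stage/orders'
        |TBLPROPERTIES ('orc.compress'='SNAPPY')""".stripMargin)

    // Filtering on the partition column prunes partitions instead of scanning the whole table.
    hc.sql(
      """SELECT customer_id, SUM(amount) AS total_amount
        |FROM orders_stage
        |WHERE load_dt = '2017-10-01'
        |GROUP BY customer_id""".stripMargin).show()
  }
}
```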
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, Zookeeper.
Programming Languages: Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL
Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery.
Development Tools: Eclipse, SVN, Git, Ant, Maven, SOAP UI
Databases: Greenplum, Oracle 11g/10g/9i, Teradata, MS SQL
NoSQL Databases: Apache HBase, MongoDB
Frameworks: Struts, Hibernate, and Spring MVC.
Distributed Platforms: Hortonworks, Cloudera.
Operating Systems: UNIX, Ubuntu Linux and Windows 2000/XP/Vista/7/8
PROFESSIONAL EXPERIENCE:
Confidential, Eagan, MN
Hadoop/Spark Developer
Responsibilities:
- Active member in developing POCs for real-time data processing applications using Scala; implemented Apache Spark Streaming from our streaming source, the WSO2 JMS message broker. JSON data for real-time processing is made available as event streams on the WSO2 ESB for streaming ingestion by Spark.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Involved in developing a POC in which Kafka brokers were configured to pipeline server log data into Spark Streaming for real-time processing (see the sketch after this list).
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive.
- Involved in adding data to new partitions in the Hive external staging table so data could be read by partition, and loaded the external Hive ORC tables with Snappy compression.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Developed HQL queries to implement select, insert, update, and delete operations against the database by creating HQL named queries.
- Used IIG (Infosys Information Grid) jobs so that batch files received via the PRIME data intake process are ingested into the HDFS RAW layer.
- Worked on the job scheduling workflow designed in the IIG tool, supported by the Airflow scheduling platform.
- Spark processing jobs load data from HDFS into Hive managed tables in the Stage layer.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
- Extensively worked on creating roles and granting permissions to specific Active Directory groups.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries. Implemented Partitioning and bucketing in Hive based on the requirement.
- Created tables in HiveQL and used a SerDe to analyze the JSON files from HBase.
- Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive
- Wrote different Pig scripts to clean up the ingested data and created partitions for the daily data.
- Used Pig to perform data transformations, event joins, filter and some pre-aggregations before storing the data onto HDFS.
- Used Amazon S3 as the landing area for incoming files, later pulled the data from the Amazon S3 bucket into the data lake, built Hive tables on top of it, created Spark DataFrames over that data, and performed transformations and actions on RDDs.
- Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes. Strong experience in Data Warehousing and ETL using DataStage.
- Worked with the Cloudera team on continuous monitoring and managing of the Hadoop/Spark cluster using Cloudera Manager and solving Hive Sentry issues.
- Coordinated onsite-offshore synchronization so that teams at both ends stayed well connected, keeping project flow smooth and resolving roadblocks.
- Monitored the ticketing tool for tickets indicating reported issues/incidents and resolved them with the appropriate fix in the project.
- Work with cross functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
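A minimal sketch of the Kafka-to-Spark-Streaming POC pattern mentioned above (pipelining server logs for real-time processing), using the Spark 1.6 DStream and Spark SQL APIs from the environment line below; the broker address, topic name, and JSON field are hypothetical.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LogStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("log-stream-sketch"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // hypothetical broker
    val topics = Set("server-logs")                                 // hypothetical topic

    // Direct stream of JSON log events from Kafka; keep only the message payload.
    val lines = KafkaUtils
      .createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
      .map(_._2)

    // Each micro-batch is queried with Spark SQL before results move downstream.
    lines.foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      val events = sqlContext.read.json(rdd) // JSON events, as on the WSO2 ESB
      events.registerTempTable("events")
      sqlContext.sql("SELECT level, COUNT(*) AS cnt FROM events GROUP BY level").show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```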
Environment: Scala 2.12.2, Apache Spark 1.6.2, WSO2, IIG (Infosys Information Grid), Airflow, CDH 5.13.1, Hadoop 2.6, Hive, HDFS, Sqoop, SQL, Pig, Apache HBase, AWS, Maven, UNIX Shell Scripting.
Confidential, NY
Hadoop Developer
Responsibilities:
- Worked on Lily for indexing data added/updated/deleted in the HBase database into a Solr collection; indexing allows data stored in HBase to be queried with the Solr service.
- Worked on the Spark stack to develop preprocessing jobs that use the RDD, Dataset, and DataFrame APIs to transform data for upstream consumption.
- Used the Lily Indexer to support flexible, custom, application-specific rules for extracting, transforming, and loading HBase data into Solr.
- Worked with Spark on Treadmill to deploy a cluster from scratch in a couple of minutes.
- Implemented Moving averages, Interpolations and Regression analysis on input data using Spark with Scala.
- Worked on POC for streaming data using Kafka and Spark streaming
- Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
- Experience in pulling data from an Amazon S3 bucket into the data lake, building Hive tables on top of it, creating Spark DataFrames over that data, and performing further analysis (see the sketch after this list).
- Created HBase tables to store variable data formats of input data coming from different portfolios
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Migrated data from Hadoop to an AWS S3 bucket using DistCp, and also migrated data across new and old clusters using DistCp.
- Created, modified, and executed DDL and ETL scripts for de-normalized tables to load data into Hive and AWS Redshift tables.
- Extensively used the Talend Big Data tool to load large volumes of source files from S3 to Redshift.
- Designed and managed External tables with right partition strategies to optimize performance in Hive.
- Responsible for gathering the business requirements for the Initial POCs to load the enterprise data warehouse data to Greenplum databases.
- Developed and maintained large-scale distributed data platforms, with experience in data warehouses, data marts, and data lakes.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Applied data science and machine learning techniques using Zeppelin to improve the search engine at a wealth management firm.
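A minimal sketch of the S3-to-Hive-to-DataFrame flow described above, in Scala against the Spark 1.6 HiveContext; the bucket path, table, and columns are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions.avg
import org.apache.spark.sql.hive.HiveContext

object S3ToHiveSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3-to-hive-sketch"))
    val hc = new HiveContext(sc)

    // External Hive table over the S3 landing area (bucket and schema are illustrative).
    hc.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS positions_landing (
        |  account_id STRING, symbol STRING, qty DOUBLE, px DOUBLE)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION 's3a://landing-bucket/positions/'""".stripMargin)

    // DataFrame on top of the Hive table, with a simple per-symbol aggregation.
    hc.table("positions_landing")
      .groupBy("symbol")
      .agg(avg("px").alias("avg_px"))
      .show()
  }
}
```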
Environment: Java 1.8, Scala 2.11.5, Apache Spark 1.6.0, Apache Zeppelin, AWS Redshift, GreenPlum 4.3 (PostgreSQL), Treadmill, CDH 5.8.2, Spring 3.1, ivy 2.0, Gradle 2.13, Hive, HDFS, Sqoop 1.4.3, Flume, Apache SOLR, Apache HBase, UNIX Shell Scripting, Python 2.6, AWS S3, Jenkins.
Confidential, Waukegan IL
Java/Bigdata Developer
Responsibilities:
- Experience in supporting and managing Hadoop Clusters using Hortonworks distribution.
- Implemented Nifi flow topologies to perform cleansing operations before moving data into HDFS.
- Started using Apache NiFi to copy the data from local file system to HDP.
- Responsible for business logic using Java and JavaScript, and JDBC for querying the database.
- Involved in requirement analysis, design, coding and implementation.
- Worked in an Agile methodology and used JIRA to maintain the stories for the project.
- Analyzed large data sets by running Hive queries.
- Implemented commissioning and decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers
- Involved in designing and developing the Hive data model, loading it with data, and writing Java UDFs for Hive.
- Handled importing and exporting data into HDFS by developing solutions, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop to import and export the data from Hadoop Distributed File System (HDFS) to Oracle.
- Created Hive tables and loaded data from HDFS to Hive tables as per the requirement.
- Created components like Hive UDFs for missing functionality in Hive to analyze and process large volumes of data (a hedged sketch follows this list).
- Worked on various performance optimizations like using distributed cache for small datasets, Partition, Bucketing in Hive and Map Side joins.
- Involved in writing complex queries to perform join operations between multiple tables.
- Actively involved in verifying and testing data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.
- Involved in monitoring AutoSys file watcher jobs, testing data for each transaction, and verifying whether each run completed properly.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts
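A minimal sketch of a custom Hive UDF of the kind mentioned above; the project UDFs were written in Java, but this example uses Scala for consistency with the earlier sketches, and the package, class name, and normalization rules are hypothetical.

```scala
package com.example.hive.udf // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Illustrative UDF that normalizes free-text country values before aggregation in Hive.
class NormalizeCountry extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val value = input.toString.trim.toUpperCase
      val code = value match {
        case "UNITED STATES" | "USA" | "US" => "US"
        case "UNITED KINGDOM" | "UK" | "GB" => "GB"
        case other                          => other
      }
      new Text(code)
    }
  }
}
```

After packaging the class into a jar, it would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.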
Environment: Java, Hortonworks (HDP), Apache NiFi, Apache Hive, Oracle 12c, HDFS, MapReduce, Sqoop, Oozie, MongoDB, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, AutoSys.
Confidential
Java/SQL Developer
Responsibilities:
- Involved in projects utilizing Java, JavaEE web applications to create fully-integrated client management systems
- Developed UI using HTML, JavaScript, JSP and developed business Logic and interfacing components using Business Objects, JDBC and XML
- Designed, developed, and analyzed the front end and back end using JSP, Servlets, and Spring.
- Developed several SOAP web services supporting XML to expose information from the Customer Registration System.
- Created Maven archetypes for generating fully functional SOAP web services supporting XML message transformation.
- Implemented Log4j to log errors and messages for ease of debugging.
- Designed and developed Struts like MVC 2 Web framework using the front-controller design pattern, which is used successfully in several production systems.
- Developed SQL Scripts to perform different joins, sub queries, nested querying, Insert/Update and Delete data in MS SQL database tables
- Normalized Oracle database, conforming to design concepts and best practices.
- Created database program in SQL server to manipulate data accumulated by internet transactions.
- Wrote Servlets class to generate dynamic HTML pages.
- Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
- Developed the XML Schema and Web services for the data maintenance and structures. Wrote test cases in JUnit for unit testing of classes.
- Used the DOM and DOM functions, debugging with Firefox and the IE Developer Toolbar for IE.
- Debugged the application using Firebug to traverse the documents.
- Experience on modeling principles, database design and programming, creating E-R diagrams and data relationships to design a database
- Involved in developing web pages using HTML and JSP.
- Involved in writing procedures and complex queries using PL/SQL to extract data from the database, delete data, and reload data into the Oracle database.
- Integrated SSRS reports into SharePoint using various web parts and delivery mechanisms.
- Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing solutions for defects.
- Involved in writing SQL queries and stored procedures, and used JDBC for database connectivity with MySQL Server.
- Developed the presentation layer using CSS and HTML based on Bootstrap for browser rendering.
Environment: Java, XML, HTML, JavaScript, JDBC, CSS, PL/SQL, MS SQL Server Reporting Services, MS SQL Server Analysis Services, SQL Server 2008 (SSRS & SSIS), Oracle 10g, Web MVC, Eclipse, Ajax, JQuery, Log4j, Spring with Hibernate and Apache Tomcat.