Sr. Big Data Developer Resume
Dublin, OH
PROFILE SUMMARY:
- 9+ years of experience as a Sr. Big Data Developer, with skills in analysis, design, development, testing, and deployment of various software applications.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics workloads.
- Experience collecting log and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Extensive experience developing Spark Streaming jobs with RDDs (Resilient Distributed Datasets) and Spark SQL as required.
- Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
- Strong knowledge of Hadoop ecosystem components, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, and ZooKeeper.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Good knowledge of Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL).
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good experience developing MapReduce jobs in Java/J2EE for data cleansing, transformation, pre-processing, and analysis.
- Excellent experience installing and running various Oozie workflows and automating parallel job executions.
- Experience with Spark, including Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
- Extensive development experience in IDEs such as Eclipse, NetBeans, and IntelliJ.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls.
- Installation, configuration, and administration experience on big data platforms, including Cloudera Manager (Cloudera) and MCS (MapR).
- Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters.
- Good knowledge of coding using SQL, SQL*Plus, T-SQL, PL/SQL, and stored procedures/functions.
- Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
- Experience understanding, maintaining, and supporting existing production systems on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
- Experience working with Hortonworks and Cloudera environments.
- Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
- Experience developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL (see the sketch following this list).
- Experience with the Oozie scheduler in setting up workflows with MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Good understanding of querying datasets, filtering data, aggregations, joining disparate datasets, and producing ranked or sorted output using Spark RDDs, Spark DataFrames, Spark SQL, Hive, and Impala.
- Skilled at writing custom RDDs in Scala and implementing design patterns to improve performance.
- Experience analyzing large volumes of data using Hive Query Language and assisting with performance tuning.
- Experience with middleware architectures using Sun Java technologies such as J2EE, JSP, and Servlets.
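For illustration, a minimal sketch of a custom Hive UDF of the kind referenced above; the class name and cleaning logic are hypothetical, not taken from a specific project:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes a free-text column before analysis.
    public final class NormalizeText extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null; // propagate SQL NULLs unchanged
            }
            // Trim, collapse repeated whitespace, and lower-case the value.
            String cleaned = input.toString().trim()
                                  .replaceAll("\\s+", " ")
                                  .toLowerCase();
            return new Text(cleaned);
        }
    }

Such a UDF is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, after which it can be called like any built-in function in HiveQL.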
TECHNICAL SKILLS:
Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Services: Amazon AWS, EC2, Redshift, MS Azure
Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016
NoSQL Databases: HBase, Cassandra, and MongoDB
Version Control: GIT, GitLab, SVN
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL, UNIX Shell Scripting
Software Development & Testing Lifecycle: UML, Design Patterns (Core Java and J2EE), SDLC (Waterfall and Agile), STLC
Web Technologies: JavaScript, CSS, HTML and JSP.
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command-line Tools: Eclipse, IntelliJ, Toad, and NetBeans.
WORK EXPERIENCE:
Confidential, Dublin, OH
Sr. Big Data Developer
Responsibilities
- Worked as a Sr. Big Data Developer with Hadoop Ecosystems components.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the first sketch following this list).
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
- Designed HBase row keys to store text and JSON as key values, structuring the keys so that gets/scans return data in sorted order.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
- Responsible for fetching real-time data using Kafka and processing it using Spark and Scala.
- Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Maintained Hadoop, Hadoop ecosystem components, and databases with updates/upgrades, performance tuning, and monitoring.
- Developed customized Hive UDFs and UDAFs in Java, worked on JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and UNIX scripting.
- Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the second sketch following this list).
- Developed SQL scripts using Spark to handle different data sets and validated their performance against MapReduce jobs.
- Used J2EE design patterns like Factory pattern & Singleton Pattern.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka), and Storm.
- Extensively used jQuery to provide a dynamic user interface and for client-side validations.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
- Built large-scale data processing systems in data warehousing solutions and worked with unstructured data mining on NoSQL.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Specified the cluster size, allocated resource pools, and distributed Hadoop by writing specification files in JSON format.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
- Used Struts, an open-source MVC framework, for creating modern Java web applications.
- Coordinated continuously with the QA, production support, and deployment teams.
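First sketch: a minimal example of the kind of Java data-cleaning MapReduce job described above; the delimiter, field count, and class name are illustrative assumptions, not details from the actual project:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only cleaning job: drops malformed delimited records and
    // trims whitespace from each field before downstream processing.
    public class CleanRecordsMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        private static final int EXPECTED_FIELDS = 8; // assumed schema width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("clean", "malformed").increment(1);
                return; // skip malformed rows, count them for auditing
            }
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) out.append(',');
                out.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(out.toString()));
        }
    }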
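Second sketch: a minimal Kafka-to-HDFS Spark Streaming job of the kind described above, shown with the Java API for illustration; the broker address, topic name, and output path are hypothetical:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class WeblogStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("WeblogStream");
            // Micro-batches every 10 seconds.
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker:9092"); // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "weblog-consumers");

            // Direct stream: no receiver; offsets are tracked against Kafka itself.
            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("weblogs"), kafkaParams));

            // Persist each non-empty batch of raw log lines to HDFS.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("hdfs:///data/weblogs/" + time.milliseconds());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }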
Environment: Agile, Hadoop 3.0, MS Azure, MapReduce, Java, MongoDB 4.0.2, HBase 1.2, JSON, Scala 2.12, Oozie 4.3, Zookeeper 3.4, J2EE, Python 3.7, jQuery, NoSQL, MVC, Struts 2.5.17, Hive 2.3
Confidential, Mt Laurel, NJ
Sr. Spark/Hadoop Developer
Responsibilities
- As a Spark/Hadoop Developer, worked on Hadoop ecosystem components including Hive, MongoDB, ZooKeeper, and Spark Streaming with the MapR distribution.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
- Designed HBase row keys to store text and JSON as key values, structuring the keys so that gets/scans return data in sorted order (see the first sketch following this list).
- Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
- Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
- Maintained Hadoop, Hadoop ecosystem components, and databases with updates/upgrades, performance tuning, and monitoring.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Prepared data analytics processing and data egress to make analytics results available to visualization systems, applications, and external data stores.
- Built large-scale data processing systems in data warehousing solutions and worked with unstructured data mining on NoSQL.
- Responsible for design and development of Spark SQL Scripts based on Functional Specifications.
- Used AWS services like EC2 and S3 for small data sets processing and storage.
- Provisioned Cloudera Director AWS instances and added the Cloudera Manager repository to scale up the Hadoop cluster in AWS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Specified the cluster size, allocated resource pools, and distributed Hadoop by writing specification files in JSON format.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Used Spark SQL on DataFrames to access Hive tables from Spark for faster processing of data (see the second sketch following this list).
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
- Used different Spark modules such as Spark Core, Spark SQL, Spark Streaming, and Spark Datasets/DataFrames.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Designed and developed automation test scripts using Python.
- Integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce before loading the data into HDFS.
- Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
- In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
- Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Analyzed the SQL scripts and designed solutions to implement them using PySpark.
- Analyzed data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Extracted large volumes of data from different data feeds, performed transformations, and loaded the data into various targets.
- Developed data-formatted web applications and deployed scripts using HTML, XHTML, CSS, and client-side JavaScript.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
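First sketch: a minimal example of the row-key approach described above, using the HBase Java client; the table, column family, and key layout are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("events"))) {

                // Composite row key: entity id plus a reversed timestamp, so
                // rows for one entity sort together with the newest first and
                // a prefix scan returns them in order.
                long reversedTs = Long.MAX_VALUE - System.currentTimeMillis();
                byte[] rowKey = Bytes.toBytes("user123_" + reversedTs);

                Put put = new Put(rowKey);
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                              Bytes.toBytes("{\"event\":\"click\"}"));
                table.put(put);
            }
        }
    }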
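Second sketch: a minimal example of reading Hive tables through Spark SQL DataFrames; the database, table, and column names are hypothetical:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveToSpark {
        public static void main(String[] args) {
            // enableHiveSupport() lets Spark resolve tables in the Hive metastore.
            SparkSession spark = SparkSession.builder()
                    .appName("HiveToSpark")
                    .enableHiveSupport()
                    .getOrCreate();

            // Query an existing Hive table as a DataFrame.
            Dataset<Row> counts = spark.sql(
                    "SELECT user_id, COUNT(*) AS events "
                  + "FROM weblogs.clicks GROUP BY user_id");

            // Write the result back as a managed Hive table.
            counts.write().mode("overwrite").saveAsTable("weblogs.user_event_counts");
            spark.stop();
        }
    }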
Environment: Hadoop 3.0, Spark 2.3, MapReduce, Java, MongoDB, HBase 1.2, JSON, Hive 2.3, Zookeeper 3.4, AWS, MySQL, Scala 2.12, Python, Cassandra 3.11, HTML5, JavaScript
Confidential, Hillsboro, OR
Hadoop Developer
Responsibilities
- Extensively worked on Hadoop ecosystem components, including Hive and Spark Streaming, with the MapR distribution.
- Implemented J2EE design patterns such as DAO, Singleton, and Factory.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
- Supported NoSQL in enterprise production and loaded data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
- Moved data between HDFS and relational database systems using Sqoop, including maintenance and troubleshooting.
- Developed a Java/J2EE-based multi-threaded application built on top of the Struts framework.
- Used the Spring MVC framework to enable interactions with the JSP/view layer and implemented various design patterns with J2EE and XML technologies.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs (see the sketch following this list).
- Created Hive tables, loaded claims data from Oracle using Sqoop, and loaded the processed data into the target database.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
- Developed NiFi flows dealing with various data formats such as XML, JSON, and Avro.
- Ran MapReduce jobs through Hive by querying the available data.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
- Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
- Utilized various JavaScript and jQuery libraries, Bootstrap, and Ajax for form validation and other interactive features.
- Involved in converting HiveQL into Spark transformations using Spark RDD and through Scala programming.
- Integrated Kafka with Spark Streaming for high-throughput, reliable stream processing.
- Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
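A minimal sketch of the pair-RDD style of Spark optimization mentioned above, shown with the Java API; the input layout and paths are assumptions for illustration:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class ClaimTotals {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("ClaimTotals");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Hypothetical input: comma-delimited claim records (claimId,amount,...).
                JavaRDD<String> lines = sc.textFile("hdfs:///data/claims/input");

                // reduceByKey combines values map-side before the shuffle,
                // which is the usual win over an equivalent MapReduce job.
                JavaPairRDD<String, Double> totals = lines
                        .mapToPair(line -> {
                            String[] f = line.split(",");
                            return new Tuple2<>(f[0], Double.parseDouble(f[1]));
                        })
                        .reduceByKey(Double::sum);

                totals.saveAsTextFile("hdfs:///data/claims/totals");
            }
        }
    }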
Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, jQuery, JavaScript, Ajax
Confidential, Dallas, TX
Java/J2EE Developer
Responsibilities
- As a Java/J2EE Developer, worked on middleware architecture using Java technologies such as J2EE and Servlets, and application servers such as WebSphere and WebLogic.
- Worked as a Java/J2EE Developer to manage data and to develop web applications.
- Implemented MVC architecture by separating the business logic from the presentation layer using Spring.
- Involved in documentation and use case design using UML modeling, including development of class diagrams, sequence diagrams, and use case diagrams.
- Extensively worked on an n-tier architecture system using Java, JDBC, Servlets, JSP, Web Services, WSDL, SOAP, Spring, Hibernate, XML, SAX, and DOM.
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Developed the UI using HTML, CSS, Bootstrap, jQuery, and JSP for interactive cross-browser functionality and complex user interfaces.
- Developed Service layer interfaces by applying business rules to interact with DAO layer for transactions.
- Developed various UML diagrams, including use cases, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams.
- Involved in requirements gathering and performed object oriented analysis, design and implementation.
- Used the Spring MVC framework for writing controllers, validations, and views.
- Provided utility classes for the application using Core Java and extensively used the Collections package.
- Used Core Spring for Dependency Injection of various component layers.
- Used SOA REST (JAX-RS) web services to provide/consume web services from/to downstream systems (see the sketch following this list).
- Developed web-based reporting for a credit monitoring system with HTML, CSS, XHTML, JSTL, and custom tags using Spring.
- Developed user interface using JSP, JSP Tag libraries and Struts Tag Libraries to simplify the complexities of the application.
- Implemented Business Logic using POJO's and used WebSphere to deploy the applications.
- Used the build tool Maven to build JAR and WAR files, and Ant to bundle all source files and web content into WAR files.
- Worked on various SOAP and RESTful services used in various internal applications.
- Developed JSP and Java classes for various transactional/ non-transactional reports of the system using extensive SQL queries.
- Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
- Implemented Storm topologies to pre-process data before moving into HDFS system.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Involved in configuring builds using Jenkins with Git and used Jenkins to deploy the applications onto Dev and QA environments.
- Involved in unit testing, system integration testing, and enterprise user testing using JUnit.
- Used Maven to build, run and create Aerial-related JARs and WAR files among other uses.
- Used JUnit for unit testing of the system and Log4J for logging.
- Worked with production support team in debugging and fixing various production issues.
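A minimal sketch of a JAX-RS resource of the kind described above; the resource path and entity are hypothetical, and a JSON provider (e.g., Jackson) is assumed to be registered with the JAX-RS runtime:

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // Hypothetical resource exposing account data as JSON to downstream systems.
    @Path("/accounts")
    public class AccountResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Account getAccount(@PathParam("id") String id) {
            // The real service would delegate to the service/DAO layer here.
            return new Account(id, "ACTIVE");
        }

        // Minimal POJO serialized by the JSON provider.
        public static class Account {
            public String id;
            public String status;
            public Account(String id, String status) {
                this.id = id;
                this.status = status;
            }
        }
    }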
Environment: Java, Spring 3.0, XML, Hibernate 3.0, JavaScript, JUnit, HTML 4.0.1, CSS, Ajax, Bootstrap, AngularJS, WebSphere, Maven 3.0, Eclipse
Confidential
Java Developer
Responsibilities
- Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
- Created user-friendly GUI interface and Web pages using HTML, CSS and JSP.
- Developed web components using MVC pattern under Struts framework.
- Wrote JSPs and Servlets and deployed them on the WebLogic application server.
- Used JSPs and HTML on the front end, Servlets as front controllers, and JavaScript for client-side validations.
- Wrote Hibernate mapping XML files to define the mapping between Java classes and database tables.
- Developed the UI using JSP, HTML, CSS, and AJAX, and implemented jQuery along with client- and server-side validations using JavaScript.
- Implemented MVC architecture using Spring to send and receive data between the front end and the business layer.
- Designed, developed and maintained the data layer using JDBC and performed configuration of Java Application Framework.
- Extensively used Hibernate in data access layer to access and update information in the database.
- Migrated the Servlets to the Spring Controllers and developed Spring Interceptors, worked on JSPs, JSTL, and JSP Custom Tags.
- Used Jenkins for continuous integration with SVN for version control and JUnit and Mockito for unit testing, creating design documents and test cases for development work.
- Worked in the Eclipse IDE and implemented insert, update, and retrieval operations against an Oracle database by writing stored procedures.
- Responsible for writing Struts action classes and Hibernate POJO classes, and integrating Struts and Hibernate with Spring to meet business needs.
- Developed the application using Servlets and JSP for the presentation layer along with JavaScript for the client side validations.
- Wrote Hibernate classes and DAOs to retrieve and store data, and configured Hibernate files (see the sketch following this list).
- Used WebLogic for application deployment and Log4J for logging/debugging.
- Used the CVS version control tool and Ant as the project build tool.
- Used various Core Java concepts such as multi-threading, Exception Handling, Collection APIs to implement various features and enhancements.
- Wrote and debugged the Maven Scripts for building the entire web application.
- Designed and developed Ajax calls to populate screens parts on demand.
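A minimal sketch of the Hibernate DAO pattern described above, in Hibernate 3 style; the entity and mapping file names are assumptions for illustration:

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    // DAO sketch; the Customer class/table mapping lives in Customer.hbm.xml.
    public class CustomerDao {

        private final SessionFactory sessionFactory;

        public CustomerDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public void save(Customer customer) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                session.saveOrUpdate(customer);
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            } finally {
                session.close();
            }
        }

        public Customer findById(Long id) {
            Session session = sessionFactory.openSession();
            try {
                // Hibernate 3 returns Object from get(), hence the cast.
                return (Customer) session.get(Customer.class, id);
            } finally {
                session.close();
            }
        }
    }

    // Hypothetical mapped entity (one file per class in a real project).
    class Customer {
        private Long id;
        private String name;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }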
Environment: Struts, HTML, CSS, JSP, MVC, Hibernate, AJAX, jQuery, Java, Jenkins, ANT, Maven