Sr. Big Data Developer Resume
Durham, NC
SUMMARY
- Over five years of professional IT experience with the Big Data Hadoop ecosystem, covering ingestion, storage, querying, processing, and analysis of big data.
- Experience in using Maven for building and deploying J2EE application archives (JAR and WAR) on WebLogic and IBM WebSphere.
- Experience in building Pig scripts to extract, transform and load data onto HDFS for processing.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Experience in Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Good understanding of the Neo4j graph database for designing graph-structured applications.
- Experience in Big Data analysis using Pig and Hive, and an understanding of Sqoop and Puppet.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Experience in deployment of Hadoop Cluster using Puppet tool.
- Experience in scheduling Cron jobs on EMR, Kafka, and Spark using Clover Server.
- Hands-on experience with build and deployment tools like Maven and GitHub using Bash scripting.
- Extensive experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries, and incorporating complex UDFs into business logic (see the brief sketch after this summary).
- Experience working with DataFrames, RDDs, Spark SQL, Spark Streaming, APIs, system architecture, and infrastructure planning.
- Experience with core Java components: Collections, Generics, Inheritance, Exception Handling, and Multi-threading.
- Developed Java applications using various IDEs such as Spring Tool Suite and Eclipse.
- Experience in using Hadoop distributions such as Cloudera and Hortonworks.
- Strong knowledge in working with UNIX/LINUX environments, writing shell scripts and PL/SQL Stored Procedures.
- Good knowledge in using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
- Operated on Java/J2EE systems with different databases, which include Oracle, MySQL and DB2.
- Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR) for processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
- Built secure AWS solutions by creating VPCs with private and public subnets.
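The UDF work noted in this summary can be illustrated with a minimal Spark SQL sketch in Scala; the tiering rule, table, and column names are hypothetical examples, not taken from any project listed here:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("UdfExample").getOrCreate()
    import spark.implicits._

    // Hypothetical business rule: bucket order amounts into tiers.
    val tier = udf((amount: Double) =>
      if (amount >= 1000) "GOLD" else if (amount >= 100) "SILVER" else "BRONZE")

    // Small inline DataFrame stands in for a real source table.
    val orders = Seq(("o1", 1250.0), ("o2", 80.0)).toDF("order_id", "amount")
    orders.withColumn("tier", tier($"amount")).show()

    spark.stop()
  }
}
```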
TECHNICAL SKILLS
Big data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Oozie, Flume, Kafka and Spark 2.0/2.0.2
NoSQL Databases: HBase, MongoDB 3.2 and Cassandra
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX- WS
Programming Languages: Java, Python, SQL, PL/SQL, AWS, HiveQL, Unix Shell Scripting, Scala
IDE and Tools: Eclipse 4.6, NetBeans 8.2
Database: Oracle 12c/11g, MYSQL, SQL Server 2016/2014
Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery and CSS3/2, JSP, Bootstrap 3/3.5
Application Server: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
Operating Systems: Windows 8/7, UNIX/Linux and Mac OS.
Other Tools: Maven, ANT, WSDL, SOAP, REST, GraphDB, Neo4j
Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE)
PROFESSIONAL EXPERIENCE
Confidential, Durham, NC
Sr. Big Data Developer
Responsibilities:
- Working as a Sr. Big Data Developer with Hadoop ecosystem components.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Identified relationship between different data sets in GraphDB database.
- Designed and implemented Data Lineage graph from origination to consumption point of view.
- Responsible for fetching real time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Designed and developed a decision tree application using Neo4J graph database to model the nodes and relationships for each decision.
- Executed operations such as drop index, delete node, and delete relationship in GraphDB.
- Maintained Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
- Developed customized Hive UDFs and UDAFs in Java, JDBC connectivity with Hive, and development and execution of Pig scripts and Pig UDFs.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the streaming sketch after this role).
- Involved in converting MapReduce programs into Spark transformations on Spark RDDs using Scala and Python.
- Used Python scripts to load data from CSV files into the Neo4j graph database.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Built code for real time data ingestion using Java, MapR-Streams (Kafka) and STORM.
- Extensively used jQuery to provide a dynamic user interface and for client-side validations.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
Environment: Agile, Hadoop 3.0, MS Azure, MapReduce, Java, GraphDB, Neo4j, Oozie 4.3, J2EE, Python 3.7, jQuery, NoSQL, MVC, Hive 2.3
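A minimal, illustrative sketch of the Kafka-to-Spark-Streaming-to-HDFS flow described in this role, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic, group id, and output path are hypothetical placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WeblogIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WeblogIngest")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Hypothetical broker and consumer settings; replace with the real cluster values.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-ingest",
      "auto.offset.reset"  -> "latest"
    )

    // Kafka direct stream over a hypothetical "weblogs" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("weblogs"), kafkaParams))

    // Keep only the message payload and persist each non-empty micro-batch to HDFS.
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/weblogs/${System.currentTimeMillis}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```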
Confidential, Chicago, IL
Big Data Developer
Responsibilities:
- Worked in an Agile development environment using the Kanban methodology.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Worked on AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
- Used graph structures for semantic queries with nodes, edges and properties to represent and store data.
- Developed Business process hierarchy graph for every process available in the system.
- Performed data integration and data analysis using APOC procedures and functions in Neo4j.
- Designed and developed a decision tree application using Neo4J graph database to model the nodes and relationships for each decision.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Worked on a proof of concept with Spark, Scala, and Kafka.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed Spark scripts using Python shell commands as per requirements.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Developed Hive queries to process the data and generate data cubes for visualization (see the DataFrame sketch after this role).
- Involved in running Hadoop jobs for processing millions of records of text data.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Configured Spark streaming to receive real time data from Kafka and store the stream data to HDFS using Scala.
- Coordinated continuously with the QA, production support, and deployment teams.
Environment: Hadoop, Zookeeper, Hive, Spark, GraphDB, Neo4j, Scala, HDFS, Oozie, Python
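An illustrative sketch of converting a HiveQL aggregation into a Spark DataFrame transformation that builds a data cube, as referenced in this role; the logistics table, columns, and output table are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ShipmentCube {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ShipmentCube")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table; the real logistics schema differs.
    val shipments = spark.table("logistics.shipments")

    // DataFrame equivalent of a HiveQL GROUP BY ... WITH CUBE aggregation.
    val cube = shipments
      .filter(col("status") === "DELIVERED")
      .cube(col("region"), col("carrier"))
      .agg(count(lit(1)).as("shipment_cnt"), avg("transit_days").as("avg_transit_days"))

    // Persist the cube as a Hive table for downstream visualization.
    cube.write.mode("overwrite").saveAsTable("logistics.shipment_cube")
    spark.stop()
  }
}
```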
Confidential, Bellevue, WA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping (see the connector sketch after this role).
- Worked on loading data into Spark RDDs and performed advanced procedures such as text analytics using Spark's in-memory computation capabilities to generate the output response.
- Developed the statistics graph using JSP, custom tag libraries, Applets, and Swing in a multi-threaded architecture.
- Executed many performance tests using the Cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Handled large datasets using partitions and broadcasts in Spark, along with effective and efficient joins and transformations during the ingestion process itself.
- Used Kafka streams to configure Spark Streaming to get information and then store it in HDFS.
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
- Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR.
- Partitioned data streams using Kafka, and designed and used the Kafka producer API to produce messages.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Performed tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
- Ingested data from RDBMS to Hive to perform data transformations, and then export the transformed data to Cassandra for data access and analysis.
- Experienced in Core Java, Collection Framework, JSP, Dependency Injection, Spring MVC, RESTful Web services.
- Implemented Spark Scripts using Scala, Spark SQL to access hive tables into Spark for faster processing of data.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
- Implemented Informatica Procedures and Standards while developing and testing the Informatica objects.
Environment: Hadoop 3.0, Spark 2.1, Cassandra 1.1, Kafka 0.9, JSP, HDFS, AWS, EC2, Hive 1.9, MapReduce, MapR, Java, MVC, Scala, NoSQL
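An illustrative sketch of writing transformed Hive data to Cassandra through the DataStax Spark-Cassandra connector, as referenced in this role; the connection host, source query, keyspace, and table names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToCassandra")
      .config("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table feeding the learner data model.
    val learners = spark.sql(
      "SELECT learner_id, course_id, score, updated_at FROM staging.learner_events")

    // Append the transformed rows to Cassandra via the Spark-Cassandra connector.
    learners.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "learning", "table" -> "learner_profile"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```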
Confidential
Java/Hadoop Developer
Responsibilities:
- Designed and developed application modules using the Spring and Hibernate frameworks.
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming of large sets of structured, semi structured and unstructured data.
- Used Maven for developing build scripts and deploying the application onto WebLogic.
- Implemented Spark RDD transformations to map business analysis and apply actions on top of transformations.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark data frames, Scala and Python.
- Implemented MVC architecture using the Spring Framework; coding involved writing action classes, custom tag libraries, and JSPs.
- Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
- Implemented partitioning and bucketing in Hive, and worked with file formats and compression techniques with optimizations.
- Involved in designing and developing modules on both the client and server side.
- Worked on JDBC framework encapsulated using DAO pattern to connect to the database.
- Developed the UI screens using JSP and HTML and performed client-side validation with JavaScript.
- Worked on various SOAP and RESTful services used in various internal applications.
- Developed JSP and Java classes for various transactional/ non-transactional reports of the system using extensive SQL queries.
- Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
- Implemented Storm topologies to pre-process data before moving into HDFS system.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala (see the Spark sketch after this role).
- Involved in configuring builds using Jenkins with Git and used Jenkins to deploy the applications onto Dev and QA environments.
- Involved in unit testing, system integration testing and enterprise user testing using JUnit.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Python scripts to automate and provide Control flow to Pig scripts.
Environment: Spring 4.0, Hibernate 5.0.7, Hadoop 2.6.5, Spark 1.1, Hive, Python 3.3, Scala, Sqoop, Flume 1.3.1, Impala, MapReduce, Linux
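An illustrative sketch of the MapReduce-to-Spark migration POC referenced in this role, rewriting a map/reduce count aggregation as Spark RDD transformations in Scala; the record layout, field index, and HDFS paths are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapReduceToSparkPoc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MapReduceToSparkPoc"))

    // Hypothetical input path; the original MapReduce job read the same HDFS directory.
    val lines = sc.textFile("hdfs:///data/app/logs/")

    // Map phase: parse each tab-delimited record into (eventType, 1).
    // Reduce phase: sum the counts per event type.
    val counts = lines
      .map(_.split('\t'))
      .filter(_.length > 2)
      .map(fields => (fields(2), 1L))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/app/event_counts")
    sc.stop()
  }
}
```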