Sr. Big Data/Hadoop Developer Resume
Columbus, OH
SUMMARY:
- Around 8 years of experience with big data, Hadoop, and Java/J2EE open-source technologies.
- Knowledge of the Software Development Life Cycle (SDLC), including Agile and Waterfall methodologies.
- Experience developing applications using Java, Python, and UNIX shell scripting.
- Experience in consuming web services built with Apache Axis (SOAP) and the JAX-RS (REST) API.
- Hands-on experience with advanced big data technologies, including the Spark ecosystem (Spark SQL, MLlib, SparkR, and Spark Streaming), Kafka, and predictive analytics.
- Expertise in ingesting real-time and near-real-time data using Flume, Kafka, and Storm.
- Experience with the build tools Maven and Ant and the logging framework Log4j.
- Experience programming and developing Java modules for an existing Java-based web portal using JSP, Servlets, JavaScript, and HTML, following SOA and MVC architecture.
- Good experience with Tableau for data visualization and analysis of large datasets.
- Good knowledge of NoSQL databases like MongoDB, Cassandra, and HBase.
- Strong development skills in Hadoop, HDFS, MapReduce, Hive, Sqoop, and HBase, with a solid understanding of Hadoop internals.
- Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MRv1 and MRv2 (YARN).
- Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, and Avro.
- Strong programming skills in designing and implementing multi-tier applications using Java, J2EE, JDBC, JSP, JSTL, HTML, CSS, JSF, Struts, JavaScript, Servlets, POJO, EJB, XSLT, and JAXB.
- Extensive experience in SOA-based solutions: web services, Web API, WCF, SOAP, and RESTful APIs.
- Strong knowledge of NoSQL column-oriented databases like HBase and their integration with Hadoop clusters.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Experienced in collecting log and JSON data into HDFS using Flume and processing it with Hive/Pig.
- Experience working with the Eclipse IDE, NetBeans, and Rational Application Developer.
- Experience using PL/SQL to write stored procedures, functions, and triggers.
- Expertise in developing simple web-based applications using J2EE technologies like JSP, Servlets, and JDBC.
- Extensive development experience in IDEs such as Eclipse, NetBeans, IntelliJ, and STS.
- Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Worked extensively in Core Java, Struts 2, JSF 2.2, Spring, Hibernate, Servlets, and JSP, with hands-on experience in PL/SQL, XML, and SOAP.
- Well versed in working with relational database management systems such as Oracle, MS SQL Server, and MySQL.
- Hands-on experience with Hadoop/big data technologies for the storage, querying, processing, and analysis of data.
- Hands-on experience working with the XML suite of technologies: XML, XSL, XSLT, DTD, XML Schema, SAX, DOM, and JAXB.
- Strong experience in core SQL and RESTful web services.
- Experience working with web servers like Apache Tomcat and application servers like IBM WebSphere and JBoss.
- Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure Data Lake, Data Factory
Hadoop Distributions: Cloudera, Hortonworks, MapR
Programming Language: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Web Technologies: HTML, CSS, JavaScript, JQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Databases: Oracle 12c/11g, MS SQL Server, MySQL
Operating Systems: Linux, Unix, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB
Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere
SDLC Methodologies: Agile, Waterfall
Version Control: GIT, SVN, CVS
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, OH
Sr. Big Data/Hadoop Developer
Responsibilities:
- As a Sr. Big Data/Hadoop Developer, worked on the Hadoop ecosystem, including Hive, HBase, HDFS, and Spark Streaming, on the MapR distribution.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a representative sketch follows this list).
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
- Developed predictive analytics using the Apache Spark Scala API.
- Primarily involved in the data migration process on Azure, integrating with a GitHub repository and Jenkins.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka), and Storm.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Specified the cluster size, resource pool allocation, and Hadoop distribution by writing the specifications in JSON format.
- Imported weblogs and unstructured data using Apache Flume and staged the data in a Flume channel.
- Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly into HDFS.
- Used RESTful web services with MVC for parsing and processing XML data.
- Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Involved in loading data from UNIX file system to HDFS.
- Involved in designing schemas, writing CQL queries, and loading data using Hive.
- Built an automated build and deployment framework using Jenkins and Maven.
- Worked on Apache Solr, which was used as the indexing and search engine.
- Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used J2EE design patterns such as the Factory and Singleton patterns.
- Used Spark to create structured data from large amounts of unstructured data from various sources.
- Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.
- Developed Python scripts to find SQL injection vulnerabilities in SQL queries.
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Extracted a real-time feed using Spark Streaming, converted it to RDDs, processed the data into DataFrames, and loaded it into HBase.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
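A minimal sketch of the kind of map-only data-cleaning MapReduce job noted above; the class name, tab delimiter, and expected field count are illustrative assumptions, not the production code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Map-only job that drops malformed weblog records before downstream Hive/Pig processing. */
public class LogCleanJob {

    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 7;   // hypothetical record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\t", -1);
            // Keep only records with the expected number of fields and a non-empty timestamp.
            if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "weblog-clean");
        job.setJarByClass(LogCleanJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                        // map-only: no aggregation needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```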
Environment: Hadoop 3.0, Solr, Pig 0.17, Sqoop 1.4, HBase, Azure, XML, Avro, Hive 2.3, Spark, Oozie 1.8, Agile, Zookeeper 3.4, UNIX, Maven, MapReduce, Jenkins 2.1, Java, Kafka 2.0, Scala.
Confidential, Hartford, CT
Sr. Hadoop Developer
Responsibilities:
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Primarily involved in the data migration process on Azure, integrating with a GitHub repository and Jenkins.
- Involved in all phases of Software Development Life Cycle (SDLC) using Agile.
- Used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing.
- Designed and developed jobs to validate data post-migration, such as reconciling reporting fields between source and destination systems, using Spark SQL, RDDs, and DataFrames/Datasets.
- Used the Spark DataFrame API to process structured and semi-structured files and load them into an S3 bucket.
- Involved in the complete big data flow of the application: data ingestion from upstream to HDFS, processing the data in HDFS, and analyzing the data.
- Developed Hive queries on external tables to perform various analyses.
- Used Hue to run Hive queries; created partitions based on the data using Hive to improve performance.
- Imported and exported data into HDFS and Hive using Sqoop.
- Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Hive UDFs.
- Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Spark DataFrame operations to perform the required data validations and analytics on the Hive data.
- Used different Spark modules such as Spark Core, Spark SQL, Spark Streaming, Datasets, and DataFrames.
- Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation, and transformation according to requirements.
- Worked on Apache Solr for indexing and load-balanced querying to search for specific data in large datasets.
- Worked on Spark Streaming and Spark Structured Streaming using Apache Kafka for real-time data processing.
- Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
- Involved in reading uncompressed Avro data and compressing it according to business logic by writing generic code.
- Extracted a real-time feed using Kafka and Spark Streaming, converted it to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (a minimal sketch follows this list).
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Working experience on the Cloudera Hadoop distribution (CDH5) for executing the respective scripts.
- Worked on multiple clusters, managing data in HDFS for data analytics.
- Gathered business requirements and designed and developed the data ingestion and presentation layers.
- Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing, and join techniques in Hive for faster data access.
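A rough sketch of the Kafka-to-Parquet streaming path described above; the broker address, topic name, and HDFS paths are placeholders, and since the actual jobs were written in Scala, this Java version is only illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

/** Structured Streaming sketch: read a Kafka feed and land it as Parquet on HDFS. */
public class KafkaToParquet {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-parquet")
                .getOrCreate();

        // Subscribe to a Kafka topic (broker and topic names are placeholders).
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "events")
                .load();

        // Kafka delivers key/value as binary; cast the payload to a string column.
        Dataset<Row> events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp");

        // Append each micro-batch to Parquet files on HDFS, tracking progress via a checkpoint.
        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/events/parquet")
                .option("checkpointLocation", "hdfs:///checkpoints/events")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```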
Environment: Hadoop 3.0, Agile, HDFS, Apache Hive 2.3, Sqoop 1.4, UNIX, Spark, Scala, Apache Solr, Kafka 2.0, Impala 3.0, Avro, XML, CDH5, Cloudera.
Confidential, Merrimack, NH
Hadoop Developer/Admin
Responsibilities:
- Responsible for writing MapReduce jobs to perform operations such as copying data on HDFS and defining job flows on EC2 servers, and for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed a process for Sqooping data from multiple sources such as SQL Server, Oracle, and Teradata.
- Responsible for creating the source-to-destination field mapping document.
- Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
- Developed Oozie workflows for executing Sqoop and Hive actions.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Performed optimizations on Spark/Scala jobs; diagnosed and resolved performance issues.
- Responsible for developing Python wrapper scripts that extract specific date ranges using Sqoop by passing the custom properties required by the workflow.
- Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster, and create a metadata table recording the execution times of each job.
- Developed Hive scripts to perform transformation logic and load data from the staging zone to the final landing zone.
- Worked with the Parquet file format to achieve better storage and performance for published tables.
- Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
- Developed Python utility to validate HDFS tables with source tables.
- Designed and developed UDFs to extend functionality in both Pig and Hive (a minimal Hive UDF sketch follows this list).
- Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
- Responsible for developing multiple Kafka producers and consumers from scratch as per the software requirement specifications.
- Involved in using the CA7 tool to set up dependencies at each level (table data, file, and time).
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
- Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
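A minimal Hive UDF sketch of the kind referenced above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization rule are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Simple Hive UDF that strips non-digit characters, e.g. for normalizing phone numbers. */
@Description(name = "strip_non_digits",
             value = "_FUNC_(str) - returns str with all non-digit characters removed")
public class StripNonDigitsUDF extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;          // Hive passes NULLs through; preserve them.
        }
        String digits = input.toString().replaceAll("[^0-9]", "");
        return new Text(digits);
    }
}
```

Such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.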
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, Zookeeper, Oozie, Impala, Java (JDK 1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python.
Confidential, Bellevue, WA
Sr. Java/Spark Developer
Responsibilities:
- Developed Java modules implementing business rules and workflows using the Spring MVC web framework.
- Developed Intranet Web Application using J2EE architecture, using JSP to design the user interfaces and Hibernate for database connectivity.
- Worked extensively with Eclipse as the IDE to develop, test, and deploy the complete application.
- Worked extensively on the Spring Framework, implementing Spring MVC, Spring Security, IoC (dependency injection), and Spring AOP.
- Used AngularJS, Bootstrap, and AJAX to get data from the server asynchronously using JSON objects.
- Developed Spark applications using Scala to perform enrichments, aggregations, and other business-metric processing on clickstream data along with user profile data.
- Worked on Hibernate development, including mapping files, configuration files, and classes to interact with the database.
- Created Spark applications using the Spark DataFrame and Spark SQL APIs extensively.
- Performed J2EE application tuning, performance testing, and analysis.
- Developed the Product Builder UI screens using AngularJS, Node.js, HTML5, CSS, and JavaScript.
- Utilized the Spark Scala API to implement batch processing jobs.
- Used broadcast variables in Spark, along with effective and efficient joins, transformations, and other capabilities, for data processing (see the broadcast-join sketch after this list).
- Used Spark-SQL to perform event enrichment and to prepare various levels of user behavioral summaries.
- Designed dynamic and browser compatible pages using HTML5, DHTML, CSS3 and JavaScript.
- Used AngularJS and Bootstrap to consume the service and populate the page with the returned products and pricing.
- Used RESTful services in conjunction with AJAX calls using JAX-RS and Jersey.
- Designed and developed the application using the Spring and Hibernate frameworks.
- Worked on fine-tuning Spark applications/jobs to improve efficiency and overall processing time for the pipelines.
- Used Oracle as the database and Toad for query execution; involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Developed classes using core Java (multithreading, concurrency, collections, memory management) and some Spring IoC.
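A sketch of the broadcast-style enrichment of clickstream data with user profiles described above; the column name userId, the segment column, and the HDFS paths are assumed for illustration, and the production jobs were written in Scala rather than Java.

```java
import static org.apache.spark.sql.functions.broadcast;
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/** Enrich clickstream events with user profile attributes via a broadcast join. */
public class ClickstreamEnrichment {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clickstream-enrichment")
                .getOrCreate();

        // Large fact data: one row per click event (path and schema are placeholders).
        Dataset<Row> clicks = spark.read().parquet("hdfs:///data/clickstream");

        // Small dimension data: user profiles, broadcast to every executor to avoid a shuffle.
        Dataset<Row> profiles = spark.read().parquet("hdfs:///data/user_profiles");

        Dataset<Row> enriched = clicks.join(broadcast(profiles), "userId");

        // Example aggregation: clicks per customer segment.
        enriched.groupBy(col("segment"))
                .count()
                .write()
                .mode("overwrite")
                .parquet("hdfs:///data/clicks_by_segment");

        spark.stop();
    }
}
```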
Environment: Java, J2EE, JavaScript, HTML5, Hibernate 4.2, Hadoop 2.5, HDFS, Ajax, Oracle 11g, PL/SQL, Scala, Spark, Eclipse.
Confidential
Java Developer
Responsibilities:
- Used the Spring Framework for dependency injection and transaction management; used Spring MVC controllers for the controller part of MVC.
- Implemented Business Logic using POJO's and used WebSphere to deploy the applications.
- Consumed REST-based microservices with RestTemplate over RESTful APIs (a minimal sketch follows this list).
- Developed the front-end web application using AngularJS along with cutting-edge HTML and CSS.
- Developed a processing component to retrieve customer information from a MySQL database and developed the DAO layer using Hibernate.
- Used Maven for developing build scripts and deploying the application onto WebLogic.
- Used the Spring MVC framework to write controllers, validations, and views.
- Used Eclipse as IDE for development of the application.
- Built data-driven web applications with server-side Java technologies like Servlets/JSP and generated dynamic web pages with JavaServer Pages (JSP).
- Involved in mapping the data representation from the MVC model to the Oracle relational data model with a SQL-based schema, using Hibernate as the object/relational mapping (ORM) solution.
- Used core java to design application modules, base classes and utility classes.
- Involved in Implementation of the application by following the Java best practices and patterns.
- Used both Java Objects and Hibernate framework to develop Business components to map the Java classes to the database.
- Used Spring IOC framework to integrate with Hibernate.
- Implemented a Maven script to create the JAR and dependency JARs and deploy the entire project onto the WebLogic Application Server.
- Coded JavaBeans and implemented Model View Controller (MVC) Architecture.
- Developed client applications to consume web services based on both the SOAP and REST protocols.
- Utilized Log4j for logging and debugging the application.
- Involved in bug fixing during the System testing, Joint System testing and User acceptance testing.
- Worked on various SOAP and RESTful services used in various internal applications.
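A minimal sketch of consuming a REST microservice with RestTemplate from a Spring MVC controller, as mentioned above; the service URL, endpoint, DTO, and view name are hypothetical.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.client.RestTemplate;

/** Spring MVC controller that calls a downstream REST microservice with RestTemplate. */
@Controller
public class CustomerController {

    // Base URL of the downstream microservice (placeholder).
    private static final String CUSTOMER_SERVICE_URL = "http://customer-service/api/customers/{id}";

    private final RestTemplate restTemplate = new RestTemplate();

    @RequestMapping(value = "/customers/{id}", method = RequestMethod.GET)
    public String viewCustomer(@PathVariable("id") long id, Model model) {
        // RestTemplate deserializes the JSON response into the CustomerDto below.
        CustomerDto customer = restTemplate.getForObject(CUSTOMER_SERVICE_URL, CustomerDto.class, id);
        model.addAttribute("customer", customer);
        return "customerDetail";   // logical view name resolved to a JSP
    }

    /** Minimal DTO matching the fields returned by the service (hypothetical). */
    public static class CustomerDto {
        private long id;
        private String name;

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }
}
```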
Environment: Java, Spring, Hibernate, MVC, POJO, WebSphere, Eclipse, Maven, JavaBeans, SOAP, Log4j, SQL, PL/SQL, CSS, MySQL.