
Sr. Big Data Developer Resume


Harrisburg, PA

SUMMARY:

  • Over 9 years of working experience as a Big Data/Hadoop Developer, designing and developing various applications with Big Data, Hadoop, and Java/J2EE open-source technologies.
  • Experience in architecture, analysis, design, development, testing, implementation, maintenance and enhancements on various IT projects.
  • Excellent knowledge and working experience in Agile & Waterfall methodologies.
  • Expertise in web page development using JSP, HTML, JavaScript, jQuery and Ajax.
  • Experienced in writing database objects such as stored procedures, functions, triggers, PL/SQL packages and cursors for Oracle, SQL Server, MySQL and Sybase databases.
  • Excellent experience in installing, configuring and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Scala, Solr, Git, Maven, Avro and JSON.
  • Excellent experience with the Amazon EMR, Cloudera and Hortonworks Hadoop distributions, and in maintaining and optimizing AWS infrastructure (EMR, EC2, S3, EBS).
  • Expertise in developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and the MapReduce programming paradigm.
  • Good knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in working with Eclipse IDE, NetBeans, and Rational Application Developer.
  • Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns (a sketch follows this summary).
  • Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
  • Expertise in developing simple web-based applications using J2EE technologies such as JSP, Servlets and JDBC.
  • Experienced with major components in the Hadoop ecosystem including Hive, Sqoop and Flume, and knowledge of the MapReduce/HDFS framework.
  • Experienced in working with MapReduce design patterns to solve complex MapReduce problems.
  • Excellent knowledge of Talend Big Data integration for meeting business demands around Hadoop and NoSQL.
  • Hands-on programming experience in various technologies such as Java, J2EE, HTML and XML.
  • Excellent working knowledge of Sqoop and Flume for data processing.
  • Expertise in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
  • Worked extensively in Core Java, Struts 2, JSF 2.2, Spring 3.1, Hibernate, Servlets and JSP, with hands-on experience in PL/SQL, XML and SOAP.
  • Hands on experience in working on XML suite of technologies like XML, XSL, XSLT, DTD, XML Schema, SAX, DOM, JAXB.
  • Experienced on Hadoop cluster maintenance including data and metadata backups, file system checks, commissioning and decommissioning nodes and upgrades.
  • Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java.
  • Strong experience in analyzing large amounts of data sets writing Pig scripts and Hive queries.
  • Extensive experience in working with structured data using HiveQL, join operations and custom UDFs, and experienced in optimizing Hive queries.
  • Experienced in importing and exporting data using Sqoop from HDFS to Relational Database.
  • Expertise in job workflow scheduling and monitoring tools like Oozie.
  • Experienced in Apache Flume for collecting, aggregating and moving huge chunks of data from various sources such as web server, telnet sources etc.
  • Extensively designed and executed SQL queries in order to ensure data integrity and consistency at the backend.
  • Strong experience in architecting batch style large scale distributed computing applications using tools like Flume, MapReduce, Hive etc.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR) to fully implement and leverage new Hadoop features.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Experienced in working with scripting technologies such as Python and Unix shell scripts.
  • Strong experience in working in Unix/Linux environments and writing shell scripts.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
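
A sketch of the kind of custom MapReduce program referenced above (illustrative only, not code from any of the engagements below): a Java job that counts web-log records per HTTP status code. The class names, field position and paths are assumptions.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatusCodeCount {

        // Mapper: emit (statusCode, 1) for each log line.
        public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text statusCode = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\s+");
                if (fields.length > 8) {           // assumes combined-log-format lines
                    statusCode.set(fields[8]);     // hypothetical position of the status code
                    context.write(statusCode, ONE);
                }
            }
        }

        // Reducer (also used as combiner): sum the counts per status code.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "status-code-count");
            job.setJarByClass(StatusCodeCount.class);
            job.setMapperClass(LogMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }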

TECHNICAL SKILLS:

Programming Languages: Java, Python 3.6, Scala 2.12, Shell Scripting, SQL, and PL/SQL.

J2EE Technologies: Core Java, Spring, Servlets, SOAP/REST services, JSP, JDBC, XML, Hibernate

Big Data Ecosystem: HDFS, HBASE 1.2, Hadoop 3.0, MapReduce, Hive 2.3, Pig, Sqoop 1.4, Impala 2.1, Cassandra 3.11, Oozie, Zookeeper 3.4, Flume 1.8, Storm, Spark and Kafka.

Databases: NoSQL, Oracle 12c, SQL Server 2016, MySQL

Database Tools: Oracle SQL Developer, MongoDB 3.6, TOAD and PL/SQL Developer

Web Technologies: HTML5, JavaScript, XML, JSON, jQuery, Ajax, CSS3

Application Servers: WebLogic, WebSphere, Apache Tomcat

IDEs/Tools: Eclipse, NetBeans, WinSCP

AWS Tools: EC2, S3, AMI, RDS; basic MS Azure

Cloud Management: Amazon Web Services (AWS), Redshift

Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life Cycle), UML, Design Patterns (Core Java and J2EE)

Other Tools: Maven, ANT, WSDL, SOAP, REST

PROFESSIONAL EXPERIENCE:

Confidential - Harrisburg, PA

Sr. Big Data Developer

Responsibilities:

  • Worked as a Big Data implementation engineer within a team of professionals.
  • Analyzed the existing data flow to the warehouses and took a similar approach to migrate the data into HDFS.
  • Followed agile software development with Scrum methodology.
  • Applied partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, decreasing execution time from hours to minutes.
  • Involved in gathering requirements from the client and estimating the timeline for developing complex Hive queries for the logistics application.
  • Worked with cloud provisioning team on a capacity planning and sizing of the nodes (Master and Slave) for an AWS EMR Cluster.
  • Worked with Amazon EMR to process data directly in S3 and to copy data from S3 into HDFS on the EMR cluster, setting up Spark Core for the analysis work.
  • Performed multiple MapReduce jobs in Sqoop and Hive for data cleaning and pre-processing.
  • Involved in data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Moved data between HDFS and relational database systems using Sqoop, and handled the related maintenance and troubleshooting.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
  • Developed a data pipeline using Sqoop and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
  • Performance tuning of Hive queries, MapReduce programs for different applications.
  • Created custom UDFs for Spark and Kafka in Scala to replace non-working functionality in the production environment.
  • Developed workflows in Oozie and scheduled jobs on mainframes, preparing the data refresh strategy and capacity planning documents required for project development and support.
  • Worked with different actions in Oozie to design workflows, such as Sqoop, Pig, Hive and shell actions.
  • Mastered major Hadoop distributions such as Hortonworks and Cloudera as well as numerous open-source projects, and prototyped various applications that utilize modern Big Data tools.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Experienced in configuring and working with Flume to load the data from multiple sources directly into HDFS and transferred large data sets between Hadoop and RDBMS by implementing Sqoop.
  • Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near-real-time analysis (a sketch follows this list).
  • Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Ingested structured and unstructured data of all formats, including logs/transactions and relational database data, into HDFS using Sqoop and Flume.
  • Implemented Flume custom interceptors to perform cleansing operations before moving data into HDFS.
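
As an illustration of the Kafka-to-Cassandra consumers mentioned above, the following is a minimal Java sketch (not the production code): it polls a topic and writes each record into a Cassandra table using the DataStax 3.x driver. The broker address, topic, keyspace and table names are hypothetical.

    import java.util.Collections;
    import java.util.Properties;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class EventsToCassandra {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
            props.put("group.id", "events-to-cassandra");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("customer-events"));  // hypothetical topic

            Cluster cluster = Cluster.builder().addContactPoint("cassandra1").build();
            Session session = cluster.connect("analytics");    // hypothetical keyspace
            PreparedStatement insert = session.prepare(
                    "INSERT INTO events_by_customer (customer_id, event_json) VALUES (?, ?)");

            try {
                while (true) {
                    // Poll for up to one second, then write each record to Cassandra.
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        // Key carries the customer id, value carries the raw event payload.
                        session.execute(insert.bind(record.key(), record.value()));
                    }
                }
            } finally {
                consumer.close();
                session.close();
                cluster.close();
            }
        }
    }

A production consumer would additionally manage offset commits, retries and batching; the sketch omits those concerns for brevity.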

Environment: HDFS, Hive 2.3, AWS, RDBMS, Pig, Sqoop 1.4, MySQL, Kafka, Spark, Scala 2.12, Oozie, Hadoop 3.0, Hortonworks, MapReduce, HBase 1.2, Zookeeper 3.4, Flume 1.8, Cassandra 3.11, MongoDB 3.6

Confidential - Washington, DC

Sr. Big Data/ Hadoop Developer

Responsibilities:

  • As a Big Data developer, followed the Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
  • Responsible for building and configuring distributed data solution using MapR distribution of Hadoop.
  • Involved in complete Big Data flow of the application data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data.
  • Wrote scripts to distribute queries for performance-test jobs in the Amazon data lake.
  • Created Hive Tables, loaded transactional data from Teradata using Sqoop.
  • Developed MapReduce (Yarn) jobs for cleaning, accessing and validating the data.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Developed optimal strategies for distributing the web log data over the cluster importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
  • Developed Pig Latin scripts for replacing the existing legacy process to the Hadoop and the data is fed to AWS S3.
  • Responsible for building scalable distributed data solutions using Hadoop Cloudera.
  • Designed and developed automation test scripts using Python.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries (a sketch follows this list).
  • Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
  • Uploaded streaming data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most visited page on the website.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud and performed export and import of data to and from S3.
  • Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
  • Involved in designing the row key in HBase to store text and JSON as key values, and designed the row key so that rows can be retrieved and scanned in sorted order.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Worked on custom Talend jobs to ingest, enrich and distribute data in the Cloudera Hadoop ecosystem.
  • Developed syllabus/Curriculum data pipelines from Syllabus/Curriculum Web Services to HBase and Hive tables.
  • Worked on Cluster co-ordination services through Zookeeper.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in building applications using Maven and integrated with CI servers like Jenkins to build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test a Big Data based analytical solution drawing on disparate sources.
  • Created cubes in Talend to produce different types of aggregations on the data and to visualize them.
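
The Hive generic UDFs mentioned above follow the pattern sketched below. This is an illustrative Java example rather than the project's actual business rule: a function that trims and upper-cases a string code, returning null for null input. The function name is hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

    // Hypothetical UDF: normalize_code(col) trims and upper-cases a string column.
    public class NormalizeCode extends GenericUDF {
        private StringObjectInspector input;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentLengthException("normalize_code() takes exactly one argument");
            }
            if (!(args[0] instanceof StringObjectInspector)) {
                throw new UDFArgumentException("normalize_code() expects a string argument");
            }
            input = (StringObjectInspector) args[0];
            return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            String raw = input.getPrimitiveJavaObject(args[0].get());
            // Preserve Hive null semantics: null in, null out.
            return raw == null ? null : raw.trim().toUpperCase();
        }

        @Override
        public String getDisplayString(String[] children) {
            return "normalize_code(" + children[0] + ")";
        }
    }

Once packaged into a jar, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.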

Environment: Hadoop 3.0, HDFS, Teradata r15, Sqoop 1.4, Yarn, MapReduce, AWS, EC2, Python, Kafka, Apache Storm, Pig, SQL, Hive 2.3, HBase 1.2, MongoDB 3.6, JSON, Oozie, Talend, Zookeeper 3.4, Maven, Jenkins, RDBMS

Confidential - Bellevue, WA

Sr. Hadoop Developer

Responsibilities:

  • Created the automated build and deployment process for the application, re-engineered the setup for a better user experience, and led the effort to build a continuous integration system.
  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Interacted with Business Analysts to understand the requirements and the impact of the ETL on the business.
  • Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
  • Worked on deploying a Hadoop 2.7 cluster with multiple nodes and different big data analytic tools, including Pig 0.16, the HBase 0.98 database and Sqoop on HDP 2.3.
  • Involved in loading data from the Linux file system into HDFS.
  • Reviewed Hadoop log files to detect failures.
  • Exported the analyzed data to the relational databases using Sqoop 2.3 for visualization and to generate reports for the BI team.
  • Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
  • Streamed data in real time using Spark 1.6 with Kafka 0.10.
  • Imported and exported data into HDFS and Hive 2.0 using Sqoop 2.3.
  • Implemented new Apache Camel routes and extended existing Camel routes that provide end-to-end communication between the web services and other enterprise back-end services.
  • Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.
  • Implemented test scripts to support test driven development and continuous integration.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig 0.16 and Sqoop 2.3.x.
  • Implemented custom code for MapReduce partitioners and custom Writables.
  • Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used Kafka producer 0.8.3 APIs to produce messages.
  • Implemented data ingestion and handling clusters in real time processing using Kafka.
  • Performed benchmarking of the NoSQL databases Cassandra 3.7 and HBase.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise wise data.
  • Involved in creating Hive tables, loading with data and writing hive queries that will run internally in MapReduce way.
  • Used Spark Streaming to collect this data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase); a sketch follows this list.
  • Implemented a Kafka-Storm based data pipeline together with the infrastructure team.
  • Wrote Python 3 scripts for internal testing that read data from a file and push it into a Kafka queue, which is in turn consumed by the Storm application.
  • Integrated bulk data into the Cassandra 3.7 file system using MapReduce programs.
  • Involved in creating data models for customer data using Cassandra Query Language (CQL).
  • Supported MapReduce programs running on the cluster.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked on tuning the performance of Pig queries.
  • Developed analytical component using Scala, Spark 1.6.0 and Spark Stream.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Mentored analyst and test team for writing Hive Queries.
  • Installed the Oozie workflow engine to run multiple MapReduce jobs.
  • Provided support to develop the entire warehouse architecture and plan the ETL process.
  • Developed Unix shell scripts to automate the ETL process.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Worked in collaboration with BI team for creating visual illustrations with Pentaho.
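
A minimal Java sketch of the Spark Streaming consumption from Kafka described above, assuming the Spark 1.6 direct-stream integration (spark-streaming-kafka). The broker, topic and the per-event-type aggregation are assumptions, and the real job persisted its results to HBase rather than printing them.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;
    import scala.Tuple2;

    public class LearnerEventStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("LearnerEventStream");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "broker1:9092");        // hypothetical broker
            Set<String> topics = new HashSet<>(Arrays.asList("learner-events")); // hypothetical topic

            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            // Count events per event type within each 10-second micro-batch
            // (the value is assumed to look like "eventType,payload").
            JavaPairDStream<String, Integer> countsByType = stream
                    .mapToPair(record -> new Tuple2<>(record._2().split(",")[0], 1))
                    .reduceByKey((a, b) -> a + b);

            // In the real pipeline these aggregates fed the learner data model in HBase;
            // here each batch is simply printed.
            countsByType.print();

            jssc.start();
            jssc.awaitTermination();
        }
    }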

Environment: Java 1.3, Hadoop, Informatica PowerCenter 9.5, Oracle 11g, SQL Server 2015, PL/SQL, XML, Kafka 0.10, HDFS, YARN, MapReduce 3, Hive 2.0, Pig 0.16, Spark 1.6, Storm, Sqoop, Linux, Cassandra, Oozie.

Confidential - Edison, NJ

Sr. Java/J2EE Developer

Responsibilities:

  • Worked on developing the application involving Spring MVC implementations and RESTful web services (a sketch follows this list).
  • Responsible for designing Rich user Interface Applications using JavaScript, CSS, HTML, XHTML and AJAX.
  • Used Spring AOP to configure logging for the application.
  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
  • Developed code using Core Java to implement technical enhancement following Java Standards.
  • Worked with Swing and RCP using Oracle ADF to develop a search application which is a migration project.
  • Implemented Hibernate utility classes, session factory methods and different annotations to work with back-end database tables.
  • Implemented Ajax calls using JSF-Ajax integration and implemented cross-domain calls using jQuery Ajax methods.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
  • Used JPA (Java Persistence API) with Hibernate as Persistence provider for Object Relational mapping.
  • Used JDBC and Hibernate for persisting data to different relational databases.
  • Developed and implemented a Swing, Spring and J2EE based MVC (Model-View-Controller) framework for the application.
  • Implemented application-level persistence using Hibernate and Spring.
  • Integrated data warehouse (DW) data from different sources in different formats (PDF, TIFF, JPEG, web crawl) along with RDBMS data (MySQL, Oracle, SQL Server, etc.).
  • Used XML and JSON for transferring/retrieving data between different Applications.
  • Wrote complex PL/SQL queries using joins, stored procedures, functions, triggers, cursors and indexes in the data access layer.
  • Implemented a RESTful web services architecture for client-server interaction and implemented the respective POJOs.
  • Designed and developed SOAP web services using the CXF framework to let application services communicate with other applications, and developed web service interceptors.
  • Implemented the project using JAX-WS based web services, using WSDL, UDDI and SOAP to communicate with other systems.
  • Involved in writing application level code to interact with APIs, Web Services using AJAX, JSON and XML.
  • Wrote JUnit test cases for all the classes. Worked with Quality Assurance team in tracking and fixing bugs.
  • Developed back end interfaces using embedded SQL, PL/SQL packages, stored procedures, Functions, Procedures, Exceptions Handling in PL/SQL programs, Triggers.
  • Used Log4j to capture the log that includes runtime exception and for logging info.
  • Used ANT as the build tool and developed the build file for compiling the code and creating WAR files.
  • Used Tortoise SVN for Source Control and Version Management.
  • Responsibilities include design for future user requirements by interacting with users, as well as new development and maintenance of the existing source code.

Environment: JAVA, J2EE, JDK 1.5, Servlets, JSP, XML, JSF, Web Services (JAX-WS: WSDL, SOAP), Spring MVC, JNDI, Hibernate 3.6, JDBC, SQL, PL/SQL, HTML, DHTML, JavaScript, Ajax, Oracle 10g, SOAP, SVN, SQL, Log4j, ANT.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in the Software Development Life Cycle (SDLC) of the application: requirement gathering, design analysis and code development.
  • Implemented Struts framework based on the Model View Controller design paradigm.
  • Designed the application by implementing Struts based on the MVC architecture, with simple JavaBeans as the Model, JSP UI components as the View, and the Action Servlet and Action classes as the Controller (a sketch follows this list).
  • Used JNDI to perform lookup services for the various components of the system.
  • Involved in designing and developing dynamic web pages using HTML and JSP with Struts tag libraries.
  • Used HQL (Hibernate Query Language) to query the Database System and used JDBC Thin Driver to connect to the database.
  • Developed Hibernate entities, mappings and customized criterion queries for interacting with database.
  • Responsible for designing rich user interface applications using JavaScript, CSS, HTML and AJAX, and developed web services using SOAP UI.
  • Used JPA to persist large amounts of data to the database.
  • Implemented modules using Java APIs, Java collection, Threads, XML, and integrating the modules.
  • Applied J2EE Design Patterns such as Factory, Singleton, and Business delegate, DAO, Front Controller Pattern and MVC.
  • Used JPA for the management of relational data in application.
  • Designed and developed business components using Session and Entity Beans in EJB.
  • Developed the EJBs (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
  • Developed XML configuration files and properties files used in the Struts Validator framework for validating form inputs on the server side.
  • Extensively used AJAX technology to add interactivity to the web pages.
  • Developed JMS senders and receivers for loose coupling between the other modules and implemented asynchronous request processing using Message-Driven Beans.
  • Used JDBC for data access from Oracle tables.
  • JUnit was used to implement test cases for beans.
  • Successfully installed and configured the IBM WebSphere Application server and deployed the business tier components using EAR file.
  • Involved in deployment of the application on the WebLogic Application Server in the Development and QA environments.
  • Used Log4j for External Configuration Files and debugging.
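
The Struts MVC structure described above centers on Action classes like the illustrative sketch below; the action name, request parameter and forward are hypothetical.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;

    // Hypothetical Struts 1.x action handling a funds-transfer request.
    public class TransferFundsAction extends Action {

        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request, HttpServletResponse response)
                throws Exception {
            // The ActionForm would normally be cast to a specific form bean holding
            // the validated input (account numbers, amount) posted from the JSP.
            String amount = request.getParameter("amount");
            request.setAttribute("confirmationMessage", "Transfer of " + amount + " submitted");

            // The forward name must match a <forward> entry for this action.
            return mapping.findForward("success");
        }
    }

The "success" forward, the action mapping and any associated form bean are wired up in struts-config.xml.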

Environment: JSP 1.2, Servlets, Struts 1.2.x, JMS, EJB 2.1, Java, OOPS, Spring, Hibernate, JavaScript, Ajax, HTML, CSS, JDBC, Eclipse, WebSphere, DB2, JPA, ANT.
