
Bigdata Developer Resume


Denver, CO

PROFESSIONAL SUMMARY:

  • Around 9 years of professional IT experience, spanning the Big Data ecosystem and Java/J2EE technologies.
  • Experience with AWS components such as Amazon EC2 instances, S3 buckets, CloudWatch, and EBS volumes.
  • Experience with Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Flume, and Kafka.
  • In-depth knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Good knowledge of Hadoop cluster architecture and of working with Hadoop clusters using the Cloudera (CDH5) and Hortonworks distributions.
  • Excellent understanding and knowledge of NoSQL databases such as HBase and Cassandra.
  • Extensive knowledge of file formats such as Avro, SequenceFile, Parquet, ORC, and RCFile.
  • Experience importing data from relational database systems into HDFS using Sqoop, and exporting it back.
  • Good knowledge of job scheduling and workflow design tools such as Oozie.
  • Good experience creating real-time data streaming solutions using Apache Spark, Kafka, and Flume.
  • Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
  • Extended Hive and Pig core functionality by writing custom UDFs (see the UDF sketch after this summary).
  • Strong experience analyzing large data sets by writing PySpark scripts and Hive queries.
  • Experience managing Hadoop clusters using Cloudera Manager and Ambari.
  • Experience installing and monitoring standalone multi-node Kafka and Storm clusters.
  • Very good experience with the complete project life cycle (design, development, testing, and implementation) and with Rapid Application Development (RAD), Agile, and Scrum software development processes.
  • Highly skilled in object-oriented architectures and patterns, systems analysis, software design, effective coding practices, databases, and servers.
  • Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Developer on a Big Data team; worked with Hadoop on the AWS cloud and its ecosystem.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience with Java, JSP, Servlets, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, JavaScript, Ajax, jQuery, XML, and HTML.
  • Experience with compression techniques such as Gzip, LZO, Snappy, and Bzip2.
  • Experience working with multiple operating systems, including Windows and Linux, with strong troubleshooting skills for finding and fixing critical problems.
  • Functional knowledge of the Telecom and Health Insurance domains.
  • Ability to adapt to evolving technology and a strong sense of responsibility.
  • Proficient with Core Java and AWT, as well as markup and web technologies such as HTML 5.0, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, XQuery, Angular.js, and Node.js.
  • Worked with version control systems such as Subversion, Perforce, and Git to provide a common platform for all developers.
  • Articulate in written and verbal communication, with strong interpersonal, analytical, and organizational skills.
  • Experience preparing deployment packages, deploying them to Dev and QA environments, and preparing deployment instructions for the Production Deployment Team.
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies. Good exposure to production-environment processes such as change management, incident management, and managing escalations.
  • Creatively communicate and present models to business customers and executives using a variety of formats and visualization methodologies.
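
For illustration, a minimal sketch of the kind of custom Hive UDF mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name, function, and masking logic are hypothetical.

```java
// Sketch: a simple Hive UDF that masks the local part of an email address.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MaskEmailUDF extends UDF {
    // Hive resolves this evaluate() signature by reflection at query time.
    public Text evaluate(Text email) {
        if (email == null) return null;
        String s = email.toString();
        int at = s.indexOf('@');
        if (at <= 1) return email;  // no local part worth masking
        // Keep the first character and the domain; mask everything between.
        return new Text(s.charAt(0) + "***" + s.substring(at));
    }
}
```

Such a UDF is typically packaged into a JAR, added to the Hive session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before use in HiveQL.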

TECHNICAL SKILLS:

Hadoop technology stack: Apache Hadoop (MRv1, MRv2), Hive, Pig, Sqoop, HBase, Flume, Spark, ZooKeeper, Oozie.

Programming Languages: C, Java, Scala, SQL, PL/SQL.

Methodologies: Agile, RAD, V-model, Waterfall

Databases: Oracle, MySQL, HBase, MS SQL Server, MongoDB

Web Technologies: HTML, JSP, JSF, CSS, JavaScript, JSON & AJAX

IDEs: Eclipse, NetBeans, Visual Studio

Build tools: Maven, Ant, sbt

Web services: SOAP & RESTful Web Services

Cloud technologies: Amazon Web Services (AWS)

Monitoring Tools: Wireshark, Nagios, Ganglia

Operating Systems: Windows, Ubuntu, Red Hat Linux, CentOS.

Scripting languages: JavaScript, Shell Scripting.

PROFESSIONAL EXPERIENCE:

Confidential, Denver, CO

Bigdata Developer

Responsibilities:

  • Imported data in various formats (JSON, sequence files, text, CSV, Avro, and Parquet) into the HDFS cluster, compressed for optimization.
  • Worked on ingesting data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.
  • Loaded all datasets from source CSV files into Hive and Cassandra using Spark/PySpark (see the sketch after this list).
  • Created an environment to access the loaded data via Spark SQL through JDBC/ODBC (via the Spark Thrift Server). Developed real-time data ingestion and analysis using Kafka and Spark Streaming.
  • Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions, with bucketing, as required.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Migrated the computational code in HQL to PySpark.
  • Completed data extraction, aggregation, and analysis in HDFS using PySpark, and stored the required data in Hive.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Managed and reviewed the huge Hadoop log files using shell scripts.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Used Hive join queries to join multiple tables of a source system and loaded them into Elasticsearch tables.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Designed and created various analytical reports and automated dashboards to help users identify critical KPIs and facilitate strategic planning in the organization.
  • Involved in cluster maintenance, cluster monitoring, and troubleshooting.
  • Maintained technical documentation for every step of the development environment and for launching Hadoop clusters.
  • Worked on different file formats such as Parquet, ORC, Avro, and sequence files using MapReduce, Hive, and Impala.
  • Worked with the Avro data serialization system to handle JSON data formats.
  • Used Amazon Web Services (AWS) S3 to store large amounts of data in a common repository.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Built applications using Maven and integrated them with continuous integration servers such as Jenkins.
  • Worked with BI tools such as Tableau to create weekly, monthly, and daily dashboards and reports with Tableau Desktop, publishing them against the HDFS cluster.
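
A minimal sketch of the Spark-based CSV-to-Hive loading described above, written against the Spark Java API; the input path and target table name are hypothetical placeholders.

```java
// Sketch: load source CSV files into a Hive table with Spark.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CsvToHiveLoader {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-to-hive")
                .enableHiveSupport()   // required for writing managed Hive tables
                .getOrCreate();

        // Read the source CSVs with a header row, inferring column types.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/source/*.csv");   // hypothetical path

        // Persist into Hive so downstream HQL and BI tools can query it.
        df.write().mode(SaveMode.Overwrite)
          .saveAsTable("analytics.customer_events"); // hypothetical table
        spark.stop();
    }
}
```

A Cassandra sink would follow the same pattern, substituting the spark-cassandra-connector's writer for saveAsTable.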

Environment: Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie, Pig, Sqoop, AWS (EC2, S3, EMR), Shell Scripting, HBase, Jenkins, Tableau, Oracle, MySQL, Teradata.

Confidential, MA

Spark Developer

Responsibilities:

  • Used Sqoop to ingest various types of financial data into HDFS (encryption zone).
  • Created an SFTP module that pulls data from the FTP server using Spark for daily ingestion, automating the framework through Control-M.
  • Performed integrity checks that validate files after ingestion using a message digest algorithm.
  • Performed various benchmarking steps to optimize the performance of Spark jobs and thereby improve overall batch processing.
  • Built a validation framework that compares the data at the source (RDBMS systems) and the destination (Hive tables) using Spark and Scala.
  • Created Hive tables on the ingested data, maintaining the data in Text, Avro, and ORC file formats.
  • Optimized Hive query performance using various techniques.
  • Developed and maintained workflow scheduling jobs in Oozie for generating reports into reporting zones.
  • Developed shell scripts and HQL to generate Hive CREATE statements from the data and load the data into tables, thus migrating stored procedures to HQL.
  • Analyzed data with Hive and Spark SQL and compared query results and performance across Tez, LLAP, and Spark SQL.
  • Used the ORC file format and various optimization techniques to improve query performance.
  • Analyzed user transactions and implemented various performance optimization measures, including partitions and buckets in HiveQL.
  • Provided user support and application support on the Hadoop infrastructure.
  • Worked on the encryption zone in AWS and secured the financial data by creating Ranger policies.
  • Connected Tableau and SAP BO to Hive tables, exposing the refined data for business analytics and reporting.
  • All metrics data is published directly to Kafka, where it is consumed by a Spark Streaming consumer group.
  • Used the Spark Streaming APIs to perform on-the-fly transformations and actions while building a common learner data model that gets data from Kafka in near real time and persists it to AWS S3 (see the sketch after this list).
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Implemented the Databricks API in a Scala program to push the processed data to AWS S3.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
  • Assisted in monitoring the Hadoop cluster using tools such as Ambari.
  • Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, and VPC.
  • Used Hadoop as the ETL tool instead of Informatica and migrated existing on-premises applications to AWS; used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EC2.
  • Implemented a continuous delivery pipeline with GitLab.
  • Worked with Kerberos, Ranger, Active Directory/LDAP, and Unix-based file systems.
  • Built an ingestion framework using Apache NiFi that ingests files, including financial data, from SFTP into HDFS.
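
A hedged sketch of the Kafka-to-S3 streaming flow described above, using the spark-streaming-kafka-0-10 direct-stream Java API; the broker address, topic, consumer group id, and bucket path are hypothetical placeholders.

```java
// Sketch: consume a Kafka topic with Spark Streaming and persist batches to S3.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToS3 {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-s3");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
        kafkaParams.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("group.id", "metrics-consumers");      // hypothetical group

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("metrics"), kafkaParams));

        // Persist each non-empty micro-batch under a time-stamped S3 prefix.
        stream.map(ConsumerRecord::value)
              .foreachRDD((rdd, time) -> {
                  if (!rdd.isEmpty()) {
                      rdd.saveAsTextFile("s3a://my-bucket/metrics/" + time.milliseconds());
                  }
              });

        jssc.start();
        jssc.awaitTermination();
    }
}
```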

Environment: Hortonworks Distribution of Hadoop, Apache Ranger, Apache NiFi, AWS, HDFS, Java (JDK 1.8), MySQL, DB2, Sybase IQ, Kafka, Tez, LLAP, Spark SQL, Spark Streaming, UNIX Shell Scripting, Oozie, Scala.

Confidential, San Jose, CA

Big Data Consultant

Responsibilities:

  • Worked with the Hortonworks distribution and was involved in installing and configuring various Hadoop ecosystem components, including HDFS, Pig, Hive, Sqoop, and HBase.
  • Developed a data pipeline using Sqoop to ingest customer behavioral data into HDFS for analysis.
  • Imported data into Hive using Sqoop from an RDBMS source (SAP HANA) and Teradata.
  • Wrote Hive queries to structure the log data in tabular format, facilitating effective querying for business analytics.
  • Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes (see the sketch after this list).
  • All metrics data is published directly to Kafka, where it is consumed by a Spark Streaming consumer group.
  • Optimized Hive query performance while handling 20 TB of data per day.
  • Provided production support for cluster maintenance and tuning.
  • Triggered workflows based on time or availability of data using Oozie.
  • Monitored and debugged Hadoop jobs and applications running in production using Ambari.
  • Provided user support and application support on the Hadoop infrastructure.
  • Migrated data from RDBMS into Hadoop and performed analytics on top of it.
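
A sketch of the partitioning and bucketing technique referenced above, expressed here through Spark's DataFrameWriter (Spark-native bucketing, analogous to Hive's CLUSTERED BY); the staging source and target table are hypothetical.

```java
// Sketch: rewrite a staging table into a partitioned, bucketed ORC table.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HiveLayoutOptimizer {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-layout")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical staging table ingested earlier via Sqoop.
        Dataset<Row> txns = spark.table("staging.transactions");

        // Partition by date so daily queries prune whole directories, and
        // bucket by customer_id so joins on it can avoid a full shuffle.
        txns.write()
            .mode(SaveMode.Overwrite)
            .format("orc")
            .partitionBy("ingest_date")
            .bucketBy(32, "customer_id")
            .sortBy("customer_id")
            .saveAsTable("sales.transactions_opt"); // hypothetical target
        spark.stop();
    }
}
```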

Environment: Hortonworks Distribution of Hadoop, HDFS, Oozie, Java (JDK 1.6), Eclipse, MySQL, Tez, Spark SQL, UNIX Shell Scripting

Confidential, CO

Hadoop Developer

Responsibilities:

  • Launched Amazon EC2 cloud instances from Amazon Machine Images and configured Hadoop instances for specific applications.
  • Created private networks and subnets and brought instances under them based on the requirements.
  • Created security groups, both for individual instances and for groups of instances within a network.
  • Handled the installation and configuration of a Hadoop cluster.
  • Handled data exchange between HDFS and different web applications and databases using Flume and Sqoop.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
  • Changed the cluster's configuration properties based on the volume of data being processed and the performance of the cluster.
  • Commissioned and decommissioned data nodes from the cluster in case of problems.
  • Worked on NoSQL databases such as HBase and ingested the data into HDFS.
  • Worked on NoSQL (HBase) to support enterprise production.
  • Loaded data into HBase using Hive and Sqoop (see the HBase client sketch after this list).
  • Involved in the upgrade of the Hadoop cluster from CDH3 to CDH4.
  • Created Hive tables and worked on them using HiveQL.
  • Performed cluster coordination and assisted with data capacity planning and node forecasting using ZooKeeper.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Developed job flows in Oozie to automate the workflow for extracting data from warehouses and weblogs.
  • Wrote Pig UDFs for custom data processing (cleaning, editing, and formatting unstructured data).
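
An illustrative sketch of the HBase loading mentioned above, using the HBase 1.x Java client API (newer than the CDH3/CDH4-era HTable API); the table, row key, column family, and values are hypothetical.

```java
// Sketch: write a single cell into an HBase table.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("customer_events"))) {
            Put put = new Put(Bytes.toBytes("row-001"));   // row key
            put.addColumn(Bytes.toBytes("cf"),             // column family
                          Bytes.toBytes("event_type"),     // qualifier
                          Bytes.toBytes("login"));         // cell value
            table.put(put);
        }
    }
}
```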

Environment: AWS, CDH3 and CDH4, Hadoop, MapReduce, HDFS, Hive, Java 6, Hortonworks distribution of Hadoop, Pig, HBase, Linux, XML, MySQL, MySQL Workbench, Eclipse, Oracle 10g, PL/SQL, SQL*Plus.

Confidential

Java/J2EE Developer

Responsibilities:

  • Implemented Java/J2EE design patterns such as Business Delegate, Data Transfer Object (DTO), and Data Access Object (see the DAO/DTO sketch after this list).
  • Developed a multi-threaded Java/J2EE application built on top of the Struts framework.
  • Developed various UML diagrams, including use cases, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams.
  • Created the UI using JSP, Struts, JavaScript, CSS, and HTML.
  • Designed and implemented an MVC architecture using the Struts framework, which involved writing custom tag libraries and JSP pages.
  • Used web services to extract client-related data from databases using WSDL, XML, and SOAP.
  • Worked with the QA team to design the test plan and test cases for User Acceptance Testing (UAT).
  • Used Apache Ant to compile Java classes and package them into JAR/WAR archives; involved in low-level and high-level documentation of the product.
  • Created a POJO layer to facilitate sharing data between the front end and the J2EE business objects.
  • Created the Hibernate POJO objects and mapped them using Hibernate annotations.
  • Used the Spring MVC framework to enable interactions with the JSP/view layer, and implemented different design patterns with J2EE and XML technology.
  • Implemented the J2EE design patterns Data Access Object (DAO), Session Façade, and Business Delegate.
  • Developed RESTful web services using Spring IoC to provide users a way to run jobs and generate daily status reports.
  • Implemented the application using an MVC architecture integrating the Hibernate and Spring frameworks.
  • Utilized various JavaScript and jQuery libraries, Bootstrap, and Ajax for form validation and other interactive features.
  • Developed interactive GUI screens using HTML, Bootstrap, and JSP with JavaScript and AngularJS.
  • Developed Java APIs that communicate with the JavaBeans.
  • Prepared Ant build scripts (XML-based) and handled deployments, integration, and configuration management of the application's modules.
  • Configured local Maven repositories and multi-component projects and scheduled projects in Jenkins for continuous integration.
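
A compact sketch of the DAO and DTO patterns named above; the Customer entity, table, and SQL are hypothetical.

```java
// Sketch: DTO carries data between layers; DAO hides the persistence details.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

// DTO: a plain, immutable carrier object with no business logic.
class CustomerDTO {
    private final long id;
    private final String name;
    CustomerDTO(long id, String name) { this.id = id; this.name = name; }
    long getId() { return id; }
    String getName() { return name; }
}

// DAO: a narrow interface the rest of the application codes against.
interface CustomerDao {
    Optional<CustomerDTO> findById(long id) throws SQLException;
}

// JDBC implementation; could be swapped for a Hibernate-backed one.
class JdbcCustomerDao implements CustomerDao {
    private final Connection conn;
    JdbcCustomerDao(Connection conn) { this.conn = conn; }

    @Override
    public Optional<CustomerDTO> findById(long id) throws SQLException {
        String sql = "SELECT id, name FROM customers WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next()
                    ? Optional.of(new CustomerDTO(rs.getLong("id"), rs.getString("name")))
                    : Optional.empty();
            }
        }
    }
}
```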

Environment: Java, J2EE, JSP, Struts, JavaScript, CSS, HTML, MVC, Apache Ant, POJO, Hibernate, XML, Bootstrap, Ajax, jQuery, AngularJS, Maven, Jenkins

Confidential

Java Developer

Responsibilities:

  • Implemented the application in the Spring Framework, which is based on the Model-View-Controller (MVC) design pattern.
  • Configured web.xml and faces-config.xml for navigation and managed beans, and configured the Spring and Hibernate frameworks.
  • Involved in various Software Development Life Cycle (SDLC) phases of the project, which was modeled using the Rational Unified Process (RUP).
  • Developed a dump-and-load utility using Java and the JExcel API to extract data and load it into Oracle.
  • Implemented a workflow system in an SOA environment through web services built using Axis2 for SOAP over HTTP and SMTP.
  • Used PL/SQL commands to work with the Oracle database.
  • Configured Jenkins along with Maven and Python scripts for an automated build and deployment process.
  • Configured, managed, and controlled the source code repository, housed in Subversion and Git.
  • Used jQuery to simplify the client-side scripting of HTML.
  • Used AngularJS directives to specify custom, reusable HTML-like elements.
  • Worked on a retail and merchandising website to update goods and services.
  • Developed Servlets and back-end Java classes using WebSphere Application Server.
  • Managed and monitored JVM performance through WebLogic heap size, garbage collection, and JDBC pools, taking and analyzing thread dumps to find problems in the application.
  • Worked in a team of 6 members and used panel containers to organize the panels.
  • Developed code for obtaining beans in the Spring framework using Dependency Injection (DI), or Inversion of Control (IoC), with annotations (see the sketch after this list).
  • Used Hibernate, an object/relational mapping (ORM) solution, to map the data representation from the MVC model to the Oracle relational data model with a SQL-based schema.
  • Implemented Singleton and Service Locator design patterns in the MVC framework and developed command, delegate, and model ActionScript classes to interact with the backend.
  • Designed and developed the application presentation layer using JSTL.
  • Used Ant to deploy the services on JBoss 6.0 and was involved in deploying the application on JBoss.
  • Involved in adding entries to the external XML files that are read by the doc-builder.
  • Worked on various technologies such as HTML, JSP, and Servlets; responsible for change requests and maintenance during development of the project.
  • Involved in finding and fixing bugs and modifying existing code.
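
A hedged sketch of the annotation-driven Spring Dependency Injection described above; ReportService, ReportRepository, and the report content are hypothetical.

```java
// Sketch: constructor injection wired by Spring annotations.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;

@Repository
class ReportRepository {
    String fetchDailyStatus() { return "OK"; }   // stand-in for a real query
}

@Service
class ReportService {
    private final ReportRepository repo;

    @Autowired   // Spring injects the ReportRepository bean here
    ReportService(ReportRepository repo) { this.repo = repo; }

    String dailyStatusReport() { return "Daily status: " + repo.fetchDailyStatus(); }
}

@Configuration
@ComponentScan(basePackageClasses = ReportService.class)
class AppConfig {}

public class DiExample {
    public static void main(String[] args) {
        AnnotationConfigApplicationContext ctx =
                new AnnotationConfigApplicationContext(AppConfig.class);
        System.out.println(ctx.getBean(ReportService.class).dailyStatusReport());
        ctx.close();
    }
}
```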

Environment: JDK 1.6, JBoss, JSP, AngularJS, Web API, Hibernate 3.6, Spring XML, Servlets, CVS, SQL, HTML, JavaScript, CSS, Apache Server, SVN, Oracle 10g.

Confidential

Software Developer

Responsibilities:

  • Involved in the process of analysis, design, and development of the application.
  • Developed the user interface using JSP, Struts, and JavaScript to simplify the complexities of the application.
  • The entire application was developed in J2EE using an MVC-based architecture with the help of Apache Struts; programmed Struts Action classes and Model classes.
  • Implemented the application using the concrete principles laid down by several Java/J2EE design patterns, such as MVC, Singleton, Data Transfer Object (DTO), and Service Locator.
  • Used JavaScript for client-side validations and AJAX to create an interactive front-end GUI.
  • Used Apache Ant as the build tool and WebLogic as the application server for deployment.
  • Wrote stored procedures and complex queries in SQL to process the data on MySQL.
  • Performed unit testing on various project modules using the JUnit framework (see the sketch after this list).
  • Used IBM Rational Clear Case as version control tool for maintaining source code and project documents.
  • Implemented Log4J for Logging Errors, debugging and tracking.
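
An illustrative JUnit 4 test in the style of the unit testing described above; the PriceCalculator class and its discount rule are hypothetical.

```java
// Sketch: a unit test asserting one behavior of a small class under test.
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical class under test: 10% discount on totals above 100.00.
class PriceCalculator {
    double discountedPrice(double total) {
        return total > 100.00 ? total * 0.9 : total;
    }
}

public class PriceCalculatorTest {
    @Test
    public void appliesTenPercentDiscountOverThreshold() {
        PriceCalculator calc = new PriceCalculator();
        // 200.00 is above the threshold, so expect 10% off (delta for doubles).
        assertEquals(180.00, calc.discountedPrice(200.00), 0.001);
    }
}
```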

Environment: Java, JSP, Struts, HTML, CSS, JavaScript, JUnit, Shell, MySQL, Log4J, WebLogic, Eclipse, Linux/UNIX, Singleton, Model View Controller, IBM Rational ClearCase.
