
Senior Bigdata Developer Resume


Charlotte, NC


  • Over 9 years of experience in the IT industry, including big data environments, the Hadoop ecosystem, and the design, development, and maintenance of various applications, including Java-based enterprise applications.
  • Excellent in Hadoop ecosystem components: HDFS, MapReduce (MRv1, YARN), Pig, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Oozie; programming in Spark using Scala; and exposure to Cassandra.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing of Teradata Big Data Analytics.
  • Expertise in data development on the Hortonworks HDP platform and Hadoop ecosystem tools such as Hadoop, HDFS, Spark, Zeppelin, Hive, HBase, Sqoop, Flume, Atlas, Solr, Pig, Falcon, Oozie, Hue, Tez, Apache NiFi, and Kafka.
  • Very good experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 and 2.2, and CDH3 and CDH4 with Cloudera Manager, on Linux and Ubuntu OS.
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX; developed core modules in large cross-platform applications using Java, JSP, Servlets, JDBC, JavaScript, XML, and HTML.
  • In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Spark MLlib.
  • Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
  • Good knowledge of Hadoop architecture and its components such as YARN, HDFS, NodeManager, ResourceManager, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Strong knowledge of NoSQL databases, including column-oriented stores such as HBase and Cassandra as well as MongoDB, and their integration with Hadoop clusters.
  • Experience in installing, configuring, supporting, and managing Hortonworks and Cloudera Hadoop platforms, including CDH3 and CDH4 clusters.
  • Solid SQL skills; can write complex SQL queries, functions, triggers, and stored procedures for backend, database, and end-to-end testing.
  • Experienced with Hadoop clusters on the Azure HDInsight platform; deployed data analytics solutions using tools such as Spark and BI reporting tools.
  • Strong programming skills in designing and implementation of applications using Core Java, J2EE, JDBC, JSP, HTML, Spring Framework, Spring batch framework, Spring AOP, Struts, JavaScript, Servlets.
  • Very good understanding of SQL, ETL, and data warehousing technologies, and sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle, and SQL Server.
  • Experience writing build scripts using Maven and working with continuous integration systems such as Jenkins.
  • Expertise in using Kafka as a messaging system to implement real-time streaming solutions; implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
  • Experience in migrating ETL processes into Hadoop, designing Hive data models, and writing Pig Latin scripts to load data into Hadoop.
  • Good knowledge of Cloudera distributions and of AWS, including Amazon Simple Storage Service (Amazon S3), Amazon EC2, and Amazon EMR.
  • Good knowledge of Object-Oriented Analysis and Design (OOAD) and Java design patterns, and a good level of experience in Core Java and JEE technologies such as JDBC, Servlets, and JSP.


Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Impala, Oozie, Kafka, Spark, ZooKeeper, Storm, YARN, AWS.

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse, NetBeans, IntelliJ

Frameworks: MVC, Struts, Hibernate, Spring

Programming languages: Java, JavaScript, Scala, Python, Unix & Linux shell scripts and SQL

Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server.

NoSQL Databases: HBase, Cassandra, MongoDB

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

ETL Tools: Informatica, Talend

Web Development: HTML, DHTML, XHTML, CSS, JavaScript, AJAX

XML/Web Services: XML, XSD, WSDL, SOAP, Apache Axis, DOM, SAX, JAXP, JAXB, XMLBeans.

Methodologies/Design Patterns: OOAD, OOP, UML, MVC2, DAO, Factory pattern, Session Facade

Operating Systems: Windows, AIX, Sun Solaris, HP-UX.


Confidential, Charlotte NC

Senior Bigdata Developer


  • Involved in complete project life cycle starting from design discussion to production deployment.
  • Worked as a Big Data/Hadoop developer on integration and analytics based on Hadoop, Solr, and webMethods technologies; responsible for the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
  • Worked with the Hortonworks distribution of Hadoop to set up the cluster and monitored it using Ambari.
  • Developed Pig scripts to help perform analytics on JSON and XML data; created Hive tables (external and internal) with static and dynamic partitions and bucketed the tables for efficiency.
  • Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS and used NiFi to automate data movement between different Hadoop systems.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting; performed data transformations by writing MapReduce and Pig jobs per business requirements.
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for analysis; used Kafka Streams to configure Spark Streaming to get information and then store it in HDFS.
  • Performed data analysis, feature selection, and feature extraction using Apache Spark machine learning and streaming libraries in Python.
  • Collaborated with various stakeholders (domain architects, solution architects, and business analysts) and provided initial datasets and foundational feature sets to data scientists for building machine learning predictive models using PySpark.
  • Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.
  • Enabled and configured Hadoop services such as HDFS, YARN, Hive, Ranger, HBase, Kafka, Sqoop, Zeppelin Notebook, and Spark/Spark2; analyzed log data with Apache Spark to predict errors.
  • Evaluated deep learning algorithms for text summarization using Python, Keras, TensorFlow, and Theano on a Cloudera Hadoop system.
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing the data, and storing it in Cassandra.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
  • Developed multiple POCs using Scala and PySpark, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
  • Used Sqoop to ingest data from DBMSs and Python to ingest logs from client data centers.
  • Developed Python and Bash scripts for automation and implemented MapReduce jobs using the Java API and using Python with Spark.
  • Imported data from RDBMS systems such as MySQL into HDFS using Sqoop and developed Sqoop jobs to perform incremental imports into Hive tables.
  • Loaded and transformed large sets of structured and semi-structured data, created data pipelines per business requirements, and scheduled them using Oozie coordinators.
  • Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
  • Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
  • Used Impala to write queries for fetching data from Hive tables and developed several MapReduce jobs using the Java API.
  • Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
  • Defined data governance rules and administered access rights based on users' job profiles.
  • Developed Pig and Hive UDFs to implement business logic for processing the data per requirements; wrote the Pig UDFs in Java and used UDFs from Piggybank for sorting and preparing the data.
  • Configured and optimized the Cassandra cluster and developed a real-time Java-based application to work with the Cassandra database.
  • Created Hive tables, loaded data, and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.

Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Python, Spark SQL, Machine Learning, MongoDB, AWS, AWS S3, AWS EC2, AWS EMR, Oozie, ETL, Tableau, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Java, Impala, Web Servers, Maven, MySQL, Agile-Scrum

Confidential, Stamford CT

Sr. Bigdata Developer/Engineer


  • Configured a multi-node Hadoop cluster on Amazon EC2 Spot Instances to transfer data from Amazon S3 to HDFS and from HDFS to Amazon S3, and to direct input and output to the Hadoop MapReduce framework.
  • Delivered working widget software for big data analytics using ExtJS 4, HTML5, RESTful web services, JSON Store, Linux, Hadoop, ZooKeeper, NoSQL databases, Java, Spring Security, and JBoss Application Server.
  • Developed a custom Avro framework capable of solving the small-files problem in Hadoop and extended Pig and Hive tools to work with it.
  • Explored Spark to improve the performance of and optimize existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
  • Worked on cloud computing infrastructure (e.g., Amazon Web Services EC2) and design considerations for scalable, distributed systems.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
  • Worked with GoCD (a CI/CD tool) to deploy applications and have experience with the Munin framework for big data testing.
  • Handled file movement between HDFS and AWS S3, worked extensively with S3 buckets, and converted all Hadoop jobs to run on EMR by configuring the cluster according to data size.
  • Demonstrated Hadoop practices and broad knowledge of technical solutions, design patterns, and code for medium to complex applications deployed in Hadoop production.
  • Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations; imported data from different sources into Spark RDDs for processing; developed custom aggregate functions using Spark SQL; and performed interactive querying.
  • Worked extensively with Avro and Parquet files and converted data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala and used Sqoop for importing and exporting data between RDBMS and HDFS.
  • Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
  • Involved in transferring relational database and legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
  • Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded them into MongoDB for further analysis; worked on MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
  • Developed a data pipeline using Spark, Hive, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
  • Worked with file formats such as JSON, Avro, and Parquet and compression techniques such as Snappy; developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job for workflow management and automation using Airflow.
  • Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes in source files, and detecting duplicate files in the source location.
  • Monitored and troubleshot Hadoop jobs using the YARN ResourceManager and EMR job logs using Genie and Kibana.
  • Performed data management, data access, data governance and integration, security, and operations using Hortonworks Data Platform (HDP).
  • Imported metadata into Hive using Python and migrated existing tables and applications to work on the AWS cloud (S3).
  • Worked extensively on importing metadata into Hive, migrating existing tables and applications to Hive and the AWS cloud, and making the data available in Athena and Snowflake.
  • Used Stash/Bitbucket (Git) extensively for source control and worked with Airflow, Elastic MapReduce (EMR), Athena, and Snowflake on AWS.

Environment: Spark, AWS, EC2, EMR, Hive, SQL Workbench, Genie Logs, PySpark, Kibana, Sqoop, Spark SQL, Spark Streaming, Scala, Python, Hadoop (Cloudera Stack), Hue, Kafka, HBase, HDFS, Pig, Oracle, ETL, AWS S3, Git.


Bigdata/Hadoop Developer


  • Loaded disparate data sets from different sources into the BDPaaS (Hadoop) environment using Spark.
  • Developed UNIX scripts to create batch loads that bring large amounts of data from relational databases to the big data platform.
  • Delivery experience with major Hadoop ecosystem components such as Pig, Hive, Spark, Kafka, Elasticsearch, and HBase, with monitoring through Cloudera Manager.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Implemented machine learning algorithms using Spark with Python and worked with Storm and Apache Apex.
  • Analyzed data coming from various sources and created metafiles and control files to ingest the data into the data lake.
  • Configured batch jobs to ingest source files into the data lake and developed Pig queries to load data into HBase.
  • Used Hive queries to create ORC tables and developed Hive scripts to meet analyst requirements.
  • Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near real-time analysis; worked extensively with Hive to create, alter, and drop tables and to write Hive queries.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig; translated high-level design specs into simple ETL coding and mapping standards.
  • Created and altered HBase tables on top of data residing in the data lake and created external Hive tables on the blobs to expose the data through the Hive Metastore.
  • Participated in the requirements and design phases to implement a real-time streaming architecture using Spark and Kafka.
  • Used the Spark API for machine learning: translated a predictive model from SAS code to Spark and used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Created reports with different selection criteria from Hive tables on the data residing in the data lake.
  • Worked with Hadoop architecture and components such as YARN, HDFS, NodeManager, ResourceManager, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Deployed Hadoop components such as Hive, HBase, Spark, and Scala on the cluster as required.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
  • Implemented the business rules in Spark/Scala to put the business logic in place for running the rating engine.
  • Used the Spark UI to observe a submitted Spark job running at the node level and used Spark to do property bag parsing of the data to extract the required fields.
  • Made extensive use of ETL methodology to support data extraction, transformation, and loading using Hadoop.
  • Used both the Hive context and the SQL context of Spark for initial testing of the Spark job; used WinSCP and FTP to view the data storage structure on the server and to upload the JARs used for spark-submit.
  • Developed code from scratch in Spark using Scala according to the technical requirements.

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Spark, Scala, MapR, Core Java, R Language, SQL, Python, Eclipse, Linux, Unix, HDFS, Impala, Cloudera, Kafka, Apache Cassandra, Oozie, ZooKeeper, MySQL, PL/SQL


Sr. Java Developer


  • Involved in the design and development phases of Agile software development; analyzed the current mainframe system and designed new GUI screens.
  • Developed the application using a 3-tier architecture, i.e., presentation, business, and data integration layers, in accordance with customer/client standards.
  • Played a vital role in the Scala framework for web-based applications and used FileNet for content management and for streamlining business processes.
  • Created responsive layouts for multiple devices and platforms using the Foundation framework and implemented a printable chart report using HTML, CSS, and jQuery.
  • Applied JavaScript for client-side form validation and worked on UNIX and Linux to move the project into the production environment.
  • Created managed beans for handling JSF pages, including logic for processing the data on the page, and created a simple user interface for the application's configuration system using MVC design patterns and the Swing framework.
  • Used the object/relational mapping tool Hibernate to persist objects to database tables.
  • Worked with Core Java to develop automated solutions, including web interfaces using HTML, CSS, JavaScript, and web services.
  • Developed a web GUI using HTML and JavaScript under an MVC architecture; created WebLogic domains and set up Admin and Managed servers for Java/J2EE applications in non-production and production environments.
  • Implemented code according to coding standards; created AngularJS controllers with isolated scopes to perform operations; used GWT, Guice, JavaScript, and AngularJS for client-side implementation; and made extensive use of Core Java features such as exceptions and collections.
  • Configured the WebSphere application server to connect with DB2, Oracle, and SQL Server back ends by creating JDBC data sources, and configured MQ Series with IBM RAD and WAS to create new connection factories and queues.
  • Worked extensively with TOAD to interact with the database, develop stored procedures, and promote SQL changes to QA and production environments.
  • Used Apache Maven for project management and building the application and used CVS for project and version management.
  • Involved in configuring the Spring Framework and the Hibernate mapping tool and in monitoring WebLogic/JBoss server health and security.
  • Created connection pools and data sources in the WebLogic console and implemented Hibernate for database transactions on DB2.
  • Configured Hibernate to access and retrieve data from the database and wrote web services (JAX-WS) for external systems via SOAP/HTTP calls.
  • Used the Log4j framework to log and track the application and developed SQL queries, stored procedures, and functions.
  • Created and updated build scripts using Ant for deployment, tested and deployed the application on the WAS server, and used Rational ClearCase for version control.

ENVIRONMENT: FileNet, IBM RAD 6.0, Scala, Java 1.5, JSP, Servlets, Core Java, Spring, Swing, Hibernate, JSF, ICEfaces, HTML, CSS, JavaScript, Node.js, UNIX, Web Services (SOAP), WAS 6.1, XML, IBM WebSphere 6.1, Rational ClearCase, Log4j, IBM DB2.

Confidential, Dallas TX

Java Developer


  • Involved in gathering and analyzing system requirements and played a key role in the high-level design for implementing this application.
  • Mavenized the existing applications, adding the required JAR files as dependencies in the pom.xml file, and used JSF and Struts frameworks to interact with the front end.
  • Used the Swing/JFC framework to develop client-side components and developed J2EE components in Eclipse IDE.
  • Developed a new CR screen from the existing screen for the LTL (Low Truck Load) loads using JSF.
  • Used Spring Framework configuration files to manage objects and achieve dependency injection.
  • Implemented cross-cutting concerns such as logging and monitoring using Spring AOP.
  • Implemented an SOA architecture with web services using SOAP, WSDL, UDDI, and XML and made changes to the existing screen for the LTL (Low Truck Load) Accessories using Struts.
  • Developed a desktop interface using Java Swing for maintaining and tracking products.
  • Used JAX-WS to access the external web services, get the XML response, and convert it back to Java objects.
  • Developed the application using Eclipse IDE in an Agile environment and worked with the web admin and admin teams to configure the application on development, test, and stress environments (WebLogic server).
  • Executed and coordinated the installation for the project and worked on a web-based reporting system with HTML, JavaScript, and JSP.
  • Built PL/SQL functions, stored procedures, and views and configured the Oracle database with a JNDI data source with connection pooling enabled.
  • Developed the and Appraisal modules using Java, JSP, Servlets, and JavaScript.
  • Used Hibernate-based persistence classes at the data access tier and adopted J2EE design patterns such as Service Locator, Session Facade, and Singleton.
  • Worked on the Spring Core layer, Spring ORM, and Spring AOP in developing the application components.
  • Modified web pages using JSP and used the Struts Validation Framework for form input validation.
  • Created the WSDL and published it using Apache Axis 2.0, and created PDF files to store the data required for the module.
  • Built custom components using JSTL tags and tag libraries with Struts, used WebLogic server for deploying the WAR files, and used Toad for the DB2 database changes.

ENVIRONMENT: Java, J2EE, JDK, JSF, Hibernate, Struts, Spring, Swing/JFC, JSP, HTML, XML, WebLogic Server, iText, DB2, Eclipse IDE, SOAP, Maven, JSTL, TOAD, WSDL, JAX-WS, Apache Axis.
