
Sr. Hadoop/Big Data Developer Resume


Columbus, OH

SUMMARY

  • 7+ years of total IT experience, including 4+ years in Hadoop and Big Data, with hands-on experience in Java/J2EE technologies.
  • Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
  • Good exposure to Service-Oriented Architectures (SOA) built on web services (WSDL) using the SOAP protocol.
  • Wrote multiple MapReduce programs in Python for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Extensively worked with different data sources, including non-relational sources such as XML files (parsed with SAX and DOM) and relational databases such as Oracle and MySQL.
  • Experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Oozie.
  • Extensive experience with Amazon Web Services (AWS) cloud services such as EC2 and S3.
  • Experience working on Application servers like IBM WebSphere, JBoss, BEA WebLogic and Apache Tomcat.
  • Extensive experience in Internet and client/server technologies using Java, J2EE, Struts, Hibernate, Spring, HTML, HTML5, DHTML, CSS, JavaScript, XML, and Perl.
  • Expert in deploying code through web application servers such as WebSphere, WebLogic, and Apache Tomcat in the AWS cloud.
  • Architected, designed, constructed, tested, tuned, and deployed ETL infrastructure based on Hadoop ecosystem technologies.
  • Proficient in Core Java and enterprise technologies such as EJB, Hibernate, Java web services (SOAP and REST), Java threads, sockets, Servlets, JSP, and JDBC.
  • Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
  • Strong knowledge of Spark Core and its components: Spark SQL, MLlib, GraphX, and Spark Streaming.
  • Good understanding of Hadoop architecture and hands-on experience with components such as JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
  • Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
  • Experience working on the Hadoop ecosystem, including installing and configuring the Hortonworks (HDP) and Cloudera (CDH3 and CDH4) distributions.
  • Experienced with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Expertise in Core Java, J2EE, multithreading, JDBC, Hibernate, shell scripting, Servlets, JSP, Spring, Struts, EJBs, and web services; proficient in using Java APIs for application development.
  • Experience using PL/SQL to write stored procedures, functions, and triggers, along with requirements gathering, design, development, integration, documentation, testing, and builds.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
  • Good working experience with application and web servers such as JBoss and Apache Tomcat.
  • Experience in writing Pig and Hive scripts and extending their core functionality by writing custom UDFs.
  • Extensive experience with Agile Development, Object Modeling using UML.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Experienced with the build tools Maven and Ant and the logging tool Log4j.
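
A minimal, hypothetical PySpark sketch of the Spark SQL/DataFrame and pair-RDD work summarized above; the paths, table, and column names are placeholders, not project code.

    # Hypothetical sketch: the same aggregation expressed with the DataFrame API
    # and with pair RDDs. Paths and column names are illustrative only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

    # DataFrame / Spark SQL style: read JSON logs and aggregate per key.
    events = spark.read.json("hdfs:///data/events/")            # placeholder path
    daily = (events.groupBy("event_date", "event_type")
                   .agg(F.count("*").alias("event_count")))
    daily.write.mode("overwrite").parquet("hdfs:///data/event_counts/")

    # Equivalent pair-RDD style: map to (key, 1) pairs and reduce by key.
    pairs = events.rdd.map(lambda row: ((row["event_date"], row["event_type"]), 1))
    counts = pairs.reduceByKey(lambda a, b: a + b)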

TECHNICAL SKILLS

Big Data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue

NoSQL Databases: HBase, MongoDB 3.2, Cassandra

Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL, Unix Shell Scripting

IDEs and Tools: Eclipse 4.6, NetBeans 8.2, BlueJ

Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery, CSS3/2, JSP, Bootstrap 3/3.5

Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

Operating Systems: Windows 8/7, UNIX/Linux and Mac OS.

Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014

Other Tools: Maven, ANT, WSDL, SOAP, REST.

Methodologies: Waterfall, Agile, UML, Design Patterns (Core Java and J2EE)

PROFESSIONAL EXPERIENCE

Confidential - Columbus, OH

Sr. Hadoop/Big data Developer

Responsibilities:

  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response (a sketch appears after this list).
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Worked in exporting data from Hive 2.0.0 tables into Netezza 7.2.x database.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Integrated Hadoop big data with ETL, performing data extraction, loading, and transformation of ERP data.
  • Used the AWS cloud for infrastructure provisioning and configuration.
  • Developed Spark code using Python and Spark SQL/Streaming for faster processing of data.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Performed File system management and monitoring on Hadoop log files.
  • Developed ETL processes using Spark, Scala, Hive, and HBase.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Continuously tuned Hive queries and UDFs for faster execution by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions and buckets in HIVE.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
  • Deployed ETL code aligned with the ETL target-state architecture standards and development standards.
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation, queries, and writing data back into the RDBMS through Sqoop.
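
A minimal sketch of the load, in-memory aggregation, and export pattern described in the bullets above, assuming hypothetical HDFS paths, table names, and connection details; Spark's JDBC writer is shown here for brevity, while the actual export went through Sqoop.

    # Hypothetical PySpark sketch: load staged data from HDFS, aggregate it in
    # memory, and write the result to a relational table. All names and the
    # JDBC connection details are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims-aggregation").getOrCreate()

    # Load staged claim records from HDFS and cache them for repeated use.
    claims = spark.read.parquet("hdfs:///staging/claims/")
    claims.cache()

    # Compute per-member claim totals entirely in memory.
    totals = (claims.groupBy("member_id")
                    .agg(F.sum("claim_amount").alias("total_amount"),
                         F.count("*").alias("claim_count")))

    # Write the aggregated result back to a relational table (JDBC shown here;
    # an equivalent Sqoop export would read the same output from HDFS).
    totals.write.mode("append").jdbc(
        url="jdbc:mysql://dbhost:3306/reporting",    # placeholder connection
        table="member_claim_totals",
        properties={"user": "etl_user", "password": "***"})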

Environment: Kafka, Apache Cassandra, MySQL, Oozie, Cloudera, AWS, Pig, Sqoop, Apache Hadoop, HDFS, Hive, MapReduce, Eclipse, PL/SQL, Git.

Confidential - Indianapolis, IN

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
  • Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream consumers.
  • Worked on NoSQL (HBase) to support enterprise production and loaded data into HBase using Impala and Sqoop.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
  • Experienced in analyzing, designing and developing ETL strategies and processes, writing ETL specifications.
  • Moved data using Sqoop between HDFS and relational database systems, and handled maintenance and troubleshooting.
  • Worked with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Created Hive tables, loaded claims data from Oracle using Sqoop, and loaded the processed data into the target database.
  • Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Worked on importing data from HDFS to the MySQL database and vice versa using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Performance tuning of Hive queries, MapReduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed a standard ETL framework to enable reuse of similar logic across the board; involved in system documentation of the data flow and methodology.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
  • Integrated Kafka with Spark Streaming for high throughput and reliability (see the sketch after this list).
  • Worked on Apache Flume for collecting and aggregating large amounts of log data and stored it in HDFS for further analysis.
  • Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both.
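
A minimal sketch of the Kafka-to-Spark Streaming integration mentioned above, using the direct-stream Kafka API available in Spark 2.x; broker addresses, topic names, and the output path are hypothetical.

    # Hypothetical PySpark Streaming sketch: consume a Kafka topic, count
    # events per value in each micro-batch, and persist the counts to HDFS.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="clickstream-ingest")
    ssc = StreamingContext(sc, batchDuration=10)     # 10-second micro-batches

    # Receiver-less direct stream reads Kafka partitions for higher throughput.
    stream = KafkaUtils.createDirectStream(
        ssc, topics=["clickstream"],                 # placeholder topic
        kafkaParams={"metadata.broker.list": "broker1:9092,broker2:9092"})

    # Each record is a (key, value) pair; count occurrences of each value.
    counts = (stream.map(lambda kv: (kv[1], 1))
                    .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFiles("hdfs:///streaming/page_counts")

    ssc.start()
    ssc.awaitTermination()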

Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.

Confidential, Folsom, CA

Big Data/Hadoop Engineer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Worked on importing data from HDFS to the Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Involved in gathering requirements from the client and estimating the timeline for developing complex queries using Hive and Impala for a logistics application.
  • Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
  • Developed simple to complex MapReduce jobs using Scala and Java in Spark.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
  • Exported data from the HDFS environment into the RDBMS using Sqoop for report generation and visualization purposes.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using the Hive Query Language (HQL); a sketch follows this list.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
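
A minimal sketch of the partitioned Hive tables and HQL analysis described in this role, issued through Spark's Hive support; the database, table, column names, and partition value are illustrative only.

    # Hypothetical PySpark sketch: define an external, partitioned Hive table
    # over files landed in HDFS and run a typical HQL aggregation against it.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cargo-analysis")
             .enableHiveSupport()
             .getOrCreate())

    # External staging table over data Sqoop placed in HDFS (placeholder names).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS logistics.cargo_staging (
            shipment_id STRING,
            origin      STRING,
            weight_kg   DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///warehouse/cargo_staging'
    """)

    # Typical HQL analysis: total weight per origin for a single partition.
    report = spark.sql("""
        SELECT origin, SUM(weight_kg) AS total_weight
        FROM logistics.cargo_staging
        WHERE load_date = '2017-01-01'
        GROUP BY origin
    """)
    report.show()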

Environment: Apache Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, Unix, Java, Oracle, SQL Server, MySQL, Oozie, Python.

Confidential, Orlando, FL

Sr. Java Developer

Responsibilities:

  • Worked on designing and developing the Web Application User Interface and implemented its related functionality in Java/J2EE for the product.
  • Designed and implemented SOA-compliant management and metrics infrastructure for the Mule ESB infrastructure, utilizing SOA management components.
  • Used Node.js for server-side rendering and implemented Node.js modules to integrate with designs and requirements.
  • Provided and implemented numerous solution ideas to improve the performance and stabilize the application.
  • Extensively used LDAP with Microsoft Active Directory for user authentication at login.
  • Developed unit test cases using JUnit.
  • Wrote JSF managed beans, converters and validators following framework standards and used explicit and implicit navigations for page navigations.
  • Designed and developed Persistence layer components using Hibernate ORM tool.
  • Used Oracle 10g as the backend to store and fetch data.
  • Experienced in using IDEs such as Eclipse and NetBeans, with Maven integration.
  • Created real-time reporting systems and dashboards using XML, MySQL, and Perl.
  • Worked on RESTful web services that enforced a stateless client-server model and supported JSON (migrating a few services from SOAP to REST).
  • Used the JSF framework to implement the MVC design pattern.
  • Developed and coordinated complex, high-quality solutions for clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, jQuery, JSON, and XML.
  • Involved in detailed analysis based on the requirement documents.
  • Involved in Design, development and testing of web application and integration projects using Object Oriented technologies such as Core Java, J2EE, Struts, JSP, JDBC, Spring Framework, Hibernate, Java Beans, Web Services (REST/SOAP), XML, XSLT, XSL and Ant.
  • Migrated existing Struts application to Spring MVC framework.
  • Created the project from scratch using AngularJS for the frontend and Node.js with Express for the backend.
  • Involved in developing Perl scripts and other scripts such as JavaScript.
  • Used Tomcat as the web server to deploy the OMS web application.
  • Used the SOAP::Lite module to communicate with different web services based on the given WSDL.
  • Prepared technical reports and documentation manuals during program development.

Environment: JDK 1.5, JSF, Hibernate 3.6, JIRA, Node.js, CruiseControl, Log4j, Tomcat, LDAP, JUnit, NetBeans, Windows/Unix.
