Big Data Engineer Resume
Denver, CO
PROFESSIONAL SUMMARY:
- 8 years of total IT experience, including 4+ years in Hadoop and Big Data, with hands-on experience in Java/J2EE technologies.
- Experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume and Oozie.
- Well versed in Amazon Web Services (AWS) cloud services such as EC2 and S3.
- Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Experienced with NoSQL databases (HBase, Cassandra and MongoDB), database performance tuning and data modeling.
- Proficient in Core Java and enterprise technologies such as EJB, Hibernate, Java web services (SOAP, REST), Java threads, sockets, Servlets, JSP and JDBC.
- Good exposure to Service-Oriented Architectures (SOA) built on web services (WSDL) using the SOAP protocol.
- Wrote multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed formats (see the sketch after this section).
- Experience working with the Hadoop ecosystem, including extensive experience installing and configuring the Hortonworks (HDP) and Cloudera (CDH3 and CDH4) distributions.
- Experience with the NoSQL databases HBase, MongoDB and Cassandra.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming.
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
- Strong knowledge of Spark Core and its components: Spark SQL, MLlib, GraphX and Spark Streaming.
- Extensively worked with different data sources: non-relational sources such as XML files (parsed with SAX and DOM) and relational databases such as Oracle and MySQL.
- Experience working on Application servers like IBM WebSphere, JBoss, BEA WebLogic and Apache Tomcat.
- Extensive experience in internet and client/server technologies using Java, J2EE, Struts, Hibernate, Spring, HTML, HTML5, DHTML, CSS, JavaScript, XML and Perl.
- Expert in deploying code through web application servers such as WebSphere, WebLogic and Apache Tomcat in the AWS cloud.
- Expertise in Core Java, J2EE, multithreading, JDBC, Hibernate, shell scripting, Servlets, JSP, Spring, Struts, EJBs and web services, and proficient in using Java APIs for application development.
- Good working experience with application and web servers such as JBoss and Apache Tomcat.
- Experience in writing Pig and Hive scripts and extending core functionality by writing custom UDFs.
- Extensive experience with Agile Development, Object Modeling using UML.
- Experience in using PL/SQL to write stored procedures, functions and triggers.
- Experience includes requirements gathering, design, development, integration, documentation, testing and build.
- In-depth understanding of Hadoop architecture and its various components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Experienced with the build tools Maven and Ant and the logging tool Log4j.
- Experience working with the Eclipse and NetBeans IDEs.
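For illustration, below is a minimal Hadoop Streaming sketch of the kind of Python MapReduce job described above. The input layout (one JSON document per line) and the field name event_type are hypothetical placeholders, not the actual project code.

```python
#!/usr/bin/env python
# mapper.py -- minimal Hadoop Streaming mapper (hypothetical example).
# Assumes one JSON document per input line; "event_type" is a placeholder field.
import json
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        record = json.loads(line)
    except ValueError:
        continue                      # skip malformed records
    # emit tab-separated key/value pairs for the reducer to sum
    print("%s\t1" % record.get("event_type", "unknown"))
```

```python
#!/usr/bin/env python
# reducer.py -- sums the per-key counts emitted by mapper.py.
# Hadoop Streaming sorts mapper output by key before invoking the reducer.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

A job of this shape is typically submitted with the hadoop-streaming JAR, passing the two scripts via -files and naming them with -mapper and -reducer along with the HDFS input and output paths.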
TECHNICAL SKILLS
Big Data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue
NoSQL Databases: HBase, MongoDB 3.2, Cassandra
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL, Unix Shell Scripting
IDE and Tools: Eclipse 4.6, NetBeans 8.2, BlueJ
Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014
Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery and CSS3/2, JSP, Bootstrap 3/3.5
Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
Operating Systems: Windows 8/7, UNIX/Linux and Mac OS
Other Tools: Maven, ANT, WSDL, SOAP, REST.
Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, UML, Design Patterns (Core Java and J2EE)
PROFESSIONAL EXPERIENCE
Confidential - Denver, CO
Big Data Engineer
Responsibilities:
- Worked on a live 30-node Hadoop cluster running CDH 4.4.
- Worked with highly unstructured and semi-structured data, 20 TB in size.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Created Hive external tables to stage data, then moved the data from staging into the main tables.
- Exported data from Hive 2.0.0 tables into a Netezza 7.2.x database.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation, queries and writing data back into the RDBMS through Sqoop (see the sketch following this list).
- Developed Spark code using Python and Spark SQL/Streaming for faster processing of data.
- Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used the AWS cloud for infrastructure provisioning and configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Continuously tuned Hive UDFs and queries for faster execution by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions and bucketing in Hive.
- Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
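A minimal PySpark 2.x sketch of the aggregation flow described in the bullets above (staging in Hive, DataFrame massaging, partitioned Hive output, and an export back to the RDBMS). All table, column and connection names (staging.stg_claims, warehouse.claims_summary, the Netezza JDBC URL) are hypothetical placeholders, and the JDBC write below stands in for the Sqoop export step.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("claims-aggregation")
         .enableHiveSupport()
         .getOrCreate())

# Pull the staged data from the Hive external table into a DataFrame
staged = spark.table("staging.stg_claims")

# Massage the data: drop bad rows, derive a partition column, aggregate
summary = (staged
           .filter(F.col("claim_amount").isNotNull())
           .withColumn("load_date", F.to_date("event_ts"))
           .groupBy("load_date", "region")
           .agg(F.sum("claim_amount").alias("total_amount"),
                F.count(F.lit(1)).alias("claim_count")))

# Write the result into a partitioned Hive table for dashboard reporting
(summary.write
        .mode("overwrite")
        .partitionBy("load_date")
        .saveAsTable("warehouse.claims_summary"))

# Push the same result set back to the RDBMS for BI consumption
# (a JDBC write standing in for the Sqoop export step)
(summary.write
        .mode("append")
        .jdbc(url="jdbc:netezza://nz-host:5480/ANALYTICS",   # placeholder URL
              table="CLAIMS_SUMMARY",
              properties={"user": "etl_user", "password": "***"}))
```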
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Cloudera, AWS, Apache Hadoop, HDFS, Hive, MapReduce, MySQL, Eclipse, PL/SQL, Git.
Confidential - Louisville, KY
Big Data Engineer
Responsibilities:
- Worked on Cloudera CDH 5.4 distribution of Hadoop.
- Worked extensively with MySQL to identify the required tables and views to export into HDFS.
- Responsible for moving data from MySQL to HDFS on the development cluster for validation and cleansing.
- Worked on analyzing the Hadoop cluster and different Big Data analytics tools including Pig, the HBase database and Sqoop.
- Worked in AWS EC2, configuring the servers for Auto Scaling and Elastic Load Balancing.
- Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
- Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream consumers.
- Worked on NoSQL (HBase) to support enterprise production, loading data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Worked on AWS, provisioning EC2 infrastructure and deploying applications behind Elastic Load Balancing.
- Handled importing data from various data sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios.
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
- Moved data using Sqoop from HDFS to relational database systems and vice versa, including ongoing maintenance and troubleshooting.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Worked on importing data from HDFS into a MySQL database and vice versa using Sqoop.
- Implemented MapReduce jobs in Hive by querying the available data.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Performed performance tuning of Hive queries and MapReduce programs for different applications.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high throughput and reliability (see the sketch following this list).
- Worked on Apache Flume for collecting and aggregating large amounts of log data, storing it on HDFS for further analysis.
- Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both.
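A minimal sketch of the kind of Kafka-to-Spark Streaming integration described above, using the DStream-based Kafka API available in the Spark versions of that era. The broker address, topic name, checkpoint path and tab-separated log layout are hypothetical placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="weblog-streaming")
ssc = StreamingContext(sc, batchDuration=10)            # 10-second micro-batches
ssc.checkpoint("hdfs:///checkpoints/weblog-streaming")  # placeholder path

# Receiver-less "direct" stream from Kafka; broker and topic are placeholders
stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["weblogs"],
    kafkaParams={"metadata.broker.list": "broker1:9092"})

# Each record arrives as a (key, value) pair; count hits per page per batch,
# assuming tab-separated web log lines with the page as the first field
page_counts = (stream.map(lambda kv: kv[1].split("\t")[0])
                     .map(lambda page: (page, 1))
                     .reduceByKey(lambda a, b: a + b))

page_counts.pprint()

ssc.start()
ssc.awaitTermination()
```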
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.
Confidential - Nashville, TN
Hadoop Developer
Responsibilities:
- Involved in database development and creating SQL scripts.
- Worked on analyzing the Hadoop cluster and different Big Data analytics tools including Pig, Hive, the HBase database and Sqoop.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in gathering requirements from the client and estimating the timeline for developing complex queries using Hive and Impala for a logistics application.
- Responsible for the design and development of Spark SQL scripts based on functional specifications (see the sketch following this list).
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization.
- Developed simple to complex MapReduce jobs using Scala and Java in Spark.
- Developed a data pipeline using Flume and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked on importing data from HDFS into an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts as per requirements and automated the workflow using shell scripts.
- Automated all the jobs for extracting data from sources such as MySQL and pushing the result sets to the Hadoop Distributed File System using the Oozie workflow scheduler.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
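A minimal sketch of the kind of Hive warehouse DDL and HQL reporting query described above, shown through the Spark SQL interface mentioned in the bullets; the same HQL could also be run directly from the Hive CLI, where it executes as MapReduce jobs. The database, table and column names (logistics.shipments, carrier, ship_date) and the HDFS location are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("logistics-hive-reports")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS logistics")

# External table over data already landed in HDFS (e.g. via Sqoop imports)
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logistics.shipments (
        shipment_id BIGINT,
        carrier     STRING,
        weight_kg   DOUBLE
    )
    PARTITIONED BY (ship_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/logistics/shipments'
""")

# HQL reporting query over the partitioned warehouse table
report = spark.sql("""
    SELECT carrier,
           COUNT(*)       AS shipment_count,
           AVG(weight_kg) AS avg_weight_kg
    FROM logistics.shipments
    WHERE ship_date >= '2015-01-01'
    GROUP BY carrier
""")
report.show()
```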
Environment: Apache Hadoop, Hive, ZooKeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.
Confidential - Orlando, FL
Sr. Java Developer
Responsibilities:
- Worked on designing and developing the Web Application User Interface and implemented its related functionality in Java/J2EE for the product.
- Used JSF framework to implement MVC design pattern.
- Developed and coordinated complex high-quality solutions for clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, jQuery, JSON and XML.
- Wrote JSF managed beans, converters and validators following framework standards and used explicit and implicit navigations for page navigations.
- Designed and developed Persistence layer components using Hibernate ORM tool.
- Designed the UI using JSF tags, Apache Tomahawk and RichFaces.
- Used Oracle 10g as the backend to store and fetch data.
- Experienced in using IDEs such as Eclipse and NetBeans, with Maven integration.
- Created real-time reporting systems and dashboards using XML, MySQL and Perl.
- Worked on RESTful web services that enforced a stateless client-server model and supported JSON (with a few changes from SOAP to RESTful technology).
- Involved in detailed analysis based on the requirement documents.
- Involved in Design, development and testing of web application and integration projects using Object Oriented technologies such as Core Java, J2EE, Struts, JSP, JDBC, Spring Framework, Hibernate, Java Beans, Web Services (REST/SOAP), XML, XSLT, XSL and Ant.
- Designed and implemented SOA-compliant management and metrics infrastructure for the Mule ESB infrastructure, utilizing the SOA management components.
- Used Node.js for server-side rendering and implemented Node.js modules to integrate with designs and requirements.
- Used JAX-WS to interact in front-end module with backend module as they are running in two different servers.
- Responsible for offshore deliverables; provided design/technical help to the team and reviews to meet quality standards and timelines.
- Migrated existing Struts application to Spring MVC framework.
- Provided and implemented numerous solution ideas to improve the performance and stabilize the application.
- Extensively used LDAP with Microsoft Active Directory for user authentication at login.
- Developed unit test cases using JUnit.
- Created the project from scratch using AngularJS for the frontend and Node.js/Express for the backend.
- Involved in developing Perl scripts and other scripts such as JavaScript.
- Used Tomcat as the web server to deploy the OMS web application.
- Used the SOAP::Lite module to communicate with different web services based on the given WSDL.
- Prepared technical reports and documentation manuals during program development.
Environment: JDK 1.5, JSF, Hibernate 3.6, JIRA, Node.js, CruiseControl, Log4j, Tomcat, LDAP, JUnit, NetBeans, Windows/UNIX.
Confidential
Java Developer
Responsibilities:
- Developed using new Java 1.5 features: annotations, generics, the enhanced for loop and enums.
- Used Struts and Hibernate to implement IoC, AOP and ORM for the back-end tiers.
- Designed the system as per changing requirements using the Struts MVC architecture, JSP and DHTML.
- Designed the application using J2EE patterns.
- Designed REST APIs that allow sophisticated, effective and low-cost application integrations.
- Developed the presentation layer using Struts Framework.
- Wrote Java utility classes common to all of the applications.
- Analyzed and fine-tuned RDBMS/SQL queries to improve application performance against the database.
- Deployed the JAR files in the web container on IBM WebSphere Server 5.x.
- Designed and developed the screens in HTML with client side validations in JavaScript.
- Developed the server side scripts using JMS, JSP and Java Beans.
- Added and modified Hibernate configuration code and Java/SQL statements depending upon specific database access requirements.
- Designed database tables, views and indexes, and created triggers for optimized data access.
- Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX and DOM.
- Developed web services using the Apache Axis tool.