Sr. Big Data Developer Resume
Atlanta, GA
SUMMARY
- 8+ years of experience in development, testing, implementation, maintenance, and enhancement of various IT projects, including experience implementing end-to-end Hadoop solutions for Big Data.
- Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing.
- Extensive experience in installing, configuring, and using Big Data ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala, Spark, and Zookeeper.
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers such as Apache Tomcat.
- Experience with Hadoop distributions including Cloudera (CDH3 & CDH4) and Hortonworks Data Platform (HDP).
- Experience in developing web services with XML-based technologies such as SOAP, WSDL, UDDI, and Apache Axis.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Well versed in writing and deploying Oozie workflows and coordinators.
- Highly skilled in integrating Amazon Kinesis streams with Spark Streaming applications to build long-running real-time applications.
- Good working experience using Sqoop to move data between RDBMS and HDFS in both directions.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
- Good knowledge of SQL and PL/SQL for writing stored procedures and functions, and of writing unit test cases using JUnit.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- In-depth knowledge of handling large amounts of data using the Spark DataFrame/Dataset API and case classes (see the sketch following this summary).
- Good knowledge of implementing data processing techniques with Apache HBase to store and format data as required.
- Working experience in Impala, Mahout, Spark SQL, Storm, Avro, Kafka, and AWS.
- Experience with Java framework technologies such as Apache Camel and Spring Batch.
- Experience in version control and source code management tools such as GIT, SVN, and Bitbucket.
- Hands on experience working with databases like Oracle, SQL Server and MySQL.
- Strong knowledge of the Apache Spark Streaming API on Big Data distributions in an active cluster environment.
- Proficiency in developing secure enterprise Java applications using technologies such as Maven, Hibernate, XML, HTML, CSS, and version control systems.
- Developed and implemented Apache NiFi flows across various environments and wrote QA scripts in Python for tracking files.
- Excellent understanding of Hadoop and its underlying framework, including storage management.
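A minimal sketch of the Spark Dataset and case-class usage described in this summary. The record layout (a hypothetical Claim case class) and the input path are placeholders for illustration, not the actual project schema.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record layout used only for this illustration.
case class Claim(claimId: String, amount: Double, state: String)

object ClaimSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("claim-summary")
      .getOrCreate()
    import spark.implicits._

    // Read raw CSV into a typed Dataset[Claim]; the path is a placeholder.
    val claims = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/claims/raw")
      .as[Claim]

    // Typed filter plus a keyed aggregation over the case-class fields.
    val claimsPerState = claims
      .filter(c => c.amount > 0)
      .groupByKey(_.state)
      .count()

    claimsPerState.show()
    spark.stop()
  }
}
```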
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
Programming Language: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Web Technologies: HTML, CSS, JavaScript, jQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Operating Systems: Linux, UNIX, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
SDLC Methodologies: Agile, Waterfall
Version Control: GIT, SVN, CVS
PROFESSIONAL EXPERIENCE
Confidential, Atlanta GA
Sr. Big Data Developer
Responsibilities:
- Developed Big Data applications using Spark and Scala.
- Worked on Big Data ecosystem components including Hive, MongoDB, Zookeeper, and Spark Streaming with the MapR distribution.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Wrote Spark applications in Scala to perform data cleansing, validation, transformation, and summarization activities according to the requirements (see the sketch following this list).
- Loaded data into Spark RDDs and performed in-memory data computation to generate output as per the requirements.
- Ran multiple MapReduce jobs through Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MRv1 and MRv2 (YARN).
- Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Developed reports and dashboards using Tableau for quick reviews presented to the business.
- Worked on configuring and managing disaster recovery and backup of Cassandra data.
- Worked with MongoDB and HBase, databases that differ from classic relational databases.
- Converted HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Used Hive to perform data validation on the data ingested using Sqoop and cleansed the data.
- Developed several business services as Java RESTful web services using the Spring MVC framework.
- Involved in identifying job dependencies to design workflow for Oozie and YARN resource management.
- Designed solution for various system components using Microsoft Azure.
- Moved data between HDFS and relational database systems using Sqoop, and handled related maintenance and troubleshooting.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames, and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase, and Sqoop.
- Participated in all aspects of the Software Development Life Cycle (SDLC), production troubleshooting, and software testing using standard test tools.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Developed Apache NiFi flows dealing with various data formats such as XML, JSON, and Avro.
- Worked on importing data from HDFS to MySQL and vice-versa using Sqoop.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Worked on NoSQL support for enterprise production and loaded data into HBase using Impala and Sqoop.
- Developed many distributed, transactional, portable applications using Enterprise JavaBeans (EJB) architecture for Java 2 Enterprise Edition (J2EE) platform.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
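A minimal sketch of the cleansing and summarization work described in this list, expressed as Spark DataFrame transformations with Hive support. The table names (staging.claims_raw, curated.claims_daily) and column names are hypothetical placeholders, not the project's actual schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyClaimRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-claim-rollup")
      .enableHiveSupport()
      .getOrCreate()

    // Read from a Hive staging table (name is illustrative).
    val claims = spark.table("staging.claims_raw")

    // Cleanse: drop null keys, trim identifiers, standardize the date column.
    val cleansed = claims
      .filter(col("claim_id").isNotNull)
      .withColumn("member_id", trim(col("member_id")))
      .withColumn("claim_date", to_date(col("claim_date"), "yyyy-MM-dd"))

    // Summarize per member per day and write back to a curated Hive table.
    cleansed
      .groupBy("member_id", "claim_date")
      .agg(count(lit(1)).as("claim_count"), sum("claim_amount").as("total_amount"))
      .write
      .mode("overwrite")
      .saveAsTable("curated.claims_daily")

    spark.stop()
  }
}
```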
Environment: Flume 1.8, Tableau, GIT, Kafka 1.1, MapReduce, JSON, Avro, Teradata, Maven, SOAP, Hadoop 3.0, Oozie 4.3, Zookeeper 3.4, Cassandra 3.0, Sqoop 1.4, Apache NiFi 1.4, ETL, Azure, Hive 2.3, HBase 1.4, Pig 0.17, HDFS 3.1.
Confidential, Phoenix, AZ
Spark/Hadoop Developer
Responsibilities:
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Extensively migrated the existing architecture to Spark Streaming to process live streaming data (see the sketch following this list).
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used HiveQL to query and search for particular strings in Hive tables in HDFS.
- Worked with multiple teams to provision AWS infrastructure for development and production environments.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Created custom columns depending on the use case while ingesting data into the Hadoop data lake using PySpark.
- Designed and developed applications in Spark using Scala to compare the performance of Spark with Hive and SQL.
- Stored data in S3 buckets on AWS cluster on top of Hadoop.
- Developed time driven automated Oozie workflows to schedule Hadoop jobs.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, Scala, and Apache Pig.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
- Worked with Apache Nifi for managing the flow of data from sources through automated data flow.
- Created, modified and executed DDL and ETL scripts for De-normalized tables to load data into Hive and AWS Redshift tables.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Managed and reviewed data backups and Hadoop log files on the Hortonworks cluster.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Participated in development/implementation of Cloudera Impala Hadoop environment.
- Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
- Involved in ad hoc stand-up and architecture meetings to set daily priorities.
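A minimal sketch of wiring a Kafka topic into a Spark Streaming (DStream) job, as referenced in this list. The broker address, topic, and consumer group are placeholders, and the downstream write (e.g. to Cassandra) is reduced to a count-and-print step for brevity.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-event-stream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Consumer settings; broker address and group id are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-stream-poc",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream over a placeholder topic name.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("events"), kafkaParams))

    // Count records per 10-second batch; a real job would persist the results here.
    stream.map(_.value()).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```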
Environment: Hadoop 3.0, Spark 2.4, Agile, Sqoop 1.4, Hive 2.3, AWS, PySpark, Scala 2.12, Oozie 4.3, Apache Pig 0.17, NoSQL, MapReduce, Hortonworks
Confidential, New Albany, OH
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Worked on S3 buckets on AWS to store Cloud Formation Templates and worked on AWS to create EC2 instances.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Used Oozie scripts for application deployment and Perforce as the secure versioning software.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH3 and CDH4 distributions.
- Worked on NoSQL databases for POC purposes, storing images and URIs.
- Created Hive tables and applied HiveQL queries on them, which automatically invokes and runs MapReduce jobs.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Integrated Kafka with Spark Streaming for high-speed data processing.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala (see the sketch following this list).
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the HBase API.
- Used the Enterprise Data Warehouse database to store information and make it accessible across the organization.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Worked with the Avro data serialization system to handle JSON data formats.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
- Worked on Spark for in-memory computations and compared DataFrames to optimize performance.
- Created web services request-response mappings by importing source and target definition using WSDL file.
- Developed custom UDFs to generate unique keys for use in Apache Pig transformations.
- Created conversion scripts using Oracle SQL queries, functions and stored procedures, test cases and plans before ETL migrations.
- Developed Shell scripts to read files from edge node to ingest into HDFS partitions based on the file naming pattern.
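A minimal sketch of the MapReduce-to-Spark migration POC referenced in this list: a classic word-count job re-expressed as Spark RDD transformations. The HDFS paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object WordCountMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wordcount-migration-poc")
      .getOrCreate()
    val sc = spark.sparkContext

    // The MapReduce map phase becomes flatMap + map; the reduce phase becomes reduceByKey.
    sc.textFile("hdfs:///data/input/")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///data/output/wordcounts")

    spark.stop()
  }
}
```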
Environment: Hadoop 2.3, AWS, Sqoop 1.2, HDFS, Oozie 4.1, Cassandra 3.0, MongoDB 3.5, Hive 2.1, Kafka, Spark 2.1, MapReduce, Scala 2.0, MySQL, Flume, JSON
Confidential, McLean, VA
Java/J2EE Developer
Responsibilities:
- Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller, and MVC.
- Designed Rich user Interface Applications using JavaScript, CSS, HTML and AJAX and developed web services by using SOAP.
- Implemented modules using Java APIs, Java collection, Threads, XML, and integrating the modules.
- Successfully installed and configured the IBM WebSphere Application server and deployed the business tier components using EAR file.
- Designed and developed business components using Session and Entity Beans in EJB.
- Implemented CORS (Cross Origin Resource Sharing) using Node JS and developed REST services using Node and Express, Mongoose modules.
- Used Log4j for Logging various levels of information like error, info, debug into the log files.
- Developed integrated applications and lightweight components using the Spring framework, with IoC features from Spring Web MVC to configure the application context for the Spring bean factory.
- Used JDBC prepared statements to call from Servlets for database access.
- Used AngularJS to connect the web application to back-end APIs and used RESTful methods to interact with several APIs.
- Involved in build/deploy applications using Maven and integrated with CI/CD server Jenkins.
- Implemented jQuery features to develop dynamic queries that fetch data from the database.
- Involved in developing module for transformation of files across the remote systems using JSP and servlets.
- Worked on Development bugs assigned in JIRA for Sprint following agile process.
- Used ANT scripts to fetch, build, and deploy application to development environment.
- Extensively used JavaScript to provide users with interactive, fast, functional, and more usable user interfaces.
- Implemented MVC architecture using Apache Struts, JSP, and Enterprise JavaBeans.
- Used AJAX and JSON to make asynchronous calls to the project server to fetch data on the fly.
- Developed batch programs to update and modify metadata of large number of documents in FileNet Repository using CE APIs
- Worked on creating a test harness using POJOs which would come along with the installer and test the services every time the installer would be run.
- Worked on creating Packages, Stored Procedures & Functions in Oracle using PL/SQL and TOAD.
- Used JNDI to perform lookup services for the various components of the system.
- Deployed the application and tested on JBoss Application Server. Collaborated with Business Analysts during design specifications.
- Developed Apache Camel middleware routes, JMS endpoints, and Spring service endpoints, and used Camel FreeMarker to customize REST responses.
Environment: J2EE, MVC, JavaScript (ES2016), CSS3, HTML5, AJAX, SOAP, Java 7, Jenkins 1.9, Maven, ANT, Apache Struts, Apache Camel
Confidential
Java Developer
Responsibilities:
- Wrote complex SQL queries and programmed stored procedures, packages, and triggers.
- Designed HTML prototypes, visual interfaces and interaction of Web-based design.
- Designed and developed business tiers using EJBs and used Session Beans to encapsulate the business logic.
- Involved in the configuration of Spring MVC and Integration with Hibernate.
- Used Eclipse IDE to configure and deploy the application onto WebLogic application server using Maven build scripts to automate the build and deployment process
- Designed CSS based page layouts that are cross-browser compatible and standards-compliant.
- Used the Spring framework for dependency injection and JDBC connectivity.
- Created data sources and connection pools in WebLogic and deployed applications on the server.
- Developed XML and XSLT pages to store and present data to the user using parsers.
- Developed RESTful Web services client to consume JSON messages
- Implemented business logic with POJO using multithreading and design patterns.
- Created test cases for DAO Layer and service layer using JUNIT and bug tracking using JIRA.
- Used Struts, Front Controller, and Singleton patterns for developing the Action and Servlet classes.
- Worked on web-based reporting system with HTML, JavaScript, and JSP.
- Wrote REST Web Services for sending and getting data from the external interface
- Implemented the application using the Spring Boot framework and handled security using Spring Security.
- Researched and evaluated JavaScript frameworks, including AngularJS and Node.js.
- Developed stored procedures and triggers using PL/SQL to calculate and update tables to implement business logic.
- Used Subversion (SVN) as the configuration management tool to manage the code repository.
- Used Maven as the build tool and Tortoise SVN as the Source version controller
- Used jQuery for basic animation and end-user screen customization purposes.
- Used GIT as the version control system and performed module- and unit-level testing with JUnit and Log4j.
- Participated in Unit testing and functionality testing for tracking errors and debugging the code.
Environment: SQL, HTML 4, MVC, Hibernate, Eclipse, Maven, CSS2, JSON, JavaScript, PL/SQL, jQuery, JUnit