Big Data/Hadoop Developer (AWS) Resume
Phoenix, AZ
CAREER SUMMARY:
- 6+ years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to implement and leverage new Hadoop features.
- Experience working with DataFrames, RDDs, Spark SQL, Spark Streaming, APIs, system architecture and infrastructure planning.
- Experience with Core Java components: Collections, Generics, Inheritance, Exception Handling and Multithreading.
- Very good understanding of NoSQL databases such as MongoDB and HBase.
- Experience with major components of the Hadoop ecosystem, including Hive, Sqoop and Flume, and knowledge of the MapReduce/HDFS framework.
- Hands-on programming experience in technologies such as Java, J2EE, HTML and XML.
- Strong experience developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
- Experience working with developer toolkits such as Force.com IDE, Force.com Ant Migration Tool, Eclipse IDE and Maven.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology.
- Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
- Experience in installation, configuration and deployment of Big Data solutions.
- Knowledge of implementing Big Data solutions on Amazon Elastic MapReduce (Amazon EMR) for processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
- Solid understanding of Hadoop MRv1 and Hadoop MRv2 (YARN) architecture.
- Hands-on experience configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
- Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Experience implementing Spark solutions to enable real-time reports from Cassandra data.
- Hands-on expertise in designing row keys and schemas for NoSQL databases such as MongoDB.
- Experience extracting data from MongoDB through Sqoop, placing it in HDFS and processing it.
- Hands-on experience with Spark Core, Spark SQL and the DataFrame/Dataset/RDD APIs.
- Experience using Kafka and Kafka brokers with a Spark context to process live streaming data using RDDs.
- Developed Java applications using IDEs such as Spring Tool Suite and Eclipse.
- Good knowledge of using Hibernate to map Java classes to database tables and of Hibernate Query Language (HQL).
- Extensive experience with application servers such as WebLogic, WebSphere, JBoss and GlassFish, and web servers such as Apache Tomcat.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Hive 2.3, Apache Impala 3.0, YARN, Apache Flume 1.8, Kafka 1.1, ZooKeeper 3.4
Hadoop Distributions: Cloudera, Hortonworks, MapR.
Cloud: AWS, Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake and Data Factory.
Databases: Microsoft SQL Server, MySQL, Oracle, NoSQL and HBase.
Scripting Languages: JavaScript, HTML & Bash.
Tools: Eclipse, Maven and SBT.
Platforms: Windows, Linux and CentOS.
Programming Languages: Java, C/C++ and Scala.
Currently Exploring: Apache Kylo, NiFi, Flink and Alluxio.
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Big Data/Hadoop Developer (AWS)
Responsibilities:
- As a Big Data/Hadoop Developer, working on Hadoop ecosystem components including HBase, Hive and Spark Streaming on the MapR distribution.
- Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using Hadoop.
- Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance.
- Used Spark to create structured data from large amounts of unstructured data from various sources.
- Deployed MapReduce and Spark jobs on Amazon Elastic MapReduce using datasets stored on S3.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Used Agile and Scrum methodology to help manage and organize a team of developers, with regular code review sessions.
- Created managed and external tables in Hive and loaded data from HDFS.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Created Hive tables and ran HiveQL queries on them, which automatically invoke and run MapReduce jobs.
- Deployed the application in Hadoop cluster mode using spark-submit scripts.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark and Hive.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better query performance.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and MapReduce.
- Involved in cluster maintenance, monitoring and troubleshooting, and in managing and reviewing data backups and log files.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster and integrated Hive with existing applications.
- Performance-tuned Hive queries and MapReduce programs for different applications.
- Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS.
- Used a test-driven approach to develop the application and implemented unit tests using the Python unittest framework.
- Participated in ad hoc stand-up and architecture meetings to set daily priorities and track work status in a highly agile work environment.
Environment: Hadoop 3.0, Spark 2.3, Hive 2.3, MapReduce, YARN, HDFS, AWS, S3, HBase 2.1, CDH3, CDH4, Python.
Confidential, Bellevue, WA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Spark in near real time and persists it into Cassandra.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data in Cassandra tables for quick searching, sorting and grouping.
- Loaded data into Spark RDDs and performed advanced procedures such as text analytics using Spark's in-memory computation capabilities to generate the output response.
- Developed the statistics graph using JSP, custom tag libraries, Applets and Swing in a multi-threaded architecture.
- Executed many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Handled large datasets during the ingestion process itself using partitions, broadcast variables, effective and efficient joins, transformations and other Spark techniques.
- Configured Spark Streaming to receive data from Kafka and store it in HDFS.
- Migrated an existing on-premises application to AWS.
- Used AWS services such as EC2 and S3 for small-dataset processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
- Migrated Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR.
- Partitioned data streams using Kafka; designed and used Kafka producer APIs to produce messages.
- Developed Spark code using MapReduce and Spark-SQL/Streaming for faster testing and processing of data.
- Tuned Spark applications by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Ingested data from RDBMS into Hive to perform data transformations, then exported the transformed data to Cassandra for data access and analysis.
- Experienced in Core Java, the Collections Framework, JSP, dependency injection, Spring MVC and RESTful web services.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
Environment: Hadoop 3.0, Spark 2.1, Cassandra 1.1, Kafka 0.9, JSP, HDFS, AWS, EC2, Hive 1.9, MapReduce, Java
Confidential, Charlotte, NC
Spark/Scala Developer
Responsibilities:
- As a Spark/Scala Developer, worked with Spark to improve the performance of and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Serialized JSON data and stored it in tables using Spark SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed cloud infrastructure using Azure Cloud services.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Worked with Azure Monitor and Azure Data Factory.
- Supported migrations from on-premises environments to Azure.
- Provided support services to enterprise customers for Microsoft Azure cloud networking, including handling critical-situation cases.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Wrote Shell scripts to automate the process flow.
- Wrote business analytics scripts using HiveQL.
- Provided consulting and cloud architecture for premier customers and internal projects running on the MS Azure platform, targeting high availability of services and low operational costs.
- Optimized test content and process with a reduction of 20% in false positives.
- Used SQL and Excel to pull, analyze, polish and visualize data.
- Followed agile methodology and Scrum meetings to track, optimize and tailor features to customer needs.
Environment: Hadoop, Hive, Spark, Spark SQL, Spark Streaming, Scala, Azure Data Factory, Azure Storage and Agile methodologies.
Confidential
Java Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) of the application, from requirement gathering and analysis to testing and maintenance.
- Worked with the business community to define business requirements and analyze possible technical solutions.
- Performed requirement gathering, business process flow design, business process modeling and business analysis.
- Implemented the user login logic using the Spring MVC framework, which encourages application architectures based on the Model-View-Controller design pattern.
- Used various Java, J2EE APIs including JDBC, XML, Servlets and JSP.
- Generated Hibernate Mapping files and created the data model using mapping files.
- Developed UI using JavaScript, JSP, HTML and CSS for interactive cross browser functionality and complex user interface.
- Created business logic using Servlets and session beans and deployed them on the Apache Tomcat server.
- Created complex SQL queries and PL/SQL stored procedures and functions for the back end.
- Prepared the functional, design and test case specifications.
- Performed unit testing, system testing and integration testing.
- Used JUnit for unit testing of the application.
- Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions for them. Resolved high-priority defects on schedule.
Environment: SDLC, Spring MVC, JSP, Servlets, JavaScript, SQL, HTML, CSS, PL/SQL, Hibernate, JUnit.
