Spark/Hadoop Developer Resume
Florida
SUMMARY:
- Around 6 years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, including developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.
- Proficient in handling big data using the Hadoop architecture (HDFS, MapReduce, YARN, HBase) and ecosystem tools such as Hive, Pig, Flume, Oozie, and Sqoop.
- Hands-on experience in cloud computing: Azure Storage, Azure Logic Apps, Service Bus, Compute, databases (SQL, DocumentDB/Cosmos DB), Data Lake Store & Analytics, Data Factory, HDInsight, Stream Analytics, Cloudera CDH, and AWS.
- Experience in writing custom UDFs in Java to extend Hive and Pig functionality (see the UDF sketch after this list).
- Experience in writing MapReduce programs in Java for data cleansing and preprocessing.
- Excellent understanding of Hadoop (Gen-1 and Gen-2) and components such as Job Tracker, Task Tracker, NameNode, DataNode, and Resource Manager (YARN).
- Experience in managing and reviewing Hadoop log files.
- Experience in working with Flume to load the log data from multiple sources directly into HDFS.
- Hands-on experience in capturing data from existing relational databases (Oracle, MySQL, SQL Server, and Teradata) that provide SQL interfaces, using Sqoop.
- Worked on different file formats (ORCFile, TextFile) and different compression codecs (GZIP, Snappy, LZO).
- Exposure to Databricks connectors, Spark, Spark Streaming, Spark MLlib, and Scala, including creating and handling DataFrames in Spark with Scala.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to an output directory in HDFS.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
- Expertise with the Hive data warehouse tool: creating tables, distributing data via partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
- Experience in setting up Hadoop clusters, both in-house and on the cloud.
- Profound experience working with Cloudera (CDH4 & CDH5), Hortonworks, and Amazon EMR Hadoop distributions on multi-node clusters.
- Exposure to build tools like Maven and sbt.
- Excellent working knowledge of popular frameworks like MVC and Hibernate.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Built real-time Big Data solutions using HBase, handling billions of records.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience working with Java, J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, Servlets, and MS SQL Server.
- Experience in all stages of the SDLC (Agile, Waterfall): writing technical design documents, development, testing, and implementation of enterprise-level data marts and data warehouses.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Experience in J2EE technologies like Struts, JSP/Servlets, and Spring.
- Good exposure to scripting languages like JavaScript, AngularJS, and jQuery, and to XML.
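A minimal sketch of a custom Hive UDF of the kind described above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name and normalization logic are illustrative, not from an actual project:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative UDF: normalizes free-text values so variants group together in queries.
public final class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // pass NULLs through unchanged
        }
        String cleaned = input.toString()
                .toLowerCase()
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        return new Text(cleaned);
    }
}
```

Packaged into a JAR, such a UDF is typically registered in Hive with `ADD JAR` and `CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';` before use in HiveQL.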
TECHNICAL SKILLS:
Technology: Hadoop Ecosystem / J2SE / J2EE / Database
Operating Systems: Windows Vista/XP/NT/2000, Linux (Ubuntu, CentOS), UNIX
DBMS/Databases: DB2, MySQL, PL/SQL
Programming Languages: C, C++, Core Java, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services
Big Data Ecosystem: Azure HDInsight, Azure Data Lake Store, Azure Data Factory, Azure Event Hub, Databricks, HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Spark, Flume, ZooKeeper, Kafka, HBase
Methodologies: Agile, Waterfall
NoSQL Databases: HBase
Version Control Tools: SVN, CVS, VSS, PVCS
ETL Tools: IBM DataStage 8.1, Informatica
PROFESSIONAL EXPERIENCE:
Spark/Hadoop Developer
Confidential, Florida
Responsibilities:
- Currently working as a Hadoop developer, building a Hadoop-based analytics platform for measuring online TV, video, and other digital content across the web and apps.
- Worked on Data Lake Store and Data Lake Analytics, and on creating Data Factory pipelines.
- Designed and developed standalone data migration applications to retrieve data from Azure Table/Blob storage and populate it into HDInsight and Power BI, using Python.
- Was part of the deployment of the HDInsight cluster in Azure; installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, and MySQL.
- Validated the fact table data migrated with each daily load.
- Worked with Spark accumulators to instrument and analyze the data transformations we wrote (see the first sketch following this list).
- Created RDDs by extracting data from Azure Blob Storage (blobs, files, tables, and queues) and applied transformations and actions.
- Analyzed the existing SQL scripts and designed the solution to reimplement them in Scala.
- Extracted data from Azure Data Lake into the HDInsight cluster (Intelligence + Analytics), applied Spark transformations and actions, and loaded the results into HDFS.
- Implemented Azure Data Factory pipelines and datasets; copied and transformed data in bulk via the Data Factory UI and PowerShell, including scheduling and exporting data.
- Exported data to Azure Data Lake Store in its native formats from different relational and semi-structured sources using Azure Data Factory.
- Worked with several linked services using ADF and Azure HDInsight to connect Event Hubs.
- Developed U-SQL scripts for schematizing the data in Azure Data Lake Analytics.
- Configured and deployed Azure automation scripts for a multitude of applications utilizing the Azure stack (including Compute, Web & Mobile, Blobs, Azure Data Factory, Resource Groups, Azure Data Lake, HDInsight clusters, Azure SQL, Cloud Services, and ARM), with services and utilities focused on automation.
- Responsible for building scalable distributed data solutions using Hadoop Ecosystem.
- Responsible for writing MapReduce jobs to handle files in multiple formats (JSON, text, XML, etc.).
- Worked extensively on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Involved in loading and transforming large datasets between relational databases and HDFS using Sqoop imports and exports.
- Implemented advanced procedures like text analytics and processing using regular expressions and window functions in Hive (see the second sketch following this list).
- Solved performance issues in Hive scripts by understanding execution plans, joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Solved performance issues in Impala scripts by understanding execution plans, joins, grouping, and aggregation.
- Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.
- Used HiveQL to analyze partitioned and bucketed data, executing Hive queries on Parquet tables stored in Hive to perform data analysis that met the business requirements.
- Set up Sentry role-based authorization for HDFS and Hive; worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce cluster access for new users.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve performance and optimize the existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Developed Java APIs for invocation in Pig Scripts to solve complex problems.
- Developed Shell scripts to automate and provide Control flow to Pig scripts.
- Responsible for managing data coming from different sources.
- Worked on data serialization formats to convert complex objects into sequences of bits using the Avro, JSON, CSV, and Parquet formats.
- Used Jira for bug tracking and SVN to check in and check out code changes.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
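A minimal sketch of the Blob-to-HDFS pattern above, assuming Spark 2.x with the wasbs:// Azure Blob connector on the classpath; the account, container, paths, and five-field record layout are all illustrative assumptions:

```java
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public final class BlobToHdfsJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("azure-blob-to-hdfs")
                .getOrCreate();

        // Accumulator counting malformed records seen inside the transformation.
        // Its value is only reliable after an action (the write below) has run.
        LongAccumulator badRecords = spark.sparkContext().longAccumulator("badRecords");

        // wasbs:// is the Azure Blob Storage scheme; container and account are placeholders.
        Dataset<String> raw = spark.read()
                .textFile("wasbs://container@account.blob.core.windows.net/events/");

        Dataset<String> cleaned = raw.filter((FilterFunction<String>) line -> {
            boolean ok = line != null && line.split(",", -1).length == 5;
            if (!ok) {
                badRecords.add(1L);
            }
            return ok;
        });

        // Persist the cleaned records to HDFS for downstream Hive tables.
        cleaned.write().mode("overwrite").text("hdfs:///data/clean/events");

        System.out.println("Malformed records skipped: " + badRecords.value());
        spark.stop();
    }
}
```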
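And a sketch of the kind of Hive analysis mentioned above (regular expressions plus a window function over a partitioned Parquet table), run here through Spark with Hive support; the table name, column names, and URL pattern are assumptions, not the actual production schema:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class HiveWindowQuery {
    public static void main(String[] args) {
        // enableHiveSupport() lets Spark query partitioned Parquet tables in the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("hive-window-query")
                .enableHiveSupport()
                .getOrCreate();

        // regexp_extract pulls a show name out of a raw URL, and RANK() orders
        // shows by view count within each load_date partition.
        Dataset<Row> ranked = spark.sql(
                "SELECT load_date, show_name, views, "
              + "       RANK() OVER (PARTITION BY load_date ORDER BY views DESC) AS rnk "
              + "FROM ("
              + "  SELECT load_date, regexp_extract(url, '/shows/([^/]+)/', 1) AS show_name, "
              + "         COUNT(*) AS views "
              + "  FROM events_parquet "
              + "  GROUP BY load_date, regexp_extract(url, '/shows/([^/]+)/', 1)"
              + ") t");

        ranked.show(20);
        spark.stop();
    }
}
```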
Environment: HDP, Azure HDInsight, Azure Data Lake Store, Azure Data Factory, Azure Event Hub, Databricks, PuTTY, Sqoop, Kafka, HDFS, Spark, Scala, Hive, HBase, Eclipse, Teradata, FileZilla, Oozie, Java (JDK 1.6), UNIX shell scripting, Oracle 11g/12c.
Hadoop Developer
Confidential, Texas
Responsibilities:
- Involved in configuring the Hadoop environment on Amazon Web Services in the cloud.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets processing and storage.
- Spun up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS (see the first sketch following this list).
- Involved in pulling data from the Amazon S3 data lake and building Hive tables using HiveContext in Spark.
- Used Amazon DynamoDB to gather and track event-based metrics.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest data into HDFS for analysis.
- Developed MapReduce jobs in Java for data cleaning and preprocessing (see the second sketch following this list).
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop ecosystem components on a 50-node production cluster.
- Installed and configured Hadoop security and access controls using Kerberos, Active Directory.
- Troubleshot and monitored Hadoop services using Cloudera Manager.
- Monitored and tuned MapReduce programs running on the cluster.
- Wrote Hive queries for data analysis to meet the business requirements.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Sqoop, and Pig Latin.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Parsed XML files using MapReduce to extract sales-related attributes and stored them in HDFS.
- Used Bash shell scripting, Sqoop, Avro, Hive, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Used Pig for data transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Used Spark SQL to process large amounts of structured data and implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Used DStreams, accumulators, broadcast variables, and RDD caching for Spark Streaming.
- Converted PL/SQL code into Scala and PL/SQL queries into HQL queries.
- Designed, coded, and configured server-side J2EE components like JSPs in Java on AWS.
- Designed and developed ETL workflows in Java for processing data in HDFS/HBase, orchestrated with Oozie.
- Experienced in managing and reviewing Hadoop log files.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Provided cluster coordination services through ZooKeeper.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Created several Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Automated the jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
- Performed data profiling of source systems and defined the data rules for the extraction and integration of the same.
- Followed Agile methodology; interacted directly with the client to provide and receive feedback on features, suggest and implement optimal solutions, and tailor the application to customer needs.
- Maintained program documentation, operational procedures and user guidelines as per client requirements.
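A minimal sketch of the S3-to-HDFS streaming ingestion described above, assuming Spark Streaming's file-based source with an s3a:// connector configured; the bucket, prefix, batch interval, and output path are illustrative:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public final class S3StreamToHdfs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("s3-stream-to-hdfs");
        // Micro-batches every 60 seconds; textFileStream picks up new objects under the prefix.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));

        JavaDStream<String> lines = jssc.textFileStream("s3a://my-bucket/incoming/");

        // Simple cleansing transformation before persisting each micro-batch.
        JavaDStream<String> cleaned = lines.filter(line -> !line.trim().isEmpty());

        cleaned.foreachRDD((rdd, time) -> {
            if (!rdd.isEmpty()) {
                // One HDFS directory per batch, suffixed with the batch timestamp.
                rdd.saveAsTextFile("hdfs:///data/stream/events-" + time.milliseconds());
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```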
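And a sketch of a Java MapReduce cleansing job of the kind listed above: a map-only pass that drops malformed delimited rows; the seven-field record layout and the paths are assumptions for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only cleansing job: drops malformed rows and trims fields.
public final class CleanRecordsJob {

    public static final class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != 7) {
                context.getCounter("clean", "malformed").increment(1);
                return; // skip bad rows, counted for later review
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only: no shuffle or reduce phase needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```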
Environment: Apache Hadoop, AWS, MapReduce, HDFS, Hive, Java (JDK 1.6), SQL, Pig, ZooKeeper, flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Sqoop, Spark, Oozie, HBase.
Java Developer
Confidential
Responsibilities:
- Worked on sprint planning, sprint demos, status updates, and daily standup meetings.
- Prepared the Functional Requirement Specification and performed coding, bug fixing, and support.
- Involved in various phases of the Software Development Life Cycle (SDLC), such as requirements gathering, data modeling, analysis, and architecture design and development for the project.
- Designed front-end, user-interactive (UI) web pages using web technologies like HTML, XHTML, and CSS.
- Used ANT scripts to automate application build and deployment processes.
- Involved in the design, development, and modification of PL/SQL stored procedures, functions, packages, and triggers to implement business rules in the application.
- Used Struts MVC architecture and SOA to structure the project module logic.
- Developed ETL processes to load data from flat files, SQL Server, and Access into the target Oracle database, applying business logic in transformation mappings to insert and update records during the load.
- Scheduled sessions to extract, transform, and load data into the warehouse database per business requirements.
- Implemented Struts MVC framework for developing J2EE based web application.
- Extensively used Java multithreading to implement batch jobs with JDK 1.5 features.
- Designed an entire messaging interface and Message Topics using WebLogic JMS.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
- Used Spring Framework for Dependency injection and integrated with the Hibernate framework.
- Used JMS (Java Message Service) for asynchronous communication between different modules (see the sketch following this list).
- Developed web components using JSP, Servlets and JDBC.
- Extensively used Spring features such as Dependency Injection/Inversion of Control to allow loose coupling between business classes (POJOs).
- As part of the team developing and maintaining an advanced search engine, gained expertise in a variety of new software technologies.
- Interacted with project management to understand, learn, and perform analysis of the search techniques.
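A minimal sketch of the asynchronous JMS hand-off described above, using the standard JMS 1.1 API; the JNDI names and queue are illustrative assumptions, not the project's actual configuration:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

// Asynchronous hand-off between modules: the producer returns as soon as the
// message is enqueued; the consuming module processes it on its own schedule.
public final class OrderEventSender {
    public void send(String payload) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

        Connection connection = cf.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(payload);
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```

Because the sender never blocks on the consumer, the two modules stay loosely coupled, which is the point of the asynchronous design noted above.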
Environment: J2EE, JDBC, Java 1.4, Servlets, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, JUnit, Oracle 10g, My Eclipse.
Java Developer
Confidential
Responsibilities:
- Used Agile methodology for developing the application.
- As part of lifecycle development, prepared class models, sequence models, and flow diagrams by analyzing use cases using Rational tools.
- Made extensive use of the SOA framework for controller and view components.
- Involved in writing the exception and validation classes using Struts validation rules.
- Involved in writing validation rule classes for general server-side validations, implementing the rules as part of the Observer J2EE design pattern.
- Used the OR mapping tool Hibernate for interaction with the database; involved in writing Hibernate queries and Hibernate-specific configuration and mapping files.
- Developed the EJB tier using the Session Facade, Singleton, and DAO design patterns, containing the business logic and database access functions.
- Developed web services using SOAP and WSDL with Apache Axis 2.
- Developed, implemented, and maintained an asynchronous, AJAX-based rich client for an improved customer experience using XML data and XSLT templates.
- Developed SQL stored procedures and prepared statements for updating and accessing data from database.
- Used JBoss for deploying various components of application.
- Used JUnit for testing and for checking API performance. Involved in fixing bugs and minor enhancements for the front-end modules. Responsible for troubleshooting issues and for monitoring and guiding team members to deploy and support the product.
- Used SVN Version Control for Project Configuration Management.
- Worked with the Android SDK and implemented Android Bluetooth and Location Connectivity components.
- Deployed applications in app servers for the DEVL, ALPHA, and Beta integration environments.
- Worked with business and system analysts to complete development on time.
- Implemented the presentation layer with HTML, CSS and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented secured cookies using Servlets (see the sketch following this list).
- Designed and developed Loans reports for Evans bank using Jasper and iReport.
- Resolved issues on outages for Loans reports.
- Maintained Jasper server on client server and resolved issues.
- Actively involved in system testing.
- Fine-tuned SQL queries for maximum efficiency to improve performance.
- Designed tables and indexes following normalization rules.
- Involved in Unit testing, Integration testing and User Acceptance testing.
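A minimal sketch of setting a secured cookie from a Servlet as mentioned above; the cookie name and value are placeholders, and setHttpOnly assumes a Servlet 3.0+ container:

```java
import java.io.IOException;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sets a cookie that is only transmitted over HTTPS and hidden from client-side scripts.
public class SecureCookieServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        Cookie cookie = new Cookie("sessionToken", "opaque-token-value"); // placeholder values
        cookie.setSecure(true);   // only send over TLS connections
        cookie.setHttpOnly(true); // not readable from JavaScript (Servlet 3.0+)
        cookie.setPath("/");
        resp.addCookie(cookie);
        resp.setContentType("text/plain");
        resp.getWriter().println("cookie set");
    }
}
```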
Environment: Java, Servlets, JSP, Hibernate, JUnit, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.
