Hadoop/Spark Developer Resume
Broadway, NY
SUMMARY
- Talented and accomplished Software Engineer with 7+ years of IT experience in developing applications using Big Data, AWS, Java, SQL and Spark.
- 3+ years of experience with Big Data tools like MapReduce, YARN, HDFS, HBase, Impala, Hive, Pig, Oozie, AWS and Apache Spark for ingestion, storage, querying, processing and analysis of data.
- Performance tuning in Hive and Impala using multiple methods, including but not limited to dynamic partitioning, bucketing, indexing and file compression.
- Hands-on experience with data ingestion tools Kafka and Flume and workflow management tools Oozie and Zena.
- Hands-on experience handling different file formats like JSON, Avro, ORC and Parquet and compression techniques like Snappy, zlib and LZO.
- Hands-on experience in Hadoop ecosystem components such as Hadoop, Spark, HDFS, YARN, Tez, Hive, Sqoop, Flume, MapReduce, Scala, Pig, Oozie, Kafka, NiFi, Storm and HBase.
- Experience in analyzing data in NoSQL databases like HBase and Cassandra and their integration with Hadoop clusters.
- Hands-on experience with Spark Core, Spark SQL and the DataFrame/Dataset/RDD APIs.
- Experience in using Kafka and Kafka brokers to initiate the Spark context and process live streaming data with the help of RDDs.
- Developed Java applications using various IDEs like Spring Tool Suite and Eclipse.
- Good knowledge of using Hibernate for mapping Java classes to database tables and using Hibernate Query Language (HQL).
- Worked on Java/J2EE systems with different databases, including Oracle, MySQL and DB2.
- Knowledge of implementing Big Data in Amazon Elastic MapReduce (Amazon EMR) for processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Extensive development experience in Spark applications for data transformations and loading into HDFS using RDDs, DataFrames and Datasets.
- Extensive knowledge of performance tuning of Spark applications and converting Hive/SQL queries into Spark transformations.
- Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR), creating and storing data in S3 buckets and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
- Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari and using IAM (Identity and Access Management) for creating groups and users and assigning permissions.
- Extensive programming experience in Core Java concepts like OOP, multithreading, Collections and I/O.
- Experience using Jira for ticketing issues and Jenkins for continuous integration.
- Extensive experience with UNIX commands, shell scripting and setting up cron jobs.
- Experience in software configuration management using Git.
- Good experience in using relational databases Oracle and MySQL.
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping and design.
- Successful at working in fast-paced environments, both independently and in collaborative teams.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig 0.17, Hive 2.3, Sqoop 1.4, Apache Impala 3.0, Oozie 4.3, YARN, Apache Flume 1.8, Kafka 1.1, Zookeeper 3.4
Hadoop Distributions: Cloudera, Hortonworks, MapR
Cloud: AWS, Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
Databases: Microsoft SQL Server, MySQL, Oracle, NoSQL and HBase.
Scripting Languages: JavaScript, HTML & Bash.
Tools: Eclipse, IntelliJ IDEA, Maven and SBT.
Platforms: Windows, Linux and CentOS.
Programming Languages: Java, C/C++ and Scala.
Currently Exploring: Apache Kylo, NiFi, Flink and Alluxio.
PROFESSIONAL EXPERIENCE
Confidential - Broadway, NY
Hadoop/Spark Developer
Responsibilities:
- Actively involved in designing Hadoop ecosystem pipeline.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Performed SQL Joins among Hive tables to get input for Spark batch process.
- Worked with the data science team to build statistical models with Spark MLlib and PySpark.
- Involved in importing data from various sources to the Cassandra cluster using Sqoop.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Created data pipelines as per the business requirements and scheduled them using Oozie coordinators.
- Worked extensively on Apache NiFi to build NiFi flows for the existing Oozie jobs to handle incremental loads, full loads and semi-structured data, to get data from REST APIs into Hadoop, and to automate all the NiFi flows to run incrementally.
- Created NiFi flows to trigger Spark jobs and used PutEmail processors to get notifications if there are any failures.
- Developed shell scripts to periodically perform incremental imports of data from third-party APIs to AWS.
- Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and the AWS cloud.
- Worked on creating data models for Cassandra from the existing Oracle data model.
- Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirements.
- Used Sqoop import functionality to load historical data present in RDBMS to HDFS.
- Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on an Apache Hadoop environment by Hortonworks (HDP 2.2).
- Configured Hive bolts and wrote data to Hive in Hortonworks as part of a POC.
- Developed the batch scripts to fetch the data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Administrator for Pig, Hive and HBase, installing updates, patches and upgrades.
- Responsible for importing real-time data from sources into Kafka clusters.
- Worked with Spark techniques like refreshing tables, handling parallelism and modifying the Spark defaults for performance tuning.
- Experience in working with NoSQL databases such as HBase.
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
- Involved in using the Spark API over Hadoop YARN as the execution engine for data analytics using Hive and submitted the data to the BI team for generating reports after processing and analyzing the data in Spark SQL.
- Used version control tools like GitHub to share code snippets among the team members.
- Involved in daily Scrum meetings to discuss development progress and was active in making Scrum meetings more productive.
Environment: Hadoop 3.0, Scala 2.12, Spark, SQL, HBase, Hive 2.3, PySpark, Cassandra 3.11, Apache NiFi, AWS, Oracle 12c, RDBMS, HDFS, Oozie 4.3, Hortonworks
Confidential - Bellevue, WA
Hadoop Developer (AWS with Spark)
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra.
- Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
- Worked on loading data into Spark RDDs and performing advanced procedures like text analytics using the in-memory computation capabilities of Spark to generate the output response.
- Developed the statistics graph using JSP, custom tag libraries, Applets and Swing in a multi-threaded architecture.
- Created HBase tables to store various data formats coming from different applications.
- Executed many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Handled large datasets using partitions, Spark broadcasts, effective and efficient joins, transformations and other techniques during the ingestion process itself.
- Used Kafka Streams to configure Spark Streaming to get information and then store it in HDFS.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for processing and storage of small data sets. Experienced in maintaining the Hadoop cluster on AWS EMR.
- Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR.
- Partitioned data streams using Kafka, and designed and used Kafka producer APIs to produce messages.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Performed tuning of Spark applications by setting the right batch interval time, the correct level of parallelism and memory tuning.
- Ingested data from RDBMS to Hive to perform data transformations, and then exported the transformed data to Cassandra for data access and analysis.
- Experienced in Core Java, Collection Framework, JSP, Dependency Injection, Spring MVC, RESTful Web services.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster processing of data.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
- Extracted the data from Teradata into HDFS/Dashboards using Spark Streaming.
- Implemented Informatica procedures and standards while developing and testing the Informatica objects.
Environment: Hadoop 3.0, Spark 2.1, HBase, Cassandra 1.1, Kafka 0.9, JSP, HDFS, AWS, EC2, Hive 1.9, MapReduce, MapR, Java, MVC, Scala, NoSQL
Confidential - Austin, TX
Java/Hadoop Developer
Responsibilities:
- Implemented J2EE design patterns like DAO, Singleton and Factory.
- Managed connectivity using JDBC for querying, inserting and data management, including triggers and stored procedures.
- Used the Spring MVC framework to enable the interactions between the JSP/View layer and implemented different design patterns with J2EE and XML technology.
- Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
- Utilized various JavaScript and jQuery libraries, Bootstrap and Ajax for form validation and other interactive features.
- Extensively worked on Hadoop ecosystems including Hive and Spark Streaming with the MapR distribution.
- Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
- Worked on storing data in HDFS either directly or through HBase.
- Worked on NoSQL support for enterprise production and loading data into HBase using Impala and Sqoop.
- Ran multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
- Moved data with Sqoop from HDFS to relational database systems and vice versa, including maintenance and troubleshooting.
- Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames and pair RDDs.
- Created Hive tables, loaded claims data from Oracle using Sqoop and loaded the processed data into the target database.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
- Developed NiFi flows dealing with various kinds of data formats such as XML, JSON and Avro.
- Implemented MapReduce jobs in Hive by querying the available data.
- Proactively involved in ongoing maintenance, support and improvements in the Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Collaborated with business users, product owners and developers to contribute to the analysis of functional requirements.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high throughput and reliability.
- Worked on tuning Hive and Pig to improve performance and solved performance issues in both scripts.
Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, JQuery, JavaScript, Ajax
Confidential - Chicago, IL
Java Developer
Responsibilities:
- Worked as a Java Developer on the back-end and front-end development team.
- Involved in the Software Development Life Cycle (SDLC) including analysis, design and implementation.
- Responsible for use case diagrams, class diagrams and sequence diagrams using Rational Rose in the design phase.
- Developed ANT scripts that check out code from the SVN repository and build EAR files.
- Used XML Web Services using SOAP to transfer information to the supply chain and domain expertise Monitoring Systems.
- Used Eclipse and the Tomcat web server for developing and deploying the applications.
- Developed REST web service clients to consume those web services as well as other enterprise-wide web services.
- Used JavaScript and AJAX technologies for front-end user input validations and the Spring validation framework for back-end validation for the user interface.
- Used both annotation-based and XML-based configuration.
- Developed application service components and configured beans using Spring IoC (applicationContext.xml).
- Implemented the persistence mechanism using Hibernate (ORM mapping).
- Developed the DAO layer for the application using Spring Hibernate Template support.
- Used WebLogic Workshop and Eclipse IDE to develop the application.
- Performed the code build and deployment using Maven.
- Implemented Spring RESTful web services that produce JSON.
- Responsible for maintaining the code quality, coding and implementation standards through code reviews.
- Developed the front end of the application using HTML, CSS, JSP and JavaScript.
- Created RESTful APIs using Spring MVC.
- Used SVN version control to maintain the code versions.
- Worked on web applications using open source MVC frameworks.
- Developed the web interface using JSP, Standard Tag Libraries (JSTL) and Spring Framework.
- Implemented logging for debugging and testing purposes using Log4j.
Environment: JSON, HTML 4, CSS, XML, Hibernate 3.6, Eclipse, Maven, JUnit, JDBC, ANT, SOAP, Log4j
Confidential - Rochester, NY
Java Developer
Responsibilities:
- Individually worked on all the stages of the Software Development Life Cycle (SDLC).
- Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
- Created a user-friendly GUI and web pages using HTML, CSS and JSP.
- Developed web components using MVC pattern under Struts framework.
- Wrote JSPs and Servlets and deployed them on WebLogic Application Server.
- Used JSPs and HTML on the front end, Servlets as front controllers and JavaScript for client-side validations.
- Wrote the Hibernate mapping XML files to define the mapping between Java classes and database tables.
- Developed the UI using JSP, HTML, CSS and AJAX and learned how to implement jQuery, JSP and client-side and server-side validations using JavaScript.
- Implemented MVC architecture using Spring to send and receive the data between the front end and the business layer.
- Designed, developed and maintained the data layer using JDBC and performed configuration of the Java Application Framework.
- Extensively used Hibernate in the data access layer to access and update information in the database.
- Migrated the Servlets to Spring Controllers and developed Spring Interceptors; worked on JSPs, JSTL and JSP custom tags.
- Used Jenkins for continuous integration with SVN for version control and JUnit and Mockito for unit testing, and created design documents and test cases for development work.
- Worked in Eclipse IDE as the development environment and wrote stored procedures for insert, update and retrieval operations on data in the Oracle database.
- Responsible for writing Struts action classes and Hibernate POJO classes and integrating Struts and Hibernate with Spring for processing business needs.
- Developed the application using Servlets and JSP for the presentation layer along with JavaScript for the client-side validations.
- Wrote Hibernate classes and DAOs to retrieve and store data, and configured Hibernate files.
- Used WebLogic for application deployment and Log4j for logging and debugging.
- Used CVS as the version control tool and ANT as the project build tool.
- Used various Core Java concepts such as multithreading, exception handling and Collection APIs to implement various features and enhancements.
- Wrote and debugged the Maven scripts for building the entire web application.
- Designed and developed Ajax calls to populate screen parts on demand.
Environment: Struts, HTML, CSS, JSP, MVC, Hibernate, AJAX, JQuery, Java, Jenkins, ANT, Maven
