
Hadoop Spark Developer Resume

New York, NY

PROFESSIONAL SUMMARY:

  • 8+ years of experience in software technologies including Java and J2EE, with development of Big Data applications on the Hadoop framework.
  • Almost 5 years of experience in the Big Data and Hadoop ecosystem.
  • Working knowledge of core Hadoop components: HDFS and MapReduce.
  • Working knowledge of Hadoop ecosystem components such as Kite SDK, Pig, Hive 1.2.1, HBase, Sqoop, Oozie, Kafka, YARN, Spark, Apache Tez, vectorization, and Flume.
  • Expertise in developing real-time streaming solutions using Spark Streaming.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
  • Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
  • Good knowledge of Informatica as an ETL tool, and of stored procedures that pull data from source systems/files, then cleanse, transform, and load it into databases.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Work experience with file formats such as Avro, ORC, Parquet, and Trevni.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Work experience with compression codecs such as Snappy, Deflate, zlib, and bzip2.
  • Work experience on Hortonworks, Cloudera, and BDPaaS Pivotal HD.
  • Good exposure to Zookeeper and Cassandra.
  • Experienced in developing Spark programs using the Scala and Java APIs.
  • Experience in migrating ETL processes into Hadoop, designing Hive data models, and writing Pig Latin scripts to load data into Hadoop.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Good experience working with the Amazon Web Services (AWS) cloud environment: EMR, EC2, ES, and S3.
  • Working experience with JDK 1.5, Servlets, JSP, and JNDI.
  • Working experience with Hibernate, Struts, Spring, EJB, and JSON.
  • Working experience with HTML, JavaScript, and CSS.
  • Good exposure to Ext JS, Tapestry, BIRT, C, C++, and Java 1.8.
  • Working experience with Log4j, RMI, Ant, Maven, Git, and Apache utilities such as Commons and string utilities.
  • Working knowledge of design patterns such as Singleton and Factory.
  • Working knowledge of architectures such as SOA (web services), master/slave (HDFS), and publish/subscribe (JMS).
  • Scheduled various ETL processes and Hive scripts by developing Oozie workflows.
  • Working knowledge of performance tuning using tools such as Java profilers.
  • Working knowledge of database languages such as SQL and PL/SQL on Oracle 10g.
  • Good exposure to database triggers, stored procedures, MySQL, etc.
  • Working experience with Sonar, Hudson, Review Board, AnthillPro, Agile methodology, and workflow modules.
  • Involved in client request review boards, decision making, and the development lifecycle.
  • Capable of performing under minimal supervision, multi-tasking, and meeting deadlines, both as an individual contributor and as a team leader.

TECHNICAL SKILLS:

Big Data: Hadoop 2.4, Apache Spark

Hadoop Ecosystem: MapReduce, YARN, Pig, Hive 1.2.1, HBase, Sqoop, Flume, Oozie, Zookeeper, Kite SDK, Kafka, Apache Avro, Apache Tez, ORC, Parquet, Trevni

Languages: Java, J2EE, Python, Scala, C, C++, Tapestry, Ext JS, JavaScript, HTML

Java Frameworks: Hibernate, EJB, Spring, Struts

RDBMS: SQL, PL/SQL, Oracle, MySQL

Operating Systems: UNIX/Linux (Ubuntu, CentOS), Windows XP

Tools: Bedrock, Eclipse, NetBeans, Maven, Ant, Git, Diyotta

Servers: Tomcat, Jetty, WebSphere, JBoss

Others: Log4j, RMI, Apache utilities

PROFESSIONAL EXPERIENCE:

Confidential, New York, NY

Hadoop Spark Developer

Responsibilities:

  • Involved in writing Java MapReduce jobs.
  • Wrote Apache Pig scripts to process HDFS data.
  • Created Hive tables to store the processed results in tabular format.
  • Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time and persists it into Cassandra.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Pulled Excel data into HDFS.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data using Sqoop, both from HDFS to relational database systems and from relational database systems to HDFS.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Experience in using Flume to efficiently collect, aggregate, and move large amounts of log data.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Developed Hive queries and UDFs.
  • Developed an ETL workflow that pushes web server logs to an Amazon S3 bucket.
  • Worked with BI teams on generating reports and designing ETL workflows in Tableau.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling, and error handling.
  • Performed schema validation using MapReduce in Java.
  • Wrote script files for processing data and loading it to HDFS.
  • Created external Hive tables on top of parsed data.
  • Moved all log/text files generated by various products into HDFS.
  • Actively participated in Scrum meetings and followed Agile methodology.
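The Kafka-to-Cassandra learner data model above can be sketched at the record level. This is a minimal illustration, not the project's code: the case class and field names are hypothetical, and the Spark Streaming/Kafka/Cassandra wiring is omitted so the parsing step runs standalone.

```scala
// Hypothetical learner event as it might arrive from Kafka as a CSV payload.
case class LearnerEvent(userId: String, courseId: String, score: Double, ts: Long)

// Map one raw Kafka message onto the common learner data model; malformed
// records become None so a bad message cannot fail the whole micro-batch.
def parseEvent(line: String): Option[LearnerEvent] =
  line.split(",", -1).map(_.trim) match {
    case Array(u, c, s, t) =>
      try Some(LearnerEvent(u, c, s.toDouble, t.toLong))
      catch { case _: NumberFormatException => None }
    case _ => None
  }
```

In a streaming job of this shape, such a function would typically sit inside a `flatMap` over the incoming DStream, ahead of the write to Cassandra.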

Environment: Linux/UNIX, CentOS, Hadoop 2.4.x, Oozie, Hive 0.13, Sqoop, Flume, Kafka, Cassandra, Spark, Hortonworks 2.1.1, AWS, Tableau, Avro.

Confidential, Jersey City, NJ

Hadoop Developer

Responsibilities:

  • Involved in writing Java MapReduce jobs.
  • Involved in the complete Big Data flow of the application, from data ingestion from upstream into HDFS to processing and analyzing the data in HDFS.
  • Developed a Spark API to import data into HDFS from Oracle and created Hive tables.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from Avro Hive tables.
  • Involved in running all the Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Created common utilities such as an HDFS service, cron jobs, and an Avro service.
  • Created workflow/coordinator templates using Oozie.
  • Converted delimited and XML data to a common format (JSON) using Java MapReduce.
  • Stored data in compressed formats such as Apache Avro and ORC.
  • Created external tables in Hive and provided them to downstream consumers.
  • Exported the results of transaction and sales data to an RDBMS after aggregations and computations, using Sqoop.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
  • Involved in performance tuning of Hive from the design, storage, and query perspectives.
  • Developed a Flume ETL job handling data from an HTTP source with an HDFS sink.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Provided cluster coordination services through Zookeeper.
  • Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Integrated Hive with Tableau Desktop reports and published them to Tableau Server.
  • Developed shell scripts for running scripts in Hive and Impala.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
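The delimited-to-JSON conversion mentioned above amounts to pairing a known header with each record's fields. A minimal standalone sketch, with a hypothetical delimiter and field names (in the actual job this logic ran inside a MapReduce `map()` call):

```scala
// Convert one pipe-delimited record to a flat JSON object using a known header.
// All values are emitted as JSON strings for simplicity.
def delimitedToJson(header: Seq[String], record: String, sep: Char = '|'): String =
  header.zip(record.split(sep).map(_.trim))
    .map { case (k, v) => s""""$k":"$v"""" }
    .mkString("{", ",", "}")
```

A real job would also need to escape quotes and handle missing fields; the point here is only the header-to-field pairing that makes the output format common across sources.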

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Pig, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, PL/SQL, UNIX shell scripting, Cloudera.

Confidential, Overland park, KS

Hadoop Developer

Responsibilities:

  • Involved in writing Java MapReduce jobs.
  • Created common utilities such as an HDFS service, cron jobs, and an Avro service.
  • Created workflow/coordinator templates using Oozie.
  • Converted delimited and XML data to a common format (JSON) using Java MapReduce.
  • Stored data in compressed formats such as Apache Avro and ORC.
  • Involved in source system analysis, data analysis, and data modeling for ETL.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked on streaming log data into HDFS from web servers using Flume.
  • Created Hive tables and worked on them using HiveQL.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Created external tables in Hive and provided them to downstream consumers.
  • Handled structured and unstructured data and applied ETL processes.
  • Exported the results of transaction and sales data to an RDBMS after aggregations and computations, using Sqoop.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, analyzing them by running Hive queries and Pig scripts.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Assisted in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
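The aggregate-then-export step above (aggregation in Hive followed by a Sqoop export to the RDBMS) can be illustrated on plain collections. The names are hypothetical and the Hive/Sqoop mechanics are omitted; this only shows the shape of the rollup that gets exported.

```scala
// A transaction as it might look after ingestion (fields are illustrative).
case class Txn(product: String, amount: Double)

// Roll up transaction amounts per product: the kind of aggregated result
// that is then Sqoop-exported to RDBMS reporting tables.
def totalsByProduct(txns: Seq[Txn]): Map[String, Double] =
  txns.groupBy(_.product).map { case (p, group) => p -> group.map(_.amount).sum }
```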

Environment: CentOS, Hadoop 2.4, Oozie, Hive 1.2.1, Sqoop, ORC, HDP, HBase, Hortonworks.

Confidential, Longwood, FL

Hadoop Developer

Responsibilities:

  • Involved in writing Java MapReduce jobs.
  • Converted delimited and XML data to a common format (JSON) using Java MapReduce, and stored the data in compressed formats such as Apache Avro.
  • Involved in creating Hive tables, then loading and analyzing data using Hive queries.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Developed Pig UDFs for manipulating data according to business requirements and developed custom Pig loaders.
  • Worked with completely structured data at terabyte scale.
  • Using Sqoop, pulled data from different relational databases into Hive tables and HDFS.
  • Created Avro schemas for the data.
  • Created partitions for the data, which enable quick results from large Hive tables.
  • Created tables and views for different customers according to their permissions.
  • Performed partitioning and bucketing of Hive tables to store data on Hadoop.
  • Involved in loading data from the UNIX file system to HDFS.
  • Integrated HBase with MapReduce to move bulk amounts of data into HBase.
  • Created external tables in Hive and provided them to downstream consumers.
  • Used Zookeeper operational services for coordinating the cluster and scheduling workflows.
  • Created ETL transforms and jobs to move data from files to the operational database and from the operational database to the data warehouse.
  • Exported the results of transaction and sales data to an RDBMS after aggregations and computations, using Sqoop.
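Partitioning as described above keys the HDFS directory layout on a column value, so queries filtering on that column scan only the matching directories. A sketch of a date-based layout (the warehouse path and partition scheme here are assumptions for illustration, not the project's actual layout):

```scala
import java.time.LocalDate

// Hive-style partition directory for a record's date; with daily partitions,
// a query filtered on one day reads only that directory instead of the table.
def partitionPath(table: String, d: LocalDate): String =
  f"/warehouse/$table/year=${d.getYear}/month=${d.getMonthValue}%02d/day=${d.getDayOfMonth}%02d"
```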

Environment: Linux/UNIX, Ubuntu, Hadoop 2.0.3, Oozie, Pig, Hive, Sqoop, Zookeeper, HBase, Flume.

Confidential, Washington, DC

Hadoop Developer

Responsibilities:

  • Designed, developed, and implemented the application as a team member.
  • Implemented the business logic of the system using the core Java API.
  • Involved in analysis, design, coding, and development of custom interfaces.
  • Developed many Java interfaces to integrate the web services with the database transaction tables.
  • Experienced in developing web services for production systems using SOAP.
  • Used JavaScript for client-side validations; hands-on experience with data persistence using the Hibernate and Struts frameworks.
  • Developed the servlet and data access layer classes and used the JDBC API for interaction with the Oracle database.
  • Maintained the existing code base, developed in the Struts and Hibernate frameworks, by incorporating new features and fixing bugs.
  • Tested the module, fixed bugs, and used XML to transfer data between the different layers.
  • Worked with JavaBean helper classes and servlets for interacting with the UI written in JSP.
  • Handled database operations, e.g., calling stored procedures and stored functions.
  • Worked on the database interaction layer for insert, update, and retrieval operations on data.

Environment: Java, JavaScript, HTML, CSS, JSP, Servlets, Struts, Hibernate, JUnit, Web Services, Eclipse, SQL, PL/SQL, Oracle, TOAD, WebLogic, Windows, Linux

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed the application using the Struts MVC framework.
  • Performed detailed requirement analysis and interfaced with business users to understand project requirements.
  • Involved in low-level design specifications and implementations of various design patterns.
  • Designed the application using the MVC, Factory, Data Access Object, Transfer Object, Service Locator, and Singleton J2EE design patterns.
  • Developed request XML parameters between the application and the target interface.
  • Implemented new functionality using Java, JSP, AJAX, Hibernate, and JavaScript.
  • Deployed the portal and servlet using automatic portal support.
  • Developed JavaBeans and data transfer objects for data access and management.
  • Involved in creating Hibernate mapping files and used Hibernate Query Language (HQL) with the persistence framework.
  • Used web services (SOAP, UDDI, WSDL) to communicate with financial analysts in order to get credit reports from various external sources.
  • Developed the application on WebLogic Application Server.
  • Prepared test cases using unit and integration testing; used Log4j as the logging tool.
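Two of the patterns named in the design above, Factory and Singleton, can be sketched briefly in Scala, where a top-level `object` is itself a singleton. The type names are illustrative only, loosely echoing the credit-report use case, and are not from the project:

```scala
// A source abstraction handed out by a factory, in the style of the design above.
trait CreditReportSource { def name: String }

// Singleton factory: one shared instance decides which concrete source to return.
object CreditReportFactory {
  def apply(kind: String): CreditReportSource = kind match {
    case "bureau"   => new CreditReportSource { val name = "bureau" }
    case "internal" => new CreditReportSource { val name = "internal" }
    case other      => throw new IllegalArgumentException(s"unknown source: $other")
  }
}
```

Callers depend only on the `CreditReportSource` trait, so concrete sources can be added or swapped inside the factory without touching client code.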

Environment: J2EE 1.4, Java 1.4, JSP 1.2, Workshop, WebLogic, XML, Struts, Oracle 10g, JDBC, Servlet, JNDI, JavaScript, HTML, Hibernate 3.0, Web Services (SOAP, WSDL, UDDI), CVS, JUnit, Log4j
