Senior Hadoop/Spark Developer Resume

Seattle, WA

SUMMARY

  • 9+ years of IT experience in the design, development, maintenance, and support of Big Data and Java/J2EE applications.
  • Hands-on experience capturing data from existing relational databases (Oracle, MySQL, SQL Server, and Teradata) that provide SQL interfaces, using Sqoop.
  • Hands-on experience with Sequence, RCFile, Avro, Parquet, and JSON file formats, and with Combiners, Counters, dynamic partitions, and bucketing for best practices and performance improvement.
  • Skilled in developing MapReduce programs with the Java API and in using Hive and Pig to perform data analysis, cleaning, and transformation.
  • Worked with join patterns and implemented map-side and reduce-side joins using MapReduce.
  • Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
  • Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design for loading data into the Hadoop environment.
  • Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
  • Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (S3), EMR, and Amazon Elastic Compute Cloud (EC2).
  • Implemented Sqoop for large dataset transfer between Hadoop and RDBMS.
  • Expertise in working with the Hive data warehouse tool - creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience composing shell scripts to dump shared data from MySQL servers to HDFS.
  • Experience with ZooKeeper and the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experienced in performance tuning and real-time analytics in both relational databases and NoSQL databases (HBase).
  • Worked on Implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.
  • Experience with MongoDB, Cassandra, and other NoSQL databases such as HBase, Neon, and Redis.
  • Exposure to Spark, Spark Streaming, Spark MLlib, and Scala, including creating DataFrames handled in Spark with Scala.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations, performing read/write operations, and saving the results to an output directory in HDFS (see the sketch following this summary).
  • Experience in setting up Hadoop clusters, both in-house and in the cloud.
  • Extensive experience working with Cloudera (CDH4 & CDH5), Hortonworks, and Amazon EMR Hadoop distributions on multi-node clusters.
  • Exposure to simplifying and automating big data integration with Talend's graphical tools and wizards that generate native code.
  • Exposure to build tools such as Maven and sbt.
  • Worked with different file formats (ORC, text) and compression codecs (GZIP, Snappy, LZO).
  • Good understanding of all aspects of testing, such as unit, regression, white-box, and black-box testing in Agile environments.
  • Good knowledge of web/application servers such as Apache Tomcat, IBM WebSphere, and Oracle WebLogic.
  • Excellent working knowledge of popular frameworks such as MVC and Hibernate.
  • Experience as a Java developer in client/server technologies using J2EE Servlets, JSP, JDBC, and SQL.
  • Expertise in designing and developing enterprise applications for the J2EE platform using MVC, JSP, Servlets, JDBC, Web Services, and Hibernate, and in designing web applications using HTML5, CSS3, AngularJS, and Bootstrap.
  • Adept in Agile/Scrum methodology and familiar with the full SDLC, from requirement analysis and system study to design, testing, debugging, documentation, and implementation.
  • Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimating, designing custom solutions, development, documentation, and production support.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
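
Illustrative sketch (not from any specific project): a minimal Java example of the Spark SQL/DataFrame workflow described above - reading source data, applying a transformation, and saving the results to an output directory in HDFS. Paths, column names, and formats are assumptions.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class DataFrameEtlSketch {
        public static void main(String[] args) {
            // Spark session for batch DataFrame processing
            SparkSession spark = SparkSession.builder()
                    .appName("dataframe-etl-sketch")
                    .getOrCreate();

            // Read JSON input from HDFS (hypothetical path and schema)
            Dataset<Row> input = spark.read().json("hdfs:///data/raw/events");

            // Simple cleansing/transformation: drop records without a key
            Dataset<Row> cleaned = input.filter(col("eventId").isNotNull());

            // Save the results to an output directory in HDFS as Parquet
            cleaned.write().mode("overwrite").parquet("hdfs:///data/curated/events");

            spark.stop();
        }
    }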

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, HDFS, MapReduce, YARN, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Kafka, Spark.

Operating System: Windows, Linux, Unix.

Database Languages: SQL, PL/SQL.

Programming languages: Scala, Java.

Databases: IBM DB2, Oracle, SQL Server, MySQL, HBase, Cassandra.

Frameworks: Spring, Hibernate, JMS.

IDE: Eclipse, IntelliJ.

Tools: TOAD, SQL Developer, ANT, Log4J.

Web Services: WSDL, SOAP, REST.

ETL Tools: Talend ETL, Talend Studio.

Web/App Servers: UNIX server, Apache Tomcat, WebSphere, WebLogic.

Methodologies: Agile, Waterfall, UML, Design Patterns.

PROFESSIONAL EXPERIENCE

Senior Hadoop/Spark Developer

Confidential, Seattle, WA

Responsibilities:

  • Analyzed and defined the researchers' strategy and determined the system architecture and requirements needed to achieve the goals.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Used Kafka for log aggregation: gathering physical log files off servers and placing them in a central location such as HDFS for processing.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS (see the streaming sketch following this list).
  • Used various Spark transformations and actions for cleansing the input data.
  • Developed shell scripts to generate the Hive CREATE statements from the data and load the data into the tables.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Integrated Maven builds and designed workflows to automate the build and deploy process.
  • Involved in developing a linear regression model, built with Spark and the Scala API, to predict a continuous measurement and improve observations on the data.
  • Worked extensively with Spark and MLlib to develop a regression model for logistic information.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Created Hive tables as per requirements, as internal or external tables defined with appropriate static or dynamic partitions and bucketing for efficiency (see the Hive DDL sketch at the end of this role).
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS.
  • Used Spark and Spark SQL to read the Parquet data and create the tables in Hive using the Scala API.
  • Developed Hive queries for the analysts.
  • Automated hourly and daily transaction reports using Talend Open Studio.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Implemented Cassandra access using the DataStax Java API.
  • Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
  • Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark using Scala, DataFrames, and the Spark SQL API for faster testing and processing of data.
  • Involved in making code changes to a workstation-simulation module for processing across the cluster using spark-submit.
  • Involved in performing analytics and visualization on the log data to estimate the error rate and study the probability of future errors using regression models.
  • Used the WebHDFS REST API to make HTTP GET, PUT, POST, and DELETE requests from the web server to perform analytics on the data lake.
  • Worked on a POC to perform sentiment analysis of Twitter data using the OpenNLP API.
  • Worked on high-performance computing (HPC) to simulate tools required for the genomics pipeline.
  • Used Kafka to rebuild a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
  • Provided cluster coordination services through ZooKeeper.
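
Illustrative sketch of the Kafka-to-HDFS ingestion described above, using Spark Streaming's direct Kafka stream (spark-streaming-kafka-0-10) from Java; broker addresses, topic names, batch interval, and output paths are assumptions.

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfsSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs-sketch");
            // Micro-batch interval of 30 seconds
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            // Hypothetical Kafka consumer configuration
            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "log-ingest");
            Collection<String> topics = Arrays.asList("server-logs");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

            // Land each micro-batch of raw log lines in HDFS; converting the RDD to a
            // DataFrame and writing Parquet (as described above) would go in the same place.
            stream.map(ConsumerRecord::value)
                  .foreachRDD(rdd -> rdd.saveAsTextFile(
                          "hdfs:///data/logs/batch-" + System.currentTimeMillis()));

            jssc.start();
            jssc.awaitTermination();
        }
    }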

Environment: Hadoop, Hive, HDFS, HPC, WebHDFS, WebHCat, Spark, Spark SQL, Kafka, Java, Scala, web servers, Maven and SBT builds, Pig, Sqoop, Oozie, shell scripting, SQL, Talend, HBase, Hortonworks.
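
Illustrative sketch of the partitioned and bucketed internal/external Hive table layout described in this role, issued here as HiveQL from Java over the Hive JDBC driver; the host, database, table, column names, and staging table are assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveTableDdlSketch {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC endpoint (hypothetical host, port, and database)
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://hiveserver2:10000/analytics";

            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {

                // External table with a partition column and bucketing, stored as ORC
                stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS transactions ("
                        + " txn_id STRING, amount DOUBLE, customer_id STRING)"
                        + " PARTITIONED BY (txn_date STRING)"
                        + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
                        + " STORED AS ORC"
                        + " LOCATION '/warehouse/analytics/transactions'");

                // Dynamic-partition load from a (hypothetical) staging table
                stmt.execute("SET hive.exec.dynamic.partition=true");
                stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                stmt.execute("INSERT INTO TABLE transactions PARTITION (txn_date)"
                        + " SELECT txn_id, amount, customer_id, txn_date FROM transactions_stage");
            }
        }
    }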

Hadoop Developer

Confidential, Charlotte, NC

Responsibilities:

  • Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Also used Spark SQL to handle structured data in Hive.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generating visualizations using Tableau.
  • Analyzed substantial data sets by running Hive queries and Pig scripts.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Involved in transforming data from mainframe tables into HDFS and HBase tables using Sqoop.
  • Defined the Accumulo tables and loaded data into them for near-real-time data reports.
  • Created the Hive external tables using the Accumulo connector.
  • Wrote Hive UDFs to sort structure fields and return complex data types (see the UDF sketch following this list).
  • Used different data formats (text and ORC) while loading the data into HDFS.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Collected data from AWS S3 buckets in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
  • Imported data from different sources such as AWS S3 and LFS into Spark RDDs.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and to move data in and out of HDFS.
  • Created files and tuned SQL queries in Hive using Hue.
  • Experience working with Apache Solr for indexing and querying.
  • Created custom Solr query components to optimize search matching.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Expertise in implementing Spark with Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
  • Worked with Kerberos and integrated it into the Hadoop cluster to make it more robust and secure against unauthorized access.
  • Responsible for loading data into HBase using the HBase shell and the HBase client API.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling, and error handling.
  • Designed and developed ETL jobs using Talend Integration Suite (Talend 5.2.2).
  • Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
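
Illustrative sketch of a Hive UDF that returns a complex (array) type, in the spirit of the UDFs described above; the class name and sorting logic are assumptions, and the function would be registered with ADD JAR / CREATE TEMPORARY FUNCTION.

    // A simple Hive UDF returning a complex (array<string>) type: sorts a list of strings.
    // Registered in Hive with, for example:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION sort_list AS 'com.example.hive.SortListUDF';
    package com.example.hive;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;

    @Description(name = "sort_list", value = "Returns the input array of strings, sorted")
    public class SortListUDF extends UDF {
        public List<String> evaluate(List<String> input) {
            if (input == null) {
                return null;
            }
            List<String> sorted = new ArrayList<>(input);
            Collections.sort(sorted);
            return sorted;
        }
    }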

Environment: Hadoop, Cloudera, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Apache Spark, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, Talend, Hue, HCatalog, Flume, Solr, Git, Maven.

Hadoop/Java Developer

Confidential, Saco, ME

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries and Pig scripts.
  • Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Worked with Kafka on a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats, such as JSON and XML.
  • Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools such as SQL Profiler and Database Tuning Advisor (DTA).
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Performed joins, group-by, and other operations in MapReduce using Java and Pig (see the MapReduce sketch following this list).
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • Created Hive Generic UDFs, UDAFs, and UDTFs in Java to process business logic that varies based on policy.
  • Involved in writing Pig scripts for cleansing the data and implemented Hive tables for the processed data in tabular format.
  • Leveraged robust image-processing libraries written in C and C++.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Worked with business stakeholders, application developers, DBAs, and production teams.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Assisted in managing and reviewing Hadoop log files.
  • Assisted in loading large sets of data (structured, semi-structured, and unstructured).
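
Illustrative sketch of the group-by/aggregation pattern in Java MapReduce referenced above - counting records per key; the input layout (first comma-separated field as the key) and paths are assumptions.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GroupByCount {

        // Emits (key, 1) per record; the key is the first comma-separated field
        public static class GroupMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split(",");
                outKey.set(fields[0]);
                context.write(outKey, ONE);
            }
        }

        // Sums the counts for each key (also usable as a combiner)
        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable c : counts) {
                    total += c.get();
                }
                context.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "group-by-count");
            job.setJarByClass(GroupByCount.class);
            job.setMapperClass(GroupMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }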

Environment: Hadoop, HDFS, HBase, MapReduce, JDK 1.5, J2EE 1.4, Struts 1.3, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Hue, Hortonworks, Java, Storm, ZooKeeper, Informatica, Avro files, SQL, ETL, Cloudera Manager, MySQL, MongoDB.

Java/ETL Developer

Confidential, New York, NY

Responsibilities:

  • Prepared the Functional Requirement Specification and performed coding, bug fixing, and support.
  • Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, data modeling, analysis, and architecture design and development for the project.
  • Designed the front-end applications and interactive user interface (UI) web pages using web technologies such as HTML, XHTML, and CSS.
  • Implemented GUI pages using JSP, JSTL, HTML, XHTML, CSS, JavaScript, and AJAX.
  • Involved in the creation of a queue manager in WebSphere MQ, along with the necessary WebSphere MQ objects required for use with WebSphere Data Interchange.
  • Developed SOAP-based web services for integrating with the Enterprise Information System tier.
  • Used Ant scripts to automate application build and deployment processes.
  • Involved in the design, development, and modification of PL/SQL stored procedures, functions, packages, and triggers to implement business rules in the application.
  • Used the Struts MVC architecture and SOA to structure the project module logic.
  • Developed ETL processes to load data from flat files, SQL Server, and Access into the target Oracle database, applying business logic in transformation mappings to insert and update records during the load.
  • Gained Informatica ETL development experience in an offshore/onsite model; involved in ETL code reviews and testing of ETL processes.
  • Developed mappings in Informatica to load the data, including facts and dimensions, from various sources into the data warehouse, using transformations such as Source Qualifier, Java, Expression, Lookup, Aggregator, Update Strategy, and Joiner.
  • Scheduled sessions to extract, transform, and load data into the warehouse database according to business requirements.
  • Used the Struts MVC framework for developing J2EE-based web applications.
  • Extensively used Java multithreading to implement batch jobs with JDK 1.5 features (see the sketch following this list).
  • Designed an entire messaging interface and message topics using WebLogic JMS.
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
  • Migrated datasource passwords to encrypted passwords using the Vault tool on all the JBoss application servers.
  • Used the Spring Framework for dependency injection and integrated it with the Hibernate framework.
  • Developed session beans that encapsulate the workflow logic.
  • Used JMS (Java Messaging Service) for asynchronous communication between different modules.
  • Developed web components using JSP, Servlets and JDBC.
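
Illustrative sketch of a multithreaded batch job using the java.util.concurrent API introduced in JDK 1.5, as referenced above; the task payload and pool size are assumptions.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class BatchJobRunner {

        // A single unit of batch work; in a real job this might process one file or one DB partition
        static class BatchTask implements Callable<String> {
            private final int partition;

            BatchTask(int partition) {
                this.partition = partition;
            }

            public String call() throws Exception {
                // ... do the actual work for this partition here ...
                return "partition " + partition + " done";
            }
        }

        public static void main(String[] args) throws Exception {
            // Fixed-size thread pool (JDK 1.5 java.util.concurrent API)
            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<String>> results = new ArrayList<Future<String>>();

            for (int i = 0; i < 10; i++) {
                results.add(pool.submit(new BatchTask(i)));
            }

            // Wait for all tasks and report their results
            for (Future<String> result : results) {
                System.out.println(result.get());
            }
            pool.shutdown();
        }
    }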

Environment: Java, J2EE, JDBC, Servlets, HTML, XHTML, CSS, JavaScript, Ajax, MVC, Informatica, ETL, PL/SQL, Struts 1.1, Spring, JSP, JMS, JBoss 4.0, SQL Server 2000, Ant, CVS, Hibernate, Eclipse, Linux

Java/J2EE Developer

Confidential, Augusta, ME

Responsibilities:

  • Developed the J2EE application based on a Service-Oriented Architecture, employing SOAP and other tools for data exchanges and updates.
  • Developed the functionality using Agile methodology.
  • Used Apache Maven for project management and building the application.
  • Worked on all the modules of the application, which involved front-end presentation logic developed using Spring MVC, JSP, JSTL, and JavaScript; business objects developed using POJOs; and a data access layer using the Hibernate framework.
  • Used JAX-RS (REST) for producing web services and was involved in writing programs to consume the web services with the Apache CXF framework (see the sketch following this list).
  • Used RESTful APIs and SOAP web services for internal and external consumption.
  • Used the Spring ORM module for integration with Hibernate in the persistence layer.
  • Involved in writing Hibernate Query Language (HQL) queries for the persistence layer.
  • Used Spring MVC, Spring AOP, Spring IoC, Spring Transaction, and Oracle to create the Club Systems component.
  • Wrote backend jobs based on Core Java and the Oracle database to be run daily/weekly.
  • Coded the core modules of the application in compliance with Java/J2EE coding standards and design patterns.
  • Wrote JavaScript, HTML, CSS, Servlets, and JSP for designing the GUI of the application.
  • Worked on server-side and middle-tier technologies, including caching strategies/solutions.
  • Designed the data access layer using J2EE data access patterns and implemented the MVC architecture with the Struts framework to handle databases across multiple locations and display information in the presentation layer.
  • Used XPath for parsing the XML elements as part of business logic processing.
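
Illustrative sketch of a JAX-RS (REST) resource of the kind produced in this role; the resource path and DTO are assumptions, and the class would be published through the configured JAX-RS runtime (Apache CXF here).

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    // Hypothetical read-only resource exposing club membership details as JSON
    @Path("/members")
    public class MemberResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Response getMember(@PathParam("id") String id) {
            // In the real application this would call the service/DAO layer (Spring + Hibernate)
            Member member = new Member(id, "ACTIVE");
            return Response.ok(member).build();
        }

        // Simple DTO serialized to JSON by the provider configured with CXF (e.g., Jackson)
        public static class Member {
            private String id;
            private String status;

            public Member() { }

            public Member(String id, String status) {
                this.id = id;
                this.status = status;
            }

            public String getId() { return id; }
            public void setId(String id) { this.id = id; }
            public String getStatus() { return status; }
            public void setStatus(String status) { this.status = status; }
        }
    }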

Environment: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB (Session Beans), Log4J, WebSphere, JNDI, Oracle, Windows XP, LINUX, ANT, Eclipse.
