Hadoop/Spark Developer Resume

Raleigh, NC

SUMMARY

  • 6+ years of experience in the IT industry as a Hadoop Developer, including 4+ years with the Hadoop ecosystem (MapReduce, Hive, Impala, Flume, Sqoop, Oozie, Kafka, ZooKeeper, Spark, Scala, Storm) and Python scripting, and 2+ years in Java, including JSP, JUnit, Ajax, Struts, Spring, Hibernate, Servlets, and Web Services, with hands-on experience in web technologies.
  • Strong knowledge of Hive and Pig core functionality, including custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregating Functions (UDAF) for Hive.
  • Experience using Scala and Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
  • Experience in OLTP and OLAP design, development, testing, implementation, and support of enterprise data warehouses.
  • Experienced in facilitating streaming data using Kafka and Storm.
  • Extensive experience in business data science project life cycle including Data Acquisition, Data Cleaning, Data Manipulation, Data Validation, Data Mining, Machine Learning Algorithms, and Visualization.
  • Experience productionizing Apache NiFi for dataflows with significant processing requirements and controlling the security of data flows.
  • Experience with Amazon Web Services (AWS), including S3, EMR, Elasticsearch (SOLR), and EC2.
  • Designed and developed RDD seeds using Scala and Cascading. Streamed data to Spark Streaming using Kafka.
  • Experienced in building highly scalable big data solutions on the Cloudera Hadoop distribution.
  • Exposure to other Hadoop distributions, namely Hortonworks and MapR.
  • Good understanding of NoSQL databases and hands-on experience writing applications on HBase, Cassandra, and MongoDB.
  • Experienced in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera distributions, Hortonworks, cloud storage, and Amazon Web Services (AWS).
  • Experience deploying NiFi dataflows to production and integrating data from multiple sources such as Cassandra and MongoDB.
  • Familiar with deploying templates to environments via the NiFi REST API, integrated with other automation tools.
  • Experienced in Python programming; wrote web crawlers in Python.
  • Experienced in developing MapReduce jobs using Scala in Spark-Shell.
  • Good experience moving data between Hadoop and RDBMS, NoSQL, and UNIX systems using Sqoop and other traditional data movement technologies.
  • Experience integrating the Quartz scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork.
  • Good knowledge of tuning Spark jobs by changing configuration properties and using broadcast variables.
  • Developed REST APIs using Java, the Play framework, and Akka.
  • Expertise in search technologies such as SOLR.
  • Experienced in building Storm topologies to perform cleansing operations before moving data into HBase.
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Experience fully configuring Flume agents for all types of logger data, storing events through an Avro sink in Parquet file format, and developing a two-tier architecture connecting channels between Avro sinks and sources.
  • Experience creating visual reports, graphical analyses, and dashboard reports with Tableau and Informatica on historical data saved in HDFS, and performing data analysis with Splunk Enterprise.
  • Good experience with version control using Git and extensive knowledge of GitHub.
  • Experienced in job scheduling and monitoring using Oozie and ZooKeeper.
  • Experience in object-oriented concepts, multithreading, and Java/Scala.

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Oozie, ZooKeeper, Apache Spark, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume.

NoSQL Databases: HBase, MongoDB, Cassandra.

Java Technologies: Java, J2EE, JDK 1.4/1.5/1.6/1.7/1.8, JDBC, Hibernate, XML Parsers, JSP 1.2/2, Servlets, EJB, JMS, Struts, Spring Framework, Java Beans, AJAX, JNDI.

Frameworks: MVC, Struts, Hibernate, Spring Framework, Spring Boot.

Databases: Netezza, SQL Server, MySQL, Oracle, DB2.

Programming Languages: C, C++, Java, J2EE, JDBC, JUnit, Log4j, C#, Python, Scala, Swift, Android, PL/SQL, HQL, Unix, Shell Scripting.

Scripting Languages: Python, Perl, Shell, Scheme, Tcl, Unix Shell Scripts, Windows PowerShell.

Web Technologies: HTML, JavaScript, jQuery, Ajax, Bootstrap, AngularJS, Node.js.

Development Methodologies: Waterfall, UML, Design Patterns (Core Java and J2EE), Agile Methodologies (Scrum).

IDE Development Tools: Eclipse, NetBeans, Visual Studio, Xcode, Android Studio, IntelliJ, JetBrains.

Operating Systems: Windows, Linux, Unix, Ubuntu.

Management Tech: SVN, Git, Jira, Maven.

Web Services: SOAP, RESTful API, WSDL.

PROFESSIONAL EXPERIENCE

Confidential, Raleigh, NC

Hadoop/Spark Developer

Responsibilities:

  • Involved in loading data from the Linux file system to HDFS.
  • Implemented Spark applications in Scala using the Spark SQL API, DataFrames, and pair RDDs for faster data processing; created RDDs, DataFrames, and Datasets.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded data into Spark RDDs and performed in-memory computation for faster output response; implemented Spark SQL queries on data formats such as text, CSV, and XML files.
  • Responsible for requirements gathering, process workflow, data modeling, architecture, and design; led application development using Scrum.
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive, Impala, and NoSQL databases.
  • Integrated the Maven build and designed workflows to automate the build and deploy process.
  • Used Slick to query and store data in the database in idiomatic Scala, using the Scala collections framework.
  • Collected and aggregated large amounts of web log data in XML form from sources such as web servers using Apache Flume, and stored the data in HDFS for analysis.
  • Used Scala libraries to process XML data stored in HDFS and wrote the processed data back to HDFS.
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for large volumes of data.
  • Used Hue to run Hive queries. Created daily partitions in Hive to improve performance.
  • Built an ETL pipeline with Sqoop, Pig, and Hive to bring in data from the source frequently and make it available for consumption.
  • Designed and developed POCs using Scala, Spark SQL, and MLlib, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Oracle.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Used Kafka for log aggregation, collecting physical log files from servers and placing them in a central location such as HDFS for processing.
  • Worked with various HDFS file formats such as Avro and SequenceFile, and compression formats such as Snappy and bzip2.
  • Developed UDFs in both Pig and Hive to pre-process the data and compute various metrics for reporting.
  • Responsible for implementing machine learning algorithms such as K-Means clustering and collaborative filtering in Spark.
  • Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.

Environment: Cloudera, HDFS, Spark, Spark SQL, Hive, Pig, Sqoop, PuTTY, HaaS (Hadoop as a Service), Apache Kafka, AWS, Maven, Java, Scala, SQL, Linux, YARN, Agile methodology.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Developed Hive (version 0.12.0) scripts to analyze data and categorize mobile numbers into segments so that promotions could be offered to customers based on segment.
  • Built an ETL pipeline with Sqoop, Pig, and Hive to bring in data from the source frequently and make it available for consumption.
  • Developed solutions to load data into HDFS and analyzed it using MapReduce, Pig, and Hive to produce summary results for downstream systems.
  • Analyzed the Hadoop cluster and various big data analytics tools, including Pig, Hive, and Sqoop. Responsible for building scalable distributed data solutions using Hadoop.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Developed ETL scripts based on technical specifications and data design documents.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Reported data to analysts for further tracking of trends across consumers.
  • Configured Spark Streaming to consume data from Kafka streams and store it in HDFS.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources; the output was written back to HDFS.
  • Worked with ORC, Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig and used Sqoop to import files into Hadoop.
  • Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
  • Worked with Apache SOLR to implement indexing and wrote custom SOLR query segments to optimize search.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
  • Used ZooKeeper to manage coordination among the clusters.
  • Developed custom InputFormat, RecordReader, Mapper, Reducer, and Partitioner classes as part of developing end-to-end Hadoop applications.
  • Followed the Agile Scrum methodology for project implementation and took part in daily scrum and sprint meetings.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
  • Assisted with cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.

Environment: Hortonworks, HDFS, MapReduce, Pig, Mesos, AWS, Hive, Sqoop, Scala, Flume, Mahout, HBase, Spark, Spark SQL, YARN, Java, Maven, Git, Cloudera, MongoDB, Eclipse, and Shell Scripting.

Confidential, San Francisco, CA

J2EE/Hadoop Developer

Responsibilities:

  • Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications.
  • Played a key role in the design and development of a new application using J2EE, Servlets, and Spring technologies/frameworks in a Service Oriented Architecture (SOA).
  • Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes, and JSP pages.
  • Developed validation using Spring's Validation interface and used Spring Core and MVC to develop the applications and access data.
  • Designed and developed various PL/SQL blocks and stored procedures in the DB2 database.
  • Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
  • Developed the application under the JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
  • Worked with approximately 100 TB of structured and semi-structured data with a replication factor of 3.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Extensively used Hive/HQL queries to search for strings in Hive tables stored in HDFS.
  • Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
  • Created HBase tables to store various formats of data coming from different portfolios.
  • Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive programs.
  • Worked with Apache SOLR to implement indexing and wrote custom SOLR query segments to optimize search.

Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, HBase, Cassandra, Cloudera Distribution, YARN, Shell scripting, Spring MVC, Oracle 11g, J2EE, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2.

Confidential

Java Developer

Responsibilities:

  • Actively involved in the analysis, definition, design, implementation and deployment of full Software Development Life Cycle (SDLC) of the project.
  • Used Hibernate as the Object Relational Mapping (ORM) solution to map the data representation from the MVC model to the Oracle relational data model with a SQL-based schema.
  • Implemented RESTful web services using Jersey for JAX-RS.
  • Designed and implemented the application using JSP, Spring MVC, JNDI, Spring IOC, Spring Annotations, Spring AOP, Spring Transactions, Hibernate, JDBC, SQL, ANT, JMS, and Oracle.
  • Used an object storage container on Amazon Web Services (AWS) to store secured files and retrieved them through the API.
  • Developed various UML diagrams, including use cases, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams.
  • Used multithreading (concurrency) to improve overall performance and applied the Singleton design pattern in the Hibernate utility class.
  • Implemented an SOA architecture with web services using SOAP, WSDL, UDDI, and XML with the Apache CXF framework and Apache Commons. Parsed XML files using DOM/SAX parsers.
  • Fixed bugs in various modules raised by the testing teams during the integration testing phase.
  • Used the JUnit framework for unit testing and Log4j to capture logs, including runtime exceptions. Used CVS for version control.

Environment: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB (Session Beans), Log4J, WebSphere, UML, JNDI, Oracle, Windows XP, Linux, ANT, Eclipse.
