Hadoop/Spark Developer Resume Raleigh, NC - Hire IT People

SUMMARY

6+ years of experience in the IT industry as a Hadoop Developer, including 4+ years with the Hadoop ecosystem (MapReduce, Hive, Impala, Flume, Sqoop, Oozie, Kafka, ZooKeeper, Spark, and Storm) and with Scala and Python scripting. 2+ years of experience in Java, including JSP, JUnit, Ajax, Struts, Spring, Hibernate, Servlets, and Web Services, plus hands-on experience with web technologies.
Strong knowledge of Hive and Pig core functionality, including custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) for Hive.
Experience using Scala and Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
Experience in OLTP and OLAP design, development, testing, implementation, and support of enterprise data warehouses.
Experienced in facilitating streaming data using Kafka and Storm.
Extensive experience in business data science project life cycle including Data Acquisition, Data Cleaning, Data Manipulation, Data Validation, Data Mining, Machine Learning Algorithms, and Visualization.
Experience productionizing Apache NiFi for dataflows with significant processing requirements and controlling the security of data flows.
Experience with Amazon Web Services (AWS), including S3, EMR, Elastic Search (SOLR), and EC2.
Designed and developed RDD seeds using Scala and Cascading. Streamed data to Spark Streaming using Kafka.
Experienced in building highly scalable big data solutions on Hadoop distributions such as Cloudera.
Exposure to the Hortonworks and MapR Hadoop distributions.
Good understanding of NoSQL databases and hands-on experience writing applications on HBase, Cassandra, and MongoDB.
Experienced in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera distributions, Hortonworks, cloud storage, and Amazon Web Services (AWS).
Experience deploying NiFi dataflows to production and integrating data from multiple sources such as Cassandra and MongoDB.
Experience deploying templates to environments via the NiFi REST API, integrated with other automation tools.
Experienced in Python programming; wrote web crawlers using Python.
Experienced in developing MapReduce jobs using Scala in Spark-Shell.
Good experience moving data between Hadoop and RDBMS, NoSQL, and UNIX systems using Sqoop and other traditional data movement technologies.
Experience integrating the Quartz scheduler with Oozie workflows to fetch data from multiple data sources in parallel using fork.
Good knowledge of tuning Spark jobs by changing configuration properties and using broadcast variables.
Developed REST APIs using Java, the Play framework, and Akka.
Expertise in search technologies such as SOLR.
Experienced in building Storm topologies to perform cleansing operations before moving data into HBase.
Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
Experience fully configuring Flume agents for all types of logger data, storing the data through an Avro sink in Parquet file format, and developing a two-tier architecture that connects channels between Avro sinks and sources.
Experience creating visual reports, graphical analyses, and dashboard reports on historical data stored in HDFS using Tableau and Informatica, and performing data analysis using Splunk Enterprise.
Good experience with version control using Git. Extensive knowledge of GitHub.
Experienced in job scheduling and monitoring using Oozie and ZooKeeper.
Experience with object-oriented concepts, multithreading, and Java/Scala.

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Oozie, ZooKeeper, Apache Spark, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume.

NoSQL Databases: HBase, MongoDB, Cassandra.

Java Technologies: Java, J2EE, JDK 1.4/1.5/1.6/1.7/1.8, JDBC, Hibernate, XML Parsers, JSP 1.2/2, Servlets, EJB, JMS, Struts, Spring Framework, Java Beans, AJAX, JNDI.

Frameworks: MVC, Struts, Hibernate, Spring Framework, Spring Boot.

Databases: Netezza, SQL Server, MySQL, ORACLE, DB2.

Programming Languages: C, C++, Java, J2EE, JDBC, JUnit, Log4j, C#, Python, Scala, Swift, Android, PL/SQL, HQL, Unix, Shell Scripting.

Scripting Languages: Python, Perl, Shell, Scheme, Tcl, Unix Shell Scripts, Windows PowerShell.

Web Technologies: HTML, JavaScript, jQuery, Ajax, Bootstrap, AngularJS, Node.js.

Development Methodologies: Waterfall, UML, Design Pattern (Core Java and J2EE), Agile Methodologies (Scrum).

IDE Development Tools: Eclipse, NetBeans, Visual Studio, Xcode, Android Studio, IntelliJ, JetBrains.

Operating Systems: Windows, Linux, Unix, Ubuntu.

Management Tech: SVN, Git, Jira, Maven.

Web Services: SOAP, RESTful API, WSDL.

PROFESSIONAL EXPERIENCE

Confidential, Raleigh, NC

Hadoop/Spark Developer

Responsibilities:

Involved in loading data from LINUX file system to HDFS.
Implemented Spark jobs in Scala using the Spark SQL API, DataFrames, and pair RDDs for faster data processing; created RDDs, DataFrames, and Datasets.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Loaded data into Spark RDDs and performed in-memory computation for faster response; implemented Spark SQL queries on data in formats such as text, CSV, and XML files.
Responsible for gathering requirements, process workflow, data modelling, architecture and design and led application development using Scrum.
Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive, Impala, and NoSQL databases.
Integrated Maven build and designed workflows to automate the build and deploy process.
Used Slick to query and store data in the database in idiomatic Scala, using the Scala collections framework.
Collected and aggregated large amounts of web log data in XML form from sources such as web servers using Apache Flume and stored the data in HDFS for analysis.
Used Scala libraries to process XML data stored in HDFS and wrote the processed data back to HDFS.
Used Spark for interactive queries, streaming data processing, and integration with a popular NoSQL database for large data volumes.
Used HUE for running Hive queries. Created daily partitions in Hive to improve performance.
Built an ETL pipeline with Sqoop, Pig, and Hive to bring in data from the source frequently and make it available for consumption.
Designed and developed POCs using Scala, Spark SQL, and MLlib, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Oracle.
Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
Used Kafka for log aggregation, collecting physical log files from servers and placing them in a central location such as HDFS for processing.
Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy, bzip2.
Developed UDFs in both Pig and Hive to pre-process the data and compute various metrics for reporting.
Responsible for implementing Machine learning algorithms like K-Means clustering and collaborative filtering in Spark.
Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model.
Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.

Environment: Cloudera, HDFS, Spark, Spark SQL, Hive, Pig, Sqoop, PuTTY, HaaS (Hadoop as a Service), Apache Kafka, AWS, Maven, Java, Scala, SQL, Linux, YARN, Agile Methodology.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

Developed Hive (version 0.12.0) scripts to analyze data, categorizing mobile numbers into segments so that promotions could be offered to customers based on segment.
Built an ETL pipeline with Sqoop, Pig, and Hive to bring in data from the source frequently and make it available for consumption.
Developed solutions to load data into HDFS and analyzed it using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.
Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive, and Sqoop. Responsible for building scalable distributed data solutions using Hadoop.
Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
Developed ETL scripts based on technical specifications/Data design documents.
Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
Reported the data to analysts for further tracking of trends across consumers.
Configured Spark Streaming to consume data from Kafka streams and store it in HDFS.
Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources; the output was written back to HDFS.
Worked with ORC, Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig and used Sqoop to import files into Hadoop.
Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
Worked with Apache SOLR to implement indexing and wrote custom SOLR query segments to optimize search.
Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
Used Zookeeper to manage coordination among the clusters.
Developed Custom InputFormat, Record Reader, Mapper, Reducer, Partitioner as part of developing end to end Hadoop applications.
Followed the Agile Scrum methodology for project implementation; took part in daily scrum and sprint meetings.
Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
Assisted in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.

Environment: Hortonworks, HDFS, MapReduce, Pig, Mesos, AWS, Hive, Sqoop, Scala, Flume, Mahout, HBase, Spark, Spark SQL, YARN, Java, Maven, Git, Cloudera, MongoDB, Eclipse, and Shell Scripting.

Confidential, San Francisco, CA

J2EE/Hadoop Developer

Responsibilities:

Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications.
Played key role in design and development of new application using J2EE, Servlets, and Spring technologies/frameworks using Service Oriented Architecture (SOA).
Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes and JSP pages.
Developed validation using Spring's Validation interface and used Spring Core and Spring MVC to develop the applications and access data.
Designed and developed PL/SQL blocks and stored procedures in the DB2 database.
Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
Developed the application under the JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
Worked with approximately 100 TB of structured and semi-structured data with a replication factor of 3.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
Extensively used Hive queries (HQL) to query and search for strings in Hive tables in HDFS.
Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
Created HBase tables to store data in various formats coming from different portfolios.
Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
Worked with Apache SOLR to implement indexing and wrote Custom SOLR query segments to optimize the search.

Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, HBase, Cassandra, Cloudera Distribution, YARN, Shell scripting, Spring MVC, Oracle 11g, J2EE, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2.

Confidential

Java Developer

Responsibilities:

Actively involved in the analysis, definition, design, implementation, and deployment phases across the full Software Development Life Cycle (SDLC) of the project.
Used Hibernate, an Object Relational Mapping (ORM) solution, to map the data representation from the MVC model to the Oracle relational data model with an SQL-based schema.
Implemented RESTful web services using Jersey for JAX-RS.
Designed and implemented application using JSP, Spring MVC, JNDI, Spring IOC, Spring Annotations, Spring AOP, Spring Transactions, Hibernate, JDBC, SQL, ANT, JMS, Oracle.
Used an Amazon Web Services (AWS) object storage container to store secured files and retrieved them through the API.
Developed various UML diagrams, including use case diagrams, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams.
Used multithreading (concurrency) to improve overall performance, and the Singleton design pattern in the Hibernate utility class.
Implemented SOA architecture with Web Services using SOAP, WSDL, UDDI and XML using Apache CXF framework tool/Apache Commons. Worked on parsing the XML files using DOM/SAX parsers.
Fixed bugs in various modules of the application that were raised by the testing teams during the integration testing phase.
Used the JUnit framework for unit testing of the application and Log4j to capture logs, including runtime exceptions. Used CVS for version control.

Environment: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB (Session Beans), Log4j, WebSphere, UML, JNDI, Oracle, Windows XP, Linux, ANT, Eclipse.
