Hadoop/Spark Developer Resume
Raleigh, NC
SUMMARY
- 6+ years of experience in the IT industry as a Hadoop Developer, including 4+ years on the Hadoop ecosystem (MapReduce, Hive, Impala, Flume, Sqoop, Oozie, Kafka, ZooKeeper, Spark and Storm) with Scala and Python scripting, and 2+ years in Java, including JSP, JUnit, Ajax, Struts, Spring, Hibernate, Servlets and Web Services, with hands-on experience in web technologies.
- Strong knowledge of Hive and Pig core functionality, including custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF) for Hive.
- Experience with Scala and Spark in improving the performance of and optimizing existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
- Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise data warehouses.
- Experienced in facilitating streaming data using Kafka and Storm.
- Extensive experience in business data science project life cycle including Data Acquisition, Data Cleaning, Data Manipulation, Data Validation, Data Mining, Machine Learning Algorithms, and Visualization.
- Experience in productionizing Apache NiFi for dataflows with significant processing requirements and in controlling the security of data flows.
- Experience with Amazon Web Services (AWS), including S3, EMR, Elasticsearch (Solr) and EC2.
- Designed and developed RDD seeds using Scala and Cascading. Streamed data to Spark Streaming using Kafka.
- Experienced in building highly scalable big data solutions using Hadoop distributed platforms such as Cloudera.
- Exposure to Hadoop distributed platforms such as Hortonworks and MapR.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra and MongoDB.
- Experienced in installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera distributions, Hortonworks, Cloud Storage and Amazon Web Services (AWS).
- Experience in deploying NiFi dataflows in production and integrating data from multiple sources such as Cassandra and MongoDB.
- Deployed NiFi templates to environments via the NiFi REST API, integrated with other automation tools.
- Experienced in Python programming; wrote web crawlers in Python.
- Experienced in developing MapReduce jobs using Scala in Spark-Shell.
- Good experience moving data between Hadoop and RDBMS, NoSQL and UNIX systems using Sqoop and other traditional data movement technologies.
- Experience integrating the Quartz scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork.
- Good knowledge of tuning Spark jobs by changing configuration properties and using broadcast variables.
- Developed REST APIs using Java, Play framework and Akka.
- Expertise in search technologies such as Solr.
- Experienced in building Storm topologies to perform cleansing operations before moving data into HBase.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Experience fully configuring Flume agents for all types of log data, storing events through Avro sinks in Parquet file format, and developing a two-tier architecture that connects channels between Avro sinks and sources.
- Experience creating visual reports, graphical analyses and dashboard reports with Tableau and Informatica on historical data stored in HDFS, and performing data analysis with Splunk Enterprise.
- Good experience with version control using Git; extensive knowledge of GitHub.
- Experienced in job scheduling and monitoring using Oozie and ZooKeeper.
- Experience in object-oriented concepts, multithreading and Java/Scala.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Oozie, ZooKeeper, Apache Spark, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume.
NoSQL Databases: HBase, MongoDB, Cassandra.
Java Technologies: Java, J2EE, JDK 1.4/1.5/1.6/1.7/1.8, JDBC, Hibernate, XML Parsers, JSP 1.2/2, Servlets, EJB, JMS, Struts, Spring Framework, JavaBeans, AJAX, JNDI.
Frameworks: MVC, Struts, Hibernate, Spring Framework, Spring Boot.
Databases: Netezza, SQL Server, MySQL, Oracle, DB2.
Programming Languages: C, C++, Java, J2EE, JDBC, JUnit, Log4j, C#, Python, Scala, Swift, Android, PL/SQL, HQL, Unix, Shell Scripting.
Scripting Languages: Python, Perl, Shell, Scheme, Tcl, Unix Shell Scripts, Windows PowerShell.
Web Technologies: HTML, JavaScript, jQuery, Ajax, Bootstrap, AngularJS, Node.js.
Development Methodologies: Waterfall, UML, Design Patterns (Core Java and J2EE), Agile Methodologies (Scrum).
IDE Development Tools: Eclipse, NetBeans, Visual Studio, Xcode, Android Studio, IntelliJ, JetBrains.
Operating Systems: Windows, Linux, Unix, Ubuntu.
Management Tech: SVN, Git, Jira, Maven.
Web Services: SOAP, RESTful API, WSDL.
PROFESSIONAL EXPERIENCE
Confidential, Raleigh, NC
Hadoop/Spark Developer
Responsibilities:
- Involved in loading data from the Linux file system to HDFS.
- Implemented Spark applications in Scala using the DataFrame and Spark SQL APIs and pair RDDs for faster data processing; created RDDs, DataFrames and Datasets.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Loaded data into Spark RDDs and performed in-memory computation for faster output; implemented Spark SQL queries on data formats such as text, CSV and XML files.
- Responsible for gathering requirements, process workflow, data modelling, architecture and design and led application development using Scrum.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive, Impala and NoSQL databases.
- Integrated Maven build and designed workflows to automate the build and deploy process.
- Used Slick to query and store data in the database in idiomatic Scala, using the Scala collections framework.
- Collected and aggregated large amounts of web log data in XML form from different sources such as web servers using Apache Flume and stored the data in HDFS for analysis.
- Used Scala libraries to process XML data stored in HDFS and wrote the processed data back to HDFS.
- Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
- Used HUE for running Hive queries. Created daily partitions in Hive to improve performance.
- Built an ETL pipeline with Sqoop, Pig and Hive to bring in data from the source frequently and make it available for consumption.
- Involved in designing and developing POCs using Scala, Spark SQL and MLlib, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Oracle.
- Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Used Kafka for log aggregation, gathering physical log files from servers and placing them in a central location such as HDFS for processing.
- Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy, bzip2.
- Developed UDFs in both Pig and Hive to pre-process the data and compute various metrics for reporting.
- Responsible for implementing Machine learning algorithms like K-Means clustering and collaborative filtering in Spark.
- Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
Environment: Cloudera, HDFS, Spark, Spark SQL, Hive, Pig, Sqoop, PuTTY, HaaS (Hadoop as a Service), Apache Kafka, AWS, Maven, Java, Scala, SQL, Linux, YARN, Agile methodology.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed Hive (version 0.12.0) scripts to analyze data, categorizing mobile numbers into segments so that promotions could be offered to customers based on their segment.
- Built an ETL pipeline with Sqoop, Pig and Hive to bring in data from the source frequently and make it available for consumption.
- Developed solutions to load data into HDFS and analyzed it using MapReduce, Pig and Hive to produce summary results for downstream systems.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive and Sqoop. Responsible for building scalable distributed data solutions using Hadoop.
- Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Developed ETL scripts based on technical specifications/Data design documents.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Reported the data to analysts for further tracking of trends across various consumers.
- Configured Spark Streaming to consume information from Kafka streams and store it in HDFS.
- Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources; the output was written back to HDFS.
- Worked with ORC, Avro, Parquet, RCFile and JSON file formats; developed UDFs in Hive and Pig and used Sqoop to import files into Hadoop.
- Used Spark SQL to load JSON data, create a SchemaRDD and load it into Hive tables; handled structured data using Spark SQL.
- Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery and backup.
- Used Zookeeper to manage coordination among the clusters.
- Developed Custom InputFormat, Record Reader, Mapper, Reducer, Partitioner as part of developing end to end Hadoop applications.
- Followed the Agile Scrum methodology for project implementation and took part in daily scrum and sprint meetings.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
- Assisted in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
Environment: Hortonworks, HDFS, MapReduce, Pig, Mesos, AWS, Hive, Sqoop, Scala, Flume, Mahout, HBase, Spark, Spark SQL, YARN, Java, Maven, Git, Cloudera, MongoDB, Eclipse and Shell Scripting.
Confidential, San Francisco, CA
J2EE/Hadoop Developer
Responsibilities:
- Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications.
- Played key role in design and development of new application using J2EE, Servlets, and Spring technologies/frameworks using Service Oriented Architecture (SOA).
- Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes and JSP pages.
- Developed validation using Spring's validation interface and used Spring Core and Spring MVC to develop the applications and access data.
- Designed and developed different PL/SQL blocks and stored procedures in the DB2 database.
- Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
- Developed the application under JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS and JavaScript.
- Worked with approximately 100 TB of structured and semi-structured data with a replication factor of 3.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Extensively used Hive/HQL queries to query or search for strings in Hive tables in HDFS.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Involved in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to determine which best suits the current requirements.
- Created HBase tables to store data in various formats coming from different portfolios.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig and Hive programs.
- Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, HBase, Cassandra, Cloudera Distribution, YARN, Shell scripting, Spring MVC, Oracle 11g, J2EE, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2.
Confidential
Java Developer
Responsibilities:
- Actively involved in the analysis, definition, design, implementation and deployment of full Software Development Life Cycle (SDLC) of the project.
- Used Hibernate, an Object Relational Mapping (ORM) solution, to map the data representation from the MVC model to the Oracle relational data model with a SQL-based schema.
- Implemented RESTful web services using Jersey for JAX-RS.
- Designed and implemented application using JSP, Spring MVC, JNDI, Spring IOC, Spring Annotations, Spring AOP, Spring Transactions, Hibernate, JDBC, SQL, ANT, JMS, Oracle.
- Used an Amazon Web Services (AWS) object storage container to store secured files and retrieved them through the API.
- Developed various UML diagrams, including use cases, class diagrams, interaction diagrams (sequence and collaboration) and activity diagrams.
- Used multithreading (concurrency) to improve overall performance and applied the Singleton design pattern in the Hibernate utility class.
- Implemented an SOA architecture with web services using SOAP, WSDL, UDDI and XML with the Apache CXF framework and Apache Commons. Parsed XML files using DOM/SAX parsers.
- Involved in fixing bugs in various modules raised by the testing teams during the integration testing phase.
- Used the JUnit framework for unit testing and Log4j to capture logs, including runtime exceptions. Used CVS for version control.
Environment: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB (Session Beans), Log4J, WebSphere, UML, JNDI, Oracle, Windows XP, Linux, ANT, Eclipse.
