Hadoop/Spark Developer Resume
TX
SUMMARY
- 6+ years of professional experience in the IT industry developing, implementing and maintaining web-based applications using Java, J2EE and Big Data ecosystems on Windows and Linux environments.
- 3+ years of work experience in Big Data analytics, with hands-on experience writing Spark and MapReduce jobs on the Hadoop ecosystem, including Hive, Pig, Sqoop and Flume.
- Excellent knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Knowledge of installing, configuring and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper and Flume.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Proficient in Spark with Scala: loading data from local file systems, HDFS, Amazon S3, and relational and NoSQL databases (including Cassandra) using Spark SQL, importing data into RDDs, and ingesting data from a range of sources using Spark Streaming.
- Developed Apache Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
- Analyzed large data sets using Pig scripts and Hive scripts.
- Explored various modules of Spark and worked with DataFrames and RDDs.
- Performed map-side joins on RDDs.
- Experience in migrating ETL operations from Hive to Spark.
- Performed visualizations according to business requirements using visualization tools like Tableau.
- Designed and developed Tableau dashboards; installed and configured Tableau Server for enterprise-wide deployments.
- Installed, tested and deployed monitoring solutions with Splunk services.
- Worked with Core Java and J2EE technologies such as Servlets, JSP, EJB, JMS, JDBC, Threads, Multi-Threading, Collections and Exception handling.
- Hands-on experience with Spring modules such as Spring Core, Spring MVC, Spring AOP, Spring autowiring, Security and Transaction, and with Struts, along with Hibernate as the back-end ORM tool.
- Experienced in developing applications using Model-View-Controller (MVC) architecture and the Spring framework.
- Experience in developing and consuming Web Services using REST, SOAP, XSD, XML, UDDI, JSON and WSDL.
- Experience in deploying web applications on application servers such as WebLogic, Apache Tomcat, WebSphere and JBoss.
- Used version control tools such as Git, CVS, SVN and ClearCase.
- Good experience with the Software Development Life Cycle (SDLC).
- Experienced in coding SQL, PL/SQL, procedures/functions, triggers and packages on relational databases (RDBMS) such as Oracle.
- Experienced in web development using HTML/HTML5, DHTML, XHTML, CSS/CSS3, JavaScript, AngularJS and Node.js technologies.
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop, CDH5.3.2, MapReduce, YARN, Spark 1.6/2.0, Sqoop, Hive, Oozie, Pig, HDFS, Flume, Impala
Programming Languages: C, Java, Scala 2.11, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting
Java & J2EE Technologies: Core Java, Servlets, JSP
NoSQL Databases: HBase, MongoDB
Version control/Tools: Git, GitHub, SAS, Tableau
Databases: Oracle 11g/10g/9i, MySQL
Frameworks: Spring 3.0.5, Hibernate 3.5.1, Struts 1.3.10, EJB, JUnit, MRUnit
Web and Application Servers: Apache Tomcat 7.0, Apache Tomcat 6.0, WebLogic 8.0
Methodologies: Agile Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, TX
Hadoop/Spark Developer
Responsibilities:
- Performed batch processing of data sources using Apache Spark and Elasticsearch
- Implemented Spark RDD transformations and actions to support business analysis
- Migrated HiveQL queries on structured data to Spark SQL to improve performance
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala
- Configured, deployed and maintained a single-node Storm cluster in the DEV environment
- Developed predictive analytics using the Apache Spark Scala APIs; used Spark Streaming with Kafka and HDFS/HBase to build a continuous ETL pipeline for real-time analytics on the data
- Prepared design documents (Request-Response Mapping Documents, Hive Mapping Documents)
- Ingested data using Flume with a Kafka source and an HDFS sink.
- Used the Scala collections framework to store and process complex consumer information. Requests were post-processed and assigned offers based on the offers set up for each client.
- Used Slick to query and store data in the database in idiomatic Scala, using the Scala collections framework.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL
- Created various Parser programs to extract data from Autosys, Tibco Business Objects, XML, Java, and database views using Scala
- Ran weekly sales enablement sessions, including hands-on Git and GitHub workshops for reps
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, Sequence files, XML and JSON files, ORC and Parquet)
- Handled importing of data from RDBMS into HDFS using Sqoop
- Performed data cleansing using Pig Latin operations and UDFs
- Wrote Hive scripts for analyzing data in the Hive warehouse using Hive Query Language (HQL)
- Implemented partitioning, dynamic partitioning and bucketing in Hive using internal and external tables for more efficient data access.
- Created Hive tables, loaded them with data and wrote Hive queries to process the data
- Created scripts to automate the process of Data Ingestion
- Developed Pig scripts for source data validation and transformation
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability, for analyzing HDFS audit data
- Used Big Data testing frameworks (MRUnit, PigUnit) to test raw data and executed performance scripts
Environment: HDFS, CDH5.3.2, Apache Spark 1.6/2.0, Hive, Pig, Scala, Java, Sqoop, SQL, Shell scripting.
Confidential, Fort Worth, TX
Hadoop/Spark Developer
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster
- Installed Oozie Workflow engine to run multiple Hive and Pig Jobs
- Developed multiple MapReduce jobs in Java for data cleansing and preprocessing
- Developed simple to complex MapReduce jobs using Hive and Pig
- Involved in loading data from UNIX file system to HDFS
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs
- Analyzed large data sets to determine the optimal way to aggregate and report on them
- Responded quickly to ad hoc internal and external client requests for data and created ad hoc reports
- Responsible for building scalable distributed data solutions using Hadoop
- Migrated ETL processes from Oracle to Hive to test ease of data manipulation
- Optimized Pig scripts and Hive queries to increase efficiency and added new features to existing code
- Stored and retrieved data from data warehouses using Amazon Redshift
- Developed Pig Latin scripts for the analysis of semi-structured data
- Created Hive tables, loaded data into them and wrote Hive UDFs
- Used Sqoop to import data into HDFS and Hive from other data systems
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Conducted unit testing for the development team within the sandbox environment
- Developed Hive queries to process the data
Environment: Apache Hadoop, Cloudera Manager, CDH2, CDH3, CentOS, Java, MapReduce, Apache Hama, Eclipse Indigo, Pig, Hive, Sqoop, Oozie, SQL, Struts, JUnit.
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured multi-node Cloudera Hadoop clusters using Cloudera Manager and CDH 4.X/5.X.
- Prepared design documents and estimated effort for the project.
- Developed MapReduce programs using MRv1 and MRv2 (YARN).
- Responsible for processing unstructured data using Pig and Hive.
- Developed Pig Latin scripts for extracting data.
- Used Pig for data loading, filtering and storing the data.
- Developed Hive queries for the analysts.
- Developed Java code to stream the Packet Tracer data into Hive using RESTful services.
- Worked on migrating data from MongoDB to Hadoop.
- Worked on integrating SFDC with Hadoop.
- Extracted the data from MySQL into HDFS using Sqoop.
- Ran Hadoop jobs processing millions of records of text data for batch and online processes using tuned/modified SQL.
- Designed and published workbooks and dashboards using Tableau Dashboard/Server 6.X/7.X
Environment: Cloudera, Hadoop (HDFS), Map Reduce, Spark, Hive, Java, Scala, JDK, UNIX Shell Scripting, MySQL, Eclipse, Tableau 8.X/9.X.
Confidential
Java Developer
Responsibilities:
- Involved in the complete development, testing and maintenance process of the application
- Gathered requirements, performed the analysis and formulated requirements specifications with consistent inputs/requirements
- Developed JSP as an application controller
- Designed and developed HTML front end screens and validated forms using JavaScript
- Used Frames and Cascading Style Sheets (CSS) to give a better view to the Web Pages
- Deployed the web application on WebLogic server
- Used JDBC for database connectivity
- Developed necessary SQL queries for database transactions
- Involved in testing, implementation and documentation
- Wrote JavaScript code for input validation
- Front End was built using JSPs, JavaScript and HTML
- Built Custom Tags for JSPs
- Built the report module based on Crystal Reports
- Integrated data from multiple data sources
- Generated schema difference reports for the database using Toad
Environment: Java, JSP, WebLogic 5.1, HTML, JavaScript, JDBC, SQL, PL/SQL, UNIX.
