Hadoop Spark Developer Resume
Kansas City, Missouri
SUMMARY
- Over 10 years of software development experience in Java/J2EE technologies and the Big Data/Hadoop stack.
- Expertise in various Hadoop distributions, including Amazon AWS EC2/EMR, Cloudera, Hortonworks, and MapR.
- Experience in importing/exporting data between HDFS and traditional databases such as Teradata and Oracle using Sqoop.
- Experienced in ingesting streaming data into HDFS using Flume with memory channels and custom interceptors.
- Around 2 years of experience in Talend development.
- Extensively worked on writing, fine-tuning, and profiling MapReduce jobs for optimized performance.
- Extensive experience in implementing data analytical algorithms using MapReduce design patterns.
- Experience in implementing complex MapReduce algorithms that perform map-side joins using the distributed cache.
- Experience in writing test cases and test classes using MRUnit, JUnit, and Mockito.
- Extended Hive and Pig core functionality by writing Custom UDFs.
- Experienced in handling ETL transformations using Pig Latin scripts, expressions, join operations, and custom UDFs for evaluating, filtering, and storing data.
- Expert in analyzing real-time queries using NoSQL databases, including Cassandra and HBase.
- Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience in converting business processes into RDD transformations using Apache Spark and Scala (see the sketch after this list).
- Experience in writing producers/consumers and creating messaging-centric applications using Apache Kafka.
- Knowledge of integrating Apache Storm with Apache Kafka for stream processing.
- Experience in integrating Spark with Solr and indexing with Apache Solr.
- Experience in supporting data analysis projects using Elastic MapReduce (EMR) on Amazon Web Services (AWS).
- Knowledge of the Splunk UI for performing log analysis in production support.
- Expertise and knowledge in using job scheduling and monitoring tools like Azkaban, Oozie, and ZooKeeper.
- Expertise in writing shell scripts, cron automation, and regular expressions.
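A minimal sketch (in Scala) of the kind of business-process-to-RDD conversion described above, assuming a hypothetical dataset of "customerId,amount" text records; the paths, field layout, and 100.0 threshold are illustrative, not taken from any actual project:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BusinessRuleToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("business-rule-sketch"))

    // Hypothetical input: text lines of the form "customerId,amount"
    val lines = sc.textFile("hdfs:///data/transactions/*.csv")

    // Illustrative business rule: keep transactions above 100.0 and
    // total the qualifying amounts per customer.
    // (Malformed rows are not handled in this sketch.)
    val totalsPerCustomer = lines
      .map(_.split(","))
      .filter(fields => fields.length == 2 && fields(1).toDouble > 100.0)
      .map(fields => (fields(0), fields(1).toDouble))
      .reduceByKey(_ + _)

    totalsPerCustomer.saveAsTextFile("hdfs:///output/customer_totals")
    sc.stop()
  }
}
```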
TECHNICAL SKILLS
Programming Languages: C, Java, Scala
Distributed File Systems: Apache Hadoop HDFS
Hadoop Distributions: Amazon AWS/EMR, Cloudera, Hortonworks, and MapR
Hadoop Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Azkaban, Oozie, ZooKeeper, Flume, Spark SQL, and Apache Kafka
NoSQL Databases: Cassandra, HBase
Relational Data Stores: Oracle, MySQL.
In-memory/MPP/Search: Apache Spark, Apache Spark Streaming, Apache Storm
Cloud Platforms: Amazon AWS, OpenStack.
Application Servers: JBoss, Tomcat, WebLogic, WebSphere
Web Services: SOAP, REST, WSDL, JAXB, and JAXP
Frameworks: Hibernate, Spring, Struts, JMS, EJB
Web Technologies: HTML5, CSS3, AngularJS, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, JSF
PROFESSIONAL EXPERIENCE
Confidential, Kansas City, Missouri
Hadoop Spark Developer
Responsibilities:
- Worked on the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Developed scripts to perform business transformations on the data using Hive and Pig.
- Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats from HDFS using Scala.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, Scala, and Python (see the sketch at the end of this list).
- Prepared Pig scripts and Spark SQL/Spark Streaming jobs to handle all the transformations specified in the S2TMs, including SCD1 and SCD2 scenarios.
- Worked on Apache Spark, writing Python applications to parse and convert txt and xls files.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Loaded data into HBase using bulk load and the HBase API.
- Used Scala to write several Spark jobs for real-time applications.
- Developed Spark code using Python for faster processing of Hive data.
- Developed MapReduce jobs in Python for data cleaning and data processing.
- Connected to the MySQL database through the Spark driver.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Performed data analysis using Pig, MapReduce, and Hive.
- Designed and developed the data ingestion component.
- Provided cluster coordination services through ZooKeeper.
- Imported data from Oracle to HDFS using Sqoop.
- Developed an analytical component using Scala, Spark, and Spark Streaming.
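A minimal sketch of the Hive-to-Spark SQL conversion pattern referenced above, assuming Spark 2.x's SparkSession with Hive support (older jobs would use HiveContext); the sales_db.orders table and its columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveToSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sql-sketch")
      .enableHiveSupport() // lets Spark SQL read existing Hive tables
      .getOrCreate()

    // Original Hive query, run unchanged through Spark SQL
    val dailyTotals = spark.sql(
      """SELECT order_date, SUM(amount) AS total_amount
        |FROM sales_db.orders
        |GROUP BY order_date""".stripMargin)
    dailyTotals.show()

    // The same logic expressed as DataFrame transformations
    val dailyTotalsDf = spark.table("sales_db.orders")
      .groupBy("order_date")
      .agg(sum("amount").as("total_amount"))

    dailyTotalsDf.write.mode("overwrite").saveAsTable("sales_db.daily_totals")
    spark.stop()
  }
}
```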
Environment: Java, Scala, Python, J2EE, Hadoop, Spark, HBase, Hive, Pig, Sqoop, MySQL, Teradata, GitHub.
Confidential, Los Angeles, CA
Hadoop Spark Scala Developer
Responsibilities:
- Designed and developed an ELT data pipeline using a Spark application to fetch data from legacy systems, third-party APIs, and social media sites.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Designed and developed POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
- Used Spring & Hibernate Frameworks and implemented MVC architecture.
- Responsible for performing extensive data summarization using Hive.
- Imported data into Spark from a Kafka consumer group using the Spark Streaming API (see the sketch at the end of this list).
- Developed Pig UDFs in Java and Python to pre-process the data for analysis.
- Worked with Sqoop import and export functionality to handle large data-set transfers between the Oracle database and HDFS.
- Worked on Spring RESTful services; used Spring for dependency injection.
- Stored and retrieved NoSQL data in MongoDB using DAOs.
- Implemented test scripts to support test driven development and continuous integration.
- Performed data analytics and loaded data to the Amazon S3 data lake and the Spark cluster.
- Wrote and built Azkaban workflow jobs to automate the process.
- Developed Spark SQL tables and queries to perform ad-hoc data analytics for the analyst team.
- Deployed components using the Maven build system and Docker images.
- Involved in deploying multi-module Azkaban applications using Maven.
- Developed shell scripts to automate the deployment process.
- Played an important role in migrating jobs from Spark 0.9 to 1.4 and then to 1.6.
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Monitored Spark clusters.
- Experience in Agile methodology.
- Worked with Text, Avro, Parquet, and SequenceFile formats.
- Developed ETL frameworks.
- Implemented UDFs, UDAFs, and UDTFs in Java for Hive to handle processing that cannot be performed with Hive built-in functions.
- Involved in creating Hive internal/external tables, loading them with data, and troubleshooting Hive jobs.
- Created partitioned tables in Hive for best performance and faster querying.
- Integrated Hive queries into the Spark environment using Spark SQL.
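A minimal sketch of consuming a Kafka topic with Spark Streaming as referenced above, assuming the spark-streaming-kafka-0-10 integration (the direct-stream API differs in the older 0.8-era versions this role started on); the brokers, topic, and consumer group are hypothetical:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaSparkStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092", // hypothetical brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clickstream"), kafkaParams)
    )

    // Count events per micro-batch and print the result to the driver log
    stream.map(_.value()).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```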
Environment: Languages/Technologies: Java (JDK 1.6 and higher), Spark, Spark SQL, Presto, Hive, Apache Crunch, Elasticsearch, Spring Boot. Special Software: Azkaban, Eclipse, Git repository, Amazon S3, Amazon AWS EC2/EMR, Amazon EMR Spark cluster, Hadoop framework, Sqoop, Maven, UNIX Shell Scripting
Confidential, Coppell TX
Hadoop Developer
Responsibilities:
- Actively participated with the development team to meet specific customer requirements and proposed effective Hadoop solutions.
- Structured data was ingested onto the data lake using Sqoop jobs and scheduled using Oozie workflow from the RDBMS data sources for the incremental data.
- Streaming Data (Time Series Data) was ingested into the data lake using Flume.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Developed MapReduce programs in Java that run on the Hadoop cluster.
- Used the Avro data serialization system and Avro tools to handle Avro data files in MapReduce programs.
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Implemented UDFs, UDAFs, and UDTFs in Java for Hive to handle processing that cannot be performed with Hive built-in functions.
- Implemented optimized map-side joins to combine and clean data from different data sources.
- Designed and implemented custom Writables, custom input formats, custom partitioners, and custom comparators.
- Involved in creating Hive internal/external tables, loading them with data, and troubleshooting Hive jobs.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and de-serialization to parse the contents of streamed log data, and implemented Hive custom UDFs.
- Experienced in using the Hive ORC format for better columnar storage, compression, and processing.
- Wrote Pig scripts for advanced analytics on the data for recommendations.
- Gained in-depth knowledge of Cassandra.
- Processed the source data into structured data and stored it in the NoSQL database Cassandra.
- Worked with the performance testing team to optimize the Cassandra cluster by making changes to the cassandra.yaml configuration file and some Linux OS configurations.
- Involved in converting business transformations into Spark RDDs using Scala.
- Integrated Hive queries into the Spark environment using Spark SQL.
- Computed complex logic and controlled the data flow through the in-memory processing tool Apache Spark.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Implemented a messaging system for different data sources using Apache Kafka and configured high-level consumers for online and offline processing (see the sketch at the end of this list).
- Experienced in configuring workflows using Oozie.
- Involved in deploying multi-module applications using Maven and Jenkins.
- Experienced in working in an agile environment and in on-site/offshore coordination.
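A minimal sketch of the producer side of such a Kafka messaging setup, using the standard kafka-clients producer API; the broker addresses, topic name, and payload are hypothetical, and the high-level consumer configuration is not shown:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092") // hypothetical brokers
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for full acknowledgement before a send is considered complete

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one event from a hypothetical "weblogs" source onto the topic
      producer.send(new ProducerRecord[String, String]("source-events", "weblogs", """{"event":"page_view"}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```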
Environment: Hadoop framework, MapReduce, Hive, Sqoop, Pig, HBase, Cassandra, Apache Kafka, Storm, Flume, Oozie, Maven, Jenkins, Java (JDK 1.6), UNIX Shell Scripting, Oracle 11g/12c.
Confidential, Weldon Springs, MO
Hadoop Developer
Responsibilities:
- Participated in the requirement gathering and analysis phases of the project, documenting the business requirements by conducting workshops and meetings with various business users.
- Exported the analyzed data to the relational databases (MySQL, Oracle) using Sqoop from HDFS.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest data into HDFS for analysis.
- Used Flume to dump the application server logs into HDFS.
- Responsible for writing MapReduce jobs to handle files in multiple formats (JSON, text, XML, etc.).
- Extensively worked on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Worked with Text, Avro, Parquet, and SequenceFile formats.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Devised schemes to collect and store large volumes of data in HDFS and worked on compressing the data using various formats to achieve optimal storage capacity.
- Designed a data warehouse using Hive.
- Created partitioned tables in Hive for best performance and faster querying.
- Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro.
- Good Knowledge of Apache Spark and Scala.
- Provided cluster coordination services through ZooKeeper.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Experience working with the NoSQL database HBase for real-time data analytics (see the sketch at the end of this list).
- Implemented test scripts to support test driven development and continuous integration.
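A minimal sketch of basic HBase reads and writes as referenced above, assuming the HBase 1.x+ client API; the table name, column family, qualifier, and row key are hypothetical:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseCrudSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("customer_events")) // hypothetical table

    try {
      // Write one cell: row key = customer id, column family "d", qualifier "last_event"
      val put = new Put(Bytes.toBytes("cust-1001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("last_event"), Bytes.toBytes("login"))
      table.put(put)

      // Read the same cell back
      val result = table.get(new Get(Bytes.toBytes("cust-1001")))
      val value = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("last_event")))
      println(s"last_event = $value")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```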
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Hive, ZooKeeper, NoSQL databases, Oozie, Shell Scripting, Java (JDK 1.6), Cloudera Hadoop distribution, MapReduce, PL/SQL.
Confidential, Minneapolis MN
Java/Hadoop Developer
Responsibilities:
- Designed and implemented Java engine and API to perform direct calls from front-end JavaScript (ExtJS) to server-side Java methods (ExtDirect).
- Developed interfaces and their implementation classes to communicate with the mid-tier (services) using JMS. Technically, it is a 3-tier client server application, where GUI tier interacts with Java middle-tier custom library and queries an Oracle 10g database using Hibernate.
- Implemented RESTful services for Account Summary, workable list, etc., for Reports Decouple support.
- Developed and consumed SOAP web services using the JBoss ESB framework.
- Designed and developed Java batch programs in Spring Batch.
- Involved in writing the database schema through Hibernate.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Experienced in managing and reviewing Hadoop log files.
- Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
- Tuning of MapReduce configurations to optimize the run time of jobs.
- Developed shell scripts to automate the cluster installation.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Involved in loading data from UNIX file system to HDFS.
- Created Java operators to process data using DAG streams and load data into HDFS.
- Developed custom Input Formats to implement custom record readers for different datasets.
- Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
- Automated all jobs for pulling data from FTP server to load data into Hive tables using Oozie workflow.
- Experienced in using the Java REST API to perform CRUD operations on HBase data.
- Created and deployed web services using a REST framework.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
- Developed unit test cases using the JUnit and MRUnit testing frameworks.
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Shell Scripts, Eclipse.
Confidential
Java Developer
Responsibilities:
- Involved in gathering system requirements for the application, working with the business team to review the requirements, and going through the Software Requirement Specification and Architecture documents.
- Involved in intensive User Interface (UI) operations and client-side validations using the AJAX toolkit.
- Responsible for extracting data by screen scraping and for consuming web services using Apache CXF.
- Used Axis and SOAP to expose company applications as a Web Service to outside clients.
- Used the Log package for debugging.
- Used ClearCase for version control.
- Ensuring adherence to delivery schedules and quality process on projects.
- Used web services for creating the rate summary; used WSDL and SOAP messages to get insurance plans from different modules and XML parsers for data retrieval.
- Developed business components and integrated them using Spring features such as dependency injection, autowiring components such as DAO layers and service proxy layers.
- Used Spring AOP to implement distributed declarative transactions throughout the application.
- Wrote Hibernate configuration XML files to manage data persistence.
- Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
- Involved in migration of Data from Excel, Flat file, Oracle, XML files to SQL Server by using BCP and DTS utility.
Environment: Java/J2EE, MVC architecture with CICS interaction, HTML, Axis, SOAP, Servlets, Web services, RESTful web services, Sybase, Spring, DB2, RAD, Apache CXF, Rational ClearCase, WCF, AJAX, Toad, BCP, DTS.
Confidential
Java Developer
Responsibilities:
- Coded front-end components using HTML, JavaScript, and jQuery; back-end components using Java, Spring, and Hibernate; service-oriented components using RESTful and SOAP-based web services; and rules-based components using JBoss Drools.
- Developed the presentation layer using JSP, HTML, CSS, and jQuery.
- Developed JSP custom tags for front end.
- Wrote JavaScript code for input validation.
- Used the Apache CXF open-source tool to generate Java stubs from WSDL.
- Developed and consumed SOAP web services using the JBoss ESB framework.
- Developed the Web Services Client using SOAP, WSDL description to verify the credit history of the new customer to provide a connection.
- Developed RESTful web services within the ESB framework and used content-based routing to route to ESBs.
- Designed and developed Java batch programs in Spring Batch.
- Practiced test-driven development by maintaining JUnit and FlexUnit test cases throughout the application.
- Developed stand-alone Java batch applications with Spring and Hibernate.
- Involved in writing the database schema through Hibernate.
- Designed and developed DAO layer with Hibernate standards, to access data from IBM DB2.
- Developed the UI panels using JSF, XHTML, CSS, DOJO, and jQuery.
Environment: Java 6 - JDK 1.6, JEE, Spring 3.1 framework, Spring Model View Controller (MVC), Java Server Pages (JSP) 2.0, Servlets 3.0, JDBC 4.0, AJAX, Web services, RESTful, JSON, Java Beans, jQuery, JavaScript, Oracle 10g, JUnit, HTML Unit, XSLT, HTML/DHTML.
Confidential
Java Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams were used.
- Extensively used Core Java, Servlets, JSP, and XML.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for the Oracle 9i database.
- Implemented the Enterprise Logging Service using JMS and Apache CXF.
- Developed unit test cases and used JUnit for unit testing of the application.
- Implemented Framework Component to consume ELS service.
- Implemented JMS producers and consumers using Mule ESB.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Sent email alerts to the supporting team using BMC msend.
- Prepared low-level design documents for the ELS service.
- Worked closely with QA, business, and architecture teams to resolve defects quickly and meet deadlines.
- Consumed SOAP web services for ordering Home and Auto pre-quotes from the Express application.
- Developed a SOAP web service to handle Farmers Fast Quotes, which invokes eCMS for account creation for all input leads.
- Performed various CRUD operations using RESTful web services.
- Followed the Spring MVC pattern in developing the framework for the RTA application.
- Responsible for the Oracle schema design, generating various POJOs, and generating their corresponding Hibernate mapping (.hbm) files.
Environment: Java, Spring Core, JMS, Web services, JDK, SVN, Mule ESB, JUnit, WAS 7, jQuery, AJAX, SAX, Hibernate, ORM tool, HQL, RESTful web services, eCMS.