Hadoop Spark Developer Resume

Richmond, VA

SUMMARY:

  • 7+ years of strong experience in software development.
  • 5+ years of experience using Big Data ecosystems and Java.
  • Extensive experience in Apache Spark with Scala, Apache Solr and Python.
  • Solid Mathematics, Probability and Statistics foundation and broad practical statistical and data mining techniques cultivated through various industry work and academic programs
  • Involved in all phases of the Software Development Life Cycle (SDLC), including Analysis, Design, Implementation, Testing and Maintenance.
  • Strong technical, administration and mentoring knowledge in Linux and Big Data/Hadoop technologies.
  • Sound knowledge of in-memory MemSQL.
  • Hands-on experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume and Avro.
  • Experienced in deploying Hadoop clusters using Puppet.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experience in importing and exporting data between HDFS and relational database systems/mainframes using Sqoop.
  • Expertise in working with ETL Architects, Data Analysts and data modelers to translate business rules/requirements into conceptual, physical and logical dimensional models and worked with complex normalized and denormalized data models.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Installing, configuring and managing of Hadoop Clusters and Data Science tools.
  • Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator, Hue.
  • Setting up the High-Availability for Hadoop Clusters components and Edge nodes.
  • Strong experience in writing Python applications with libraries such as Pandas, NumPy, SciPy and Matplotlib.
  • Experience in developing Shell scripts and Python Scripts for system management.
  • Well versed in using Software development methodologies like Rapid Application Development (RAD), Agile Methodology and Scrum software development processes.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
  • Analyzed the Cassandra/SQL scripts and designed the solution to implement using Scala.
  • Experience with Object Oriented Analysis and Design (OOAD) methodologies.
  • Experience in installations of software, writing test cases, debugging, and testing of batch and online systems.
  • Experience in Production, quality assurance (QA), SIT (System Integration testing) and user acceptance (UA) testing.
  • Expertise in J2EE technologies like JSP, Servlets, EJBs 2.0, JDBC, JNDI and AJAX.
  • Extensively worked on implementing SOA (Service Oriented Architecture) using XML Web services (SOAP, WSDL, UDDI and XML Parsers).
  • Used PIG Latin Scripts, join operations, custom user defined functions (UDF) to perform ETL operations
  • Experience in Performance Tuning and Debugging of existing ETL processes.
  • Worked with XML parsers like JAXP (SAX and DOM) and JAXB.
  • Expertise in applying Java Messaging Service (JMS) for reliable information exchange across Java applications.
  • Proficient with Core Java and AWT, and with markup languages such as HTML 5.0 and XHTML.
  • Expert in understanding the data and designing/Implementing the enterprise platforms like Hadoop Data lake and Huge Data warehouses.
  • Worked with version control systems like Subversion, Perforce and Git to provide a common platform for all developers.
  • Articulate in written and verbal communication along with strong interpersonal, analytical, and organizational skills.
  • Hands on experience with Microsoft Azure Cloud services, Storage Accounts and Virtual Networks.
  • Skilled at managing hosting plans for Azure infrastructure, and at implementing and deploying workloads on Azure virtual machines (VMs).
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
  • Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.

TECHNICAL SKILLS:

Languages: C, C++, Java 6, Java 7, Java 8, Scala.

Big Data Skills: MapReduce, Hadoop, Spark, Kafka, Storm

Web Technologies: HTML5, JavaScript, Ajax, CSS, jQuery, XML, Bootstrap.

Servers: WebSphere, Tomcat 6.x, MIIS (Microsoft Internet Information Server)

Case Tools and IDE: Eclipse, NetBeans, RAD, IntelliJ, Netezza.

Frameworks in Hadoop: Spark, Kafka, Storm

Databases: DB2, Oracle and MySQL Server

Version Tools: SVN, CVS, ClearCase

Web Services: SOAP, REST

PROFESSIONAL EXPERIENCE:

Confidential, Richmond, VA

Hadoop Spark Developer

Responsibilities:

  • Migrated Oracle tables to HDFS using Sqoop.
  • Designed and developed rich front-end screens using JSF (ICEfaces), JSP, Docker, CSS, HTML, Angular JS and jQuery.
  • Developed Managed beans and defined Navigation rules for the application using JSF.
  • Developed Angular JS 2.0 code and migrated pre-existing code to the updated Angular JS 2.0 framework.
  • Wrote a custom Sqoop class for the XMLTYPE datatype in the Oracle database.
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report to MQ Server using MQ Series.
  • Designed and implemented an Apache Spark job that reads Sequence Files from HDFS and migrates them to HBase.
  • Motivated and assisted a team of six members in reaching individual and team goals for quality, productivity and revenue generation.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Used WebSphere Application Server Developer Tools for Eclipse (WDT) to create Java batch projects based on the Java Batch 1.0 standard (JSR 352) and submit them to a Liberty profile server
  • Guided the team in preparing the technical specification.
  • Involved in development, build and deployment of the application.
  • Implemented microservices to separate tasks and avoid dependencies on other parallel tasks of the same application.
  • Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
  • Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra and configured Kafka to read and write messages from external programs.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs) written in Python.
  • Identified opportunities to improve infrastructure so that it makes effective and efficient use of Microsoft Azure Windows Server 2008/2012/R2, Microsoft SQL Server, Microsoft Visual Studio, Windows PowerShell and cloud infrastructure.
  • Deployed Azure IaaS virtual machines (VMs) and Cloud services (PaaS role instances) into secure VNets and subnets.
  • Developed RESTful web services with Jersey (JAX-RS) and secured them using SSL.
  • Created HBase tables and implemented salting in HBase.
  • Migrated tables from Oracle to HBase on a per-tenant basis.
  • Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism and memory settings.
  • Designed and developed ETL code using Informatica mappings to load data from heterogeneous source systems (flat files, XML files, MS Access files, Oracle) into the Oracle staging area, then into the data warehouse, and then into Data Mart tables for reporting.
  • Developed ETL with SCDs, caches and complex joins with optimized SQL queries.
  • Wrote programs in Spark using Scala and Python for data quality checks.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Performed a one-time migration of 300 billion records using Apache Spark bulk loading.
  • Wrote ETL jobs using Pig Latin and tuned the performance of Hive queries.
  • Involved in delta migration using Sqoop incremental updates.
  • Used shell scripts in the ETL process to call SQL or pmcmd commands and to run pre- and post-ETL steps such as file validation, zipping, massaging and archiving of source and target files; used UNIX scripting to manage the file systems.
  • Maintained, structured, and surveyed documents within the NoSQL MongoDB database; ensuring data integrity, correcting anomalies, and increasing the overall maintainability of the database.
  • Involved in architecting the data pipeline for the Add/Update flow of real-time analysis for the IMS application.
  • Architected the overall design for the data migration process and for the real-time analysis.
  • Wrote Apache Spark jobs using the Scala API.
  • Involved in administering and configuring the MapR distribution.
  • Integrated MapR Streams (Kafka 0.9 API) with Spark Streaming using the Java API.

Environment: Apache Spark, Spark Streaming, Spark SQL, ETL, Hadoop Security, MapR Streams, MapR 5.1, OpenShift, Scala, Java, HBase, Eclipse, Maven, Sequence Files.

Confidential, Nashville, TN

Hadoop Developer

Responsibilities:

  • Performed hands-on data manipulation, transformation, hypothesis testing and predictive modeling.
  • Developed a robust set of code that is tested, automated, structured and efficient.
  • Defined service layer using EJB 3.0 and also defined remote and local services.
  • Accessed remote and local EJB services from controller.
  • Developed application using JSP, Tag libraries, JSF and Struts (MVC) Framework.
  • Exposed web services to clients by developing WSDL; also involved in developing a web client for application interactions.
  • Evaluated, refined and continuously improved the efficiency and accuracy of existing predictive models using Netezza.
  • Involved in developing several Data Pipelines using Apache Kafka.
  • Developed a framework API for tax calculations in Yoda, with server-side components built on J2EE and the Spring framework.
  • Designed, developed and implemented a messaging module using Java Messaging Service (JMS) point-to-point messaging and Message Driven Beans to listen to the messages in the queue for interactions with client ordering data.
  • Collaborated on insights with other Data Scientists, Business Analysts and Partners.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report to MQ Server using MQ Series.
  • Worked on POCs with Apache Spark using Scala to adopt Spark in the project.
  • Consumed data from Kafka using Apache Spark.
  • Worked with Apache Spark, a fast and general engine for large-scale data processing, together with the functional programming language Scala.
  • Used Apache Spark with Scala to implement advanced procedures like text analytics and processing, using Spark's in-memory computing capabilities.
  • Translated high-level design specifications into simple ETL coding and mapping standards. Involved in building the ETL architecture and source-to-target mapping to load data into the data warehouse.
  • Importing and exporting the data from HDFS to RDBMS using Sqoop and Kafka.
  • Worked on creating Custom Azure Templates for quick deployments and advanced PowerShell scripting.
  • Contributed to the support forums (specific to Azure Networking, Azure Virtual Machines, Azure Active Directory, Azure Storage) for Microsoft Developer Network.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Developed Scala and SQL code to extract data from various databases
  • Championed innovative ideas around Data Science and Advanced Analytics practices.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Reviewed ETL application use cases before onboarding to Hadoop.
  • Performed ETL on data from Dev and QA boxes to stage it for micro-batching in Spark.
  • Developed Python code to provide data analysis and generate complex data reports.
  • Deployed the Cassandra cluster in cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
  • Implemented the data backup strategies for the data in the Cassandra cluster.
  • Generated data cubes using Hive, Pig and Java MapReduce on a provisioned Hadoop cluster in AWS.
  • Implemented the ETL design to dump the Map-Reduce data cubes to Cassandra cluster.
  • Imported data from relational databases into HDFS using Sqoop.
  • Implemented a POC for using Apache Impala for data processing on top of Hive.
  • Used Python Pandas DataFrames for data analysis.
  • Worked on the Hortonworks 2.3 distribution.
  • Used Python regular expression operations (NLP) to analyze customer reviews.
  • Understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL and Oracle databases.

Environment: Apache Spark, PySpark, Spark Streaming, Spark SQL, ETL, Scala, Apache Kafka, Apache Flume, Python Pandas, Cassandra, Hortonworks (HDP) 2.3, AWS, Akka, Hive, Pig, Big Data.

Confidential, Southborough, MA

Hadoop Developer-Java

Responsibilities:

  • Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
  • Experience in working on multi-Petabyte clusters both administration and development.
  • Developed Chef modules to automate the installation, configuration and deployment of ecosystem tools, OS's and network infrastructure at a cluster level.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Performed cluster co-ordination and assisted with data capacity planning and node forecasting using Zookeeper.
  • Developed a fault-tolerant data warehouse cluster using Amazon S3; node monitoring and automatic replication were developed using Amazon Redshift.
  • Involved in performance Tuning of Hadoop clusters
  • Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team.
  • Executed custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Extracted data from Oracle, SQL Server and MySQL databases to HDFS using Sqoop.
  • Optimized MapReduce jobs to use HDFS efficiently by using Gzip, LZO, Snappy compression techniques.
  • Experience in writing Pig scripts to transform raw data from several data sources into forming baseline data.
  • Created Hive tables to store the processed results in a tabular format and written Hive scripts to transform and aggregate the disparate data.
  • Improved search results using Solr and customized Lucene/Solr code.
  • Worked on Lucene and Solr, and led the index- and search-related development work.
  • Queried large volumes of data in near real time while feeding log files into Solr.
  • Worked with the team to improve the ranking of search results using Solr.
  • Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Responsible for cluster maintenance, rebalancing blocks, commissioning and decommissioning of nodes, monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Drove the application from the development phase to the production phase using a Continuous Integration and Continuous Deployment (CI/CD) model with Chef, Maven and Jenkins.
  • Developed Pentaho Kettle graphs to cleanse and transform the raw data into useful information and load it to a Kafka queue (further loaded to HDFS) and a Neo4j database for the UI team to display it using the web application.
  • Automated the process for extraction of data from warehouses and weblogs into HIVE tables by developing workflows and coordinator jobs in Oozie.
  • Scheduled volume snapshots for backup, performed root cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
  • Tuned and modified SQL for batch and online processes.
  • Commissioned and decommissioned nodes.
  • Managed the cluster through performance tuning and enhancement.

Environment: Hortonworks (HDP 2.2), HDFS, Batch-Processing, MapReduce, Apache Cassandra, Apache Solr, YARN, Spark, Scala, Hive, Pig, Flume, Sqoop, Chef, Puppet, Python, Oozie, ZooKeeper, Ambari, Oracle Database, MySQL, HBase, SparkSQL, AWS Redshift, Avro, Parquet, RCFile, JSON, UDF, Java (jdk1.7), Multi-Threading, Performance Tuning, CentOS

Confidential, Malvern, PA

Big Data Engineer

Responsibilities:

  • Worked with the business users to gather, define business requirements and analyze the possible technical solutions.
  • Installed Hadoop and configured multiple nodes on the Amazon EMR platform.
  • Set up and optimized standalone, pseudo-distributed and distributed clusters.
  • Developed simple to complex MapReduce streaming jobs.
  • Analyzed data with Hive, Pig and Hadoop Streaming.
  • Built, tuned and maintained HiveQL and Pig scripts for reporting purposes.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Processed data across a Hadoop cluster of virtual servers on the Amazon Elastic Compute Cloud (EC2) using Amazon EMR.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Stored the data in an Apache Cassandra Cluster
  • Used Impala to query the Hadoop data stored in HDFS.
  • Managed and reviewed Hadoop log files.
  • Supported and troubleshot MapReduce programs running on the cluster.
  • Loaded data from the Linux file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Created tables, loaded data and wrote queries in Hive.
  • Developed scripts to automate routine DBA tasks using Linux shell scripts and Python.

Environment: Hadoop 0.20.2, Pig, Hive, Java, AWS, AWS EMR, Cloudera Manager, 30-node cluster with Linux (Ubuntu)

Confidential, NE

Jr. Java Developer

Responsibilities:

  • Worked with the business users to gather, define business requirements and analyze the possible technical solutions.
  • Developed the Controllers, Service Layer and DAO layer using Spring MVC and Spring JDBC.
  • Performed unit testing using JUnit and JUnit annotations.
  • Implemented RESTful web services.
  • Configured the Transaction Management for the project using Spring Container Managed Transactions.
  • Developed web interface to display the customer information from the database tables
  • Created HTML, CSS, JavaScript, DHTML pages for Presentation Layer.
  • Performed UI validation from one screen to another using JavaScript.
  • Used annotations for JSR 303 and Spring; the complete application was developed using annotations.
  • Used UNIX and Linux commands for debugging.
  • Involved in resolving SSO login issue.
  • Involved in Daily Stand Up Meetings, Sprint Planning and Backlog Grooming for Agile Scrum Process.

Confidential

Jr. Java/J2EE Developer

Responsibilities:

  • Developed User Interfaces module using JSP, JavaScript, DHTML and form beans for presentation layer.
  • Developed Servlets and Java Server Pages (JSP).
  • Developed PL/SQL queries, and wrote stored procedures and JDBC routines to generate reports based on client requirements.
  • Enhancement of the System according to the customer requirements.
  • Involved in the customization of the available functionalities of the software for an NBFC (Non-Banking Financial Company).
  • Involved in putting in place proper review processes and documentation for functionality development.
  • Provided support and guidance for production and implementation issues.
  • Used JavaScript validation in JSP.
  • Used Hibernate framework to access the data from back-end SQL Server database.
  • Used AJAX (Asynchronous JavaScript and XML) to implement user friendly and efficient client interface.
  • Used MDB for consuming messages from JMS queue/topic.
  • Designed and developed Web Application using Struts Framework.
  • Used Ant to compile and generate EAR, WAR and JAR files.
  • Created test case scenarios for Functional Testing and wrote Unit test cases with JUnit.
  • Responsible for integration, unit testing, system testing and stress testing for all phases of the project.

Environment: Java, J2EE, JSP 1.2, Performance Tuning, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere 6.0, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4J, Ant 1.6, WSAD 6.0, Oracle 9i, Windows 2000.
