Spark Consultant Resume

Minneapolis, MN

SUMMARY

  • 9+ years of strong experience in software development using Big Data, Hadoop, Apache Spark, Java/J2EE, Scala, and Python technologies.
  • Solid foundation in mathematics, probability, and statistics, with broad practical experience in statistical and data mining techniques cultivated through industry work and academic programs.
  • Involved in the Software Development Life Cycle (SDLC) phases, including analysis, design, implementation, testing, and maintenance.
  • Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies.
  • Hands-on experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, and Avro.
  • Experienced in deploying Hadoop clusters using Puppet.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems/mainframes and vice versa.
  • Installing, configuring, and managing Hadoop clusters and data science tools.
  • Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
  • Setting up high availability for Hadoop cluster components and edge nodes.
  • Experience in developing Confidential scripts and Python Scripts for system management.
  • Well versed in software development methodologies such as Rapid Application Development (RAD), Agile, and Scrum.
  • Experience with Object Oriented Analysis and Design (OOAD) methodologies.
  • Experience in installations of software, writing test cases, debugging, and testing of batch and online systems.
  • Experience in Production, quality assurance (QA), SIT (System Integration testing) and user acceptance (UA) testing.
  • Expertise in J2EE technologies like JSP, Servlets, EJBs 2.0, JDBC, JNDI and AJAX.
  • Extensively worked on implementing SOA (Service Oriented Architecture) using XML Web services (SOAP, WSDL, UDDI and XML Parsers).
  • Worked with XML parsers like JAXP (SAX and DOM) and JAXB.
  • Expertise in applying Java Message Service (JMS) for reliable information exchange across Java applications.
  • Proficient with Core Java and AWT, with markup and related technologies like HTML 5.0, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, and XQuery, as well as Angular.js and Node.js.
  • Worked with version control systems like Subversion, Perforce, and Git to provide a common platform for all developers.
  • Articulate in written and verbal communication along with strong interpersonal, analytical, and organizational skills.
  • Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
  • Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.

TECHNICAL SKILLS

Languages: C, C++, Java 6, Java 7, Java 8, Scala

Big Data Skills: MapReduce, Hadoop, Spark, Kafka, Storm

Web Technologies: HTML5, JavaScript, Ajax, CSS, jQuery, XML, Bootstrap

Servers: WebSphere, Tomcat 6.x, MIIS (Microsoft Internet Information Server)

Case Tools and IDE: Eclipse, NetBeans, RAD, IntelliJ, Netezza

Frameworks in Hadoop: Spark, Kafka, Storm

Databases: DB2, Oracle and MySQL Server

Version Tools: SVN, CVS, ClearCase

Web Services: SOAP, REST

PROFESSIONAL EXPERIENCE

Confidential, Minneapolis, MN

Spark Consultant

Environment: Apache Spark, PySpark, Spark Streaming, Spark SQL, Apache Kafka, Apache Flume, Python pandas, Cassandra, Hortonworks (HDP) 2.3, AWS, Scala, Akka, Hive, Pig, Big Data.

Responsibilities:

  • Performed hands-on data manipulation, transformation, hypothesis testing, and predictive modeling. Developed a robust code base that is tested, automated, structured, and efficient.
  • Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models using Netezza.
  • Involved in developing several data pipelines using Apache Kafka.
  • Extensively worked with log aggregation using Apache Kafka.
  • Ingested data into Spark Streaming jobs from sources such as Apache Kafka, Flume, and HDFS.
  • Extensively worked with unstructured and semi-structured data.
  • Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
  • Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
  • Developed Scala and SQL code to extract data from various databases.
  • Championed innovative ideas around data science and advanced analytics practices.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Developed Python code to provide data analysis and generate complex data reports.
  • Deployed the Cassandra cluster in a cloud (AWS) environment with scalable nodes per business requirements.
  • Implemented data backup strategies for the data in the Cassandra cluster.
  • Generated data cubes using Hive, Pig, and Java MapReduce on a Hadoop cluster provisioned in AWS.
  • Implemented the ETL design to load the MapReduce data cubes into the Cassandra cluster.
  • Imported the data from relational databases into HDFS using Sqoop.
  • Implemented a proof of concept (POC) using Apache Impala for data processing on top of Hive.
  • Used Python pandas DataFrames for data analysis.
  • Worked on the Hortonworks (HDP) 2.3 distribution.
  • Used Python regular expression operations (NLP) to analyze customer reviews.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle databases.

Confidential, Southborough, MA

Big Data Engineer

Environment: Hortonworks (HDP 2.2), HDFS, Batch Processing, MapReduce, Apache Cassandra, YARN, Spark, Scala, Hive, Pig, Flume, Sqoop, Chef, Puppet, Python, Oozie, ZooKeeper, Ambari, Oracle Database, MySQL, HBase, Spark SQL, Avro, Parquet, RCFile, JSON, UDF, Play, Java (JDK 1.7), Multi-Threading, Performance Tuning, CentOS

Responsibilities:

  • Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
  • Worked on multi-petabyte clusters in both administration and development.
  • Developed Chef modules to automate the installation, configuration, and deployment of ecosystem tools, operating systems, and network infrastructure at the cluster level.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Performed cluster coordination and assisted with data capacity planning and node forecasting using ZooKeeper.
  • Involved in performance tuning of Hadoop clusters.
  • Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team.
  • Executed custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Extracted data from Oracle SQL server and MySQL databases to HDFS using Sqoop.
  • Optimized MapReduce jobs to use HDFS efficiently by using Gzip, LZO, Snappy compression techniques.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Created Hive tables to store the processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data.
  • Used Avro, Parquet, RCFile, and JSON file formats and developed UDFs for Hive and Pig.
  • Responsible for cluster maintenance, rebalancing blocks, commissioning and decommissioning nodes, monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Drove the application from the development phase to the production phase using a Continuous Integration and Continuous Deployment (CI/CD) model with Chef, Maven, and Jenkins.
  • Developed Pentaho Kettle graphs to cleanse and transform raw data into useful information and load it into a Kafka queue (further loaded to HDFS) and a Neo4j database for the UI team to display in the web application.
  • Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflows and coordinator jobs in Oozie.
  • Scheduled volume snapshots for backup, performed root cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
  • Tuned and modified SQL for batch and online processes.
  • Commissioned and decommissioned nodes.
  • Managed the cluster through performance tuning and enhancement.

Confidential, SFO, California

Big Data Engineer

Environment: Hadoop 0.20.2, Pig, Hive, Java, AWS, AWS EMR, Cloudera Manager, 30-node cluster on Ubuntu Linux

Responsibilities:

  • Worked with business users to gather and define business requirements and analyze possible technical solutions.
  • Installed and configured Hadoop on multiple nodes on the Cloudera platform.
  • Set up and optimized standalone, pseudo-distributed, and distributed clusters.
  • Developed simple to complex MapReduce streaming jobs.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Built, tuned, and maintained HiveQL and Pig scripts for reporting.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Stored the data in an Apache Cassandra cluster.
  • Used Impala to query the Hadoop data stored in HDFS.
  • Managed and reviewed Hadoop log files.
  • Supported and troubleshot MapReduce programs running on the cluster.
  • Loaded data from the Linux file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Created tables, loaded data, and wrote queries in Hive.
  • Developed scripts to automate routine DBA tasks using Linux Confidential scripts and Python.

Confidential, NE

Big Data Engineer

Environment: Hortonworks Hadoop 2.0, EMP, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala, Play

Responsibilities:

  • Worked with business users to gather and define business requirements and analyze possible technical solutions.
  • Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
  • Designed and built the reporting application that uses Spark SQL to fetch and generate reports on HBase table data.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Implemented helper classes that access HBase directly from Java using the Java API.
  • Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
  • Converted ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Stored time series data in HBase and performed time-based analytics to improve query retrieval time.
  • Worked with admins to install and configure MapReduce, Hive, and HDFS.
  • Implemented CDH3 Hadoop cluster on CentOS, assisted with performance tuning and monitoring.
  • Used Hive to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
  • Managed and reviewed Hadoop log files.
  • Worked with the Scala Play framework for application development.
  • Involved in review of functional and non-functional requirements.

Confidential, Houston, TX

Java/J2EE Developer

Environment: Java, J2EE, JSP 1.2, Garbage Collection, Multi-threading, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4J, Ant 1.6, WSAD 6.0, Oracle 9i, SOA, Web Services, WSDL.

Responsibilities:

  • Actively participated in requirements gathering, analysis, design, and testing phases.
  • Designed use case diagrams, class diagrams, and sequence diagrams as part of the design phase using Rational Rose.
  • Developed the entire application implementing MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
  • Designed the user interface using JavaServer Faces (JSF), Cascading Style Sheets (CSS), and XML.
  • Used JNDI to perform lookup services for the various components of the system.
  • Developed Enterprise Java Beans (stateless session beans) to handle different transactions such as online funds transfers and bill payments to service providers.
  • Developed deployment descriptors for the EJBs to be deployed on WebSphere Application Server.
  • Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
  • Developed web services for data transfer from client to server and vice versa using Apache Axis, SOAP, WSDL, and UDDI.
  • Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
  • Implemented various J2EE Design patterns like Singleton, Service Locator, Business Delegate, DAO, Transfer Object, and SOA.

Confidential

Java/J2EE Developer

Environment: Java, J2EE, JSP 1.2, Performance Tuning, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere 6.0, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4J, Ant 1.6, WSAD 6.0, Oracle 9i, Windows 2000

Responsibilities:

  • Developed the user interface module using JSP, JavaScript, DHTML, and form beans for the presentation layer.
  • Developed Servlets and Java Server Pages (JSP).
  • Developed PL/SQL queries and wrote stored procedures and JDBC routines to generate reports based on client requirements.
  • Enhanced the system according to customer requirements.
  • Involved in customizing the available functionality of the software for an NBFC (Non-Banking Financial Company).
  • Involved in putting in place proper review processes and documentation for functionality development.
  • Provided support and guidance for production and implementation issues.
  • Used JavaScript validation in JSP.
  • Used the Hibernate framework to access data from the back-end SQL Server database.
  • Used AJAX (Asynchronous JavaScript and XML) to implement a user-friendly and efficient client interface.
  • Used MDB for consuming messages from JMS queue/topic.
  • Designed and developed the web application using the Struts framework.
  • Used Ant to compile and generate EAR, WAR, and JAR files.
  • Created test case scenarios for Functional Testing and wrote Unit test cases with JUnit.
  • Responsible for integration, unit, system, and stress testing for all phases of the project.