Spark Consultant Resume
Minneapolis, MN
SUMMARY
- 9+ years of strong experience in software development using Big Data, Hadoop, Apache Spark, Java/J2EE, Scala, and Python technologies.
- Solid foundation in mathematics, probability, and statistics, with broad practical experience in statistical and data mining techniques cultivated through industry work and academic programs.
- Involved in the Software Development Life Cycle (SDLC) phases, which include Analysis, Design, Implementation, Testing, and Maintenance.
- Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies.
- Hands-on experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, and Avro.
- Experienced in the deployment of Hadoop clusters using Puppet.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS).
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems/mainframes and vice versa.
- Installing, configuring, and managing Hadoop clusters and data science tools.
- Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
- Setting up High Availability for Hadoop cluster components and edge nodes.
- Experience in developing Confidential scripts and Python Scripts for system management.
- Well versed in using Software development methodologies like Rapid Application Development (RAD), Agile Methodology and Scrum software development processes.
- Experience with Object Oriented Analysis and Design (OOAD) methodologies.
- Experience in installations of software, writing test cases, debugging, and testing of batch and online systems.
- Experience in Production, quality assurance (QA), SIT (System Integration testing) and user acceptance (UA) testing.
- Expertise in J2EE technologies like JSP, Servlets, EJBs 2.0, JDBC, JNDI, and AJAX.
- Extensively worked on implementing SOA (Service Oriented Architecture) using XML Web services (SOAP, WSDL, UDDI and XML Parsers).
- Worked with XML parsers like JAXP (SAX and DOM) and JAXB.
- Expertise in applying Java Messaging Service (JMS) for reliable information exchange across Java applications.
- Proficient with Core Java and AWT, and with markup languages and web technologies such as HTML 5.0, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, XQuery, Angular.js, and Node.js.
- Worked with version control systems like Subversion, Perforce, and Git to provide a common platform for all the developers.
- Articulate in written and verbal communication, with strong interpersonal, analytical, and organizational skills.
- Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
- Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.
TECHNICAL SKILLS
Languages: C, C++, Java 6, Java 7, Java 8, Scala.
Big Data Skills: MapReduce, Hadoop, Spark, Kafka, Storm
Web Technologies: HTML5, JavaScript, Ajax, CSS, jQuery, XML, Bootstrap.
Servers: WebSphere, Tomcat 6.x, MIIS (Microsoft Internet Information Server)
Case Tools and IDE: Eclipse, NetBeans, RAD, IntelliJ, Netezza.
Frameworks in Hadoop: Spark, Kafka, Storm
Databases: DB2, Oracle and MySQL Server
Version Tools: SVN, CVS, ClearCase
Web Services: SOAP, REST
PROFESSIONAL EXPERIENCE
Confidential, Minneapolis, MN
Spark Consultant
Environment: Apache Spark, PySpark, Spark Streaming, Spark SQL, Apache Kafka, Apache Flume, Python Pandas, Cassandra, Hortonworks (HDP) 2.3, AWS, Scala, Akka, Hive, Pig, Big Data.
Responsibilities:
- Performed hands-on data manipulation, transformation, hypothesis testing, and predictive modeling. Developed a robust body of code that is tested, automated, structured, and efficient.
- Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models using Netezza.
- Involved in developing several data pipelines using Apache Kafka.
- Extensively worked with log aggregation using Apache Kafka.
- Ingested data into Spark Streaming jobs from different sources such as Apache Kafka, Flume, and HDFS.
- Extensively worked with all kinds of data, including unstructured and semi-structured data.
- Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
- Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models.
- Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
- Developed Scala and SQL code to extract data from various databases.
- Championed new, innovative ideas around Data Science and Advanced Analytics practices.
- Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Developed statistical models to forecast inventory and procurement cycles.
- Developed Python code to provide data analysis and generate complex data reports.
- Deployed the Cassandra cluster in a cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
- Implemented data backup strategies for the data in the Cassandra cluster.
- Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
- Implemented the ETL design to dump the MapReduce data cubes to the Cassandra cluster.
- Imported data from relational databases into HDFS using Sqoop.
- Implemented a POC for using Apache Impala for data processing on top of Hive.
- Utilized pandas DataFrames in Python to provide data analysis.
- Worked on the Hortonworks 2.3 distribution.
- Utilized Python regular expression operations (NLP) to analyze customer reviews.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle databases.
Confidential, Southborough, MA
Big Data Engineer
Environment: Hortonworks (HDP 2.2), HDFS, Batch-Processing, MapReduce, Apache Cassandra, YARN, Spark, Scala, Hive, Pig, Flume, Sqoop, Chef, Puppet, Python, Oozie, ZooKeeper, Ambari, Oracle Database, MySQL, HBase, SparkSQL, Avro, Parquet, RCFile, JSON, UDF, Play, Java (jdk1.7), Multi-Threading, Performance Tuning, CentOS
Responsibilities:
- Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
- Experience working on multi-petabyte clusters in both administration and development.
- Developed Chef modules to automate the installation, configuration, and deployment of ecosystem tools, operating systems, and network infrastructure at the cluster level.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Performed cluster coordination and assisted with data capacity planning and node forecasting using ZooKeeper.
- Involved in performance tuning of Hadoop clusters.
- Implemented the Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/results to the UI team.
- Executed custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Extracted data from Oracle SQL server and MySQL databases to HDFS using Sqoop.
- Optimized MapReduce jobs to use HDFS efficiently by using Gzip, LZO, Snappy compression techniques.
- Experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Created Hive tables to store the processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data.
- Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
- Responsible for cluster maintenance, rebalancing blocks, commissioning and decommissioning of nodes, monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Drove the application from the development phase to the production phase using a Continuous Integration and Continuous Deployment (CI/CD) model with Chef, Maven, and Jenkins.
- Developed Pentaho Kettle Graphs to cleanse and transform the raw data into useful information and load it to a Kafka queue (further loaded to HDFS) and a Neo4j database for the UI team to display using the web application.
- Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflows and coordinator jobs in Oozie.
- Scheduled volume snapshots for backup, performed root cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
- Tuned and modified SQL for batch and online processes.
- Commissioned and decommissioned nodes.
- Managed the cluster through performance tuning and enhancement.
Confidential, SFO, California
Big Data Engineer
Environment: Hadoop 0.20.2 - Pig, Hive, Java, AWS, AWS EMR, Cloudera Manager, 30-node cluster with Linux-Ubuntu
Responsibilities:
- Worked with the business users to gather and define business requirements and analyze the possible technical solutions.
- Installed Hadoop and configured multiple nodes on the Cloudera platform.
- Set up and optimized Standalone-System/Pseudo-Distributed/Distributed clusters.
- Developed simple to complex MapReduce streaming jobs.
- Analyzed data with Hive, Pig, and Hadoop Streaming.
- Built, tuned, and maintained HiveQL and Pig scripts for reporting purposes.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Stored the data in an Apache Cassandra cluster.
- Used Impala to query the Hadoop data stored in HDFS.
- Managed and reviewed Hadoop log files.
- Supported and troubleshot MapReduce programs running on the cluster.
- Loaded data from the Linux file system into HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Created tables, loaded data, and wrote queries in Hive.
- Developed scripts to automate routine DBA tasks using Linux Confidential Scripts and Python.
Confidential, NE
Big Data Engineer
Environment: Hortonworks Hadoop 2.0, EMP, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala, Play
Responsibilities:
- Worked with the business users to gather and define business requirements and analyze the possible technical solutions.
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
- Designed and built the reporting application that uses Spark SQL to fetch and generate reports on HBase table data.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Implemented helper classes that access HBase directly from Java using the Java API.
- Integrated MapReduce with HBase to import bulk data into HBase using MapReduce programs.
- Experienced in converting ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
- Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
- Handled different time series data using HBase to store data and perform time-based analytics to improve query retrieval time.
- Participated with admins in installing and configuring MapReduce, Hive, and HDFS.
- Implemented a CDH3 Hadoop cluster on CentOS and assisted with performance tuning and monitoring.
- Used Hive to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
- Managed and reviewed Hadoop log files.
- Worked on Scala Play framework for application development.
- Involved in review of functional and non-functional requirements.
Confidential, Houston, TX
Java/J2EE Developer
Environment: Java, J2EE, JSP 1.2, Garbage Collection, Multi-threading, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4J, ANT 1.6, WSAD 6.0, Oracle 9i, SOA, Web Services, WSDL.
Responsibilities:
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase using Rational Rose.
- Developed the entire application implementing MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
- Designed the user interface using Java Server Faces (JSF), Cascading Style Sheets (CSS), and XML.
- Used JNDI to perform lookup services for the various components of the system.
- Developed the Enterprise Java Beans (stateless session beans) to handle different transactions such as online funds transfers and bill payments to the service providers.
- Developed deployment descriptors for the EJBs to be deployed on WebSphere Application Server.
- Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Developed web services for data transfer from client to server and vice versa using Apache Axis, SOAP, WSDL, and UDDI.
- Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
- Implemented various J2EE Design patterns like Singleton, Service Locator, Business Delegate, DAO, Transfer Object, and SOA.
Confidential
Java/J2EE Developer
Environment: Java, J2EE, JSP 1.2, Performance Tuning, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere 6.0, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4J, ANT 1.6, WSAD 6.0, Oracle 9i, Windows 2000
Responsibilities:
- Developed the User Interfaces module using JSP, JavaScript, DHTML, and form beans for the presentation layer.
- Developed Servlets and Java Server Pages (JSP).
- Developed PL/SQL queries, and wrote stored procedures and JDBC routines to generate reports based on client requirements.
- Enhanced the system according to the customer requirements.
- Involved in the customization of the available functionalities of the software for an NBFC (Non-Banking Financial Company).
- Involved in putting in place proper review processes and documentation for functionality development.
- Provided support and guidance for production and implementation issues.
- Used JavaScript validation in JSP.
- Used the Hibernate framework to access the data from the back-end SQL Server database.
- Used AJAX (Asynchronous JavaScript and XML) to implement a user-friendly and efficient client interface.
- Used MDB for consuming messages from JMS queues/topics.
- Designed and developed a web application using the Struts framework.
- Used ANT to compile and generate EAR, WAR, and JAR files.
- Created test case scenarios for functional testing and wrote unit test cases with JUnit.
- Responsible for integration, unit testing, system testing, and stress testing for all the phases of the project.
