Spark Developer / Big Data Developer Resume
Corvallis, Oregon
SUMMARY
- Around 8 years of IT experience, including around 4 years of comprehensive experience as a Hadoop Developer.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce, including MRv1 and MRv2 (YARN) concepts.
- Proficient in Big Data technologies such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, and Spark for data storage and analysis.
- Good experience working with various Hadoop distributions such as Cloudera, Hortonworks, and Apache Hadoop.
- Knowledge of installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume (a minimal streaming sketch follows this list).
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in building and maintaining multiple Hadoop clusters (prod, dev, etc.) of different sizes and configurations, and setting up the rack topology for large clusters.
- Experience in installation, configuration, support, and management of Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Experience in setting up monitoring infrastructure for Hadoop clusters using Nagios and Ganglia.
- Development expertise with RDBMSs such as Oracle, Sybase, Teradata, Netezza, and MS SQL Server.
- Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JAXB, JSON, XML, XSLT, XSD, JMS, WSDL, WADL, REST, SOAP web services, CXF, Groovy, Grails, Jersey, Gradle, and EclipseLink.
- Experience with source control repositories such as SVN and GitHub.
- Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
- Strong hands-on experience with UNIX commands and deploying applications to servers.
- Loaded datasets into Hive for ETL (Extract, Transform, Load) operations.
- Excellent problem-solving, analytical, communication, and interpersonal skills.
- Experience in database design, using PL/SQL to write stored procedures, functions, and triggers.
- Experience working with build tools such as Maven and Ant.
- Experienced in both Waterfall and Agile (Scrum) development methodologies.
- Effective leadership qualities with good skills in strategy, business development, client management, and project management.
- Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
- Strong motivation, initiative, and project management attributes.
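As a minimal illustration of the Spark Streaming and Kafka experience noted above, the following Scala sketch reads a Kafka topic with Structured Streaming. The broker address, topic name, and console sink are illustrative assumptions, and the spark-sql-kafka-0-10 connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .getOrCreate()

    // Read a stream of events from a Kafka topic; "broker:9092" and
    // "events" are placeholder values, not taken from the resume.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    // Write the raw values to the console sink for inspection.
    val query = events.writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```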
TECHNICAL SKILLS
Languages: C, C++, Java/J2EE, Scala, Python, SQL, HiveQL, Pig Latin
Hadoop Ecosystem: HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, Apache Ignite, Avro, AWS
Web Technologies: Servlets, JSP, J2EE, JDK, JDBC
Framework: Core Spring, Spring DAO, Spring MVC, Hibernate
Web/Application Servers: Jetty, Apache Tomcat
Scripting Languages: JavaScript, jQuery, AJAX, JSTL, CSS
Markup Languages: HTML, XML
XML: DOM, SAX, DTD, XSD, SOAP, REST, JAXB, XSL, XSLT
Databases: Oracle, MySQL, MS SQL Server 2005, Derby, MS Access, Apache Cassandra
OS: MS-Windows 95/98/NT/2000/XP/7, Linux, Unix, Solaris 5.1
Methodologies: OOP, Agile, Scrum, Extreme Programming
Version Control Tools: SVN, CVS, Git
Tools: Eclipse, Maven, Ant, JUnit, TestNG, Jenkins, SoapUI, PuTTY, Log4j, Bugzilla
ETL Tools: Ab Initio GDE 1.14/1.15/3.0.4, Co>Operating System 2.14/2.15/3.16
PROFESSIONAL EXPERIENCE
Confidential, Corvallis, Oregon
Spark Developer / Big Data Developer
RESPONSIBILITIES:
- Strong experience working with Spark SQL, DataFrames, and pair RDDs.
- Handled large DataFrames using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
- Wrote Spark UDFs, a Spark SQL feature that helps keep query logic concise (see the sketch after this list).
- Created clusters using Databricks and maintained their performance.
- Actively applied Spark tuning techniques, such as caching RDDs and increasing the number of executors per node.
- Worked with different file formats and compression codecs (JSON, Parquet, Snappy) and wrote data into S3 with the desired partitioning.
- Created notebooks in Scala for component tests and system integration tests.
- Created workflows using the Databricks REST APIs for running production data.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Knowledge of orchestrating Databricks jobs with Apache Airflow.
- Experience working with the CloudBerry tool.
- Created daily scheduled jobs using the Jenkins CLI and cron jobs.
- Created metric files (JSON format) for the Kibana dashboards.
- Developed Unix shell scripts to automate Spark SQL jobs.
- Developed Spark transformations and Spark jobs using Scala.
- Experience working with GitHub, Git Bash, gitk, etc.
- Experience working with Rally to provide test cases and test plans for the daily CI runs, and updating defects, user stories, etc. using the Rally APIs.
- Experience working with Spark using Scala in a functional style and improving performance.
- Used Spark SQL to query the headers and learn about the composition of the data, allowing comparison of data from various sources.
- Prepared flowcharts and diagrams to illustrate the sequence of steps programs must follow and to describe the logical operations involved.
- Involved in gathering business requirements and preparing the information template used to identify data elements for future reporting needs.
- Defined project deliverables and critical target dates to be reflected in the project plan.
- Used Amazon Redshift to load data from the cloud.
- Maintained documentation supporting system configuration, training, and user experience.
- Created playbooks and wiki pages for the data pipeline.
- Responsible for providing solutions.
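A hedged Scala sketch of the Spark SQL UDF and the partitioned, Snappy-compressed Parquet write to S3 described above. The column names, tiering logic, and bucket path are hypothetical, introduced only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfAndPartitionedWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("UdfSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical input: (customerId, country, amount) rows.
    val orders = Seq(
      ("c1", "US", 120.0),
      ("c2", "DE", 80.0)
    ).toDF("customerId", "country", "amount")

    // A Spark SQL UDF that buckets amounts into tiers.
    val tier = udf((amount: Double) => if (amount >= 100) "high" else "low")
    val withTier = orders.withColumn("tier", tier($"amount"))

    // Write Snappy-compressed Parquet to S3, partitioned by country.
    // The bucket path is a placeholder, not taken from the resume.
    withTier.write
      .option("compression", "snappy")
      .partitionBy("country")
      .parquet("s3a://example-bucket/orders/")
  }
}
```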
Environment: Databricks, Spark, Scala, Python, AWS, wiki, Rally, SharePoint, Jenkins, Git, CloudBerry, and Amazon Redshift.
Confidential, St. Louis, Missouri
Spark / Hadoop Developer
RESPONSIBILITIES:
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Streamed real-time data using Spark with Kafka.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
- Explored Spark for improving the performance and optimizing the existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, and pair RDDs (see the sketch after this list).
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in the process of Cassandra data modeling and building efficient data structures.
- Trained and mentored analysts and the test team on the Hadoop framework, HDFS, MapReduce concepts, and the Hadoop ecosystem.
- Responsible for architecting Hadoop clusters.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Performed data analysis using Hive and Pig.
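One way the RDD-level optimizations mentioned above (pair RDDs, caching) might look, as a minimal Scala sketch; the data and the aggregation are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PairRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PairRddSketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (userId, clickCount) events.
    val clicks = sc.parallelize(Seq(("u1", 1), ("u2", 3), ("u1", 2)))

    // reduceByKey aggregates map-side before shuffling, the kind of
    // optimization that typically replaces a groupByKey-based algorithm.
    val totals = clicks.reduceByKey(_ + _)

    // Cache the result because more than one action reuses it.
    totals.cache()
    println(totals.count())
    totals.collect().foreach(println)
  }
}
```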
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Spark, Cloudera Manager, Storm, Cassandra, Pig, Sqoop, PL/SQL, MySQL, Windows, Hortonworks, Oozie, HBase
Confidential, Kansas City, Missouri
Spark / Hadoop Developer
RESPONSIBILITIES:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Explored Spark and Kafka, along with other open-source projects, to create a real-time analytics framework.
- Developed wrappers using shell scripting for Hive, Pig, Sqoop, and Scala jobs.
- Developed Unix shell scripts to automate Spark SQL jobs.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Participated in the development and implementation of the Hortonworks environment.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Involved in implementing and integrating NoSQL databases such as HBase and Apache Cassandra.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system into HDFS.
- Conducted POCs for different use cases on the AWS platform and documented the results.
- Installed and configured Hive, and wrote Hive UDFs in Java and Python.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experience in schema definition.
- Experience migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list).
- Wrote Pig Latin scripts to process the data, and wrote UDFs in Java and Python.
- Wrote MapReduce programs in Java to achieve the required output.
- Optimized application performance for the Cassandra cluster.
- Worked with the infrastructure and admin teams in designing, modeling, sizing, and configuring a 15-node Hadoop cluster on AWS EC2.
- Wrote Hive queries for data analysis to meet the business requirements.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; handled cluster coordination through ZooKeeper.
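A minimal sketch of migrating the classic MapReduce word count into Spark transformations with Scala, as referenced above: flatMap/map stand in for the mapper and reduceByKey for the reducer. The HDFS paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object WordCountMigrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    // The MapReduce word count expressed as Spark transformations.
    // Input and output paths are illustrative only.
    sc.textFile("hdfs:///data/input")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///data/output")
  }
}
```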
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hortonworks Hadoop distribution, Scala, AWS, SQL, Pig, ZooKeeper, Spark, Sqoop, Flume, Teradata, CentOS, Servlets, JDBC, JSP, JSTL, JPA, Apache, JavaScript, Eclipse, CVS, CSS, XML, JSON
Confidential, Santa Clara, CA
Java Developer
RESPONSIBILITIES:
- Actively involved in Analysis, Detail Design, Development, System Testing and User Acceptance Testing.
- Developed an intranet web application using J2EE architecture, using JSP to design the user interfaces, JSP tag libraries to define custom tags, and JDBC for database connectivity.
- Implemented the Struts (MVC) framework: developed the ActionServlet and ActionForm beans, configured the struts-config descriptor, and implemented the Validator framework.
- Extensively involved in database design work with Oracle Database and building the application on the J2EE architecture.
- Integrated messaging with the MQSeries classes for JMS, which provide an XML message-based interface; the application uses the JMS publish-subscribe model.
- Developed the EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
- Evaluated and worked with EJB's container-managed persistence strategy.
- Used web services (WSDL and SOAP) to get loan information from a third party, and used SAX and DOM XML parsers for data retrieval.
- Experienced in writing the DTD for document-exchange XML; generated, parsed, and displayed XML in various formats using XSLT and CSS.
- Used XPath 1.0 for selecting nodes and XQuery to extract and manipulate data from XML documents (see the sketch after this list).
- Coded, tested, and deployed the web application using RAD 7.0 and WebSphere Application Server 6.0.
- Used JavaScript for validating client-side data.
- Wrote unit tests for the implemented bean code using JUnit.
- Extensively worked on UNIX Environment.
- Exchanged data in XML format, which aids interoperability with other software applications.
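A small sketch of the XPath 1.0 node selection described above, written here in Scala against the standard javax.xml APIs; the loan document is a made-up stand-in for the third-party payload.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object XPathSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical loan document, standing in for the real payload.
    val xml = """<loans><loan id="1"><amount>5000</amount></loan></loans>"""
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))

    // XPath 1.0 expression selecting all amount nodes.
    val xpath = XPathFactory.newInstance().newXPath()
    val nodes = xpath
      .evaluate("/loans/loan/amount", doc, XPathConstants.NODESET)
      .asInstanceOf[NodeList]

    for (i <- 0 until nodes.getLength)
      println(nodes.item(i).getTextContent) // prints 5000
  }
}
```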
Environment: Struts 2, Rational Rose, JMS, EJB, JSP, RAD 7.0, WebSphere Application Server 6.0, XML parsers, XSL, XQuery, XPath 1.0, HTML, CSS, JavaScript, IBM MQSeries, ANT, JUnit, JDBC, Oracle, Unix, SVN.
Confidential
Java Developer
RESPONSIBILITIES:
- Developed lightweight business components and integrated applications using Struts 1.2.
- Designed and developed front-end, middleware, and back-end applications.
- Optimized server-side and client-side validation.
- Worked with the team to help with the transition from Oracle to DB2.
- Developed the global logging module, used across all the modules, using Log4j components (see the sketch after this list).
- Developed the presentation layer for the credit enhancement module in JSP.
- Used Struts 1.2 to implement the Model-View-Controller (MVC) architecture; validations were done on the client side as well as the server side.
- Involved in configuration management using ClearCase.
- Detected and resolved errors/defects in the quality control environment.
- Used iBatis for mapping Java classes to the database.
- Involved in code review and integration testing.
- Used debugging and static-analysis tools such as PMD, FindBugs, and Checkstyle.
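A minimal sketch of a shared logging helper in the Log4j 1.x style of the global logging module described above, written in Scala; the logger name and messages are illustrative, and a log4j.properties configuration on the classpath is assumed.

```scala
import org.apache.log4j.Logger

// Hypothetical global logging helper wrapping Log4j 1.x.
object AppLogger {
  private val logger = Logger.getLogger("com.example.app")

  def info(message: String): Unit = logger.info(message)
  def error(message: String, t: Throwable): Unit = logger.error(message, t)
}

object LoggingDemo {
  def main(args: Array[String]): Unit = {
    // Modules call the shared helper instead of creating their own loggers.
    AppLogger.info("credit enhancement module started")
  }
}
```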
Environment: Java v1.6, J2EE 6, Struts 1.2, iBatis, XML, JSP, CSS, HTML, JavaScript, jQuery, Oracle 10g, DB2, Unix, RAD, ClearCase, WebSphere V8.0 (beta)
Confidential
Oracle PL/SQL Developer
RESPONSIBILITIES:
- Extensively involved in business requirements analysis and translating them into warehouse design.
- Assisted in designing the logical and physical data model using ERwin 3.x, identifying the Fact tables and Dimension tables.
- Improved integrity by identifying and creating relationships between tables.
- Created PL/SQL packages and procedures to implement business rules for loading data (see the sketch after this list).
- Performance optimization, procedure definition, query tuning, and database design.
- Developed Shell Scripts to automate file manipulation.
- Created and executed detailed test plans and scripts to verify software functionality and adherence to business requirements.
- Developed user screens to facilitate business requirements.
- Designed processes to extract and transform data from flat files and other relational sources (SQL Server, Oracle) to load into relational targets (Oracle).
- Developed shell scripts for batch processing, environment set-up, invoking SQL*Loader and checking for file status.
- Documented all the PL/SQL scripts involved in the process.
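A hedged sketch, in Scala over plain JDBC, of how one of the PL/SQL loading procedures above might be invoked; the connection string, credentials, procedure name, and parameter are assumptions introduced for illustration, not taken from the resume.

```scala
import java.sql.DriverManager

object CallProcedureSketch {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders; load_data is a hypothetical
    // stand-in for one of the loading procedures described above.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password")
    try {
      val call = conn.prepareCall("{call load_data(?)}")
      call.setString(1, "2024-01-31") // hypothetical batch-date parameter
      call.execute()
    } finally {
      conn.close()
    }
  }
}
```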
Environment: ERwin 3.0.1, Oracle 9i, Toad for Oracle, MS SQL Server, Microsoft VSS, Microsoft VB.Net, SQL*Loader, Windows 2000, UNIX.