Spark Developer / Big Data Developer Resume
Corvallis, Oregon
SUMMARY
- Around 8 years of IT experience, including around 4 years of comprehensive experience as a Hadoop Developer.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce, including MRv1 and MRv2 (YARN) concepts.
- Proficient in Big Data technologies such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, and Spark for data storage and analysis.
- Good experience working with various Hadoop distributions such as Cloudera, Hortonworks, and Apache Hadoop.
- Knowledge of installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume (a minimal streaming sketch follows this list).
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in building and maintaining multiple Hadoop clusters (prod, dev, etc.) of different sizes and configurations, and setting up the rack topology for large clusters.
- Experience in installation, configuration, support, and management of Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Experience in setting up monitoring infrastructure for Hadoop clusters using Nagios and Ganglia.
- Development expertise with RDBMSs such as Oracle, Sybase, Teradata, Netezza, and MS SQL Server.
- Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JAXB, JSON, XML, XSLT, XSD, JMS, WSDL, WADL, REST, SOAP web services, CXF, Groovy, Grails, Jersey, Gradle, and EclipseLink.
- Experience with source control repositories such as SVN and GitHub.
- Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
- Strong hands-on experience with UNIX commands and deploying applications to servers.
- Loaded datasets into Hive for ETL (Extract, Transform, Load) operations.
- Excellent problem-solving, analytical, communication, and interpersonal skills.
- Experience in database design, using PL/SQL to write stored procedures, functions, and triggers.
- Experience working with build tools such as Maven and Ant.
- Experienced in both Waterfall and Agile (Scrum) development methodologies.
- Effective leadership qualities with good skills in strategy, business development, client management, and project management.
- Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
- Strong motivation, initiative, and project management attributes.
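As a minimal illustration of the Spark Streaming and Kafka experience noted above, the following Scala sketch reads a Kafka topic with Structured Streaming. The broker address, topic name, and console sink are illustrative assumptions, and the spark-sql-kafka-0-10 connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .getOrCreate()

    // Read a stream of events from a Kafka topic; "broker:9092" and
    // "events" are placeholder values, not taken from the resume.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    // Write the raw values to the console sink for inspection.
    val query = events.writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```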
TECHNICAL SKILLS
Languages: C, C++, Java/J2EE, Scala, Python, SQL, HiveQL, Pig Latin
Hadoop Ecosystem: HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, Apache Ignite, Avro, AWS
Web Technologies: Servlets, JSP, J2EE, JDK, JDBC
Framework: Core Spring, Spring DAO, Spring MVC, Hibernate
Web/Application Servers: Jetty, Apache Tomcat
Scripting Languages: JavaScript, jQuery, AJAX, JSTL, CSS
Markup Languages: HTML, XML
XML: DOM, SAX, DTD, XSD, SOAP, REST, JAXB, XSL, XSLT
Databases: Oracle, MySQL, MS SQL Server 2005, Derby, MS Access, Apache Cassandra
OS: MS-Windows 95/98/NT/2000/XP/7, Linux, Unix, Solaris 5.1
Methodologies: OOP, Agile, Scrum, Extreme Programming
Version Control Tools: SVN, CVS, Git
Tools: Eclipse, Maven, Ant, JUnit, TestNG, Jenkins, SoapUI, PuTTY, Log4j, Bugzilla
ETL Tools: Ab Initio GDE 1.14/1.15/3.0.4, Co>Operating System 2.14/2.15/3.16
PROFESSIONAL EXPERIENCE
Confidential, Corvallis, Oregon
Spark Developer / Big Data Developer
RESPONSIBILITIES:
- Strong experience working with Spark SQL, DataFrames, and pair RDDs.
- Handled large DataFrames using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
- Wrote Spark UDFs, a Spark SQL feature that helps keep query logic concise (see the sketch after this list).
- Created clusters using Databricks and maintained their performance.
- Actively applied Spark tuning techniques, such as caching RDDs and increasing the number of executors per node.
- Worked with different file formats and compression codecs (JSON, Parquet, Snappy) and wrote data into S3 with the desired partitioning.
- Created notebooks in Scala for component tests and system integration tests.
- Created workflows using the Databricks REST APIs for running production data.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Knowledge of orchestrating Databricks jobs with Apache Airflow.
- Experience working with the CloudBerry tool.
- Created daily scheduled jobs using the Jenkins CLI and cron jobs.
- Created metric files (JSON format) for the Kibana dashboards.
- Developed Unix shell scripts to automate Spark SQL jobs.
- Developed Spark transformations and Spark jobs using Scala.
- Experience working with GitHub, Git Bash, gitk, etc.
- Experience working with Rally to provide test cases and test plans for the daily CI runs, and updating defects, user stories, etc. using the Rally APIs.
- Experience working with Spark using Scala in a functional style and improving performance.
- Used Spark SQL to query the headers and learn about the composition of the data, allowing comparison of data from various sources.
- Prepared flowcharts and diagrams to illustrate the sequence of steps programs must follow and to describe the logical operations involved.
- Involved in gathering business requirements and preparing the information template used to identify data elements for future reporting needs.
- Defined project deliverables and critical target dates to be reflected in the project plan.
- Used Amazon Redshift to load data from the cloud.
- Maintained documentation supporting system configuration, training, and user experience.
- Created playbooks and wiki pages for the data pipeline.
- Responsible for providing solutions.
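A hedged Scala sketch of the Spark SQL UDF and the partitioned, Snappy-compressed Parquet write to S3 described above. The column names, tiering logic, and bucket path are hypothetical, introduced only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfAndPartitionedWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("UdfSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical input: (customerId, country, amount) rows.
    val orders = Seq(
      ("c1", "US", 120.0),
      ("c2", "DE", 80.0)
    ).toDF("customerId", "country", "amount")

    // A Spark SQL UDF that buckets amounts into tiers.
    val tier = udf((amount: Double) => if (amount >= 100) "high" else "low")
    val withTier = orders.withColumn("tier", tier($"amount"))

    // Write Snappy-compressed Parquet to S3, partitioned by country.
    // The bucket path is a placeholder, not taken from the resume.
    withTier.write
      .option("compression", "snappy")
      .partitionBy("country")
      .parquet("s3a://example-bucket/orders/")
  }
}
```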
Environment: Databricks, Spark, Scala, Python, AWS, wiki, Rally, SharePoint, Jenkins, Git, CloudBerry, and Amazon Redshift.
Confidential, St. Louis, Missouri
Spark / Hadoop Developer
RESPONSIBILITIES:
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Streamed real-time data using Spark with Kafka.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
- Explored Spark for improving the performance and optimizing the existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, and pair RDDs (see the sketch after this list).
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in the process of Cassandra data modeling and building efficient data structures.
- Trained and mentored analysts and the test team on the Hadoop framework, HDFS, MapReduce concepts, and the Hadoop ecosystem.
- Responsible for architecting Hadoop clusters.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Performed data analysis using Hive and Pig.
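One way the RDD-level optimizations mentioned above (pair RDDs, caching) might look, as a minimal Scala sketch; the data and the aggregation are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PairRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PairRddSketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (userId, clickCount) events.
    val clicks = sc.parallelize(Seq(("u1", 1), ("u2", 3), ("u1", 2)))

    // reduceByKey aggregates map-side before shuffling, the kind of
    // optimization that typically replaces a groupByKey-based algorithm.
    val totals = clicks.reduceByKey(_ + _)

    // Cache the result because more than one action reuses it.
    totals.cache()
    println(totals.count())
    totals.collect().foreach(println)
  }
}
```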
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Spark, Cloudera Manager, Storm, Cassandra, Pig, Sqoop, PL/SQL, MySQL, Windows, Hortonworks, Oozie, HBase
Confidential, Kansas City, Missouri
Spark / Hadoop Developer
RESPONSIBILITIES:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Explored Spark and Kafka, along with other open-source projects, to create a real-time analytics framework.
- Developed wrappers using shell scripting for Hive, Pig, Sqoop, and Scala jobs.
- Developed Unix shell scripts to automate Spark SQL jobs.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Participated in the development and implementation of the Hortonworks environment.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Involved in implementing and integrating NoSQL databases such as HBase and Apache Cassandra.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system into HDFS.
- Conducted POCs for different use cases on the AWS platform and documented the results.
- Installed and configured Hive, and wrote Hive UDFs in Java and Python.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experience in schema definition.
- Experience migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list).
- Wrote Pig Latin scripts to process the data, and wrote UDFs in Java and Python.
- Wrote MapReduce programs in Java to achieve the required output.
- Optimized application performance for the Cassandra cluster.
- Worked with the infrastructure and admin teams in designing, modeling, sizing, and configuring a 15-node Hadoop cluster on AWS EC2.
- Wrote Hive queries for data analysis to meet the business requirements.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; handled cluster coordination through ZooKeeper.
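A minimal sketch of migrating the classic MapReduce word count into Spark transformations with Scala, as referenced above: flatMap/map stand in for the mapper and reduceByKey for the reducer. The HDFS paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object WordCountMigrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    // The MapReduce word count expressed as Spark transformations.
    // Input and output paths are illustrative only.
    sc.textFile("hdfs:///data/input")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///data/output")
  }
}
```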
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hortonworks Hadoop distribution, Scala, AWS, SQL, Pig, ZooKeeper, Spark, Sqoop, Flume, Teradata, CentOS, Servlets, JDBC, JSP, JSTL, JPA, Apache, JavaScript, Eclipse, CVS, CSS, XML, JSON
Confidential, Santa Clara, CA
Java Developer
RESPONSIBILITIES:
- Actively involved in Analysis, Detail Design, Development, System Testing and User Acceptance Testing.
- Developed an intranet web application using J2EE architecture, using JSP to design the user interfaces, JSP tag libraries to define custom tags, and JDBC for database connectivity.
- Implemented the Struts (MVC) framework: developed the ActionServlet and ActionForm beans, configured the struts-config descriptor, and implemented the Validator framework.
- Extensively involved in database design work with Oracle Database and building the application on the J2EE architecture.
- Integrated messaging with the MQSeries classes for JMS, which provide an XML message-based interface; the application uses the JMS publish-subscribe model.
- Developed the EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
- Evaluated and worked with EJB's container-managed persistence strategy.
- Used web services (WSDL and SOAP) to get loan information from a third party, and used SAX and DOM XML parsers for data retrieval.
- Experienced in writing the DTD for document-exchange XML; generated, parsed, and displayed XML in various formats using XSLT and CSS.
- Used XPath 1.0 for selecting nodes and XQuery to extract and manipulate data from XML documents (see the sketch after this list).
- Coded, tested, and deployed the web application using RAD 7.0 and WebSphere Application Server 6.0.
- Used JavaScript for validating client-side data.
- Wrote unit tests for the implemented bean code using JUnit.
- Extensively worked on UNIX Environment.
- Exchanged data in XML format, which aids interoperability with other software applications.
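A small sketch of the XPath 1.0 node selection described above, written here in Scala against the standard javax.xml APIs; the loan document is a made-up stand-in for the third-party payload.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object XPathSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical loan document, standing in for the real payload.
    val xml = """<loans><loan id="1"><amount>5000</amount></loan></loans>"""
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))

    // XPath 1.0 expression selecting all amount nodes.
    val xpath = XPathFactory.newInstance().newXPath()
    val nodes = xpath
      .evaluate("/loans/loan/amount", doc, XPathConstants.NODESET)
      .asInstanceOf[NodeList]

    for (i <- 0 until nodes.getLength)
      println(nodes.item(i).getTextContent) // prints 5000
  }
}
```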
Environment: Struts 2, Rational Rose, JMS, EJB, JSP, RAD 7.0, WebSphere Application Server 6.0, XML parsers, XSL, XQuery, XPath 1.0, HTML, CSS, JavaScript, IBM MQSeries, ANT, JUnit, JDBC, Oracle, Unix, SVN.
Confidential
Java Developer
RESPONSIBILITIES:
- Developed lightweight business components and integrated applications using Struts 1.2.
- Designed and developed front-end, middleware, and back-end applications.
- Optimized server-side and client-side validation.
- Worked with the team to help with the transition from Oracle to DB2.
- Developed the global logging module, used across all the modules, using Log4j components (see the sketch after this list).
- Developed the presentation layer for the credit enhancement module in JSP.
- Used Struts 1.2 to implement the Model-View-Controller (MVC) architecture; validations were done on the client side as well as the server side.
- Involved in configuration management using ClearCase.
- Detected and resolved errors/defects in the quality control environment.
- Used iBatis for mapping Java classes to the database.
- Involved in code review and integration testing.
- Used debugging and static-analysis tools such as PMD, FindBugs, and Checkstyle.
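A minimal sketch of a shared logging helper in the Log4j 1.x style of the global logging module described above, written in Scala; the logger name and messages are illustrative, and a log4j.properties configuration on the classpath is assumed.

```scala
import org.apache.log4j.Logger

// Hypothetical global logging helper wrapping Log4j 1.x.
object AppLogger {
  private val logger = Logger.getLogger("com.example.app")

  def info(message: String): Unit = logger.info(message)
  def error(message: String, t: Throwable): Unit = logger.error(message, t)
}

object LoggingDemo {
  def main(args: Array[String]): Unit = {
    // Modules call the shared helper instead of creating their own loggers.
    AppLogger.info("credit enhancement module started")
  }
}
```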
Environment: Java v1.6, J2EE 6, Struts 1.2, iBatis, XML, JSP, CSS, HTML, JavaScript, jQuery, Oracle 10g, DB2, Unix, RAD, ClearCase, WebSphere V8.0 (beta)
Confidential
Oracle PL/SQL Developer
RESPONSIBILITIES:
- Extensively involved in business requirements analysis and translating them into warehouse design.
- Assisted in designing the logical and physical data model using ERwin 3.x, identifying the Fact tables and Dimension tables.
- Improved integrity by identifying and creating relationships between tables.
- Created PL/SQL packages and procedures to implement business rules for loading data (see the sketch after this list).
- Performance optimization, procedure definition, query tuning, and database design.
- Developed Shell Scripts to automate file manipulation.
- Created and executed detailed test plans and scripts to verify software functionality and adherence to business requirements.
- Developed user screens to facilitate business requirements.
- Designed processes to extract and transform data from flat files and other relational sources (SQL Server, Oracle) to load into relational targets (Oracle).
- Developed shell scripts for batch processing, environment set-up, invoking SQL*Loader and checking for file status.
- Documented all the PL/SQL scripts involved in the process.
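A hedged sketch, in Scala over plain JDBC, of how one of the PL/SQL loading procedures above might be invoked; the connection string, credentials, procedure name, and parameter are assumptions introduced for illustration, not taken from the resume.

```scala
import java.sql.DriverManager

object CallProcedureSketch {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders; load_data is a hypothetical
    // stand-in for one of the loading procedures described above.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password")
    try {
      val call = conn.prepareCall("{call load_data(?)}")
      call.setString(1, "2024-01-31") // hypothetical batch-date parameter
      call.execute()
    } finally {
      conn.close()
    }
  }
}
```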
Environment: ERwin 3.0.1, Oracle 9i, Toad for Oracle, MS SQL Server, Microsoft VSS, Microsoft VB.Net, SQL*Loader, Windows 2000, UNIX.