Spark/Big Data Developer Resume
Irving, TX
PROFESSIONAL SUMMARY:
- Overall 6 years of IT experience, including 2+ years of Hadoop/Big Data experience and 4 years of Java programming, covering the entire Software Development Life Cycle: design, development, implementation, testing, and maintenance of various web-based applications using Java/J2EE technologies.
- Experience working with Cloudera and Hortonworks distributions.
- Experience dealing with large data sets and making performance improvements.
- Experience implementing Spark integrated with the Hadoop ecosystem.
- Experience using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
- Experience converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a sketch follows this summary).
- Experience in using different build tools like SBT and Maven.
- Implemented Spark Streaming for fast data processing.
- Experience in designing and developing Applications in Spark using Scala.
- Skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
- Experience in data cleansing using Spark map and filter functions.
- Experience developing and debugging Hive queries.
- Experience performing read and write operations on the HDFS filesystem.
- Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Good experience importing and exporting data to Hive and HDFS with Sqoop.
- Experience creating Hive tables and loading data from different file formats.
- Experience processing data using HiveQL for data analytics.
- Extended Hive core functionality by writing UDFs for data analysis.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Experience dealing with different file formats such as SequenceFile, Avro, and Parquet.
- Experience using the Producer and Consumer APIs of Apache Kafka (see the producer sketch below).
- Extensively used Apache Flume to collect logs and error messages across the cluster.
- Proficient with version control systems such as GitHub and SVN.
- Worked with MySQL and Oracle 11g databases.
- Strong knowledge of UNIX/Linux commands.
- Strong knowledge of the Python scripting language.
- Working knowledge of Scrum, Agile, and Waterfall methodologies.
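A minimal sketch, in Scala, of the Hive-to-Spark conversion and map/filter cleansing described above; the table, columns, and query are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport() // assumes a configured Hive metastore
      .getOrCreate()
    import spark.implicits._

    // Hive query being converted:
    //   SELECT page, COUNT(*) FROM weblogs WHERE status = 200 GROUP BY page
    val logs = spark.table("weblogs") // hypothetical Hive table

    // DataFrame form: filter (cleansing) then aggregate
    val pageCounts = logs
      .filter($"status" === 200)
      .groupBy($"page")
      .count()

    // The same logic expressed as RDD map/filter transformations
    val rddCounts = logs.rdd
      .filter(row => row.getAs[Int]("status") == 200)
      .map(row => (row.getAs[String]("page"), 1L))
      .reduceByKey(_ + _)

    pageCounts.show()
    spark.stop()
  }
}
```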
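And a minimal sketch of the Kafka Producer API in Scala; the broker address, topic name, and message are hypothetical:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object WeblogProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one log line to a hypothetical "weblogs" topic
      producer.send(new ProducerRecord[String, String]("weblogs", "host-01", "GET /index.html 200"))
    } finally {
      producer.close()
    }
  }
}
```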
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache Zookeeper, Cassandra.
Hadoop Distributions: Cloudera, Hortonworks.
Programming Languages: Scala, Python, Java.
Web Technologies: Angular 2, JavaScript.
Build Tools: Maven, SBT.
Version Control Tools: Git, SVN.
Cloud: AWS.
Databases: MySQL, Oracle 10g/11g.
NOSQL Databases: HBase, Cassandra.
Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), macOS.
Development Tools: IntelliJ IDEA, Eclipse, NetBeans.
PROFESSIONAL EXPERIENCE:
Confidential, Irving, TX
Spark/Big Data Developer
Responsibilities:
- Worked on the Cloudera distribution, CDH 5.13.
- Ingested weblog data into HDFS using Kafka.
- Processed JSON data with Spark SQL (see the sketch following this project).
- Cleansed the data into the desired format.
- Wrote Spark SQL DataFrames to Parquet files.
- Tuned Spark jobs for optimal efficiency.
- Wrote Scala functions, methods, constructors, and traits.
- Created Hive tables to load the transformed data.
- Implemented partitioning and bucketing in Hive for easy data classification.
- Analyzed data by writing HiveQL queries for faster data processing.
- Worked with Sqoop to load data into an RDBMS.
- Created a data pipeline using Oozie that runs on a daily basis.
- Persisted metadata to HDFS for further data processing.
- Loaded data from Linux file systems to HDFS and vice versa.
- Created Hive tables with partitioning and bucketing, wrote UDFs, and fine-tuned queries in Hive.
- Loaded the cleansed data into Hive tables and performed analysis based on the requirements.
- Performed sort, join, aggregation, filter, and other transformations on the datasets.
- Utilized Agile and Scrum methodology to help manage and organize a team of developers, with regular code review sessions.
Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Flume, Kafka, Agile Methodology, Cloudera, Cassandra.
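A minimal sketch, in Scala, of the JSON-to-Parquet flow this project describes: reading JSON with Spark SQL, cleansing with filters, and writing Parquet into a partitioned Hive table. The paths, column names, and table name are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lower

object JsonToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-parquet-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read raw weblog JSON from a hypothetical HDFS path
    val raw = spark.read.json("hdfs:///data/raw/weblogs/")

    // Cleanse: drop incomplete records and normalize the URL field
    val cleansed = raw
      .filter($"url".isNotNull && $"event_date".isNotNull)
      .withColumn("url", lower($"url"))

    // Write Parquet partitioned by date into a hypothetical Hive table
    cleansed.write
      .mode("overwrite")
      .partitionBy("event_date")
      .format("parquet")
      .saveAsTable("analytics.weblogs_clean")

    spark.stop()
  }
}
```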
Confidential, Farmington, CT
Big Data/Hadoop Developer
Responsibilities:
- Worked on the Hortonworks HDP Enterprise distribution.
- Worked on large sets of structured and semi-structured data.
- Copied large volumes of data from Amazon S3 buckets to HDFS using Flume.
- Used Spark SQL with Scala to create DataFrames and perform transformations on them.
- Worked with Avro files using Spark SQL.
- Wrote Spark SQL UDFs in Scala.
- Performed data aggregation operations using Spark SQL queries.
- Configured Spark Streaming in Scala to receive data from Kafka and store the streamed data to HDFS (see the sketch following this project).
- Implemented Hive partitioning and bucketing for data analytics.
- Worked on performance and tuning operations in Hive.
- Extensively used Maven as the build tool.
- Used Git as the version control system.
- Worked with Sqoop to export data from Hive to S3 buckets.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
Environment: Apache Spark, Apache Flume, Amazon S3, Apache Sqoop, Apache Oozie, Apache Kafka, Hive.
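A minimal sketch, in Scala, of the Kafka-to-HDFS Spark Streaming flow described above, using the Kafka 0.10 direct stream API; the broker, topic, group id, and output path are hypothetical:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092", // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "weblog-consumers",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Persist each non-empty batch of message values to HDFS
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///data/streams/weblogs/${System.currentTimeMillis}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```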
Confidential
Java Developer
Responsibilities:
- Involved in requirements collection and analysis.
- Developed front-end screens using JSP, Struts, and HTML.
- Implemented persistent data management using JDBC.
- Involved in problem analysis and coding.
- Designed and coded screens involving complex calculations on various data windows accessing different tables in the Oracle database.
- Developed screens for the Patient Registration, Inventory of Medicines, Billing of Services, and Asset modules.
- Used the JSF framework to develop user interfaces with JSF UI components, validators, events, and listeners.
- Created several pieces of the JSF engine, including value bindings, bean discovery, method bindings, event generation, and component binding.
- Involved in unit testing, integration testing, SoapUI testing, smoke testing, system testing, and user acceptance testing of the application.
- Wrote stored procedures and database triggers.
- Involved in debugging and troubleshooting production and environment issues.
- Performed unit testing.
Environment: JSP, Servlets, SQL, PL/SQL, WebSphere Application Server, Oracle 9i, JavaScript, Windows XP, HTML, UNIX shell scripting, JUnit.
Confidential
Java Developer
Responsibilities:
- Involved in the complete development, testing, and maintenance of the application.
- Designed UI screens using Servlets, JavaScript, CSS, Ajax, DHTML, XSL, XHTML, and HTML.
- Implemented patterns such as Singleton, Factory, Facade, Prototype, Decorator, Business Delegate, and MVC.
- Created session beans to handle the business logic associated with the Inspection.
- Developed and deployed various entity EJBs and session EJBs.
- Involved in the object-oriented requirements analysis phase of the project to gather business logic requirements.
- Developed the GUI using JSP.
- Coded JSP pages for the External Application (EXA) using a custom tag library that creates the standard tags used in the application.
- Designed the application based on MVC architecture.
- Developed session beans to implement the core business logic.
- Designed use case diagrams, class diagrams, and sequence diagrams using Microsoft Visio.
- Coded helper classes for better data exchange between the different layers.
- Provided production support by fixing bugs.
- Performed unit testing, system testing, and user acceptance testing.
- Used CVS for version control.
Environment: Java, Servlets, JSP, CSS3, XML, DHTML, EJB, JavaScript, AJAX, DB2, Web Services, WebSphere Application Server, Log4j, CVS, JUnit, IBM RAD, UML.