Spark Developer Resume
Cherry Hill, NJ
Summary:
- 6 years of experience in the IT industry, including 3 years as a Hadoop/Spark developer working with Big Data technologies and 3 years as a Java developer.
- Expertise in Hadoop-ecosystem technologies such as Spark, Kafka, ZooKeeper, HDFS, Hive, Pig, Sqoop, MapReduce, HBase, and Apache Phoenix, and in their integration with RDBMSs.
- Extracted, transformed, and loaded data between Spark and various RDBMSs, HDFS, data warehouses, and NoSQL stores (see the sketch following this list).
- Experience processing streaming data on clusters with Kafka and Spark Streaming.
- Managed and monitored clusters with large numbers of nodes using Ambari.
- Strong focus on spark-shell, Spark Core, Spark SQL, and Spark Streaming.
- Programmed in Python and Scala to work with the different Spark contexts and their configuration.
- Experience querying data warehouses such as Hive and Impala.
- Imported and exported data between RDBMSs and HDFS using Sqoop.
- Modeled data and loaded it from Spark and HDFS into RDBMSs such as MySQL and Oracle.
- Used Apache Oozie to control the workflow of applications across the Hadoop ecosystem.
- Hands-on experience with Spark resource management and job scheduling on YARN and Mesos.
- Experience preprocessing data with Pig and writing Pig Latin.
- Built indexes using Lucene and Elasticsearch to support other teams' APIs.
- Knowledge of Kafka tuning; involved in setting Kafka properties.
- Proficient in integrating Apache Phoenix with HBase.
- Proficient with JSON, XML, CSV, Apache Avro, Apache Parquet, and other data formats.
- Experience with the Cloudera, Hortonworks, and AWS EMR distributions of Hadoop.
- Front-end technologies: JavaScript, HTML5, CSS3, JSON, XML, HTTP, Node.js, and jQuery.
- Back-end technologies: Java, J2EE, Spring, Node.js, JDBC, Struts, Apache Tomcat, Servlets, MySQL, SQLite, RESTful APIs, and JUnit.
- Experience with Agile development (using tools such as JIRA) as well as the Waterfall methodology.
- Comfortable in Linux and UNIX environments; expertise in UNIX shell scripting.
- Extensive experience with IDEs such as Eclipse, Atom, Visual Studio, and WebStorm.
- Highly motivated and versatile team player, able to work independently and adapt quickly to emerging technologies.
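A minimal sketch of the RDBMS-to-Parquet loading described above, using Spark 1.5-era APIs (SQLContext); the MySQL host, sales database, orders table, and status column are illustrative placeholders, not details from an actual engagement:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JdbcToParquet {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("JdbcToParquet"))
        val sqlContext = new SQLContext(sc)

        // Read a table from MySQL over JDBC (connection details are placeholders).
        val orders = sqlContext.read.format("jdbc").options(Map(
          "url"      -> "jdbc:mysql://dbhost:3306/sales",
          "driver"   -> "com.mysql.jdbc.Driver",
          "dbtable"  -> "orders",
          "user"     -> "etl_user",
          "password" -> "***")).load()

        // Keep only completed orders and land them in HDFS as Parquet.
        orders.filter(orders("status") === "COMPLETED")
          .write.parquet("hdfs:///warehouse/orders_parquet")

        sc.stop()
      }
    }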
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, HDFS, YARN, MapReduce, Spark 1.5+, Kafka 0.8, Hive, Pig, Flume, HBase, Sqoop, Oozie
Frameworks: Spring MVC, Node.js, Spark 1.5+
Databases: Oracle 11g/12c, MySQL, SQL Server
Methodologies: Agile Scrum, Waterfall
Languages: Java 1.7, Scala 2.11, C++, SQL, HiveQL, Pig Latin, JavaScript, Shell scripting
Web Technologies: Servlet, Tomcat, JSP, JDBC, HTML5, jQuery, JSON, XML, CSS3
Operating Systems: Windows, Linux, UNIX
Other: Eclipse, Maven, JUnit, Avro, Git, JIRA
PROFESSIONAL EXPERIENCE:
Confidential, Cherry Hill, NJ
Spark Developer
Responsibilities:
- Responsible for ETL: loaded data into Spark from RDBMSs (MySQL and Oracle), HDFS, data warehouses (Hive and Impala), and NoSQL stores (HBase and Cassandra).
- Performed transformations and converted data into the formats required, landing the results where downstream teams needed them.
- Developed and deployed a Hadoop cluster on Linux using Spark, Kafka, Oozie, Mesos, YARN, HDFS, RDBMSs, and HBase.
- Administered Spark applications and monitored their performance through Ambari and application logs; configured and tuned Spark parameters.
- Designed the Java-based ingestion of data from web applications into Kafka clusters (see the streaming sketch following this list).
- Processed and structured data using Spark Core, Spark Streaming, and Spark SQL.
- Migrated data from Spark RDDs into HDFS and NoSQL stores such as Cassandra and HBase.
- Used Spark to read from different RDBMSs and NoSQL stores for batch processing.
- Converted data between formats such as JSON, CSV, and XML and Parquet using Spark.
- Manipulated data with the full range of Spark transformations and actions.
- Experienced in Spark performance tuning and optimization, resource management, and job scheduling.
- Developed in Scala for data streaming, processing, and testing.
- Used Sqoop to migrate data from RDBMSs into HDFS and vice versa.
- Extracted different kinds of data, including transactional records, user profiles, and user-behavior data.
- Extensive coding experience in Java and Scala, using Eclipse and Maven.
- Deployed Apache Impala as the query engine for large data volumes in Hadoop.
- Worked with data warehouses such as Hive, Impala, and Oracle, manipulating data with HiveQL and SQL.
- Involved in data modeling in RDBMSs such as Oracle and MySQL.
- Wrote Scala code in cooperation with the back-end team.
- Cooperated with the back-end team to build table indexes using Lucene and Elasticsearch.
- Familiar with version-control tools such as Git.
- Extracted, transformed, and loaded data from HBase using Apache Phoenix.
- Contributed to developing and deploying a data pipeline built on Hadoop-ecosystem technologies.
- Hands-on operation of Cloudera, Hortonworks, and AWS EMR, EC2, and EBS.
- Worked in an Agile environment (with Waterfall where required) and communicated effectively at senior levels of the organization in both management and technical roles.
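A minimal sketch of the Kafka-to-Spark-Streaming ingestion described above, assuming Spark 1.5 with the Kafka 0.8 direct-stream connector (spark-streaming-kafka); the broker list, the "clicks" topic, and the comma-delimited record layout are illustrative placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object ClickStreamJob {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("ClickStreamJob"), Seconds(10))

        // Receiver-less direct stream against a Kafka 0.8 cluster.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("clicks"))

        // Count events per page within each 10-second batch.
        stream.map { case (_, line) => (line.split(",")(0), 1L) }
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }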
Environment: RDBMS, Hadoop, Spark, Kafka, Parquet, Avro, HDFS, YARN, Hive, Impala, Sqoop, HBase, Cassandra, Scala 2.11, Oracle, Java, JSON, SQL, Linux, Cloudera, AWS, Eclipse, Maven, Git.
Confidential, NYC
Hadoop Developer
Responsibilities:
- Responsible for a data pipeline using Kafka, Spark, MapReduce, Pig, Hive, and Sqoop to ingest, transform, and analyze customer behavioral and transactional data.
- Implemented Pig Latin scripts to preprocess and clean the data.
- Experience querying data warehouses such as Hive and Impala.
- Imported data from different RDBMSs into HDFS using Sqoop, and exported it back.
- Used Kafka to process streaming data; configured Kafka as the messaging system for reading from and writing to external data sources.
- Implemented Spark jobs in Python and Scala, using Spark Core, Spark Streaming, and Spark SQL to process data faster than the equivalent Java MapReduce (see the sketch following this list).
- Wrote Java MapReduce jobs to process large volumes of data.
- Used Spark to improve the performance of and optimize existing Hadoop MapReduce algorithms.
- Used HBase as storage for large data volumes, integrated with Apache Phoenix.
- Exported the analyzed data from HDFS to relational databases using Sqoop for the BI team to visualize and generate reports.
- Management experience with clusters of many nodes.
- Scheduled and executed workflows in Oozie to run Hive and Spark jobs.
- Monitored and managed the Hadoop cluster using Ambari.
- Hands-on work with sandboxes such as Cloudera and Hortonworks.
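A minimal sketch of the kind of Spark SQL job that replaced Java MapReduce in this role, assuming Spark 1.5's HiveContext; the behavior_logs table, its columns, and the output path are illustrative placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object BehaviorReport {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("BehaviorReport"))
        val hiveContext = new HiveContext(sc)

        // Aggregate daily event counts per user from a Hive table, replacing
        // an equivalent multi-stage MapReduce job.
        val report = hiveContext.sql(
          """SELECT user_id, to_date(event_time) AS day, COUNT(*) AS events
            |FROM behavior_logs
            |GROUP BY user_id, to_date(event_time)""".stripMargin)

        report.write.mode("overwrite").parquet("hdfs:///reports/daily_behavior")
        sc.stop()
      }
    }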
Environment: Kafka, Spark, Scala 2.11, Python, Hadoop, HDFS, Hive, Pig, Sqoop, HBase, Oozie, Java, MySQL, SQL, Linux, Ambari, Cloudera, Hortonworks.
Confidential Mutual, NY
Hadoop Developer
Responsibilities:
- Responsible for data migration and data analysis in HDFS.
- Handled importing and exporting data between HDFS and RDBMSs using Sqoop.
- Developed MapReduce programs in Java to process data.
- Preprocessed data with Pig, writing Pig Latin scripts.
- Stored and queried data in Hive, writing HiveQL.
- Loaded data into Oracle and MySQL; involved in data modeling.
- Converted data from formats such as JSON, XML, CSV, and TXT to Parquet (see the sketch following this list).
- Involved in designing and creating storage procedures in HBase.
- Communicated effectively at all levels of the organization, with both management and technicians.
- Worked closely with the BI team to realize business logic in the application.
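The bullets above do not name the tool used for the format conversion; as one illustration in the same Scala/Spark style as the earlier sketches (not necessarily the tool used in this role), a JSON-to-Parquet conversion could look like this, with both HDFS paths as placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JsonToParquet {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("JsonToParquet"))
        val sqlContext = new SQLContext(sc)

        // Spark infers the schema from the JSON records in HDFS.
        val events = sqlContext.read.json("hdfs:///raw/events/*.json")

        // Columnar Parquet output makes downstream Hive scans cheaper.
        events.write.parquet("hdfs:///curated/events_parquet")

        sc.stop()
      }
    }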
Environment: Hadoop, Hive, Pig, Sqoop, HBase, Java, JSON, MySQL, Oracle, Linux
Confidential
Back-end / Android Developer
Responsibilities:
- Designed and developed applications using Spring, JDBC, Struts, Servlets, and MySQL.
- Deployed and configured Apache Tomcat as the web server.
- Involved in designing user interfaces with HTML5, CSS3, JavaScript, XML, and jQuery.
- Used Hibernate as the persistence layer, mapping an object-oriented domain model to the database.
- Developed the database schema and SQL queries for querying, inserting, and managing data.
- Used Maven scripts to fetch, build, and deploy the application to the development environment.
Environment: Spring MVC, Hibernate, Struts, JSP, JavaScript, MySQL, Java 7, Servlets, Apache Tomcat, CSS3, HTML5, Eclipse, JDBC.