Hadoop/Spark Developer Resume
Kansas City, Missouri
SUMMARY
- 4+ years of professional IT experience across the full project lifecycle, including requirements analysis, design, development, testing, deployment, and production support of software applications using J2EE and Big Data technologies.
- Experience analyzing data using the Hadoop ecosystem, including HDFS, Hive, HiveQL, Spark, Spark Streaming, Spark SQL, MLlib, Kafka, HBase, and ZooKeeper.
- Involved in converting Hive/SQL queries into Spark transformations using RDDs and Scala.
- Migrated traditional MapReduce jobs to Spark to improve data processing speed.
- Experienced in WAMP (Windows, Apache, MySQL) and LAMP (Linux, Apache, MySQL) architectures.
- Experience working with the Hortonworks Hadoop stack and the Amazon Web Services (AWS) suite.
- Very good understanding of Hadoop architecture and its daemons: NameNode, DataNode, ResourceManager, NodeManager, JobTracker, and TaskTracker.
- Good knowledge of and experience in developing SOAP and REST APIs using frameworks such as Django and Flask.
- Built data warehousing and data mart solutions on Teradata and Big Data platforms.
- Experienced in developing web services in Java.
- Experience developing web applications and implementing the Model-View-Controller (MVC) architecture using server-side frameworks such as Django, Flask, and Pyramid.
- Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, HCatalog, Pig, and Flume.
- Performed data integration between databases and into HDFS, Hive, and HBase using Talend Integration and Talend Big Data tools.
- Experience using DataStage database stages such as the Oracle Connector, Teradata Connector, and ODBC Connector.
- Experience designing, compiling, testing, scheduling, and running DataStage jobs.
- Experienced in developing MapReduce programs on Apache Hadoop to process Big Data.
- Expertise in back-end procedure development for RDBMS database applications using SQL and PL/SQL.
- Good knowledge of Big Data on Azure, including Azure Data Lake Store and Azure Data Factory.
- Good knowledge of Informatica; connected Informatica to Oracle and used various transformations to perform ETL tasks.
- Hands-on experience writing SQL queries, stored procedures, functions, and triggers.
- Experienced in using Java technologies in business, web, and client-server environments, including the Java platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC).
- Experience writing complex SQL queries involving inner and outer joins across multiple tables.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with strong problem-solving and leadership skills.
TECHNICAL SKILLS
Languages: C, Java, Scala
Hadoop Distribution: Hortonworks, Cloudera
Hadoop Ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HiveQL, HBase, Sqoop, Flume, Oozie, ZooKeeper, Cassandra, Kafka, Spark, Spark Streaming, Spark SQL, and Storm.
Technologies: JSP, J2EE, JDBC, Hibernate, Spring, Ajax, RESTful web services
Development Tools (IDEs): Eclipse, NetBeans, IntelliJ IDEA
Web/Application Servers: Tomcat, WebLogic, IBM WebSphere, JBoss
Database: Oracle 11g, Microsoft SQL Server 2008, MySQL, HBase
Platforms: Windows, Unix, Linux
Testing Tools: JUnit, JIRA
Version Control Tools: Git, GitHub
Methodologies: Agile (Scrum), Waterfall
Build Tools: Maven, Gradle
PROFESSIONAL EXPERIENCE
Confidential, Kansas City, Missouri
Hadoop/Spark Developer
Responsibilities:
- Designed and deployed Hadoop clusters and Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, and Kafka.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Performed batch processing of data sources using Apache Spark.
- Developed analytical components using Scala, Spark, YARN, and Spark Streaming.
- Worked with NoSQL databases such as HBase, MongoDB, and Cassandra.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
- Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Developed Spark scripts using Scala shell commands as required.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
- Expertise in data modeling and data warehouse design and development.
Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Java, SQL Scripting and Linux Shell Scripting.
Confidential, Naperville, IL
Spark/Java Developer
Responsibilities:
- Developed a streaming application using Spark Streaming (2.x); the end-to-end data flow included NiFi, Kafka, Spark Streaming, and HBase.
- Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HBase.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Streamed JSON data from multiple Kafka topics and loaded it into HBase in real time.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Developed real-time data validation checks on the streaming data before loading it into HBase tables.
- Worked on HBase row key design and table design.
- Generated daily reports on the HBase tables using the Spark HBase API, including audit batch reports and data validation reports.
- Involved in creating time series data from the daily data to support further time series analysis and the application of machine learning techniques.
Environment: Spark, Spark Streaming, Java, Scala, HBase, Hive, Kafka, IntelliJ, NiFi, Zeppelin
Confidential, Ann Arbor, MI
Software Engineer
Responsibilities:
- Designed and deployed Hadoop clusters and Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, and Kafka.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Performed batch processing of data sources using Apache Spark.
- Developed analytical components using Scala, Spark, YARN, and Spark Streaming.
- Worked with NoSQL databases such as HBase, MongoDB, and Cassandra.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
- Involved in data extraction from Oracle, flat files, and XML files using Talend, with Java as the back-end language.
- Wrote UNIX shell scripts, in combination with Informatica sessions, to process source files and load them into the staging database.
- Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Developed Spark scripts using Scala shell commands as required.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
- Expertise in data modeling and data warehouse design and development.
Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Java, SQL Scripting, Oracle and Linux Shell Scripting.
Confidential
Software Engineer
Responsibilities:
- Implemented server-side programs using Servlets and JSP.
- Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
- Implemented MVC using the Struts framework.
- Involved in implementing the DAO pattern for database access and used the JDBC API extensively.
- Used XML web services to transfer data between applications and to retrieve credit information from the credit bureau.
- Used XML with DTDs referenced from the XML files. Used the JAXB API to bind XML schemas to Java classes.
- Used JMS-MQ Bridge to send messages securely, reliably and asynchronously to WebSphere MQ, which connects to the legacy systems.
- Tested application functionality with JUnit and StrutsTestCase.
- Developed the GUI using JSF and Java Swing.
- Developed a logging module using Log4J to create log files for debugging and tracing the application.
- Used CVS for version control.
- Used Ant extensively as the build tool. Deployed the applications on IBM WebSphere Application Server.
- Handled the database access by implementing Controller Servlet.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements, called from Servlets, for database access.
- Used Log4J to log application errors. Wrote test cases using JUnit.
Environment: Java 1.4, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Eclipse, WebLogic, Struts, WebSphere MQ 5.3, MVC, Core Java, Servlet 2.2, JSP 2.0, JDBC, PL/SQL, XML Web Services, XML DTD, Apache Tomcat, ASP, Spring 1.0.2, SOAP, WSDL, Windows 2000, Oracle 9i, JUnit, CVS, Ant 1.5, and Log4J.
