Big Data and Spark Developer Resume
Dallas, TX
SUMMARY:
- 7+ years of IT experience in analysis, design, development, implementation, maintenance, and support, including developing strategies for deploying big data technologies to efficiently solve large-scale data processing requirements.
- Around 5 years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Spark, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
- Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in writing custom UDFs in Java and Scala to extend Hive and Pig functionality (a minimal UDF sketch follows this summary).
- Experience with Cloudera and Hortonworks distributions.
- Over 3 years of experience with Spark, Scala, HBase, and Kafka.
- Developed analytical components using Kafka, Scala, Spark, HBase, and Spark Streaming.
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Good working knowledge of Hortonworks administration and security components such as Apache Ranger, Knox Gateway, and High Availability.
- Implemented a Hadoop backup strategy covering Hive, HDFS, HBase, Oozie, etc.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS).
- Involved in creating an HDInsight cluster in the Confidential Azure portal; also created Event Hubs and Azure SQL databases.
- Worked on a clustered Hadoop environment on Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
- Built a real-time pipeline for streaming data using Event Hubs / Confidential Azure Queue and Spark Streaming.
- Loaded aggregated data into HBase for reporting purposes.
- Read data from HBase into Spark to perform joins across tables.
- Created HBase tables for validation, audit, and offset management.
- Created logical views instead of tables to improve Hive query performance.
- Involved in developing Hive DDL to create, alter, and drop Hive tables.
- Good knowledge of Hive optimization techniques such as vectorization and column-based optimization.
- Wrote Oozie workflows to invoke jobs at predefined intervals.
- Expert in scheduling Oozie coordinators on input data events, so that a workflow starts as soon as its input data is available.
- Also working on a POC with Kafka and NiFi to pull real-time events into the Hadoop cluster.
- Exploring Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Experienced in managing Hadoop clusters using Hortonworks Ambari.
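A minimal sketch of the kind of custom Hive UDF mentioned above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name and masking rule are illustrative assumptions, not actual project code.

```scala
// Hypothetical Hive UDF in Scala: masks all but the last four characters of an
// SSN-like value. Hive calls evaluate() once per row; nulls pass through.
import org.apache.hadoop.hive.ql.exec.UDF

class MaskSsn extends UDF {
  def evaluate(ssn: String): String =
    if (ssn == null || ssn.length <= 4) ssn
    else "*" * (ssn.length - 4) + ssn.takeRight(4) // "123456789" -> "*****6789"
}
```

Packaged as a JAR, a class like this can be registered in Hive with CREATE TEMPORARY FUNCTION and then called from HiveQL like any built-in function.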
RELEVANT EXPERIENCE:
BIG DATA AND SPARK DEVELOPER
Confidential, Dallas, TX
Responsibilities:
- Developed a framework to encrypt sensitive data (SSN, account number, etc.) in all kinds of datasets and moved datasets from one S3 bucket to another (see the first sketch after this section).
- Processed dataset formats such as text, Parquet, Avro, fixed width, ZIP, GZ, JSON, and XML.
- Developed a framework to check the data quality of datasets against schemas defined in the cloud. Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in AWS S3 using Scala (see the second sketch after this section).
- Experienced in handling large datasets during the ingestion process itself using partitioning, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations.
- Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Used Spark Streaming APIs to perform the required transformations and actions on the learner data model, which receives data from Kafka in near real time.
- Worked on migrating MapReduce programs to Spark transformations, and used File Broker to schedule workflows that run Spark jobs to transform data on a recurring schedule.
- Experience developing and deploying shell scripts for automation, notification, and monitoring.
- Extensively used Apache Kafka, Apache Spark, HDFS, and Apache Impala to build near-real-time data pipelines that ingest, transform, store, and analyze clickstream data to provide a better personalized user experience.
- Worked on performance tuning of Spark applications.
- Worked with Apache Spark SQL and DataFrame functions to perform data transformations and aggregations on complex semi-structured data.
- Hands-on experience creating RDDs and applying transformations and actions while implementing Spark applications.
Environment: AWS, Spark, Hive, Spark SQL, Kafka, EMR, Snowflake, Nebula, Python, Scala, Maven, Jupyter Notebook, Visual Studio, UNIX shell scripting.
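A minimal sketch of the sensitive-data masking and bucket-to-bucket move from the first bullet above, assuming Spark 2 on EMR with the s3a connector. The bucket paths, column names, and the choice of SHA-256 hashing (a one-way masking stand-in rather than reversible encryption) are assumptions, not the actual framework.

```scala
// Hypothetical sketch: read a dataset from a source S3 bucket, hash sensitive
// columns, and write the result to a target bucket. Names are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sha2}

object MaskAndMove {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mask-and-move").getOrCreate()

    val raw = spark.read.parquet("s3a://source-bucket/customers/")

    // Replace sensitive columns with a one-way SHA-256 hash before publishing.
    val masked = raw
      .withColumn("ssn", sha2(col("ssn"), 256))
      .withColumn("account_number", sha2(col("account_number"), 256))

    masked.write.mode("overwrite").parquet("s3a://target-bucket/customers/")
    spark.stop()
  }
}
```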
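A minimal sketch of the Kafka-to-S3 flow from the streaming bullet above, assuming Spark 2 Structured Streaming with the spark-sql-kafka-0-10 connector; broker addresses, the topic name, and S3 paths are placeholders.

```scala
// Hypothetical sketch: subscribe to a Kafka topic and land the raw events in
// S3 as Parquet, checkpointing offsets so the stream can recover after failure.
import org.apache.spark.sql.SparkSession

object ClickstreamToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("clickstream-to-s3").getOrCreate()

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "clickstream")
      .load()
      .selectExpr("CAST(value AS STRING) AS event_json", "timestamp")

    val query = events.writeStream
      .format("parquet")
      .option("path", "s3a://landing-bucket/clickstream/")
      .option("checkpointLocation", "s3a://landing-bucket/checkpoints/clickstream/")
      .start()

    query.awaitTermination()
  }
}
```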
SPARK DEVELOPER (AWS and KAFKA)
Confidential, Plano, Texas
Responsibilities:
- Developed a data pipeline using Event Hubs, Spark, Hive, Pig, and Azure SQL Database to ingest customer behavioral data and financial histories into an HDInsight cluster for analysis.
- Involved in creating an HDInsight cluster in the Confidential Azure portal; also created Event Hubs and Azure SQL databases.
- Worked on a clustered Hadoop environment on Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
- Spark Streaming collects this data from Event Hubs in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in the Azure SQL database.
- Used Pig to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in the Azure SQL database.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, YARN, Oozie, and ZooKeeper, as well as the Hadoop architecture and its components.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experienced with Spark Streaming to ingest data into the Spark engine.
- Imported data from different sources such as Event Hubs and Cosmos into Spark RDDs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the first sketch after this section).
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Worked extensively on the Spark SQL and Spark Streaming modules and used Scala to write code for all Spark use cases.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Involved in converting JSON data into DataFrames and storing it in Hive tables (see the second sketch after this section).
- Experienced with AzCopy, Livy, Windows PowerShell, and cURL to submit Spark jobs on the HDInsight cluster.
- Analyzed the SQL scripts and designed the solution to implement them using Scala.
Environment: Azure, Spark, Hive, Spark SQL, Kafka, Hortonworks, JBoss Drools, Pig, Oozie, HBase, Python, Scala, Maven, Jupyter Notebook, Visual Studio, UNIX shell scripting.
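A minimal sketch of converting a Hive aggregation into a Spark pair-RDD transformation in Scala, as referenced in the bullets above, assuming the Spark 2.x SparkSession API with Hive support; the database, table, and column names and the Double amount type are placeholders.

```scala
// Hypothetical sketch: the Hive query
//   SELECT customer_id, AVG(amount) FROM finance.transactions GROUP BY customer_id;
// rewritten as a pair-RDD aggregation.
import org.apache.spark.sql.SparkSession

object HiveQueryToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-query-to-spark")
      .enableHiveSupport()
      .getOrCreate()

    val pairs = spark.table("finance.transactions")
      .select("customer_id", "amount")
      .rdd
      .map(row => (row.getString(0), row.getDouble(1)))

    // Accumulate (sum, count) per key, then derive the average.
    val avgByCustomer = pairs
      .aggregateByKey((0.0, 0L))(
        { case ((sum, n), amount) => (sum + amount, n + 1) },
        { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) })
      .mapValues { case (sum, n) => sum / n }

    avgByCustomer.take(20).foreach(println)
    spark.stop()
  }
}
```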
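A minimal sketch of the JSON-to-Hive step mentioned above, assuming Spark 2.x with Hive support on HDInsight; the wasb:// path and table name are placeholders.

```scala
// Hypothetical sketch: load JSON files into a DataFrame (schema inferred) and
// persist the result as a managed Hive table for downstream queries.
import org.apache.spark.sql.{SaveMode, SparkSession}

object JsonToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    val events = spark.read.json("wasb:///data/incoming/events/")

    events.write
      .mode(SaveMode.Append)
      .saveAsTable("analytics.raw_events")

    spark.stop()
  }
}
```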
BIG DATA DEVELOPER
Confidential
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Used Bash shell scripting, Sqoop, Avro, Hive, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Used Pig to perform data transformations, event joins, and some pre-aggregations before storing the data on HDFS.
- Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Worked on loading all tables from the source database schema through Sqoop.
- Designed, coded, and configured server-side J2EE components such as JSP, AWS, and Java.
- Collected data from different databases (e.g., Oracle, MySQL) into Hadoop.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Worked on designing and developing ETL workflows in Java for processing data in HDFS/HBase, orchestrated with Oozie.
- Experienced in managing and reviewing Hadoop log files.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Worked on extracting files from MySQL through Sqoop, placing them in HDFS for processing.
- Supported MapReduce programs running on the cluster.
- Provided cluster coordination services through ZooKeeper.
- Involved in loading data from the UNIX file system into HDFS.
- Created several Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Developed simple to complex MapReduce jobs using Hive and Pig.
JAVA DEVELOPER
Confidential
Responsibilities:
- Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Designed and developed framework components and was involved in designing the MVC pattern using Struts and the Spring framework.
- Responsible for developing use case, class, and sequence diagrams for the modules using UML and Rational Rose.
- Developed Action and ActionForm classes, created JSPs using Struts tag libraries, and configured them in the struts-config.xml and web.xml files.
- Involved in deploying and configuring applications on WebLogic Server.
- Used SOAP for exchanging XML-based messages.
- Used Confidential Visio for developing use case, sequence, and class diagrams in the design phase.
- Developed custom tags to simplify JSP code. Designed UI screens using JSP and HTML.
- Actively involved in designing and implementing the Factory Method, Singleton, MVC, and Data Access Object design patterns.
- Used web services to send and receive data from different applications via SOAP messages, then used a DOM XML parser for data retrieval.
- Wrote JUnit test cases for the controller, service, and DAO layers using Mockito and DbUnit.
- Developed unit test cases using a proprietary framework similar to JUnit.
- Used the JUnit framework for unit testing of the application and Ant to build and deploy the application on WebLogic Server.