Big Data/Spark Developer Resume

San Jose, CA

SUMMARY

  • 9 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
  • 4+ years of data analytics experience with the Cloudera and Hortonworks distributions of Apache Hadoop.
  • Expertise in core Hadoop and the Hadoop technology stack, which includes HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Teradata, HBase, Spark, Storm, Kafka, and ZooKeeper.
  • Experience with the AWS cloud environment, including S3 storage and EC2 instances, and deploying applications to it.
  • In-depth knowledge of statistics, machine learning, and data mining.
  • Developed schedulers that communicated with cloud-based services (AWS) to retrieve data.
  • Experienced in implementing complex algorithms on semi-structured and unstructured data using MapReduce programs.
  • Expert knowledge of Microsoft Azure.
  • Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
  • Experienced in migrating ETL operations using Pig transformations, operations, and UDFs.
  • Good knowledge of Python.
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
  • Experienced in implementing POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
  • Specialization in data ingestion, processing, and development from various RDBMS data sources into a Hadoop cluster using MapReduce/Pig/Hive/Sqoop.
  • Experienced in implementing a unified data platform to get data from different data sources using Apache Kafka brokers, clusters, and Java producers and consumers.
  • Excellent working knowledge of Spark Core, Spark SQL, and Spark Streaming.
  • Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
  • Experienced in working with in-memory processing frameworks such as Spark transformations, Spark SQL, and Spark Streaming using Scala.
  • Excellent understanding and knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, and of Teradata and data warehousing.
  • Implemented frameworks using Java and Python to automate the ingestion flow.
  • Involved in NoSQL (DataStax Cassandra) database design, integration, and implementation; wrote scripts and invoked them using cqlsh.
  • Involved in data modeling in Cassandra and in implementing sharding and replication strategies in MongoDB.
  • Designed, developed, and monitored Oracle NoSQL databases and Apache web and cloud server frameworks on Linux for high performance, and VMware cloud storage for query performance tuning, ETL processes, and large file storage.
  • Experienced in implementing custom interceptors and serializers in Flume for specific customer requirements.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Built a tool that monitored log input from several data centers via Spark Streaming, analyzed it in Apache Storm, and parsed and saved the data into Cassandra.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems (MySQL, Oracle, Teradata).
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • Expert knowledge of Apache NiFi.
  • Experience in managing Hadoop clusters using Cloudera Manager.
  • Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Worked on cluster coordination services through ZooKeeper.
  • Actively involved in coding using Core Java and Collections APIs such as Lists, Sets, and Maps.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience with different operating systems such as UNIX, Linux, and Windows.
  • Experience with Java multithreading, collections, interfaces, synchronization, and exception handling.
  • Involved in writing PL/SQL stored procedures, triggers, and complex queries.
  • Worked in an Agile environment with active Scrum participation.

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, MongoDB, Cassandra, Flume, Oozie, ZooKeeper, AWS, Spark, Kafka, Teradata, Storm, ETL, Informatica, Solr, Scala, Jenkins, Apache NiFi, Presto, Microsoft Azure.

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans, Maven, Gradle, JUnit, TestNG.

IDEs: Eclipse, NetBeans, IntelliJ IDEA.

Frameworks: MVC, Struts, Hibernate, Spring.

Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, Teradata

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL, JAX-RS, RESTful, JAX-WS.

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Version Control: CVS, SVN, Git.

PROFESSIONAL EXPERIENCE

Confidential, San Jose, CA

Big Data/Spark Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytical and processing tools, including Sqoop, Pig, Hive, Spark, Kafka, and PySpark.
  • Worked on the MapR platform team on performance tuning of Hive and Spark jobs for all users.
  • Used the Hive Tez engine to improve application performance.
  • Worked on incidents raised by users with the platform team for Hive and Spark issues, monitoring Hive and Spark logs and either fixing the issue or raising MapR cases.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Worked on a Hadoop data lake, ingesting data from sources such as Oracle and Teradata through the Infoworks ingestion tool.
  • Worked on Arcadia to create analytical views on top of tables so that reporting is unaffected and no table locks occur while a batch is loading, because reports point to the Arcadia view.
  • Wrote scripts to automate permissions, storing each assigned permission in a table so that when the table is updated the permission is assigned to the updated group.
  • Worked on a Python API for converting assigned group-level permissions to table-level permissions using MapR ACE, by creating a unique role and assigning it through the EDNA UI.
  • Migrated various Hive UDFs and queries to Spark SQL for faster requests.
  • Configured Kafka Connect to receive real-time data from Apache Kafka and store the stream data in HDFS.
  • Hands-on experience in Spark using Scala and Python, creating RDDs and applying operations (transformations and actions).
  • Extensively performed complex data transformations in Spark using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Scala.
  • Used PySpark and Scala to process the data.
  • Used Bitbucket and Git repositories.
  • Used text, Avro, ORC, and Parquet file formats for Hive tables.
  • Experienced in scheduling jobs using crontab.
  • Developed and implemented custom Hive UDFs involving date functions.
  • Used Sqoop to import data from Oracle and Teradata into Hadoop.
  • Used the TES scheduler engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, Spark, Kafka, and Sqoop.
  • Experienced in creating recursive and replicated joins in Hive.
  • Experienced in developing scripts for doing transformations using Scala.
  • Involved in developing shell scripts to orchestrate execution of all other scripts and move data files within and outside of HDFS.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, and gained experience with its speed, scalability, and durability.
  • Wrote MapReduce code to process and parse data from various sources and stored the parsed data in HBase and Hive using HBase-Hive integration.
  • Experienced in creating shell scripts to automate jobs.

Confidential, Charlotte, NC

Spark Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, Sqoop, Spark, Scala, Impala, Python, Linux, shell scripts, Cloudera, Teradata, and Java transformation of data into XML.
  • The process involved extracting data through Sqoop, transforming it using Pig, Hive, PySpark, and Scala, and loading it into Oracle/Teradata.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on writing automated Sqoop scripts to import/export data from external sources such as Oracle and Teradata.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed Spark scripts using Scala shell commands as required.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports in Tableau.
  • Extensively worked with Scala/Spark SQL for data cleansing and generating DataFrames, transforming them into row DataFrames to populate the aggregate tables in Hive.
  • Adept at developing generic Spark-Scala methods for transformations and designing schemas for rows.
  • Adept at writing efficient Spark-Scala code to generate aggregation functions on DataFrames according to business logic.
  • Designed ETL pipelines for loading data from RDBMSs (Oracle, Teradata) into the Hive data warehouse.
  • Designed ETL pipelines for loading flat files into the Hive data warehouse, automated with a generic script.
  • Worked on supporting ETL DataStage jobs in the production environment.
  • Worked on exporting data from Hive databases or files on HDFS/AWS to external targets (Oracle).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked on tuning the performance of Pig queries in production jobs.
  • Knowledge of Presto; analyzed large data sets by running queries.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Worked on getting data from Oracle into HDFS/AWS as files using Sqoop, creating a view on top of it for querying, and automating the process.
  • Worked on handling special characters in data using Hive and Pig.
  • Managed and reviewed Hadoop log files.
  • Worked on automating batches using JIL scripts for AutoSys.
  • Worked on scheduling jobs through AutoSys and migrating jobs to all higher environments.
  • Worked on migrating the code to higher environments.
  • Supported ST and prod runs.
  • Implemented partitioning and bucketing in Hive, using file formats and compression techniques with optimizations.
  • Extensively worked on Text, ORC, Avro, and Parquet file formats and compression techniques such as Snappy, Gzip, and Zlib.
  • Planned, designed, and launched a solution for building a Hadoop cluster in the cloud using EMR and EC2 on AWS.
  • Strong experience in working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances.
  • Developed Spark scripts using Python and Spark SQL to access Hive tables in Spark for faster data processing.
  • Extensively used Spark SQL and PySpark APIs for querying and transforming data residing in Hive.
  • Large data sets were analyzed using Pig scripts and Hive queries.
  • Worked on custom Pig loaders to work with a variety of data formats such as JSON and CSV.
  • Extensively used Teradata utilities such as BTEQ, along with DDL and DML commands (SQL).
  • Created a BTEQ script for pre-population of the work tables prior to the main load process.
  • Performed performance tuning of sources, targets, mappings, and SQL queries in transformations.
  • Worked exclusively with the Teradata SQL Assistant to interface with Teradata.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Developed a shell script to perform data profiling on the ingested data with the help of Hive.
  • Experience in scripting for automation and monitoring using shell scripts.
  • Worked on a Java transformation that reads a Hive table and converts it into an XML file for CCP.

Confidential, Dublin, OH

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, HBase, Sqoop, Impala, Flume, Cassandra, ZooKeeper, AWS, and Cloudera.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them using MapReduce programs.
  • Implemented MapReduce programs to retrieve results from unstructured data sets.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from HDFS to MySQL, Oracle, and Teradata using Sqoop.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Worked on a POC to compare the processing time of Impala with Spark SQL for batch processing efficiency.
  • Developed Spark applications for various business logic using Scala and Python.
  • Involved in moving data between HDFS and AWS S3 using Apache DistCp.
  • Involved in pulling data from the Amazon S3 data lake and building Hive tables using HiveContext in Spark.
  • Involved in running Hive queries and Spark jobs on data stored in S3.
  • Ran short-term ad hoc queries and jobs on data stored in S3 using AWS EMR and Hive.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
  • Experienced in implementing POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experience with the AWS cloud environment, including S3 storage and EC2 instances.
  • Good knowledge of Cassandra architecture, read and write paths, and querying.
  • Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Designed and implemented Solr indexes for the metadata that enabled internal applications to reference Scopus content.
  • Used Spark with Scala for parallel data processing and better performance.
  • Extensively used Pig for data cleansing and for extracting data from web server output files to load into HDFS.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Implemented Kafka Java producers, created custom partitions, configured brokers, and implemented high-level consumers to build the data platform.
  • Managed and reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Used Maven as the build tool, with builds scheduled and triggered by Jenkins.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL Server, Sqoop, Spark, Kafka, AWS, MongoDB, Storm, Cassandra, ETL, Python, REST API, XML, JSON, Solr, Cloudera, Oracle, Teradata, Scala, Git, Agile, Jenkins, Elasticsearch.

Confidential, Memphis, TN

Hadoop Developer

Responsibilities:

  • Worked on writing MapReduce jobs to discover trends in data usage by customers.
  • Worked on and designed a big data analytics platform for processing customer interface preferences and comments using Java, Hadoop, Hive, Impala, Pig, and Cloudera.
  • Imported and exported data between HDFS/Hive and Oracle and Teradata using Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Migrated various Hive UDFs and queries to Spark SQL for faster requests as part of a POC implementation.
  • Developed a fan-out workflow using Flume for ingesting data from various sources such as web servers and REST APIs, using different Flume sources, and ingested the data into Hadoop with an HDFS sink.
  • Implemented frameworks using Java and Python to automate the ingestion flow.
  • Experienced in defining job flows to run multiple MapReduce and Pig jobs using Oozie.
  • Installed and configured Hive and wrote HiveQL scripts.
  • Experienced with performing analytics on time series data using HBase.
  • Implemented Hive generic UDFs to implement business logic.
  • Experienced with accessing Hive tables to perform analytics from Java applications using JDBC.
  • Experienced in running batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
  • Used Maven as the build tool, with builds scheduled and triggered by Jenkins.
  • Experience with streaming workflow operations and Hadoop jobs using Oozie workflows, scheduled through AutoSys on a regular basis.
  • Performed operations using the partitioning pattern in MapReduce to move records into different categories.
  • Experienced with multiple file formats in Hive, including Avro and SequenceFile.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.

Environment: Cassandra, Map jobs, Spark SQL, ETL, Pig scripts, Flume, Hadoop BI, Pig UDFs, Oozie, Avro, Hive, MapReduce, Java, Eclipse, ZooKeeper, Oracle, Python, REST API, JSON, XML, Cloudera, Git, Agile, Jenkins.

Confidential, Indianapolis, IN

Hadoop Developer

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) to develop the application.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, Sqoop, Cassandra, and ZooKeeper.
  • Involved in loading data from the Linux file system to HDFS.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Imported and exported data between HDFS/Hive and Oracle and Teradata using Sqoop.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Developed multiple MapReduce jobs in Java for data cleaning.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Supported MapReduce programs running on the cluster.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked on tuning the performance of Pig queries.
  • Mentored the analyst and test teams in writing Hive queries.
  • Installed the Oozie workflow engine to run multiple MapReduce jobs.
  • Worked on ZooKeeper for coordination between master nodes and data nodes.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase, ZooKeeper, SQL Server, Python, REST API, JSON, XML, Oracle, Teradata, Git, Agile.

Confidential, Orlando, FL

Java/J2EE Developer

Responsibilities:

  • Worked with business users to determine requirements and technical solutions.
  • Followed Agile methodology (Scrum stand-ups, sprint planning, sprint review, sprint showcase, and sprint retrospective meetings).
  • Developed business components using core Java concepts such as inheritance, polymorphism, collections, serialization, and multithreading.
  • Used the Spring framework to handle application logic and make calls to business components, which were configured as Spring beans.
  • Implemented and configured data sources and the session factory, and used HibernateTemplate to integrate Spring with Hibernate.
  • Developed web services to allow communication between applications through SOAP over HTTP with JMS and Mule ESB.
  • Actively involved in coding using Core Java and Collections APIs such as Lists, Sets, and Maps.
  • Developed a web service (SOAP, WSDL) that is shared between the front end and the cable bill review system.
  • Implemented REST-based web services using JAX-RS annotations and the Jersey implementation for data retrieval with JSON.
  • Developed Maven scripts to build and deploy the application onto WebLogic Application Server, ran UNIX shell scripts, and implemented an auto-deployment process.
  • Used Maven as the build tool, with builds scheduled and triggered by Jenkins.
  • Developed JUnit test cases for application unit testing.
  • Implemented Hibernate for data persistence and management.
  • Used the SoapUI tool for testing web services connectivity.
  • Used SVN for version control to check in code, and created branches and tags in SVN.
  • Used RESTful services to interact with the client by providing the RESTful URL mapping.
  • Used the Log4j framework for application logging, tracking, and debugging.

Environment: JDK 1.6, Eclipse IDE, Core Java, J2EE, Spring, Hibernate, Unix, Web Services, SoapUI, Maven, WebLogic Application Server, SQL Developer, Camel, JUnit, SVN, Agile, SONAR, Log4j, REST, JSON, JBPM.

Confidential

Java Developer

Responsibilities:

  • Involved in analysis, design, and development of an Expense Processing system.
  • Created user interfaces using JSP.
  • Developed the web interface using Servlets, JavaServer Pages, HTML, and CSS.
  • Developed the DAO objects using JDBC.
  • Developed business services using Servlets and Java.
  • Designed and developed user interfaces and menus using HTML5, JSP, and JavaScript, with client-side and server-side validations.
  • Developed GUI using JSP and the Struts framework.
  • Involved in developing the presentation layer using Spring MVC/AngularJS/jQuery.
  • Involved in designing the user interfaces using Struts Tiles Framework.
  • Used Spring 2.0 Framework for Dependency injection and integrated with the Struts Framework and Hibernate.
  • Used Hibernate 3.0 in data access layer to access and update information in the database.
  • Experience in SOA (Service Oriented Architecture) by creating the web services with SOAP and WSDL.
  • Developed JUnit test cases for all the developed modules.
  • Used Log4J to capture the log that includes runtime exceptions, monitored error logs and fixed the problems.
  • Used RESTful services to interact with the client by providing the RESTful URL mapping.
  • Used CVS for version control across common source code used by developers.
  • Used Ant scripts to build the application and deployed it on WebLogic Application Server 10.0.

Environment: Struts 1.2, Hibernate 3.0, Spring 2.5, JSP, Servlets, XML, SOAP, WSDL, JDBC, JavaScript, HTML, CVS, Log4j, JUnit, WebLogic Application Server, Eclipse, Oracle, RESTful.