Spark Developer Resume

New York

SUMMARY

  • Around 8 years of professional IT experience, including 4+ years in the Big Data space, with hands-on expertise in development on the Hadoop platform and in Java.
  • Expertise in executing best-in-class risk models and decision logic in Splunk.
  • Extensive experience with Splunk Searching and Reporting modules, Knowledge Objects, Administration, Add-ons, Dashboards, Clustering and Forwarder Management, visualizations, alerts, and reports.
  • Extensive knowledge of Splunk/Hunk architecture and its various components (indexer, forwarder, search head, deployment server, virtual indexes, providers), heavy and universal forwarders, and the license model.
  • Created and managed Splunk DB Connect identities, database connections, database inputs, outputs, lookups, and access controls.
  • Proficiency in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, Flume, ZooKeeper, Impala, and NoSQL databases.
  • Good exposure to column-oriented NoSQL databases such as HBase.
  • Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java.
  • Strong experience analyzing large data sets by writing Pig scripts and Hive queries.
  • Extensive experience working with structured data using HiveQL, including join operations, custom UDFs, and Hive query optimization.
  • Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Experience importing and exporting data between HDFS and relational databases using Sqoop.
  • Experience using Apache Flume to collect, aggregate, and move large volumes of data from various sources such as web servers and telnet sources.
  • Adequate knowledge and working experience in Agile & Waterfall methodologies.
  • Strong team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Splunk, Hunk, Forwarder, DB Connect, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Apache Spark, Python, Impala, ZooKeeper, Cloudera Manager, MapR clusters, HBase, Amazon Web Services

Monitoring and Reporting: Tableau, Jaspersoft

Build Tools: SQL Server Management Studio, Eclipse

Programming & Scripting: Core Java, C, SQL, Shell Scripting

Databases: Microsoft SQL Server, Teradata, MySQL

PROFESSIONAL EXPERIENCE

Confidential, New York

Spark Developer

Responsibilities:

  • Developed PySpark code to save data in Avro and Parquet formats and build Hive tables on top of them.
  • Developed PySpark code equivalent to existing SAS code to extract summary insights from the Hive tables.
  • Responsible for data type, count, and header validations for the ingested data.
  • Assisted the team with code reviews and bug fixes.
  • Responsible for writing RESTful services to invoke and run the Apache NiFi process.
  • Configured the NiFi ingestion tool for dynamic parameterization using Python scripts and JSON files.

Environment: Hadoop, HDP, MyEclipse IDE, Python 2.7, PySpark, Hive, Sqoop, Shell Scripting, Linux.
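
As an illustration of the data type, count, and header validations listed above, here is a minimal sketch in plain Python. The resume does not describe how the original checks were implemented, so the function name `validate_feed`, the sample columns, and the error-message format are all hypothetical.

```python
import csv
import io


def validate_feed(text, expected_header, column_types, expected_count):
    """Return a list of validation errors (empty when the feed is clean).

    Hypothetical helper: checks the header row, the number of data rows,
    and whether each value can be cast to its expected Python type.
    """
    errors = []
    reader = csv.reader(io.StringIO(text))

    # Header validation: the first row must match the expected column names.
    header = next(reader, None)
    if header != expected_header:
        errors.append(f"header mismatch: {header}")
        return errors

    # Count validation: compare the number of data rows with the expected count.
    rows = list(reader)
    if len(rows) != expected_count:
        errors.append(f"count mismatch: expected {expected_count}, got {len(rows)}")

    # Data type validation: try to cast every value to its column's type.
    for line_no, row in enumerate(rows, start=2):
        if len(row) != len(expected_header):
            errors.append(f"line {line_no}: expected {len(expected_header)} fields, got {len(row)}")
            continue
        for name, caster, value in zip(expected_header, column_types, row):
            try:
                caster(value)
            except ValueError:
                errors.append(f"line {line_no}: {name}={value!r} is not {caster.__name__}")
    return errors


# Made-up sample feed: the second data row has a non-numeric amount.
sample = "id,amount,city\n1,10.5,New York\n2,abc,Boston\n"
errors = validate_feed(sample, ["id", "amount", "city"], [int, float, str], 2)
print(errors)
```

Returning a list of errors rather than raising on the first failure lets a single run report every problem in a feed.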

Confidential, New York

Data Architect

Responsibilities:

  • Involved in modeling different key risk indicators in Splunk and building extensive Hive queries to understand customer behavior across the customer life cycle.
  • Converted existing Hive queries to Spark SQL queries to reduce execution time.
  • Successfully implemented a proof of concept in Splunk for risk modeling covering three risk types: Credit, Operational, and Compliance.
  • Extensively used risk reporting tools such as Tableau and Jaspersoft to understand risk types and levels at Confidential.
  • Created reports, alerts, and dashboards in Splunk that demonstrate various risk levels.
  • Installed and configured heavy, universal, and intermediate forwarders to bring customer data from production systems.
  • Created and managed Splunk DB Connect identities, database connections, database inputs, outputs, lookups, and access controls.
  • Designed and maintained production-quality Splunk dashboards.
  • Configured Splunk for various web applications and batch jobs, and created saved searches, summary searches, and summary indexes.
  • Experience with search head clustering and index clustering.
  • Extracted various fields using the field extractor, field extractions (rex), and calculated fields to optimize search performance and reduce the load on the search head.
  • Configured various summary indexes by creating saved searches to collect aggregated data, and built dashboards on top of the summary indexes.
  • Integrated Splunk with the Global Alert Repository to show alerts to executive leaders at Confidential.
  • Applied search optimization techniques, including choosing between search-time and index-time field extraction, with an understanding of configuration files and their precedence.
  • Led the team in actively implementing smart Splunk solutions.
  • In-depth experience with props.conf, transforms.conf, and inputs.conf.
  • Assisted various other power users in optimizing the searches.
  • Configured Hunk to read customer transaction data from Hadoop ecosystem components such as HDFS and Hive.

Environment: Splunk 6.4.1, Hunk 6.4, DB Connect v2.0, HDP MapR 3.1, YARN, Hive 1.2.1, UNIX Shell Scripting, Teradata, MS SQL Server 2014.

Confidential

Big Data Engineer

Responsibilities:

  • Designed and developed a data ingestion framework using the Hadoop stack; analyzed logs and diagnosed issues.
  • Used Flume for log analysis.
  • Used SequenceFile and Avro file formats with Snappy compression when storing data in HDFS.
  • Developed UNIX scripts to download files from FTP to MELD HDFS and load the data into stage and base Hive tables after partitioning and bucketing.
  • Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV, and JSON.
  • Designed the framework for historical and incremental loads.
  • Created Hive tables to store the processed results in a tabular format in the Base Schema.
  • Developed Pig scripts to perform ETL operations and wrote UDFs where needed.
  • Imported data into HDFS and Hive using Sqoop from Teradata and Oracle databases.
  • Worked on migrating projects from MapR to Confidential Works.

Environment: CentOS 6.4, JDK 1.7, HDP 2.1, YARN, Sqoop 1.4.4, Pig 0.12, Hive 0.12, Flume 1.4.0, Ambari, UNIX Shell Scripting, WinSCP, Teradata, Oracle 11.6.

Confidential, Bentonville, AR.

Hadoop Developer

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Automated delta feeds from Teradata (using Sqoop) and from FTP servers into Hive.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
  • Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
  • Wrote custom MapReduce programs to analyze data and used Pig Latin to remove unwanted data.
  • Performed performance tuning, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Participated in gathering requirements from experts and business partners and converted them into technical specifications.
  • Implemented a daily workflow for extraction, processing, and analysis of data with Oozie.
  • Involved in loading data from the Linux file system to HDFS.

Environment: Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, Linux, Oozie.

Confidential

SQL/JAVA Developer

Responsibilities:

  • Involved in database design.
  • Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, stored procedures, and views in Oracle 10g.
  • Created User Interface using JSP.
  • Involved in integration testing the Business Logic layer and Data Access layer.
  • Used technologies such as JSP, JavaScript, HTML, and XML for the presentation tier.
  • Performed unit testing of the application using the JUnit framework.
  • Implemented stored procedures, functions, and views to retrieve data.
  • Mentored and worked with team members to ensure standards and guidelines were followed and tasks were delivered on time.

Environment: JSP, Servlets, JDBC, Java, Eclipse, UNIX, SQL
