Spark Developer Resume
New York
SUMMARY
- Around 8 years of professional IT experience, including 4+ years in the Big Data space with hands-on expertise in development on the Hadoop platform and in Java.
- Expertise in executing best-in-class risk models and decision logic in Splunk.
- Extensive experience with Splunk Searching and Reporting modules, knowledge objects, administration, add-ons, dashboards, clustering and forwarder management, visualizations, alerts, and reports.
- Extensive knowledge of Splunk/Hunk architecture and its components (indexer, forwarder, search head, deployment server, virtual indexes, providers), heavy and universal forwarders, and the license model.
- Created and managed Splunk DB Connect identities, database connections, database inputs, outputs, lookups, and access controls.
- Proficiency in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, Flume, ZooKeeper, Impala, and NoSQL databases.
- Good exposure to HBase, a column-oriented NoSQL database.
- Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java.
- Strong experience in analyzing large data sets by writing Pig scripts and Hive queries.
- Extensive experience working with structured data using HiveQL, join operations, and custom UDFs, and in optimizing Hive queries.
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Experience in importing and exporting data between HDFS and relational databases using Sqoop.
- Experience in using Apache Flume to collect, aggregate, and move large volumes of data from various sources such as web servers and telnet sources.
- Working knowledge of and experience with Agile and Waterfall methodologies.
- Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Splunk, Hunk, Forwarder, DB Connect, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Apache Spark, Python, Impala, ZooKeeper, Cloudera Manager, MapR clusters, HBase, Amazon Web Services
Monitoring and Reporting: Tableau, Jaspersoft
Build Tools: SQL Server Management Studio, Eclipse
Programming & Scripting: Core Java, C, SQL, Shell Scripting
Databases: Microsoft SQL Server, Teradata, MySQL
PROFESSIONAL EXPERIENCE
Confidential, New York
Spark Developer
Responsibilities:
- Developed PySpark code to save data in Avro and Parquet formats and build Hive tables on top of them.
- Developed PySpark code equivalent to existing SAS code to extract summary insights from Hive tables.
- Responsible for data type, count, and header validations for the ingested data.
- Assisted the team with code reviews and bug fixes.
- Responsible for writing RESTful services to invoke and run the Apache NiFi process.
- Configured the NiFi ingestion tool for dynamic parameterization using Python scripts and JSON files.
Environment: Hadoop, HDP, MyEclipse IDE, Python 2.7, PySpark, Hive, Sqoop, Shell Scripting, Linux.
Confidential, New York
Data Architect
Responsibilities:
- Modeled different key risk indicators in Splunk and built extensive Hive queries to understand customer behavior across the customer life cycle.
- Converted existing Hive queries to Spark SQL queries to reduce execution time.
- Successfully implemented a proof of concept in Splunk for risk modeling covering three risk types: credit, operational, and compliance.
- Extensively used risk reporting tools such as Tableau and Jaspersoft to understand risk types and levels at Confidential.
- Created reports, alerts, and dashboards in Splunk that demonstrate various risk levels.
- Installed and configured heavy, universal, and intermediate forwarders to bring customer data from production systems.
- Created and managed Splunk DB Connect identities, database connections, database inputs, outputs, lookups, and access controls.
- Designed and maintained production-quality Splunk dashboards.
- Configured Splunk for various web applications and batch jobs; created saved searches, summary searches, and summary indexes.
- Experience with search head clustering and indexer clustering.
- Extracted fields using the field extractor, rex field extractions, and calculated fields to optimize search performance and reduce the load on the search head.
- Configured summary indexes by creating saved searches that collect aggregated data, and built dashboards on top of the summary indexes.
- Integrated Splunk with the Global Alert Repository to show alerts to executive leaders at Confidential.
- Optimized searches for better performance, including choosing between search-time and index-time field extraction; good understanding of configuration files, their precedence, and how they work.
- Led the team in actively implementing smart Splunk solutions.
- In-depth experience with props.conf, transforms.conf, and inputs.conf.
- Assisted other power users in optimizing their searches.
- Configured Hunk to read customer transaction data from Hadoop ecosystem components such as HDFS and Hive.
Environment: Splunk 6.4.1, Hunk 6.4., DB Connect v2.0, HDP MapR 3.1, YARN, Hive 1.2.1, UNIX Shell Scripting, Teradata, MS SQL Server 2014.
Confidential
Big Data Engineer
Responsibilities:
- Designed and developed a data ingestion framework on the Hadoop stack; analyzed logs and diagnosed issues
- Used Flume for log analysis
- Used SequenceFile and Avro file formats with Snappy compression when storing data in HDFS
- Developed UNIX scripts to download files from FTP to MELD HDFS and load the data into stage and base Hive tables after partitioning and bucketing
- Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV, and JSON
- Designed the framework for historical/incremental load
- Created Hive tables to store the processed results in a tabular format in the base schema
- Developed Pig scripts to perform ETL operations and wrote UDFs where needed
- Imported data into HDFS and Hive from Teradata and Oracle databases using Sqoop
- Worked on migrating projects from MapR to Confidential Works
Environment: CentOS 6.4, JDK 1.7, HDP 2.1, YARN, Sqoop 1.4.4, Pig 0.12, Hive 0.12, Flume 1.4.0, Ambari, UNIX Shell Scripting, WinSCP, Teradata, Oracle 11.6.
Confidential, Bentonville, AR.
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Automated delta feeds from Teradata (using Sqoop) and from FTP servers into Hive.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components
- Developed custom MapReduce programs to analyze data and used Pig Latin to remove unwanted data
- Tuned performance using the distributed cache for small data sets, partitioning and bucketing in Hive, and map-side joins
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts
- Participated in requirements gathering with experts and business partners and converted the requirements into technical specifications
- Implemented a daily workflow for extraction, processing, and analysis of data with Oozie.
- Involved in loading data from the Linux file system to HDFS.
Environment: Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, Linux, Oozie.
Confidential
SQL/Java Developer
Responsibilities:
- Involved in database design.
- Created tables, stored procedures, and views in Oracle 10g for data manipulation, retrieval, and database modification.
- Created User Interface using JSP.
- Involved in integration testing the Business Logic layer and Data Access layer.
- Used technologies such as JSP, JavaScript, HTML, and XML for the presentation tier.
- Performed unit testing of the application using the JUnit framework.
- Implemented stored procedures, functions, and views to retrieve data.
- Mentored and worked with team members to ensure that standards and guidelines were followed and tasks were delivered on time.
Environment: JSP, Servlets, JDBC, Java, Eclipse, UNIX, SQL
