Sr. Hadoop/Spark Developer Resume
Fairfax, VA
SUMMARY:
- 7 years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, with experience developing strategic methods for deploying Big Data technologies to efficiently solve large-scale data processing requirements.
- Experience with Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Talend, and ZooKeeper.
- Experience with Apache Spark, Storm, and Kafka.
- Played a vital role in launching Spark on YARN; good knowledge of Spark configuration and monitoring of scheduled jobs.
- Tuned and optimized Spark jobs against an end-to-end benchmark, which included trying different configurations and leveraging Spark speculation to identify and reschedule slow-running tasks (see the sketch after this summary).
- Developed several Spark applications using Spark SQL and the DataFrame API, with strong hands-on expertise in Spark Streaming as well.
- Helped the big data analytics team with the implementation of Python scripts for Sqoop, Spark, and Hadoop batch data streaming.
- Wrote Python code within the Hadoop framework to solve Natural Language Processing problems.
- Leveraged Spark (Pyspark) to manipulate unstructured data and apply text mining on user's table utilization data.
- Designed Data Quality Framework to perform schema validation and data profiling on Spark (Pyspark).
- Involved in performing linear regression using the Scala API and Spark.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute, S3 for storage, and the Puppet tool for Hadoop cluster deployment.
- Responsible for Hadoop production support: running Hadoop AutoSys jobs, validating the data, and communicating with the business.
- Experienced in writing MapReduce programs that work with different file formats such as Text, Sequence, XML, Parquet, and Avro.
- Extensively used Oozie workflow engine to run multiple Hive and Pig jobs.
- Strong knowledge of Pig and Hive for processing and analyzing large volumes of data.
- Knowledge of creating Tableau dashboards with relational and multi-dimensional databases including Oracle, MySQL, and Hive, gathering and manipulating data from various sources.
- Experience in migrating data using Sqoop between HDFS and relational database systems according to client requirements.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP, and RESTful web services.
- Very good understanding and working knowledge of object-oriented programming (OOP), multithreading in Core Java, J2EE, web services (REST, SOAP), JDBC, JavaScript, and jQuery.
- Experience in Scrum, Agile and Waterfall models.
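A minimal PySpark sketch of the speculation tuning described above; the application name and threshold values are illustrative assumptions, not the original benchmark settings.

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Illustrative settings only; the original benchmark values are not reproduced here.
conf = (
    SparkConf()
    .setAppName("speculation-benchmark")          # hypothetical application name
    .set("spark.speculation", "true")             # re-launch straggler tasks
    .set("spark.speculation.interval", "100ms")   # how often to check for slow tasks
    .set("spark.speculation.multiplier", "1.5")   # "slow" = 1.5x the median task time
    .set("spark.speculation.quantile", "0.75")    # wait until 75% of tasks finish
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()
```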
PROFESSIONAL EXPERIENCE:
Confidential - Fairfax, VA
Sr. Hadoop/Spark Developer
Responsibilities:
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Responsible for Spark Streaming configuration based on the type of input source.
- Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
- Experienced in job management using the Fair Scheduler; developed job-processing scripts using Oozie workflows.
- Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Developed services to run MapReduce jobs on an as-required basis.
- Imported and exported data into HDFS, Hive, and Pig using Sqoop.
- Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Developed design documents considering all possible approaches and identifying the best of them.
- Loaded data into HBase using both bulk and non-bulk loads.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Assisted the Hadoop team with developing MapReduce scripts in Python.
- Analyzed SQL scripts and designed the solution to implement them using PySpark.
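A minimal sketch of the Hive-to-Spark conversion pattern from the bullets above; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

# Hive query being converted (hypothetical table/columns):
#   SELECT customer_id, SUM(amount) AS total
#   FROM transactions WHERE status = 'OK' GROUP BY customer_id
totals = (
    spark.table("transactions")
         .filter(F.col("status") == "OK")
         .groupBy("customer_id")
         .agg(F.sum("amount").alias("total"))
)
totals.show()
```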
Environment: Hive, HBase, Flume, Java, Impala, Splunk, Pig, Spark, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Scala, Python.
Confidential - Indianapolis, IN
Sr. Hadoop/Spark Developer
Responsibilities:
- Collaborated with internal and client BAs to understand requirements and architect a data flow system.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Optimized Hive scripts to use HDFS efficiently by applying various compression mechanisms.
- Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Wrote Spark SQL scripts to optimize query performance.
- Created algorithms for the complex map and reduce functionality of all MapReduce programs.
- Wrote Sqoop scripts to import and export data across various RDBMS systems.
- Wrote Pig scripts to process unstructured data and make it available for processing in Hive.
- Created Hive schemas using performance techniques such as partitioning and bucketing.
- Used SFTP to transfer and receive the files from various upstream and downstream systems.
- Configured UNIX service IDs and AD groups in all environments (DEV, SIT, UAT, and PROD) so access to resources is granted based on AD groups.
- Developed Python code to provide data analysis and generate complex data reports.
- Utilized Python pandas DataFrames for data analysis.
- Utilized Python regular-expression operations (NLP) to analyze customer reviews.
- Developed MapReduce jobs in Python for data cleaning and data processing (see the sketch after this list).
- Analyzed SQL scripts and designed the solution to implement them using PySpark.
- Worked on exporting data from Hive tables into a Netezza database.
- Worked with the Hadoop administration team to configure servers at the time of cluster migration.
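A minimal sketch of the Python MapReduce (Hadoop Streaming) data-cleaning work noted above; the input paths and the three-column tab-delimited layout are assumptions.

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming mapper that drops malformed records and
# normalizes keys. Launched as a map-only cleaning job, e.g.:
#   hadoop jar hadoop-streaming.jar -input /raw/reviews -output /clean/reviews \
#     -mapper mapper.py -file mapper.py -numReduceTasks 0
# (paths and the 3-column layout are hypothetical)
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:          # skip rows that do not match the expected layout
        continue
    key, value = fields[0].strip().lower(), fields[2].strip()
    if key and value:             # drop records with empty keys or values
        print("%s\t%s" % (key, value))
```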
Environment: MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Splunk, Kafka, Oracle 11g, Netezza, Cloudera, Eclipse, Python, Scala, Spark SQL, Tableau, Teradata, Unix Shell Scripting
Confidential - Fishers, IN
Hadoop Developer
Responsibilities:
- Worked extensively on creating MapReduce jobs to power data for search and aggregation; designed a data warehouse using Hive.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Loaded data into HDFS using Hive and MapReduce.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Experience in working with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive (see the sketch after this list).
- Worked with business teams and created Hive queries for ad hoc access.
- Evaluated the use of Oozie for workflow orchestration.
- Mentored the analyst and test teams on writing Hive queries.
- Strong knowledge of writing MapReduce programs with the Java API to cleanse structured and unstructured data.
- Experience with RDBMSs such as Oracle and Teradata.
- Worked on loading data from MySQL to HBase using Sqoop where necessary.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured launched instances for specific applications.
- Gained very good business knowledge of claim processing, fraud-suspect identification, the appeals process, etc.
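A minimal sketch of the Hive partitioning pattern noted above, driven from Python with the PyHive client; the host, database, table, columns, and HDFS paths are hypothetical.

```python
from pyhive import hive

# Hypothetical HiveServer2 endpoint and schema.
conn = hive.Connection(host="hive-gateway", port=10000, database="claims")
cur = conn.cursor()

# Partitioning by load date lets ad hoc queries prune to a single day's files.
cur.execute("""
    CREATE TABLE IF NOT EXISTS claims_raw (
        claim_id  STRING,
        member_id STRING,
        amount    DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    STORED AS TEXTFILE
""")

# Each load targets exactly one partition (landing path is hypothetical).
cur.execute("""
    LOAD DATA INPATH '/landing/claims/2016-01-15'
    INTO TABLE claims_raw PARTITION (load_date = '2016-01-15')
""")
```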
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, AWS, Java, Oozie, MySQL.
Confidential - Fishers, IN
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Used Bash shell scripting, Sqoop, Avro, Hive, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Used Pig for data transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
- Analyzed large datasets to determine the optimal way to aggregate and report on them.
- Worked on loading all tables from the reference source database schema through Sqoop.
- Designed, coded, and configured server-side J2EE components such as JSP, AWS, and Java.
- Collected data from different databases (e.g., Oracle, MySQL) into Hadoop.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Worked on designing and developing ETL workflows using Java, orchestrated with Oozie, for processing data in HDFS/HBase.
- Experienced in managing and reviewing Hadoop log files.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Extracted files from MySQL through Sqoop, placed them in HDFS, and processed them.
- Supported MapReduce programs running on the cluster.
- Provided cluster coordination services through ZooKeeper.
- Involved in loading data from the UNIX file system to HDFS (see the sketch after this list).
- Created several Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Developed simple to complex MapReduce jobs using Hive and Pig.
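A minimal sketch of the UNIX-to-HDFS load step referenced above, wrapping the hadoop CLI from Python; the paths are hypothetical and the flags assume a Hadoop 2.x client.

```python
#!/usr/bin/env python
# Copies a local feed file into HDFS by shelling out to the hadoop CLI.
import subprocess

def put_to_hdfs(local_path, hdfs_dir):
    """Create the target HDFS directory if needed, then upload the file."""
    subprocess.check_call(["hadoop", "fs", "-mkdir", "-p", hdfs_dir])
    subprocess.check_call(["hadoop", "fs", "-put", local_path, hdfs_dir])

if __name__ == "__main__":
    # Hypothetical feed file and landing directory.
    put_to_hdfs("/data/feeds/orders.dat", "/user/etl/incoming/orders")
```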
Environment: Apache Hadoop, AWS, MapReduce, HDFS, Hive, Java (JDK 1.6), SQL, Pig, ZooKeeper, Flat Files, Oracle 11g/10g, MySQL, Windows NT, Unix, Sqoop, Oozie, HBase.
Confidential - El Segundo, CA
Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like MapReduce, Hive, Pig, and Sqoop.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (see the sketch after this list).
- Managed and scheduled jobs on a Hadoop cluster.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed a MapReduce program to structure the data.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked with them using HiveQL.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Managed and reviewed Hadoop log files.
- Scheduled the MapReduce and Pig jobs using Oozie workflows.
- Created Dimensions and Measures in Tableau.
- Created and analyzed reports using Tableau Desktop and Tableau Server.
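A minimal health-check sketch in the same spirit as the monitoring shell scripts above, rendered in Python; the daemon list is an assumption and the alert action is simplified to a printed warning.

```python
#!/usr/bin/env python
# Checks that the expected Hadoop daemons appear in the JDK's jps listing.
import subprocess

EXPECTED = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def running_daemons():
    """Return the set of Java process names reported by jps."""
    out = subprocess.check_output(["jps"]).decode()
    names = set()
    for line in out.splitlines():
        parts = line.split()
        if len(parts) == 2:       # lines look like "12345 NameNode"
            names.add(parts[1])
    return names

if __name__ == "__main__":
    for name in sorted(EXPECTED - running_daemons()):
        # The real script raised an alert here instead of printing.
        print("WARNING: %s is not running" % name)
```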
Environment: HDFS, Hadoop, Pig, Hive, Sqoop, MapReduce, Oozie, Java 6/7, SQL Server 2012, Tableau Server, UNIX Shell Scripting, Agile Methodology.
Confidential
Java / SQL Developer
Responsibilities:
- Involved extensively in requirement elicitation and analysis.
- Created SSIS packages and stored procedures.
- Worked extensively on performance tuning and query optimization.
- Coordinated with the offshore team.
- Involved in Client Business Meetings.
- Investigated, analyzed, and documented reported defects.
- Implemented stored procedures, views, synonyms, and functions (see the sketch after this list).
- Created, documented, and executed unit-test plans to ensure the quality of the product.
- Played a key role in preparing LLD and Functional Specification documents.
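The stored procedures above were written in T-SQL; purely as an illustration of how such a procedure is invoked, here is a hypothetical parameterized call from Python via pyodbc (the connection string, procedure name, and result columns are all assumptions).

```python
import pyodbc

# Hypothetical connection string for a SQL Server 2008 instance.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=dbhost;DATABASE=Sales;Trusted_Connection=yes"
)
cur = conn.cursor()

# Parameterized ODBC call syntax avoids string concatenation in the SQL.
cur.execute("{CALL dbo.usp_GetOpenDefects (?)}", ("HIGH",))
for row in cur.fetchall():
    print(row.defect_id, row.title)   # hypothetical result columns
```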
Environment: SQL Server 2008, Business Intelligence Development Studio (BIDS), SSIS, Windows XP, Java, jQuery, JavaScript, JSP, Servlets.