Hadoop/Spark Developer Resume
Irving, TX
SUMMARY
- 9+ years of experience with Hadoop and mainframe operations, including work on enterprise applications using Hadoop components such as HDFS, MapReduce (YARN), Sqoop, Pig, Hive, HBase, Oozie, Apache Kafka, Spark with Scala, Cassandra, and Tableau.
- Strong working experience in mainframe production support environments.
- 4+ years of dedicated experience with Hadoop and its components, including HDFS, MapReduce (YARN), Apache Pig, Hive, Sqoop, HBase, Oozie, and Cassandra.
- Expertise in importing and exporting data with Sqoop between HDFS and relational database systems.
- Good working knowledge of Apache Hive and Pig.
- Expertise in creating Hive tables and writing Hive queries using HiveQL.
- Strong exposure to performance tuning and optimization techniques in Hive.
- Good working knowledge of Oozie for job scheduling.
- Good hands-on experience with Apache Spark and Scala.
- Involved in writing Pig scripts to reduce job execution time.
- Used HBase together with Pig and Hive as required for real-time, low-latency queries.
- Good understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Good understanding of HDFS design, daemons, and HDFS high availability (HA).
- Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Experience in preparing reports and dashboards using data visualization tools such as Tableau.
- Expertise in troubleshooting and bug reporting using defect-tracking tools.
TECHNICAL SKILLS:
Programming Languages: Core Java, Scala
Hadoop/Big Data: HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, Apache Spark, Kafka
Mainframe Operations Tools: CA7, JobTrac, TWS-OPC, Control-M, Netcool.
Presentation Tools: WebEx, Skype, TeamViewer
Databases: Oracle 10g, DB2, NoSQL (HBase, Cassandra)
IDE Tools: Eclipse 3.x, NetBeans 5.0/5.5
Documentation Tools: MS Word, MS Visio.
Visualization tools: Tableau
Operating Systems: Windows XP/2000/NT/98/95, UNIX, Linux
PROFESSIONAL EXPERIENCE
Hadoop/Spark Developer
Confidential, Irving, TX
Responsibilities:
- Developed and executed shell scripts to automate jobs.
- Wrote complex Hive queries.
- Worked on reading multiple data formats from HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the Spark sketch after this list).
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Analyzed SQL scripts and designed solutions for implementation in Scala.
- Involved in loading data from the UNIX file system to HDFS.
- Extracted data from databases into HDFS using Sqoop.
- Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
- Managed and reviewed Hadoop log files.
- Involved in the analysis, design, and testing phases and responsible for documenting technical specifications.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Worked extensively on the Spark Core and Spark SQL modules.
- Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports (see the Kafka sketch below).
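A minimal sketch of the Hive-to-Spark conversion pattern described above, assuming a hypothetical `transactions` Hive table with `category` and `amount` columns (the table and column names are illustrative, not the actual project schema):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: replacing a Hive aggregation query with Spark RDD
// transformations. The `transactions` table and its columns are
// hypothetical placeholders.
object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkSketch")
      .enableHiveSupport() // read Hive tables through the metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT category, SUM(amount) FROM transactions GROUP BY category
    val totals = spark.table("transactions").rdd
      .map(r => (r.getAs[String]("category"), r.getAs[Double]("amount")))
      .reduceByKey(_ + _) // same aggregation as the GROUP BY

    totals.take(10).foreach(println)
    spark.stop()
  }
}
```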
Environment: Hadoop, HDFS, Hive, Scala, Spark, SQL, Pig, Tableau
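A minimal sketch of a Kafka producer of the kind described in this role; the broker address, topic name, and payload are placeholder assumptions:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal sketch of a Kafka producer feeding real-time data into the
// cluster. Broker address and topic name are hypothetical placeholders.
object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Send one event to the (hypothetical) "realtime-events" topic.
      producer.send(new ProducerRecord[String, String](
        "realtime-events", "event-key", "event-payload"))
    } finally {
      producer.close() // flushes pending records before shutdown
    }
  }
}
```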
Hadoop Developer
Confidential - Plano, TX
Responsibilities:
- Installed and configured Hadoop on a cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Extended Hive and Pig core functionality by writing custom UDFs (see the UDF sketch after this section).
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows using Oozie.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources and applications.
- Working knowledge of NoSQL databases such as HBase and Cassandra.
- Good knowledge of analyzing data in HBase using Hive and Pig.
- Involved in unit-level and integration-level testing.
- Prepared design documents and functional documents.
- Based on requirements, added extra nodes to the cluster to keep it scalable.
- Involved in running Hadoop jobs to process millions of records of text data.
- Involved in loading data from the local file system (Linux) to HDFS.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hadoop jobs.
- Submitted a detailed report on daily activities on a weekly basis.
Environment: Hadoop (HDFS), Pig, Sqoop, HBase, Hive, Flume, MapReduce, Cassandra, Oozie, MySQL
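A minimal sketch of the kind of custom Hive UDF mentioned above; the class name and behavior are illustrative (such UDFs are often written in Java, but Scala is used here for consistency and compiles to the same JVM bytecode):

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal sketch of a custom Hive UDF: normalizes a string to lower
// case with trimmed whitespace. Name and logic are placeholders, not
// the actual project UDFs.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
  }
}
// Usage from the Hive CLI (jar and table names assumed):
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
//   SELECT normalize_text(name) FROM customers;
```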
System Analyst
Confidential, Richardson, TX
Responsibilities:
- Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
- Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
- Worked on setting up Pig, Hive, and HBase on multiple nodes and developed with Pig, Hive, HBase, and MapReduce.
- Solved the small-file problem using sequence-file processing in MapReduce.
- Wrote various Hive and Pig scripts.
- Created HBase tables to store variable data formats coming from different portfolios.
- Performed real-time analytics on HBase using the Java API and REST API (see the HBase sketch after this list).
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Worked on compression mechanisms to optimize MapReduce jobs.
- Analyzed customer behavior by performing clickstream analysis and used Flume to ingest the data.
- Experienced in working with Avro data files using the Avro serialization system.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
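A minimal sketch of HBase access through the Java client API, as referenced above; the table, column family, qualifier, and row key are hypothetical placeholders:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

// Minimal sketch of HBase reads/writes via the Java client API.
// Table, family, and qualifier names are illustrative only.
object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create() // picks up hbase-site.xml
    val conn = ConnectionFactory.createConnection(conf)
    val table = conn.getTable(TableName.valueOf("portfolio_events"))
    try {
      // Write one cell: row "row1", family "d", qualifier "status".
      val put = new Put(Bytes.toBytes("row1"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"),
        Bytes.toBytes("ACTIVE"))
      table.put(put)

      // Read the same cell back.
      val result = table.get(new Get(Bytes.toBytes("row1")))
      val status = Bytes.toString(
        result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status")))
      println(s"status = $status")
    } finally {
      table.close()
      conn.close()
    }
  }
}
```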
Environment: Hortonworks, MapReduce, HBase, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie
Senior Operations Professional
Confidential
Responsibilities:
- Monitoring consoles of 17 LPARs, covering production, finance, networking, and testing.
- Performing health checks on jobs at regular intervals.
- Suppressing and reinstating jobs.
- Holding and releasing jobs.
- Handling ad-hoc requests to schedule particular applications.
- Moving elements from development to production.
- Managing the JES spool and analyzing hardware and software problems, informing the support team.
- Displaying, starting, and stopping CICS.
- Performing IPLs on all LPARs during maintenance slots and troubleshooting during the IPL.
- Performing pre- and post-IPL health checks.
Environment: CA7, JCL, mainframe hardware, z/OS
Engineer (Mainframe Operations)
Confidential
Responsibilities:
- Console monitoring of all LPARs (production, development, and testing).
- Monitoring batch jobs, subsystems, online regions, and databases.
- Handling production batch job abends by following operator instructions and escalating to the appropriate support group if required.
- Fixing abends by rerunning, purging, or deleting jobs, placing jobs on hold/confirm status, and rescheduling jobs as required.
- Bringing subsystems and online regions such as CICS and IMS down and up as required.
- Monitoring and responding to outstanding replies (WTORs) and regularly checking for contention.
- Raising tickets for critical servers and assigning them to the appropriate resolver group.
- Preparing weekly and monthly reports and dashboards for the client.
- Performing the Saturday morning ‘tie in’ on Old World AXA Life TWS.
- Covering the technical window on weekends.