
Senior Big Data Developer And Architect Resume


Dallas, TX

SUMMARY:

  • 12 years of IT (software engineering) experience in application design, development, migration, integration, and maintenance on Hadoop, Spark, and Java platforms in the financial domain
  • 4 years of Big Data Engineering experience in Hadoop & Spark
  • Strong knowledge of Spark architecture, including the Spark Core, Spark SQL, DataFrames, Spark Streaming, Spark MLlib, and Spark GraphX APIs
  • In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, and DataNode, as well as MR v1 & v2 concepts
  • Worked extensively with CDH (Cloudera Distribution Including Apache Hadoop)
  • Experience in installing and configuring Hadoop components like MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume on the Cloudera distribution platform
  • Worked in Hadoop and Spark system engineering teams to define design and implementation standards
  • Experienced in writing Spark programs/applications in Scala using Spark APIs for data extraction, transformation, and aggregation
  • Expertise in processing large sets of structured and semi-structured data in Spark and Hadoop and storing them in HDFS
  • Experience in converting SQL queries into Spark transformations using Spark RDDs, DataFrames, and Scala, and in performing map-side joins on RDDs (see the sketch following this list)
  • Experienced in Spark SQL and Spark DataFrames using Scala
  • Experience in creating real-time data streaming solutions using Apache Spark Streaming
  • Experience in creating DStreams from sources like Flume and Kafka and performing different Spark transformations and actions on them
  • Experienced with the Spark framework for both batch and real-time data processing
  • Experience in developing Kafka consumer APIs in Spark/Scala applications
  • Experienced in using Sqoop to import and export data between HDFS/Hive and different RDBMS servers like MySQL, Oracle, and Teradata
  • Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation
  • Expertise in working with Hive: creating tables, distributing data through partitioning and bucketing, and developing, tuning, and optimizing HQL queries
  • Worked on loading data into Hive tables and writing ad hoc Hive queries that run internally on MapReduce and on other execution engines like Spark (Hive on Spark)
  • Experienced in analyzing data using Pig Latin scripts
  • Experienced in designing tables and views for reporting using Impala
  • Experience in designing and developing POCs on Spark clusters and comparing the performance of Spark with Hive, Pig, and MapReduce
  • Performed performance tuning of Spark jobs by adjusting configuration properties and using broadcast variables
  • Experienced in optimizing existing Hadoop algorithms with Spark using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN
  • Experienced in writing Unix/shell scripts for various functionalities
  • Experienced in automating Sqoop, Hive, Java, MapReduce, shell scripting, etc. using Oozie workflows
  • Worked on platform migration: Mainframe and Teradata system decommissioning, bringing the entire data set to HDFS
  • Experienced with Maven, Jenkins, and continuous build environments
  • Hands-on experience with Amazon Web Services (AWS) components like Amazon EC2 instances and S3 buckets
  • Experienced in working with different file formats: Avro, Parquet, fixed-length, EBCDIC, text, XML, JSON, and CSV
  • Experience with different RDBMS databases: DB2, Oracle, Teradata, MySQL, and Exadata
  • Good understanding of algorithms, data structures, performance optimization techniques and object-oriented programming
  • Experience with web service APIs (REST, SOAP)
  • Worked with different compression techniques like Gzip, LZO, Snappy, and Bzip2
  • Strong knowledge of wealth management platforms and of financial and brokerage firms
  • Good working experience with various SDLC methodologies, using both Agile and Waterfall models
  • Excellent team-building, analytical, interpersonal, and communication skills
  • Ability to work on multiple software systems, quickly learn new technologies, and adapt to new environments; self-motivated team player
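
The SQL-to-Spark conversion and broadcast (map-side) join bullets above can be illustrated with a minimal Scala sketch; the table names, column names, and output path below are hypothetical placeholders, not project code.

```scala
// Sketch: a SQL join + aggregation rewritten as DataFrame transformations,
// broadcasting the smaller table so the join happens map-side on each executor.
// Equivalent SQL:
//   SELECT t.account_id, SUM(t.amount) AS total
//   FROM transactions t JOIN accounts a ON t.account_id = a.account_id
//   WHERE a.region = 'US'
//   GROUP BY t.account_id
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col, sum}

object SqlToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-to-dataframe-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val transactions = spark.table("transactions")                            // large fact table
    val accounts     = spark.table("accounts").filter(col("region") === "US") // small dimension

    val totals = transactions
      .join(broadcast(accounts), Seq("account_id"))   // broadcast hint -> map-side join
      .groupBy("account_id")
      .agg(sum("amount").alias("total"))

    totals.write.mode("overwrite").parquet("/tmp/sketch/account_totals")
    spark.stop()
  }
}
```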

TECHNICAL SKILLS:

Primary skills: Spark Core, RDDs, DataFrames, Spark SQL, Spark Streaming, Hadoop, HDFS, YARN, MapReduce, Pig, Hive, Impala, Sqoop, Oozie, NoSQL, HBase, Java, Scala, Unix/shell scripting

Additional skills: Teradata, Oracle, SQL Server, MySQL, Netezza, DB2, MS Access, Python, Hue, Avro, Parquet, Docker, BlinkDB, Cassandra, MongoDB, Splunk, Kafka, SQL, Unix, Linux, Windows 2007/2000/XP, MS DOS, IBM OS/390 O/S MVS/ESA, z/OS, Autosys, IBM TWS (Tivoli Workload Scheduler), OOP (object-oriented programming), Kerberos, Jenkins, Bash, COBOL, JCL, REXX, CICS, J2EE, AWS, EC2, S3, AMI, Web Services, REST, HTML, XML, SOLA (Service Oriented Legacy Architecture), JavaScript, Stored Procedures, MQ, NDM, sFTP, SVN, Visual SourceSafe (VSS), Git, Eclipse, Teradata SQL Assistant, TOAD, Tectia, CA workload automation iXP, File-aid, DFSORT, Endevor, CA Panvalet, VISIO, SPUFI, INTERTEST, Via-Soft, IBM Debug tool, Xpeditor, PLATINUM, MS Office, Lotus Notes, Easytrieve, File Manager, Abend-aid, MS Visual Basic, OPC & CA-7 Scheduler, IBM RDz (Rational Developer for System z), Mainframe Express (MFE), Spring Tool Suite, Quality Center

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Senior Big Data Developer and Architect

Roles and Responsibilities:

  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS through analyzing, processing, and publishing the data for downstream consumers.
  • Created HLD documents for enhancements and new development.
  • Developed Spark RDD transformations, actions, DataFrames, and case classes for the required input data and performed the data transformations using Spark Core.
  • Also used Scala to perform transformations and apply business logic.
  • Used Hive queries in Spark SQL for analyzing and processing the data.
  • Created Hive tables, loaded data, and wrote Hive queries, which invoke and run MapReduce tasks in the background.
  • Implemented partitioning and dynamic partitioning in Hive (see the sketch after this list).
  • Worked with ORC, Parquet, and JSON Hive tables.
  • Imported data into HDFS using Sqoop.
  • Worked with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark Streaming.
  • Created UDFs for specific functionalities.
  • Developed shell scripts for running Hive scripts.
  • Used IBM Tivoli Workload Scheduler (TWS) for scheduling the jobs.
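
The Hive partitioning work referenced above can be sketched as follows; the table, columns, and dates are hypothetical, and the DDL/DML is shown through Spark SQL only as one possible way to run it.

```scala
// Sketch: create an ORC Hive table, load it with dynamic partitioning,
// and query it with partition pruning. All names and values are placeholders.
import org.apache.spark.sql.SparkSession

object HiveDynamicPartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-dynamic-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql(
      """CREATE TABLE IF NOT EXISTS trades_orc (
        |  trade_id STRING, symbol STRING, quantity INT, price DOUBLE)
        |PARTITIONED BY (trade_dt STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic partition insert: trade_dt values in the source rows decide the target partitions.
    spark.sql(
      """INSERT OVERWRITE TABLE trades_orc PARTITION (trade_dt)
        |SELECT trade_id, symbol, quantity, price, trade_dt FROM staging_trades""".stripMargin)

    // Downstream query benefits from partition pruning on trade_dt.
    spark.sql(
      """SELECT symbol, SUM(quantity * price) AS notional
        |FROM trades_orc
        |WHERE trade_dt = '2017-06-30'
        |GROUP BY symbol""".stripMargin).show()

    spark.stop()
  }
}
```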

Environment: Hortonworks, HDFS, MapReduce, Hive, Sqoop, core Java, Scala, Spark, Unix shell scripting, ORC, Parquet, JSON, SQL, IBM TWS, Oracle, Unix, Linux, SVN, Eclipse, WinSCP

Confidential, Charlotte, NC

Senior Spark and Hadoop Developer

Roles and Responsibilities:

  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS through analyzing, processing, and publishing the data for downstream consumers.
  • Developed Spark RDD transformations, actions, DataFrames, and case classes for the required input data and performed the data transformations using Spark Core.
  • Also used Scala to perform transformations and apply business logic.
  • Used Hive queries in Spark SQL for analyzing and processing the data.
  • Created Hive tables, loaded data, and wrote Hive queries, which invoke and run MapReduce tasks in the backend.
  • Implemented partitioning, dynamic partitioning, indexing, and bucketing in Hive.
  • Used custom SerDes in Hive.
  • Worked with Parquet and Avro Hive tables with Snappy compression.
  • Imported and exported data into and from HDFS using Sqoop.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark Streaming.
  • Involved in designing and developing HBase tables and storing aggregated data from Hive tables.
  • Developed MapReduce programs and Pig and Hive scripts for large-scale data processing.
  • Created UDFs, UDAFs, and UDTFs for specific functionalities to be used across teams.
  • Worked with NDM and sFTP connection set-ups (passwordless SSH using RSA keys) and file transfers over them.
  • Developed shell scripts for running scripts in Hive and Impala.
  • Created Oozie workflows to run Spark, Hive, Pig, Unix shell script, MapReduce, and Java programs.
  • Used Autosys instances for scheduling the Oozie workflows.
  • Worked on platform migration: moving code and data to another big data platform and adjusting the code and properties to the new platform/environment.
  • Worked on Mainframe decommissioning: wrote COBOL-equivalent programs in Java, Scala, Hive, Pig, Spark Core, Hive on Spark SQL, Unix shell scripting, etc.
  • Actively helped other team members understand the Mainframe logic and convert it to Java, Hadoop, Scala, and Spark.
  • Worked on performance tuning of various processes/components in Hive, Pig, MapReduce, RDDs, DataFrames, etc.
  • Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.
  • Collected the JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive/HBase tables.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the sketch after this list).
  • Created reusable components and handled environment management, code migration, and maintenance using SVN as part of the System Engineering team.
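
The Kafka consumer bullet above can be illustrated with a minimal Scala sketch using the Spark Streaming Kafka direct stream; broker addresses, topic, group id, and landing path are hypothetical.

```scala
// Sketch: direct-stream Kafka consumer that lands each micro-batch on HDFS.
// Broker list, topic, group id, and output path are placeholders.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaConsumerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-consumer-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "trade-feed-consumer",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("trade-events"), kafkaParams))

    // Keep only the record values and persist each non-empty batch to HDFS.
    stream.map(_.value).foreachRDD { (rdd, batchTime) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"/data/landing/trade_events/${batchTime.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```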

Environment: Cloudera CDH 5.x, HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Oozie, core Java, Scala, Spark, Unix shell scripting, Python, Hue, Avro, Parquet, Docker, SQL, Autosys, COBOL, JCL, PROC, Teradata, DB2, VSAM, SQL Server, MySQL, Netezza, Unix, Linux, Windows 2007/2000/XP, MS DOS, IBM OS/390 O/S MVS/ESA, z/OS, SVN, Eclipse, Teradata SQL Assistant, TOAD, Tectia, CA workload automation iXP

Confidential, Malvern, PA

Hadoop Developer

Roles and Responsibilities:

  • Extracted data into and updated data in HDFS using Sqoop import and export, including for historical data.
  • Created Hive tables, loaded data, and wrote Hive queries.
  • Created Avro schemas for Hive Avro tables and worked with Hive Parquet tables.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this list).
  • Created Pig scripts to do transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Created MapReduce programs for various needs.
  • Exported processed data from Hadoop to relational databases and external file systems using Sqoop.
  • Proficient in AWS services like EC2, S3, RDS, IAM, and CloudFormation.
  • Possess good knowledge of creating and launching EC2 instances using AMIs of Linux, RHEL, and Windows, and wrote shell scripts to bootstrap instances.
  • Worked on a migration project moving existing applications from traditional data centers to AWS.
  • Worked with Impala to support the portal team and users in querying data from HDFS.
  • Orchestrated many Sqoop scripts, Pig scripts, Hive queries, and MapReduce programs using Oozie workflows and sub-workflows.
  • Converted some modules written in Java, COBOL, etc. into Pig, Hive, and MapReduce.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Good knowledge of YARN and its architecture.
  • Ingested the data from external and internal flow organizations.
  • Participated in daily scrum meetings and iterative development.
  • Experienced with IDEs like Eclipse.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
  • Ensured that all standards and best practices were followed in all code.
  • Prepared JUnit test cases for test-driven development.
  • Used Maven dependency management for continuous integration.
  • Involved in the entire project development life cycle, following Agile methodology.
  • Generated statistical reports for the users and management.
  • Documented applications in the system and updated the documentation upon enhancements.
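
The Hive reporting analysis described above can be sketched in Scala through the HiveServer2 JDBC driver; the JDBC URL, credentials, table, and column names are hypothetical placeholders.

```scala
// Sketch: reporting metrics over a partitioned, bucketed Hive table via Hive JDBC.
// URL, credentials, table, and columns are placeholders.
import java.sql.DriverManager

object HiveMetricSketch {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hive-gateway:10000/default", "etl_user", "")
    try {
      val stmt = conn.createStatement()
      // load_dt is the partition column (pruned by the WHERE clause);
      // account_id is the bucketing column on the underlying table.
      val rs = stmt.executeQuery(
        """SELECT region,
          |       COUNT(DISTINCT account_id) AS active_accounts,
          |       SUM(balance)               AS total_balance
          |FROM account_snapshot
          |WHERE load_dt = '2015-03-31'
          |GROUP BY region""".stripMargin)
      while (rs.next())
        println(s"${rs.getString("region")}\t${rs.getLong("active_accounts")}\t${rs.getDouble("total_balance")}")
    } finally {
      conn.close()
    }
  }
}
```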

Environment: HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Oozie, core Java, Unix shell scripting, Avro, Parquet, SQL, Autosys, SQL Server, Oracle, AWS, S3, EC2, SVN, Eclipse, Teradata SQL Assistant, TOAD, Tectia, CA workload automation iXP, NDM, sFTP, Unix, Linux, Windows 2007/2000/XP

Confidential, Pennington, NJ

Java Developer

Roles and Responsibilities:

  • Project management activities, including timelines, cost, change requests, status gathering from all impacted teams, etc.
  • Monitoring a team of 10 and assigning work
  • Requirement gathering directly from business partners and users
  • Impact analysis, effort estimation, timeline calculation, and design of the Java and Mainframe applications for the project
  • BRS, SRS, HLD, LLD, and SAD document creation for the Java and Mainframe applications of the project
  • Development of programs for the Mainframe application
  • Performed coding, unit testing, system testing, and intersystem testing thoroughly
  • POC for a Hadoop application using HDFS and related ecosystem components like Pig, Hive, etc.
  • Supported user acceptance testing and created test cases for the users
  • Implementation using a release plan and post-production support
  • Code version maintenance of Mainframe elements using Endevor, Java code using SVN, and .NET code using Visual SourceSafe (VSS)
  • Autowired Java objects using Spring dependency injection
  • Prepared JUnit test cases for test-driven development
  • Configured the Spring framework in the application
  • Used Maven dependency management for continuous integration
  • Involved in the entire project development life cycle / Software Development Life Cycle (SDLC)
  • Delegated and coordinated work with the offshore team
  • Prepared meeting minutes and followed up on issues raised during the meetings
  • Prepared run charts and operation instructions to schedule the jobs, and used the OPC scheduler for testing in lower environments
  • Created process flow diagrams of the applications
  • Generated statistical reports for the users and management
  • Documented applications in the system and updated the documentation upon enhancements

Environment: Java, Spring 3.0, Hibernate 3.0, COBOL, JCL, PROC, EZYTRIEVE, REXX, CICS, SQL, HTML, XML, SOLA (Service Oriented Legacy Architecture), Stored Procedures, Quality Center, VSAM, TSO/ISPF, MQ, web services, BMC, HDFS, Pig, Hive, Autosys, File-aid, DFSORT, Endevor, VISIO, SPUFI, INTERTEST, IBM Debug tool, Xpeditor, NDM, PLATINUM, MS Office, Easytrieve, File Manager, Abend-aid, MS Visual Basic, OPC Scheduler, IBM RDz, Eclipse, DB2, SQL Server, MS Access, IBM OS/390 O/S MVS/ESA, z/OS, Windows 2007/2000/XP, MS DOS, Unix

Confidential, Pennington, NJ

Software Developer

Roles and Responsibilities:

  • Delegation of work to and coordination with the offshore team when onshore
  • Coordination with the onshore and offshore teams when offshore
  • Monitoring a team of 6 and assigning work
  • Monitoring and resolving abends in the production batch job runs
  • Impact analysis and design of the Mainframe application for the project
  • HLD, LLD, and SAD document creation for the Mainframe application of the project
  • Development of programs
  • Performed coding, unit testing, system testing, and intersystem testing thoroughly
  • Supported user acceptance testing and created test cases for the users
  • Implementation using a release plan and post-production support
  • Code version maintenance using Endevor
  • Prepared run charts and operation instructions to schedule the jobs
  • Tested all functionality in the QA-Plex region (similar to the production environment) using the OPC scheduler for batch jobs
  • Analyzed programs/jobs for enhancements and modifications toward permanent resolutions of issues raised through tickets and user requests
  • Created process flow diagrams of the applications
  • Generated statistical reports for the users and management
  • Documented applications in the system and updated the documentation upon enhancements

Environment: DB2, VSAM, MS Access, COBOL, JCL, PROC, EZYTRIEVE, REXX, CICS, SQL, SOLA (Service Oriented Legacy Architecture), Java, Stored Procedures, Quality Center, TSO/ISPF, MQ, BMC, File-aid, DFSORT, Endevor, CA Panvalet, VISIO, SPUFI, INTERTEST, Via-Soft, Xpeditor, NDM, PLATINUM, MS Office, Easytrieve, File Manager, Abend-aid, OPC & CA-7 Scheduler, IBM OS/390 O/S MVS/ESA, z/OS, Windows XP
