
Lead Big Data Developer Resume

SUMMARY:

  • 12+ years of professional IT experience in Data Warehousing/Big Data, including 3.5 years of experience with Big Data ecosystem technologies such as Hadoop, MapReduce, Pig, Hive, and Spark.
  • In-depth knowledge of Hadoop architecture and its components such as YARN, HDFS, Name Node, Data Node, Job Tracker, Application Master, Resource Manager, Task Tracker, and the MapReduce programming paradigm.
  • Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, Spark, and Sqoop, with exposure to HBase.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Hands-on experience in in-memory data processing with Apache Spark using Scala.
  • Built massively scalable, multi-threaded applications for large data processing, primarily with Apache Spark in Scala on Hadoop.
  • Experienced in analyzing data using Hive query language, Pig, and MapReduce.
  • Knowledge and experience in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Good knowledge and experience in importing and exporting data with Sqoop between HDFS and Relational Database Systems (RDBMS).
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, and advanced data processing. Experience optimizing ETL workflows.
  • Experience in Object Oriented Analysis and Design (OOAD) and software development using Java, Scala, and C++ on UNIX platforms.
  • Extensive use of core Java Collections, Generics, Exception Handling, and Design Patterns.
  • Experience in database design. Used Oracle PL/SQL and Sybase T-SQL to write stored procedures, functions, and triggers, with strong experience writing complex queries for Oracle.
  • Experience in IBM DataStage, Informatica PowerCenter, and Data Integration Hub ETL tools.

TECHNICAL SKILLS:

Big Data / Hadoop Ecosystem: Hadoop, Spark, YARN, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Impala, Kafka, etc.

Programming Languages: Java, Scala, Python, C++, Sybase T-SQL, Oracle PL/SQL, UNIX Shell Scripting, Perl Scripting, Grunt Shell, Pig Latin, HiveQL, NoSQL

RDBMS: Sybase (11.x to 15.x), Oracle 10g/11g

Tools: Eclipse, IntelliJ IDEA, MAGELLAN, Hive Connector, AQT, Rapid SQL, T-SQL Developer, TOAD, Autosys, Informatica Data Integration Hub, RPM, SBM, HUE, TextPad, CuteFTP, WinSCP

Operating Systems: HP UNIX, IBM AIX, Linux, Windows

ETL Tools: IBM DataStage 7.5/8.0/8.5/9.x, Informatica PowerCenter, Data Integration Hub

Version Management Tools: Serena PVCS, Git, ClearCase

PROFESSIONAL EXPERIENCE:

Confidential

Lead Big Data Developer

Responsibilities:

  • Processed raw data at scale on the Hadoop big data platform, loading disparate data sets from various environments.
  • Developed ETL data flows using Hadoop ecosystem components and Spark in Scala.
  • Led the development of large-scale, high-speed, low-latency data solutions in the areas of large-scale data manipulation, long-term data storage, data warehousing, low-latency retrieval systems, and real-time reporting and analytics applications.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Explored Spark to improve performance and optimize the existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, RDDs, and Spark on YARN.
  • Implemented advanced Spark procedures like text analytics and processing using its in-memory computing capabilities.
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Developed Spark jobs for faster data processing and used Spark SQL for querying.
  • Implemented Spark best practices like partitioning, caching, and checkpointing for faster processing.
  • Wrote jobs to process unstructured data into structured data for analysis, including pre-processing, fuzzy matching, and data ingestion.
  • Imported and exported data between HDFS and Hive using Sqoop.
  • Wrote business logic in Hive with the help of MAGELLAN for analytical credit bureau reporting.
  • Involved in the ingestion process to CORNERSTONE after data was cleaned and business logic applied.
  • Created various analytical reports using Hive/HiveQL in a MapReduce Hadoop environment.
  • Involved in designing various Hadoop and Hive configurations for better performance.
  • Debugged Hive's backend MapReduce jobs to fix issues and tune performance.
  • Involved in scheduling all jobs with the central event engine.
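The unstructured-data bullet above mentions fuzzy matching. As a minimal, generic sketch of that idea (not the project's actual matching rules; the sample values below are hypothetical), a Levenshtein edit distance can score how close two raw field values are before they are merged into structured records:

```java
// Minimal fuzzy-matching sketch: Levenshtein edit distance between two strings,
// computed with the classic two-row dynamic-programming recurrence.
public class FuzzyMatchSketch {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;  // distance from empty prefix
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                // min of insertion, deletion, substitution
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;  // roll the rows
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Two raw vendor-name values that should likely resolve to one entity.
        System.out.println(levenshtein("ACME Corp", "ACME Corp."));  // small distance -> candidate match
    }
}
```

In a real pipeline the distance would be normalized by string length and compared against a threshold to decide whether two raw records refer to the same entity.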

Environment: Java, Scala, Spark, HDFS, MapReduce, YARN, Hive, Sqoop, Pig, Unix, Oozie Scheduler, Shell Scripts, Magellan, Cornerstone, Informatica Data Integration Hub.

Confidential

Lead Big Data Developer

Responsibilities:

  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW. Pushed data as delimited files into HDFS.
  • Used different Hadoop ecosystem components like Hive, Pig, and Spark.
  • Built massively scalable, multi-threaded applications for large data processing, primarily with Apache Spark in Scala.
  • Developed and implemented advanced algorithms using Apache Spark in Scala for various analytical purposes.
  • Created a series of Spark jobs using Scala, running under Spark's resource manager (YARN).
  • Experience in deploying Spark jobs to production, troubleshooting and debugging Spark jobs.
  • Good exposure to Scala functional programming.
  • Wrote Pig scripts to process unstructured data and create structured data for use with Hive.
  • Hands-on experience writing custom UDFs as well as custom input and output formats.
  • Developed Sqoop scripts to move data between Hive and RDBMS databases.
  • Created managed tables and external tables in Hive and loaded data from HDFS.
  • Created partitioned tables in Hive for best performance and faster querying.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Improved performance of MapReduce jobs by using combiners, partitioning, and the Distributed Cache.
  • Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Provided upper management with daily updates on project progress, including the classification levels achieved on the data.
  • Involved in performance tuning of Pig and Hive scripts; created UDFs in Hive and Pig Latin for repetitive tasks.
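One of the MapReduce optimizations mentioned above is the combiner. As a hedged illustration of what a combiner buys you (a real one extends Hadoop's Reducer and runs inside the framework; this plain-Java sketch shows only the aggregation step), map output is summed per key locally before the shuffle, shrinking network traffic:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of combiner-style local pre-aggregation: (key, 1) pairs emitted by a
// mapper are summed on the map side, so far fewer records cross the network.
public class CombinerSketch {
    static Map<String, Integer> combine(String[] mapOutputKeys) {
        Map<String, Integer> partial = new LinkedHashMap<>();
        for (String key : mapOutputKeys) {
            partial.merge(key, 1, Integer::sum);  // local count per key
        }
        return partial;  // shipped to reducers instead of the raw pairs
    }

    public static void main(String[] args) {
        String[] keys = {"spark", "hive", "spark", "spark", "hive"};
        System.out.println(combine(keys));  // prints {spark=3, hive=2}
    }
}
```

Because counting is associative and commutative, applying the same sum again at the reducer yields the correct global totals, which is exactly why a word-count-style combiner is safe.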

Environment: Java, Scala, Python, Spark, HDFS, MapReduce, YARN, Hive, Sqoop, Pig, Flume, Oozie Scheduler, Shell Scripts, IBM DataStage, Informatica Data Integration Hub.

Confidential

Senior Hadoop Developer

Responsibilities:

  • Pushed data as delimited files into HDFS.
  • Used different Hadoop components like Hive, Pig, and Spark.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics. Loaded the data from the RDBMS server to Hive using Sqoop.
  • Wrote Pig scripts to process unstructured data and create structured data for use with Hive.
  • Hands-on experience writing custom UDFs as well as custom input and output formats.
  • Developed Sqoop scripts to move data between Hive and RDBMS databases.
  • Created managed tables and external tables in Hive and loaded data from HDFS.
  • Created partitioned tables in Hive for best performance and faster querying.
  • Used Flume to channel data from different sources to HDFS.
  • Exposure to Spark iterative processing.
  • Improved performance of MapReduce jobs by using combiners, partitioning, and the Distributed Cache.
  • Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Performed Sqoop imports from Oracle to load the data into HDFS and directly into Hive tables.
  • Provided upper management with daily updates on project progress, including the classification levels achieved on the data.

Environment: Java 1.5, Python, Spark, HDFS, MapReduce, YARN, Hive, Sqoop, Pig, Flume, HBase, Oozie Scheduler, Shell Scripts, IBM DataStage, Informatica

Senior Developer

Confidential

Responsibilities:

  • Developed business algorithms using Java/C++ object-oriented programming, multithreading, collections, etc.
  • Extensive use of core Java Collections, Generics, Exception Handling, and Design Patterns for functionality.
  • Configured Oracle connection pool, which is included in Oracle JDBC driver JAR file, to allow concurrent access to the database and optimize performance.
  • Designed the software architecture, involving data modeling and an application server to calculate real-time components.
  • Development and maintenance of database procedures, reports and automated jobs of assigned and unassigned database projects
  • Used DataStage as an ETL tool to extract data from source systems and load it into the Oracle database.
  • Designed and developed DataStage jobs to extract data from heterogeneous sources, applied transformation logic to the extracted data, and loaded it into data warehouse databases.
  • Created DataStage jobs using different stages like Transformer, Aggregator, Sort, Join, Merge, Lookup, Data Set, Funnel, Remove Duplicates, Copy, Modify, Filter, Change Data Capture, Change Apply, Sample, Surrogate Key, Column Generator, Row Generator, etc.
  • Responsible for database performance tuning and working as a subject matter expert (SME) of this application.
  • Worked on UNIX Shell scripting for scheduling and automation of application.

Environment: Java, Spring, C++, Python, AIX UNIX, Sybase 15.0, Oracle 11g, T-SQL, PL/SQL, DataStage, Shell Scripting

Senior Developer

Confidential

Responsibilities:

  • Created an application for automatic scheduling and handling of batch system commands.
  • Developed core algorithmic components using Java object-oriented programming, multithreading, and collections for margin calculation using Value at Risk (VaR).
  • Worked on very high throughput, low latency, and safe recovery in case of a crash.
  • Extensive use of core Java Collections, Generics, Exception Handling, and Design Patterns for functionality.
  • Configured Oracle connection pool, which is included in Oracle JDBC driver JAR file, to allow concurrent access to the database and optimize performance.
  • Performed PL/SQL and T-SQL programming; identified data sources, constructed data decomposition diagrams, provided data flow diagrams, and documented the process. Additionally, wrote code for database access, modifications, and construction, including stored procedures.
  • Responsible for code optimization and database performance tuning.
  • Responsible for creating database objects like procedures, functions, triggers, and cursors in Oracle PL/SQL and Sybase T-SQL.
  • Used Perl scripting for data validation.
  • Involved in designing and developing a trade warehouse by extracting and loading data from distributed databases into the warehouse database.
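The VaR-based margin calculation above is not specified in detail, so as a purely illustrative sketch (historical simulation is one common approach, not necessarily the one used here), VaR can be read off as a percentile of sorted historical daily P&L:

```java
import java.util.Arrays;

// Illustrative historical-simulation VaR: given past daily P&L values, the
// VaR at a given confidence level is the loss threshold exceeded on only
// (1 - confidence) of historical days. Generic sketch, not a firm's methodology.
public class VarSketch {
    static double historicalVar(double[] dailyPnl, double confidence) {
        double[] sorted = dailyPnl.clone();
        Arrays.sort(sorted);  // most negative (worst) days first
        int idx = (int) Math.floor((1.0 - confidence) * sorted.length);
        return -sorted[idx];  // report VaR as a positive loss amount
    }

    public static void main(String[] args) {
        double[] pnl = {-120.0, 35.0, -60.0, 80.0, -15.0, 40.0, -95.0, 10.0, 55.0, -30.0};
        System.out.println(historicalVar(pnl, 0.90));  // prints 95.0
    }
}
```

A production margin engine would layer on position decomposition, scenario weighting, and much longer return histories, but the percentile lookup above is the core of the historical-simulation method.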

Environment: Java, Spring, C++, AIX UNIX, Sybase 15.0, Shell Scripting, DataStage, Autosys, Oracle, PL/SQL, Python

Senior Developer

Confidential

Responsibilities:

  • Wrote Perl and shell scripts for system automation.
  • Enhanced legacy components built with Java/C++ multithreading and the STL (vector, map, deque).
  • Built and deployed JAR files on test and staging systems in Tomcat.
  • Extensively used the JDBC API for database connectivity.
  • Participated in overall design and derived detail design using UML class diagrams.
  • Implemented custom data structures using Java collection framework
  • Created UNIX shell scripts to generate reports.
  • Database Design & TSQL Programming.
  • Responsible for database performance tuning.
  • Client Interaction for Requirement analysis and monitoring the project.

Environment: JAVA, Spring, C++, Sybase 15.0, T-SQL, AIX UNIX, Shell Scripting, DataStage

Senior Developer

Confidential

Responsibilities:

  • Built and deployed JAR files on test and staging systems in Tomcat.
  • Extensively used the JDBC API for database connectivity.
  • Participated in overall design and derived detail design using UML class diagrams.
  • Implemented custom data structures using Java collection framework
  • Database design, T-SQL programming, and technical design for the database.
  • Responsible for database performance tuning.
  • Responsible for creating database objects like procedures, functions, triggers, and cursors.
  • Performed data cleanup and data movement between various environments, i.e., development, staging, and production.
  • Development and maintenance of database procedures, reports, and automated jobs, plus ongoing maintenance of assigned and unassigned database projects.
  • Enhancement of existing database functions based on new/changing business requirements
  • Report, research, correct, and test defects in reports, stored procedures and other database related functions
  • Involved in designing and development of trade warehouse.

Environment: Java,Spring, C++, Sybase 15.0, T-SQL, AIX UNIX, Shell Scripting, DataStage

Senior Developer

Confidential

Responsibilities:

  • Built and deployed JAR files on test and staging systems in Tomcat.
  • Extensively used the JDBC API for database connectivity.
  • Participated in overall design and derived detail design using UML class diagrams.
  • Implemented custom data structures using the Java collection framework.
  • Used Spring to perform dependency injection among the bean classes involved in business logic operations.
  • Involved in the process of database design for the entire application.
  • Modified, maintained, and developed T-SQL code to implement new enhancements.
  • Performance-tuned the existing long-running stored procedures.
  • Client Interaction for Requirement analysis and monitoring the project.

Environment: Java, Spring,C++, Sybase 15.0, T-SQL, AIX UNIX, Shell Scripting

Confidential

Technical Lead

Responsibilities:

  • Worked on a real-time trading engine and order management system.
  • Developed a low-latency, high-frequency real-time trading engine.
  • Developed the order management and trade execution system using C++ multithreading, UNIX, and Sybase T-SQL.
  • Fully responsible for end-to-end backend development of the application in C++, UNIX, Sybase T-SQL, Oracle 10g, and UNIX shell scripting. Worked proficiently with Sybase Open Server, communicating with it from C.
  • Worked on performance tuning of the application.
  • Wrote stored procedures and cursors, created indexes, and was involved in day-to-day database operations.
  • Worked with IBM DataStage for extraction and transformation of data from the live environment to the historical environment.
  • Wrote shell scripts for automation and monitoring of the live system.
  • Understood clients' change requests and converted high-level designs to low-level designs.
  • Supported the client in live trading sessions and debugged reported issues.

Environment: C++, Sybase 15.0, T-SQL, Oracle 10g, PL/SQL, DataStage, SQL Developer, and HP UNIX 11i
