Spark-Python Developer and Data Modeler Resume
Summit, NJ
SUMMARY:
- Strong industry leader with 17 years of experience across diversified Data Management areas, including Data Architecture, Database Implementation and SQL Development, Data Modeling and Design, Data Quality, Metadata Management, Data Profiling, and Data Governance, with extensive experience implementing Enterprise Data Warehousing and Master Data for financial and other sectors.
- Possess subject matter expertise in Teradata FSDM/FSLDM, Compliance and Trade Surveillance, Risk Data, the Wholesale Customer data domain, customer complaints, and surveillance.
- More than 5 years of experience as an HDFS/PySpark developer using Big Data technologies such as the Hadoop and Spark ecosystems.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming with PySpark and the pandas library.
- Experience using accumulator variables, broadcast variables, and RDD caching with Spark Streaming. Expertise in using Spark SQL with various data sources such as JSON and Hive (see the sketch after this list).
- Expertise in transferring data from RDBMS to HDFS and Hive tables using Sqoop and Spark. Experience in creating tables, partitioning, bucketing, loading, and aggregating data using Hive.
- Migrated code from Hive to Oracle using Spark SQL in Python (PySpark), DataFrames, and RDDs. Experience in data processing, such as collecting, aggregating, and moving data from various sources, using Spark (PySpark) and pandas. Experience with NoSQL column-oriented databases and their integration with Hadoop clusters.
- Hands-on, strong development skills in SQL, UNIX shell scripting, Linux, Oracle, PL/SQL, Teradata, SQL Server, Perl, and Python scripting.
- Tools and methodologies include Erwin, PowerDesigner, ER/Studio, and SPARX Systems Enterprise Architect for data modeling, covering ODS, conceptual, logical, and physical data models.
- Prepared data dictionaries, data standards, data lineage, and metadata to capture information that aids in implementing models.
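The Spark SQL and broadcast-variable work above can be illustrated with a minimal PySpark sketch; the file path, column names, and table name are hypothetical, not taken from any actual project:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col

    spark = (SparkSession.builder
             .appName("json-to-hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Broadcast a small lookup so every executor holds one read-only copy.
    region_codes = spark.sparkContext.broadcast({"NY": "Northeast", "NJ": "Northeast"})
    to_region = udf(lambda s: region_codes.value.get(s, "Other"))

    trades = spark.read.json("/data/landing/trades.json")  # hypothetical path
    trades = trades.withColumn("region", to_region(col("state"))).cache()  # cached for reuse

    trades.createOrReplaceTempView("trades")
    (spark.sql("SELECT region, COUNT(*) AS cnt FROM trades GROUP BY region")
          .write.mode("overwrite")
          .saveAsTable("analytics.trade_summary"))  # Hive table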
TECHNICAL SKILLS:
Data Modeling Tools: PowerDesigner 16.5, ER/Studio, Erwin, SPARX Enterprise Architect, MS Visio.
Databases & Tools: Oracle 12c, Oracle 11g, SQL Server, DB2, Teradata, TOAD, PL/SQL Developer, ServiceNow, Spark, Apache Hadoop 2.2, Sqoop, Hive
Programming Languages: SQL, PL/SQL, NoSQL, Python 3.x, Perl, PySpark, UNIX shell scripting, Linux
Version Control: VSS, SVN, Control-M, GitHub
PROFESSIONAL EXPERIENCE:
Confidential, Summit, NJ
Spark-Python Developer and Data Modeler
Responsibilities:
- Developed Spark programs using the Python API to compare the performance of Spark against Hive and Oracle.
- Implemented Spark using PySpark libraries for faster testing and processing of data.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and bucketing.
- Used PySpark SQL to load JSON data, create schema RDDs and DataFrames, and load them into Hive tables, handling structured data with Spark SQL (see the sketch after this list).
- Imported required tables from RDBMS to HDFS using Sqoop and used PySpark RDDs for real-time streaming of data into HBase.
- Analyzed existing SQL scripts and designed PySpark implementations.
- Implemented PySpark jobs using DataFrames and temporary-view SQL for faster data processing.
- Performed business area analysis and logical and physical data modeling for Data Warehouse / Data Mart applications, as well as enhancements and new development of operational applications.
- Developed technical metadata and a business glossary for all wholesale LOB business systems by partnering with IT and business systems teams.
- Cleaned data in Oracle and loaded it into a new table, which was moved into HDFS using Sqoop.
- Worked with Data Stewards to establish metadata registry responsibilities.
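A minimal sketch of the dynamic-partitioning pattern described in this list (JSON loaded through Spark SQL into a partitioned Hive external table); the database, path, and column names are illustrative assumptions:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-dynamic-partition-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Dynamic-partition inserts require these Hive settings.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS stage.orders (
            order_id BIGINT,
            amount   DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION '/warehouse/external/orders'
    """)

    raw = spark.read.json("/data/landing/orders.json")  # hypothetical source
    raw.createOrReplaceTempView("orders_raw")

    # The partition column must come last in the SELECT for dynamic partitioning.
    spark.sql("""
        INSERT OVERWRITE TABLE stage.orders PARTITION (load_date)
        SELECT order_id, amount, load_date FROM orders_raw
    """)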
Tools: Oracle 12c, Teradata, TOAD, PowerDesigner, ER/Studio, SPARX Enterprise Architect, PySpark, Spark SQL, Hive, Sqoop.
Confidential, New York, NY
Senior PL-SQL developer and Data Architect
Responsibilities:
- Created new database objects such as tables, procedures, functions, triggers, and views using Oracle PL/SQL.
- Resolved database performance issues by tuning SQL queries and stored procedures using SQL Profiler and Oracle execution plan tables.
- Involved in writing the company's metadata standards and practices, including naming standards, modeling guidelines, and Data Warehouse strategy. Identified imbalances between capabilities and supporting applications, resulting in revisions to corporate data models and reporting capabilities.
- Analyzed data across multiple sources to design and document logical and physical data models and to maintain the Enterprise Data Warehouse data model. Gathered and defined data requirements based on specific application business requirements.
- Prepared normalized and dimensional models. Defined conceptual, logical, and physical data models.
- Analyzed crucial areas of the existing and new rating-surveillance processes and designed a data model to maintain and support all surveillance processes, both new and existing.
- Communicated all design requirements to implementation partners, including the application, database, and ETL teams. Responsible for Data Warehouse architecture, data design, and implementation.
- Responsible for change control and release of enterprise data models for Data Warehouse subject-area Data Marts across all rating applications, along with logical and physical dimensional data warehouse design using CA Erwin, Quest TOAD, Oracle RDBMS, and the Ralph Kimball methodology.
Tools: Oracle 12c, PL/SQL Developer, TOAD, Oracle SQL Developer, Erwin, ER/Studio, Perl.
Confidential, New York, NY
Spark-Python Developer and Data Modeler-Analyst
Responsibilities:
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Used Hive for transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used PySpark to stream log data from servers.
- Primary business focus was trade surveillance for different IB products, pricing and valuation, and compliance and Control Room applications, along with complete data security and sensitivity handling for Control Room data.
- Performed position-reporting data analysis for reference security master data alignment and 13F holdings filings.
- Developed Spark/MapReduce jobs to parse JSON files for Oracle data (see the sketch after this list).
- Performed data analysis with Python scripting for AML alert data files, and implemented several Python processes to store AML alert data in the database.
- Performed data analysis for the Watch List/Restricted List of the existing Control Room.
- Mapped source transaction data for different asset classes in order to utilize this data in Actimize trade surveillance modules.
- Served as Data Analyst/Data Modeler performing business area analysis and logical and physical data modeling for Data Warehouse / Data Mart applications, as well as enhancements and new development of operational applications. Used the Ralph Kimball methodology for Data Warehouse/Data Mart designs.
- Used Spark RDDs and Python for processing and transforming data, with integration to popular NoSQL stores and Oracle for huge data volumes.
- Performed ER data modeling and logical database design; managed metadata and data taxonomy.
- Ensured the integrity of backend designs and reporting data marts. Initiated data design and reviews of high-level design requirements.
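A minimal sketch of the RDD-based JSON parsing and aggregation referenced in this list; the path and field names are hypothetical:

    import json
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-json-parse-sketch").getOrCreate()
    sc = spark.sparkContext

    # Parse raw JSON lines with the RDD API, then aggregate alerts by product.
    lines = sc.textFile("/data/landing/alerts/*.json")  # hypothetical path
    records = lines.map(json.loads)

    counts = (records
              .map(lambda r: (r.get("product", "UNKNOWN"), 1))
              .reduceByKey(lambda a, b: a + b))

    for product, n in counts.collect():
        print(product, n)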
Tools: Oracle 10g/11g, PL/SQL Developer, TOAD, Oracle SQL Developer, PowerDesigner 16.5, Apache Hadoop 2.3, Sqoop, Spark, PySpark, Python.
Dodd-Frank CAPP (Capital Adequacy Planning and Programming)
Confidential, Mount Laurel, NJ
Lead Data Modeler and Data Warehouse Developer
Responsibilities:
- Worked with solution architects and business analysts to define the implementation design and coding of assigned modules with the highest quality (bug-free), using the Teradata FSLDM.
- Involved in performance tuning of code using execution plans and SQL Profiler.
- Oracle database implementation and ETL process development.
- Involved in migration of several modules from IBM DB2 to Oracle; implemented Python scripts to validate source-system data (see the sketch after this list).
- Participated with key management resources in strategic analysis and planning of requirements for Data Warehouse/Data Mart reporting and data mining solutions.
- Managed the newly built Enterprise Data Warehouse, Analytics Data Mart, and Customer Data Platform.
- Data Quality Management and Data Architecture standardization.
- Managed the metadata for subject-area models for both operational and Data Warehouse/Data Mart applications.
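A minimal sketch of the kind of source-system validation script mentioned above, comparing row counts between the DB2 source and the Oracle target; the driver choice, connection strings, and table names are assumptions, not project specifics:

    import cx_Oracle      # assumed Oracle driver
    import ibm_db_dbi     # assumed DB2 driver

    def count_rows(conn, table):
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")  # trusted table names only
        return cur.fetchone()[0]

    src = ibm_db_dbi.connect("DATABASE=legacy;HOSTNAME=db2host;PORT=50000;"
                             "PROTOCOL=TCPIP;UID=app;PWD=secret")
    tgt = cx_Oracle.connect("app/secret@orahost/ORCLPDB1")

    for table in ["CUSTOMER", "ACCOUNT", "ACCT_TXN"]:  # hypothetical tables
        s, t = count_rows(src, table), count_rows(tgt, table)
        print(f"{table}: source={s} target={t} {'OK' if s == t else 'MISMATCH'}")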
Tools: Oracle 10g/11g, IBM DB2, PL/SQL Developer, TOAD, Oracle SQL Developer, PowerDesigner 16.5, ER/Studio, Erwin, Python.
Confidential, NY
Spark-Python Developer and Data Modeler
Responsibilities:
- Loaded data from Oracle into HDFS using Sqoop batch jobs for post-trade transactions across different reporting applications.
- Developed Spark programs in Python to load data from Oracle into HDFS and Hive external tables.
- Loaded JSON and XML file data using Spark SQL and created schema RDDs to load the same data into Hive tables.
- Implemented performance-effective solutions for long-running SQL processes using PySpark DataFrame processing.
- Implemented CSV file load processes into Oracle with PySpark and pandas (see the sketch after this list).
- Performed logical, dimensional, and physical data modeling for various projects; worked with application teams to analyze data requirements, review data models, and address functional and nonfunctional requirements (performance, geographical separation, auditing, archiving).
- As a core member of the architecture team, analyzed requirements for programs such as Balance Sheet of Funding and Liquidity Attribution, GBB, and MBS; conceptualized possible solutions and prepared client presentations.
- Core member of the solution team responsible for proposing the architecture and high-level design of the Investor Tax platform for multiple geographies (US, UK, Singapore, and India).
- Implemented FpML data processing for FX products.
- Led the data model team on star-schema Data Marts for Volcker reporting, including data model setup and implementation.
- Defined best practices for data modeling and database development.
- Built and tuned complex, large data loads from various sources into the ODS (Operational Data Store), Financial Star Schema, and Campaign Analysis Mart ETL (Informatica PowerCenter 1.7 / Oracle analytical SQL).
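A minimal sketch of the pandas-assisted CSV load into Oracle noted above; the feed path, staging table, JDBC URL, and credentials are placeholders:

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-oracle-sketch").getOrCreate()

    # Read and clean the feed with pandas, then hand it to Spark for the JDBC write.
    pdf = pd.read_csv("/data/feeds/positions.csv")  # hypothetical feed
    pdf = pdf.dropna(subset=["trade_id"])           # simple cleanup step

    sdf = spark.createDataFrame(pdf)
    (sdf.write.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//orahost:1521/ORCLPDB1")
        .option("dbtable", "STG_POSITIONS")
        .option("user", "app")
        .option("password", "secret")
        .option("driver", "oracle.jdbc.OracleDriver")
        .mode("append")
        .save())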
Tools: Oracle 10g/11g, PL/SQL Developer, TOAD, Oracle SQL Developer, Oracle Data Modeler, Informatica, Erwin, Apache Hadoop 2.2, Sqoop, Spark, PySpark, Python.
Confidential, Boston
Senior Database Developer
Responsibilities:
- Assessed, troubleshot, and analyzed performance issues during the Oracle 9i to 11g migration and nightly batch processes, and advised on solutions.
- Performed SQL tuning and 10053 trace analysis; refactored PL/SQL to improve performance from hours to minutes; optimized distributed processing; and reduced database resource usage (logical I/Os, temporary tablespace usage, CPU consumption, latch contention). Played a crucial role in the go-live of the 11g upgrade project.
- Designed database objects including tables, indexes, views, materialized views, sequences, and referential integrity constraints for a reporting data warehouse. Developed and maintained database programs including packages, procedures, functions, and triggers.
Tools: Oracle 10g/11g, PL/SQL Developer, TOAD, Oracle SQL Developer, Erwin.