Data Engineer/Data Analytics Engineer Resume
Jersey City, NJ
SUMMARY
- 15+ years of experience in architecture, analysis, engineering, modeling, design, and development of diverse application and data management enterprise software across complex distributed computing environments, in areas including Wealth Management, Personal Banking, Commercial Banking, Risk Management, Research, Asset Management, Trading, Prime Brokerage, and Sales and Marketing.
- Detail-oriented, hands-on Data Architect/Engineer, well versed in executing a large number of key corporate projects from inception to deployment.
- Breadth and depth of technical expertise and application development experience in Data Management, Data Warehousing/Business Intelligence, NoSQL, Distributed Computing, Storage, EAI, Workflow and Business Rules Management, ETL/ELT, and Data Visualization.
- Significant proven experience delivering enterprise-grade server-side software in Data Management and DWBI/Data Analytics, as well as user-oriented application solutions, on a number of technology platforms, including J2EE, .NET, Oracle/RAC/Exadata, Greenplum, Teradata, DB2 UDB, Sybase/IQ, MS SQL Server, Informatica PowerCenter, Alpine Data Labs, Dataiku, OBIEE, Business Objects, QlikView, Tableau, Cognos, Hyperion Essbase, etc.
- Strong knowledge of relational and dimensional data modeling techniques. Introduced and enforced data modeling best practices and standards. Experienced in designing and architecting OLAP solutions/reporting frameworks.
- Solid experience implementing Big Data analytics solutions using Hadoop/YARN, Hive, Pig, HBase, Spark, Pivotal Greenplum/DCA, Cassandra, MongoDB, MarkLogic, etc.
- Solid understanding of business processes in the areas of Prime Brokerage, Retail Brokerage/Wealth Management, Investment Banking, Banking (Loans, Mortgages, Risk Management, etc.), Order Management, Trading and Booking, Broker/Dealer Operations, and Financial Accounting Concepts.
- Solid experience in the areas of Market Risk, Credit Risk, and Operational Risk. Understanding of the functioning and back-end processing of cash and derivative instruments such as Futures, Options (vanilla and exotic), Fixed-Rate CMOs, Interest Rate Swaps, CDS, etc. Sound knowledge of trade clearing and settlement processes. Familiarity with securities industry regulations and a number of areas of compliance and regulatory reporting: Order Audit Trail System (OATS), Suspicious Activity Reporting (SAR), Basel II/III, SOX, Dodd-Frank, etc. Passed the NASD Series 7 (General Securities Representative) examination.
- Solid experience using multiple ETL toolsets (Informatica PowerCenter, DataStage, SSIS). Experience with metadata management, data quality (Informatica Data Analyzer, SAS DataFlux), and data profiling.
- Experience in designing reporting solutions, scorecards, and dashboards using Tableau, QlikView, Business Objects, OBIEE, Cognos, and SSRS.
- Sound knowledge of math and statistics. Experience in data mining/data analysis using Alpine Data Labs and Dataiku. Some experience and exposure to coding in R, SAS, MADlib, PL/R, and PivotalR.
- Extensive experience with the Software Development Lifecycle, including analysis, design, and estimation through build, unit test, change management, and source control. Understanding of industry methodologies such as ITIL and Six Sigma. Sound understanding of modern software development practices (Agile, automated testing, continuous integration, specification by example, XP, etc.).
- Sound experience creating modeling and coding standards, and Data Governance policies and procedures.
- Broad working knowledge of technology applications and their impact on the business.
PROFESSIONAL EXPERIENCE
Confidential, Jersey City, NJ
Data Engineer/Data Analytics Engineer
Responsibilities:
- Participated in planning, designing, and defining the roadmap for a high-throughput data pipeline architecture assembling large, complex data sets, including data from vendors such as Moody’s, Intex, Trepp, McDash, eMBS, etc.
- The aforementioned loan, trading, and reference data are used by the data science team to analyze, visualize, and model information owned by the Structured Product Group (SPG) on the enterprise-wide MT Discovery Platform.
- Designed and developed real-time data processing pipelines using Python, Kafka, Spark Streaming, Hive, Pig, Sqoop, SQL, and shell scripting (a representative sketch follows the Environment list below).
- Participated in the design and modeling of NoSQL data stores (HBase), as well as in architecture and storage design for HBase, HDFS, Hive, and Spark file systems.
- The data processed comes in a variety of types and formats (flat/CSV files, JSON, XML, etc.) and is extracted from many kinds of databases/datastores and message queues, as well as from vendor websites.
Environment: Hadoop, CDH, Hive, Pig, Sqoop, Spark, HBase, Python, NumPy, Pandas, Sybase, DB2 UDB, Oracle, Vertica, Informatica PC, Autosys, Git
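Below is a minimal, illustrative sketch of the kind of streaming ingestion described in the pipeline bullet above. It uses PySpark Structured Streaming as a stand-in for the actual Spark Streaming code; the broker address, topic, record schema, and HDFS paths are hypothetical placeholders, and the production pipelines involved additional components (Hive, Pig, Sqoop, HBase) not shown here.

```python
# Illustrative sketch only -- broker, topic, schema, and paths are hypothetical.
# Requires the Spark-Kafka connector package (spark-sql-kafka) on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("spg-loan-feed-ingest").getOrCreate()

# Hypothetical schema for a vendor loan-level feed arriving as JSON messages on Kafka.
schema = StructType([
    StructField("loan_id", StringType()),
    StructField("as_of_date", StringType()),
    StructField("balance", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "vendor.loans")
       .load())

# Parse the JSON payload into typed columns.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("r"))
             .select("r.*"))

# Land the stream as Parquet on HDFS so a Hive external table can expose it to analysts.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/spg/loans")
         .option("checkpointLocation", "hdfs:///checkpoints/spg/loans")
         .start())
query.awaitTermination()
```

Writing the parsed stream as Parquet under a Hive external table location keeps the landed data immediately queryable without a separate batch load step.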
Confidential, New York, NY
Data Engineer/Data Analytics Developer
Responsibilities:
- Designed and implemented a “Data Analytics” platform to analyze, visualize, and provide reporting for the Interest Rate Risk Management business unit.
- The platform is based on Greenplum DB, Hadoop clusters, Alpine Data Labs, Python, and Perl. The work included data architecture design (logical data models, physical database setup, design of DB objects, stored procedures, etc.), design of the Hadoop data store and its interconnection with the database and distributed data structures, development of data integration routines/APIs (PL/pgSQL, Python, Perl, Informatica, ksh/bash), statistics/mining (Alpine, PL/R, MADlib), and report writing (Tableau).
- Performed conceptual and logical data modeling and relational and dimensional data model design; created and maintained an optimal data pipeline architecture assembling large, complex data sets that meet functional and non-functional business requirements.
- Created structures that implemented and enforced a unified set of models complying with the data modeling standards used by enterprise data modelers; prepared database object deployment DDL and DML scripts for Test, QA, UAT, and Prod environments.
- Supported building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of sources, using relational, NoSQL, and Hadoop technologies, into the bank's Data Warehouse and Sandbox/Analytical environment. This involved creating more than three hundred new ETL/ELT jobs (Informatica, Python, Perl, DB procs) to ingest and process data from Teradata, Sybase, and DB2 LUW databases, a Webservice/MAD server (Bloomberg rates, etc.), and numerous XML, HTML, and flat files (rates, loan processing data, etc.) into the aforementioned analytical (Hadoop/Greenplum) platform (a simplified loader sketch follows the Environment list below).
- Took part in architecture and storage design for HDFS, Hive and Spark file systems.
- Designed and ported to the new platform/data pipeline up to a hundred new jobs/programs (Greenplum, PL/R, Alpine, Python, PL/Python) to replace SAS- and Excel spreadsheet-based models and reports in areas such as Mortgage Yield, Strat Interest Rates, Prepayments, Loans, HFI Prepayment, CME, etc.
- Participated in a company-wide initiative to migrate data storage and processing from the Teradata EDW platform to a Hadoop-based Data Lake, moving/porting about 50% of the data and processing used by the team to that platform. Since the start of 2018, all new data-intensive development has been implemented in the Data Lake, and all new statistical models created by the Data Science team were productionized by us on the same platform using Spark.
- Performed database administration, performance tuning, and development using Greenplum DB.
- During my time in this role, we created an environment with 300+ daily/hourly/real-time feeds, now comprising close to 100TB of data, which is already used to produce more than a hundred statistical models and reports for the Risk Management and Loan Marketing and Processing business units.
Environment: Greenplum 4.3, Teradata 14.x/15.x, Sybase 12/15.x, DB2 10.5, Composite, Informatica PowerCenter 10/9.6, CA Erwin Data Modeler 9.x, Hadoop, CDH 4.x, Hive, Pig, Tableau, Sqoop, Spark, Java, J2EE, Python, NumPy, Pandas, Perl, Autosys, TWS, SAS, R, Alpine Data Labs, Dataiku, PL/R, PL/Python, KSH, Git
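As referenced in the ETL/ELT bullet above, here is a simplified, hypothetical sketch of one flat-file load pattern into the Greenplum side of the platform. The staging and target table names, file path, and connection string are placeholders, and the real jobs (Informatica, Python, Perl, PL/pgSQL) included fuller error handling, auditing, and data-quality checks.

```python
# Illustrative sketch only -- table names, file path, and DSN are hypothetical.
import psycopg2

def load_rates_file(csv_path: str, dsn: str) -> None:
    """Bulk-load a vendor rates flat file into a Greenplum staging table via COPY,
    then merge new rows into the warehouse table in the same transaction."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("TRUNCATE stg.daily_rates")
            with open(csv_path) as f:
                cur.copy_expert(
                    "COPY stg.daily_rates (curve_id, tenor, rate, as_of_date) "
                    "FROM STDIN WITH CSV HEADER", f)
            # Insert-only merge; production jobs also handled updates and rejects.
            cur.execute("""
                INSERT INTO dw.daily_rates (curve_id, tenor, rate, as_of_date)
                SELECT s.curve_id, s.tenor, s.rate, s.as_of_date
                FROM stg.daily_rates s
                LEFT JOIN dw.daily_rates t
                  ON  t.curve_id   = s.curve_id
                  AND t.tenor      = s.tenor
                  AND t.as_of_date = s.as_of_date
                WHERE t.curve_id IS NULL
            """)

if __name__ == "__main__":
    load_rates_file("/data/feeds/bloomberg_rates.csv",
                    "dbname=analytics host=gp-master user=etl_user")
```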
Confidential, New York, NY
Database/Data Architect/Engineer
Responsibilities:
- Participated in a project to replace a number of vendor-owned Order Management and Compliance/Reporting systems (e.g., SunGard Brass & VTC, GL Trade, etc.) with Confidential’s internally built FIX engine and compliance reporting modules.
- Responsible for designing and building out the data layer (database, in-memory data store, HDFS/file storage) to store all pre-processed FIX messages (up to 10MM messages per day / 5TB per year), as well as preparing/modifying/cleaning data (ETL and DQ) to produce all required compliance reports (e.g., OATS, TSR, ORF, etc.) for stock exchanges/regulatory authorities and reporting for WFS management and the WFS compliance department.
- Performed conceptual and logical data modeling and relational and dimensional data model design. Participated in storage design for HBase, HDFS, and Hive underlying file formats.
- Additional responsibilities included data analysis; design of the ETL/ELT and reporting (PL/SQL, Perl, Python, Tableau) layers; building a process to offload historical data to a Hadoop cluster (a simplified offload sketch follows the Environment list below); and coordinating the setup of Hive/Impala instances in order to run reports and query data across all data platforms (Oracle & HDFS).
- As part of my engagement, I closely coordinated data management-related work with Java developers from the FIX engine development team, as well as with all business/compliance groups involved.
- Two phases of the project, concerned with onboarding into the corporate data store the messages sent to and received from the Broadridge FS and Alta trade vendor systems, were successfully completed in 07/2016.
Environment: Oracle 11.2, CA Erwin Data Modeler 9.x, CDH 4.x, Impala, Hive, Tableau, Sqoop, Java, J2EE, XML, Linux, bash, ksh, Python, Perl, Control-M, Scrum, Jira
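A simplified sketch of the historical-offload step mentioned above: a Python wrapper that drives Sqoop to copy aged FIX message data from Oracle into HDFS as Parquet, then refreshes the corresponding Impala table. The connection string, source table, password file, directories, and table name are hypothetical; the actual process was scheduled via Control-M and included reconciliation steps not shown here.

```python
# Illustrative sketch only -- connection details, table, and paths are hypothetical.
import subprocess

def offload_fix_history(cutoff_date: str) -> None:
    """Copy pre-processed FIX messages older than cutoff_date from Oracle into HDFS
    as Parquet, where an external Hive/Impala table (created once, separately) reads it."""
    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//oradb-host:1521/FIXARCH",
        "--username", "offload_user",
        "--password-file", "hdfs:///user/etl/.ora_pwd",
        "--table", "FIX_MSGS",
        "--where", f"TRADE_DATE < DATE '{cutoff_date}'",
        "--target-dir", f"/archive/fix_msgs/cutoff={cutoff_date}",
        "--as-parquetfile",
        "--num-mappers", "8",
    ]
    subprocess.run(sqoop_cmd, check=True)

    # Make the newly landed files visible to Impala queries.
    subprocess.run(["impala-shell", "-q", "REFRESH archive.fix_msgs_hist"], check=True)

if __name__ == "__main__":
    offload_fix_history("2015-01-01")
```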
Confidential
Sr. Solution/DWBI/Data Architect / Sr. Data Analytics Engineer
Responsibilities:
- As a member of a Scrum team, participated in building a large custom-developed Data Analytics/DWH software system to enable the acquisition, cleansing, transformation, and analysis of a broad range of internal and external data for the Confidential (JPMC) Risk Quant team.
- As Data Architect, took a leading role in the design, development, implementation, and support of the JPMC Asset Management data warehousing and reporting platform and financial analysis tools.
- Performed conceptual and logical data modeling as well as relational and dimensional data model design.
- Took an active part in the completion of the Risk Management Department's new DWH development initiatives implemented on the Oracle platform, such as Single Security, Product Management Unification, etc.
- Participated in the successful completion of the 1st phase of the Risk Management database/data warehouse layer redesign. Two Oracle databases (20TB) with up to a hundred tables and PL/SQL packages/procedures were redesigned, recoded, and tuned.
- Participated in the successful completion of the 2nd phase of the same DWH redesign project. It involved migration/porting of a significant number of production Oracle DB objects and PL/SQL procedures to the EMC Greenplum DCA platform, with most Informatica jobs either significantly tuned or replaced by Java/Gateway.
- Developed and tuned the performance of data loads (PL/SQL and Informatica code) related to the Risk IT Management Dashboard. This subject area represented a significant slice of the Risk IT Oracle DWH (10TB of data). Eventually migrated/ported the data objects and code related to this application from Oracle to the Greenplum platform.
- Performed a division-wide migration of Risk Management data and porting of Pig analytical scripts from an old, decommissioned Hadoop cluster (v1.x) to a new one (v2.0) (a simplified migration sketch follows the Environment list below).
- Implemented the Informatica environment migration from version 8.6 to 9.5 BDE for the Risk IT team.
Environment: Oracle 11.2, Toad Data Modeler, Pivotal DCA/Greenplum 4.2, PgAdmin III, CDH 3.5/4.x, Yarn, Impala, Spark, Pig, Sqoop, Java, J2EE, Gateway, JSON, XML, Linux, bash, Informatica 9.x BDE, QlikView, Tableau, Perl, Control-M, Scrum, Jira, Perforce
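A simplified sketch of the cluster-to-cluster data migration referenced above: distcp is driven from the new (2.x) cluster and reads the old (1.x) cluster over webhdfs to avoid RPC-version incompatibilities. The NameNode addresses and directory list are hypothetical, and the real migration also covered Hive metadata and the ported Pig scripts.

```python
# Illustrative sketch only -- NameNode addresses and directory names are hypothetical.
import subprocess

# Read the old 1.x cluster over webhdfs; run distcp from the new 2.x cluster.
OLD_NN = "webhdfs://old-nn:50070"
NEW_NN = "hdfs://new-nn:8020"

RISK_DIRS = [
    "/data/risk/market",
    "/data/risk/credit",
    "/data/risk/reference",
]

def migrate_dir(path: str) -> None:
    """Copy one directory tree to the new cluster, preserving attributes (-p)
    and re-copying only changed files on reruns (-update)."""
    subprocess.run(
        ["hadoop", "distcp", "-p", "-update", f"{OLD_NN}{path}", f"{NEW_NN}{path}"],
        check=True)

def verify_dir(path: str) -> None:
    """Cheap reconciliation: compare directory/file counts and total bytes on both sides."""
    def counts(uri: str) -> list:
        out = subprocess.run(["hdfs", "dfs", "-count", uri],
                             capture_output=True, text=True, check=True).stdout
        # 'hdfs dfs -count' prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
        return out.split()[:3]
    if counts(f"{OLD_NN}{path}") != counts(f"{NEW_NN}{path}"):
        raise RuntimeError(f"Post-copy mismatch for {path}")

if __name__ == "__main__":
    for d in RISK_DIRS:
        migrate_dir(d)
        verify_dir(d)
```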