Sr Big Data Architect Resume
Irvine, CA
SUMMARY
- Over 12 years of experience in the software development lifecycle: software analysis, design, development, testing, deployment, and maintenance.
- Working as a Big Data Architect for the last 4 years with a strong background in the big data stack, including Spark, Scala, Kafka, Hadoop, HDFS, MapReduce, Hive, Cassandra, Python, Sqoop, and Pig.
- Hands-on experience with Apache Spark and its components (Spark Core and Spark SQL)
- Experienced in converting HiveQL queries into Spark transformations using Spark RDDs and Scala (a brief sketch follows this summary)
- Hands-on experience in in-memory data processing with Apache Spark
- Developed Spark scripts using the Scala shell as per requirements
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle
- Broad understanding of and experience with real-time analytics and batch processing using Apache Spark.
- Hands-on experience in AWS (Amazon Web Services), Cassandra, Kafka, Python, and cloud computing.
- Experience with agile development methodologies such as Scrum, Test-Driven Development, and Continuous Integration
- Ability to translate business requirements into system design
- Experience in importing and exporting data between HDFS and RDBMS/non-RDBMS systems using Sqoop
- Analyzed large data sets by writing Pig scripts and Hive queries.
- Hands-on experience writing Pig Latin scripts and Pig commands
- Experience with front-end technologies like HTML, CSS, and JavaScript
- Experienced in using tools like Eclipse, NetBeans, Git, Tortoise SVN, and TOAD.
- Experience in database development using SQL and PL/SQL on databases like Oracle 9i/11g, MySQL, and SQL Server.
- Effective team player with excellent communication skills and the insight to determine priorities, schedule work, and meet critical timelines.
- Certified in FINRA (Financial Industry Regulatory Authority, Inc)
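A minimal illustrative sketch of the HiveQL-to-Spark conversion noted above; the sales table and its region/amount columns are hypothetical placeholders, not taken from any engagement.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: a HiveQL aggregation rewritten as Spark RDD transformations in Scala.
// Hypothetical HiveQL equivalent: SELECT region, SUM(amount) FROM sales GROUP BY region
object HiveQLToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQLToSparkSketch")
      .enableHiveSupport()          // read Hive tables through the metastore
      .getOrCreate()

    val totalsByRegion = spark.table("sales").rdd
      .map(row => (row.getAs[String]("region"), row.getAs[Double]("amount")))
      .reduceByKey(_ + _)           // equivalent of the GROUP BY / SUM

    totalsByRegion.collect().foreach { case (region, total) =>
      println(s"$region -> $total")
    }

    spark.stop()
  }
}
```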
TECHNICAL SKILLS
Big Data: Apache Spark, Scala, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, PostgreSQL
Databases: Oracle 9i/11g, MySQL, SQL Server 2000/2005
Hadoop Distributions: Cloudera, Hortonworks, AWS
DWH (Reporting): OBIEE 10.1.3.2.0/11g
DWH (ETL): Informatica PowerCenter 9.6.x
Languages: SQL, PL/SQL, Python, Java
UI: HTML, CSS, JavaScript
Defect Tracking Tools: Quality Center, JIRA
Tools: SQL Tools, TOAD
Version Control: Tortoise SVN, GitHub
Operating Systems: Windows ..., Linux/Unix
PROFESSIONAL EXPERIENCE
Confidential, Irvine CA
Sr Big Data Architect
Responsibilities:
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Hands-on experience in Spark, Cassandra, Kafka, Python, and Spark Streaming, creating RDDs and applying transformations and actions.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded the data into HDFS.
- Developed Spark code and Spark SQL queries for faster testing and processing of data (see the sketch below).
- Snapped the cleansed data to the analytics cluster for business reporting.
- Hands-on experience on the AWS platform with S3 and EMR.
- Experience working with different data formats such as flat files, ORC, Avro, and JSON.
- Automated business reports on the data lake using Bash scripts on UNIX and delivered them to business owners.
- Provided design recommendations and thought leadership to sponsors/stakeholders, improving review processes, resolving technical problems, and suggesting solutions.
Environment: Apache Spark, Scala, Spark-Core, Spark-SQL, Python, Hadoop, MapReduce, HDFS, Hive, Pig, MongoDB, Sqoop, Oozie, Kafka, MySQL, Java (JDK 1.7), AWS
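A minimal sketch of the Spark SQL metric computation described in this role; the events table, event_date partition column, and output path are hypothetical, assuming a date-partitioned Hive table.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Sketch: compute reporting metrics over a partitioned Hive table and land them on HDFS.
object ReportingMetricsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReportingMetricsSketch")
      .enableHiveSupport()
      .getOrCreate()

    val dailyCounts = spark.table("events")
      .where(col("event_date") === "2024-01-01")   // predicate on the partition column prunes partitions
      .groupBy("status")
      .agg(count("*").as("event_count"))

    // Write the aggregated metrics back to HDFS for downstream reporting.
    dailyCounts.write.mode("overwrite").parquet("hdfs:///reports/daily_status_counts")

    spark.stop()
  }
}
```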
Confidential, Sunnyvale CA
Senior Big Data Architect
Responsibilities:
- Built patterns according to business requirements to detect market violations and generate alerts using big data technologies (Hive, Tez, and Spark) on AWS
- Worked as a Scrum Master, facilitating team productivity and monitoring project progress by applying Scrum and Kanban on a JIRA board to ensure the quality of deliverables
- Optimized long-running patterns by writing shell scripts and applying Hive optimization settings (e.g., reduced a 20-hour daily pattern to a 7-hour run by resolving data skew in a TB-scale table; the change was adopted company-wide and saved around 50,000 USD per year)
- Migrated on-prem RDBMS (Oracle, Greenplum) code into HiveQL and Spark SQL running on AWS EMR (see the sketch below)
- Participated in a machine learning project, including decision tree modeling and feature engineering
- Responsible for ETL and data warehouse processes to transfer and register data into AWS S3
- Developed Hive UDFs with Java and modified framework code with Python
Environment: Apache Spark, Scala, Spark-Core, Spark-Streaming, Python, Spark-SQL, Hadoop, MapReduce, HDFS, Hive, Kafka, Pig, MongoDB, Sqoop, Oozie, MySQL, Java (JDK 1.7), AWS
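A minimal sketch of running a migrated RDBMS aggregation as Spark SQL on EMR with S3 as the storage layer; the bucket name, paths, and orders schema are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: source data landed in S3 is registered as a view and queried with Spark SQL on EMR.
object MigratedRdbmsQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MigratedRdbmsQuerySketch")
      .getOrCreate()

    spark.read.parquet("s3://example-bucket/staging/orders/")
      .createOrReplaceTempView("orders")

    // The original Oracle/Greenplum aggregation, expressed in Spark SQL.
    val customerTotals = spark.sql(
      """SELECT customer_id, SUM(order_total) AS lifetime_value
        |FROM orders
        |GROUP BY customer_id""".stripMargin)

    customerTotals.write.mode("overwrite").parquet("s3://example-bucket/curated/customer_ltv/")

    spark.stop()
  }
}
```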
Confidential
Senior Apache Spark Consultant
Responsibilities:
- Gathered business requirements for the project by coordinating with business users and data warehousing (front-end) team members.
- Involved in product data ingestion into HDFS using Spark
- Created partitioned tables and bucketed data in Hive to improve performance
- Used Amazon Web Services (AWS): EC2 for compute and S3 for storage.
- Loaded data into MongoDB using Hive-Mongo connector JARs for report generation.
- Developed Spark scripts using the Scala shell as per requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch below).
- Handled importing data from Oracle into HDFS and vice versa using Sqoop.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Migrated various Hive UDFs and queries into Spark SQL for faster responses.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
Environment: Apache Spark, Scala, Spark-Core, Spark-Streaming, Spark-SQL, Hadoop, MapReduce, HDFS, Hive, Pig, Kafka, MongoDB, Sqoop, Oozie, Python, MySQL, Java (JDK 1.7), AWS
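A minimal sketch of a MapReduce-style job (word count) rewritten as Spark RDD transformations in Scala; the HDFS input and output paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the map and reduce phases become flatMap/map and reduceByKey transformations,
// with saveAsTextFile as the action that materializes the result.
object MapReduceToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MapReduceToSparkSketch").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("hdfs:///data/input/")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/output/wordcount")
    spark.stop()
  }
}
```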
Confidential
Big Data Developer
Responsibilities:
- Led the AML Cards North America development and DQ team to successfully implement the compliance project.
- Involved in the project from the POC stage and worked from data staging through to the data mart and reporting; worked in an onsite-offshore environment.
- Fully responsible for creating the data model for storing and processing data and for generating and reporting alerts; this model is being implemented as the standard across all regions as a global solution.
- Involved in discussions with, and guiding, other regional teams on the SCB big data platform and the AML Cards data model and strategy.
- Responsible for technical design and review of the data dictionary (business requirements).
- Responsible for providing technical solutions and workarounds.
- Migrated the required data from the data warehouse and product processors into HDFS using Sqoop, and imported various flat-file formats into HDFS.
- Involved in discussions with source systems on data quality (DQ) issues.
- Implemented partitioning, dynamic partitions, buckets, and custom UDFs in Hive.
- Used Hive for data processing and batch data filtering.
- Supported and monitored MapReduce programs running on the cluster.
- Monitored logs and responded accordingly to any warning or failure conditions.
- Responsible for preserving code and design integrity using SVN and SharePoint.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Pig, HBase, Zookeeper, Oozie, MongoDB, Python, Java, Sqoop
Confidential
Oracle Database Developer
Responsibilities:
- Designed, developed, and maintained an internal interface application allowing one application to share data with another.
- Analyzed 90% of all changes and modifications to the interface application.
- Coordinated development work efforts that spanned multiple applications and developers.
- Developed and maintained data models for internal and external interfaces.
- Worked with other bureaus in the Department of State to implement data-sharing interfaces.
- Attended Configuration Management Process Working Group and Configuration Control Board meetings.
- Performed DDL (CREATE, ALTER, DROP, TRUNCATE and RENAME), DML (INSERT, UPDATE, DELETE and SELECT) and DCL (GRANT and REVOKE) operations where permitted.
- Designed and developed database applications.
- Designed the database structure for an application.
- Estimated storage requirements for an application.
- Specified modifications of the database structures for an application.
- Kept the database administrator informed of required changes.
- Tuned the application during development.
- Established an application's security requirements during development.
- Created Functions, Procedures and Packages as part of the development.
- Assisted the Configuration Management group to design new procedures and processes.
- Led the Interfaces Team with responsibility for maintaining and supporting both internal and external interfaces.
- Responsible for following all processes and procedures in place for the entire Software Development Life Cycle.
- Wrote documents in support of the SDLC phases. Documents include requirements and analysis reports, design documents, and technical documentation.
- Created MS Project schedules for large work efforts.
Environment: Oracle 9i, Informatica 7.1.x, Control-M, TOAD, Linux/Unix