Big Data Architect Resume
SUMMARY
- 10+ years of experience in the SDLC, with key emphasis on trending Big Data technologies: Spark, Scala, Spark MLlib, Hadoop, and Tableau on Cloudera / Hortonworks / MapR / AWS / GCP.
- Extensive experience in data modeling, data architecture, data warehousing, and business intelligence concepts.
- Architect, design, and develop a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to deliver Big Data capabilities.
- Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, and Spark MLlib.
- Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
- Expertise in architecting Big Data solutions spanning data ingestion and data storage.
- Experienced with NoSQL databases (HBase and MongoDB), database performance tuning, and data modeling.
- Extensive knowledge of architecting Extract, Transform, Load (ETL) environments using Java, Spark, and BigQuery.
- Experience in using PL/SQL to write stored procedures, functions, and triggers.
- Experience integrating various data source definitions, such as SQL Server, Oracle, Sybase, ODBC connectors, and flat files.
- Experience handling huge volumes of data moving in and out of Teradata, IBM DB2, and Big Data platforms.
- Experience developing Big Data projects using open-source tools and technologies such as Hadoop, Hive, HDP, and MapReduce.
- Experience with Amazon AWS services including EC2, DynamoDB, and S3.
- Expertise in data analysis, design, and modeling using tools like Erwin.
- Expertise in Big Data architecture on distributed systems (AWS, GCP, Hortonworks, Cloudera), MongoDB, and NoSQL.
- Hands-on experience with Hadoop / Big Data technologies for storage, querying, processing, and analysis of data.
- Experienced in using various Hadoop ecosystem components such as MapReduce, Hive, Sqoop, and Oozie.
- Experience with Amazon EMR, Spark, Kinesis, S3, ECS, CloudWatch, Lambda, Athena, Zeppelin, and Airflow.
- Experienced in validating data in HDFS and Hive for each data transaction.
- Strong experience working with databases such as Oracle 12c/11g/10g/9i, DB2, SQL Server 2008, and MySQL, and proficiency in writing complex SQL queries.
- Experienced in using database tools such as SQL Navigator and TOAD.
- Experienced with Spark, improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the PySpark sketch after this list).
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
- Knowledge of and experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Good experience in shell programming.
- Worked on Machine Learning models using Python / AWS.
- Trained across the machine learning landscape: statistics, Python, Jupyter, Matplotlib, pyplot, Tableau, Streamlit, and R Shiny.
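
To make the Spark experience above concrete, here is a minimal PySpark sketch of the pair-RDD and DataFrame patterns referenced in the summary; the input path and column layout are hypothetical placeholders, not drawn from any actual engagement.

```python
# Minimal sketch of the pair-RDD vs. DataFrame/Spark SQL patterns.
# The HDFS path and column positions are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pair-rdd-vs-dataframe").getOrCreate()

# Pair-RDD style: map each line to a (key, 1) tuple and reduce by key.
lines = spark.sparkContext.textFile("hdfs:///data/events.csv")  # hypothetical path
counts_rdd = (
    lines.map(lambda line: (line.split(",")[0], 1))
         .reduceByKey(lambda a, b: a + b)
)

# Equivalent DataFrame / Spark SQL form; the Catalyst optimizer plans this
# aggregation instead of shipping Python lambdas to the executors.
df = spark.read.csv("hdfs:///data/events.csv")  # columns default to _c0, _c1, ...
df.createOrReplaceTempView("events")
counts_df = spark.sql("SELECT _c0 AS key, COUNT(*) AS cnt FROM events GROUP BY _c0")

counts_df.show(5)
```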
TECHNICAL SKILLS
J2EE/JEE Application Components: Java 1-8, J2EE 2-4, AJAX, Jersey Framework, Spring Boot/Core, MVC, Servlets, JSP, JavaScript, XML, XSL, HTML, CSS, jQuery
J2EE Containers and Services: IBM WAS 5.1 / 6.0, BEA WebLogic 5.1 / 6, Apache, JNDI and JDBC
Databases: IBM DB2 8, Oracle 7/8/9/10g/11g, MS SQL Server 2000/2005, Sybase, Informix and MySQL
Developer Tools: NetBeans, Eclipse, IntelliJ IDEA, PL/SQL Developer, Maven, JIRA, SharePoint, Autosys, and Oracle Developer
Business Analysis/Reporting/Other Tools: MagicDraw, Visio, Actuate 4.x/5.x/6.x/7.x
Operating Systems: UNIX (Solaris, HP-UX, Red Hat Linux), Windows 3.x / NT / 2000 / XP
Big Data: Hadoop MRv1/MRv2, Hive, HBase, Drill, Solr, Spark, Sqoop, Pig, Scala, Amazon EMR, Amazon Redshift, AWS Athena, AWS SageMaker, AWS Lambda, Apache Airflow, Apache Kafka, Apache Kudu and ZooKeeper
Distributions: Hortonworks, MapR, Cloudera, AWS and GCP
Machine Learning: Spark MLlib/ML (Scala), PySpark, Python, Anaconda, Jupyter, NumPy, scikit-learn, Matplotlib, pandas and Spyder
Languages and Systems (Working Knowledge): Talend Studio, C, C++, Python, R, MongoDB and Tableau
PROFESSIONAL EXPERIENCE
Confidential
Big Data Architect
Responsibilities:
- Worked on the end-to-end workflow of each job, from requirements gathering through development, unit testing, and code review, to creating ESP Scheduler jobs using Airflow in the Test, UAT, and PROD environments.
- Designed and developed BigQuery ingestion jobs to move data from Teradata into BigQuery.
- Applied optimization techniques to fix Spark performance issues when running on the cluster.
- Used global configuration files to drive the memory settings and other resource requirements of individual Spark jobs.
- Developed Airflow DAGs to chain the Spark and query ingestion jobs (a sketch follows this list).
- Implemented a CI/CD pipeline using Jenkins, followed by Airflow job triggers.
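
As an illustration of the DAG chaining and config-driven memory settings described above, here is a hypothetical Airflow sketch; the DAG id, file paths, config keys, and BigQuery table names are invented placeholders, not the production values.

```python
# Hypothetical Airflow DAG chaining a Spark transform with a BigQuery load.
# All ids, paths, and config keys below are illustrative placeholders.
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Global config file drives per-job memory and resource settings.
with open("/etc/etl/spark_jobs.json") as f:  # hypothetical path
    cfg = json.load(f)["transform_job"]

with DAG(
    dag_id="teradata_to_bigquery",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule="@daily",                  # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    transform = SparkSubmitOperator(
        task_id="spark_transform",
        application="/opt/jobs/transform.py",   # hypothetical Spark job
        executor_memory=cfg["executor_memory"],  # e.g. "8g"
        driver_memory=cfg["driver_memory"],      # e.g. "4g"
        conf={"spark.sql.shuffle.partitions": cfg["shuffle_partitions"]},
    )

    ingest = BashOperator(
        task_id="bq_ingest",
        bash_command=(
            "bq load --source_format=PARQUET "
            "my_dataset.my_table gs://my-bucket/out/*.parquet"  # placeholder names
        ),
    )

    transform >> ingest  # run the ingestion only after the Spark job succeeds
```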
Confidential
Big Data Architect
Responsibilities:
- Working on user stories under the Agile development model using JIRA.
- Working on the design and development of multiple Spark jobs in Scala that ingest heavy data volumes (around 2 TB) from Oracle.
- Working on transformation jobs that read Parquet data files from HDFS and write the final output back to HDFS as a CSV file (see the PySpark sketch after this list).
- One copy of the allocated data was stored in Impala tables for BI consumption; the other was used for building ML models.
- Used the crunched-down Parquet files as input to analytics developed with the Apache Spark machine learning (MLlib) packages and PySpark.
- As part of the OCR pipeline, implemented text mining to convert words and phrases in unstructured data into numerical values.
- Used machine learning and statistical modeling techniques to develop and evaluate algorithms that improve performance, quality, data management, and accuracy.
- Developed a K-means clustering approach on SageMaker using Python to create a prediction model of customer cash reserves under SEC Rule 15c3-3(e) from the mined text data stored in AWS S3 (see the SageMaker sketch after this list).
- Built a separate linear regression model on SageMaker using Python, tested first on Beta/Dev data and currently on QA data covering broker-trader firm/customer data, to predict missing values such as allocation codes.
- A few optimization techniques were also applied as part of the machine learning model development.
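
A minimal PySpark sketch of the Parquet-to-CSV transformation pattern described above; the HDFS paths, filter condition, and column names are hypothetical placeholders.

```python
# Illustrative transformation job: read Parquet from HDFS, filter and
# aggregate, and write the final output back to HDFS as a single CSV.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet-to-csv").getOrCreate()

df = spark.read.parquet("hdfs:///data/allocations/")  # hypothetical input

out = (
    df.filter(F.col("status") == "SETTLED")          # hypothetical filter
      .groupBy("account_id")
      .agg(F.sum("amount").alias("total_amount"))
)

# coalesce(1) yields a single CSV part file for downstream consumers.
out.coalesce(1).write.mode("overwrite").option("header", "true") \
   .csv("hdfs:///out/allocations_csv/")
```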
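And a hedged sketch of the SageMaker K-means training flow mentioned above, using the SageMaker Python SDK's built-in KMeans estimator; the role ARN, bucket, feature matrix, and cluster count are placeholders, not the actual model inputs.

```python
# Hypothetical SageMaker K-means training flow; the role ARN, feature
# matrix, and k value below are illustrative placeholders.
import numpy as np
import sagemaker
from sagemaker import KMeans

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical role

# Feature vectors derived from the mined text data (placeholder values).
train_data = np.random.rand(1000, 20).astype("float32")

kmeans = KMeans(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    k=8,  # hypothetical number of customer clusters
    output_path=f"s3://{session.default_bucket()}/kmeans-output",
)

# record_set() uploads the array to S3 in RecordIO-protobuf format.
kmeans.fit(kmeans.record_set(train_data))

# Deploy an endpoint and assign a few records to their nearest clusters.
predictor = kmeans.deploy(initial_instance_count=1, instance_type="ml.m5.large")
for record in predictor.predict(train_data[:5]):
    print(record.label["closest_cluster"])

# Remember to delete the endpoint when finished to avoid idle charges.
predictor.delete_endpoint()
```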