Big Data Architect Resume
SUMMARY
- 10+ years of experience in the SDLC, with key emphasis on trending Big Data technologies: Spark, Scala, Spark MLlib, Hadoop, and Tableau on Cloudera / Hortonworks / MapR / AWS / GCP.
- Extensive experience in data modeling, data architecture, data warehousing, and business intelligence concepts.
- Architect, design, and develop a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to deliver Big Data capabilities.
- Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, and Spark MLlib.
- Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
- Expertise in architecting Big Data solutions spanning data ingestion and data storage.
- Experienced with NoSQL databases (HBase and MongoDB), database performance tuning, and data modeling.
- Extensive knowledge of architecting Extract, Transform, Load (ETL) environments using Java, Spark, and BigQuery.
- Experience in using PL/SQL to write stored procedures, functions, and triggers.
- Experience integrating various data source definitions, such as SQL Server, Oracle, Sybase, ODBC connectors, and flat files.
- Experience handling huge volumes of data moving in and out of Teradata, IBM DB2, and Big Data platforms.
- Experience developing Big Data projects using open-source tools and technologies such as Hadoop, Hive, HDP, and MapReduce.
- Experience with Amazon AWS services including EC2, DynamoDB, and S3.
- Expertise in data analysis, design, and modeling using tools like Erwin.
- Expertise in Big Data architecture on distributed systems (AWS, GCP, Hortonworks, Cloudera), MongoDB, and NoSQL.
- Hands-on experience with Hadoop / Big Data technologies for storage, querying, processing, and analysis of data.
- Experienced in using various Hadoop ecosystem components such as MapReduce, Hive, Sqoop, and Oozie.
- Experience with Amazon EMR, Spark, Kinesis, S3, ECS, CloudWatch, Lambda, Athena, Zeppelin, and Airflow.
- Experienced in validating data in HDFS and Hive for each data transaction.
- Strong experience working with databases such as Oracle 12c/11g/10g/9i, DB2, SQL Server 2008, and MySQL, and proficiency in writing complex SQL queries.
- Experienced in using database tools such as SQL Navigator and TOAD.
- Experienced with Spark, improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the PySpark sketch after this list).
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
- Knowledge of and experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Good experience in shell programming.
- Worked on Machine Learning models using Python / AWS.
- Trained across the machine learning landscape: statistics, Python, Jupyter, Matplotlib, pyplot, Tableau, Streamlit, and R Shiny.
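
To make the Spark experience above concrete, here is a minimal PySpark sketch of the pair-RDD and DataFrame patterns referenced in the summary; the input path and column layout are hypothetical placeholders, not drawn from any actual engagement.

```python
# Minimal sketch of the pair-RDD vs. DataFrame/Spark SQL patterns.
# The HDFS path and column positions are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pair-rdd-vs-dataframe").getOrCreate()

# Pair-RDD style: map each line to a (key, 1) tuple and reduce by key.
lines = spark.sparkContext.textFile("hdfs:///data/events.csv")  # hypothetical path
counts_rdd = (
    lines.map(lambda line: (line.split(",")[0], 1))
         .reduceByKey(lambda a, b: a + b)
)

# Equivalent DataFrame / Spark SQL form; the Catalyst optimizer plans this
# aggregation instead of shipping Python lambdas to the executors.
df = spark.read.csv("hdfs:///data/events.csv")  # columns default to _c0, _c1, ...
df.createOrReplaceTempView("events")
counts_df = spark.sql("SELECT _c0 AS key, COUNT(*) AS cnt FROM events GROUP BY _c0")

counts_df.show(5)
```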
TECHNICAL SKILLS
J2EE/JEE Application Components: Java 1-8, J2EE 2-4, AJAX, Jersey Framework, Spring Boot/Core, MVC, Servlets, JSP, JavaScript, XML, XSL, HTML, CSS, jQuery
J2EE Containers and Services: IBM WAS 5.1 / 6.0, BEA WebLogic 5.1 / 6, Apache, JNDI and JDBC
Databases: IBM DB2 8, Oracle 7/8/9/10g/11g, MS SQL Server 2000/2005, Sybase, Informix and MySQL
Developer Tools: NetBeans, Eclipse, IntelliJ IDEA, PL/SQL Developer, Maven, JIRA, SharePoint, Autosys, and Oracle Developer
Business Analysis/Reporting/Other Tools: MagicDraw, Visio, Actuate 4.x/5.x/6.x/7.x
Operating Systems: UNIX (Solaris, HP-UX, Red Hat Linux), Windows 3.x / NT / 2000 / XP
Big Data: Hadoop MRv1/MRv2, Hive, HBase, Drill, Solr, Spark, Sqoop, Pig, Scala, Amazon EMR, Amazon Redshift, AWS Athena, AWS SageMaker, AWS Lambda, Apache Airflow, Apache Kafka, Apache Kudu and ZooKeeper
Distributions: Hortonworks, MapR, Cloudera, AWS and GCP
Machine Learning: Spark MLlib/ML (Scala), PySpark, Python, Anaconda, Jupyter, NumPy, scikit-learn, Matplotlib, pandas and Spyder
Languages and Systems (Working Knowledge): Talend Studio, C, C++, Python, R, MongoDB and Tableau
PROFESSIONAL EXPERIENCE
Confidential
Big Data Architect
Responsibilities:
- Worked on the end-to-end workflow of each job, from requirements gathering through development, unit testing, and code review, to creating ESP Scheduler jobs using Airflow in the Test, UAT, and PROD environments.
- Designed and developed BigQuery ingestion jobs to move data from Teradata into BigQuery.
- Applied optimization techniques to fix Spark performance issues when running on the cluster.
- Used global configuration files to drive the memory settings and other resource requirements of individual Spark jobs.
- Developed Airflow DAGs to chain the Spark and query ingestion jobs (a sketch follows this list).
- Implemented a CI/CD pipeline using Jenkins, followed by Airflow job triggers.
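
As an illustration of the DAG chaining and config-driven memory settings described above, here is a hypothetical Airflow sketch; the DAG id, file paths, config keys, and BigQuery table names are invented placeholders, not the production values.

```python
# Hypothetical Airflow DAG chaining a Spark transform with a BigQuery load.
# All ids, paths, and config keys below are illustrative placeholders.
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Global config file drives per-job memory and resource settings.
with open("/etc/etl/spark_jobs.json") as f:  # hypothetical path
    cfg = json.load(f)["transform_job"]

with DAG(
    dag_id="teradata_to_bigquery",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule="@daily",                  # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    transform = SparkSubmitOperator(
        task_id="spark_transform",
        application="/opt/jobs/transform.py",   # hypothetical Spark job
        executor_memory=cfg["executor_memory"],  # e.g. "8g"
        driver_memory=cfg["driver_memory"],      # e.g. "4g"
        conf={"spark.sql.shuffle.partitions": cfg["shuffle_partitions"]},
    )

    ingest = BashOperator(
        task_id="bq_ingest",
        bash_command=(
            "bq load --source_format=PARQUET "
            "my_dataset.my_table gs://my-bucket/out/*.parquet"  # placeholder names
        ),
    )

    transform >> ingest  # run the ingestion only after the Spark job succeeds
```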
Confidential
Big Data Architect
Responsibilities:
- Working on user stories under the Agile development model using JIRA.
- Working on the design and development of multiple Spark jobs in Scala that ingest heavy data volumes (around 2 TB) from Oracle.
- Working on transformation jobs that read Parquet data files from HDFS and write the final output back to HDFS as a CSV file (see the PySpark sketch after this list).
- One copy of the allocated data was stored in Impala tables for BI consumption; the other was used for building ML models.
- Used the crunched-down Parquet files as input to analytics developed with the Apache Spark machine learning (MLlib) packages and PySpark.
- As part of the OCR pipeline, implemented text mining to convert words and phrases in unstructured data into numerical values.
- Used machine learning and statistical modeling techniques to develop and evaluate algorithms that improve performance, quality, data management, and accuracy.
- Developed a K-means clustering approach on SageMaker using Python to create a prediction model of customer cash reserves under SEC Rule 15c3-3(e) from the mined text data stored in AWS S3 (see the SageMaker sketch after this list).
- Built a separate linear regression model on SageMaker using Python, tested first on Beta/Dev data and currently on QA data covering broker-trader firm/customer data, to predict missing values such as allocation codes.
- A few optimization techniques were also applied as part of the machine learning model development.
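
A minimal PySpark sketch of the Parquet-to-CSV transformation pattern described above; the HDFS paths, filter condition, and column names are hypothetical placeholders.

```python
# Illustrative transformation job: read Parquet from HDFS, filter and
# aggregate, and write the final output back to HDFS as a single CSV.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet-to-csv").getOrCreate()

df = spark.read.parquet("hdfs:///data/allocations/")  # hypothetical input

out = (
    df.filter(F.col("status") == "SETTLED")          # hypothetical filter
      .groupBy("account_id")
      .agg(F.sum("amount").alias("total_amount"))
)

# coalesce(1) yields a single CSV part file for downstream consumers.
out.coalesce(1).write.mode("overwrite").option("header", "true") \
   .csv("hdfs:///out/allocations_csv/")
```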
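And a hedged sketch of the SageMaker K-means training flow mentioned above, using the SageMaker Python SDK's built-in KMeans estimator; the role ARN, bucket, feature matrix, and cluster count are placeholders, not the actual model inputs.

```python
# Hypothetical SageMaker K-means training flow; the role ARN, feature
# matrix, and k value below are illustrative placeholders.
import numpy as np
import sagemaker
from sagemaker import KMeans

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical role

# Feature vectors derived from the mined text data (placeholder values).
train_data = np.random.rand(1000, 20).astype("float32")

kmeans = KMeans(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    k=8,  # hypothetical number of customer clusters
    output_path=f"s3://{session.default_bucket()}/kmeans-output",
)

# record_set() uploads the array to S3 in RecordIO-protobuf format.
kmeans.fit(kmeans.record_set(train_data))

# Deploy an endpoint and assign a few records to their nearest clusters.
predictor = kmeans.deploy(initial_instance_count=1, instance_type="ml.m5.large")
for record in predictor.predict(train_data[:5]):
    print(record.label["closest_cluster"])

# Remember to delete the endpoint when finished to avoid idle charges.
predictor.delete_endpoint()
```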