Senior Big Data Engineer/Project Lead Resume
New Orleans, LA
SUMMARY:
- Seeking a position as a Sr. Big Data Architect with an opportunity to architect and design Big Data solutions on Amazon AWS. Worked on the architecture and development of Big Data solutions using Splunk, Elasticsearch, Kibana, Logstash, InfluxDB, Grafana, Apache NiFi, Hadoop, Hive, Tableau, and MongoDB. Knowledgeable in using Kafka, Spark, and AWS components in architecting solutions.
- Possess strong analytical and excellent communication skills. Quick learner and good team player, with a strong capability for leading teams. Detail-oriented, with excellent problem-solving capabilities. Highly dedicated, with a strong customer-support focus.
- Operating Systems: iOS, Linux, Windows. Architecture: distributed computing and storage (MapReduce/HDFS), business rule/workflow design, design patterns (Repository, GoF, DI/IoC, MVC, Observer, SOA), object-oriented programming
- Technologies: PrestoDB, Drill, Druid, PipelineDB, Redis, Elasticsearch, Spark, Hive, HDFS, MapReduce, .NET Core, Ubuntu .04, SaltStack, AWS EC2, Docker, RethinkDB
- Programming: C# (ASP.NET/Core), Python (IPython, Django), SQL (MS SQL Server, PostgreSQL), Java (MapReduce), Pig Latin (Pig), LINQ, HTML, CSS, JavaScript (D3, jQuery).
- Tools: Apache Kafka, Amazon EMR, CDH4.2, Hadoop, Oozie, Flume, Sqoop, Hue, IntelliJ, Eclipse, GIT, BitBucket, Source Tree, JIRA, RAD7, WAS6, IBM Heap Analyzer, JMeter, Visio, Rational Rose, Clear Case, Clear Quest, Synergy, SVN.
- Languages: Kafka Connect, Kafka Streams, Spark Streaming, Spark ETL, Spark SQL, Java MapReduce, Pig, Hive, Impala, Hue, UML Modeling, Use Case Analysis, Design Patterns (Core & J2EE), OOAD, Java 8, J2EE (JSP, Servlets, EJB), Web Services, Ant, Python.
- Frameworks: Confluent Platform, Stratio Big Data, Apache Spark, Amazon AWS, DataStax Enterprise, Hadoop (MapReduce/HDFS), Spring 3.0, Hibernate, Struts, Kerberos.
- Data Types: CSV, JSON, Avro & Parquet.
- AWS Services: API Gateway, Lambda, EMR, Kinesis, IAM, EC2, S3, EBS, Data Pipeline, VPC, Glacier & Redshift.
- Databases: Cassandra, MongoDB, SQL Server, MySQL, Oracle 9i/10g.
- Data Analytics: Tableau, Mahout, Revolution R, Talend
- Domain: Medical, Lifestyle, Banking, Brokerage and Hospitality.
AREAS OF EXPERTISE:
Cloud programming: Amazon AWS, Java 1.8, EMR, S3, EC2
Enterprise Architecture frameworks: TOGAF, Gartner, Oracle SOA
Big Data Analytics: Alteryx, Pentaho, Jaspersoft, Tableau, Microstrategy, SAP Business Objects, and custom reporting
Big Data technologies: Hadoop Hortonworks HDP 2.3, Teradata UDA, Amazon AWS; HDFS, MapReduce, Hive, MongoDB, HBase, Spark, Storm
Operating systems: UNIX, Windows, and Linux
Programming languages: JAVA, C, C++, SQL and JavaScript
Messaging systems: SOA, JMS, ESB, Web Services, Kafka, RabbitMQ, Tibco BW 5.6, EMS, Oracle AQ, JBOSS SOA
Graphic User Interface: Web 2.0
Web Technologies: J2EE, EJB, JSP, Servlets, SOAP, REST, JMS, Apache Tomcat 1.7, Spring Framework, open source, application servers (JBoss, WebSphere)
Communication and Management protocols: TCP/IP, SNMPv2, CMIP, HTTP, and XML
Client-Server model: CORBA, BSD Sockets, RMI, Web Services
Database: Oracle 12c, MS SQL Server, MySQL, Hadoop, MongoDB, Hibernate
Object-oriented technology: OMT, UML, design patterns
PROFESSIONAL SUMMARY:
Senior Big Data Engineer/Project Lead
Confidential, New Orleans, LA
Responsibilities:
- Provide professional services designing Big Data solutions for clients with Cloudera Hadoop and Hadoop ecosystem technologies, including Hive, HBase, Storm, and Spark, and design ETL processes for clients. Cloud experience with AWS, Cloud Foundry, and Azure. Design, configure, and deploy solutions to the cloud using Hortonworks on Azure and Apache Kafka, with cloud storage on S3 and Azure Blob Storage.
- Implemented the VPC architecture on Amazon AWS with the infrastructure teams to deploy instances in the Dev and Prod environments. This architecture ensured all the security controls were in place to meet HIPAA requirements for protecting PII and PHI member information (a provisioning sketch follows this list).
- Worked on integration of various datasets into both the data warehouse and the Big Data platform, bringing in data from DOTCOM (online data), CHAMP (meeting data), SMV (MDM data), WELLO (coaching data), Teletech (chat & call data), Exact Target (mail data), Reflexis (workforce data), Click Tools (satisfaction data), etc.
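A minimal sketch of the kind of VPC provisioning described above, using boto3; the region, CIDR blocks, tags, and security-group rules are hypothetical illustrations, not the actual deployment.

```python
# Hypothetical sketch: provision a VPC with a private subnet and a restrictive
# security group for HIPAA-scoped workloads. All names/CIDRs are illustrative only.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the VPC and tag it for the target environment (Dev/Prod).
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]
ec2.create_tags(Resources=[vpc_id],
                Tags=[{"Key": "Name", "Value": "prod-phi-vpc"},
                      {"Key": "Environment", "Value": "Prod"}])

# Private subnet: instances handling PII/PHI get no public IPs.
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")

# Restrictive security group: allow HTTPS only from inside the VPC.
sg = ec2.create_security_group(GroupName="phi-app-sg",
                               Description="HTTPS only, VPC-internal",
                               VpcId=vpc_id)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                    "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}])
```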
Environment: Informatica 9.x, OS/390 Enterprise Edition and Server Edition, DB2, Mainframe, Shell Scripts, Oracle, Greenplum 4.2.3, PowerCenter, Metadata Manager, Jaspersoft Reporting, Proactive Monitoring, Administrator, Web Services, Messaging Queues, PowerExchange, Hadoop, Vertica Analytics Server, MySQL, Spark, Kafka, Cassandra, Oracle 12c, Java, MS BI stack, PostgreSQL
Sr. Big Data Architect
Confidential, New Orleans, LA
Responsibilities:
- Architect Big Data solutions using Hadoop, HBase, Hive, and other related Hadoop technologies. Create data pulls from MySQL, Salesforce.com, and websites that provide weather, financial, government economic indicator, sporting event, holiday, and gender information. Design the HBase data store format, Hadoop job processing using Hadoop YARN MapReduce, and data import into HBase. Create custom Hive SerDes and custom Hadoop RecordReaders/FileInputFormats. Make data available through services and infrastructure using Domain-Driven Design and microservices patterns. Evaluate analysis tools for data scientists. Provide the build system and repositories for all Big Data projects.
- Designing and implementing the core data pipeline using Spark Streaming on EMR. This is a near-real-time streaming application that enables multiple teams to ingest journal data via consumers registered to the data pipeline. Each stream of data flows via RabbitMQ, which is integrated with the pipeline and fans out the data to multiple heterogeneous systems registered via their consumers (a consumer sketch follows this list). Currently the activity data is aggregated to 20 steps/day/user, which translates into 400 records/sec of journal data (tens of TB/yr); next year the intent is to ingest the raw un-aggregated data (10,000 steps/day/user), which translates into 150K msgs/sec (hundreds of TB/yr).
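A minimal sketch of one downstream consumer registering to the RabbitMQ fan-out described above, using pika; the host, exchange, queue, and record fields are hypothetical.

```python
# Hypothetical consumer bound to the journal-data fanout exchange. Each downstream
# system would run its own consumer like this one on its own queue.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq.internal"))
channel = connection.channel()

# Fan-out exchange: every bound queue receives a copy of each journal record.
channel.exchange_declare(exchange="journal-data", exchange_type="fanout", durable=True)
result = channel.queue_declare(queue="activity-aggregator", durable=True)
channel.queue_bind(exchange="journal-data", queue=result.method.queue)

def handle_record(ch, method, properties, body):
    record = json.loads(body)  # e.g. {"user_id": ..., "steps": ..., "ts": ...} (illustrative)
    # Downstream processing (aggregation, persistence, etc.) would go here.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="activity-aggregator", on_message_callback=handle_record)
channel.start_consuming()
```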
Confidential, New York City, NY
Senior Big Data Architect
Responsibilities:
- Worked on architecting, designing, and implementing the GADS (Global Analytic Datastore) on Amazon AWS. This is a Big Data platform implemented using a hybrid Big Data architecture to hold 5-10 years of historical data from various sources (100s of terabytes), providing a customer- and product-centric view of the data. This enables analysts to visualize customer behavior and the customer journey across products, customer retention, customer interactions, etc., and provides the business intelligence needed to perform predictive and statistical analysis using Big Data technologies.
- The GADS platform was designed following a hybrid architecture on Amazon cloud technologies, with structured data hosted on Redshift with the ETL pipeline built in Talend, and semi-structured/unstructured data hosted in S3 buckets using transient EMR clusters, Data Pipeline, Oozie, Hive, and Impala, later migrated to Spark and Spark SQL (a Spark SQL sketch follows this list). The idea was to centralize the data in the cloud for further analytics using Tableau and Revolution R.
- Implemented various Tableau visualizations to identify customer interactions, retention, bookings, engagements, etc. across various dimensions.
- Architected and implemented the Tableau Server architecture to distribute the Tableau dashboards across the organization.
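A minimal PySpark SQL sketch of the S3-hosted side of a platform like GADS after the Spark migration; the bucket, paths, table, and column names are hypothetical.

```python
# Hypothetical sketch: query semi-structured data landed in S3 from a transient EMR
# cluster and write a customer-centric rollup back to S3 (e.g. for a Tableau extract).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gads-analytics").getOrCreate()

# Read partitioned Parquet data from S3 and expose it to SQL.
interactions = spark.read.parquet("s3://example-gads-bucket/interactions/")
interactions.createOrReplaceTempView("interactions")

# Aggregate interactions by customer and product.
summary = spark.sql("""
    SELECT customer_id, product, COUNT(*) AS interactions,
           MAX(event_ts) AS last_seen
    FROM interactions
    GROUP BY customer_id, product
""")
summary.write.mode("overwrite").parquet("s3://example-gads-bucket/marts/interaction_summary/")
```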
Environment: Informatica 9.x, OS/390 Enterprise Edition and Server Edition, DB2, Mainframe, Shell Scripts, Oracle, Greenplum 4.2.3, PowerCenter, Metadata Manager, Jaspersoft Reporting, Proactive Monitoring, Administrator, Web Services, Messaging Queues, PowerExchange, Hadoop, Vertica Analytics Server, MySQL, Spark, Kafka, Cassandra, Oracle 12c, Java, MS BI stack, PostgreSQL
Confidential, New York
Big Data Architect
Responsibilities:
- Working for Fraud Technologies, architecting, designing, and implementing a Big Data solution for fraud detection and analytics. This product, called HULC, is intended to hold 13 months of historical data from various sources (100s of terabytes), providing a consolidated view of a customer's products across the bank and giving business analysts the business intelligence to perform analytics using Big Data technologies.
- The other aspect of this product, called ELECTRO, performs ETL transformation on the raw data before it is processed for scoring and alert detection in the bank.
- Responsible for designing the Cassandra data models for the Venom, DFP & Flash projects and integrating them into the application design. The Venom data model holds the monetary & non-monetary transactions, DFP holds the online login transactions, and Flash holds the alerts generated by HULC (a CQL sketch follows this list).
- Responsible for architecting the solution, defining the integration points with the fraud-scoring engine, capacity planning, deciding key technologies, and designing and implementing the solutions.
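A minimal sketch of the kind of Cassandra table that could sit behind a transaction store like Venom, using the DataStax Python driver; the keyspace, table, columns, and replication settings are hypothetical and not the actual data model.

```python
# Hypothetical sketch of a customer-partitioned transaction table. Keyspace, table,
# and column names are illustrative only.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1.internal"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS fraud
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# Partition by customer, cluster by transaction time descending so the most recent
# monetary/non-monetary activity for a customer is read first.
session.execute("""
    CREATE TABLE IF NOT EXISTS fraud.transactions (
        customer_id text,
        txn_ts      timestamp,
        txn_id      uuid,
        txn_type    text,      -- monetary / non-monetary
        amount      decimal,
        channel     text,
        PRIMARY KEY ((customer_id), txn_ts, txn_id)
    ) WITH CLUSTERING ORDER BY (txn_ts DESC, txn_id ASC)
""")
```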
Environment: Informatica 9.x, OS/390 Enterprise Edition and Server Edition, DB2, Mainframe, Shell Scripts, Oracle, Greenplum 4.2.3, PowerCenter, Metadata Manager, Jaspersoft Reporting, Proactive Monitoring, Administrator, Web Services, Messaging Queues, PowerExchange, Hadoop, Vertica Analytics Server, MySQL, Spark, Kafka, Cassandra, Oracle 12c, Java, MS BI stack, PostgreSQL
Confidential, New York
Solutions Architect
Responsibilities:
- SME on MongoDB for the Big Data initiatives, responsible for architecting and designing big data scalable solutions for teams across the organization.
- Being a part of the architecture group responsible for setting up Standards and Best Practices, building POCs, reviewing program level initiatives, vendor management, etc.
- Responsible for introducing Big Data tools and technologies into the bank, presenting and implementing POCs for Tableau, Mahout, Revolution R, Impala, Pentaho, etc. These projects were implemented using Big Data technologies such as Cloudera Hadoop CDH4.2, Java MapReduce, Pig, Hive, Oozie, Flume, Cassandra, Sqoop, and Solr.
Environment: Informatica 9.x, OS/390 Enterprise Edition and Server Edition, DB2, Mainframe, Shell Scripts, Oracle, Greenplum 4.2.3, PowerCenter
Confidential, New York
Technical Architect
Responsibilities:
- Participating in the Distributed Computing Initiative using Hadoop, Hive & Pig implementations. The initiative was to build a data fabric platform within the organization to enable parallel computation and analysis of large files emerging from the trading desks.
- As part of the emerging technology initiative, working on setting up a Hadoop distributed cluster on the Amazon AWS cloud and building a POC that implements MapReduce jobs in Java and monitors them through the web UI in fully distributed mode. Also implementing Pig Latin data-processing scripts for parallel processing (a streaming-job sketch follows this list).
- Responsible for driving the cloud computing practices at Confidential. Executed a comparative study of Amazon, Azure & MS Private Cloud, building and deploying a Java & .NET application and exploring cloud features such as elastic computing, cloud storage services, identity & access management, load balancing & auto scaling, etc.
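The POC above used Java MapReduce; as an illustration of the same map/reduce pattern in Python, here is a minimal Hadoop Streaming sketch. The file layout (pipe-delimited trade records), field positions, and script names are hypothetical.

```python
# mapper.py -- hypothetical Hadoop Streaming mapper: emit (symbol, notional) pairs
# from pipe-delimited trade-desk files. Field positions are illustrative only.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) > 5:
        symbol, notional = fields[1], fields[5]
        print(f"{symbol}\t{notional}")
```

```python
# reducer.py -- hypothetical reducer: sum notional per symbol (input arrives key-sorted).
import sys

current_key, total = None, 0.0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{total}")
        current_key, total = key, 0.0
    total += float(value)
if current_key is not None:
    print(f"{current_key}\t{total}")
```

Such scripts would be submitted with something like `hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs-in> -output <hdfs-out>`.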
Environment: Informatica 9.x, OS/390 Enterprise Edition and Server Edition, DB2, Mainframe, Shell Scripts, Oracle, Greenplum 4.2.3, PowerCenter
Confidential, New Jersey
Project Lead
Responsibilities:
- Responsible for design/development of the Customer Activation System for onboarding clients/users, enabling them to use the Confidential group of products and services. This admin product goes beyond client & user onboarding with services such as Admin Agent Management, Service Management, Contacts & Reports.
- The application is developed with the UI in .NET, interacting with application web services developed in Java, and finally integrated with the end systems/applications using the Provisioning product.
- The application has been developed with technologies such as .NET, Web Services, Spring, Hibernate, XML, and JMS, using tools and products such as RAD7, WebSphere, TIBCO, TFS, and Oracle 10g.
Confidential
Big Data Product Manager / Architect, Santa Monica, CA
Responsibilities:
- Architected and implemented a new log-processing pipeline leveraging the Lambda Architecture for real-time aggregation and deep multi-dimensional reporting, handling over 200TB of log data and disparate telemetry per day (a speed-layer sketch follows this list). Led efforts to develop a full-stack advanced data exploration and alerting platform.
- Evaluated various Big Data technologies and created proof of concepts to determine best fit.
- Developed various customer-facing and internal-use modules for the ASP.NET/C#/.NET Core portals department.
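A minimal sketch of what the speed layer of such a Lambda Architecture pipeline could look like, assuming Kafka and Spark Structured Streaming (neither is named above); the topic, schema, window size, and sink are hypothetical.

```python
# Hypothetical speed-layer sketch: windowed counts of log events per dimension, to be
# merged downstream with batch views. All names and the console sink are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("log-speed-layer").getOrCreate()

schema = StructType([
    StructField("event_ts", TimestampType()),
    StructField("title", StringType()),
    StructField("country", StringType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "kafka.internal:9092")
       .option("subscribe", "game-logs")
       .load())

events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

# Real-time multi-dimensional aggregation: events per title/country per minute.
counts = (events
          .withWatermark("event_ts", "5 minutes")
          .groupBy(window(col("event_ts"), "1 minute"), col("title"), col("country"))
          .count())

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```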