Portfolio Architect Resume
Phoenix, AZ
SUMMARY
- 13 years of IT experience, including 5 years of experience building large-scale distributed data processing systems with in-depth knowledge of Hadoop architecture MR1 & MR2 (YARN), and 7+ years of experience in Java/J2EE application development
- Experience in all phases of the software development life cycle & Agile methodology
- Expertise in implementing, consulting on, and managing Hadoop clusters and ecosystem components like HDFS, MapReduce, Pig, Hive, Flume, Oozie & Zookeeper
- Proficiency in batch processing using Hadoop MapReduce, Pig & Hive; real-time processing using Storm; stream processing of data with Spark Streaming in Scala
- Hands-on experience in writing Pig/Hive scripts and custom UDFs
- Experience in partitioning, bucketing and joins in Hive
- Experience in query optimization and performance tuning with Hive
- Hands-on experience in importing and exporting data to/from RDBMS and HDFS/HBase/Hive through Sqoop, both full refresh and incremental
- Hands-on experience in loading log data from multiple sources into HDFS through Flume agents
- Experience in configuring and implementing Flume components such as Source, Channel and Sink
- Experience in writing Oozie workflows, executing workflows in parallel using Fork, and controlling child workflows through the Coordinator
- Experience in working with ZooKeeper for coordination of Hadoop components like HBase and Kafka
- Experience in NoSQL databases like HBase and MongoDB, and good exposure to Cassandra and GraphX
- Experience in search engines like Elasticsearch, and good exposure to Solr & Impala
- Experience in using SequenceFile, RCFile, Avro and exposure to ORC, Parquet for data serialization
- Good exposure to schedulers like Fair, Capacity and Adaptive to improve performance
- Experience working with various Hadoop distributions like open-source Apache, Cloudera, Hortonworks & MapR
- Experience in administrative tasks such as installing and configuring nodes, commissioning & decommissioning nodes, and backup & recovery of Hadoop nodes in the cluster
- Good exposure to the general-purpose languages Python & Scala
- Programming experience in UNIX shell scripting.
- Strong analytical skills with the ability to quickly understand clients' business needs.
- Experience with Agile practices: daily stand-up meetings, writing user stories, estimating story points, creating tasks, estimating task ETAs, tracking task progress with daily burn-down charts, and completing backlogs
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
TECHNICAL SKILLS
Programming Languages: Java, C, C++, Scala, Python
BigData Technologies: HDFS, MapReduce, YARN, Hive, Hue, Beeswax, Pig, Sqoop, Flume, Oozie, Zookeeper
NoSQL: HBase, MongoDB
RDBMS: MySQL, Oracle, SqlServer, DB2
Data Ingestion Tools: Flume, Sqoop, Kafka
Monitoring Tools: Ganglia, Nagios, Splunk
Visualization Tools: Pentaho, Kibana, Tableau
Realtime Streaming and Processing: Storm, Spark Streaming
Data Mining Tools: R, SPSS, RapidMiner
Operating Systems: Windows 9x/2000/XP/7/8/10, Linux, UNIX, Mac
Development Tools: Eclipse, RSA, RAD
Build and Log Tools: Ant, Maven, Log4j
Version Control: CVS, SVN, GitHub
PROFESSIONAL EXPERIENCE
Confidential, Phoenix, AZ
Portfolio Architect
Responsibilities:
- Developed Spark scripts using Scala shell commands.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark 1.6 for data aggregation (see the sketch after this list)
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcast variables, effective & efficient joins, and transformations during the ingestion process itself
- Used the Spark DataFrame API and Scala case classes to process GBs of data
- Developed an Apache Spark Streaming job in Scala to analyze streaming data
- Wrote transformations and actions in Spark using Scala to process complex data
- Wrote Hive queries and created partitions to store the data in internal tables
- Wrote Unix shell/Pig scripts to preprocess the data stored in the CornerStone platform
- Wrote MapReduce jobs to process the stored data for multiple use cases
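A minimal sketch of the kind of Spark 1.6 DataFrame-based aggregation described above; the record type, field names and HDFS paths are illustrative assumptions, not the actual application code.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical record type for the data being aggregated
    case class Txn(account: String, category: String, amount: Double)

    object TxnAggregation {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("TxnAggregation"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Parse raw CSV lines into a typed RDD, then convert to a DataFrame
        val txns = sc.textFile("hdfs:///data/txns")        // illustrative input path
          .map(_.split(","))
          .map(f => Txn(f(0), f(1), f(2).toDouble))
          .toDF()

        // Aggregate spend per account and category and persist the summary
        txns.groupBy("account", "category")
          .sum("amount")
          .write.parquet("hdfs:///data/txn_summary")       // illustrative output path
      }
    }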
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Spark, Scala, MapR Distribution
Confidential, Eden Prairie, MN
Senior Big Data Lead Consultant
Responsibilities:
- Designed and developed logical data models for the legacy & CornerStone databases
- Created Sqoop scripts for tables using Linux shell scripts
- Created, deleted & executed Sqoop jobs in the Sqoop metastore
- Designed HBase table row keys and mapped them to RDBMS table column names (see the sketch after this list)
- Mapped HBase table columns to Hive external table columns
- Performed historical and incremental imports of RDBMS data into HBase tables using the metastore
- Validated Sqoop scripts, Hive scripts and HBase scripts
- Created HBase tables and column families, altered column families, granted permissions on HBase tables, and defined region server space
- Automated workflows through Oozie
- Executed imports for multiple tables in the database in parallel using Fork in Oozie
- Wrote transformations and actions in Scala to process complex data
- Fixed bugs and provided production support for running processes.
- Participated in SCRUM Daily stand-up, sprint planning, Backlog grooming & Retrospective meetings.
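A minimal sketch of the kind of HBase row-key design and column mapping described above, assuming the HBase 1.x Java client API; the table, column family and column names are illustrative.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object CustomerRowKeyLoad {
      def main(args: Array[String]): Unit = {
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("customer"))   // illustrative table

        // Composite row key (reversed customer id + effective date) to spread writes across regions
        val rowKey = "1234567890".reverse + "_" + "20160101"
        val put = new Put(Bytes.toBytes(rowKey))
        // Column family "cf" holds the RDBMS columns, e.g. CUSTOMER_NAME -> cf:name
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("John Doe"))
        table.put(put)

        table.close()
        connection.close()
      }
    }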
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Oozie, HBase, Spark, Scala, ZooKeeper, MapR Distribution
Confidential, Atlanta, GA
Senior Big Data Lead Consultant
Responsibilities:
- Designed the technical architecture and developed various Big Data workflows using MapReduce, Hive, YARN, Kafka, Storm & Spark
- Deployed an on-premises cluster and tuned it for optimal performance for job execution needs and processing of large data sets.
- Built reusable Hive UDF libraries for business requirements, enabling various business analysts to use these UDFs in Hive querying.
- Used Flume to dump the application server logs into HDFS.
- Analyzed the logs stored on HDFS and imported the cleaned data into the Hive warehouse, enabling end business analysts to write Hive queries.
- Worked with the Elasticsearch search engine for real-time data analytics, integrating with Kibana dashboards.
- Set up a Kafka cluster on AWS; configured and troubleshot Kafka brokers
- Created Kafka topics, emitted messages through producers, stored them in partitions and consumed them through consumers
- Implemented custom Kafka producers/consumers for publishing messages to topics and subscribing from topics, and wrote the topology (see the sketch after this list).
- Wrote spouts to read data from the Kafka message broker and pass it to the processing logic
- Wrote bolts to filter, aggregate and join data, interacting with data stores and emitting tuples for subsequent bolts to process
- Wrote the Storm topology that defines the flow of data between components
- Migrated data from RDBMS and wrote processed events from Storm bolts to Cassandra
- Worked with various HDFS file formats like Avro and SequenceFile, and compression formats like Snappy, bzip2 & lz2.
- Participated in SCRUM Daily stand-up, sprint planning, Backlog grooming & Retrospective meetings.
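A minimal sketch of the kind of custom Kafka producer described above, assuming the Kafka 0.9+ Java producer API; broker addresses, topic name and message contents are illustrative.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object LogEventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092,broker2:9092")   // illustrative brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // Key by device id so events for the same device land in the same partition
        producer.send(new ProducerRecord[String, String]("events", "device-42", "login,2016-05-01T10:15:00Z"))
        producer.close()
      }
    }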
Environment: MapReduce, Pig, Hive, FLUME, JDK 1.6, Linux, Kafka, Storm, Spark, Elastic-Search, YARN, Hue, HiveServer2, Impala, HDFS, Oozie, Splunk, Git, Kibana, Linux Scripting
Confidential, Bloomington, IL
Senior Big Data Consultant
Responsibilities:
- Designed and developed the application architecture and set up the Hadoop environment.
- Set up Splunk Servers and Forwarders on the cluster nodes
- Managed the configuration for additional data nodes using Chef
- Wrote Linux scripts & cron jobs for monitoring services & cluster health.
- Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform map-side joins
- Wrote MapReduce jobs to process trip summaries, scheduled to execute hourly, daily, weekly, monthly & quarterly.
- Responsible for loading machine data coming from different sources into the Hadoop cluster using Flume
- Wrote Oozie workflows to schedule MapReduce jobs
- Configured Flume and wrote custom sinks and sources
- Used Flume to collect, aggregate, and store the log data from different web servers.
- Ingested data into HBase and retrieved it using the Java API
- Used Spark SQL for extracting data from different data sources and placing the processed data into NoSQL (MongoDB)
- Used Spark for analyzing the machine-emitted & sensor data to extract meaningful information such as location, driving speed, acceleration, braking speed, driving pattern and so on.
- Created Spark SQL (metadata) tables to store the processed results in a tabular format (see the sketch after this list)
- Used Git as version control to check out and check in files.
- Reviewed high-level design & code and mentored team members.
- Participated in SCRUM Daily stand-up, sprint planning, Backlog grooming & Retrospective meetings.
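A minimal sketch of the kind of Spark SQL processing of sensor data described above, assuming Spark 1.x with JSON input; paths, table and column names are illustrative, and the result could equally be written to MongoDB via a connector.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object SensorSummary {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SensorSummary"))
        val sqlContext = new SQLContext(sc)

        // Load machine-emitted sensor events (one JSON record per line) into a DataFrame
        val events = sqlContext.read.json("hdfs:///data/sensor_events")   // illustrative path
        events.registerTempTable("sensor_events")

        // Summarize driving behaviour per vehicle
        val summary = sqlContext.sql(
          "SELECT vehicleId, avg(speed) AS avgSpeed, max(acceleration) AS maxAccel " +
          "FROM sensor_events GROUP BY vehicleId")

        summary.write.json("hdfs:///data/sensor_summary")   // illustrative output path
      }
    }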
Environment: Hadoop, MapReduce, OpenStack, Flume-NG, Free IPA, HBase 0.98.2, MongoDB, Spark, Kerberos, PostgreSQL, RabbitMQ Server, Map/Reduce, HDFS, ZooKeeper, Oozie, Splunk, GitHub, Chef
Confidential
Senior BigData Engineer
Responsibilities:
- Designed and developed the Hadoop stack
- Analyzed the functional specification
- Managed the configuration of data nodes on the cluster using Chef.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and semi-structured data.
- Loaded data into external tables using Hive scripts
- Performed aggregations, joins and transformations using Hive queries
- Implemented partitions, dynamic partitions and buckets in Hive (see the sketch after this list)
- Optimized Hive SQL queries and thus improved job performance
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of the customer and transaction data by date
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop
- Used Oozie to automate/schedule business workflows which invoke Sqoop, MapReduce and Pig jobs as per the requirements
- Performed Hadoop cluster environment administration, including adding & removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting
- Wrote unit test cases for Hive scripts
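A minimal sketch of the kind of dynamic partitioning and bucketing described above, expressed here as HiveQL submitted from Scala over a HiveServer2 JDBC connection as an assumption; the connection URL, table names and schema are illustrative.

    import java.sql.DriverManager

    object HivePartitionLoad {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default", "", "")
        val stmt = conn.createStatement()

        // Allow the partition value to be derived from the data itself
        stmt.execute("SET hive.exec.dynamic.partition = true")
        stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict")
        stmt.execute("SET hive.enforce.bucketing = true")   // needed for bucketed inserts on older Hive

        // Partitioned, bucketed target table (illustrative schema)
        stmt.execute("CREATE TABLE IF NOT EXISTS txn_part (cust_id STRING, amount DOUBLE) " +
          "PARTITIONED BY (txn_date STRING) CLUSTERED BY (cust_id) INTO 32 BUCKETS")

        // Load from a staging table; the last SELECT column feeds the dynamic partition key
        stmt.execute("INSERT OVERWRITE TABLE txn_part PARTITION (txn_date) " +
          "SELECT cust_id, amount, txn_date FROM txn_staging")

        conn.close()
      }
    }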
Environment: Java, Hadoop, HDFS, MapReduce, Pig, Hive, Flume, Oozie, ZooKeeper, CHEF
Confidential
Senior Software Engineer
Responsibilities:
- Understood the functional requirements of the client in order to design the technical specifications, develop the system and subsequently document the requirements
- Responsible for developing class diagrams and sequence diagrams
- Designed and implemented a separate middleware Java component on Fusion
- Reviewed high-level design & code and mentored team members.
- Participated in SCRUM Daily stand-up, sprint planning, Backlog grooming & Retrospective meetings.
Environment: Java 1.6, Oracle Fusion Middleware, Eclipse, WebSphere, Spring F/w
Confidential
Senior Software Engineer
Responsibilities:
- Understood the functional requirements of the client in order to design the technical specifications, develop the system and subsequently document the requirements.
- Prepared LLD (class diagrams, sequence diagrams, activity diagrams) using the Enterprise Architect UML tool
- Worked on Hibernate, Spring IoC, DAO and JSON parsing
- Prepared unit test cases for the developed UI.
- Responsible for problem tracking, diagnosis, replication, troubleshooting and resolution of client problems.
Environment: Java, Confidential Proprietary F/w using DOJO, Hibernate, Spring, DB2, RSA, Rational ClearCase, RPM, RQM, Mantis
Confidential
IT Consultant
Responsibilities:
- Understood the functional requirements of the client in order to design the technical specifications, develop the system and subsequently document the requirements.
- Prepared LLD (class diagrams, sequence diagrams, activity diagrams) using the Enterprise Architect UML tool
- Developed UI on JSF with RichFaces
- Wrote TestNG test cases
- Ensuring appropriate process standards are met and maintained.
- Involved in preparing ad hoc reports.
Environment: Windows, Unix, Java, Struts, Hibernate, Tomcat, Lenya, Remedy Tool, WinSCP, Putty, VPN, Eclipse