Big Data Platform Architect Resume

San Diego, CA

PROFESSIONAL SUMMARY:

Over 9 years of IT experience focused on software development, including the last 4 years as a Hadoop architect and developer. Prior to entering the Big Data market, he worked solely as a core Java developer. In his current role as a Big Data Platform Architect at Confidential, he is responsible for both architecture and development. Before that, he worked at Confidential as a Hadoop developer, where he set up the Hadoop ecosystem.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Storm, Oozie

Operating Systems: Windows XP, Windows 7/8, Linux distributions (Ubuntu, Mint, Fedora)

Languages: Java, Python, R, C#, Haskell

Java Technologies: JDBC 4.1, Servlets 2.4, JSP 2.0

Web Technologies: HTML, JavaScript, jQuery, AJAX

Scripting Languages: UNIX shell scripting, Korn shell (ksh)

Frameworks: Spring 4.0, Hibernate 5.0

RDBMS: Oracle, MySQL, PostgreSQL, IBM DB2

NoSQL Technologies: Cassandra, MongoDB, Neo4j, HBase

Servers: Tomcat, JBoss, WebLogic

Tools & Utilities: Eclipse, NetBeans, MyEclipse, SVN, Git, Maven, SoapUI, JMX Explorer, XMLSpy

PROFESSIONAL EXPERIENCE:

Big Data Platform Architect

Confidential, San Diego CA

Responsibilities:

  • Designed and built a scalable infrastructure and big data platform to collect and process very large volumes of data, including real-time events generated by towers in semi-structured form.
  • Demonstrated thought leadership and guided management on big data adoption, including setting up best practices and governance structures.
  • Acted as an internal resource to help capture real-time data, and emulated streaming behavior to analyze various data ingestion rates through Storm and Spark pipelines.
  • Communicated the architecture to upper management to showcase the potential of offline, near-real-time, and real-time analytics.
  • Made extensive use of Apache Ambari on the Hortonworks Data Platform for managing, monitoring, and reviewing live operations and activity across the whole infrastructure. Also managed and supported MapReduce jobs, Spark jobs, and Storm topologies, recording and reporting on performance metrics, run time, status, and user or resource usage.
  • Debugged and troubleshot concurrent job workloads that could impact or be impacted by failures or bottlenecks.
  • Managed activity across the cluster with role-based access to tools such as JIRA and Git.
  • Architected utility helper classes that simplified development and enabled dynamic KPI generation at runtime with minimal coding effort.
  • Worked as an agile team member on any required activity, including pair programming, code review, and performance optimization.
  • Scheduled daily status calls with team members, following the Scrum process to meet deliverables on time.

Environment: Hadoop, Linux, MapReduce, HDFS, Shell Scripting, Java 8, Log4j, Mockito, Git, Eclipse, PyCharm, Maven, JIRA, Apache Storm, Apache Spark, Apache Hadoop, ZooKeeper, Hortonworks Data Platform, Apache Ambari, R, Python, Predictive Analytics, Time Series Analysis.

Confidential, San Jose, CA

Hadoop Architect

Responsibilities:

  • Architected various MapReduce jobs to parse raw data, populate staging tables, and store the refined data in partitioned Hive tables.
  • Crafted Hive queries that helped market analysts spot emerging trends by comparing fresh data with historical tables and building historical metrics on a daily basis.
  • Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to pre-process the data and categorize it into different dimensions.
  • Interacted daily and weekly with sponsors, stakeholders, and business users to help them review analytics reports, explain the causes of data anomalies, and discuss technical challenges, feasibility, and flexibility.
  • Made extensive use of Splunk for HadoopOps for managing, monitoring, and reviewing live operations and activity across the whole infrastructure.
  • Managed and supported MapReduce jobs, with the ability to rapidly sort, filter, and report on performance metrics, run time, status, and user or resource usage.
  • Diluted concurrent job workloads that could impact or be impacted by failures or bottlenecks.
  • Managed user activity across the cluster with role-based access to tools such as SVN, JIRA, and Git.
  • Architected utility helper classes to ease development and help team members complete tasks quickly.
  • Worked as an agile team member on any required activity, including pair programming, code review, and performance optimization of existing MapReduce programs, such as custom partitioner, combiner, and input reader classes.
  • Conducted daily status calls following the Scrum process to meet deliverables on time.
  • Participated in triage calls to handle defects reported by the testing and QA teams.
  • Communicated with the admin team to resolve configuration-related issues.
  • Put MongoDB in place to cache large amounts of data and make it available to the front-end team.

Environment: Hadoop, Linux, MapReduce, HDFS, Hive, Pig, Shell Scripting, Sqoop2, Java 7, Log4j, Mockito, Git, Eclipse, MySQL, Maven, Gradle, JIRA, Jenkins, MongoDB, Apache Crunch, Apache DataFu, Apache Flume, Apache Hadoop, Parquet, Solr, ZooKeeper.

Confidential, Bothell, WA

Hadoop Developer

Responsibilities:

  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Provided design recommendations and thought leadership to sponsors and stakeholders that improved review processes and resolved technical problems, and proposed translating some solutions to Apache Spark.
  • With Apache Spark, enabled real-time processing of customer viewing events to discover patterns in them. Being able to respond to these patterns immediately could yield cross-selling business and support purposes such as customer retention, targeted advertising, and auto-adjustment of complexity level (a new scope targeted as part of the merger of Confidential, Confidential, and Confidential).
  • Used Spark Streaming algorithms to identify patterns over time and make more targeted predictions and decisions.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to pre-process the data.
  • Used Splunk for HadoopOps to manage, monitor, and review live operations and activity across the whole infrastructure.
  • Managed MapReduce jobs, with the ability to rapidly sort, filter, and report on performance metrics, run time, status, and user or resource usage.
  • Identified concurrent job workloads that could impact or be impacted by failures or bottlenecks.
  • Created a definitive record of user activity across the cluster, with role-based access to the corresponding Splunk searches.
  • Introduced helper classes to retrieve data from HBase tables for specific business use cases, so that other applications could use them without implementing similar logic at their end.
  • Worked as an agile team member on any required activity, including pair programming, code review, and performance optimization of existing MapReduce programs, such as custom partitioner, combiner, and input reader classes.
  • Attended daily status calls following the Scrum process to complete each user story on schedule.
  • Participated in triage calls to handle defects reported by the testing and QA teams.
  • Coordinated with the EM to resolve configuration-related issues.
  • Developed helper classes that abstract connections to, and usage of, different NoSQL databases, serving as a core toolkit.
  • Enhanced existing MapReduce jobs written as Python scripts.
  • Attended daily or alternate-day calls with business users to discuss each flow, converted the discussed requirements into user stories, broke them down into individual tasks, and delivered them within the planned sprint.

Technologies Used: Hadoop, Linux, CDH & HDP, MapReduce, HDFS, Hive, Pig, Shell Scripting, Sqoop, Java 7, NoSQL, Eclipse, Oracle 11g, Maven, Log4j, Mockito, Git, ATG e-commerce, Spring, Apache Kafka, Apache Spark, Logstash, Elasticsearch, Solr, Splunk.

Confidential, NY

Associate Hadoop Consultant

Responsibilities:

  • Understood the exact reporting requirements of business groups and users.
  • Interacted frequently with business partners.
  • Gained a good understanding of HDFS and the Hadoop ecosystem (MapReduce, Pig, Hive, Sqoop).
  • Worked on setting up the Hadoop ecosystem.
  • Managed and reviewed Hadoop log files.
  • Responsible for writing Pig scripts and Hive queries for data processing.
  • Ran Sqoop to import metadata from Oracle.
  • Created shell scripts to collect raw logs from different machines.
  • Defined a schema for each folder.
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
  • Wrote many Pig UDFs to process complex data.
  • Coded many MapReduce programs to process unstructured log files.
  • Imported data into and exported data from HDFS and Hive using Sqoop.
  • Used parameterized Pig scripts and optimized them using ILLUSTRATE and EXPLAIN.
  • Periodically involved in configuring high availability (HA), resolving Kerberos security issues, and restoring from NameNode failures as part of maintaining zero downtime.
  • Implemented the Fair Scheduler.

Technologies Used: Hadoop, Linux, MapReduce, HDFS, Hive, Pig, Shell Scripting, Sqoop, Java 6, Eclipse, Oracle 10g, JavaScript, Servlets, Node.js, JMS, Ant, Log4j, and JUnit.

Confidential, Pittsburg, PA

Software Engineer

Responsibilities:

  • Coordinated with the Technical Director on programming tasks.
  • Collaborated with other programmers to design and implement features.
  • Quickly produced well-organized, optimized, and documented source code.
  • Created and documented software tools required by artists and other developers.
  • Debugged existing source code and polished feature sets.
  • Contributed to technical design documentation.
  • Worked independently when required.
  • Continuously learned and improved skills.

Technologies Used: Windows, Java 6, Java Card API, Java Communication API, Eclipse, Ant, Log4j, and JUnit.
