Big Data Engineer Resume

Pittsburgh, PA

SUMMARY

  • 8 years of comprehensive IT experience in the Big Data domain, using Hadoop, Hive, and other open source tools and technologies in Banking, Healthcare, Insurance, and Energy.
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
  • Substantial experience writing MapReduce jobs in Java and working with Pig, Hive, Flume, ZooKeeper, and Storm.
  • Experience developing Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open source tools and technologies.
  • Extensive knowledge of automation tools such as Puppet and Chef.
  • Hands-on experience productionizing Hadoop applications, including administration, configuration management, monitoring, debugging, and performance tuning.
  • Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Avro, ZooKeeper, Oozie, Hive, HDP, Cassandra, Sqoop, Pig, and Flume.
  • Supported online gap site for deployment of application and modeling changes on RHEL servers for agile continuous integration using Subversion repositories. Worked with Red Hat Linux, CentOS, and Microsoft servers.
  • Extensive experience in SQL and NoSQL development, along with experience working with Apache Solr for indexing and querying.
  • In-depth understanding of data structures and algorithms.
  • Expertise in J2EE technologies: Servlets, JSP, EJB, RMI, JDBC, JNDI, Java.
  • Expertise in developing GUIs (graphical user interfaces) using Java Swing and JSF.
  • Experience in web-based languages such as HTML, CSS, PHP, XML and other web methodologies including Web Services and SOAP.
  • Extensive experience in all the phases of the software development lifecycle (SDLC).
  • Experience deploying applications on heterogeneous application servers: Tomcat, WebLogic, and Oracle Application Server.
  • Extensive knowledge of NoSQL databases such as HBase. Worked in a multi-cluster environment and set up the Cloudera Hadoop ecosystem.
  • Background with traditional databases such as Oracle, Teradata, Netezza, SQL Server, ETL tools / processes and data warehousing architectures.
  • Wrote a technical paper and created a slideshow outlining the project and showing how Cassandra could be used to improve performance.
  • Extensive experience in designing analytical/OLAP and transactional/OLTP databases.
  • Proficient in using ERwin to design backend data models and entity relationship diagrams (ERDs) for star schemas, snowflake dimensions, and fact tables.
  • Excellent conceptual knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.

TECHNICAL SKILLS

  • Hadoop, HIVE, HDP, PIG, Sqoop, Flume
  • MapReduce, Splunk, HDFS, Zookeeper, Storm
  • Shell, Python, AVRO
  • AIX 5.1, Red Hat Linux, CentOS
  • Lucene and Solr
  • Apache Contributor
  • Puppet and Chef
  • JIRA, SDLC, MongoDB, Cassandra
  • Cloudera, HortonWorks
  • Datastage, Talend Open Studio
  • Tableau, Qlikview, Giraph
  • Java, J2EE, JSP, Servlets, Struts, Hibernate, Spring
  • Spring MVC, Spring DAO, Spring Security, Spring WS
  • Drools, Jboss Enterprise portal, Web Services, JSF
  • HTML, DHTML, XML, XSLT, CSS, Ajax, SOAP, JavaScript, Web Services
  • IBM DB2, Teradata, MySQL, NoSQL, Oracle 11i/10g/9i
  • AWS (Amazon Web Services), EMR
  • Data Pipeline and Redshift
  • ETL Tool (Informatica)
  • Data Warehouse/Business Intelligence (BI)
  • Control M, Vertis, Datastage
  • Achieving Sales Performance Goals

PROFESSIONAL EXPERIENCE

Confidential, PITTSBURGH, PA

BIG DATA ENGINEER

Responsibilities:

  • Worked on Distributed/Cloud Computing (Map Reduce/Hadoop, Pig, Hbase, AVRO, Zookeeper, etc.), Amazon Web Services (S3, EC2, EMR, etc.), Oracle SQL Performance Tuning and ETL, Java 2 Enterprise, Web Development, Mobile Application Development (Objective-C, Java Native Mobile Apps, Mobile Web Apps), Agile Software Development, Team Building & Leadership, Engineering Management, Internet of Things (Amateur Sensor Networks, Embedded Systems and Electrical Engineering).
  • Setting up and supporting Cassandra (1.2)/DataStax (3.2) for POC and production environments using industry best practices.
  • Working as a lead on Big Data Integration and Analytics based on Hadoop, SOLR and webMethods technologies.
  • Working on implementing Hadoop on AWS EC2, using a few instances to gather and analyze log files.
  • Communicating with developers, applying in-depth knowledge of Cassandra data modeling, to convert some applications to use Cassandra instead of Oracle.
  • Working on transferring data with Sqoop from HDFS to relational database systems and vice versa. Working on United Health, loaded files into Hive and HDFS from MongoDB.
  • Founded and developed an environmental search engine using PHP5, Java, Lucene/Solr, Apache, and MySQL.
  • Responsible for building scalable distributed data solutions using Datastax Cassandra.
  • Worked on Apache Solr to search device and user details.
  • Hands-on experience installing, configuring, administering, debugging, and troubleshooting Apache and Datastax Cassandra clusters.
  • Led the evaluation of Big Data software such as Splunk and Hadoop for augmenting the warehouse, identified use cases, and led Big Data Analytics solution development for the Customer Insights and Customer Engagement teams.
  • Designed techniques and wrote programs in Java and Linux shell script to push large data sets, including text and byte data, into NoSQL stores, using various data-parsing techniques in addition to MapReduce jobs.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Involved in scheduling multiple Hive and Pig jobs with the Oozie workflow engine.
  • Worked on TOAD for Data Analysis, ETL/Informatica for data mapping and the data transformation between the source and the target database.
  • Working on Hive/HBase vs. RDBMS comparisons; imported data into Hive on HDP and created tables, partitions, indexes, views, queries, and reports for BI data analysis.
  • Developing a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Currently working on XML parsing using PIG, Hive, HDP and Redshift.
  • Architecting solutions that process massive amounts of data on corporate and AWS cloud-based servers.
  • Using Splunk to detect malicious activity against web servers.
  • Tuned the Hadoop clusters and monitored memory management and MapReduce jobs, to keep the MapReduce jobs that push data from SQL to NoSQL stores running healthily.
  • Ran many performance tests using the Cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
  • Managed Solr search engine work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization.
  • Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, with ZooKeeper installed and configured; also stood up Apache Accumulo (a NoSQL store modeled on Google's Bigtable) in a single-VM environment.
  • Experience in the extraction, transformation, and loading of data from multiple sources into data warehouses using Informatica PowerCenter, OLTP, and DSS.
  • Continuously monitoring and managing the EMR cluster through the AWS Console.
  • Working on importing streaming logs and aggregating the data into HDFS through Flume.
  • Delivered working widget software using Ext JS 4, HTML5, RESTful web services, JSON Store, Linux, Hadoop, ZooKeeper, NoSQL databases, Java, Spring Security, and JBoss Application Server for Big Data analytics.
  • Working on building HCatalog schemas through Sqoop.
  • Developed a custom Avro framework capable of solving the small files problem in Hadoop, and extended Pig and Hive to work with it.
  • Working with a complex data set and modeling it in Mahout.
  • Working on use cases, data requirements and business value for implementing a Big Data Analytics platform.
  • Working on configuring and maintaining the Hadoop environment on AWS.
  • Developed an application component interacting with MongoDB.
  • Working on modifying Chef recipes used to configure the Hadoop stack.
  • Evaluating the Puppet framework and tools to automate cloud deployment and operations.
  • Working on installing and configuring Hive, HDP, Pig, Sqoop, Flume, Storm, and Oozie on the Hadoop cluster.
  • Developed use cases and technical prototypes for implementing Pig, HDP, Hive, and HBase.
  • Analyzed alternative NoSQL data stores and wrote intensive documentation comparing the HBase and Accumulo data stores.
  • Working on transformation processes using ETL tools such as Informatica PowerCenter 8.x/9.0/9.1/9.5.
  • Working on analyzing data using Hive, Pig, Storm, and custom MapReduce programs in Java.

Confidential, SAN JOSE, CA

HADOOP DEVELOPER

Responsibilities:

  • Implemented the Hadoop stack and various big data analytics tools; migrated from Oracle to Hadoop.
  • Performed a benchmark comparison of NoSQL databases, MongoDB (with Python) vs. HBase, for a POC.
  • Set up MongoDB for a large, terabyte-scale table using CLOBs. Programmed in Python using the PyMongo driver.
  • Designed and implemented a 10-node Hadoop cluster.
  • Experienced in Linux and Unix command line.
  • Managed 24x7 uptime and performance tuning.
  • Experience implementing ETL processes using map-reduce code in Java.
  • Java based development for calling Hadoop API.
  • Hbase development and tuning.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Extensively used HDFS commands.
  • Hive SQL queries.
  • Experienced with Hive internal and external tables.
  • Experienced with optimizing and partitioning Hive tables.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Jobs management using Fair scheduler.
  • Cluster coordination services through ZooKeeper.
  • Installed and configured Hive and wrote Hive UDFs.
  • Experienced in managing and reviewing Hadoop log files.
  • Used Teradata utilities such as MultiLoad, TPump, and FastLoad to load data into the Teradata data warehouse from Oracle.
  • Good experience in data modeling, with expertise in creating star and snowflake schemas, fact and dimension tables, and physical and logical data models using PowerDesigner.
  • Performance tuning - AWR, ADDM, ASH, Tracing, Gather Stats, PL/SQL, and SQL tuning.
  • Real application clusters monitoring for perf environment, tuning, using OEM (Oracle Enterprise Manager), Statspack, AWR, trace files, alert log, and V$ dynamic views.
  • Refresh, backup, and upgrade to 11.2.0.3.
  • Reviewed database code from development team and identified performance bottlenecks.
  • Set up and managed RMAN, hot and cold backups, and exports and imports with expdp and impdp.
  • Managed all primary on-call DBA activities such as checking on tablespace problems, checking for fragmentation, checking on nightly/weekly backup problems, and responding to Oracle errors and deadlock conditions.
  • Periodically gathered stats and monitored/fixed resource intensive SQL.
  • Designed and implemented database backup/restore infrastructure using Oracle RMAN.
  • Worked with various Informatica client tools like source analyzer, mapping designer, Mapplet Designer, Informatica Repository Manager, and Workflow Manager.
  • Used various active transformations like filter transformation, router transformation, joiner transformation, and aggregator transformation.
  • Designed complex mappings like SCD type 1 and type 2 applications using Informatica.
  • Troubleshot session and workflow logs for data discrepancy and job failures.
  • Installed critical Red Hat Linux and Solaris patches on all environments.
  • Installed and configured JBoss application servers over Apache web server.
  • Configured a JBoss Cache cluster in the PROD environment.
  • Installed and configured OpenSSH and TCP Wrappers.
  • Shell Scripting for automating the System Admin tasks.
  • Configuration of Kernel parameters for Oracle Database.

Confidential, WATERTOWN, MA

HADOOP DEVELOPER

Responsibilities:

  • Installed and configured Hadoop through Amazon Web Services in cloud.
  • Designed, planned, and delivered a proof of concept and business function/division-based implementation of the Big Data roadmap and strategy project (Apache Hadoop stack with Tableau) at UnitedHealthcare.
  • Developed MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used Bash shell scripting, Sqoop, Avro, Hive, HDP, Redshift, Pig, and Java MapReduce daily to develop ETL, batch processing, and data storage functionality.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Worked on NoSQL databases including Hbase and MongoDB.
  • Worked on data for classification of different Health Care Boards using Mahout.
  • Used data stores including Accumulo/Hadoop and a graph database.
  • Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
  • Worked on Business Intelligence (BI)/Data Analytics, Data Visualization, Big Data with Hadoop and Cloudera based projects, SAS/R, Data warehouse Architecture Design and MDM/Data Governance.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Worked on technologies deployed exclusively off-site using the Amazon infrastructure and ecosystem (EMR, Redshift, Hive, DynamoDB).
  • Worked on loading all tables from the reference source database schema through Sqoop.
  • Designed, coded, and configured server-side J2EE components such as JSP, AWS, and Java.
  • Used Dell Crowbar as a wrapper to Chef to deploy Hadoop.
  • Developed a scalable Big Data architecture that processes terabytes of semi-structured data to extract business insights.
  • Collected data from different databases (i.e., Teradata, Oracle, MySQL) into Hadoop.
  • Used Oozie and ZooKeeper for workflow scheduling and monitoring.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
  • Queried and analyzed data from Datastax Cassandra for quick searching, sorting and grouping.
  • Created User defined types to store specialized data structures in Cassandra.
  • Experienced in managing and reviewing Hadoop log files.
  • Responsible for coding a .NET data ingestion tool for Solr 4.1 and for integrating/adapting the current IIS/.NET/Microsoft solution into Solr exclusively.
  • Created a design approach to lift and shift the existing mappings to Netezza.
  • Conducted vulnerability analyses by reviewing, analyzing, and correlating threat data from available sources such as Splunk.
  • Worked on extracting files from MongoDB through Sqoop, placing them in HDFS, and processing them.
  • Worked with different file formats (Avro, RCFile).
  • Worked on Data Architecture, Data Modelling, ETL, Data Migration, Performance tuning and optimization.
  • Worked on Hadoop installation & configuration of multiple nodes on AWS EC2 system.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Supported MapReduce programs running on the cluster.
  • Jobs management using Fair scheduler.
  • Managed Solr search engine work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization.
  • Cluster coordination services through Zookeeper.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Led project execution against cost and schedule requirements in Accumulo.
  • Worked on setting up Pig, Hive, Redshift and Hbase on multiple nodes and developed using Pig, Hive, Hbase, MapReduce and Storm.
  • Designed and implemented data processing using AWS Data Pipeline.
  • Drove a holistic technology transformation to a Big Data platform for UnitedHealthcare: created strategy, defined blueprint, designed roadmap, built end-to-end stack, evaluated leading technology options, benchmarked selected products, migrated products, reconstructed information architecture, introduced metadata management, leveraged machine learning, and productionized a consolidated data store (Hadoop, MapReduce, Hive).
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Worked on automating the monitoring and optimization of large-volume data transfer processes between Hadoop clusters and AWS.
  • Strong knowledge of data warehousing using Informatica PowerCenter.
  • Configured and managed Splunk forwarders, indexers, and search heads.
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
  • Data scrubbing and processing with Oozie.