Big Data Engineer Resume

Plano, TX

SUMMARY:

  • Over 8 years of IT experience in the design, development, maintenance and support of Big Data applications and BI tools.
  • Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design to load data into the Hadoop environment.
  • Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
  • Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Experience in collecting log data from web servers and channeling it into HDFS using Flume, Kafka and Spark Streaming.
  • Hands-on experience in dimensional data modeling, Star Schema modeling, Snowflake modeling, and physical and logical data modeling.
  • Proficient with various RDBMS technologies including SAP, Oracle 11g/10g/9i/8i, MS SQL 2012/2008/2005, Teradata 14/13 and MS Access.
  • Experience in creating dashboards using QlikView.
  • Developed QlikView dashboards using Chart Box (drill-down, drill-up and cyclic grouping), list box, input field, table box and calendar.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
  • Over 4 years of experience with Big Data Hadoop core and ecosystem components such as HDFS, MapReduce, YARN, Hive, Impala, Sqoop, Flume, Oozie, HBase, ZooKeeper and Pig.
  • Exposure to Spark, Spark Streaming, Spark MLlib and Scala, and to creating DataFrames in Spark with Python.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations, performing read/write operations and saving the results to an output directory in HDFS.
  • Experience in using DStreams, Accumulators, Broadcast variables and RDD caching for Spark Streaming.
  • Expertise in importing data from AWS S3 into Apache Spark and performing transformations and actions on RDDs (a minimal sketch follows this list).
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Experience with batch processing of data sources using Apache Spark.
  • Strong experience and knowledge of real-time data analytics using Spark, Kafka and Flume.
  • Hands-on experience in capturing data from existing relational databases (Oracle, MySQL, SQL Server and Teradata) that provide SQL interfaces, using Sqoop.
  • Hands-on experience with Avro, Parquet and JSON file formats, and with Combiners, Counters, dynamic partitions and bucketing for best practices and performance improvement.
  • Experience with Apache Solr to implement indexing and write custom Solr query segments to optimize search.
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
  • Expertise in creating Cassandra tables to store various formats of data coming from different sources.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience in composing shell scripts to dump shared data from MySQL servers to HDFS.
  • Experience with the workflow scheduler Oozie and with ZooKeeper to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experienced in performance tuning and real-time analytics in both relational databases and the NoSQL database HBase.
  • Experience with MongoDB, Cassandra and other NoSQL databases such as HBase.
  • Experience using set analysis to provide custom functionality in QlikView applications.
  • Knowledge of the TIBCO Spotfire and Tableau dashboard tools.
  • Good experience with the Agile development process and Jira.
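
Illustrative sketch of the S3-to-Spark pattern referenced above: a minimal PySpark job that assumes a cluster with s3a credentials configured; the bucket, paths and column names are hypothetical placeholders rather than details from any specific project.

    # Minimal PySpark sketch: read raw events from S3, apply a few DataFrame
    # transformations, and persist the result to HDFS.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3-to-hdfs-example").getOrCreate()

    # Read raw CSV events from S3 (hypothetical bucket and layout).
    events = spark.read.csv("s3a://example-bucket/raw/events/",
                            header=True, inferSchema=True)

    # Typical transformations: filter bad rows, derive a date column, aggregate.
    daily_counts = (
        events
        .filter(F.col("status") == "OK")
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "event_type")
        .count()
    )

    # Write the aggregated result back to HDFS as Parquet.
    daily_counts.write.mode("overwrite").parquet("hdfs:///user/etl/output/daily_counts")

    spark.stop()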

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, Hadoop, ZooKeeper, Hive, YARN, Pig, Sqoop, Oozie, Flume, Kafka, Spark, Solr.

Operating System: Windows, Linux, Unix.

Database Languages: SQL, PL/SQL, Oracle.

Programming languages: Scala, Python.

Databases: IBM DB2, Oracle, SQL Server, Teradata, MySQL, HBase, Cassandra.

IDE: Eclipse, IntelliJ, Jupyter Notebook.

Tools: TOAD, SQL Developer, Teradata SQL Assistant, Log4J.

Web Services: WSDL, SOAP, REST.

BI Tools: QlikView, Tableau, Spotfire, SAP Business Objects.

ETL Tools: Talend ETL, Talend Studio, Informatica.

Web/App Server: UNIX server, Apache Tomcat, WebSphere, WebLogic.

Methodologies: Agile, Waterfall, UML, Design Patterns.

PROFESSIONAL HISTORY:

Confidential, Plano, TX

Big Data Engineer

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Experienced in the design and deployment of Hadoop clusters and various big data analytic tools including Pig, Hive, Flume, HBase and Sqoop.
  • Imported weblogs and unstructured data using Apache Flume and stored them in the Flume channel.
  • Loaded CDRs into the Hadoop cluster from relational databases using Sqoop and from other sources using Flume.
  • Developed simple and complex programs in Hive, Pig and Python for data analysis on different data formats.
  • Analyzed substantial data sets by running Hive queries and Pig scripts.
  • Managed and reviewed Hadoop and HBase log files.
  • Experience in creating, dropping and altering tables at run time without blocking updates and queries, using HBase and Hive.
  • Experienced in writing Spark applications in Python.
  • Used Spark SQL to handle structured data in Hive.
  • Created Spark DataFrames and performed analysis using Spark SQL (a minimal sketch follows this list).
  • Imported semi-structured data from Avro files using Pig to make serialization faster.
  • Processed web server logs by developing multi-hop Flume agents using the Avro Sink and loaded them into MongoDB for further analysis.
  • Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Experienced in connecting Avro Sink ports directly to Spark Streaming for analysis of weblogs.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Worked on MongoDB for distributed storage and processing.
  • Implemented collections and the Aggregation Framework in MongoDB.
  • Good knowledge of MongoDB CRUD operations.
  • Responsible for using a Flume sink to remove data from the Flume channel and deposit it in a NoSQL database such as MongoDB.
  • Worked on the ELK stack: Elasticsearch, Logstash and Kibana.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Loaded JSON-styled documents into the NoSQL database MongoDB and deployed the data to the cloud service Amazon Redshift.
  • Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in Amazon EMR.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generated data visualizations using QlikView.
  • Followed Agile methodology for the entire project.
  • Experienced in Extreme Programming, Test-Driven Development and Agile Scrum.
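
Illustrative sketch of the Spark SQL work referenced above (see the Spark DataFrames bullet): querying a Hive table through Spark SQL and analyzing the result as a DataFrame. The database, table and column names are hypothetical placeholders.

    # Minimal sketch: query structured data in Hive via Spark SQL, then analyze
    # the result with DataFrame operations.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("hive-sparksql-example")
        .enableHiveSupport()            # make the Hive metastore visible to Spark SQL
        .getOrCreate()
    )

    # Structured data already sitting in Hive (hypothetical table).
    logs = spark.sql("""
        SELECT host, status, bytes, log_date
        FROM logs_db.web_logs
        WHERE log_date >= '2017-01-01'
    """)

    # DataFrame-side analysis on the same data.
    errors_by_host = (
        logs.filter(F.col("status") >= 500)
            .groupBy("host")
            .agg(F.count("*").alias("error_count"),
                 F.sum("bytes").alias("error_bytes"))
            .orderBy(F.desc("error_count"))
    )
    errors_by_host.show(20)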

Environment: Spark, Spark SQL, HDFS, Sqoop, Oozie, Python, Pandas, Matplotlib, R, SAS, Teradata 13.0/14.0/15.0, QlikView 11.0, Spotfire 6.5/7.0, AWS S3, EC2, EMR, RDS, Hive, Oracle 11g/10g/9i/8i, Informatica 9.5.1

Confidential, San Rafael, CA

Hadoop/Spark Developer

Responsibilities:

  • Developed Spark applications using Python and Scala, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model, which receives data from Kafka in near real time and persists it to Cassandra (see the sketch after this list).
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
  • Consumed XML messages using Kafka and processed the XML using Spark Streaming to capture UI updates.
  • Developed a pre-processing job using Spark DataFrames to flatten JSON documents into flat files.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Good understanding of Cassandra architecture, replication strategies, gossip, snitches, etc.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per the business requirements.
  • Used the DataStax Spark-Cassandra Connector to load data to and from Cassandra.
  • Experienced in creating data models for client data sets and analyzed data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
  • Tested cluster performance using the cassandra-stress tool to measure and improve reads and writes.
  • Created views in Impala on the same Hive tables for querying the data.
  • Used HiveQL to analyze partitioned and bucketed data, and executed Hive queries on Parquet tables stored in Hive to perform data analysis that met the business requirements.
  • Used Kafka functionality such as distribution, partitioning and the replicated commit log service for messaging systems by maintaining feeds.
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for analysis.
  • Experience in using Avro, Parquet and JSON file formats; developed UDFs in Hive.
  • Developed Autosys jobs for scheduling.
  • Experience working with Apache Solr for indexing and querying.
  • Created custom Solr query segments to optimize search matching.
  • Worked with the Log4j framework for logging debug, info and error data.
  • Performed transformations such as event joins, bot traffic filtering and some pre-aggregations using Pig.
  • Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into HDFS and Hive.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Generated various kinds of reports using QlikView and Tableau based on client requirements.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams; worked extensively on projects following the Agile methodology.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
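
Illustrative sketch of the Kafka-to-Cassandra streaming path referenced above (see the Spark Streaming bullet). It assumes PySpark with the spark-sql-kafka and DataStax spark-cassandra-connector packages on the classpath; the broker, topic, keyspace, table and field names are hypothetical placeholders.

    # Minimal sketch: stream JSON events from Kafka and persist each micro-batch
    # to Cassandra via the DataStax connector.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("kafka-to-cassandra-example").getOrCreate()

    event_schema = StructType([
        StructField("user_id", StringType()),
        StructField("course_id", StringType()),
        StructField("score", IntegerType()),
    ])

    # Subscribe to the Kafka topic carrying learner events.
    raw = (
        spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "learner-events")
             .load()
    )

    # Kafka delivers bytes; cast the value to string and parse the JSON payload.
    events = (
        raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
           .select("e.*")
    )

    # Write each micro-batch to Cassandra (hypothetical keyspace and table).
    def write_to_cassandra(batch_df, batch_id):
        (batch_df.write
                 .format("org.apache.spark.sql.cassandra")
                 .options(keyspace="learning", table="learner_events")
                 .mode("append")
                 .save())

    query = (
        events.writeStream
              .foreachBatch(write_to_cassandra)
              .option("checkpointLocation", "/tmp/checkpoints/learner-events")
              .start()
    )
    query.awaitTermination()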

Environment: Spark, Spark SQL, Cloudera, HDFS, Hive, Impala, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Jenkins, Eclipse, Oracle, Git, Oozie, SOAP, NiFi, Cassandra and Agile methodologies.

Confidential, Lexington, KY

Hadoop Developer

Responsibilities:

  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Involved in loading data from the Linux file system to HDFS.
  • Responsible for managing data coming from multiple sources.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data (a minimal Python mapper sketch follows this list).
  • Loaded and transformed large sets of structured and semi-structured data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
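
Illustrative sketch of a Hadoop Streaming job of the kind referenced above: a hypothetical Python mapper (mapper.py) that counts XML element names in log data. Paths and element names are placeholders, and a matching reducer.py would simply sum the emitted counts per tag.

    #!/usr/bin/env python
    # Hypothetical Hadoop Streaming mapper: reads XML log lines from stdin and
    # emits one "<tag>\t1" pair per element found. Example invocation:
    #   hadoop jar hadoop-streaming.jar \
    #       -input /data/xml_logs -output /data/xml_tag_counts \
    #       -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
    import re
    import sys

    TAG_PATTERN = re.compile(r"<([A-Za-z0-9_]+)[\s/>]")

    for line in sys.stdin:
        for tag in TAG_PATTERN.findall(line):
            print("%s\t1" % tag)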

Environment: Hadoop, HDFS, Pig, Hive, HBase, Sqoop, LINUX

Confidential, San Diego, CA

Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Developed Spark programs in Scala to analyze reports and process huge amounts of data from multiple data sources.
  • Experienced in implementing static and dynamic partitioning in Hive.
  • Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports and created Sqoop jobs for the last saved value.
  • Involved in loading data from the Linux file system to HDFS.
  • Developed scripts and Autosys jobs to schedule a bundle (group of coordinators) consisting of various Hadoop programs, using Oozie.
  • Created Oozie workflows to run multiple Hive jobs.
  • Experienced with different compression techniques to save data and optimize data transfer over the network, using Snappy in Hive tables (a minimal sketch follows this list).
  • Used ZooKeeper to provide coordination services to the cluster.
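
Illustrative sketch of the partitioning and Snappy-compression work referenced above, assuming PySpark with Hive support; the database, table, staging path and partition column are hypothetical placeholders.

    # Minimal sketch: load staged data and append it to a Hive table partitioned
    # by load date, stored as Snappy-compressed Parquet.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partitioned-snappy-load-example")
        .config("spark.sql.parquet.compression.codec", "snappy")
        .enableHiveSupport()
        .getOrCreate()
    )

    staged = spark.read.parquet("hdfs:///staging/orders/")

    # Dynamic partitioning: each distinct load_date value becomes its own partition.
    (staged.write
           .mode("append")
           .format("parquet")
           .partitionBy("load_date")
           .saveAsTable("sales.orders_partitioned"))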

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, Pig, Teradata, MongoDB, Ubuntu, UNIX, and Maven.

Confidential

BI Developer

Responsibilities:

  • Interacted with end users and business analysts, and conducted training sessions to impart knowledge of the designed universes and published reports.
  • Involved in designing and building Universes, classes and objects.
  • Defined Aliases and Contexts to resolve loops.
  • Set up Universe level conditions to restrict data based on report requirements.
  • Customized Universe Level Prompts as per Functional Requirements.
  • Improved performance and reduced coding through the use of @Functions (@Select, @Prompt, @Where and @Aggregate_Aware).
  • Applied row level and object level securities on different groups and users.
  • Created WebI reports with multiple data providers and synchronized the data using merged dimensions.
  • Created reports from two different universes and applied merged dimensions.
  • Designed reports using hyperlinks and document links.
  • Developed QlikView dashboards using Chart Box (drill-down, drill-up and cyclic grouping), list box, input field, table box, calendar and date island.
  • Created set analysis to provide custom functionality in QlikView applications.
  • Scheduled QlikView jobs and monitored them regularly.
  • Created user access and resolved access-related issues.
  • Worked in Spotfire Designer and created dashboards.

Environment: SAP BO 3.0/3.1 SP3, WEBI, QlikView, Spotfire, Teradata, Windows 2008/2012, Oracle, MS SQL Server 2008, Informatica, SAP BW

Confidential

Business Objects Administrator/Developer

Responsibilities:

  • Responsible for maintenance of Business Objects Environments including installation, configuration, patching, upgrades, monitoring and backups.
  • Participated in the implementation of Windows AD and Single Sign on.
  • Implemented monitoring and backup strategies for high availability.
  • Implemented report bursting to deliver reports to Enterprise Users and Dynamic Recipients in various output formats.
  • Managed users, folders, securities and groups using CMC.
  • Used the formatting options like cross tab representation, section breaks, data sort options, calculations, font, colors, etc.
  • Created WebI reports using complex variables, multiple breaks, sections, sorts and prompts.
  • Migrated the ETL code between the development/test/production environments.
  • Fixed bugs and issues related to migration and conversion like multi-value, #syntax, #data sync, formatting, missing data, and formulas.
  • Created Section Access to restrict QlikView dashboards to specific users.
  • Created set analysis, circular grouping and drill-down grouping.

Environment: SAP Business Objects 3.1 SP3, Web Intelligence, Universe Designer, InfoView, Crystal Reports 2008, SAP BODS 3.1, Import Wizard, Live Office, SAP BW, VPN, Windows Server 2003
