
Big Data Engineer Resume


Irving, TX

SUMMARY

  • Excellent understanding of Data Preprocessing, Data Preparation, Data Architecture, Data Pipelining, Data Modeling, Data Warehousing, Data Marts, and Reporting concepts.
  • Solid understanding of distributed data architecture for large scale parallel processing as well as traditional data architecture to handle data using traditional tools.
  • Extensive knowledge of Hadoop architecture, frameworks, services, platforms, tools, and clusters including Name Node, Data Node, and Yet Another Resource Negotiator (YARN).
  • Worked on Hadoop distributions including Cloudera (CDH), Hortonworks (HDP), and AWS EMR.
  • Experience in installation, configuration, and implementation of Hadoop ecosystem components including the Hadoop Distributed File System (HDFS), MapReduce, Spark, Pig, Hive, Impala, Sqoop, Oozie, Zookeeper, Ambari, and Hue.
  • Worked extensively in developing Big Data Architecture and frameworks to handle the large volume of data using Big Data technologies like Hive, Pig, Sqoop, Spark, etc.
  • Experience in the data ingestion process and providing real-time data streaming solutions using Big Data platforms like Apache Spark Core, Spark SQL, Spark Streaming, DataFrames, Kafka, and Storm.
  • Proficient in migrating data warehouses and databases into Hadoop/NoSQL platforms using tools like Sqoop, Flume, Kafka.
  • Worked extensively on HiveQL, Pig Latin, Impala, Spark APIs to explore, cleanse, aggregate, transform, and store a massive amount of data.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper, and in automation of data movement between different Hadoop systems using Apache NiFi.
  • Involved in troubleshooting and performance tuning of Hadoop applications, and worked with Spark DStreams for streaming, various levels of caching, and optimization techniques.
  • Experienced in MapReduce programming using Scala, Java, and Python.
  • Expertise in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure big data technologies like Hadoop and Apache Spark, and Databricks.
  • Worked on distributed computing architectures using AWS products such as EC2, Redshift, EMR, and Elasticsearch.
  • Generated data cubes using Hive, Pig, Java, and MapReduce on Hadoop clusters deployed on AWS.
  • Worked on migrating Hadoop and Spark clusters to the Amazon Web Services and Azure platforms, e.g., AWS EMR, to process big data across Hadoop clusters on Amazon Simple Storage Service (S3).
  • Expertise in AWS data migration between database platforms, such as local SQL Server to Amazon RDS and EMR Hive, and in managing Hadoop log files in AWS S3.
  • Provided support on AWS Cloud infrastructure automation with multiple tools including Gradle, Chef, Nexus, Docker, and monitoring tools such as Splunk and CloudWatch.
  • Experienced in Machine Learning and Data Mining Algorithms and Techniques like Classification, Clustering, Regression, Decision Trees, Random Forest, Bootstrap Forest, Neural Networks, Boosted Neural Networks, Artificial Neural Networks.
  • Extensively worked on Text Analytics, and on providing ML and Data Mining solutions to various business problems through data visualizations using Python and R programming.
  • Solid knowledge in design, implementation, and maintenance of Extract, Transform, Load architecture using Informatica Power Center, Oracle Data Integrator (ODI), SQL Server Integration Services.
  • Strongly involved in the integration of various data source definitions such as SQL Server, SAP, Oracle, ODBC connectors, and flat files for ETL processes.
  • Expertise in transactions, scripts, querying, and performance tuning of relational databases like SQL Server, Oracle, MySQL, and DB2 using SQL, PSQL, and PL/SQL, and of non-relational databases like HBase, DynamoDB, MongoDB, and Cassandra.
  • Involved in writing T-SQL, Oracle PL/SQL Scripts, Stored Procedures, and Triggers for business logic implementation.
  • Highly skilled in data analysis, time series analysis, gap analysis, identification of dimensions, facts & aggregate tables, measures, and hierarchies.
  • Experience in designing connection pools and schemas like Star and Snowflake Schema on the OBIEE repository.
  • Strong expertise in the creation and performance tuning of business reports using tools like OBIEE, BI Publisher, Microsoft Power BI, Tableau and SQL Server tools like SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS).
  • Created interactive UI Design in OBIEE Answers and Dashboards, including experience in User Interface Design using CSS, HTML, and JavaScript.
  • Experienced in testing and monitoring ETL workflows, business reports, and databases for the movement of data between the data source and the target.
  • Good experience in R programming, SAS programming, and Shell scripting.
  • Implemented security and sharing rules at the object, field, and record level for different users at different levels of the organization.
  • Experience in implementing and supporting group-level data security and UI security using web groups in OBIEE on WebLogic.
  • Participated in all stages of the Software Development Life Cycle methodologies such as Agile, Scrum, and Waterfall model.

TECHNICAL SKILLS

Technologies: Big Data and Hadoop Ecosystem, Machine Learning, Data Mining, Data Analytics, Business Intelligence

Big Data Ecosystem: HDFS, Yarn, MapReduce, Pig, Hive, Impala, Sqoop, Spark, Flume, Zookeeper, Oozie, Kafka, Rundeck, Ambari, Hue

Hadoop Technologies: Apache Hadoop 1.x, Apache Hadoop 2.x, Cloudera, Hortonworks, AWS EMR

Machine Learning: Regression, Decision Tree, Clustering, Neural Network, Bootstrap Forest, Random Forest, Classification, Natural Language Processing

Data Visualization Tools: Microsoft Power BI, Tableau, OBIEE, Oracle BI Publisher, SSAS, SSRS, JMP Pro

Data Integration Tools: ODI, Informatica, SSIS, DAC

Languages: Python, Java, Scala, C, C++, R, SAS, UNIX/Linux Shell Scripting, JavaScript

Relational Databases: SQL, PL/SQL, PSQL, SQL Server, MySQL, Oracle

Non-Relational Databases: Cassandra, MongoDB, DynamoDB, HBase

Containers: Docker, Kubernetes

Project Management: Agile, Waterfall

Operating Systems: Windows XP/Vista/7/8/10, macOS, Ubuntu

IDEs & Tools: Eclipse, Jupyter, Anaconda, Microsoft Visual Studio, IntelliJ, DBeaver, STS, WinSCP, PuTTY, Git

Software: Microsoft Office, Adobe Suite

PROFESSIONAL EXPERIENCE

Confidential, Irving, TX

Big Data Engineer

Responsibilities:

  • Excellent interpersonal and communication skills to understand problem statements and client requirements.
  • Involved in the requirement gathering phase, created user stories, and estimated story points for tasks in Jira along with the client and team.
  • Participated in Scrum meetings, Sprint Planning, Retrospective, and Demo at the end of a sprint in Agile workflow.
  • Implemented a robust data ingestion pipeline to load real-time streaming data into HDFS using Kafka (a producer-side sketch follows this list).
  • Transformed the input data from source systems using Kafka custom encoders and developed a structured schema using the Kafka broker.
  • Implemented an Elasticsearch connector through the Kafka Connect API with Kafka as the source and Elasticsearch as the sink.
  • Improved data ingestion process by optimizing Apache NiFi.
  • Designed and configured the metadata (architecture, schema, and warehouse) in Hive/Impala to perform data cleansing, transformation, and analysis by developing MapReduce programs.
  • Analyzed the partitioned data and executed queries on Parquet tables to analyze data using HiveQL/Spark SQL.
  • Migrated an existing feed from Hive to Spark, reducing the latency of the existing HiveQL feeds, and applied Spark features such as in-memory processing and text analytics.
  • Implemented Spark jobs using Spark SQL to perform transformations on the data.
  • Loaded and updated data in the Cassandra database using Spark SQL and the DataStax Spark Cassandra Connector (see the PySpark sketch after this list).
  • Created data architecture, models, and column families in the Cassandra database, and ingested transformed data from RDBMS for faster searching, sorting, and grouping using Cassandra Query Language.
  • Used Cassandra-stress tool to test the cluster performance and improve the read/writes in the cluster.
  • Performed data imputation techniques to handle missing values and data analysis using Python with libraries like Pandas and PySpark (a small imputation sketch follows this list).
  • Used Kafka capabilities such as partitioning, replication, and its distributed commit log service for messaging systems by maintaining feeds.
  • Developed Oozie Workflows for daily incremental loads using Kafka to extract data from source systems and import it into Hive tables.
  • Used Docker to maintain Docker Hub images for middleware implementation and to automate the deployment process through Jenkins.
  • Utilized Docker Compose to define multi-container configurations in YAML and used Kubernetes to manage containerized applications through its nodes, ConfigMaps, selectors, and services.
  • Implemented Jenkins pipelines to control microservices in the Docker registry and then deployed to Kubernetes.
  • Migrated the complete application to AWS for processing and storing data using EC2 and S3, and implemented Elasticsearch and MapReduce on the Hadoop data warehouse to set up the Hadoop environment on AWS EC2.
  • Presented the results to business users, authorities, and other development and engineering teams through data visualization tools like Tableau, Microsoft Power BI, and Oracle BI Publisher.
  • Created graphical reports, tabular reports, scatter plots, geographical maps, dashboards, and parameters on Tableau and Microsoft Power BI.
  • Created BI Publisher reports for pixel-perfect reports with fixed formats.
  • Used Jira to create and manage user stories, bug tracking, and project progress.
  • Worked with Network, Database, Application, QA, and BI groups to ensure data quality and availability.
  • Responsible for producing meaningful insights from data to drive real business outcomes for various application teams.
  • Worked in Agile Methodology broadly and worked with the SCRUM team in delivering user stories on schedule for each Sprint.
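
The Kafka ingestion work above could be sketched roughly as below. This is a minimal illustration, not the production pipeline: the broker address, topic name, and record shape are assumptions, and the kafka-python client stands in for whichever producer client was actually used.

    # Minimal sketch of a producer feeding the Kafka ingestion pipeline.
    # Broker address, topic name, and record fields are hypothetical.
    import json
    from kafka import KafkaProducer  # assumes the kafka-python package

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def publish_event(event):
        """Send one source-system record to the (hypothetical) raw-events topic."""
        producer.send("raw-events", value=event)

    publish_event({"id": 1, "ts": "2021-01-01T00:00:00Z", "source": "web"})
    producer.flush()

The downstream Elasticsearch sink mentioned above would typically be configuration-driven through Kafka Connect rather than written as code.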
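
The Parquet analysis and Cassandra load could look roughly like the PySpark sketch below; the paths, keyspace, table, and column names are assumptions, and it presumes the DataStax Spark Cassandra Connector package is available to the Spark session.

    # Sketch: query partitioned Parquet data with Spark SQL and append the result
    # to a Cassandra table via the Spark Cassandra Connector (names are hypothetical).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("parquet-to-cassandra")
        .config("spark.cassandra.connection.host", "cassandra-host")
        .getOrCreate()
    )

    events = spark.read.parquet("hdfs:///data/events")   # partitioned Parquet table
    events.createOrReplaceTempView("events")

    daily = spark.sql("""
        SELECT event_date, event_type, COUNT(*) AS event_count
        FROM events
        GROUP BY event_date, event_type
    """)

    (daily.write
        .format("org.apache.spark.sql.cassandra")
        .options(table="daily_event_counts", keyspace="analytics")
        .mode("append")
        .save())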
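
The missing-value handling mentioned above might look roughly like this with Pandas; the file, column names, and fill strategies are purely illustrative.

    # Sketch of simple imputation with Pandas (columns and strategies are assumed).
    import pandas as pd

    df = pd.read_csv("extract.csv")                              # hypothetical extract

    df["amount"] = df["amount"].fillna(df["amount"].median())    # numeric: median fill
    df["category"] = df["category"].fillna("unknown")            # categorical: sentinel
    df["reading"] = df["reading"].interpolate(method="linear")   # ordered: interpolate

    print(df.isna().sum())                                       # check remaining gaps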

Environment: Cloudera, Scala, Kafka, Zookeeper, Hive, Sqoop, Python, Apache Spark, Apache Cassandra, Apache NiFi, Oozie, AWS, Axure, AWS EMR, Informatica, Oracle, SQL Server, DynamoDB, Tableau, BI Publisher, Microsoft Power BI, PL/SQL, SQL Developer, UNIX, Eclipse, STS

Confidential

Big Data Developer

Responsibilities:

  • Strong analytical abilities to break down and comprehend customer and business requirements.
  • Participated in requirement gathering sessions and created user stories in Jira in collaboration with the Business Analyst and the development team.
  • Involved in Daily Scrum meetings, Sprint Grooming, Planning, Retrospective, and Demo.
  • Responsible for building scalable distributed data solutions on Cloudera distributions.
  • Installed and configured Hadoop and responsible for maintaining clusters, resource manager on HDFS.
  • Worked on ingestion of data from SQL Server, MySQL, Oracle, and Teradata to HDFS and Hive utilizing Flume.
  • Created and scheduled MapReduce jobs in Java to perform transformations on data in HDFS.
  • Involved in the migration of data from Hadoop to relational databases like Oracle and DB2, utilizing Sqoop for loading the data.
  • Expertise in working with Hive, including tasks like creating tables, static and dynamic partitioning, and tuning HiveQL queries.
  • Experience in writing ad-hoc queries in Hive and analyzing data using HiveQL and Impala.
  • Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
  • Expertise in sharing structured schemas between Pig and Hive and in data wrangling using HCatalog.
  • Transformed semi-structured log data to fit the schema of the Hive tables using Pig.
  • Proficient in implementing Hive DDLs, Hive UDFs, and Pig UDFs using Python and Java for evaluating, filtering, loading, and storing data (a Python streaming example follows this list).
  • Implemented data pipelines that chain multiple mappers using the ChainMapper API.
  • Configured Flume agents on various data sources to capture streaming log data from the web servers.
  • Developed Oozie workflows to run and schedule different Hive and Pig jobs.
  • Worked on the Hue interface to perform data querying and loading on HDFS using Hive and Impala.
  • Continuously monitored the Hadoop cluster using Zookeeper on Cloudera Manager.
  • Created reports for business clients utilizing data visualization tools like OBIEE, Oracle Business Intelligence Publisher, and Tableau.
  • Created session and repository variables on the variable manager to refresh data in reports content dynamically.
  • Created materialized views to extract data from multiple data sources in OBIEE administration.
  • Created interactive UI designs in OBIEE Answers, Delivers, and Dashboards utilizing CSS, HTML, and JavaScript.
  • Developed Tableau reports on the business data to examine patterns in the business.
  • Designed and created dashboards for analytical purposes using Tableau.
  • Developed Shell, Perl, and Python scripts to automate the workflow.
  • Created the complete security framework for reports in OBIEE, BI Publisher, and Tableau.
  • Actively reported daily progress of the project to senior management.
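
One common way to express Hive/Pig UDF-style logic in Python, as mentioned in the list above, is a streaming script invoked from Hive's TRANSFORM clause. The sketch below is illustrative only; the three-column layout and the cleanup rules are assumptions.

    #!/usr/bin/env python
    # clean_fields.py - sketch of a Hive streaming transform used as a Python "UDF".
    # Hive passes tab-separated rows on stdin and reads tab-separated rows from stdout.
    # The (id, raw_text, amount) layout is hypothetical.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue                                   # drop malformed rows
        row_id, raw_text, amount = fields
        cleaned = raw_text.strip().lower()             # simple text normalization
        amount = amount if amount not in ("", "\\N") else "0"
        print("\t".join([row_id, cleaned, amount]))

From HiveQL it would be wired in along the lines of ADD FILE clean_fields.py; followed by SELECT TRANSFORM(id, raw_text, amount) USING 'python clean_fields.py' AS (id, clean_text, amount) FROM staging_table;.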

Environment: Hue, Cloudera, Flume, Rundeck, Scala, Kafka, Zookeeper, Hive, Pig, Impala, Sqoop, Python, Oozie, Informatica, Oracle, Tableau, OBIEE, BI Publisher, PL/SQL, SQL Developer, UNIX, Eclipse

Confidential

Business Intelligence Developer/Data Analyst

Responsibilities:

  • Gathered the business requirements along with business analysts and documented them.
  • Involved in full life cycle Business Intelligence implementations and understanding of all aspects of an implementation project using OBIEE.
  • Involved in the planning of data warehouse, schema, models, and data marts.
  • Designed and developed Extract, Transform, and Load (ETL) architecture using Informatica and loaded data from sources like flat files, XML, DB2, Oracle to a target system like Oracle, and then to the Data marts.
  • Created transformations in Informatica Power Center using connected, unconnected, and dynamic lookups with caches such as the persistent cache to load newly created or existing tables.
  • Developed mapplets, which were used to map the columns from source to target, and lookups to find the data.
  • Developed update filters to change the data and capture the timestamp of every load in a table.
  • Created and monitored sessions using the workflow manager and workflow monitor.
  • Produced releases to migrate Informatica code to the test and then the production environment.
  • Experience in Fusion Middleware Technology, concepts, and data warehouse architecture.
  • Provided end-to-end business intelligence solutions by configuring metadata and building OBI Repository.
  • Worked on the installation, configuration, and set up of the OBIEE administration tool and platform.
  • Defined Key Performance Metrics, facts, dimensions, hierarchies, schemas like Star and Snowflake schemas, and created a data model on the OBIEE repository.
  • Configured OBIEE Metadata Objects, including repository, tables, schemas, variables, and reports.
  • Designed and developed Informatica ETLs, views, materialized views, and modeling in OBIEE.
  • Developed the metadata layers, including the Physical layer, the Business Model and Mapping layer, and the Presentation layer, and created multiple connection pools.
  • Created aliases of tables and performed physical join between the tables in the physical layer.
  • Created logical tables and joins, and applied business logic, aggregation to the data in the BMM layer.
  • Developed Dimensional Hierarchies, Level Based Measures, and added multiple sources to business model objects.
  • Created productivity reports, Ad-hoc reports, and interactive dashboards, filters, prompts for business end-users on OBIEE Answers and BI Publisher.
  • Designed interactive dashboards in OBIEE and BI Publisher using drill-down, guided navigation, prompts, filters, and variables.
  • Experienced in creating Pivot tables, Bar charts, Pie charts, Performance Tiles, Column selector, drillable, guided navigation, union reports, presentation variables, inline prompt reports.
  • Worked on action items like iBots in OBIEE to provide real-time, personalized, and actionable intelligence data.
  • Involved in Oracle PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Migrated reports, filters, and dashboards between Dev, Test, and Prod environments.
  • Created and managed the roles and user access system in the OBIEE WebLogic and configured security for Users, Groups, and Application Roles.
  • Extensively worked on Multiuser Development Environment (MUDE).
  • Involved in performance tuning, monitoring, usage tracking, debugging, unit testing, version control, and content migrations.
  • Involved in daily, weekly, and monthly meetings to discuss the progress and updates in the project.

Environment: Informatica, Oracle, OBIEE, BI Publisher, PL/SQL, SQL Developer, UNIX.
