Big Data Engineer Resume
Dallas, TX
SUMMARY
- Around 7+ years of IT experience in software development and support, with experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.
- Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark, Spark SQL, Spring Boot, Spark Streaming, and Hive for scalability, distributed computing, and high-performance computing.
- Experience in using Hive Query Language for data Analytics.
- Experienced in Installing, Maintaining and Configuring Hadoop Cluster.
- Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 & 2.2, and CDH3/CDH4 with Cloudera Manager on Linux and Ubuntu OS.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
- Good knowledge of single-node and multi-node cluster configurations.
- Strong knowledge of NoSQL column-oriented databases like HBase, Cassandra, MongoDB, and MarkLogic, and their integration with the Hadoop cluster.
- Expertise in the Scala programming language and Spark Core.
- Worked with AWS-based data ingestion and transformations.
- Worked with Cloudbreak and blueprints to configure the AWS platform.
- Worked with data warehouse tools like Informatica and Talend.
- Experienced in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Good knowledge of Amazon EMR, Amazon RDS, S3 buckets, DynamoDB, and Redshift.
- Analyzed data, interpreted results, and conveyed findings in a concise and professional manner.
- Partnered with the Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics.
- Promoted a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and providing the final analysis to the requestor.
- Good experience with Kafka and Storm.
- Worked with Docker to establish a connection between Spark and a Neo4j database.
- Knowledge of java virtual machines (JVM) and multithreaded processing.
- Hands-on experience working with ANSI SQL.
- Strong programming skills in designing and implementing applications using Core Java, J2EE, JDBC, JSP, HTML, Spring Framework, Spring Batch, Spring AOP, Spring Boot, Struts, JavaScript, and Servlets.
- Experience in writing build scripts using Maven and working with continuous integration systems like Jenkins.
- Java developer with extensive experience with various Java libraries, APIs, and frameworks.
- Hands-on development experience with RDBMS, including writing complex SQL queries, stored procedures, and triggers.
- Very good understanding of SQL, ETL, and data warehousing technologies.
- Knowledge of MS SQL Server 2012/2008/2005 and Oracle 11g/10g/9i and E-Business Suite.
- Expert in T-SQL, creating and using stored procedures, views, and user-defined functions, and implementing Business Intelligence solutions using SQL Server 2000/2005/2008.
- Developed Web-Services module for integration using SOAP and REST.
- NoSQL database experience on HBase, Cassandra, and DynamoDB.
- Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
- Sound knowledge of designing data warehousing applications using tools like Teradata, Oracle, and SQL Server.
- Experience working with Solr for text search.
- Experience using the Talend ETL tool.
- Experience working with job schedulers like Autosys and Maestro.
- Strong in databases like Sybase, DB2, Oracle, MS SQL, and Clickstream.
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
- Strong working experience in Snowflake.
- Hands-on experience with automation tools such as Puppet, Jenkins, Chef, Ganglia, and Nagios.
- Strong communication, collaboration & team-building skills, with proficiency at grasping new technical concepts quickly and utilizing them in a productive manner.
- Adept in analyzing information system needs, evaluating end-user requirements, custom designing solutions, and troubleshooting information systems.
- Strong analytical and problem-solving skills.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Spark, Zookeeper, Cloudera Manager, Splunk
NO SQL Database: HBase, Cassandra
Monitoring and Reporting: Tableau, Custom shell scripts
Hadoop Distribution: Horton Works, Cloudera, MapR
Build Tools: Maven, SQL Developer
Programming & Scripting: JAVA, C, SQL, Shell Scripting, Python, Scala
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/Rest services
Databases: Oracle, MySQL, MS SQL Server, Teradata
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS
Version Control: SVN, CVS, GIT
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE
Confidential -Dallas, TX
Big Data Engineer
Responsibilities:
- Developed a data pipeline with Kafka and Spark.
- Contributed to designing the data pipeline with the Lambda Architecture.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Involved in installation, configuration, supporting and managing Hadoop clusters.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used Spark for interactive queries and processing of streaming data.
- Extensively worked with partitions, dynamic partitioning, and bucketing of tables in Hive; designed both managed and external tables, and worked on optimization of Hive queries.
- Developed Spark applications using Scala and Python, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve the performance and optimization of the existing algorithms in Hadoop.
- Used Spark Context, Spark SQL, DataFrames, and Spark on YARN.
- Used Spark Streaming APIs to perform transformations and actions on the fly.
- Configured a data model to get data from Kafka in near real time and persist it to Cassandra.
- Developed a Kafka consumer API in Python for consuming data from Kafka topics.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Migrated an existing on-premises application to AWS.
- Used AWS services like EC2 and S3 for small data sets processing and storage.
- Experienced in Maintaining the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDD, Performed transformations and actions on RDDs.
Environment: Hortonworks Data Platform, Apache Hadoop, Hive, Python, Hue, Zookeeper, MapReduce, Sqoop, Crunch API, Pig 0.10 and 0.11, HCatalog, Unix, Java, JSP, Eclipse, Maven, Oracle, SQL Server, Linux, MySQL.
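The Kafka consumer work described in this role can be sketched roughly as below. This is an illustrative, minimal consumer in Python, not the project's actual code: the topic name `events`, the broker address, and the assumption that messages carry UTF-8 JSON are all hypothetical.

```python
import json

def parse_event(raw: bytes) -> dict:
    """Decode one Kafka message value (assumed UTF-8 JSON) into a dict,
    tagging undecodable records instead of crashing the consumer."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {"_corrupt": True, "_raw": raw.hex()}

def consume_forever(topic: str = "events", servers: str = "localhost:9092"):
    """Poll a topic and print parsed events. Requires the kafka-python
    client and a reachable broker; kept separate so parse_event stays
    independently testable without any broker."""
    from kafka import KafkaConsumer  # assumed client library
    for msg in KafkaConsumer(topic, bootstrap_servers=servers):
        print(parse_event(msg.value))
```

Keeping the decode step in a pure function is the design point here: the parsing logic can be unit-tested without standing up a Kafka cluster.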
Confidential -San Francisco, CA
Big Data Engineer
Responsibilities:
- Communicated deliverables status to stakeholders and facilitated periodic review meetings.
- Developed a Spark Streaming application to pull data from the cloud to Hive and HBase.
- Built real-time streaming data pipelines with Kafka, Spark Streaming, and Hive.
- Created a Kafka producer to connect to different external sources and bring the data to a Kafka broker.
- Handled schema changes in data stream using Kafka.
- Responsible for Kafka operation and monitoring, and handling of messages funneled through Kafka topics.
- Coordinated Kafka operation and monitoring with DevOps personnel; balanced the impact of Kafka producer and consumer message (topic) consumption.
- Designed and developed ETL workflows using Python and Scala for processing data in HDFS.
- Collected, aggregated, and shuffled data from servers to HDFS using Apache Spark & Spark Streaming.
- Worked on importing and exporting claims information between HDFS and RDBMS.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Worked on streaming the prepared data to HBase using Spark.
- Performed performance tuning for Spark Streaming, e.g., setting the right batch interval time, the correct number of executors, and appropriate publishing and memory choices.
- Used HBase connector for Spark.
- Performed gradual cleansing and modeling of datasets.
- Utilized avro-tools to build the Avro schema used to create external Hive tables via PySpark.
- Created and managed external tables to store ORC and Parquet files using HQL.
- Developed Apache Airflow DAGs to automate the pipeline.
- Created a NoSQL HBase database to store the processed data from Apache Spark.
Environment: Snowflake Web UI, Snow SQL, Hadoop MapR 5.2, Hive, Hue, Azure, Control-M, AWS, Teradata Studio, Oracle 12c, Tableau, Hadoop Yarn, Spark Core, Spark Streaming, Spark SQL, Spark MLlib
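The Avro schema work in this role (building record schemas for the external Hive tables) follows the standard Avro JSON schema layout, which can be sketched as below. The record name `claim_event` and its fields are illustrative assumptions, not the project's actual schema.

```python
import json

def build_avro_schema(name: str, fields: dict) -> str:
    """Render an Avro record schema as the JSON text that avro-tools and
    the Hive Avro SerDe expect. Values in `fields` are Avro type specs;
    a list like ["null", "string"] declares a nullable union field."""
    return json.dumps({
        "type": "record",
        "name": name,
        "fields": [{"name": n, "type": t} for n, t in fields.items()],
    }, indent=2)

# Hypothetical schema for illustration only.
schema = build_avro_schema(
    "claim_event",
    {"event_id": "string", "ts": "long", "payload": ["null", "string"]},
)
```

Generating the schema from a dict keeps field definitions in one place, so the same source of truth can drive both the Avro writer and the `TBLPROPERTIES` of the matching Hive external table.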
Confidential -San Diego, CA
Data Engineer/Hadoop Spark Developer
Responsibilities:
- Extensively worked with the Spark SQL context to create DataFrames and Datasets to preprocess the model data.
- Data analysis: expertise in analyzing data using Pig scripting, Hive queries, Spark (Python), and Impala.
- Used Hive to implement data warehouse and stored data into HDFS. Stored data into Hadoop clusters which are set up in AWS EMR.
- Involved in designing the row key in HBase to store text and JSON as key-values in an HBase table, and designed the row key in such a way that it can be retrieved/scanned in sorted order.
- Wrote JUnit tests and integration test cases for those microservices.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Worked heavily with Python, C++, Spark, SQL, Airflow, and Looker.
- Experienced in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Built pipelines to move hashed and un-hashed data from XML files to Data lake.
- Developed a NiFi workflow to pick up multiple files from an FTP location and move them to HDFS on a daily basis.
- Wrote templates for Azure infrastructure as code using Terraform to build staging and production environments. Integrated Azure Log Analytics with Azure VMs for monitoring the log files, storing them, and tracking metrics, and used Terraform to manage different infrastructure resources: cloud, VMware, and Docker containers.
- Scripting: expertise in Hive, Pig, Impala, shell scripting, Perl scripting, and Python.
- Worked with developer teams on a NiFi workflow to pick up data from a REST API server, from the data lake, and from an SFTP server, and send it to Kafka.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Proven experience with ETL frameworks such as Airflow and Luigi.
- Created Hive schemas using performance techniques like partitioning and bucketing.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Built large-scale data processing systems in data warehousing solutions, and worked with unstructured data mining on NoSQL.
- S3 data lake management: responsible for maintaining and handling data inbound and outbound requests through the big data platform.
- Specified the cluster size and allocated resource pools, distributing Hadoop by writing the specification texts in JSON file format.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
- Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala and Sqoop.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Queried both Managed and External tables created by Hive using Impala.
- Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
Environment: Hadoop, Microservices, Java, MapReduce, Agile, HBase, JSON, Spark, Kafka, JDBC,AWS, EMR/EC2/S3,Hive, JSON, Pig, Flume, Zookeeper, Impala, Sqoop
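The HBase row-key design mentioned in this role (keys laid out so get/scan returns rows in a useful sorted order) can be illustrated with a minimal sketch. The composite entity-id-plus-reversed-timestamp layout below is a common pattern assumed for illustration, not the actual key used on the project.

```python
# Largest 13-digit millisecond epoch; subtracting from it reverses sort
# order while keeping the padded field fixed-width.
MAX_TS = 10**13 - 1

def make_row_key(entity_id: str, ts_millis: int) -> bytes:
    """Composite HBase row key: entity id plus a zero-padded reversed
    timestamp, so within one entity the newest rows sort (and scan) first."""
    reversed_ts = MAX_TS - ts_millis
    return f"{entity_id}|{reversed_ts:013d}".encode("utf-8")
```

Because HBase stores rows in byte-lexicographic order, the zero padding is essential: without it, `9` would sort after `10` and the scan order would break.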
Confidential
Hadoop Developer
Responsibilities:
- Used Sqoop to efficiently transfer data between relational databases and HDFS, and used Flume to stream log data from servers.
- Enforced partitioning and bucketing in Hive for better organization of the data.
- Worked with different file formats and compression techniques to standards.
- Loaded data from a Unix file system to HDFS.
- Used Unix shell scripts to automate the build process and to perform regular jobs like file transfers between different hosts.
- Assisted in production support, which involved monitoring server and error logs, anticipating and preventing potential problems, and escalating issues when necessary.
- Documented technical specs, dataflows, data models, and class models using Confluence.
- Documented requirements gathered from stakeholders.
- Successfully loaded files to HDFS from Teradata and loaded them from HDFS to Hive.
- Used Zookeeper and Oozie for coordinating the cluster and scheduling workflows.
- Involved in researching various available technologies, industry trends, and cutting-edge applications. Data ingestion was done using Flume with Kafka as the source and HDFS as the sink.
- Performed storage capacity management, performance tuning, and benchmarking of clusters.
Environment: Hadoop, Zookeeper, Kafka, UNIX
Confidential
Data Engineer
Responsibilities:
- Created and executed Hadoop Ecosystem installation and document configuration scripts on Google Cloud Platform.
- Transformed batch data from several tables containing hundreds of thousands of records from SQL Server, MySQL, PostgreSQL, and CSV file datasets into DataFrames using PySpark.
- Developed a PySpark program that writes DataFrames to HDFS as Avro files.
- Utilized Spark's parallel processing capabilities to ingest data.
- Created and executed HQL scripts that create external tables in a raw-layer database in Hive.
- Developed a script that copies Avro-formatted data from HDFS to the external tables in the raw layer.
- Created PySpark code that uses Spark SQL to generate DataFrames from the Avro-formatted raw layer and writes them to data service layer internal tables in ORC format.
- In charge of PySpark code, creating DataFrames from tables in the data service layer and writing them to a Hive data warehouse.
- Installed Airflow and created a database in PostgreSQL to store metadata from Airflow.
- Configured the files that allow Airflow to communicate with its PostgreSQL database.
- Developed Airflow DAGs in python by importing the Airflow libraries.
- Utilized Airflow to automatically schedule, trigger, and execute the data ingestion pipeline.
Environment: Cloudera Manager, HDFS, Sqoop, Pig, Hive, Oozie, Spark SQL, Tableau, MySQL, Python, Kafka, Flume, Java, Scala, Git.
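The Airflow scheduling described in this role can be sketched roughly as below. The DAG id `ingest_pipeline`, the daily schedule, and the `run_ingestion` callable are illustrative assumptions, not details taken from the project.

```python
from datetime import datetime

def run_ingestion(**context):
    """Placeholder for the PySpark ingestion step the DAG would trigger."""
    return "ingested"

try:
    # Airflow 2.x imports; the DAG definition is wrapped so the callable
    # above remains usable even where Airflow is not installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="ingest_pipeline",          # illustrative name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = PythonOperator(
            task_id="run_ingestion",
            python_callable=run_ingestion,
        )
except Exception:
    pass  # Airflow missing or incompatible; the callable still works standalone
```

In a real deployment this file would live in the Airflow `dags/` folder, and the scheduler (backed by the PostgreSQL metadata database mentioned above) would pick it up and trigger the daily runs.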